maitai.log() to hand Maitai the request/response pair after each call. Maitai normalizes the native payloads by provider and runs them through the same pipeline as proxied traffic, so your existing client keeps doing inference while you still get Maitai observability: Sentinels, Sessions, request history, and data for Test Sets.
Maitai never calls the model on this path — you do. That means no provider keys are needed by the Maitai client (only your
MAITAI_API_KEY), and inference latency is entirely yours. Logging is fire-and-forget and is engineered to never raise into your application.How it works
Run inference with your own client
Call OpenAI / Anthropic / Gemini (or any OpenAI-compatible API) exactly as you do today, with your own keys.
Hand the pair to maitai.log()
Pass the native
request and response along with the intent, application, and provider. The call returns immediately; the send happens on a background thread (sync) or task (async).Maitai normalizes and stores it
The native payloads are converted into Maitai’s OpenAI-compatible schema based on
provider and stored as PROD traffic (inference location CLIENT).Quickstart
Run inference with your own client, then log the pair. Reusing the samerequest dict you pass to your provider keeps the logged inputs perfectly in sync with what you sent.
Provider examples
Send the provider’s native payloads — Maitai converts them server-side.request and response accept either plain dicts or the provider SDK’s objects (they’re serialized for you).
Parameters
The Intent / action type for the request (also called
action_type). Scopes Sentinels, configuration, and quality tooling.The Application reference name (shown in the Portal). Groups your traffic.
Source provider for the pair, used to normalize the native payloads. One of
openai, anthropic, or gemini. Use openai for any OpenAI-compatible API (Groq, Together, vLLM, etc.) where payloads are already in OpenAI shape.The provider’s native request payload — the dict you sent to your client, or the SDK request object. Serialized automatically.
The provider’s native response — the SDK response object (e.g. OpenAI
ChatCompletion, Anthropic Message, Gemini GenerateContentResponse) or its dict form. Serialized automatically.Optional but recommended. Groups related requests into a Session. If omitted, Maitai generates one server-side.
Optional caller-supplied identifier for correlating the logged request with your own systems.
Optional end-user identifier stored with the request.
Optional metadata tags stored with the request for filtering and debugging.
Optional per-call client-side sampling rate between
0.0 and 1.0. Overrides the default (see Sampling and kill-switch). The surviving rate is recorded so server-side aggregates can correct for sampling.Optional latency metrics. Provide a
RequestTimingMetric (from maitai.models.metric import RequestTimingMetric) to record response time.Timing is only recorded when both
time_request_start and time_request_end are set; partial metrics are ignored.Behavior and guarantees
- Non-blocking.
maitai_client.log(...)returns immediately. Sync clients send on a background thread; async clients schedule a task. - Never raises into your code. Bad inputs, network errors, and serialization failures are swallowed and reported to Maitai’s internal error metric. Instrumentation can’t take down your request path.
- No inference, no inline corrections. Maitai does not call the model and does not apply corrections here (corrections require server-side inference — see Model Request). Sentinels still evaluate the stored traffic asynchronously.
- Stored as PROD / CLIENT. Logged pairs are recorded as production traffic with the inference location marked
CLIENT, so they’re indistinguishable downstream from Maitai-routed requests for monitoring and Test Sets.
Sampling and kill-switch
Dial volume without touching call sites using environment variables:| Variable | Default | Effect |
|---|---|---|
MAITAI_LOGGING_ENABLED | true | Set to a falsey value (0, false, no, off, empty) to disable all logging. |
MAITAI_LOG_SAMPLE_RATE | 1.0 | Fraction of events to send (0.0–1.0). A per-call sample_rate overrides this default. |
Raw HTTP (any language)
Themaitai.log() helper ships in the Python SDK. From Node, Go, or any other stack, PUT the same payload to the endpoint directly:
intent → action_type and application → application_ref_name. request and response are the provider’s native payloads as JSON.
{ "success": true }. It returns 400 for caller mistakes — an unsupported provider, or a missing action_type / application_ref_name.
For anthropic and gemini, send each provider’s native request/response shape (Anthropic Messages API, google-genai
GenerateContent). Maitai maps finish reasons, tool_use/function_call → tool calls, usage, and Anthropic thinking into its OpenAI-compatible schema.When to use this vs. routing through Maitai
Logging is the right fit when you must keep inference on your own infrastructure or client but still want full observability. If you can route inference through Maitai instead (server_side_inference=true, the default), you additionally unlock server-side features like automatic corrections and input_safety_score — see Model Request.