Already running inference with your own provider client (OpenAI, Anthropic, Gemini, or any OpenAI-compatible API) and don’t want to move it behind Maitai’s inference service? Use maitai.log() to hand Maitai the request/response pair after each call. Maitai normalizes the native payloads by provider and runs them through the same pipeline as proxied traffic, so your existing client keeps doing inference while you still get Maitai observability: Sentinels, Sessions, request history, and data for Test Sets.
Maitai never calls the model on this path — you do. That means no provider keys are needed by the Maitai client (only your MAITAI_API_KEY), and inference latency is entirely yours. Logging is fire-and-forget and is engineered to never raise into your application.

How it works

1. Run inference with your own client

Call OpenAI / Anthropic / Gemini (or any OpenAI-compatible API) exactly as you do today, with your own keys.

2. Hand the pair to maitai.log()

Pass the native request and response along with the intent, application, and provider. The call returns immediately; the send happens on a background thread (sync) or task (async).

3. Maitai normalizes and stores it

The native payloads are converted into Maitai’s OpenAI-compatible schema based on provider and stored as PROD traffic (inference location CLIENT).

4. Sentinels run; traffic is queryable

The logged pair flows through the same pipeline as Maitai-routed requests, so Sentinels evaluate it, Sessions thread it, and it’s available for debugging and building Test Sets.

Quickstart

Run inference with your own client, then log the pair. Reusing the same request dict you pass to your provider keeps the logged inputs perfectly in sync with what you sent.
import maitai
from openai import OpenAI

maitai_client = maitai.Maitai()  # only needs MAITAI_API_KEY
openai_client = OpenAI()         # your own client + OPENAI_API_KEY

request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this text..."}],
}

# You run inference, not Maitai.
response = openai_client.chat.completions.create(**request)

# Fire-and-forget: returns immediately, never raises.
maitai_client.log(
    intent="SUMMARIZATION",
    application="demo_app",
    provider="openai",
    request=request,
    response=response,  # SDK objects are serialized automatically
    session_id="YOUR_SESSION_ID",
    metadata={"source": "byo_inference"},
)
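
If you log from many call sites, a thin wrapper can keep the two steps together. The sketch below is one possible shape and is not part of the Maitai SDK: `complete_and_log` is a hypothetical helper that takes your provider call and the logging call as plain callables, so the same request dict is guaranteed to reach both.

```python
from typing import Any, Callable, Dict


def complete_and_log(
    create: Callable[..., Any],
    log: Callable[..., None],
    request: Dict[str, Any],
    **log_fields: Any,
) -> Any:
    """Run inference with your own client, then hand the pair to the logger.

    Hypothetical helper (not part of the Maitai SDK). `create` is your
    provider call, e.g. openai_client.chat.completions.create, and `log`
    is the logging call, e.g. maitai_client.log.
    """
    response = create(**request)  # inference happens on your side
    # Log the exact dict that was sent, so logged inputs cannot drift.
    log(request=request, response=response, **log_fields)
    return response
```

With the clients from the Quickstart, a call might look like `complete_and_log(openai_client.chat.completions.create, maitai_client.log, request, intent="SUMMARIZATION", application="demo_app", provider="openai")`.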

Provider examples

Send the provider’s native payloads — Maitai converts them server-side. request and response accept either plain dicts or the provider SDK’s objects (they’re serialized for you).
# provider="openai" covers OpenAI and any OpenAI-compatible API
# (Groq, Together, vLLM, ...). Point your own client wherever you like.
import maitai
from openai import OpenAI

maitai_client = maitai.Maitai()
openai_client = OpenAI()

request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this text..."}],
}
response = openai_client.chat.completions.create(**request)

maitai_client.log(
    intent="SUMMARIZATION",
    application="demo_app",
    provider="openai",
    request=request,
    response=response,
    session_id="YOUR_SESSION_ID",
)
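
For another provider the pattern is the same and only the native shapes differ. Here is a minimal sketch for Anthropic using plain dicts, which are accepted in place of SDK objects. The payloads follow the Anthropic Messages API shape. `YOUR_ANTHROPIC_MODEL` and the message ID are placeholders, and the final `log()` call is left as a comment so the snippet runs without any SDK installed.

```python
# Native Anthropic Messages API request; max_tokens is required there.
request = {
    "model": "YOUR_ANTHROPIC_MODEL",  # placeholder: use the model you call
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Summarize this text..."}],
}

# Native response in dict form (illustrative values). With the SDK you would
# pass the Message object returned by messages.create(**request) instead.
response = {
    "id": "msg_example",
    "type": "message",
    "role": "assistant",
    "model": "YOUR_ANTHROPIC_MODEL",
    "content": [{"type": "text", "text": "Here is the summary..."}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 42, "output_tokens": 18},
}

log_fields = {
    "intent": "SUMMARIZATION",
    "application": "demo_app",
    "provider": "anthropic",  # tells Maitai which native shape to normalize
    "request": request,
    "response": response,
    "session_id": "YOUR_SESSION_ID",
}

# With a configured client: maitai_client.log(**log_fields)
```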

Parameters

intent
string
required
The Intent / action type for the request (also called action_type). Scopes Sentinels, configuration, and quality tooling.
application
string
required
The Application reference name (shown in the Portal). Groups your traffic.
provider
string
required
Source provider for the pair, used to normalize the native payloads. One of openai, anthropic, or gemini. Use openai for any OpenAI-compatible API (Groq, Together, vLLM, etc.) where payloads are already in OpenAI shape.
request
object
required
The provider’s native request payload — the dict you sent to your client, or the SDK request object. Serialized automatically.
response
object
required
The provider’s native response — the SDK response object (e.g. OpenAI ChatCompletion, Anthropic Message, Gemini GenerateContentResponse) or its dict form. Serialized automatically.
session_id
string
Optional but recommended. Groups related requests into a Session. If omitted, Maitai generates one server-side.
reference_id
string | number
Optional caller-supplied identifier for correlating the logged request with your own systems.
user_id
string
Optional end-user identifier stored with the request.
metadata
map<string, any>
Optional metadata tags stored with the request for filtering and debugging.
sample_rate
number
Optional per-call client-side sampling rate between 0.0 and 1.0. Overrides the default (see Sampling and kill-switch). The surviving rate is recorded so server-side aggregates can correct for sampling.
timing
object
Optional latency metrics. Provide a RequestTimingMetric (from maitai.models.metric import RequestTimingMetric) to record response time.
Timing is only recorded when both time_request_start and time_request_end are set; partial metrics are ignored.

Behavior and guarantees

  • Non-blocking. maitai_client.log(...) returns immediately. Sync clients send on a background thread; async clients schedule a task.
  • Never raises into your code. Bad inputs, network errors, and serialization failures are swallowed and reported to Maitai’s internal error metric. Instrumentation can’t take down your request path.
  • No inference, no inline corrections. Maitai does not call the model and does not apply corrections here (corrections require server-side inference — see Model Request). Sentinels still evaluate the stored traffic asynchronously.
  • Stored as PROD / CLIENT. Logged pairs are recorded as production traffic with the inference location marked CLIENT, so monitoring and Test Sets treat them the same way as Maitai-routed requests.
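
To illustrate the first two guarantees, here is a minimal sketch of a fire-and-forget logger built on a queue and a background thread. It is not the Maitai SDK's implementation: the class name, the `errors` counter (a stand-in for an internal error metric), and the `drain` method are assumptions made for the example.

```python
import queue
import threading
from typing import Any, Callable, Dict


class FireAndForgetLogger:
    """Illustrative background sender; NOT the Maitai SDK's implementation."""

    def __init__(self, send: Callable[[Dict[str, Any]], None]) -> None:
        self._send = send
        self._queue: "queue.Queue[Dict[str, Any]]" = queue.Queue()
        self.errors = 0  # stand-in for an internal error metric
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def log(self, **event: Any) -> None:
        """Enqueue and return immediately; never raises into the caller."""
        try:
            self._queue.put_nowait(event)
        except Exception:
            self.errors += 1

    def _run(self) -> None:
        while True:
            event = self._queue.get()
            try:
                self._send(event)  # slow work happens off the request path
            except Exception:
                self.errors += 1  # swallowed: logging must not crash the app
            finally:
                self._queue.task_done()

    def drain(self) -> None:
        """Block until every event queued so far has been attempted."""
        self._queue.join()
```

The caller only ever pays for an in-memory enqueue. Failures in `send` are counted rather than raised, which is the trade-off the guarantees above describe: a lost log line is preferred over a failed request.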

Sampling and kill-switch

Dial volume without touching call sites using environment variables:
Variable | Default | Effect
MAITAI_LOGGING_ENABLED | true | Set to a falsey value (0, false, no, off, empty) to disable all logging.
MAITAI_LOG_SAMPLE_RATE | 1.0 | Fraction of events to send (0.0 to 1.0). A per-call sample_rate overrides this default.
Short-lived processes (scripts, lambdas) don’t need to do anything special — queued logs are drained automatically at interpreter exit, so in-flight events aren’t dropped.
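
The precedence rules can be sketched in a few lines. This is an illustration of the semantics described in this section, not the SDK's code: the function names are hypothetical, and the case-insensitive comparison and clamping to the 0.0 to 1.0 range are assumptions of the sketch.

```python
import random
from typing import Callable, Mapping, Optional

FALSEY = {"0", "false", "no", "off", ""}


def logging_enabled(env: Mapping[str, str]) -> bool:
    """Kill-switch: unset means enabled; a falsey value disables logging."""
    value = env.get("MAITAI_LOGGING_ENABLED", "true")
    return value.strip().lower() not in FALSEY  # case-insensitivity is assumed


def effective_sample_rate(
    env: Mapping[str, str], per_call: Optional[float] = None
) -> float:
    """A per-call sample_rate wins over MAITAI_LOG_SAMPLE_RATE (default 1.0)."""
    if per_call is not None:
        rate = per_call
    else:
        rate = float(env.get("MAITAI_LOG_SAMPLE_RATE", "1.0"))
    return min(1.0, max(0.0, rate))  # clamping is an assumption of this sketch


def should_send(
    env: Mapping[str, str],
    per_call: Optional[float] = None,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide client-side whether this event is sent at all."""
    if not logging_enabled(env):
        return False
    return rng() < effective_sample_rate(env, per_call)


def estimate_total(logged_events: int, rate: float) -> float:
    """Correct a count for sampling: each logged event stands for 1/rate events."""
    return logged_events / rate
```

In an application you would pass `os.environ` as `env`. The last helper shows why recording the surviving rate matters: 50 events logged at a rate of 0.25 correspond to an estimated 200 real requests.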

Raw HTTP (any language)

The maitai.log() helper ships in the Python SDK. From Node, Go, or any other stack, PUT the same payload to the endpoint directly:
PUT https://api.trymaitai.ai/chat/completions/log
The HTTP body uses the underlying field names: intent → action_type and application → application_ref_name. request and response are the provider’s native payloads as JSON.
curl --request PUT \
  --url https://api.trymaitai.ai/chat/completions/log \
  --header "Content-Type: application/json" \
  --header "x-api-key: $MAITAI_API_KEY" \
  --data '{
    "provider": "openai",
    "action_type": "SUMMARIZATION",
    "application_ref_name": "demo_app",
    "session_id": "YOUR_SESSION_ID",
    "metadata": { "source": "byo_inference" },
    "request": {
      "model": "gpt-4o",
      "messages": [
        { "role": "user", "content": "Summarize this text..." }
      ]
    },
    "response": {
      "id": "chatcmpl_example",
      "object": "chat.completion",
      "created": 1730000000,
      "model": "gpt-4o",
      "choices": [
        {
          "index": 0,
          "message": { "role": "assistant", "content": "Here is the summary..." },
          "finish_reason": "stop"
        }
      ],
      "usage": { "prompt_tokens": 42, "completion_tokens": 18, "total_tokens": 60 }
    }
  }'
On success the endpoint returns { "success": true }. It returns 400 for caller mistakes — an unsupported provider, or a missing action_type / application_ref_name.
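
The same request can be built from any HTTP client. As a sketch in Python using only the standard library (useful mainly where the SDK cannot be installed), the helper below assembles the PUT request with the underlying field names. `build_log_request` is a hypothetical name; the URL, header, and body fields are taken from the curl example.

```python
import json
import urllib.request
from typing import Any, Dict

LOG_URL = "https://api.trymaitai.ai/chat/completions/log"


def build_log_request(
    api_key: str,
    provider: str,
    intent: str,
    application: str,
    request: Dict[str, Any],
    response: Dict[str, Any],
    **optional: Any,
) -> urllib.request.Request:
    """Build (but do not send) the PUT request for the log endpoint."""
    body = {
        "provider": provider,
        "action_type": intent,                 # SDK name: intent
        "application_ref_name": application,   # SDK name: application
        "request": request,
        "response": response,
        **optional,                            # e.g. session_id, metadata
    }
    return urllib.request.Request(
        LOG_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="PUT",
    )
```

To send it, pass the result to `urllib.request.urlopen(req, timeout=5)`. Note that `urlopen` raises `urllib.error.HTTPError` on a 400 response, so wrap the call in `try`/`except` if logging must never interrupt your application.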
For anthropic and gemini, send each provider’s native request/response shape (Anthropic Messages API, google-genai GenerateContent). Maitai maps finish reasons, tool_use/function_call → tool calls, usage, and Anthropic thinking into its OpenAI-compatible schema.
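
To make the normalization concrete, here is a rough sketch of what converting an Anthropic Messages response into an OpenAI-style chat completion can look like. This is not Maitai's code: the function name and the stop-reason table are assumptions for illustration, thinking blocks are simply skipped here, and Maitai's server-side mapping may differ in detail.

```python
import json
from typing import Any, Dict, List

# Assumed mapping from Anthropic stop_reason to OpenAI finish_reason.
FINISH_REASONS = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def anthropic_to_openai(message: Dict[str, Any]) -> Dict[str, Any]:
    """Illustrative conversion of a native Anthropic Message dict."""
    text_parts: List[str] = []
    tool_calls: List[Dict[str, Any]] = []
    for block in message.get("content", []):
        if block.get("type") == "text":
            text_parts.append(block["text"])
        elif block.get("type") == "tool_use":
            tool_calls.append({
                "id": block["id"],
                "type": "function",
                "function": {
                    "name": block["name"],
                    # OpenAI carries tool arguments as a JSON string.
                    "arguments": json.dumps(block["input"]),
                },
            })
        # Other block types (e.g. thinking) are ignored in this sketch.

    assistant: Dict[str, Any] = {
        "role": "assistant",
        "content": "".join(text_parts) or None,
    }
    if tool_calls:
        assistant["tool_calls"] = tool_calls

    usage = message.get("usage", {})
    prompt = usage.get("input_tokens", 0)
    completion = usage.get("output_tokens", 0)
    return {
        "id": message.get("id"),
        "object": "chat.completion",
        "model": message.get("model"),
        "choices": [{
            "index": 0,
            "message": assistant,
            "finish_reason": FINISH_REASONS.get(message.get("stop_reason"), "stop"),
        }],
        "usage": {
            "prompt_tokens": prompt,
            "completion_tokens": completion,
            "total_tokens": prompt + completion,
        },
    }
```

You do not need to run anything like this yourself: send the native payloads with the matching provider value and the conversion happens server-side.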

When to use this vs. routing through Maitai

Logging is the right fit when you must keep inference on your own infrastructure or client but still want full observability. If you can route inference through Maitai instead (server_side_inference=true, the default), you additionally unlock server-side features like automatic corrections and input_safety_score — see Model Request.