Some teams already run inference with their own LLM client (OpenAI SDK, Groq SDK, etc.) and want Maitai to index and store that traffic for observability, debugging, and building Test Sets, without running inference through Maitai’s inference service. Maitai supports this via a storage endpoint:

  • PUT /chat/completions/response

This endpoint stores a request/response pair as PROD traffic and marks the inference location as CLIENT server-side. There are two ways to use it.

Option A: Wrap your provider client with the Maitai SDK

If you’re okay with Maitai wrapping your provider client, you can run inference locally and have the SDK automatically store the request/response in Maitai. Key flags:

  • server_side_inference = false (don’t route inference through Maitai)
  • evaluation_enabled = false (store-only; no evaluation)

Python example
import maitai

client = maitai.Maitai()  # uses MAITAI_API_KEY + provider keys (ex: OPENAI_API_KEY) from env

response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Summarize this text..."},
    ],
    application="demo_app",
    intent="SUMMARIZATION",
    session_id="YOUR_SESSION_ID",
    model="gpt-4o",
    server_side_inference=False,
    evaluation_enabled=False,
    metadata={"source": "client_inference"},
)

Option B: Bring your own client and upload the request/response for storage only

If you already have an LLM response from your own client, you can upload the request/response pair directly to Maitai for indexing.

Required fields

  • chat_completion_request.application_ref_name
  • chat_completion_request.action_type (this is the “Intent” / “ApplicationAction”)
  • chat_completion_request.session_id (strongly recommended so the traffic threads into Sessions)
  • chat_completion_request.params (the model inputs; at minimum include messages and model)
  • chat_completion_response (OpenAI-style chat.completion response shape)

cURL example

curl --request PUT \
  --url https://api.trymaitai.ai/chat/completions/response \
  --header "Content-Type: application/json" \
  --header "x-api-key: $MAITAI_API_KEY" \
  --data '{
    "chat_completion_request": {
      "application_ref_name": "demo_app",
      "session_id": "YOUR_SESSION_ID",
      "action_type": "SUMMARIZATION",
      "params": {
        "model": "gpt-4o",
        "messages": [
          { "role": "user", "content": "Summarize this text..." }
        ]
      },
      "metadata": { "source": "byo_client" }
    },
    "chat_completion_response": {
      "id": "chatcmpl_example",
      "object": "chat.completion",
      "created": 1730000000,
      "model": "gpt-4o",
      "choices": [
        {
          "index": 0,
          "message": { "role": "assistant", "content": "Here is the summary..." },
          "finish_reason": "stop",
          "is_correction": false
        }
      ],
      "correction_applied": false,
      "first_token_time": 0,
      "response_time": 0
    }
  }'
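
Python upload example

If you’d rather make this call from Python, here is a minimal sketch of the same upload. It assumes the endpoint, headers, and payload shape shown in the cURL example above; the requests and openai libraries, the store_completion helper, and the timing/correction field handling are illustrative assumptions, not part of Maitai’s SDK.

import os
import time

import requests
from openai import OpenAI

MAITAI_STORAGE_URL = "https://api.trymaitai.ai/chat/completions/response"

def store_completion(params: dict, response_dict: dict, session_id: str) -> None:
    """Upload a request/response pair to Maitai for storage/indexing only."""
    payload = {
        "chat_completion_request": {
            "application_ref_name": "demo_app",
            "session_id": session_id,
            "action_type": "SUMMARIZATION",
            "params": params,
            "metadata": {"source": "byo_client"},
        },
        "chat_completion_response": response_dict,
    }
    # requests sets Content-Type: application/json automatically when json= is used
    resp = requests.put(
        MAITAI_STORAGE_URL,
        json=payload,
        headers={"x-api-key": os.environ["MAITAI_API_KEY"]},
        timeout=10,
    )
    resp.raise_for_status()

# Run inference with your own client (the OpenAI SDK here), then upload the pair.
client = OpenAI()  # uses OPENAI_API_KEY from env
params = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this text..."}],
}
start = time.time()
completion = client.chat.completions.create(**params)
elapsed_ms = int((time.time() - start) * 1000)

# The OpenAI SDK response serializes to the OpenAI-style chat.completion shape
# this endpoint expects. The timing and correction fields mirror the cURL
# example above; treating them as optional extras is an assumption.
response_dict = completion.model_dump()
response_dict["response_time"] = elapsed_ms
response_dict["first_token_time"] = 0  # unknown without streaming
response_dict["correction_applied"] = False
for choice in response_dict["choices"]:
    choice["is_correction"] = False

store_completion(params, response_dict, session_id="YOUR_SESSION_ID")

Any client works here: as long as chat_completion_response matches the OpenAI-style chat.completion shape, the pair is indexed the same way.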

Notes

  • This endpoint is storage/indexing only. It does not run inference, evaluations, or corrections.
  • Maitai stores these as PROD traffic and marks the inference location as CLIENT server-side.