Maitai lets you attach files (images, video, and documents such as PDFs) to a chat completion request. Maitai stores the file, passes it to the model in the format the provider expects, and runs inference. There are two ways to send a file, and you can mix them freely:
  1. Inline with the request — attach the file directly in the message. Best for one-off requests; nothing to pre-upload.
  2. Upload first, then reference by file_id — call files.upload(...) once to get a file_id, then reference it on any number of later requests without re-sending the bytes. Best when you ask multiple questions about the same (especially large) file.

Quickstart

Attach a local file inline. Maitai reads the bytes, validates that the target model accepts that file type, and includes it in the request.
import maitai

client = maitai.Maitai()

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what happens in this video."},
            maitai.file_content_part_from_path("clip.mp4"),
        ],
    }
]

response = client.chat.completions.create(
    messages=messages,
    application="demo_app",
    intent="VIDEO_ANALYSIS",
    session_id="YOUR_SESSION_ID",
    model="gemini-3.5-flash",
    server_side_inference=True,
)

print(response.choices[0].message.content)
File inputs require server-side inference (server_side_inference=True), which is the default. Maitai has to run the request itself so that it can upload the file and convert it into the provider's format.

What’s supported

File support depends on the model. Maitai validates the file against the target model and returns a 400 if the model does not accept that file type.
Provider                           Image   Video   Document (PDF)
Gemini (e.g. gemini-3.5-flash)     Yes     Yes     Yes
OpenAI (e.g. gpt-4o)               Yes     No      Yes
Video is currently supported on Gemini models only. Sending a video to a model that does not support it (for example an OpenAI model) returns a 400 error — Maitai will not silently route it elsewhere. Choose a Gemini model for video.
Supported file types:
  • Images — PNG, JPEG, GIF, WebP
  • Video — MP4, WebM, MOV (Gemini)
  • Documents — PDF
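Maitai performs the authoritative check server-side and returns a 400, but a client-side pre-check can catch obvious mismatches before a request is sent. The sketch below encodes only the rules stated above: the listed file types, and video going to Gemini models only. The helper names, the "gemini-" prefix test, and "video/quicktime" as the MIME type for MOV files are assumptions for illustration, not part of the Maitai SDK.

```python
# Hypothetical pre-check; not part of the Maitai SDK.
SUPPORTED_MIME_TYPES = {
    "image": {"image/png", "image/jpeg", "image/gif", "image/webp"},
    # Assumption: MOV files use the conventional "video/quicktime" type.
    "video": {"video/mp4", "video/webm", "video/quicktime"},
    "document": {"application/pdf"},
}


def file_category(mime_type: str) -> str:
    """Return 'image', 'video' or 'document' for a supported MIME type."""
    for category, mime_types in SUPPORTED_MIME_TYPES.items():
        if mime_type in mime_types:
            return category
    raise ValueError(f"unsupported file type: {mime_type}")


def check_file_for_model(model: str, mime_type: str) -> None:
    """Raise ValueError if the file obviously cannot be sent to this model."""
    category = file_category(mime_type)
    # Assumption: Gemini models can be recognized by their "gemini-" prefix.
    if category == "video" and not model.startswith("gemini-"):
        raise ValueError(f"{model} does not accept video file inputs; use a Gemini model")


check_file_for_model("gemini-3.5-flash", "video/mp4")  # accepted
check_file_for_model("gpt-4o", "image/png")  # accepted
```

Calling check_file_for_model("gpt-4o", "video/mp4") raises ValueError locally, mirroring the 400 that Maitai would return for the same request.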

The file content part

A file is just another entry in a message’s content array. It has the shape:
{ "type": "file", "file": { "file_data": "data:video/mp4;base64,<...>", "filename": "clip.mp4" } }
You normally don’t write this by hand — use the helpers below.
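To make the shape concrete, here is a minimal sketch that builds the same structure by hand with the standard library: base64-encode the bytes and wrap them in a data: URL. build_file_part is a hypothetical name used only for illustration; it is not the SDK helper's implementation, and in practice you should use the helpers below.

```python
import base64


def build_file_part(data: bytes, mime_type: str, filename: str) -> dict:
    """Hand-build an inline file content part in the documented shape."""
    # Base64-encode the raw bytes and wrap them in a data URL:
    # data:<mime type>;base64,<payload>
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "file",
        "file": {
            "file_data": f"data:{mime_type};base64,{encoded}",
            "filename": filename,
        },
    }


part = build_file_part(b"%PDF-1.4 example bytes", "application/pdf", "report.pdf")
```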

Python

maitai.file_content_part_from_path(path, mime_type=None) → dict
Build an inline file part from a local file path. Reads the file and base64-encodes it. The MIME type is inferred from the file extension when not provided.
import maitai

# Image
maitai.file_content_part_from_path("diagram.png")

# Document
maitai.file_content_part_from_path("report.pdf")

# Explicit MIME type
maitai.file_content_part_from_path("recording", mime_type="video/mp4")
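Extension-based inference works only when the file name has a recognizable extension, which is why the last call above passes mime_type explicitly. The sketch below shows that fallback logic for the file types this page lists. guess_mime_type and its extension table are hypothetical, and the mapping of .mov to "video/quicktime" is an assumption; the SDK's actual inference may differ.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical extension table for the file types listed on this page.
EXTENSION_TO_MIME = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".mp4": "video/mp4",
    ".webm": "video/webm",
    ".mov": "video/quicktime",  # assumed conventional type for MOV
    ".pdf": "application/pdf",
}


def guess_mime_type(path: str, mime_type: Optional[str] = None) -> str:
    """Use the explicit mime_type if given, else infer it from the extension."""
    if mime_type is not None:
        return mime_type
    suffix = PurePath(path).suffix.lower()  # "" when there is no extension
    try:
        return EXTENSION_TO_MIME[suffix]
    except KeyError:
        raise ValueError(
            f"cannot infer MIME type for {path!r}; pass mime_type explicitly"
        ) from None
```

With this table, guess_mime_type("recording") raises ValueError, while guess_mime_type("recording", mime_type="video/mp4") returns the explicit type unchanged.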

Node

fileContentPartFromBytes(bytes, mimeType, filename?) → object
Build an inline file part from raw bytes (Uint8Array or Buffer). The MIME type is required; filename is optional but recommended for documents.
import fs from "node:fs";
import { fileContentPartFromBytes } from "maitai";

const bytes = fs.readFileSync("diagram.png");
fileContentPartFromBytes(bytes, "image/png", "diagram.png");

Multiple files in one message

Combine text and several files in a single message. Order is preserved.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these two screenshots."},
            maitai.file_content_part_from_path("before.png"),
            maitai.file_content_part_from_path("after.png"),
        ],
    }
]
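Because order is preserved, you can place a short text label immediately before each file so the prompt can refer to the files by name. The sketch below is one way to assemble such a content array; labeled_content is a hypothetical helper, and the two file parts are placeholders standing in for the dicts that maitai.file_content_part_from_path("before.png") and ("after.png") would return.

```python
from typing import Dict, List, Tuple


def labeled_content(prompt: str, named_parts: List[Tuple[str, Dict]]) -> List[Dict]:
    """Build a content array: the prompt, then a text label before each file part."""
    content: List[Dict] = [{"type": "text", "text": prompt}]
    for label, part in named_parts:
        # Order is preserved, so each label sits directly before its file.
        content.append({"type": "text", "text": f"{label}:"})
        content.append(part)
    return content


# Placeholder parts; real code would build these with the Maitai helpers.
before_part = {"type": "file", "file": {"file_data": "data:image/png;base64,<...>", "filename": "before.png"}}
after_part = {"type": "file", "file": {"file_data": "data:image/png;base64,<...>", "filename": "after.png"}}

messages = [
    {
        "role": "user",
        "content": labeled_content(
            "Compare these two screenshots.",
            [("Before", before_part), ("After", after_part)],
        ),
    }
]
```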

Reuse a file across requests

When you plan to reference the same file in more than one request, upload it once with files.upload(...) to get a file_id, then pass that file_id to file_content_part (Python) / fileContentPart (Node). This avoids re-sending the bytes on every call and, for video, avoids re-preparing the file for the provider each time.
import maitai

client = maitai.Maitai()

# Upload once
file_id = client.files.upload("clip.mp4")

# Reference it across as many requests as you like
for question in ["What happens first?", "Who is speaking?", "Summarize it."]:
    response = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    maitai.file_content_part(file_id),
                ],
            }
        ],
        application="demo_app",
        intent="VIDEO_ANALYSIS",
        model="gemini-3.5-flash",
        server_side_inference=True,
    )
    print(response.choices[0].message.content)
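If the same file may show up in many places in your code, a small cache can ensure each distinct file is uploaded only once. The minimal sketch below keys file_ids on a SHA-256 hash of the bytes. FileIdCache is hypothetical; its upload callable is modeled on the documented client.files.upload_bytes(data, filename, mime_type=None). A stand-in uploader is used here so the sketch runs offline.

```python
import hashlib
from typing import Callable, Dict, Optional

# Modeled on client.files.upload_bytes(data, filename, mime_type=None) -> str.
Uploader = Callable[[bytes, str, Optional[str]], str]


class FileIdCache:
    """Upload each distinct file's bytes once; reuse the file_id afterwards."""

    def __init__(self, upload: Uploader) -> None:
        self._upload = upload
        self._ids: Dict[str, str] = {}  # SHA-256 of the bytes -> file_id

    def file_id_for(self, data: bytes, filename: str, mime_type: Optional[str] = None) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key not in self._ids:
            # Only bytes that have not been seen before are uploaded.
            self._ids[key] = self._upload(data, filename, mime_type)
        return self._ids[key]


# Stand-in uploader that records each call and returns a fake file_id.
uploaded = []


def fake_upload(data: bytes, filename: str, mime_type: Optional[str] = None) -> str:
    uploaded.append(filename)
    return f"file_{len(uploaded)}"


cache = FileIdCache(fake_upload)
first = cache.file_id_for(b"video bytes", "clip.mp4")
again = cache.file_id_for(b"video bytes", "clip.mp4")  # cache hit, no upload
other = cache.file_id_for(b"other bytes", "other.mp4")
```

In real code you would construct it as FileIdCache(client.files.upload_bytes). Because the cache is keyed on content, identical bytes under a different filename reuse the first file_id. The ids are also held only in memory for the life of the process.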

Upload methods

client.files.upload(path, mime_type=None) → str (Python)
Upload a file from a local path; returns the file_id. The MIME type is inferred from the extension when not provided.
client.files.upload_bytes(data, filename, mime_type=None) → str (Python)
Upload raw bytes; returns the file_id.
maitai.files.upload(filePath, mimeType?) → Promise<string> (Node)
Upload a file from a local path; resolves to the file_id.
maitai.files.uploadBytes(bytes, filename, mimeType?) → Promise<string> (Node)
Upload raw bytes (Uint8Array or Buffer); resolves to the file_id.
maitai.file_content_part(file_id) (Python) / fileContentPart(fileId) (Node) → content part
Build the content part that references an uploaded file.

Image URLs

For images already hosted at a public URL, you can use the standard OpenAI image_url content part instead of uploading bytes:
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/photo.jpg"},
            },
        ],
    }
]
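A common mistake is passing a local path where a hosted URL is expected. The sketch below builds the standard image_url part and rejects anything that is not an http(s) URL. image_url_part is a hypothetical helper. It checks only the URL scheme and cannot tell whether the address is actually public. For local images, use maitai.file_content_part_from_path instead.

```python
from urllib.parse import urlparse


def image_url_part(url: str) -> dict:
    """Build a standard image_url content part for an image hosted at a URL."""
    # Only the scheme is checked; public reachability is not verified here.
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    return {"type": "image_url", "image_url": {"url": url}}


part = image_url_part("https://example.com/photo.jpg")
```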

Errors

400 — model does not accept <type> file inputs
The target model does not support that file type, for example when a video is sent to an OpenAI model. Switch to a model that supports it (a Gemini model for video), or remove the file.
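If you want to react to this error programmatically, such as retrying video requests on a Gemini model, you can extract the rejected file type from the error text. The sketch below assumes the message contains the phrase shown in the heading above. The exact wording of real responses, and how your code obtains the message, are assumptions, so treat unsupported_file_type as a hypothetical helper.

```python
import re
from typing import Optional

# Assumption: the error text contains "model does not accept <type> file inputs".
_UNSUPPORTED_FILE = re.compile(r"model does not accept (\w+) file inputs")


def unsupported_file_type(error_message: str) -> Optional[str]:
    """Return the rejected file type (e.g. 'video'), or None for other errors."""
    match = _UNSUPPORTED_FILE.search(error_message)
    return match.group(1) if match else None


rejected = unsupported_file_type("Error 400: model does not accept video file inputs")
```

A result of "video" suggests retrying with a Gemini model. None means the error is unrelated to file support and should be handled as usual.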

Notes

  • Video is uploaded and prepared for the model once, on the first request that references it; for larger files this can take a few seconds.
  • Files are scoped to your company and are not shared across accounts.
  • File inputs work with the rest of the request as usual — combine them with Structured Output, Tool Calling, and streaming.