Document API

Document routes ingest and manage files inside user-owned collections.

API call order

  1. Create or list a private collection through the collection APIs.
  2. Pass its collection_id to the document upload, list, get, and delete routes.
  3. Reference collections by name in the message/generate APIs for retrieval.

Shared request setup is documented once in the API index.

Upload documents to a collection

POST /collections/{collection_id}/documents

Upload documents to a collection.

Stores document records and triggers asynchronous parsing, chunking, and vectorization for retrieval.
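
To make chunk_size and chunk_overlap concrete, here is a minimal sketch of sliding-window chunking. It is an illustration only: it assumes character-based windows, while the service may split by tokens, sentences, or another unit. The function name chunk_text is hypothetical and not part of this API.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Assumption: character-based splitting, for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

A larger overlap repeats more context between neighbouring chunks at the cost of producing more chunks for the same text.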

Parameters:

  • collection_id (str): Collection identifier. Default: Path(..., description='Collection ID')
  • files (list[UploadFile]): One or more files to ingest. Default: File(...)
  • metadata_urls (list[str] | str | None): Optional list or single URL per file. Default: Form(default=None)
  • metadata_names (list[str] | str | None): Optional list or single display name per file. Default: Form(default=None)
  • embeddings_model (str): Embeddings model to use for vectorization. Default: Form(default=DEFAULT_EMBEDDING_MODEL)
  • chunk_size (int): Chunk size for splitting documents. Default: Form(default=DEFAULT_CHUNK_SIZE)
  • chunk_overlap (int): Overlap between chunks. Default: Form(default=DEFAULT_CHUNK_OVERLAP)
  • requesting_user (User): Authenticated user injected by dependency. Default: Depends(get_current_user)

Returns:

  • dict: Service response with ingestion details.

Raises:

  • HTTPException: 404 if collection is not found; 403 if access is forbidden; 500 for processing errors.

Usage

# BASE_URL and headers are assumed to come from the shared request setup (see API index).
# Repeat metadata_urls and metadata_names once per uploaded file.
data = [
    ("metadata_urls", "https://example.org/sentinel_overview"),
    ("metadata_urls", "https://example.org/copernicus_brief"),
    ("metadata_names", "Sentinel Overview"),
    ("metadata_names", "Copernicus Brief"),
    ("chunk_size", "1024"),
    ("chunk_overlap", "100"),
]

# Open the files in a context manager so the handles are closed after the upload.
with open("sentinel_overview.pdf", "rb") as pdf, open("copernicus_brief.txt", "rb") as txt:
    files = [
        ("files", ("sentinel_overview.pdf", pdf, "application/pdf")),
        ("files", ("copernicus_brief.txt", txt, "text/plain")),
    ]
    resp = requests.post(
        f"{BASE_URL}/collections/{COLLECTION_ID}/documents",
        headers=headers,
        files=files,
        data=data,
        timeout=120,
    )
resp.raise_for_status()
print(resp.json())

Explanation

Uploads and ingests one or more files into a target collection.

Notes

  • Supports multipart form data with repeated fields.
  • embeddings_model is optional; when omitted, DEFAULT_EMBEDDING_MODEL is used.
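
The repeated form fields can be generated from a list of per-file metadata instead of being written by hand. The sketch below is one possible helper, not part of the API: the name build_form_fields and the "url"/"name" dictionary keys are hypothetical, and it assumes that metadata entries are given in the same order as the uploaded files, as in the usage example above.

```python
def build_form_fields(documents, chunk_size=None, chunk_overlap=None):
    """Return (field, value) pairs suitable for the data= argument of requests.post.

    documents: list of dicts with hypothetical keys "url" and "name",
    assumed to be in the same order as the uploaded files.
    """
    data = []
    # One metadata_urls entry per file.
    for doc in documents:
        data.append(("metadata_urls", doc["url"]))
    # One metadata_names entry per file.
    for doc in documents:
        data.append(("metadata_names", doc["name"]))
    # Optional settings are only sent when set, so the server defaults apply otherwise.
    if chunk_size is not None:
        data.append(("chunk_size", str(chunk_size)))
    if chunk_overlap is not None:
        data.append(("chunk_overlap", str(chunk_overlap)))
    return data


docs = [
    {"url": "https://example.org/sentinel_overview", "name": "Sentinel Overview"},
    {"url": "https://example.org/copernicus_brief", "name": "Copernicus Brief"},
]
for field, value in build_form_fields(docs, chunk_size=1024, chunk_overlap=100):
    print(field, value)
```

Passing a list of tuples rather than a dict is what lets the same field name appear more than once in the multipart body.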

List documents in a collection

GET /collections/{collection_id}/documents?page=1&limit=20

List documents in a collection.

Parameters:

  • collection_id (str): Collection identifier. Default: Path(..., description='Collection ID')
  • pagination (Pagination): Pagination parameters. Default: Depends()
  • requesting_user (User): Authenticated user injected by dependency. Default: Depends(get_current_user)

Returns:

  • PaginatedResponse[Document]: Paginated documents for the collection.

Raises:

  • HTTPException: 404 if collection is not found; 403 if access is forbidden.

Usage

resp = requests.get(
    f"{BASE_URL}/collections/{COLLECTION_ID}/documents",
    params={"page": 1, "limit": 20},
    headers=headers,
    timeout=30,
)
resp.raise_for_status()
documents = resp.json()["data"]
print([(d["id"], d["name"]) for d in documents])

Explanation

Lists ingested documents for a collection.

Notes

  • Use returned IDs for document detail and delete routes.
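
To collect every document rather than a single page, the page parameter can be advanced until a short page comes back. The sketch below assumes only what the usage example shows, namely that each response body carries its items under "data"; it does not rely on any total-count field. The names iter_documents and fetch_page are hypothetical, and the fake fetcher stands in for a real HTTP call.

```python
from typing import Callable, Iterator


def iter_documents(fetch_page: Callable[[int, int], dict], limit: int = 20) -> Iterator[dict]:
    """Yield documents page by page until a page has fewer than limit items."""
    page = 1
    while True:
        body = fetch_page(page, limit)
        items = body.get("data", [])
        yield from items
        if len(items) < limit:
            return  # a short (or empty) page means there is nothing left
        page += 1


# Stand-in for the real request. With requests, fetch_page could wrap
# requests.get(url, params={"page": page, "limit": limit}, headers=headers).json()
def fake_fetch(page: int, limit: int) -> dict:
    all_docs = [{"id": f"doc-{i}", "name": f"file-{i}.txt"} for i in range(5)]
    start = (page - 1) * limit
    return {"data": all_docs[start:start + limit]}


print([d["id"] for d in iter_documents(fake_fetch, limit=2)])
# prints ['doc-0', 'doc-1', 'doc-2', 'doc-3', 'doc-4']
```

When the number of documents is an exact multiple of limit, this approach makes one extra request that returns an empty page.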

Get one document

GET /collections/{collection_id}/documents/{document_id}

Get a specific document from a collection.

Parameters:

  • collection_id (str): Collection identifier. Default: Path(..., description='Collection ID')
  • document_id (str): Document identifier. Default: Path(..., description='Document ID')
  • requesting_user (User): Authenticated user injected by dependency. Default: Depends(get_current_user)

Returns:

  • Document: Document details.

Raises:

  • HTTPException: 404 if not found; 400 if document not in collection; 403 if access is forbidden.

Usage

# Take the first ID from the list response above (assumes the collection is not empty).
DOCUMENT_ID = documents[0]["id"]

resp = requests.get(
    f"{BASE_URL}/collections/{COLLECTION_ID}/documents/{DOCUMENT_ID}",
    headers=headers,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())

Explanation

Returns metadata for a specific document in the collection.

Notes

  • Requires both collection_id and document_id.
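
Instead of letting raise_for_status() surface a generic error, a client can translate the documented status codes into specific messages. The mapping below restates the codes listed under Raises for this route; the helper name explain_status and the message wording are hypothetical.

```python
# Status codes documented for GET /collections/{collection_id}/documents/{document_id}.
DOCUMENT_ERRORS = {
    400: "document does not belong to this collection",
    403: "access is forbidden",
    404: "collection or document not found",
}


def explain_status(status_code: int) -> str:
    """Return a short human-readable description of a response status."""
    if 200 <= status_code < 300:
        return "ok"
    return DOCUMENT_ERRORS.get(status_code, f"unexpected HTTP status {status_code}")


for code in (200, 400, 403, 404, 502):
    print(code, explain_status(code))
```

With requests, the status would come from resp.status_code before calling resp.raise_for_status().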

Delete one document

DELETE /collections/{collection_id}/documents/{document_id}

Delete a document from a collection.

Removes the document record and attempts to delete associated vectors.

Parameters:

  • collection_id (str): Collection identifier. Default: Path(..., description='Collection ID')
  • document_id (str): Document identifier. Default: Path(..., description='Document ID')
  • requesting_user (User): Authenticated user injected by dependency. Default: Depends(get_current_user)

Returns:

  • dict: Confirmation message.

Raises:

  • HTTPException: 404 if not found; 400 if document not in collection; 403 if deletion is forbidden.

Usage

resp = requests.delete(
    f"{BASE_URL}/collections/{COLLECTION_ID}/documents/{DOCUMENT_ID}",
    headers=headers,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())

Explanation

Deletes a document record and attempts to delete its associated vectors.

Notes

  • Destructive operation: once the document's vectors are removed, retrieval results for the collection may change.
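
Because deletion is destructive, cleanup scripts are often written so that a repeated run does not fail on documents that are already gone. The sketch below shows one such client-side policy, under stated assumptions: delete_document and send_delete are hypothetical names, the transport is injected as a callable that returns the HTTP status code, and treating 404 as "already deleted" is a caller's choice, since the route also returns 404 for a missing collection.

```python
from typing import Callable


def delete_document(send_delete: Callable[[str], int], collection_id: str, document_id: str) -> bool:
    """Return True if the document was deleted, False if it was already absent (404).

    Any other status (for example 400 or 403) raises RuntimeError.
    """
    path = f"/collections/{collection_id}/documents/{document_id}"
    status = send_delete(path)
    if 200 <= status < 300:
        return True
    if status == 404:
        return False  # caller's policy: treat "not found" as already deleted
    raise RuntimeError(f"DELETE {path} failed with HTTP {status}")


# Stub transport for illustration. With requests it could be:
# lambda path: requests.delete(BASE_URL + path, headers=headers, timeout=30).status_code
print(delete_document(lambda path: 200, "col-1", "doc-1"))  # prints True
print(delete_document(lambda path: 404, "col-1", "doc-1"))  # prints False
```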

Full API reference

For exhaustive schema details, see the Swagger API.