Document API

Document routes ingest and manage files inside user-owned collections.

API call order

Create or list a private collection with the collection APIs.
Pass the collection_id to the document upload, list, get, and delete routes.
Reference collections by name in the message/generate APIs to use them for retrieval.
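
As a rough illustration of how collection_id feeds into the document routes, the sketch below builds the route URLs described on this page. The helper name documents_url and the example host are assumptions made for illustration; they are not part of the API.

```python
from typing import Optional
from urllib.parse import quote


def documents_url(base_url: str, collection_id: str, document_id: Optional[str] = None) -> str:
    """Build a document route URL for a collection (hypothetical helper)."""
    # Percent-encode identifiers so unusual characters cannot alter the path.
    url = f"{base_url.rstrip('/')}/collections/{quote(collection_id, safe='')}/documents"
    if document_id is not None:
        # The get and delete routes append the document identifier.
        url += f"/{quote(document_id, safe='')}"
    return url


# Example host is a placeholder, not a real deployment.
print(documents_url("https://api.example.org", "c1"))
print(documents_url("https://api.example.org", "c1", "d1"))
```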

Shared request setup used by the examples below (such as BASE_URL and headers) is documented once in the API index.

Upload documents to a collection

POST /collections/{collection_id}/documents

Upload documents to a collection.

Stores document records and triggers asynchronous parsing, chunking, and vectorization for retrieval.

Parameters:

Name	Type	Description	Default
`collection_id`	`str`	Collection identifier.	`Path(..., description='Collection ID')`
`files`	`list[UploadFile]`	One or more files to ingest.	`File(...)`
`metadata_urls`	`list[str] | str | None`	Optional URL for each file, given as a list or a single value.	`Form(default=None)`
`metadata_names`	`list[str] | str | None`	Optional display name for each file, given as a list or a single value.	`Form(default=None)`
`embeddings_model`	`str`	Embeddings model to use for vectorization.	`Form(default=DEFAULT_EMBEDDING_MODEL)`
`chunk_size`	`int`	Chunk size for splitting documents.	`Form(default=DEFAULT_CHUNK_SIZE)`
`chunk_overlap`	`int`	Overlap between chunks.	`Form(default=DEFAULT_CHUNK_OVERLAP)`
`requesting_user`	`User`	Authenticated user injected by dependency.	`Depends(get_current_user)`

Returns:

Type	Description
`dict`	Service response with ingestion details.

Raises:

Type	Description
`HTTPException`	404 if the collection is not found; 403 if access is forbidden; 500 for processing errors.

Usage

import requests

# BASE_URL, COLLECTION_ID, and headers come from the shared request setup (see API index).
data = [
    ("metadata_urls", "https://example.org/sentinel_overview"),
    ("metadata_urls", "https://example.org/copernicus_brief"),
    ("metadata_names", "Sentinel Overview"),
    ("metadata_names", "Copernicus Brief"),
    ("chunk_size", "1024"),
    ("chunk_overlap", "100"),
]

# Open the files in a with-block so the handles are closed after the upload.
with open("sentinel_overview.pdf", "rb") as pdf, open("copernicus_brief.txt", "rb") as txt:
    files = [
        ("files", ("sentinel_overview.pdf", pdf, "application/pdf")),
        ("files", ("copernicus_brief.txt", txt, "text/plain")),
    ]
    resp = requests.post(
        f"{BASE_URL}/collections/{COLLECTION_ID}/documents",
        headers=headers,
        files=files,
        data=data,
        timeout=120,
    )
resp.raise_for_status()
print(resp.json())

Explanation

Uploads and ingests one or more files into a target collection.
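
When the set of files is not known in advance, the repeated multipart fields can be assembled programmatically. The following is a minimal sketch, assuming the same field names as the Usage example; build_upload_parts is a hypothetical helper, and the content type is guessed from the file extension rather than taken from the API.

```python
import mimetypes
from contextlib import ExitStack
from pathlib import Path


def build_upload_parts(stack, paths, urls=None, names=None):
    """Return (files, data) lists in the repeated-field shape of the Usage example."""
    files, data = [], []
    for path in map(Path, paths):
        # Guess a content type from the extension; fall back to a generic one.
        content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        # The ExitStack closes every handle when its with-block exits.
        handle = stack.enter_context(path.open("rb"))
        files.append(("files", (path.name, handle, content_type)))
    # Repeat each metadata field once per supplied value.
    data += [("metadata_urls", u) for u in urls or []]
    data += [("metadata_names", n) for n in names or []]
    return files, data


# Typical use (not executed here because it needs real files and a server):
# with ExitStack() as stack:
#     files, data = build_upload_parts(stack, ["a.pdf", "b.txt"], names=["A", "B"])
#     resp = requests.post(url, headers=headers, files=files, data=data, timeout=120)
```

Keeping the POST inside the with-block matters: the file handles must stay open until the request has been sent.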

Notes

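The request body is multipart form data; repeat the files, metadata_urls, and metadata_names fields once per uploaded file.
embeddings_model, chunk_size, and chunk_overlap are optional; when omitted, the server defaults (DEFAULT_EMBEDDING_MODEL, DEFAULT_CHUNK_SIZE, DEFAULT_CHUNK_OVERLAP) apply.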
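
To give an intuition for how chunk_size and chunk_overlap interact, here is a small character-based sliding-window sketch. It is an assumption made for illustration only: the service's actual splitter is not documented here and may, for example, count tokens or respect sentence boundaries.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

In this sketch a larger overlap repeats more context between neighbouring chunks at the cost of producing more chunks.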

List documents in a collection

GET /collections/{collection_id}/documents?page=1&limit=20

List documents in a collection.

Parameters:

Name	Type	Description	Default
`collection_id`	`str`	Collection identifier.	`Path(..., description='Collection ID')`
`pagination`	`Pagination`	Pagination parameters.	`Depends()`
`requesting_user`	`User`	Authenticated user injected by dependency.	`Depends(get_current_user)`

Returns:

Type	Description
`PaginatedResponse[Document]`	Paginated documents for the collection.

Raises:

Type	Description
`HTTPException`	404 if the collection is not found; 403 if access is forbidden.

Usage

resp = requests.get(
    f"{BASE_URL}/collections/{COLLECTION_ID}/documents",
    params={"page": 1, "limit": 20},
    headers=headers,
    timeout=30,
)
resp.raise_for_status()
documents = resp.json()["data"]
print([(d["id"], d["name"]) for d in documents])

Explanation

Lists ingested documents for a collection.

Notes

Use returned IDs for document detail and delete routes.
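
To collect IDs across more than one page, the list route can be called repeatedly. The sketch below assumes pages are 1-based (as in the route example), that each response carries its items under "data" (as in the Usage snippet), and that a page shorter than limit is the last one. iter_documents and fetch_page are hypothetical names, not part of the API.

```python
from typing import Callable, Iterator


def iter_documents(fetch_page: Callable[[int, int], dict], limit: int = 20) -> Iterator[dict]:
    """Yield documents page by page until a short (or empty) page is returned."""
    page = 1
    while True:
        items = fetch_page(page, limit)["data"]
        yield from items
        if len(items) < limit:
            return  # assumed end-of-results signal
        page += 1


# fetch_page would typically wrap the list route, for example:
# def fetch_page(page, limit):
#     resp = requests.get(f"{BASE_URL}/collections/{COLLECTION_ID}/documents",
#                         params={"page": page, "limit": limit},
#                         headers=headers, timeout=30)
#     resp.raise_for_status()
#     return resp.json()
```

If the paginated response exposes a total count or a next-page indicator, that would be a more reliable stop condition than the short-page check used here.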

Get one document

GET /collections/{collection_id}/documents/{document_id}

Get a specific document from a collection.

Parameters:

Name	Type	Description	Default
`collection_id`	`str`	Collection identifier.	`Path(..., description='Collection ID')`
`document_id`	`str`	Document identifier.	`Path(..., description='Document ID')`
`requesting_user`	`User`	Authenticated user injected by dependency.	`Depends(get_current_user)`

Returns:

Type	Description
`Document`	Document details.

Raises:

Type	Description
`HTTPException`	404 if the requested resource is not found; 400 if the document does not belong to the collection; 403 if access is forbidden.

Usage

# Reuse an ID returned by the list route (assumes the collection has at least one document).
DOCUMENT_ID = documents[0]["id"]

resp = requests.get(
    f"{BASE_URL}/collections/{COLLECTION_ID}/documents/{DOCUMENT_ID}",
    headers=headers,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())

Explanation

Returns metadata for a specific document in the collection.

Notes

Requires both collection_id and document_id.
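
Instead of relying only on raise_for_status(), a client may want to turn the documented status codes into readable messages. This is a minimal sketch based on the Raises table above; the mapping and the describe_error helper are illustrative assumptions, and the wording of the server's own error bodies is not specified here.

```python
# Status codes documented for the get-one-document route.
GET_DOCUMENT_ERRORS = {
    400: "document is not in this collection",
    403: "access is forbidden",
    404: "not found",
}


def describe_error(status_code: int) -> str:
    """Map a documented status code to a short message, with a generic fallback."""
    return GET_DOCUMENT_ERRORS.get(status_code, f"unexpected status {status_code}")


print(describe_error(400))
print(describe_error(502))
```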

Delete one document

DELETE /collections/{collection_id}/documents/{document_id}

Delete a document from a collection.

Removes the document record and attempts to delete associated vectors.

Parameters:

Name	Type	Description	Default
`collection_id`	`str`	Collection identifier.	`Path(..., description='Collection ID')`
`document_id`	`str`	Document identifier.	`Path(..., description='Document ID')`
`requesting_user`	`User`	Authenticated user injected by dependency.	`Depends(get_current_user)`

Returns:

Type	Description
`dict`	Confirmation message.

Raises:

Type	Description
`HTTPException`	404 if the requested resource is not found; 400 if the document does not belong to the collection; 403 if deletion is forbidden.

Usage

resp = requests.delete(
    f"{BASE_URL}/collections/{COLLECTION_ID}/documents/{DOCUMENT_ID}",
    headers=headers,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())

Explanation

Deletes a document record and attempts to delete its associated vectors.

Notes

Destructive operation; retrieval results for the collection may change once the document's vectors are removed.
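
Because deletion is destructive, cleanup scripts are often written so that repeating them is harmless. The sketch below treats a 404 response as "already gone" and re-raises everything else. delete_document is a hypothetical helper; session is assumed to be a requests.Session or any object with a compatible delete method.

```python
def delete_document(session, base_url, collection_id, document_id, headers=None, timeout=30) -> bool:
    """Return True if the document was deleted, False if the server answered 404."""
    url = f"{base_url}/collections/{collection_id}/documents/{document_id}"
    resp = session.delete(url, headers=headers, timeout=timeout)
    if resp.status_code == 404:
        return False  # nothing to delete; treat as already removed
    # Any other error (for example 400 or 403) is raised to the caller.
    resp.raise_for_status()
    return True


# Typical use (needs a running server):
# import requests
# deleted = delete_document(requests.Session(), BASE_URL, COLLECTION_ID, DOCUMENT_ID, headers=headers)
```

Note that 404 is also documented for a missing collection, so a False result does not by itself prove that the document ever existed.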

Full API reference

For exhaustive schema details, see the Swagger API reference.