Document API
Document routes ingest and manage files inside user-owned collections.
API call order
- Create or list a private collection with the collection APIs.
- Use `collection_id` with the document upload/list/get/delete routes.
- Use collection names in the message/generate APIs for retrieval.
Shared request setup is documented once in the API index.
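All four document routes on this page share one path pattern. As a minimal sketch, a small helper can build them from a collection ID and an optional document ID. The helper name `documents_url` and the example host are illustrative assumptions, not part of the API.

```python
from typing import Optional
from urllib.parse import quote


def documents_url(base_url: str, collection_id: str, document_id: Optional[str] = None) -> str:
    """Build a document route URL (hypothetical helper, not part of the API)."""
    # Percent-encode the IDs so unexpected characters cannot alter the path.
    url = f"{base_url.rstrip('/')}/collections/{quote(collection_id, safe='')}/documents"
    if document_id is not None:
        url += f"/{quote(document_id, safe='')}"
    return url


# Collection-level route (upload, list) and document-level route (get, delete).
print(documents_url("https://api.example.org", "c1"))
print(documents_url("https://api.example.org", "c1", "d2"))
```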
Upload documents to a collection
POST /collections/{collection_id}/documents
Upload documents to a collection.
Stores document records and triggers asynchronous parsing, chunking, and vectorization for retrieval.
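To give a feel for how `chunk_size` and `chunk_overlap` interact, the sketch below splits a string with a plain sliding window. It assumes sizes are counted in characters; the service's actual splitter is not described here and may count tokens or split on separators instead.

```python
def split_into_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Illustrative sliding-window splitter (not the service's implementation)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so a larger overlap produces more chunks for the same text.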
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `collection_id` | `str` | Collection identifier. | `Path(..., description='Collection ID')` |
| `files` | `list[UploadFile]` | One or more files to ingest. | `File(...)` |
| `metadata_urls` | `list[str] \| str \| None` | Optional list or single URL per file. | `Form(default=None)` |
| `metadata_names` | `list[str] \| str \| None` | Optional list or single display name per file. | `Form(default=None)` |
| `embeddings_model` | `str` | Embeddings model to use for vectorization. | `Form(default=DEFAULT_EMBEDDING_MODEL)` |
| `chunk_size` | `int` | Chunk size for splitting documents. | `Form(default=DEFAULT_CHUNK_SIZE)` |
| `chunk_overlap` | `int` | Overlap between chunks. | `Form(default=DEFAULT_CHUNK_OVERLAP)` |
| `requesting_user` | `User` | Authenticated user injected by dependency. | `Depends(get_current_user)` |
Returns:
| Type | Description |
|---|---|
| `dict` | Service response with ingestion details. |
Raises:
| Type | Description |
|---|---|
| `HTTPException` | 404 if collection is not found; 403 if access is forbidden; 500 for processing errors. |
Usage
files = [
("files", ("sentinel_overview.pdf", open("sentinel_overview.pdf", "rb"), "application/pdf")),
("files", ("copernicus_brief.txt", open("copernicus_brief.txt", "rb"), "text/plain")),
]
data = [
("metadata_urls", "https://example.org/sentinel_overview"),
("metadata_urls", "https://example.org/copernicus_brief"),
("metadata_names", "Sentinel Overview"),
("metadata_names", "Copernicus Brief"),
("chunk_size", "1024"),
("chunk_overlap", "100"),
]
resp = requests.post(
f"{BASE_URL}/collections/{COLLECTION_ID}/documents",
headers=headers,
files=files,
data=data,
timeout=120,
)
resp.raise_for_status()
print(resp.json())
Explanation
Uploads and ingests one or more files into a target collection.
Notes
- Supports multipart form data with repeated fields.
- `embeddings_model` is optional.
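The repeated form fields can be assembled programmatically rather than written out by hand. The sketch below is one possible helper; the name `build_upload_fields` is hypothetical, and the length check reflects an assumption that each list carries one value per file in upload order (the route also accepts a single string, which this sketch does not cover).

```python
from typing import Optional, Sequence


def build_upload_fields(
    file_count: int,
    metadata_urls: Optional[Sequence[str]] = None,
    metadata_names: Optional[Sequence[str]] = None,
    chunk_size: Optional[int] = None,
    chunk_overlap: Optional[int] = None,
) -> list:
    """Return (field, value) pairs suitable for the `data=` argument of a multipart POST."""
    fields = []
    for key, values in (("metadata_urls", metadata_urls), ("metadata_names", metadata_names)):
        if values is None:
            continue  # optional field: omit it entirely
        if len(values) != file_count:
            # Assumption: one value per uploaded file, in the same order.
            raise ValueError(f"{key} needs {file_count} values, got {len(values)}")
        fields.extend((key, value) for value in values)
    # Form values travel as strings; unset options fall back to server defaults.
    if chunk_size is not None:
        fields.append(("chunk_size", str(chunk_size)))
    if chunk_overlap is not None:
        fields.append(("chunk_overlap", str(chunk_overlap)))
    return fields


data = build_upload_fields(
    2,
    metadata_urls=["https://example.org/sentinel_overview", "https://example.org/copernicus_brief"],
    metadata_names=["Sentinel Overview", "Copernicus Brief"],
    chunk_size=1024,
    chunk_overlap=100,
)
print(data)
```

The result has the same shape as the `data` list in the usage example above, so it can be passed to `requests.post(..., data=data)` unchanged.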
List documents in a collection
GET /collections/{collection_id}/documents?page=1&limit=20
List documents in a collection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `collection_id` | `str` | Collection identifier. | `Path(..., description='Collection ID')` |
| `pagination` | `Pagination` | Pagination parameters. | `Depends()` |
| `requesting_user` | `User` | Authenticated user injected by dependency. | `Depends(get_current_user)` |
|
Returns:
| Type | Description |
|---|---|
| `PaginatedResponse[Document]` | Paginated documents for the collection. |
Raises:
| Type | Description |
|---|---|
| `HTTPException` | 404 if collection is not found; 403 if access is forbidden. |
Usage
resp = requests.get(
f"{BASE_URL}/collections/{COLLECTION_ID}/documents",
params={"page": 1, "limit": 20},
headers=headers,
timeout=30,
)
resp.raise_for_status()
documents = resp.json()["data"]
print([(d["id"], d["name"]) for d in documents])
Explanation
Lists ingested documents for a collection.
Notes
- Use returned IDs for document detail and delete routes.
Get one document
GET /collections/{collection_id}/documents/{document_id}
Get a specific document from a collection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `collection_id` | `str` | Collection identifier. | `Path(..., description='Collection ID')` |
| `document_id` | `str` | Document identifier. | `Path(..., description='Document ID')` |
| `requesting_user` | `User` | Authenticated user injected by dependency. | `Depends(get_current_user)` |
|
Returns:
| Type | Description |
|---|---|
| `Document` | Document details. |
Raises:
| Type | Description |
|---|---|
| `HTTPException` | 404 if not found; 400 if document not in collection; 403 if access is forbidden. |
Usage
DOCUMENT_ID = documents[0]["id"]
resp = requests.get(
f"{BASE_URL}/collections/{COLLECTION_ID}/documents/{DOCUMENT_ID}",
headers=headers,
timeout=30,
)
resp.raise_for_status()
print(resp.json())
Explanation
Returns metadata for a specific document in the collection.
Notes
- Requires both `collection_id` and `document_id`.
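Instead of calling `raise_for_status()` directly, a client may prefer to turn the documented status codes into readable messages. The mapping below is a small sketch based on the Raises table for this route; the names `DOCUMENT_ERRORS` and `describe_document_error` are illustrative.

```python
# Status codes documented for the single-document route.
DOCUMENT_ERRORS = {
    400: "document does not belong to this collection",
    403: "access is forbidden",
    404: "collection or document not found",
}


def describe_document_error(status_code: int) -> str:
    """Map a status code to a short explanation (hypothetical helper)."""
    return DOCUMENT_ERRORS.get(status_code, f"unexpected status {status_code}")


print(describe_document_error(400))  # document does not belong to this collection
```

With `requests`, this could be applied as `describe_document_error(resp.status_code)` whenever `resp.ok` is false.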
Delete one document
DELETE /collections/{collection_id}/documents/{document_id}
Delete a document from a collection.
Removes the document record and attempts to delete associated vectors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `collection_id` | `str` | Collection identifier. | `Path(..., description='Collection ID')` |
| `document_id` | `str` | Document identifier. | `Path(..., description='Document ID')` |
| `requesting_user` | `User` | Authenticated user injected by dependency. | `Depends(get_current_user)` |
|
Returns:
| Type | Description |
|---|---|
| `dict` | Confirmation message. |
Raises:
| Type | Description |
|---|---|
| `HTTPException` | 404 if not found; 400 if document not in collection; 403 if deletion is forbidden. |
Usage
resp = requests.delete(
f"{BASE_URL}/collections/{COLLECTION_ID}/documents/{DOCUMENT_ID}",
headers=headers,
timeout=30,
)
resp.raise_for_status()
print(resp.json())
Explanation
Deletes a document and associated vectors.
Notes
- Destructive operation; retrieval quality may change.
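Because deletion is destructive, cleanup scripts are often written so they can be re-run safely. One possible approach, sketched below, treats a 404 as "nothing left to delete" and raises on every other failure. `send_delete` is a hypothetical callable that issues the DELETE request and returns the HTTP status code. Note that 404 does not distinguish a missing document from a missing collection, so this shortcut is an assumption that may not suit every workflow.

```python
from typing import Callable


def delete_document(send_delete: Callable[[str], int], url: str) -> bool:
    """Delete one document; return False when there was nothing to delete."""
    status = send_delete(url)  # hypothetical callable: issues DELETE, returns the status code
    if 200 <= status < 300:
        return True
    if status == 404:
        # Not found: the document (or its collection) is already gone.
        return False
    # 400 (document not in collection), 403 (forbidden), and anything else are real errors.
    raise RuntimeError(f"DELETE {url} failed with status {status}")


# Stand-in sender so the helper can be exercised without a server.
print(delete_document(lambda url: 404, "https://api.example.org/collections/c1/documents/d2"))  # False
```

With `requests`, the sender could be `lambda url: requests.delete(url, headers=headers, timeout=30).status_code`.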
Full API reference
For exhaustive schema details, use Swagger API.