The VecML RAG (Retrieval-Augmented Generation) API allows you to build powerful document search and question-answering systems. Upload documents, create searchable collections, and query them using natural language.
- Upload and process PDFs, text files, and more
- Semantic, multimodal search with AI-powered retrieval
- End-to-end RAG pipeline with document indexing, retrieval, and LLM generation
First, you need to obtain an API key from VecML (create one at https://account.vecml.com/user-api-keys):
All API requests should be made to the following base URL:
https://rag.vecml.com/api
Create a new document collection to organize your files.
POST /create_collection
Request parameters:
- collection_name (string): Name for the new collection

Response fields:
- collection_id (string): Unique identifier for the collection; remember this for later use
- collection_name (string): Name of the created collection
- message (string): Success message

import requests
# Get your API key from https://account.vecml.com/user-api-keys
api_key = "your_api_key_here"
base_url = "https://rag.vecml.com/api"
headers = {
"X-API-Key": api_key,
"Content-Type": "application/json"
}
# Create a new collection
response = requests.post(
f"{base_url}/create_collection",
headers=headers,
json={"collection_name": "My Research Papers"}
)
result = response.json()
collection_id = result["collection_id"]
print(f"Created collection: {collection_id}")
# Get your API key from https://account.vecml.com/user-api-keys
curl -X POST "https://rag.vecml.com/api/create_collection" \
-H "X-API-Key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{"collection_name": "My Research Papers"}'
Upload and index a document in your collection to make its content searchable. To index multiple files into the same collection and search across all of them later, call this endpoint once per file with the same collection_id (see the loop sketch after the examples below).
POST /index
Request parameters:
- collection_id (string): ID of the target collection
- file (file): Document file to upload (see supported formats below)
- use_mm (boolean): Enable multimodal processing for images and figures; this will take longer to process but will be more accurate for documents with images

Response fields:
- message (string): Success message
- collection_id (string): ID of the collection
- filename (string): Name of the uploaded file
- file_size (integer): Size of the file in bytes
- queue_info (string): Information about processing queue position

# Index a document
files = {
'file': ('document.pdf', open('path/to/document.pdf', 'rb'), 'application/pdf')
}
data = {
'collection_id': collection_id,
'use_mm': 'false' # Set to 'true' for multimodal processing
}
response = requests.post(
f"{base_url}/index",
headers={"X-API-Key": api_key},
files=files,
data=data
)
result = response.json()
print(f"Indexed: {result['filename']} ({result['file_size']} bytes)")
curl -X POST "https://rag.vecml.com/api/index" \
-H "X-API-Key: your_api_key_here" \
-F "collection_id=your_collection_id" \
-F "file=@/path/to/document.pdf" \
-F "use_mm=false"
Query your indexed documents using natural language and get AI-generated answers based on the retrieved content.
POST /query
Request parameters:
- collection_id (string): ID of the collection to search
- query (string): Natural language question or search query
- max_tokens (integer, optional): Maximum tokens in response (default: 1000)
- llm_model (string, optional): LLM model to use (default: "qwen3_8b")
- streaming (boolean, optional): Enable streaming response (default: false)
- temperature (float, optional): Response creativity (0.0-1.0, default: 0.7)
- max_input_tokens (integer, optional): Maximum input tokens (default: 8000)
- system_prompt (string, optional): Custom system prompt for the LLM

Available models:
- qwen3_8b
- qwen3_4b
- gpt-4.1-nano
- gpt-4o-mini
- gpt-4.1-mini
- gemini-2.0-flash
- gemini-2.5-pro
- claude-3-5-haiku
- claude-4-sonnet
- o3-mini

Response fields:
- answer (string): AI-generated answer based on retrieved content
- usage (object, optional): Token usage information with prompt_tokens, completion_tokens, and total_tokens

# Query the collection with LLM response
query_data = {
"collection_id": collection_id,
"query": "What is the main topic of this document?",
"max_tokens": 2000,
"llm_model": "qwen3_8b", # Options: qwen3_8b, qwen3_4b, gemini-2.0-flash, gpt-4o-mini, claude-3-5-haiku, etc.
"streaming": False, # Set to True for streaming response
"temperature": 0.7,
"max_input_tokens": 8000,
"system_prompt": "You are a helpful assistant." # Optional custom system prompt
}
response = requests.post(
f"{base_url}/query",
headers=headers,
json=query_data
)
result = response.json()
print("LLM Answer:", result['answer'])
if result.get('usage'):
print(f"Token usage: {result['usage']['total_tokens']} total tokens")
curl -X POST "https://rag.vecml.com/api/query" \
-H "X-API-Key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"collection_id": "your_collection_id",
"query": "What is the main topic of this document?",
"max_tokens": 2000,
"llm_model": "qwen3_8b",
"streaming": false,
"temperature": 0.7
}'
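Only the llm_model field changes when you want to compare models from the list above on the same question. A small sketch (the two model choices are arbitrary picks from the documented list):

# Sketch: ask the same question with two documented models and compare answers
for model in ["qwen3_8b", "gpt-4o-mini"]:
    response = requests.post(
        f"{base_url}/query",
        headers=headers,
        json={
            "collection_id": collection_id,
            "query": "What is the main topic of this document?",
            "llm_model": model,
        },
    )
    print(f"[{model}] {response.json()['answer'][:200]}")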
Stream the response from the LLM as it generates the answer in real time.
POST /query
Set streaming: true in the request body; the response is returned as text/plain instead of JSON.
# Streaming query example
import requests
query_data = {
"collection_id": collection_id,
"query": "Summarize this document in detail.",
"max_tokens": 2000,
"llm_model": "qwen3_8b",
"streaming": True,
"temperature": 0.7
}
response = requests.post(
f"{base_url}/query",
headers=headers,
json=query_data,
stream=True
)
print("Streaming response:")
for chunk in response.iter_content(chunk_size=8, decode_unicode=True):
if chunk:
print(chunk, end='', flush=True)
print() # New line after streaming
curl -X POST "https://rag.vecml.com/api/query" \
-H "X-API-Key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"collection_id": "your_collection_id",
"query": "Summarize this document in detail.",
"max_tokens": 2000,
"llm_model": "qwen3_8b",
"streaming": true,
"temperature": 0.7
}' \
--no-buffer
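Because the streamed body is plain text, collecting the full answer is plain string concatenation. A sketch that re-issues the streaming request from the example above and keeps the complete answer:

# Sketch: accumulate streamed text/plain chunks into one answer string
response = requests.post(
    f"{base_url}/query",
    headers=headers,
    json=query_data,
    stream=True
)
chunks = []
for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
    if chunk:
        print(chunk, end="", flush=True)
        chunks.append(chunk)
full_answer = "".join(chunks)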
List all collections belonging to your account.
GET /collections
No request body is required; authentication is via the X-API-Key header.
Response fields:
- collections (array): List of collection objects, each with collection_id, collection_name, created_at, and updated_at
# List all collections
response = requests.get(
f"{base_url}/collections",
headers={"X-API-Key": api_key}
)
result = response.json()
print(f"Found {len(result['collections'])} collections")
for collection in result['collections']:
print(f"- {collection['collection_name']} ({collection['collection_id']})")
curl -X GET "https://rag.vecml.com/api/collections" \
-H "X-API-Key: your_api_key_here"
Get detailed information about a specific collection including all indexed files.
GET /collections/{collection_id}
Path parameters:
- collection_id: ID of the collection to retrieve

Response fields:
- collection_id (string): ID of the collection
- collection_name (string): Name of the collection
- created_at, updated_at (string): Timestamps
- files (array): List of files with metadata (filename, size, upload date)

# Get collection details
response = requests.get(
f"{base_url}/collections/{collection_id}",
headers={"X-API-Key": api_key}
)
result = response.json()
print(f"Collection: {result['collection_name']}")
print(f"Files: {len(result['files'])}")
for file_info in result['files']:
print(f"- {file_info['original_filename']} ({file_info['file_size']} bytes)")
curl -X GET "https://rag.vecml.com/api/collections/your_collection_id" \
-H "X-API-Key: your_api_key_here"
Permanently delete a collection and all its indexed documents. This action cannot be undone.
DELETE /collections/{collection_id}
Path parameters:
- collection_id: ID of the collection to delete

Response fields:
- message (string): Success message
- collection_id (string): ID of the deleted collection
- files_deleted (integer): Number of files removed
- storage_freed (integer): Amount of storage freed in bytes

# Delete collection
response = requests.delete(
f"{base_url}/collections/{collection_id}",
headers={"X-API-Key": api_key}
)
result = response.json()
print(f"Deleted collection: {result['collection_id']}")
print(f"Files deleted: {result['files_deleted']}")
print(f"Storage freed: {result['storage_freed']} bytes")
curl -X DELETE "https://rag.vecml.com/api/collections/your_collection_id" \
-H "X-API-Key: your_api_key_here"