The VecML RAG (Retrieval-Augmented Generation) API allows you to build powerful document search and question-answering systems. Upload documents, create searchable collections, and query them using natural language.
- Upload and process PDFs, text files, and more
- Semantic, multimodal search with AI-powered retrieval
- End-to-end RAG pipeline with document indexing, retrieval, and LLM generation
First, you need to obtain an API key from VecML (create one at https://account.vecml.com/user-api-keys):
All API requests should be made to the following base URL:
https://rag.vecml.com/api
Create a new document collection to organize your files.
POST /create_collection
Request parameters:
- collection_name (string): Name for the new collection

Response fields:
- collection_id (string): Unique identifier for the collection; remember this for later use
- collection_name (string): Name of the created collection
- message (string): Success message

import requests
# Get your API key from https://account.vecml.com/user-api-keys
api_key = "your_api_key_here"
base_url = "https://rag.vecml.com/api"
headers = {
"X-API-Key": api_key,
"Content-Type": "application/json"
}
# Create a new collection
response = requests.post(
f"{base_url}/create_collection",
headers=headers,
json={"collection_name": "My Research Papers"}
)
result = response.json()
collection_id = result["collection_id"]
print(f"Created collection: {collection_id}")
# Get your API key from https://account.vecml.com/user-api-keys
curl -X POST "https://rag.vecml.com/api/create_collection" \
-H "X-API-Key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{"collection_name": "My Research Papers"}'
Upload and index a document in your collection to make its content searchable. To index multiple files into the same collection and search across all of them later, call this endpoint once per file with the same collection_id (see the loop sketch after the examples below).
POST /index
Request parameters:
- collection_id (string): ID of the target collection
- file (file): Document file to upload (see supported formats below)
- use_mm (boolean): Enable multimodal processing for images and figures; this will take longer to process but will be more accurate for documents with images

Response fields:
- message (string): Success message
- collection_id (string): ID of the collection
- filename (string): Name of the uploaded file
- file_size (integer): Size of the file in bytes
- queue_info (string): Information about processing queue position

# Index a document
files = {
'file': ('document.pdf', open('path/to/document.pdf', 'rb'), 'application/pdf')
}
data = {
'collection_id': collection_id,
'use_mm': 'false' # Set to 'true' for multimodal processing
}
response = requests.post(
f"{base_url}/index",
headers={"X-API-Key": api_key},
files=files,
data=data
)
result = response.json()
print(f"Indexed: {result['filename']} ({result['file_size']} bytes)")
curl -X POST "https://rag.vecml.com/api/index" \
-H "X-API-Key: your_api_key_here" \
-F "collection_id=your_collection_id" \
-F "file=@/path/to/document.pdf" \
-F "use_mm=false"
Query your indexed documents using natural language and get AI-generated answers based on the retrieved content.
POST /query
Request parameters:
- collection_id (string): ID of the collection to search
- query (string): Natural language question or search query
- max_tokens (integer, optional): Maximum tokens in response (default: 1000)
- llm_model (string, optional): LLM model to use (default: "qwen3_8b")
- streaming (boolean, optional): Enable streaming response (default: false)
- temperature (float, optional): Response creativity (0.0-1.0, default: 0.7)
- max_input_tokens (integer, optional): Maximum input tokens (default: 8000)
- system_prompt (string, optional): Custom system prompt for the LLM

Available models:
- qwen3_8b
- qwen3_4b
- gpt-4.1-nano
- gpt-4o-mini
- gpt-4.1-mini
- gemini-2.0-flash
- gemini-2.5-pro
- claude-3-5-haiku
- claude-4-sonnet
- o3-mini

Response fields:
- answer (string): AI-generated answer based on retrieved content
- usage (object, optional): Token usage information with prompt_tokens, completion_tokens, and total_tokens

# Query the collection with LLM response
query_data = {
"collection_id": collection_id,
"query": "What is the main topic of this document?",
"max_tokens": 2000,
"llm_model": "qwen3_8b", # Options: qwen3_8b, qwen3_4b, gemini-2.0-flash, gpt-4o-mini, claude-3-5-haiku, etc.
"streaming": False, # Set to True for streaming response
"temperature": 0.7,
"max_input_tokens": 8000,
"system_prompt": "You are a helpful assistant." # Optional custom system prompt
}
response = requests.post(
f"{base_url}/query",
headers=headers,
json=query_data
)
result = response.json()
print("LLM Answer:", result['answer'])
if result.get('usage'):
print(f"Token usage: {result['usage']['total_tokens']} total tokens")
curl -X POST "https://rag.vecml.com/api/query" \
-H "X-API-Key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"collection_id": "your_collection_id",
"query": "What is the main topic of this document?",
"max_tokens": 2000,
"llm_model": "qwen3_8b",
"streaming": false,
"temperature": 0.7
}'
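Only the llm_model field changes when you want to compare models from the list above on the same question. A small sketch (the two model choices are arbitrary picks from the documented list):

# Sketch: ask the same question with two documented models and compare answers
for model in ["qwen3_8b", "gpt-4o-mini"]:
    response = requests.post(
        f"{base_url}/query",
        headers=headers,
        json={
            "collection_id": collection_id,
            "query": "What is the main topic of this document?",
            "llm_model": model,
        },
    )
    print(f"[{model}] {response.json()['answer'][:200]}")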
Stream the response from the LLM as it generates the answer in real time.
POST /query
Set streaming: true in the request body; the response is returned as text/plain instead of JSON.
# Streaming query example
import requests
query_data = {
"collection_id": collection_id,
"query": "Summarize this document in detail.",
"max_tokens": 2000,
"llm_model": "qwen3_8b",
"streaming": True,
"temperature": 0.7
}
response = requests.post(
f"{base_url}/query",
headers=headers,
json=query_data,
stream=True
)
print("Streaming response:")
for chunk in response.iter_content(chunk_size=8, decode_unicode=True):
if chunk:
print(chunk, end='', flush=True)
print() # New line after streaming
curl -X POST "https://rag.vecml.com/api/query" \
-H "X-API-Key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"collection_id": "your_collection_id",
"query": "Summarize this document in detail.",
"max_tokens": 2000,
"llm_model": "qwen3_8b",
"streaming": true,
"temperature": 0.7
}' \
--no-buffer
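Because the streamed body is plain text, collecting the full answer is plain string concatenation. A sketch that re-issues the streaming request from the example above and keeps the complete answer:

# Sketch: accumulate streamed text/plain chunks into one answer string
response = requests.post(
    f"{base_url}/query",
    headers=headers,
    json=query_data,
    stream=True
)
chunks = []
for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
    if chunk:
        print(chunk, end="", flush=True)
        chunks.append(chunk)
full_answer = "".join(chunks)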
List all collections belonging to your account.
GET /collections
No request body is required; authentication is via the X-API-Key header.
Response fields:
- collections (array): List of collection objects, each with collection_id, collection_name, created_at, and updated_at
# List all collections
response = requests.get(
f"{base_url}/collections",
headers={"X-API-Key": api_key}
)
result = response.json()
print(f"Found {len(result['collections'])} collections")
for collection in result['collections']:
print(f"- {collection['collection_name']} ({collection['collection_id']})")
curl -X GET "https://rag.vecml.com/api/collections" \
-H "X-API-Key: your_api_key_here"
Get detailed information about a specific collection including all indexed files.
GET /collections/{collection_id}
Path parameters:
- collection_id: ID of the collection to retrieve

Response fields:
- collection_id (string): ID of the collection
- collection_name (string): Name of the collection
- created_at, updated_at (string): Timestamps
- files (array): List of files with metadata (filename, size, upload date)

# Get collection details
response = requests.get(
f"{base_url}/collections/{collection_id}",
headers={"X-API-Key": api_key}
)
result = response.json()
print(f"Collection: {result['collection_name']}")
print(f"Files: {len(result['files'])}")
for file_info in result['files']:
print(f"- {file_info['original_filename']} ({file_info['file_size']} bytes)")
curl -X GET "https://rag.vecml.com/api/collections/your_collection_id" \
-H "X-API-Key: your_api_key_here"
Permanently delete a collection and all its indexed documents. This action cannot be undone.
DELETE /collections/{collection_id}
Path parameters:
- collection_id: ID of the collection to delete

Response fields:
- message (string): Success message
- collection_id (string): ID of the deleted collection
- files_deleted (integer): Number of files removed
- storage_freed (integer): Amount of storage freed in bytes

# Delete collection
response = requests.delete(
f"{base_url}/collections/{collection_id}",
headers={"X-API-Key": api_key}
)
result = response.json()
print(f"Deleted collection: {result['collection_id']}")
print(f"Files deleted: {result['files_deleted']}")
print(f"Storage freed: {result['storage_freed']} bytes")
curl -X DELETE "https://rag.vecml.com/api/collections/your_collection_id" \
-H "X-API-Key: your_api_key_here"