VecML RAG API

Introduction

The VecML RAG (Retrieval-Augmented Generation) API allows you to build powerful document search and question-answering systems. Upload documents, create searchable collections, and query them using natural language.

Document Indexing

Upload and process PDFs, text files, and more

Smart Search

Semantic, multimodal search with AI-powered retrieval

End2End RAG

End-to-end RAG pipeline with document indexing, retrieval and LLM generation

Before You Start

Get Your API Key

First, you need to obtain an API key from VecML:

  1. Sign up for a new account or log in to your existing account at https://account.vecml.com
  2. Visit https://account.vecml.com/user-api-keys
  3. Generate a new API key for your application and save it in a secure location
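
One common way to keep the key out of source code is to load it from an environment variable. The sketch below assumes a variable named VECML_API_KEY (the variable name is an arbitrary choice made here; the API itself only cares about the X-API-Key request header):

```python
import os

# VECML_API_KEY is a hypothetical variable name -- use any name you like,
# as long as the key ends up in the X-API-Key header of each request.
api_key = os.environ.get("VECML_API_KEY", "your_api_key_here")

headers = {
    "X-API-Key": api_key,
    "Content-Type": "application/json"
}
```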

API Endpoints

Base URL

All API requests should be made to the following base URL:

https://rag.vecml.com/api

Create Collection

Create a new document collection to organize your files.

Endpoint

POST /create_collection

Input

  • collection_name (string): Name for the new collection

Output

  • collection_id (string): Unique identifier for the collection; save it for use in later requests
  • collection_name (string): Name of the created collection
  • message (string): Success message

Python Example

Python
import requests

# Get your API key from https://account.vecml.com/user-api-keys
api_key = "your_api_key_here"
base_url = "https://rag.vecml.com/api"

headers = {
    "X-API-Key": api_key,
    "Content-Type": "application/json"
}

# Create a new collection
response = requests.post(
    f"{base_url}/create_collection",
    headers=headers,
    json={"collection_name": "My Research Papers"}
)

result = response.json()
collection_id = result["collection_id"]
print(f"Created collection: {collection_id}")

cURL Example

Bash
# Get your API key from https://account.vecml.com/user-api-keys
curl -X POST "https://rag.vecml.com/api/create_collection" \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"collection_name": "My Research Papers"}'

Index Document

Upload and index a document in your collection to make its content searchable. To index multiple files in the same collection and search across all of them later, call this endpoint once per file with the same collection_id.

Endpoint

POST /index

Input

  • collection_id (string): ID of the target collection
  • file (file): Document file to upload (see supported formats below)
  • use_mm (boolean): Enable multimodal processing for images and figures; processing takes longer but is more accurate for documents that contain images

Supported File Types & Size Limits

Document Files
  • PDF: up to 25MB
  • DOCX: up to 5MB
  • DOC: up to 5MB
  • PPTX: up to 25MB
  • XLSX: up to 1MB
  • XLS: up to 1MB
Text Files
  • TXT: up to 1MB
  • CSV: up to 1MB
  • JSON: up to 1MB
  • JSONL: up to 1MB
  • Markdown: up to 1MB
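
Since oversized uploads will be rejected, it can help to validate files on the client before calling /index. The helper below is an unofficial convenience sketch based on the limits above (SIZE_LIMITS_MB and check_upload are names invented here, not part of the API, and the extension-to-format mapping is my own reading of the table):

```python
import os

# Client-side size limits in MB, mirroring the documented table above.
# This is an illustrative helper, not part of the VecML API.
SIZE_LIMITS_MB = {
    ".pdf": 25, ".docx": 5, ".doc": 5, ".pptx": 25, ".xlsx": 1, ".xls": 1,
    ".txt": 1, ".csv": 1, ".json": 1, ".jsonl": 1, ".md": 1,
}

def check_upload(path):
    """Raise ValueError if the file type is unsupported or over its limit."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SIZE_LIMITS_MB:
        raise ValueError(f"Unsupported file type: {ext}")
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > SIZE_LIMITS_MB[ext]:
        raise ValueError(
            f"{path} is {size_mb:.1f}MB; the limit for {ext} is "
            f"{SIZE_LIMITS_MB[ext]}MB")
```

Calling check_upload(path) before the /index request fails fast locally instead of waiting for a server-side rejection.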

Output

  • message (string): Success message
  • collection_id (string): ID of the collection
  • filename (string): Name of the uploaded file
  • file_size (integer): Size of the file in bytes
  • queue_info (string): Information about processing queue position

Python Example

Python
# Index a document (a context manager ensures the file handle is closed)
with open('path/to/document.pdf', 'rb') as f:
    response = requests.post(
        f"{base_url}/index",
        headers={"X-API-Key": api_key},
        files={'file': ('document.pdf', f, 'application/pdf')},
        data={
            'collection_id': collection_id,
            'use_mm': 'false'  # Set to 'true' for multimodal processing
        }
    )

result = response.json()
print(f"Indexed: {result['filename']} ({result['file_size']} bytes)")

cURL Example

Bash
curl -X POST "https://rag.vecml.com/api/index" \
  -H "X-API-Key: your_api_key_here" \
  -F "collection_id=your_collection_id" \
  -F "file=@/path/to/document.pdf" \
  -F "use_mm=false"
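
Because searching across several documents just means calling /index once per file with the same collection_id, a small loop covers the multi-file case. index_files below is a helper written for this sketch, not part of any official client; it guesses MIME types with the standard library and assumes /index behaves exactly as documented above:

```python
import os
import mimetypes
import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://rag.vecml.com/api"

def index_files(collection_id, paths, use_mm=False):
    """Index several files into one collection, one /index call per file."""
    results = []
    for path in paths:
        # Guess the MIME type from the extension; fall back to a generic one.
        mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
        with open(path, "rb") as fh:
            response = requests.post(
                f"{BASE_URL}/index",
                headers={"X-API-Key": API_KEY},
                files={"file": (os.path.basename(path), fh, mime)},
                data={"collection_id": collection_id,
                      "use_mm": "true" if use_mm else "false"},
            )
        response.raise_for_status()
        results.append(response.json())
    return results
```

Each call returns the per-file response described in the Output section, so the collected results can be checked for filename and file_size.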

Query

Query your indexed documents using natural language and get AI-generated answers based on the retrieved content.

Endpoint

POST /query

Input

  • collection_id (string): ID of the collection to search
  • query (string): Natural language question or search query
  • max_tokens (integer, optional): Maximum tokens in response (default: 1000)
  • llm_model (string, optional): LLM model to use (default: "qwen3_8b")
  • streaming (boolean, optional): Enable streaming response (default: false)
  • temperature (float, optional): Sampling temperature controlling response randomness (0.0-1.0, default: 0.7)
  • max_input_tokens (integer, optional): Maximum input tokens (default: 8000)
  • system_prompt (string, optional): Custom system prompt for the LLM

Available Models

  • qwen3_8b
  • qwen3_4b
  • gpt-4.1-nano
  • gpt-4o-mini
  • gpt-4.1-mini
  • gemini-2.0-flash
  • gemini-2.5-pro
  • claude-3-5-haiku
  • claude-4-sonnet
  • o3-mini

Output

  • answer (string): AI-generated answer based on retrieved content
  • usage (object, optional): Token usage information with prompt_tokens, completion_tokens, and total_tokens

Note: See the next section for how to request and consume a streaming response.

Python Example (Non-streaming)

Python
# Query the collection with LLM response
query_data = {
    "collection_id": collection_id,
    "query": "What is the main topic of this document?",
    "max_tokens": 2000,
    "llm_model": "qwen3_8b",  # Options: qwen3_8b, qwen3_4b, gemini-2.0-flash, gpt-4o-mini, claude-3-5-haiku, etc.
    "streaming": False,    # Set to True for streaming response
    "temperature": 0.7,
    "max_input_tokens": 8000,
    "system_prompt": "You are a helpful assistant."  # Optional custom system prompt
}

response = requests.post(
    f"{base_url}/query",
    headers=headers,
    json=query_data
)

result = response.json()
print("LLM Answer:", result['answer'])
if result.get('usage'):
    print(f"Token usage: {result['usage']['total_tokens']} total tokens")

cURL Example (Non-streaming)

Bash
curl -X POST "https://rag.vecml.com/api/query" \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_id": "your_collection_id",
    "query": "What is the main topic of this document?",
    "max_tokens": 2000,
    "llm_model": "qwen3_8b",
    "streaming": false,
    "temperature": 0.7
  }'

Streaming Query

Stream the response from the LLM as it generates the answer in real time.

Endpoint

POST /query

Input

  • Same as Query above, but with streaming: true

Output

  • Text stream: Real-time streaming of the LLM-generated answer
  • Content-Type: text/plain

Tip: Use --no-buffer with curl and stream=True with requests to see the response as it's generated.

Python Example (Streaming)

Python
# Streaming query example
import requests

query_data = {
    "collection_id": collection_id,
    "query": "Summarize this document in detail.",
    "max_tokens": 2000,
    "llm_model": "qwen3_8b",
    "streaming": True,
    "temperature": 0.7
}

response = requests.post(
    f"{base_url}/query",
    headers=headers,
    json=query_data,
    stream=True
)

print("Streaming response:")
for chunk in response.iter_content(chunk_size=8, decode_unicode=True):
    if chunk:
        print(chunk, end='', flush=True)
print()  # New line after streaming

cURL Example (Streaming)

Bash
curl -X POST "https://rag.vecml.com/api/query" \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_id": "your_collection_id",
    "query": "Summarize this document in detail.",
    "max_tokens": 2000,
    "llm_model": "qwen3_8b",
    "streaming": true,
    "temperature": 0.7
  }' \
  --no-buffer

List Collections

List all collections belonging to your account.

Endpoint

GET /collections

Input

No request body is required; authenticate via the X-API-Key header.

Output

  • collections (array): List of collection objects
  • Each collection contains: collection_id, collection_name, created_at, updated_at

Python Example

Python
# List all collections
response = requests.get(
    f"{base_url}/collections",
    headers={"X-API-Key": api_key}
)

result = response.json()
print(f"Found {len(result['collections'])} collections")
for collection in result['collections']:
    print(f"- {collection['collection_name']} ({collection['collection_id']})")

cURL Example

Bash
curl -X GET "https://rag.vecml.com/api/collections" \
  -H "X-API-Key: your_api_key_here"

Get Collection Details

Get detailed information about a specific collection including all indexed files.

Endpoint

GET /collections/{collection_id}

Input

  • collection_id (path parameter): ID of the collection to retrieve

Output

  • collection_id (string): ID of the collection
  • collection_name (string): Name of the collection
  • created_at, updated_at (string): Timestamps
  • files (array): List of file objects with metadata (original_filename, file_size, upload date)

Python Example

Python
# Get collection details
response = requests.get(
    f"{base_url}/collections/{collection_id}",
    headers={"X-API-Key": api_key}
)

result = response.json()
print(f"Collection: {result['collection_name']}")
print(f"Files: {len(result['files'])}")
for file_info in result['files']:
    print(f"- {file_info['original_filename']} ({file_info['file_size']} bytes)")

cURL Example

Bash
curl -X GET "https://rag.vecml.com/api/collections/your_collection_id" \
  -H "X-API-Key: your_api_key_here"

Delete Collection

Permanently delete a collection and all its indexed documents. This action cannot be undone.

Endpoint

DELETE /collections/{collection_id}

Input

  • collection_id (path parameter): ID of the collection to delete

Output

  • message (string): Success message
  • collection_id (string): ID of the deleted collection
  • files_deleted (integer): Number of files removed
  • storage_freed (integer): Amount of storage freed in bytes

Python Example

Python
# Delete collection
response = requests.delete(
    f"{base_url}/collections/{collection_id}",
    headers={"X-API-Key": api_key}
)

result = response.json()
print(f"Deleted collection: {result['collection_id']}")
print(f"Files deleted: {result['files_deleted']}")
print(f"Storage freed: {result['storage_freed']} bytes")

cURL Example

Bash
curl -X DELETE "https://rag.vecml.com/api/collections/your_collection_id" \
  -H "X-API-Key: your_api_key_here"
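
Putting the endpoints together, a minimal end-to-end pipeline is: create a collection, index one or more files, query it, and delete the collection when done. run_rag_pipeline below is an illustrative helper written for this sketch, not an official client; it assumes the endpoints behave exactly as documented above, and since indexing is queued server-side (see queue_info), a real script may need to wait for processing to finish before querying:

```python
import os
import requests

BASE_URL = "https://rag.vecml.com/api"

def run_rag_pipeline(api_key, file_paths, question, collection_name="demo"):
    """Create a collection, index files, ask one question, then clean up."""
    json_headers = {"X-API-Key": api_key, "Content-Type": "application/json"}

    # 1. Create the collection
    resp = requests.post(f"{BASE_URL}/create_collection",
                         headers=json_headers,
                         json={"collection_name": collection_name})
    resp.raise_for_status()
    collection_id = resp.json()["collection_id"]

    try:
        # 2. Index each file into the same collection
        for path in file_paths:
            with open(path, "rb") as fh:
                requests.post(
                    f"{BASE_URL}/index",
                    headers={"X-API-Key": api_key},
                    files={"file": (os.path.basename(path), fh)},
                    data={"collection_id": collection_id, "use_mm": "false"},
                ).raise_for_status()

        # 3. Query across all indexed documents
        resp = requests.post(f"{BASE_URL}/query",
                             headers=json_headers,
                             json={"collection_id": collection_id,
                                   "query": question,
                                   "streaming": False})
        resp.raise_for_status()
        return resp.json()["answer"]
    finally:
        # 4. Delete the collection (irreversible -- skip this step to keep it)
        requests.delete(f"{BASE_URL}/collections/{collection_id}",
                        headers={"X-API-Key": api_key})
```

The finally block guarantees cleanup even if indexing or querying fails; drop it if you want the collection to persist across runs.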