Nomic Embed v2
Free.ai (self-hosted) · embeddings · ~100 tokens per call
Nomic Embed v2 is an embedding model built by Nomic AI. It is strongest at retrieval-augmented generation and supports flexible vector sizes. It is self-hosted on Free.ai GPUs and runs free against your daily token pool (100 tokens per call). Released under Apache 2.0; commercial use is permitted on Free.ai.
Use via API
OpenAI-compatible REST API. Generate a key and call this model in seconds.
curl -X POST https://api.free.ai/v1/embeddings/ \
-H "Authorization: Bearer sk-free-..." \
-H "Content-Type: application/json" \
-d '{"model":"nomic-embed-v2","input":"your text here"}'
Frequently Asked Questions
Nomic Embed v2 converts text into a dense vector (a list of floats) that captures meaning. Use it for semantic search, clustering, recommendation, retrieval-augmented generation (RAG), and any task where "is this text similar to that text" matters.
Typical dimensions are 384, 768, 1024, or 1536 depending on the model. BGE-M3 emits 1024-dim; OpenAI Ada emits 1536. The API response includes the dimension so your vector DB picks the right index.
Modern embedding models (including most options on Free.ai) are trained on 100+ languages. Cross-language retrieval works — search in English, match documents in Spanish.
Maximum input length ranges from 512 to 8,192 tokens depending on the model. Longer inputs are truncated, so chunk long documents into paragraphs before embedding.
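A minimal sketch of paragraph-level chunking. It assumes whitespace-separated words are a rough stand-in for tokens (the model's real tokenizer will count differently) and uses a hypothetical budget of 512. The name `chunk_paragraphs` is illustrative and not part of the Free.ai API.

```python
def chunk_paragraphs(text: str, max_tokens: int = 512) -> list[str]:
    """Split text on blank lines, then pack paragraphs into chunks under a budget.

    Token counts are approximated by whitespace-separated words (an assumption).
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in text.split("\n\n"):
        words = para.split()
        if not words:
            continue
        # Hard-split any single paragraph that exceeds the budget on its own.
        pieces = [words[i:i + max_tokens] for i in range(0, len(words), max_tokens)]
        for piece in pieces:
            # Flush the running chunk if adding this piece would overflow it.
            if current and current_len + len(piece) > max_tokens:
                chunks.append(" ".join(current))
                current, current_len = [], 0
            current.extend(piece)
            current_len += len(piece)
    if current:
        chunks.append(" ".join(current))
    return chunks


document = "a b c\n\nd e\n\nf g h i"
chunks = chunk_paragraphs(document, max_tokens=4)
print(chunks)
```

In practice you would leave headroom below the model's limit, since word counts usually undercount tokens.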
Nomic Embed v2 runs on our own GPUs and is among the cheapest tools: about 100 tokens per call, drawn from your daily free pool. $5 buys 200K tokens.
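The figures above work out as follows. This is a back-of-the-envelope sketch that assumes purchased tokens are consumed at the same rate of roughly 100 per call as the free pool; the page does not state this explicitly.

```python
TOKENS_PER_CALL = 100           # from the page: about 100 tokens per call
TOKENS_PER_PURCHASE = 200_000   # from the page: $5 = 200K tokens
PRICE_USD = 5.00

# Assumption: paid tokens are spent at the same per-call rate as free tokens.
calls_per_purchase = TOKENS_PER_PURCHASE // TOKENS_PER_CALL
usd_per_call = PRICE_USD / calls_per_purchase

print(calls_per_purchase)  # 2000
print(usd_per_call)        # 0.0025
```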
Batch embedding is supported: POST a list of strings to /v1/embeddings/ and Nomic Embed v2 returns a list of vectors in the same order. Batch size is up to 2,048 strings per request.
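A sketch of splitting a large list of texts into requests that respect the 2,048-item limit. It only builds the JSON bodies and sends nothing. The `input` field name follows the OpenAI embeddings convention and the model id `nomic-embed-v2` is taken from the curl example; both are assumptions about what this endpoint accepts. `batched` and `build_payload` are hypothetical helper names.

```python
import json
from typing import Iterator

MAX_BATCH = 2048  # per-request limit stated on the page


def batched(texts: list[str], size: int = MAX_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` texts, preserving order."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def build_payload(batch: list[str], model: str = "nomic-embed-v2") -> bytes:
    """Serialize one request body; "input" is assumed from the OpenAI API shape."""
    return json.dumps({"model": model, "input": batch}).encode("utf-8")


texts = [f"chunk {i}" for i in range(5000)]
payloads = [build_payload(batch) for batch in batched(texts)]
print(len(payloads))  # 3
```

Because vectors come back in input order, concatenating the per-batch results in request order keeps them aligned with `texts`.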
Vectors are L2-normalized by default, so cosine similarity equals the dot product. Pass `normalize=false` if you want raw vectors for a different distance metric.
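The equivalence between cosine similarity and the dot product for unit-length vectors can be checked in a few lines. The vectors below are made-up toy values, not model output.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity for arbitrary (non-zero) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a_raw, b_raw = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
a, b = l2_normalize(a_raw), l2_normalize(b_raw)

# Cosine on the raw vectors matches a plain dot product on the normalized ones.
print(math.isclose(cosine(a_raw, b_raw), dot(a, b)))
```

This is why a vector index configured for inner product gives the same ranking as one configured for cosine when the stored vectors are normalized.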
Any vector database works: Pinecone, Weaviate, Qdrant, Chroma, pgvector, FAISS, LanceDB. Nomic Embed v2 returns plain JSON floats; the DB never sees the model.
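Since the vectors are plain lists of floats, a brute-force search in memory is enough to prototype before choosing a database. The sketch below stands in for a vector store; `top_k` is a hypothetical helper, the stored vectors are toy values, and ranking by dot product assumes the vectors are L2-normalized.

```python
def top_k(query: list[float], vectors: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k stored vectors most similar to query."""
    scores = [
        (sum(q * x for q, x in zip(query, vec)), idx)
        for idx, vec in enumerate(vectors)
    ]
    scores.sort(reverse=True)  # highest dot product first
    return [(idx, score) for score, idx in scores[:k]]


stored = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # toy unit vectors
results = top_k([1.0, 0.0], stored)
print(results)
```

A dedicated vector database replaces the linear scan with an approximate index once the collection grows.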
API access is available: POST to /v1/embeddings/ with model="Nomic Embed v2". The response shape is OpenAI-compatible, so existing client libraries work unchanged. /api/ has the full reference.
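A sketch of reading vectors out of an OpenAI-style embeddings response. The sample body is hand-written to mirror the OpenAI shape (`data[].embedding`, `data[].index`); it is not real output from this endpoint, and the exact fields Free.ai returns are an assumption based on the compatibility claim.

```python
import json

# Hand-written sample in the OpenAI embeddings response shape (assumed, not captured).
sample_response = """
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 1, "embedding": [0.0, 1.0]},
    {"object": "embedding", "index": 0, "embedding": [1.0, 0.0]}
  ],
  "model": "nomic-embed-v2"
}
"""


def extract_vectors(body: str) -> list[list[float]]:
    """Parse the response and return embeddings ordered by their input index."""
    payload = json.loads(body)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


vectors = extract_vectors(sample_response)
dimension = len(vectors[0])  # vector length, useful when creating a DB index
print(dimension, vectors)
```

Sorting by `index` is a defensive step: it keeps vectors aligned with the inputs even if the items arrive out of order.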
Self-hosted models keep your text on our GPUs and discard it after the call returns. Premium models pass your text through to the provider under a DPA. We do not train on your inputs.
Latency is under 100 ms for short text on self-hosted models and 100–500 ms on premium models. Batch calls scale roughly linearly: 1,000 chunks complete in 2–10 seconds.
Free.ai grants commercial use of embeddings. Build production search, RAG pipelines, and recommendation systems with no per-vector royalty.