Purpose-built embedding models
These models are optimized specifically for tasks like semantic search and document similarity comparison.

Model name | Model size |
---|---|
nomic-ai/nomic-embed-text-v1.5 (recommended) | 137M |
nomic-ai/nomic-embed-text-v1 | 137M |
WhereIsAI/UAE-Large-V1 | 335M |
thenlper/gte-large | 335M |
thenlper/gte-base | 109M |
BAAI/bge-base-en-v1.5 | 109M |
BAAI/bge-small-en-v1.5 | 33M |
mixedbread-ai/mxbai-embed-large-v1 | 335M |
sentence-transformers/all-MiniLM-L6-v2 | 23M |
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 118M |
LLM-based embedding models
Fireworks also supports retrieving embeddings from LLM-based models. These powerful inference models also produce embeddings useful for the same tasks as purpose-built embedding models. Generally, any LLM architecture that is compatible with Fireworks can be used for embeddings through the `/v1/embeddings` endpoint. This includes all the architectures supported for uploading custom models, such as Llama, Qwen, DeepSeek, Mistral, Mixtral, and many others.
Here are some examples of LLM-based embedding models that work with the embeddings API:
Model name |
---|
fireworks/llama4-scout-instruct-basic |
fireworks/glm-4p5 |
fireworks/gpt-oss-20b |
fireworks/kimi-k2-instruct |
fireworks/qwen3-30b-a3b |
fireworks/deepseek-r1 |
Embedding documents
The embedding model inputs text and outputs a vector (list) of floating-point numbers to use for tasks like similarity comparisons and search. Our embedding service is OpenAI-compatible. Refer to OpenAI’s embeddings guide and OpenAI’s embeddings documentation for more information on using these models.
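Below is a minimal sketch of such a call, assuming your API key is in the `FIREWORKS_API_KEY` environment variable and using Fireworks’ OpenAI-compatible base URL:

Python (OpenAI 1.x)
```python
import os

from openai import OpenAI

# Point the OpenAI 1.x client at Fireworks' OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input="search_document: Spiderman was a particularly entertaining movie with...",
)

# Each item in response.data carries one embedding vector (a list of floats).
print(response.data[0].embedding[:8])
```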
The call above embeds the text `search_document: Spiderman was a particularly entertaining movie with...` and returns a response like the following (truncated; the values are illustrative):

Response
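```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0146484375, 0.0118408203, -0.0244140625, "..."]
    }
  ],
  "model": "nomic-ai/nomic-embed-text-v1.5",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12
  }
}
```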
The same pattern works for the LLM-based models: call the `/v1/embeddings` endpoint with your chosen model.
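For example, reusing the client from above (a sketch; the exact model identifier may differ on your account):

Python (OpenAI 1.x)
```python
# "fireworks/qwen3-30b-a3b" is taken from the examples table above; the
# identifier format is an assumption and may differ on your account.
llm_response = client.embeddings.create(
    model="fireworks/qwen3-30b-a3b",
    input="search_document: Spiderman was a particularly entertaining movie with...",
)
print(len(llm_response.data[0].embedding))
```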
Embedding queries and documents
In the previous example, you might have noticed the `search_document:` prefix. Nomic models have been fine-tuned to take prefixes: for user queries, you will need to add the `search_query:` prefix, and for documents, you need to use the `search_document:` prefix.
Here’s a quick example:
- Let’s say we previously used the embedding model to embed many movie reviews that we stored in a vector database. All the documents should have been prefixed with `search_document:`
- We now want to create a movie recommender that takes in a user query and outputs recommendations based on this data. The code below demonstrates how to embed the user query (the retrieved reviews and a system prompt would then feed a chat model that writes the recommendations).
Python (OpenAI 1.x)
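```python
# A sketch, reusing the `client` from the first example. The query text is
# hypothetical; only the "search_query:" prefix matters here.
user_query = "I love superhero movies, any recommendations?"

# Embed the query with the "search_query:" prefix so it lands in the same
# space as the "search_document:"-prefixed reviews stored earlier.
query_embedding = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input=f"search_query: {user_query}",
).data[0].embedding

# query_embedding can now drive a nearest-neighbor search over the stored
# reviews; the retrieved reviews plus a system prompt would then be passed
# to a chat model to generate the recommendations.
```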
Variable dimensions
The model also supports variable embedding dimension sizes. In this case, we can pass the `dimensions` parameter to the `embeddings.create()` request.
Python (OpenAI 1.x)
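```python
# A sketch, reusing the `client` from the first example. `dimensions` is the
# standard OpenAI 1.x parameter name; 128 is just an illustrative size.
response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input="search_document: Spiderman was a particularly entertaining movie with...",
    dimensions=128,
)
print(len(response.data[0].embedding))  # 128
```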