Generate Text Embeddings with OpenAI
Text embeddings convert natural language into dense numeric vectors that capture semantic meaning. Two sentences with similar meaning produce similar vectors (high cosine similarity) even if they share no words, while semantically unrelated sentences produce distant vectors. OpenAI's text-embedding-3-small and text-embedding-3-large models are a standard choice for semantic search, RAG pipelines, clustering, and classification tasks that require meaning-aware text representations. This example shows an embeddings API request for a set of sentences and explains how to interpret the resulting vectors for similarity search.

The text-embedding-3-small model produces 1,536-dimensional vectors and costs $0.02 per million tokens, orders of magnitude cheaper than running the same volume of text through a chat model such as GPT-4o. For most RAG applications, text-embedding-3-small is the right default; use text-embedding-3-large only when you have benchmarked a quality gap on your specific data.

For RAG pipelines, embed both your knowledge base documents (at indexing time) and the user's query (at search time), then find the top-k documents with the highest cosine similarity. The dimensions parameter lets you shorten the vectors for storage efficiency: reducing from 1,536 to 512 dimensions cuts storage and search latency to roughly a third, with only a modest quality reduction.
{
"model": "text-embedding-3-small",
"input": [
"How do I reset my password?",
"What is the process for changing my account password?",
"I forgot my login credentials",
"How do I cancel my subscription?"
],
"dimensions": 1536,
"encoding_format": "float"
}

FAQ
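The response to the request above contains one vector per input string; ranking those vectors by cosine similarity against a query vector is the core of similarity search. A minimal pure-Python sketch of that ranking step (the 4-dimensional vectors here are illustrative placeholders for the 1,536-dimensional embeddings the API would return, not real output):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors standing in for the embeddings of the four
# input sentences in the request above.
vectors = {
    "reset password":  [0.9, 0.1, 0.1, 0.0],
    "change password": [0.8, 0.2, 0.1, 0.1],
    "forgot login":    [0.7, 0.3, 0.2, 0.1],
    "cancel sub":      [0.1, 0.1, 0.9, 0.2],
}

# Treat the first sentence as the query and rank the rest by similarity.
query = vectors["reset password"]
ranked = sorted(
    ((name, cosine_similarity(query, v)) for name, v in vectors.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With real embeddings the pattern is identical: the three password-related sentences score close together, while the unrelated cancellation question lands far below them.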
- What is cosine similarity and why use it for embeddings?
- Cosine similarity measures the angle between two vectors, ranging from -1 (opposite) to 1 (identical). It is preferred over Euclidean distance for embeddings because it is invariant to vector magnitude — two sentences have the same similarity regardless of their length.
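The magnitude invariance described above can be checked directly: scaling a vector leaves its cosine similarity unchanged, while Euclidean distance does not. A quick sketch:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the magnitude

print(cosine(a, b))     # 1.0: same direction, so maximal similarity
print(euclidean(a, b))  # nonzero: Euclidean distance is magnitude-sensitive
```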
- How many tokens can I embed in one request?
- The text-embedding-3 models accept up to 8,191 tokens per input string. Longer documents must be chunked before embedding. You can embed multiple strings in one request using the input array, up to the API rate limit.
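The chunking step can be sketched with a simple sliding window. Production pipelines usually count real tokens with a tokenizer such as tiktoken; the word-based approximation below (chunk_size and overlap values are illustrative, not recommendations) just conveys the idea of keeping every chunk safely under the 8,191-token limit:

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into overlapping word windows. Word counts only
    approximate token counts; a real pipeline would count tokens."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = ("word " * 500).strip()           # stand-in for a long document
chunks = chunk_words(doc)
print(len(chunks), [len(c.split()) for c in chunks])
```

Each chunk would then be sent as one entry in the embeddings request's input array, and the overlap keeps sentences near chunk boundaries retrievable from either side.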
- Should I use text-embedding-3-small or 3-large?
- Start with text-embedding-3-small. It is about 6x cheaper ($0.02 vs. $0.13 per million tokens) with comparable quality for most tasks. Upgrade to 3-large only if you observe measurable retrieval quality improvements on your specific dataset and queries.
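A related cost lever, whichever model you pick, is vector length: text-embedding-3 vectors can be shortened after the fact by truncating and re-normalizing, which approximates what the dimensions request parameter does server-side. A hedged sketch (the 8-dimensional vector is an illustrative stand-in for a real 1,536-dimensional embedding):

```python
import math

def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components, then rescale to unit length
    so cosine similarity on the shortened vector stays meaningful."""
    short = vec[:dims]
    norm = math.sqrt(sum(x * x for x in short))
    return [x / norm for x in short]

# Illustrative placeholder for a full-length embedding.
full = [0.5, 0.3, -0.2, 0.1, 0.05, -0.4, 0.2, 0.1]
short = truncate_and_normalize(full, 4)
print(len(short), round(sum(x * x for x in short), 6))  # 4 1.0
```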
Related Examples
- The Chat Completion API is the primary interface for all GPT models and the foun...
- Format CSV Data for AI Fine-Tuning: Fine-tuning LLMs on custom datasets requires converting raw training data into t...
- Split a Long Document for AI Processing: When a document is too long to fit in a single LLM context window, it must be sp...