Embedding
Embeddings compress a string of tokens into a single high-dimensional vector representation. They are ideally contextually aware: the embedding of a token depends on the surrounding tokens, so different strings of tokens map to different embeddings.
Embeddings can be used to generate the next expected token, to evaluate text similarity, and, through similarity search, to retrieve relevant context in RAG.
Embeddings generally depend on the tokenization method.
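As a minimal sketch (the vectors below are made up for illustration; a real system would obtain them from an embedding model), similarity between two embeddings is typically scored with cosine similarity, which is also what RAG retrieval ranks on:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 for identical directions, ~0.0 for unrelated ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.1, 0.9, 0.2])     # hypothetical embedding of a query
doc_a = np.array([0.12, 0.85, 0.25])  # a semantically close document
doc_b = np.array([0.9, -0.1, 0.3])    # an unrelated document

print(cosine_similarity(query, doc_a))  # high score -> retrieved in RAG
print(cosine_similarity(query, doc_b))  # low score  -> filtered out
```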
```mermaid
graph LR
    Text --> Token
    Token --> C[Token Embedding]
    C --> D[Sequence Embedding]
    D --> E[Changeable LLM]
    subgraph Embedding["Embedding Model"]
        C
        D
    end
```
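A minimal sketch of the pipeline in the diagram, assuming a random lookup table and mean pooling (real models may use a [CLS] token or attention pooling instead):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 64
token_embedding_table = rng.normal(size=(vocab_size, dim))  # learned in practice

token_ids = [17, 42, 256, 3]                          # output of a tokenizer
token_embeddings = token_embedding_table[token_ids]   # (4, dim): one vector per token

# Mean pooling is one common way to collapse token embeddings into a
# single sequence embedding for the downstream model or vector store.
sequence_embedding = token_embeddings.mean(axis=0)    # (dim,)
print(sequence_embedding.shape)
```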
Embedding models can be components of larger, more complex sequence-generation models. Keeping the representation separate allows greater freedom in swapping and evaluating downstream architectures, and gives RAG a durable lookup space.
Text and Code Embeddings by Contrastive Pre-Training
The authors demonstrate that contrastive pre-training can yield high-quality vector representations of text and code.
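A rough sketch of a symmetric InfoNCE-style objective of the kind used in contrastive pre-training: paired texts should land close together, while every other pair in the batch serves as a negative. Batch size, dimensions, and temperature below are illustrative assumptions:

```python
import numpy as np

def info_nce_loss(a: np.ndarray, b: np.ndarray, temperature: float = 0.07) -> float:
    """a, b: (batch, dim) L2-normalized embeddings of paired inputs."""
    logits = (a @ b.T) / temperature  # (batch, batch) similarity matrix
    n = len(a)                        # positives sit on the diagonal

    def xent(l: np.ndarray) -> float:
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # cross-entropy in both directions (a->b and b->a), averaged
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 32)); a /= np.linalg.norm(a, axis=1, keepdims=True)
b = a + 0.01 * rng.normal(size=a.shape); b /= np.linalg.norm(b, axis=1, keepdims=True)
print(info_nce_loss(a, b))  # small loss: the pairs are nearly identical
```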
Matryoshka Representation Learning
The authors introduce MRL, which encodes information at different granularities, allowing a single embedding to be used for different downstream tasks.
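A minimal sketch of how a Matryoshka embedding is consumed downstream: the same vector is truncated to nested prefixes and re-normalized. The 768-dim vector and the chosen granularities are assumptions, not values from the paper:

```python
import numpy as np

full = np.random.default_rng(0).normal(size=768)  # one MRL-trained embedding

for d in (64, 128, 256, 768):                 # coarse -> fine granularities
    prefix = full[:d]
    prefix = prefix / np.linalg.norm(prefix)  # re-normalize after truncation
    print(d, prefix.shape)                    # smaller d: cheaper search, less detail
```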
ELE Embeddings
ELE provides spherical embeddings based on description logic, yielding representations that work well with knowledge graphs and ontologies.
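A very rough sketch of the n-ball idea, under the assumption that each concept is embedded as a ball (center, radius) and subsumption C ⊑ D is satisfied when C's ball fits inside D's; the concepts and numbers here are hypothetical:

```python
import numpy as np

def subsumption_loss(center_c, radius_c, center_d, radius_d) -> float:
    """0 when ball C fits inside ball D; positive otherwise."""
    gap = np.linalg.norm(center_c - center_d) + radius_c - radius_d
    return float(max(0.0, gap))

cat    = (np.array([0.0, 0.0]), 0.5)    # hypothetical concept "Cat"
animal = (np.array([0.1, 0.0]), 1.0)    # hypothetical concept "Animal"
print(subsumption_loss(*cat, *animal))  # 0.0 -> "Cat is an Animal" is satisfied
```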
FastEmbed with Qdrant
A light and fast embedding library.
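A short usage sketch; the import, class, and default model name follow FastEmbed's documented API as I understand it, but check the current docs before relying on it:

```python
from fastembed import TextEmbedding

model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
docs = ["Qdrant is a vector database.", "Embeddings map text to vectors."]

# embed() returns a generator of numpy arrays, one vector per document
vectors = list(model.embed(docs))
print(len(vectors), vectors[0].shape)  # 2 documents, 384-dim each for this model
```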