Gecko: Versatile Text Embeddings Distilled from Large Language Models

The Gecko embedding model is a compact, versatile text embedder that achieves strong retrieval performance by distilling knowledge from large language models (LLMs). The work introduces a two-step distillation process:

  1. Generating diverse synthetic data pairs using an LLM
  2. Refining these pairs by retrieving candidate passages and relabeling them with the LLM (see the sketch after this list)
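The second step is the key idea: because the LLM rescores retrieved neighbors, the final positive passage may differ from the passage the query was originally generated from. The sketch below illustrates this mining loop under stated assumptions; `llm_generate`, `llm_score`, and `embed` are hypothetical placeholder callables, not APIs from the paper, so treat this as a minimal illustration rather than the authors' implementation.

```python
import numpy as np
from typing import Callable, Sequence, Tuple

def mine_training_pair(
    passage: str,
    corpus: Sequence[str],
    llm_generate: Callable[[str], str],       # hypothetical: prompt an LLM, get text back
    llm_score: Callable[[str, str], float],   # hypothetical: LLM relevance score for (query, doc)
    embed: Callable[[str], np.ndarray],       # hypothetical: existing pre-trained embedder
    k: int = 20,
) -> Tuple[str, str, str]:
    """Two-step mining in the spirit of Gecko: generate a synthetic
    query from a seed passage, then retrieve neighbors and relabel
    them with the LLM to pick a positive and a hard negative."""
    # Step 1: the LLM generates a synthetic query for the seed passage.
    query = llm_generate(
        f"Write a search query that this passage answers:\n{passage}"
    )

    # Step 2a: retrieve the top-k candidate passages for the query
    # using the existing embedder (cosine similarity over the corpus).
    q = embed(query)
    q = q / np.linalg.norm(q)
    sims = []
    for doc in corpus:
        d = embed(doc)
        sims.append(float(q @ (d / np.linalg.norm(d))))
    top = np.argsort(sims)[::-1][:k]

    # Step 2b: relabel candidates with the LLM. The highest-scoring
    # candidate becomes the positive (it may not be the seed passage);
    # the lowest-scoring retrieved candidate serves as a hard negative.
    ranked = sorted(top, key=lambda i: llm_score(query, corpus[i]), reverse=True)
    return query, corpus[ranked[0]], corpus[ranked[-1]]
```

Delegating both query generation and relevance labeling to the LLM is what lets the distilled embedder inherit the LLM's notion of relevance without requiring human-annotated pairs.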
Standout features include:

  • A compact 256-dimensional variant that outperforms all existing 768-dimensional entries on the Massive Text Embedding Benchmark (MTEB)
  • A 768-dimensional variant achieving an average MTEB score of 66.31
  • Competitiveness with models seven times larger and with embeddings of five times higher dimensionality

As a distilled model, Gecko offers a more resource-efficient solution without compromising performance. The work points to a promising direction for leaner models that match or exceed the capabilities of their larger predecessors. Such efficiency is crucial for building scalable, sustainable AI systems that deliver advanced functionality with reduced computational demands.
