
Pre-trained Models

Dynamic Field

It is impractical to track every pre-trained model by hand. For the most up-to-date information, refer to the Hugging Face Open LLM Leaderboard.

Because collecting sufficient data and training at scale are both costly, it is often preferable to start from a pre-trained model. Pre-trained models may be open source or closed source, and choosing between the two is an important decision that depends on project requirements.

To ensure models meet technical, customer, and organizational requirements, it is important to compare and evaluate them.
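One way to make this comparison concrete is to screen candidates against hard requirements first and rank the survivors afterwards. The sketch below is illustrative only: the `Candidate` fields, the thresholds, and the three model entries are hypothetical placeholders, not measurements of any real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    open_weights: bool    # can the weights be self-hosted?
    context_tokens: int   # maximum context window
    eval_score: float     # score on your own evaluation set, in [0, 1]


def shortlist(candidates, *, need_open_weights, min_context, min_score):
    """Drop candidates that miss a hard requirement, then rank by eval score."""
    passing = [
        c for c in candidates
        if (c.open_weights or not need_open_weights)
        and c.context_tokens >= min_context
        and c.eval_score >= min_score
    ]
    return sorted(passing, key=lambda c: c.eval_score, reverse=True)


# Placeholder entries; replace with figures from your own evaluation.
CANDIDATES = [
    Candidate("model-a", open_weights=True, context_tokens=32_000, eval_score=0.71),
    Candidate("model-b", open_weights=False, context_tokens=128_000, eval_score=0.83),
    Candidate("model-c", open_weights=True, context_tokens=8_000, eval_score=0.78),
]

ranked = shortlist(CANDIDATES, need_open_weights=True, min_context=16_000, min_score=0.6)
print([c.name for c in ranked])
```

Filtering before ranking keeps a high benchmark score from masking a requirement the project cannot waive, such as self-hosting or a minimum context length.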

API-Based Models

API Access

Open Source Models

Latest Developments

Llama 3

Trained on 15T multilingual tokens, with 405B trainable parameters:

- Powerful data selection and synthesis strategy
- Simple post-training with supervised fine-tuning (SFT), rejection sampling, and direct preference optimization (DPO)
- 4D parallelism combining tensor (TP), pipeline (PP), context (CP), and data (DP) parallelism
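To make the DPO step more concrete, here is a minimal sketch of the standard DPO objective for a single preference pair. It assumes the inputs are summed token log-probabilities of the chosen and rejected responses under the trained policy and under a frozen reference model. The function name and the `beta` default are illustrative, and the actual Llama 3 recipe may differ in its details.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair of summed log-probabilities."""
    # How much more the policy favours each response than the reference does.
    chosen_margin = policy_chosen - ref_chosen
    rejected_margin = policy_rejected - ref_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written as a numerically stable softplus(-logits).
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# Policy identical to the reference: the loss equals log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy prefers the chosen response more than the reference does: lower loss.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model, so no separate reward model is needed.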

[Figure: parallelism approach]
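As a rough illustration of how four parallelism dimensions combine, the sketch below maps a global GPU rank to one index per dimension. The degrees are hypothetical, and the ordering (TP innermost, DP outermost) is an assumption based on the common practice of placing the most communication-heavy dimension on the fastest links. It is not taken from the figure.

```python
from math import prod

# Hypothetical degrees, listed innermost (varies fastest with rank) to outermost.
DEGREES = {"tp": 8, "cp": 2, "pp": 4, "dp": 4}


def world_size(degrees):
    """Total number of GPUs: the product of all parallel degrees."""
    return prod(degrees.values())


def rank_to_coords(rank, degrees):
    """Decompose a global rank into one index per parallelism dimension."""
    if not 0 <= rank < world_size(degrees):
        raise ValueError(f"rank {rank} is outside the world size")
    coords = {}
    for name, size in degrees.items():
        # The remainder is this dimension's index; the quotient carries on outward.
        rank, coords[name] = divmod(rank, size)
    return coords


print(world_size(DEGREES))         # 256
print(rank_to_coords(9, DEGREES))  # {'tp': 1, 'cp': 1, 'pp': 0, 'dp': 0}
```

Ranks that share every coordinate except `tp` form one tensor-parallel group, and the same reasoning gives the groups for the other three dimensions.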

[Figure: multimodal training]

Multimodal Models

MOLMO

High-quality image captions collected through voice recordings:

- Blog
- Paper

Text Models

Llama 2

Open-source set of models ranging from 7B to 70B parameters:

- Paper: Llama 2: Open Foundation and Fine-Tuned Chat Models
- Strong performance across tasks

Mistral

Released September 2023:

- Announcement
- Hugging Face

Additional Text Models

Qwen

Open-source models including Qwen-72B and Qwen-1.8B:

- Trained on 3T tokens of high-quality data
- 32K context window
- Enhanced system prompt capability
- Qwen-1.8B optimized for efficiency (3GB GPU memory)
- GitHub Repository
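For intuition about figures like the GPU memory number above, a back-of-the-envelope estimate of weight memory is simply parameter count times bytes per parameter. The sketch below covers weights only and ignores activations, KV cache, and framework overhead. The source does not say which precision or workload the 3GB figure assumes, so the values here are not meant to reproduce it.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, dtype):
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"1.8B params in {dtype}: {weight_memory_gb(1.8e9, dtype):.1f} GB")
```

The same function applied to 72e9 parameters shows why the larger model needs multiple GPUs or aggressive quantization, while a model under 2B parameters fits on a single consumer card.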

Vision Models

Vision-Focused Models

Speech Models

Moshi

Speech-text foundation model for real-time dialogue

Closed Source Models

OpenAI o1

Next-generation model with integrated chain of thought:

- Improved complex reasoning and transparent explanations
- Scales performance with inference compute
- Introduces AGI-benchmark 1.0 with 27 categories
- Demonstrates inference-time scaling laws
- Reproducible Results
- System Card

Gemini

Google's multimodal model:

- Technical Report
- AlphaCode2 Report

Additional Closed Source Models