Pre-trained Models¶
Dynamic Field
The field moves too quickly to track every pre-trained model manually. For the most up-to-date information, refer to the Hugging Face Open LLM Leaderboard.
Because of the cost of aggregating sufficient data and performing large-scale training, it is often preferable to start from a pre-trained model. Pre-trained models may be open source or closed source, and choosing between them is an important decision driven by project requirements.
To ensure models meet technical, customer, and organizational requirements, it is important to compare and evaluate them.
API-Based Models¶
API Access
- OpenAI: Access to hosted GPT models through an API
- Hugging Face Transformers: Popular library for downloading and running open transformer models locally (a minimal sketch of both access paths follows this list)
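As a rough illustration, the sketch below queries a hosted model through the OpenAI API and runs an open model locally through Transformers. The model identifiers (`gpt-4o-mini`, `gpt2`) and the prompt are placeholder assumptions, not recommendations, and an `OPENAI_API_KEY` environment variable is assumed to be set.

```python
# Hedged sketch: API-based vs. locally hosted access to a pre-trained model.
from openai import OpenAI
from transformers import pipeline

prompt = "Summarize the benefits of starting from a pre-trained model."

# API-based access (assumes OPENAI_API_KEY is set in the environment).
client = OpenAI()
api_reply = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model id; substitute any model you have access to
    messages=[{"role": "user", "content": prompt}],
)
print(api_reply.choices[0].message.content)

# Local access through Hugging Face Transformers (downloads weights on first use).
generator = pipeline("text-generation", model="gpt2")  # placeholder open checkpoint
print(generator(prompt, max_new_tokens=50)[0]["generated_text"])
```

The trade-off mirrors the open versus closed split discussed in this section: the API path hides infrastructure but meters cost per token, while the local path requires your own hardware yet keeps data and weights in-house.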
Open Source Models¶
Latest Developments¶
Llama 3
Trained on 15T multilingual tokens, with up to 405B parameters (a local loading sketch follows this list):
- Strong data selection and data-synthesis strategy
- Relatively simple post-training recipe: supervised fine-tuning (SFT), rejection sampling, and direct preference optimization (DPO)
- 4D parallelism combining tensor (TP), pipeline (PP), context (CP), and data (DP) parallelism
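The 405B flagship is far beyond a single machine, but the same family ships smaller instruction-tuned checkpoints. Below is a minimal sketch of running one of them through Transformers; `meta-llama/Meta-Llama-3.1-8B-Instruct` is an assumed repo id, the weights are gated behind a license acceptance on Hugging Face, and chat-style pipeline input assumes a reasonably recent Transformers release.

```python
# Hedged sketch: running a smaller instruction-tuned Llama 3 checkpoint locally.
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed, gated repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread layers across whatever GPUs are available
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain rejection sampling in one sentence."},
]
result = chat(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])  # last message is the assistant reply
```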
Multimodal Models¶
Text Models¶
Llama 2
Open-source set of 7B-70B models:
- Paper: Llama 2: Open Foundation and Fine-Tuned Chat Models
- Strong performance across tasks
Mistral
Released September 2023:
- Announcement
- Hugging Face
Additional Text Models
Qwen
Open-source models including Qwen-72B and Qwen-1.8B:
- Trained on 3T tokens of high-quality data
- 32K context window length
- Enhanced system prompt capability
- Qwen-1.8B optimized for efficiency (3GB GPU memory); see the loading sketch after this list
- GitHub Repository
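Because the 1.8B variant is small enough for a modest GPU, a local loading sketch is straightforward. `Qwen/Qwen-1_8B-Chat` is the assumed Hugging Face repo id, and the original Qwen releases ship custom modeling code, hence `trust_remote_code=True`.

```python
# Hedged sketch: loading the small Qwen checkpoint in half precision.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-1_8B-Chat"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps memory in the low-GB range
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Qwen models support a context window of", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```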
Vision Models¶
Vision-Focused Models
Speech Models¶
Moshi
Speech-text foundation model for real-time dialogue
Closed Source Models¶
OpenAI o1
Next-generation model with integrated chain-of-thought reasoning (a minimal API sketch follows this list):
- Improved complex reasoning and transparent explanations
- Scales performance with inference compute
- Introduces AGI-benchmark 1.0 with 27 categories
- Demonstrates inference time scaling laws
- Reproducible Results
- System Card
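Since o1 is closed source, it is reached the same way as other OpenAI models, through the hosted API. The sketch below is an assumed usage example, with `o1-mini` as a placeholder model id; availability and exact identifiers depend on your account.

```python
# Hedged sketch: calling a reasoning model through the OpenAI API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="o1-mini",  # placeholder reasoning-model id
    messages=[{"role": "user", "content": "Prove that the sum of two even integers is even."}],
)
print(response.choices[0].message.content)
```

Because the model spends additional inference compute on its internal chain of thought, responses are typically slower and costlier per request than standard chat models.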
Gemini
Google's multimodal model (a minimal API sketch follows this list):
- Technical Report
- AlphaCode2 Report
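For completeness, here is a minimal sketch of calling Gemini through Google's `google-generativeai` SDK; the model name `gemini-1.5-flash` and the API-key handling are assumptions, so check Google's documentation for current identifiers.

```python
# Hedged sketch: querying Gemini through the google-generativeai SDK.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # assumes the key is set
model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name
response = model.generate_content("Describe, in two sentences, what makes a model multimodal.")
print(response.text)
```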