
Pre-trained models

🚧 It is impossible to keep up manually 🚧

Because it is not possible to manually maintain a complete list of open-source models, we refer the reader to the Hugging Face Open LLM Leaderboard.

Still, below we highlight several important and foundational models.

Pre-trained models

Because of the costs associated with aggregating sufficient data and performing large-scale training, it is often preferable to start with pre-trained models. These models can be either open source or closed source in origin, and choosing between the two is an important decision driven by project requirements.

Whatever pre-trained model you use, it is important to compare and evaluate candidates to ensure they meet technical, customer, and organizational requirements.
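As a concrete starting point, here is a minimal sketch of pulling a candidate checkpoint through the Hugging Face transformers library and generating from a prompt; the model id and generation settings are placeholders you would swap for whichever candidates you are comparing.

```python
# Minimal sketch: load a candidate pre-trained model and generate text so it
# can be compared against other candidates. The model id is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # swap in any candidate checkpoint from the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Starting from a pre-trained model is often preferable because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```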

Below we share important models.

API-based model usage

Open Source

Text-focused

Llama is a set of models and supporting libraries with an expanding community, thanks to the generally open-source nature of the high-quality Llama 2 models.

Llama 2: Open Foundation and Fine-Tuned Chat Models. A nearly open-source set of 7B-70B models with quality performance.
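As an illustration (not part of the Llama 2 release itself), a chat checkpoint can be queried through the transformers text-generation pipeline; access to the meta-llama weights on the Hugging Face Hub requires accepting Meta's license first, and the 7B chat model id and hardware settings below are illustrative.

```python
# Sketch: querying a Llama 2 chat checkpoint via the transformers pipeline.
# Assumes you have accepted Meta's license for the meta-llama repos and are
# logged in with `huggingface-cli login`.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,  # roughly 14 GB of GPU memory in half precision
    device_map="auto",          # requires the `accelerate` package
)

# Llama 2 chat models expect the [INST] ... [/INST] instruction format.
prompt = "[INST] Give two reasons to fine-tune a chat model. [/INST]"
result = generator(prompt, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"])
```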

Sept 2023: Mistral Transformer

Announcement · Hugging Face

Qwen

Open-source: Qwen-72B and Qwen-1.8B, including Base, Chat, and quantized versions.

🌟 Qwen-72B has been trained on high-quality data consisting of 3T tokens, boasting a larger parameter scale and more training data to achieve a comprehensive performance upgrade. Additionally, we have expanded the context window length to 32K and enhanced the system prompt capability, allowing users to customize their own AI assistant with just a single prompt.

🎁 Qwen-1.8B is our additional gift to the research community, striking a balance between maintaining essential functionalities and maximizing efficiency, generating 2K-length text content with just 3GB of GPU memory.

🤗 https://huggingface.co/Qwen 🤖 https://github.com/QwenLM/Qwen
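A small sketch of running the 1.8B chat variant locally follows; the first-generation Qwen checkpoints ship custom modeling code, so trust_remote_code=True is required, and the model id and chat call follow the usage documented in the Qwen repository.

```python
# Sketch: loading Qwen-1.8B-Chat from the Hub. The first-generation Qwen
# checkpoints rely on custom code in the repo, hence trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-1_8B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
).eval()

# `chat` is a convenience method defined by Qwen's custom modeling code.
response, history = model.chat(
    tokenizer, "What context length does Qwen-72B support?", history=None
)
print(response)
```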

Vision-focused

InternLM2

Abstract (https://arxiv.org/pdf/2403.17297.pdf): The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k "Needle-in-a-Haystack" test. InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy that addresses conflicting human preferences and reward hacking. By releasing InternLM2 models in different training stages and model sizes, we provide the community with insights into the model's evolution.
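For experimentation, the released checkpoints can be pulled from the Hugging Face Hub; the sketch below assumes the internlm/internlm2-7b base model id (an assumption, not stated in the abstract) and uses the generic transformers generation path.

```python
# Sketch: loading an InternLM2 base checkpoint and generating a completion.
# The model id is assumed; the InternLM2 repos use custom modeling code, so
# trust_remote_code=True is needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm2-7b"  # assumed Hub id for the 7B base model

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

inputs = tokenizer("Long-context modeling matters because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```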

Multimodal

Closed Source

Gemini

Tech Report
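Since Gemini is closed source, it is consumed through Google's API rather than local weights. A minimal sketch using the google-generativeai Python client is below; the model name and client details reflect the package at the time of writing and may change.

```python
# Sketch: calling the Gemini API with the google-generativeai client.
# Requires an API key from Google AI Studio; `pip install google-generativeai`.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-pro")  # text-only Gemini model name
response = model.generate_content(
    "Compare open-source and closed-source LLMs in two sentences."
)
print(response.text)
```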