Training and finetuning

The rapid expansion of AI has produced a surge of services, methods, frameworks, and tools that support the creation and deployment of models from start to finish. Although end-to-end providers can deliver complete GenAI solutions, there is immense value in building and experimenting with your own stack.

Additionally, there are useful libraries and tools worth exploring.

tl;dr: here are the prominent frameworks:

  • LangChain is an early framework with a principled design that supports building a wide range of LLM applications.
  • The Llama ecosystem is a community of Llama-focused modelers built around Meta's Llama family of models (Llama, Llama 2, and beyond).
  • A number of others, described below.

The rapid pace of Generative AI tooling makes it challenging to keep up with the release and deprecation of powerful frameworks and tools. Some of the references below may be incomplete, or still-nascent repositories that have not yet reached their intended purpose as described here. Please let us know if we are missing anything.

Layer 1: Foundation

Starting from base programming languages, increasingly higher-level frameworks enable training and calling AI models. Above these, orchestration libraries and platforms allow chains, agents, and systems to be created and evaluated, sometimes through visual interfaces, and they can be augmented with various tools, packages, and repositories. At the top sit mostly or fully complete frameworks and platforms that deliver nearly end-to-end solutions.

Base languages

Prominent languages include Python, C++/CUDA, and JavaScript.

AI software libraries

PyTorch is a popular Python-focused framework for creating and using AI models.
TensorFlow is a popular multi-language ecosystem for creating and using AI models.
JAX is a library enabling composable transformations of Python+NumPy programs: differentiate, vectorize, JIT-compile to GPU/TPU, and more (see the sketch after this list).
spaCy is a library for advanced Natural Language Processing in Python and Cython.
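
A minimal sketch of JAX's composable transformations, combining grad, vmap, and jit on a toy function (the function and data here are purely illustrative):

```python
import jax
import jax.numpy as jnp

# Toy scalar loss: squared error of a linear model.
def loss(w, x, y):
    return (jnp.dot(w, x) - y) ** 2

grad_loss = jax.grad(loss)                                # differentiate w.r.t. the first argument
batched_grad = jax.vmap(grad_loss, in_axes=(None, 0, 0))  # vectorize over a batch of (x, y)
fast_batched_grad = jax.jit(batched_grad)                 # JIT-compile for CPU/GPU/TPU

w = jnp.ones(3)
xs = jnp.arange(12.0).reshape(4, 3)                       # batch of 4 inputs
ys = jnp.array([1.0, 2.0, 3.0, 4.0])
print(fast_batched_grad(w, xs, ys).shape)                 # (4, 3): one gradient per example
```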

Higher level

PyTorch Lightning enables model training with PyTorch while minimizing boilerplate.
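
A minimal sketch of how Lightning removes the training-loop boilerplate, assuming the pytorch_lightning package; the model and data below are illustrative:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class ToyRegressor(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Random data stands in for a real dataset.
x, y = torch.randn(256, 8), torch.randn(256, 1)
loader = DataLoader(TensorDataset(x, y), batch_size=32)

# The Trainer owns the loop, device placement, checkpointing, and logging.
trainer = pl.Trainer(max_epochs=1, logger=False, enable_checkpointing=False)
trainer.fit(ToyRegressor(), loader)
```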

Model parallelism

??? abstract "PyTorch Lightning Thunder"

DeepSpeed (by Microsoft) empowers ChatGPT-like model training with a single click, offering 15x speedup over SOTA RLHF systems with unprecedented cost reduction at all scales.

Blog on DeepSpeed-Ulysses

DeepSpeed-Ulysses uses a simple, portable, and effective methodology for enabling highly efficient and scalable LLM training with extremely long sequence lengths: "DeepSpeed-Ulysses partitions individual samples along the sequence dimension among participating GPU. Then right before the attention computation, it employs all-to-all communication collective on the partitioned queries, keys and values such that each GPU receives the full sequence but only for a non-overlapping subset of the attention heads. This allows the participating GPUs to compute attention for different attention heads in parallel. Finally, DeepSpeed-Ulysses employs another all-to-all to gather the results along the attention heads while re-partitioning along the sequence dimension." See the Ulysses tutorial here and the blog on DeepSpeed ZeRO++.
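
As a minimal, hedged sketch of plugging a PyTorch model into DeepSpeed (plain ZeRO stage 2 here; Ulysses sequence parallelism is configured separately per the tutorial), assuming the deepspeed package and a toy model and dataset:

```python
import torch
from torch import nn
import deepspeed

# Toy model and data; a real run would use an LLM and a proper dataset.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
data = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(10)]

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},  # shard optimizer state and gradients
}

# DeepSpeed wraps model, optimizer, and distributed setup in a single engine.
# Run with the DeepSpeed launcher, e.g. `deepspeed train.py`.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

for x, y in data:
    x = x.to(model_engine.device)
    y = y.to(model_engine.device)
    loss = nn.functional.mse_loss(model_engine(x), y)
    model_engine.backward(loss)  # handles loss scaling and ZeRO bookkeeping
    model_engine.step()
```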

Levanter (not just LLMs) is a codebase for training foundation models (FMs) with JAX.

It uses Haliax to refer to tensor axes by field names instead of positional indexes (for example Batch, Feature, ...), and supports full sharding and distributed/parallel training.

RL4LMs (by AllenAI) is a modular RL library to fine-tune language models to human preferences.

paper
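
As a rough, conceptual sketch of what RL fine-tuning of a language model looks like (a plain REINFORCE loop with a stand-in reward; RL4LMs itself provides PPO/NLPO, reward models, and benchmarks on top of this idea), assuming the Hugging Face transformers package:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def reward_fn(text: str) -> float:
    # Hypothetical reward: prefer shorter completions (stand-in for a learned reward model).
    return -len(text) / 100.0

prompt = "The best way to learn is"
inputs = tokenizer(prompt, return_tensors="pt")
prompt_len = inputs["input_ids"].shape[1]

for step in range(3):
    # Sample a completion from the current policy.
    with torch.no_grad():
        generated = model.generate(**inputs, do_sample=True, max_new_tokens=20,
                                   pad_token_id=tokenizer.eos_token_id)
    completion = tokenizer.decode(generated[0, prompt_len:])
    reward = reward_fn(completion)

    # REINFORCE: scale the log-likelihood of the sampled tokens by the reward.
    logits = model(generated).logits[:, :-1, :]
    log_probs = torch.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, generated[:, 1:].unsqueeze(-1)).squeeze(-1)
    completion_log_prob = token_log_probs[:, prompt_len - 1:].sum()
    loss = -reward * completion_log_prob

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```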

Ray is a unified framework for scaling Python and AI workloads, from distributed training and hyperparameter tuning to serving.
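
A tiny sketch of Ray's core remote-task API, which underlies its higher-level training and tuning libraries (the values here are illustrative):

```python
import ray

ray.init()  # start a local Ray runtime; connects to a cluster if one is configured

@ray.remote
def square(x):
    # Runs as a task on any available worker.
    return x * x

# Launch tasks in parallel and gather the results.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
ray.shutdown()
```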

Fine Tuning

LLM Finetuning Hub is an evolving model finetuning codebase.
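
As one common fine-tuning recipe, here is a hedged sketch of parameter-efficient fine-tuning with LoRA via the Hugging Face peft and transformers packages (the model choice and hyperparameters are illustrative, not what any particular hub prescribes):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "gpt2"  # illustrative small model; swap in your target LLM
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA injects small trainable low-rank adapters; the base weights stay frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model

# From here, train with your usual loop or the transformers Trainer,
# then save just the adapter weights with model.save_pretrained(...).
```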

References