Recursive

Recursive training involves using an LLM to improve the selection or variety of data for that same LLM. This is described in more detail under data augmentation.

The process of data simulation for AI typically involves two main steps:

  1. Training a Broad and Generalized Model: The first step involves training a broad and generalized model. This model is trained on a wide-ranging dataset and is capable of generating highly specific synthetic data.

  2. Training a Narrow and Task-Specific Model: The second step involves training a narrower, task-specific model on the synthetic data generated by the broad model. This task-specific model is tailored to the task at hand and can perform it with high accuracy.

graph LR
  A[Train Broad and Generalized Model] --> B[Generate Highly Specific Data]
  B --> C[Train Narrow and Task-Specific Model on Specific Data]
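
The two steps above can be sketched in a few lines. This is only a toy illustration under loud assumptions: `broad_model_generate` is a hypothetical template-based stand-in for a large general model (in practice it would be a call to an LLM), and the "narrow model" is a small naive Bayes classifier rather than a fine-tuned network. The sentiment task and all names are invented for the example.

```python
import math
import random
from collections import Counter

# Hypothetical stand-in for the broad model: in practice this would be an LLM
# prompted to write task-specific labeled examples.
def broad_model_generate(n, seed=0):
    rng = random.Random(seed)
    positive = ["great", "excellent", "wonderful"]
    negative = ["terrible", "awful", "poor"]
    nouns = ["service", "product", "support"]
    data = []
    for _ in range(n):
        label = rng.choice(["pos", "neg"])
        word = rng.choice(positive if label == "pos" else negative)
        data.append((f"the {rng.choice(nouns)} was {word}", label))
    return data

# Narrow, task-specific model: naive Bayes with add-one smoothing,
# trained only on the synthetic examples.
def train_narrow_model(examples):
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        words = text.split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)

    def predict(text):
        best_label, best_score = None, -math.inf
        for label, n_label in label_counts.items():
            total = sum(word_counts[label].values())
            score = math.log(n_label / len(examples))
            for w in text.split():
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict

synthetic = broad_model_generate(200)         # step 1: generate specific data
narrow_model = train_narrow_model(synthetic)  # step 2: train the narrow model
print(narrow_model("the product was excellent"))
```

The point of the sketch is the data flow: the narrow model never sees non-synthetic data, so its quality is bounded by what the broad model generates.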

Research and Understanding

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement

Results: The authors show that training from feedback-augmented synthesized data, either by pruning incorrect predictions or by selecting the best of several guesses, can prevent model collapse.
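
A minimal sketch of the two feedback strategies named above, pruning incorrect predictions and selecting the best of several guesses. It is not the authors' implementation: the arithmetic task, `noisy_generator`, and the exact-match verifier are hypothetical stand-ins chosen so the example is checkable without a model.

```python
import random

# Hypothetical stand-in for a model: guesses a + b, sometimes off by one.
def noisy_generator(a, b, rng, n_guesses=5):
    return [a + b + rng.choice([-1, 0, 0, 1]) for _ in range(n_guesses)]

def prune_incorrect(samples, is_correct):
    """Keep only the synthesized samples that the verifier accepts."""
    return [s for s in samples if is_correct(s)]

def select_best(candidates, score):
    """Keep the single highest-scoring candidate among several guesses."""
    return max(candidates, key=score)

rng = random.Random(0)
problems = [(rng.randint(0, 9), rng.randint(0, 9)) for _ in range(20)]

pruned, best_of_n = [], []
for a, b in problems:
    guesses = noisy_generator(a, b, rng)
    # Strategy 1: drop every guess the verifier marks as wrong.
    pruned.extend(
        prune_incorrect([(a, b, g) for g in guesses],
                        is_correct=lambda s: s[0] + s[1] == s[2])
    )
    # Strategy 2: keep only the guess closest to the reference answer.
    best = select_best(guesses, score=lambda g: -abs(g - (a + b)))
    best_of_n.append((a, b, best))

print(len(pruned), "pruned samples;", len(best_of_n), "best-of-n samples")
```

Both strategies rely on some external feedback signal (here, the known sum). Without one, neither filter can separate good synthetic samples from bad ones.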

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

Developments: The authors present an iterative training and generation approach that enables effective fine-tuning in low-data regimes.

Alpaca

Shepherd: A Critic for Language Model Generation

A 7B model trained to critique outputs.

Figure: example chat response.

Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data

Parameter-efficient LLaMA tuning and risk minimization, with a new 'Self-Distillation with Feedback' step that lets the model improve itself further. For research use only.

Self-Alignment with Instruction Backtranslation

The seed model is used to construct training examples by generating instruction prompts for web documents (self-augmentation) and then selecting high-quality examples from among these candidates (self-curation). This data is then used to fine-tune a stronger model.
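
The self-augmentation and self-curation steps described above can be sketched as two small functions. This is an illustrative outline, not the paper's code: `generate_instruction` and `score_pair` are hypothetical callables standing in for calls to the seed model, the toy versions below are simple string heuristics, and the `min_score` threshold is an assumption.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (generated instruction, web document used as the response)

def self_augment(documents: List[str],
                 generate_instruction: Callable[[str], str]) -> List[Pair]:
    """Self-augmentation: predict an instruction for each unlabeled document."""
    return [(generate_instruction(doc), doc) for doc in documents]

def self_curate(candidates: List[Pair],
                score_pair: Callable[[str, str], float],
                min_score: float) -> List[Pair]:
    """Self-curation: keep only the candidate pairs rated at or above min_score."""
    return [(inst, doc) for inst, doc in candidates
            if score_pair(inst, doc) >= min_score]

# Toy stand-ins for the seed model (assumptions for illustration only).
def toy_generate_instruction(doc: str) -> str:
    return "Write a short passage about " + " ".join(doc.split()[:3])

def toy_score_pair(instruction: str, doc: str) -> float:
    return float(len(doc.split()))  # crude proxy: longer documents score higher

documents = [
    "Gradient descent updates parameters by following the negative gradient of the loss.",
    "Click here",
]
candidates = self_augment(documents, toy_generate_instruction)
curated = self_curate(candidates, toy_score_pair, min_score=5.0)
print(curated)
```

In a real pipeline the curated pairs would then be used as fine-tuning data, and the resulting model could replace the seed model for another round.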

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Llama-2-based reinforcement enables substantial improvement over other models.

FABRIC is a technique for incorporating iterative feedback into the generative process of diffusion models based on Stable Diffusion.

Error modes

It has been found that when models are trained on output generated by those same models, they can collapse. This collapse occurs because the generated patterns may not fully represent non-synthetic data, so each round of training produces progressively worse patterns. Given enough rounds, the results can become so ungrounded that they are gibberish. There are ways to help prevent this, including tightly controlling the format and content of the inputs (and outputs), but there is no guarantee that the synthetic data will be as syntactically, semantically, or epistemologically valid as non-synthetic data.
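
The degradation described above can be illustrated with a deliberately simple analogy, assuming a one-dimensional Gaussian in place of a language model: each generation fits a Gaussian to a small sample drawn from the previous generation's fit. The function name, sample size, and generation count are arbitrary choices for this sketch, and nothing here models an actual LLM.

```python
import random
import statistics

def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Repeatedly refit a Gaussian to samples drawn from the previous fit."""
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: the "real" data distribution
    stds = [std]
    for _ in range(generations):
        # Generate synthetic data from the current model...
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # ...then train the next model on that synthetic data only.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        stds.append(std)
    return stds

stds = simulate_collapse()
print(f"initial std: {stds[0]:.3f}, final std: {stds[-1]:.3g}")
```

Because every refit is estimated from a finite sample, estimation error compounds and the fitted spread tends to shrink across generations, so the later "models" cover less and less of the original distribution.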