Recursive Training
Recursive training involves using an LLM to improve the selection or variety of training data for that same LLM. This is described in more detail under data augmentation.
The process of data simulation for AI typically involves two main steps:

- Training a broad and generalized model: a broad model is trained on a wide-ranging dataset and is capable of generating highly specific synthetic data.
- Training a narrow and task-specific model: a narrower model is then trained on the synthetic data generated by the broad model. This task-specific model is tailored to the task at hand and can perform it with high accuracy.
```mermaid
graph LR
    A[Train Broad and Generalized Model] --> B[Generate Highly Specific Data]
    B --> C[Train Narrow and Task-Specific Model on Specific Data]
```
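The two-step pipeline above can be sketched in Python. All names here (`BroadModel`, `NarrowModel`, `finetune_step`, `generate`) are toy stand-ins invented for illustration, not a real library API:

```python
# Minimal sketch of the two-step synthetic-data pipeline.
# BroadModel/NarrowModel are toy stand-ins for real LLMs,
# purely to illustrate the data flow between the two steps.

class BroadModel:
    """Toy 'broad and generalized' model: returns a canned answer."""
    def generate(self, prompt: str) -> str:
        return f"synthetic answer for: {prompt}"

class NarrowModel:
    """Toy 'narrow and task-specific' model: memorizes pairs."""
    def __init__(self):
        self.memory = {}
    def finetune_step(self, x: str, y: str) -> None:
        self.memory[x] = y
    def predict(self, x: str) -> str:
        return self.memory.get(x, "")

def generate_synthetic_dataset(broad, prompts, n_per_prompt=2):
    # Step 1: the broad model produces highly specific synthetic data.
    return [{"input": p, "output": broad.generate(p)}
            for p in prompts for _ in range(n_per_prompt)]

def train_task_specific_model(narrow, dataset):
    # Step 2: fine-tune the narrow model on the synthetic data.
    for ex in dataset:
        narrow.finetune_step(ex["input"], ex["output"])
    return narrow

data = generate_synthetic_dataset(BroadModel(), ["2+2?", "capital of France?"])
model = train_task_specific_model(NarrowModel(), data)
```

In a real pipeline the generation step would call a large pretrained model and the training step would run gradient updates; only the data flow is faithful here.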
Research and Understanding
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
Results: The authors show that training from feedback-augmented synthesized data, either by pruning incorrect predictions or by selecting the best of several guesses, can prevent model collapse.
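The two feedback strategies the authors describe (pruning incorrect predictions, and best-of-n selection) can be sketched as filters over synthesized data. The exact-match `verifier` and `score` functions below are toy assumptions; in practice feedback could come from unit tests, a reward model, or ground-truth labels:

```python
# Sketch of two feedback strategies over synthesized data:
# (a) prune examples whose prediction the verifier rejects,
# (b) keep only the best of several guesses per input.

def prune_incorrect(candidates, verifier):
    """(a) Drop synthesized (input, prediction) pairs that fail verification."""
    return [(x, y) for x, y in candidates if verifier(x, y)]

def best_of_n(guesses_per_input, score):
    """(b) For each input, keep the highest-scoring of several guesses."""
    return {x: max(guesses, key=lambda g: score(x, g))
            for x, guesses in guesses_per_input.items()}

# Toy usage with arithmetic "tasks"; eval() stands in for a real oracle.
verifier = lambda x, y: y == str(eval(x))
kept = prune_incorrect([("1+1", "2"), ("2+2", "5"), ("3+3", "6")], verifier)

score = lambda x, g: 1.0 if g == str(eval(x)) else 0.0
best = best_of_n({"2+2": ["5", "4", "22"]}, score)
```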
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Developments: The authors present an iterative training-and-generation approach that enables effective fine-tuning in low-data regimes.
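The iterative loop can be sketched as: fine-tune a student on the current data, find the examples it still gets wrong, and have a teacher model generate new variants of exactly those hard examples. `ToyStudent` and `ToyTeacher` are invented stand-ins, not the paper's code:

```python
# Sketch of an LLM2LLM-style iterative data-enhancement loop.
# The teacher augments only the examples the student failed on.

def enhancement_round(student, teacher, train_set, eval_fn):
    student.finetune(train_set)
    wrong = [ex for ex in train_set if not eval_fn(student, ex)]
    new_examples = [teacher.make_variant(ex) for ex in wrong]
    return train_set + new_examples

def iterate(student, teacher, seed_set, eval_fn, rounds=3):
    data = list(seed_set)
    for _ in range(rounds):
        data = enhancement_round(student, teacher, data, eval_fn)
    return data

# Toy stand-ins: the student deterministically fails on "hard" examples,
# and the teacher's "variant" just appends a marker.
class ToyStudent:
    def finetune(self, data):
        pass

class ToyTeacher:
    def make_variant(self, ex):
        return ex + "*"

eval_fn = lambda student, ex: "hard" not in ex
out = iterate(ToyStudent(), ToyTeacher(), ["easy1", "hard1"], eval_fn, rounds=2)
```

The key design point is targeted augmentation: data grows only around the student's failure modes rather than uniformly.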
Shepherd: A Critic for Language Model Generation
A 7B model trained to critique the outputs of language models.
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data
Parameter-efficient LLaMA tuning with risk minimization, plus a new 'Self-Distillation with Feedback' step to improve the model further. Research use only.
Self-Alignment with Instruction Backtranslation
The seed model is used to construct training examples by generating instruction prompts for web documents (self-augmentation), and then selecting high-quality examples from among these candidates (self-curation). This data is then used to finetune a stronger model.
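The two phases can be sketched as follows. `ToySeedModel` and its heuristics are hypothetical placeholders standing in for LLM calls, not the paper's implementation:

```python
# Sketch of instruction backtranslation: self-augmentation generates a
# candidate instruction for each unlabelled web document; self-curation
# keeps only the pairs the seed model itself rates highly.

def self_augment(seed_model, documents):
    """Generate a candidate instruction for each unlabelled document."""
    return [(seed_model.backtranslate(doc), doc) for doc in documents]

def self_curate(seed_model, candidates, threshold=4):
    """Keep only the pairs the seed model rates at or above threshold."""
    return [(inst, doc) for inst, doc in candidates
            if seed_model.score(inst, doc) >= threshold]

class ToySeedModel:
    # Toy heuristics standing in for two LLM calls.
    def backtranslate(self, doc):
        return f"Write a passage about {doc.split()[0].lower()}."
    def score(self, inst, doc):
        # Pretend longer documents make higher-quality training pairs.
        return 5 if len(doc.split()) >= 3 else 1

docs = ["Python is a language", "Hi"]
pairs = self_curate(ToySeedModel(), self_augment(ToySeedModel(), docs))
```

The curated `pairs` would then serve as (instruction, response) training examples for finetuning the stronger model.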
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Llama-2-based reinforcement learning enables substantial improvements over other models. Paper
FABRIC is a technique that incorporates iterative feedback into the generative process of diffusion models based on Stable Diffusion.
Error modes
It has been found that when models are trained on their own generated output, they can collapse. This happens because the generated patterns do not fully reflect the distribution of non-synthetic data, so each generation of outputs drifts progressively further from it. Given enough iterations, the results can become sufficiently ungrounded that they degenerate into gibberish. While there are ways to reduce the risk, such as tightly controlling the formats and content of the data used as inputs (and produced as outputs), there is no guarantee that the synthetic data will remain syntactically, semantically, or epistemologically valid.
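A statistical caricature of this collapse (not an LLM experiment) is repeatedly fitting a Gaussian to samples drawn from the previous generation's fit: the estimated spread drifts downward each generation until the "data" has lost its diversity.

```python
# Toy demonstration of recursive-training collapse: each generation is
# fit to samples drawn from the previous generation's fitted Gaussian.
# With a finite sample, the fitted spread tends to shrink over many
# generations, a simple analogue of models losing distributional tails.
import random
import statistics

random.seed(0)

def next_generation(mu, sigma, n=20):
    samples = [random.gauss(mu, sigma) for _ in range(n)]
    return statistics.fmean(samples), statistics.stdev(samples)

mu, sigma = 0.0, 1.0
initial_sigma = sigma
for _ in range(500):
    mu, sigma = next_generation(mu, sigma)
```

After many generations the fitted `sigma` is far below the original spread, mirroring how recursively trained models progressively underrepresent the variety of the true data.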