How to Train Data-Efficient LLMs
Developments: The authors compare a number of sampling methods and demonstrate that using an LLM to choose high-quality pre-training data with a simple prompt can yield models that outperform full-data training while converging up to 70% faster, even when rejecting 90% of the data. They call this sampling technique Ask-LLM.
Method: The authors evaluate a number of sampling methods, including heuristic-based approaches such as compute-efficient density and perplexity estimation. The gains were primarily found with Ask-LLM sampling, which asks an instruction-tuned LLM to judge the quality of each training example using a prompt of the following form:
###
This is a pretraining .... datapoint.
###
Does the previous paragraph demarcated within ### and ### contain informative signal for pre-training a large-language model?
An informative datapoint should be well-formatted, contain some usable knowledge of the world, and strictly NOT have any harmful, racist, sexist, etc. content.
OPTIONS:
- yes
- no
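
To make the scoring step concrete, here is a minimal sketch of Ask-LLM-style quality scoring, assuming a FLAN-T5 proxy model served via Hugging Face Transformers (the paper uses instruction-tuned T5-family scorers; the specific model name, the ask_llm_score helper, and the two-token normalization below are illustrative choices, not the authors' exact implementation). The score is the model's probability of answering "yes", which can then be thresholded or used as a sampling weight.

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

PROMPT_TEMPLATE = """###
{datapoint}
###

Does the previous paragraph demarcated within ### and ### contain informative signal for pre-training a large-language model?
An informative datapoint should be well-formatted, contain some usable knowledge of the world, and strictly NOT have any harmful, racist, sexist, etc. content.

OPTIONS:
- yes
- no"""

# Model choice is an assumption; any instruction-tuned seq2seq scorer works.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
model.eval()

@torch.no_grad()
def ask_llm_score(datapoint: str) -> float:
    """Return P("yes") for the quality prompt, used as the sampling score."""
    inputs = tokenizer(PROMPT_TEMPLATE.format(datapoint=datapoint),
                       return_tensors="pt", truncation=True, max_length=512)
    # Run a single decoder step and read the logits for the first output token.
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, -1]
    yes_id = tokenizer("yes", add_special_tokens=False).input_ids[0]
    no_id = tokenizer("no", add_special_tokens=False).input_ids[0]
    # Normalizing over just the two answer tokens is one design choice;
    # a plain softmax probability of "yes" over the full vocabulary also works.
    probs = torch.softmax(logits[[yes_id, no_id]], dim=-1)
    return probs[0].item()

# Usage: keep a datapoint if its score clears a chosen quality threshold.
print(ask_llm_score("The mitochondrion is the powerhouse of the cell."))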