Generating
Generating new data from an input involves selecting the next best token, or set of tokens, given an input query and the model's output logit vector.
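As a minimal sketch of this selection step, the code below converts a hypothetical logit vector into a probability distribution with a softmax and greedily picks the highest-probability token. The vocabulary size and logit values are invented for illustration.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn raw logits into a probability distribution over the vocabulary."""
    shifted = logits - np.max(logits)  # shift by the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical logit vector over a toy 5-token vocabulary.
logits = np.array([2.0, 1.0, 0.5, -1.0, 3.2])
probs = softmax(logits)

# Greedy decoding: select the single most probable next token.
next_token = int(np.argmax(probs))
print(next_token, probs[next_token])
```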
Output results can be improved by better pre-conditioning the prompts, by improving token generation, and by enabling iterative cycles, as in test-time inference, which iteratively produce chain-of-thought-like outputs that can yield vastly improved results at the cost of additional computation time and slower responses.
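One way such an iterative cycle might look is sketched below: a generate-and-refine loop that feeds each round's output back into the next prompt. This is a simplified sketch, and `llm_generate` is a hypothetical stand-in for a real model call, not part of any particular API.

```python
def llm_generate(prompt: str) -> str:
    # Hypothetical stand-in: replace with a call to an actual LLM endpoint.
    return f"(model output for: {prompt[:40]}...)"

def iterative_refine(query: str, n_rounds: int = 3) -> str:
    """Run several generate-and-refine cycles, carrying the previous
    chain-of-thought forward so each round can improve on the last."""
    context = query
    answer = ""
    for _ in range(n_rounds):
        answer = llm_generate(f"{context}\n\nThink step by step, then answer:")
        # Feed the previous attempt back in so the next round can refine it.
        context = f"{query}\n\nPrevious attempt:\n{answer}\n\nImprove on this answer."
    return answer

print(iterative_refine("What is 17 * 24?"))
```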
To improve the input prompt, relevant information is retrieved and used to augment the original query by placing it in the context. This process, known as retrieval-augmented generation (RAG), can also draw on explicit knowledge representations such as knowledge graphs to complement the implicit knowledge embodied in LLMs.
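The sketch below illustrates the basic RAG pattern under simplifying assumptions: a toy bag-of-words similarity stands in for a real dense-embedding retriever, and the corpus, query, and prompt template are invented for the example.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a dense encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank corpus passages by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

# Invented mini-corpus for illustration.
corpus = [
    "Retrieval augmented generation places retrieved passages in the context.",
    "Knowledge graphs store facts as explicit subject-relation-object triples.",
    "Greedy decoding always selects the highest-probability token.",
]
query = "How does RAG use retrieved passages?"

# Place the retrieved passages in the context ahead of the original query.
context = "\n".join(retrieve(query, corpus))
augmented_prompt = f"Context:\n{context}\n\nQuestion: {query}"
print(augmented_prompt)
```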
Once the prompt is submitted to the LLM, token generation can be improved by refining how output tokens are selected from the predicted logits, improving both accuracy and latency.
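As one example of such a selection strategy, the sketch below combines temperature scaling with top-k sampling; the temperature, k, and logit values are arbitrary illustrative choices rather than recommended settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_token(logits: np.ndarray, temperature: float = 0.8, top_k: int = 3) -> int:
    """Top-k sampling with temperature: reshape the distribution, keep only
    the k most likely tokens, then sample among them."""
    scaled = logits / temperature        # <1 sharpens, >1 flattens the distribution
    top = np.argsort(scaled)[-top_k:]    # indices of the k highest logits
    probs = np.exp(scaled[top] - scaled[top].max())
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

# Same hypothetical logit vector as above.
logits = np.array([2.0, 1.0, 0.5, -1.0, 3.2])
print(sample_token(logits))
```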