Developing architectures
Here we share novel and promising architectures that may supplement or supplant presently established models.
Models
Representation Engineering: A Top-Down Approach to AI Transparency
Developments: The authors introduce a method for extracting conceptual representations from models by prompting them with concept-related stimuli, examining the layer-wise activations those prompts elicit, and training a linear model to identify the direction most responsible for activating that concept. The resulting reading vector, the principal component associated with the concept, can then be added to the model's activations to enhance that quality in the output. This opens the potential for directly steering alignment, controlling hallucination, and making other targeted revisions to model output.
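As an illustration, here is a minimal sketch of the reading-vector idea. It assumes a hypothetical `get_hidden_states(prompt)` helper that returns a model's hidden state at some chosen layer; the actual extraction and steering hooks depend on the model and framework used.

```python
# Minimal sketch of a "reading vector": PCA over activation differences between
# concept-positive and concept-negative prompts, then additive steering.
# `get_hidden_states` is a hypothetical helper returning a [d_model] vector.
import numpy as np

def reading_vector(pos_prompts, neg_prompts, get_hidden_states):
    """First principal component of activation differences between
    concept-positive and concept-negative prompts."""
    diffs = np.stack([
        get_hidden_states(p) - get_hidden_states(n)
        for p, n in zip(pos_prompts, neg_prompts)
    ])
    diffs -= diffs.mean(axis=0, keepdims=True)
    # PCA via SVD: the top right-singular vector is the principal direction.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]

def steer(hidden_state, direction, alpha=1.0):
    """Add the concept direction to a hidden state to amplify (alpha > 0)
    or suppress (alpha < 0) that concept in the model's output."""
    return hidden_state + alpha * direction
```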
Bayesian Flow Networks A new class of generative models for discrete and continuous data.
Retentive Network: A Successor to Transformer for Large Language Models An LLM-like architecture built from similar components that may scale better than the Transformer's O(N^2) attention memory and O(N) per-token inference complexity.
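For intuition, here is a minimal sketch of the recurrent form of retention for a single head, omitting the paper's rotation and normalization details and assuming pre-computed query/key/value projections. The decay value is illustrative. It shows why per-token inference cost stays constant: the entire history is summarized in a fixed-size state.

```python
# Minimal sketch of recurrent retention (single head). Q, K: [seq_len, d];
# V: [seq_len, d_v]. The whole history lives in the fixed-size state S,
# so each decoding step costs O(1) with respect to sequence length.
import numpy as np

def recurrent_retention(Q, K, V, gamma=0.9):
    seq_len, d = Q.shape
    S = np.zeros((d, V.shape[1]))            # fixed-size recurrent state
    outputs = np.zeros((seq_len, V.shape[1]))
    for n in range(seq_len):
        # Decay the past and accumulate the current key/value outer product.
        S = gamma * S + np.outer(K[n], V[n])
        outputs[n] = Q[n] @ S                 # read out with the query
    return outputs
```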
Memoria stores and retrieves units of information, called engrams, across multiple memory levels (working memory, short-term memory, and long-term memory), using connection weights that change according to Hebb's rule.
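For reference, a minimal sketch of a Hebbian connection-weight update of the kind described; the learning rate and decay term are illustrative values, not taken from the paper.

```python
# Minimal Hebbian update: strengthen connections between co-active units
# ("cells that fire together wire together"), with a small decay term.
import numpy as np

def hebbian_update(W, pre, post, lr=0.1, decay=0.01):
    """W: [n_post, n_pre] connection weights; pre/post: activity vectors."""
    return (1.0 - decay) * W + lr * np.outer(post, pre)
```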
Structured State Space Sequence Models (SSSSMs)
Structured state space sequence models are a class of models that generally combine RNN-style recurrence and convolutions, drawing inspiration from classical state-space methods.
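For intuition, a minimal sketch of the discretized linear state-space recurrence these models build on, x_k = A x_{k-1} + B u_k, y_k = C x_k, written here as a plain RNN loop; with fixed parameters the same map can also be computed as a long convolution.

```python
# Minimal linear state-space recurrence. A: [state_dim, state_dim];
# B, C: [state_dim]; u: scalar input sequence of length seq_len.
import numpy as np

def ssm_recurrence(A, B, C, u):
    """Returns the output sequence y of shape [seq_len]."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B * u_k      # state update
        ys.append(C @ x)         # readout
    return np.array(ys)
```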
Well-known methods include:
MambaByte
Operating on bytes directly, instead of relying on an encoding representation and subword tokenization, offers models greater flexibility and versatility across modalities. The much longer contexts this requires are made tractable by SSSSMs.
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Their method offers a potentially highly parallelizable architecture that operates on very long contexts.
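A heavily simplified sketch of the "selective" idea follows, using a scalar input channel and a diagonal state matrix; the projections `w_delta`, `W_B`, and `W_C` are illustrative stand-ins, and the real model uses learned projections, a different discretization of B, and a hardware-aware parallel scan.

```python
# Simplified selective state-space scan: the step size and the input/readout
# vectors depend on the current input, letting the model choose per token
# what to write into or read out of its state.
import numpy as np

def selective_scan(u, A, w_delta, W_B, W_C):
    """u: scalar inputs [seq_len]; A: diagonal state matrix as a [state_dim] vector;
    w_delta: scalar; W_B, W_C: [state_dim]. Returns outputs [seq_len]."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        delta = np.log1p(np.exp(w_delta * u_k))   # input-dependent step size (softplus)
        B_k = W_B * u_k                           # input-dependent input vector
        C_k = W_C * u_k                           # input-dependent readout vector
        A_bar = np.exp(delta * A)                 # discretized diagonal state matrix
        x = A_bar * x + delta * B_k * u_k         # select what enters the state
        ys.append(C_k @ x)
    return np.array(ys)
```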
Others
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution Uses FFT-based long convolutions to create a drop-in replacement for attention in Transformer models.
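For intuition, a minimal sketch of the FFT-based long convolution that makes such operators subquadratic: a length-N causal convolution computed in O(N log N) rather than O(N^2).

```python
# Causal convolution of an input sequence with an equally long filter via FFT.
# Zero-padding to 2N makes the circular convolution equal the linear one.
import numpy as np

def long_conv_fft(u, h):
    """u, h: length-N arrays; returns the causal convolution, length N."""
    n = len(u)
    fft_size = 2 * n
    y = np.fft.irfft(np.fft.rfft(u, fft_size) * np.fft.rfft(h, fft_size), fft_size)
    return y[:n]
```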
- Linear Attention
- H3
- RWKV Paper