Developing architectures
Here we share novel and promising architectures that may supplement or supplant presently established models.
Models
Representation Engineering: A Top-Down Approach to AI Transparency
Developments The authors extract conceptual representations from a model by prompting it with stimuli for a concept and examining the layer-wise activations those prompts produce; a linear model is then trained to identify the principal direction associated with activating that concept. The resulting reading vector, the principal component associated with the concept, can be added to the model's activations to strengthen or suppress that quality in the output. This opens the door to directly steering alignment, controlling hallucination, and making other targeted revisions of output.
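A minimal sketch of how a reading vector could be applied at inference time, assuming a PyTorch decoder-only model whose layers are accessible as modules; the layer index, scale alpha, and reading_vector are illustrative placeholders rather than the paper's released implementation:

```python
import torch

def make_steering_hook(reading_vector: torch.Tensor, alpha: float = 4.0):
    """Return a forward hook that adds a scaled concept direction
    to a layer's hidden states (illustrative, not the authors' code)."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * reading_vector.to(hidden.dtype).to(hidden.device)
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered
    return hook

# Hypothetical usage: steer one decoder layer toward a concept.
# `model` and `reading_vector` (shape: hidden_size) are assumed to exist.
# handle = model.model.layers[15].register_forward_hook(
#     make_steering_hook(reading_vector, alpha=4.0))
# ... run generation ...
# handle.remove()
```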
Bayesian Flow Networks A new class of generative models that handle both discrete and continuous data.
Score Entropy Discrete Diffusion (SEDD)
Developments SEDD addresses a key limitation in diffusion models by extending them effectively to discrete data domains like natural language. The authors propose "score entropy," a novel loss function that naturally extends score matching to discrete spaces. SEDD significantly outperforms existing language diffusion models (reducing perplexity by 25-75%) and is competitive with autoregressive models like GPT-2. Compared to autoregressive models, SEDD generates more faithful text without requiring temperature scaling, can trade compute for quality (achieving similar quality with 32× fewer network evaluations), and enables controllable text infilling beyond just left-to-right generation.
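As a rough guide to the objective (paraphrased, not the paper's exact notation): the network s_theta(x)_y estimates the ratios p(y)/p(x) of the data distribution between a sequence x and its single-position edits y, and score entropy penalizes mismatched ratios:

```latex
\mathcal{L}_{\mathrm{SE}}
  = \mathbb{E}_{x \sim p}\!\left[ \sum_{y \neq x} w_{xy}
      \left( s_\theta(x)_y
             - \frac{p(y)}{p(x)} \log s_\theta(x)_y
             + K\!\left(\frac{p(y)}{p(x)}\right) \right) \right],
\qquad K(a) = a(\log a - 1)
```

where w_{xy} >= 0 weights the perturbations; in practice a denoising form of this loss is used so that the unknown true ratios never have to be computed directly.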
Retentive Network: A Successor to Transformer for Large Language Models Important LLM-like architecture built from similar components that may help it scale better than the Transformer's O(N^2) memory and O(N) per-token inference complexity.
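The retention mechanism has an equivalent recurrent form, which is what makes cheap inference possible: instead of a growing key/value cache, a fixed-size state is decayed and updated each step. A minimal single-head sketch (omitting RetNet's rotation, gating, and normalization; names are illustrative):

```python
import torch

def recurrent_retention(q, k, v, gamma: float = 0.9):
    """Simplified recurrent retention: a decayed running outer-product state
    replaces the growing KV cache of attention.
    q, k, v: (seq_len, d) tensors. Returns outputs of shape (seq_len, d)."""
    seq_len, d = q.shape
    state = torch.zeros(d, d, dtype=q.dtype)                 # S_0
    outputs = []
    for n in range(seq_len):
        state = gamma * state + torch.outer(k[n], v[n])      # S_n = gamma * S_{n-1} + k_n^T v_n
        outputs.append(q[n] @ state)                         # o_n = q_n S_n, constant work per token
    return torch.stack(outputs)
```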
Memoria stores and retrieves units of information called engrams across multiple memory levels (working memory, short-term memory, and long-term memory), using connection weights that change according to Hebb's rule.
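A toy illustration of the Hebbian update behind those connection weights: units (engrams) that are active together have their connection strengthened. The names and learning rate are placeholders, not Memoria's actual implementation:

```python
import numpy as np

def hebbian_update(weights: np.ndarray, pre: np.ndarray, post: np.ndarray,
                   lr: float = 0.1) -> np.ndarray:
    """Hebb's rule: strengthen connections between co-active units.
    weights: (n_pre, n_post) connection matrix; pre, post: activation vectors."""
    return weights + lr * np.outer(pre, post)
```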
Structured State Space Sequence Models (SSSSMs)
Structured state space sequence models are a class of models that combine ideas from RNNs and convolutions with inspiration from classical state-space methods.
Well-known methods include:
MambaByte
Operating directly on bytes, instead of relying on a learned encoding, subword tokenization, and modality-specific representations, offers models greater flexibility and versatility. The longer sequences this produces are handled by the increased context length enabled by SSSSMs.
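For illustration, byte-level modeling replaces a learned subword vocabulary with the 256 possible byte values, so any string in any language or format maps to a sequence with no out-of-vocabulary handling, at the cost of longer sequences:

```python
text = "tokenization-free 🙂"
byte_ids = list(text.encode("utf-8"))  # every id is in 0..255, so the vocabulary size is 256
print(len(byte_ids), byte_ids[:8])     # more ids than a subword encoding would need,
                                       # hence the reliance on long-context SSSSMs
```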
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Their method provides a potentially highly parallelizable architecture that operates on very long contexts.
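A minimal sequential sketch of the selective state space recurrence at the core of Mamba, for a single input channel; real Mamba trains these parameters, works over many channels, and replaces this Python loop with a parallel scan and hardware-aware kernels, so treat the shapes and discretization below as illustrative:

```python
import torch
import torch.nn.functional as F

def selective_ssm_1d(x, A, W_B, W_C, W_delta):
    """Toy selective state space recurrence over one input channel.
    x: (seq_len,) inputs; A: (d_state,) diagonal (negative) state matrix;
    W_B, W_C, W_delta: (d_state,) weights making B, C, delta input-dependent."""
    h = torch.zeros_like(A)
    ys = []
    for t in range(x.shape[0]):
        delta = F.softplus(W_delta * x[t])       # input-dependent step size
        A_bar = torch.exp(delta * A)             # discretized state transition
        B_bar = delta * (W_B * x[t])             # input-dependent input matrix
        h = A_bar * h + B_bar * x[t]             # selective recurrence
        ys.append(torch.dot(W_C * x[t], h))      # input-dependent readout y_t = C_t . h_t
    return torch.stack(ys)

# Hypothetical usage with random parameters:
d_state, seq_len = 16, 128
out = selective_ssm_1d(torch.randn(seq_len), -torch.rand(d_state),
                       torch.randn(d_state), torch.randn(d_state), torch.randn(d_state))
```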
Others
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution Uses FFT-inspired long convolutions (the Hyena operator) as a drop-in replacement for attention in Transformer models, applied to genomic sequences at single-nucleotide resolution.
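A sketch of the FFT-based long convolution that Hyena-style operators rely on: computing the convolution in frequency space costs O(N log N) rather than the O(N^2) of attention. The filter here is a random placeholder rather than Hyena's implicitly parameterized one:

```python
import torch

def fft_causal_conv(x: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal convolution of a length-N signal with a length-N filter via FFT.
    x, k: (seq_len,). Zero-padding to 2N avoids circular wrap-around."""
    n = x.shape[-1]
    fft_size = 2 * n
    y = torch.fft.irfft(torch.fft.rfft(x, fft_size) * torch.fft.rfft(k, fft_size),
                        fft_size)
    return y[..., :n]   # keep only the causal part

x = torch.randn(1024)
k = torch.randn(1024)   # placeholder for an implicitly parameterized long filter
out = fft_causal_conv(x, k)
```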
- Linear Attention (see the sketch after this list)
- H3
- RWKV Paper
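For reference on the Linear Attention entry above: replacing the softmax with a feature map lets key/value products be aggregated once and reused, reducing attention from O(N^2) to O(N). A minimal non-causal sketch using the common elu(x) + 1 feature map (a generic illustration, not any single paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps: float = 1e-6):
    """Non-causal linear attention: softmax(QK^T)V is approximated by
    phi(Q) (phi(K)^T V) with phi(x) = elu(x) + 1.
    q, k, v: (seq_len, d). Returns (seq_len, d)."""
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = k.transpose(0, 1) @ v               # (d, d): aggregated once, O(N d^2)
    z = q @ k.sum(dim=0, keepdim=True).T     # (seq_len, 1) normalizer
    return (q @ kv) / (z + eps)
```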