Transformer Compression with SliceGPT

In their paper, the authors show that each weight matrix can be replaced with a smaller dense matrix, reducing the network's embedding dimension. This can eliminate up to 25% of model parameters (including embeddings) for Llama-2 while maintaining 99% of zero-shot task performance, with similar results across multiple models.
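To make the core idea concrete, here is a minimal NumPy sketch: compute an orthogonal basis from calibration activations (a PCA-style eigendecomposition), rotate a weight matrix into that basis, and keep only the top directions, leaving a smaller but still dense matrix. The function name `slice_weight`, the covariance-based PCA step, and the shapes are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def slice_weight(W: np.ndarray, X: np.ndarray, keep_frac: float = 0.75) -> np.ndarray:
    """Sketch of slicing: project W onto the top principal directions
    of the calibration inputs, shrinking the embedding dimension.

    W: (d_in, d_out) weight matrix.
    X: (n_samples, d_in) calibration activations (hypothetical inputs).
    """
    d_in = W.shape[0]
    k = int(d_in * keep_frac)              # reduced embedding dimension

    # PCA of the inputs: eigendecomposition of the activation covariance.
    cov = X.T @ X                          # (d_in, d_in)
    eigvals, Q = np.linalg.eigh(cov)       # Q is orthogonal
    order = np.argsort(eigvals)[::-1]      # rank directions by variance
    Q_k = Q[:, order][:, :k]               # keep top-k directions: (d_in, k)

    # The sliced weight is dense and smaller: (k, d_out).
    return Q_k.T @ W

# Toy usage: a 512-dim layer sliced to 384 dims (a 25% reduction).
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))
X = rng.standard_normal((1024, 512))
print(slice_weight(W, X).shape)  # (384, 256)
```

In the full method the same rotation is also applied to the layer's inputs, so the network's computation is preserved up to the discarded low-variance directions; this sketch shows only the weight-side projection.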
