Selective Attention Improves Transformer

The authors present a way to reduce unnecessary attention paid to information once the model's updated understanding shows it is no longer valuable. Their 'selective attention' mechanism lets tokens signal that certain earlier context no longer needs to be attended to by later tokens.

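A minimal sketch of how such a mechanism could look in a single causal attention layer, assuming selection scores are derived from one attention head's logits, accumulated across positions, and subtracted from every head's attention logits before the softmax. The function name `selective_attention`, the `sel_head` parameter, and the exact accumulation details are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def selective_attention(q, k, v, sel_head=0):
    """Sketch of selective attention for one causal attention layer.

    q, k, v: tensors of shape (batch, heads, seq, dim).
    One head's logits are reused as "selection" scores: token i can mark an
    earlier token j as no longer needed, and the accumulated selection is
    subtracted from later tokens' attention logits toward j.
    """
    b, h, n, d = q.shape
    logits = torch.einsum("bhid,bhjd->bhij", q, k) / d ** 0.5

    # Positions j > i (the future) for the causal mask.
    future = torch.triu(torch.ones(n, n, dtype=torch.bool, device=q.device), diagonal=1)

    # Selection scores from one designated head (an assumption of this sketch):
    # only positive scores count, and a token cannot select itself or future tokens.
    sel = F.relu(logits[:, sel_head])                                   # (b, n, n)
    sel = sel.masked_fill(future, 0.0)
    sel = sel.masked_fill(torch.eye(n, dtype=torch.bool, device=q.device), 0.0)

    # accum[b, i, j] = how strongly tokens up to position i have marked j as unneeded.
    accum = torch.cumsum(sel, dim=1)                                    # (b, n, n)

    # Subtract the accumulated selection from all heads, then apply the causal mask.
    logits = logits - accum.unsqueeze(1)
    logits = logits.masked_fill(future, float("-inf"))

    attn = torch.softmax(logits, dim=-1)
    return torch.einsum("bhij,bhjd->bhid", attn, v)
```

A real implementation would fold this into the model's multi-head attention and pin down details such as which positions a token may select and how the scores are scaled; treat the sketch only as an illustration of the idea of down-weighting context judged no longer needed.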

