Sunday, December 10, 2023

Longformer Self Attention

https://huggingface.co/docs/transformers/model_doc/longformer

Longformer self-attention applies self-attention over both a “local” context and a “global” context. Most tokens attend only “locally” to each other: each token attends to its w/2 previous tokens and w/2 succeeding tokens, with w being the window length defined in config.attention_window. Note that config.attention_window can be of type List to define a different w for each layer. A selected few tokens attend “globally” to all other tokens, as is conventionally done for all tokens in BertSelfAttention.
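To make the window arithmetic concrete, here is a minimal pure-Python sketch. The helper names window_for_layer and local_span are hypothetical and are not part of the transformers library. The sketch assumes an even window length and clips the window at the sequence boundaries. It only illustrates which positions fall inside a local window, not how the library computes attention.

```python
def window_for_layer(attention_window, layer_idx):
    """Return the window length w for one layer.

    attention_window mirrors config.attention_window: either a single int
    shared by all layers, or a list with one entry per layer.
    """
    if isinstance(attention_window, int):
        return attention_window
    return attention_window[layer_idx]


def local_span(position, seq_len, w):
    """Inclusive (start, end) indices visible to a locally attending token.

    The token sees w/2 tokens before it and w/2 tokens after it,
    clipped to the valid range [0, seq_len - 1].
    """
    half = w // 2
    start = max(0, position - half)
    end = min(seq_len - 1, position + half)
    return start, end
```

For example, with w = 8 a token at position 10 in a 100-token sequence covers positions 6 through 14, while a token at position 1 is clipped to positions 0 through 5.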

Note that “locally” and “globally” attending tokens are projected by different query, key and value matrices. Also note that every “locally” attending token not only attends to tokens within its window, but also to all “globally” attending tokens so that global attention is symmetric.
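The symmetry described above can be sketched as a boolean connectivity matrix. This is an illustration only: attention_pattern is a hypothetical helper, not a library function. It ignores the separate query, key and value projections and shows only which pairs of positions are connected.

```python
def attention_pattern(seq_len, w, global_attention_mask):
    """Return a seq_len x seq_len matrix of booleans.

    pattern[i][j] is True when token i attends to token j, i.e. when
    j lies within i's local window or when either token is global.
    """
    half = w // 2
    pattern = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            in_window = abs(i - j) <= half
            # A global token attends to everything, and every token
            # attends to each global token, so the relation is symmetric.
            touches_global = (
                global_attention_mask[i] == 1 or global_attention_mask[j] == 1
            )
            row.append(in_window or touches_global)
        pattern.append(row)
    return pattern
```

With seq_len = 6, w = 2 and only position 0 marked global, the last token attends to positions 4 and 5 (its window) and to position 0 (the global token), but not to position 3.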

The user can define which tokens attend “locally” and which tokens attend “globally” by setting the tensor global_attention_mask at run-time appropriately. All Longformer models employ the following logic for global_attention_mask:

  • 0: the token attends “locally”,
  • 1: the token attends “globally”.
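As a rough sketch of the 0/1 convention above, the following pure-Python helper builds such a mask as a plain list. The name build_global_attention_mask and the choice of position 0 as the global token are assumptions made for illustration. In practice the mask would be a tensor with the same shape as the input IDs, passed to the model as global_attention_mask.

```python
def build_global_attention_mask(seq_len, global_positions):
    """Return a list of 0/1 flags, one per token.

    0 marks a locally attending token, 1 marks a globally attending one.
    """
    mask = [0] * seq_len  # every token is local by default
    for pos in global_positions:
        mask[pos] = 1  # promote the selected positions to global attention
    return mask


# Illustrative choice: give only the first token global attention.
mask = build_global_attention_mask(8, [0])
print(mask)  # [1, 0, 0, 0, 0, 0, 0, 0]
```

Keeping the number of global positions small preserves most of the benefit of windowed attention, since each global token attends to the entire sequence.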
