https://huggingface.co/docs/transformers/model_doc/longformer
Longformer Self Attention
Longformer self attention employs self attention on both a “local” context and a “global” context. Most tokens only attend “locally” to each other, meaning that each token attends to its w/2 previous tokens and w/2 succeeding tokens, with w being the window length as defined in config.attention_window. Note that config.attention_window can be of type List to define a different w for each layer. A selected few tokens attend “globally” to all other tokens, as is conventionally done for all tokens in BertSelfAttention.
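As a minimal sketch of the two ways to set the window length (the concrete values below are illustrative, not recommendations):

```python
from transformers import LongformerConfig, LongformerModel

# A single window length w shared by all layers ...
config = LongformerConfig(attention_window=512)

# ... or a per-layer list with one entry per hidden layer.
config = LongformerConfig(
    attention_window=[32, 32, 64, 64, 128, 128, 256, 256, 512, 512, 512, 512],
    num_hidden_layers=12,
)
model = LongformerModel(config)
```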
Note that “locally” and “globally” attending tokens are projected by different query, key and value matrices. Also note that every “locally” attending token not only attends to tokens within its window, but also to all “globally” attending tokens so that global attention is symmetric.
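The separate projections can be seen by inspecting a layer's self-attention module; the attribute names below assume the layout of the Hugging Face transformers implementation and are shown only as an inspection sketch:

```python
from transformers import LongformerConfig, LongformerModel

config = LongformerConfig()  # default base-sized configuration
model = LongformerModel(config)

self_attn = model.encoder.layer[0].attention.self
print(self_attn.query.weight.shape)         # projection used for "locally" attending tokens
print(self_attn.query_global.weight.shape)  # separate projection used for "globally" attending tokens
```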
The user can define which tokens attend “locally” and which tokens attend “globally” by setting the tensor global_attention_mask appropriately at run time. All Longformer models employ the following logic for global_attention_mask:
- 0: the token attends “locally”,
- 1: the token attends “globally”
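A minimal sketch of this 0 = “local” / 1 = “global” convention at run time, using the public allenai/longformer-base-4096 checkpoint as an example; marking the first token (e.g. the <s>/CLS token) as global is a common choice for classification tasks:

```python
import torch
from transformers import LongformerTokenizer, LongformerModel

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer("Hello, Longformer!", return_tensors="pt")

# 0 everywhere: every token attends only "locally" ...
global_attention_mask = torch.zeros_like(inputs["input_ids"])
# ... except the first token, which is set to 1 and attends "globally".
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)
```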