https://www.turing.com/kb/brief-introduction-to-transformers-and-their-power#the-transformer-encoder
Decoders attend only to the words before them, whereas encoders attend to every word in the sequence, both before and after. As a result, the decoder's prediction for the word at position i depends only on the words that precede it.
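A minimal sketch of this difference, using NumPy (an illustrative implementation, not from the linked article): the same scaled dot-product attention is used in both cases, but the decoder adds a causal mask that zeroes out attention from position i to any later position j > i.

```python
import numpy as np

def attention(q, k, v, causal=False):
    """Scaled dot-product attention over a (seq_len, d) input.

    With causal=True, each position attends only to itself and
    earlier positions (decoder-style); with causal=False, every
    position attends to the full sequence (encoder-style).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)            # (seq_len, seq_len) similarities
    if causal:
        # Mask out future positions: entry (i, j) with j > i gets -inf,
        # so its softmax weight becomes exactly zero.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

x = np.random.default_rng(0).standard_normal((4, 8))
_, w_enc = attention(x, x, x, causal=False)  # full attention matrix
_, w_dec = attention(x, x, x, causal=True)   # lower-triangular weights
```

With the mask applied, `w_dec` is lower-triangular, so row i (the prediction at position i) is a weighted combination of positions 0..i only, while `w_enc` generally has nonzero weight everywhere.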