Introduction to the Transformer Model
Paper: Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need. Advances in Neural Information Processing Systems, 2017: 5998-6008.
Walkthrough: The Annotated Transformer, http://nlp.seas.harvard.edu/2018/04/03/attention.html
Video: Hung-yi Lee (李宏毅), Transformer, https://www.youtube.com/watch?v=ugWDIIOHtPA
1. encoder-decoder
2. self-attention
import torch