Natural Language Processing with Deep Learning
Language Model & Distributed Representation (5)
Xi'an Jiaotong University
Chen Li  cli@xjtu.edu.cn  2023
Outline
1. Self-attention
2. Transformer
3. Pre-training LM
Self-attention

• Attention: $y_t = f(x_t, A, B)$, where A and B are another sequence (matrix).
• If we take A (key) = B (value) = X (query), then it is called self-attention.
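To make the A (key) = B (value) = X (query) case concrete, below is a minimal NumPy sketch of scaled dot-product self-attention. The function name and the toy input are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention: key = value = query = X.

    X: array of shape (seq_len, d), one row per token.
    Returns Y of the same shape, where each y_t is a weighted average
    of all rows of X (the values), weighted by the similarity of
    x_t (the query) to every row of X (the keys).
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # (seq_len, seq_len) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ X                              # y_t = sum_s w_{t,s} * x_s

# Toy usage: 4 tokens with 8-dimensional embeddings.
X = np.random.randn(4, 8)
Y = self_attention(X)
print(Y.shape)  # (4, 8)
```

Each output $y_t$ is thus a function of its own input $x_t$ and of the whole sequence X, which is exactly the $y_t = f(x_t, A, B)$ form with A = B = X.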