Pre-training LM
Generative Pretrained Transformer (GPT)
2018's GPT was a big success in pretraining a decoder!
• Transformer decoder with 12 layers.
• 768-dimensional hidden states, 3072-dimensional feed-forward hidden layers.
• Trained on BooksCorpus: over 7,000 unique books.
• Contains long spans of contiguous text, for learning long-distance dependencies.
Radford A, Narasimhan K, Salimans T, et al. Improving Language Understanding by Generative Pre-training. 2018.
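To make the quoted sizes concrete, here is a minimal sketch of a decoder-only Transformer language model with 12 layers, 768-dimensional hidden states, and 3072-dimensional feed-forward layers, as listed on the slide. The vocabulary size, context length, and number of attention heads are illustrative assumptions, not values taken from the slide; this is a rough approximation of the GPT setup, not the original implementation.

```python
# Sketch of a GPT-style decoder-only Transformer (assumed hyperparameters
# marked below; layer/width/FFN sizes follow the slide).
import torch
import torch.nn as nn

class GPTDecoder(nn.Module):
    def __init__(self, vocab_size=40000, max_len=512,   # assumptions, not from the slide
                 n_layers=12, d_model=768, d_ff=3072, n_heads=12):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        # "Decoder-only" here means standard Transformer blocks restricted by a
        # causal self-attention mask (no cross-attention).
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=d_ff,
            activation="gelu", batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        # idx: (batch, seq_len) token ids
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: each position may attend only to itself and earlier positions,
        # which is what lets the model be trained as a left-to-right language model.
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(idx.device)
        h = self.blocks(x, mask=mask)
        return self.lm_head(h)  # next-token logits, shape (batch, seq_len, vocab_size)

# Usage: random token ids in, next-token logits out.
model = GPTDecoder()
logits = model(torch.randint(0, 40000, (2, 16)))  # -> (2, 16, 40000)
```

Pre-training then amounts to maximizing the likelihood of each next token in long contiguous passages (e.g. from BooksCorpus), which is where the long-distance dependencies mentioned on the slide come in.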