Pretraining Language Models with three architectures

Pretraining for three types of architectures: the neural architecture influences the type of pretraining, and the natural use cases.

Decoders
• Language models! What we've seen so far.
• Nice to generate from; can't condition on future words (see the attention-mask sketch below).

Encoders
• Get bidirectional context – can condition on the future!
• Wait, how do we pretrain them?

Encoder-Decoders
• Good parts of decoders and encoders?
• What's the best way to pretrain them?
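A minimal sketch (not from the slides) of the decoder/encoder distinction in the list above: the only difference here is whether the attention mask hides future positions. It assumes PyTorch ≥ 2.0 for scaled_dot_product_attention; the tensor sizes and variable names are illustrative.

```python
import torch

T, d = 5, 8                       # toy sequence length and hidden size
q = k = v = torch.randn(1, T, d)  # toy query/key/value for one sequence

# Decoder (causal) mask: position t may only attend to positions <= t,
# so the model can never condition on future words.
causal_mask = torch.tril(torch.ones(T, T, dtype=torch.bool))

# Encoder (bidirectional) mask: every position attends to every position,
# including future ones.
bidirectional_mask = torch.ones(T, T, dtype=torch.bool)

dec_out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask)
enc_out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=bidirectional_mask)
print(dec_out.shape, enc_out.shape)  # both (1, 5, 8); only the attention pattern differs
```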
Pretraining decoders

When using language-model-pretrained decoders, we can ignore that they were trained to model p(w_t | w_{1:t-1}).

We can finetune them by training a classifier on the last word's hidden state:

    h_1, ..., h_T = Decoder(w_1, ..., w_T)
    y ~ A h_T + b

where A and b are randomly initialized and specified by the downstream task. Gradients backpropagate through the whole network.

[Note how the linear layer hasn't been pretrained and must be learned from scratch.]
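To make the recipe concrete, here is a minimal PyTorch sketch (not from the slides). ToyDecoder is just a stand-in for a real pretrained decoder, and DecoderClassifier adds the randomly initialized linear layer (A, b) on the last hidden state; all class names, sizes, and the 2-class task are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyDecoder(nn.Module):
    """Stand-in for a pretrained decoder: maps token ids to hidden states h_1..h_T."""
    def __init__(self, vocab_size=1000, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)  # left-to-right, like an LM

    def forward(self, input_ids):              # (batch, T)
        h, _ = self.rnn(self.embed(input_ids))  # (batch, T, hidden_size)
        return h

class DecoderClassifier(nn.Module):
    def __init__(self, pretrained_decoder, hidden_size, num_classes):
        super().__init__()
        self.decoder = pretrained_decoder       # the pretrained part
        # The linear layer (A, b) is randomly initialized: it was NOT pretrained
        # and must be learned from scratch for the downstream task.
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, input_ids):
        h = self.decoder(input_ids)             # h_1, ..., h_T = Decoder(w_1, ..., w_T)
        return self.classifier(h[:, -1, :])     # y = A h_T + b

# Finetuning step: gradients backpropagate through the whole network,
# i.e. both the new classifier (A, b) and the pretrained decoder are updated.
model = DecoderClassifier(ToyDecoder(), hidden_size=64, num_classes=2)
input_ids = torch.randint(0, 1000, (8, 12))     # a batch of 8 sequences of length 12
labels = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(input_ids), labels)
loss.backward()
```

In practice the pretrained decoder would be a Transformer language model rather than the toy GRU above, but the finetuning pattern is the same: read off the last position's hidden state and train a fresh linear head on top of it.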