Naive Solution (imageGPT)
● Another problem: time complexity :( (the arithmetic is sketched in code below)
  ○ Recall: transformers are O(n^2) w.r.t. input length
  ○ AND input length is O(n^2) w.r.t. length of each side
  ○ 256 x 256 image => 65536 pixels
  ○ For reference, BERT only has a max length of 512 tokens
● Solution: just use smaller images lmao
  ○ Max size of 64 x 64
● Trained on a similar objective to language models (next pixel prediction instead of next token prediction; also sketched below)
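To make the bullets above concrete, here is a minimal Python sketch: the first half works out token counts and pairwise attention cost for the image sizes mentioned, and the second half shows a generic next-pixel cross-entropy loss. The `model` argument, the toy embedding/linear stack, and the 512-value pixel vocabulary are placeholders for illustration, not iGPT's actual architecture or preprocessing.

```python
import torch
import torch.nn.functional as F

# --- Why 256 x 256 hurts: one token per pixel, attention over all pairs ---
def n_tokens(side):
    return side * side                 # input length is O(side^2)

def attn_pairs(n):
    return n * n                       # self-attention cost is O(n^2) in length

for side in (32, 64, 256):
    n = n_tokens(side)
    print(f"{side:>3} x {side:<3} -> {n:>6} tokens, {attn_pairs(n):>14,} attention pairs")
print(f"BERT's limit   -> {512:>6} tokens, {attn_pairs(512):>14,} attention pairs")
# 64 x 64 (iGPT's max) is 4,096 tokens; 256 x 256 is 65,536 tokens,
# i.e. 128x BERT's max length and ~16,000x its attention-pair count.

# --- Next-pixel prediction: the LM objective with pixels as tokens ---
def next_pixel_loss(model, pixels):
    """pixels: (batch, seq_len) integer pixel values in [0, vocab_size)."""
    logits = model(pixels[:, :-1])                       # predict pixel t from pixels < t
    targets = pixels[:, 1:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))

# Toy shape check with a stand-in "model" (not autoregressive, not iGPT):
vocab = 512                                              # placeholder pixel vocabulary
toy_model = torch.nn.Sequential(torch.nn.Embedding(vocab, 32),
                                torch.nn.Linear(32, vocab))
pixels = torch.randint(0, vocab, (2, n_tokens(8)))       # two tiny 8 x 8 "images"
print("toy next-pixel loss:", next_pixel_loss(toy_model, pixels).item())
```

A real model would also need positional information and causal masking; the point here is only the token-count arithmetic and the shape of the objective.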
The good
● Nice image representations
● SOTA on semi-supervised classification
  ○ Task: classification with limited labeled samples
  ○ Model: linear classifier on iGPT representations (see the probe sketch below)
  ○ Competitive results with a naive method + lots of compute
● Nice image generations
  ○ Effective at modeling visual information

Linear probe accuracy:
  Evaluation               Model          Accuracy
  CIFAR-10 Linear Probe    ResNet-152     94.0
                           SimCLR         95.3
                           iGPT-L 32x32   96.3
  CIFAR-100 Linear Probe   ResNet-152     78.0
                           SimCLR         80.2
                           iGPT-L 32x32   82.8
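As a rough illustration of the linear-probe setup behind those numbers: the pre-trained network is frozen and used purely as a feature extractor, and only a linear classifier is fit on its representations. `extract_features` below is a hypothetical stand-in for pulling activations out of a frozen iGPT-style model, and the sklearn classifier is just one convenient choice, not necessarily what the paper used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe(extract_features, train_images, train_labels, test_images, test_labels):
    """Fit a linear classifier on frozen features; test accuracy is the probe score."""
    X_train = np.stack([extract_features(img) for img in train_images])
    X_test = np.stack([extract_features(img) for img in test_images])
    clf = LogisticRegression(max_iter=1000)   # the only component that gets trained
    clf.fit(X_train, train_labels)
    return clf.score(X_test, test_labels)

# Toy run with a random "feature extractor", just to show the data flow:
rng = np.random.default_rng(0)

def fake_extract(img):                                    # stand-in for frozen iGPT features
    return rng.normal(size=128)

images, labels = [None] * 100, rng.integers(0, 10, 100)   # placeholder dataset
print(linear_probe(fake_extract, images[:80], labels[:80], images[80:], labels[80:]))
```

With random features the score hovers around chance; a good self-supervised representation is what pushes this number into the 90%+ range shown in the table.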
The bad
lots of compute
The bad
● “We train iGPT-S, iGPT-M, and iGPT-L, transformers containing 76M, 455M, and 1.4B parameters respectively, on ImageNet. We also train iGPT-XL, a 6.8 billion parameter transformer, on a mix of ImageNet and images from the web.”
● “iGPT-L was trained for roughly 2500 V100-days while a similarly performing MoCo model can be trained in roughly 70 V100-days”
  ○ For reference, MoCo is another self-supervised model, but it has a ResNet backbone that is capable of handling a 224 x 224 image resolution
● All that for only a 64 x 64 resolution!
So… why?
● Mostly a proof of concept
● Paradigm of transformers + m a s s i v e self-supervised pre-training but applied to a new domain
  ○ A general method for learning representations
  ○ Same method, new modes