Vision Transformers (ViTs) By: ML@B Edu Team
Motivation
● Transformers work well for text → what happens if we use them on images?
● Transformers have some nice properties that could be useful for computer vision
  ○ ex. scalability, global receptive fields
Recall: Transformer Architecture
Start with a text string:
1. → text tokens
2. → text embedding vectors (via embedding dictionary)
3. → text + position embedding vectors
4. → stack of transformer layers (self-attention + normalization + residual connections + MLP blocks)
5. → CLS token
6. → attach classification head and do prediction, etc.
Commonly trained with a self-supervised objective (ex. next-token prediction)
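To make the pipeline concrete, here is a minimal PyTorch sketch of steps 1-6 for text classification. Every name and number here (vocab size, dimensions, hyperparameters) is a placeholder assumption rather than the recipe of any specific model:

```python
import torch
import torch.nn as nn

class TextTransformerClassifier(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256, n_heads=8,
                 n_layers=6, max_len=512, n_classes=2):
        super().__init__()
        # Step 2: embedding dictionary maps token ids to vectors
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Step 3: learned position embeddings (one extra slot for CLS)
        self.pos_emb = nn.Embedding(max_len + 1, d_model)
        # Step 5: a learnable CLS token prepended to every sequence
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        # Step 4: stack of transformer layers
        # (self-attention + LayerNorm + residuals + MLP in each layer)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Step 6: classification head reads off the CLS position
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, token_ids):           # (batch, seq_len) integer ids
        b, t = token_ids.shape
        x = self.tok_emb(token_ids)         # (batch, seq_len, d_model)
        cls = self.cls_token.expand(b, -1, -1)
        x = torch.cat([cls, x], dim=1)      # prepend CLS: (batch, t+1, d_model)
        pos = torch.arange(t + 1, device=token_ids.device)
        x = x + self.pos_emb(pos)           # add position embeddings
        x = self.encoder(x)                 # run the transformer stack
        return self.head(x[:, 0])           # predict from the CLS output

# Step 1 (tokenization) is assumed to happen upstream; fake it with random ids:
ids = torch.randint(0, 30522, (4, 16))      # batch of 4 toy "sentences"
logits = TextTransformerClassifier()(ids)   # (4, n_classes)
```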
Problem!
This whole pipeline starts with a text string:
1. → text tokens
2. → text embedding vectors (via embedding dictionary)
3. → text + position embedding vectors
4. → stack of transformer layers (self-attention + normalization + residual connections + MLP blocks)
5. → CLS token
6. → attach classification head and do prediction, etc.
Commonly trained with a self-supervised objective (ex. next-token prediction)
Images aren't text: there is no discrete vocabulary to look up in steps 1-2, and treating every pixel as a token makes the sequence enormous (see the rough cost estimate below).
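As a rough back-of-the-envelope illustration (the numbers below are assumptions for the sake of the example, not from the slides): each self-attention layer forms an n × n attention matrix, so cost grows quadratically with sequence length, and one-token-per-pixel sequences get expensive fast at ordinary image resolutions.

```python
# Rough cost of one-token-per-pixel sequences (illustrative numbers only).
text_len = 512                    # a typical text sequence length
img_len = 224 * 224               # one token per pixel of a 224x224 image

# Self-attention compute/memory scale with n^2 (the n x n attention matrix).
print(img_len)                    # 50176 tokens
print((img_len / text_len) ** 2)  # ~9604x the attention cost of the text case
```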
Naive Solution (imageGPT)
Paper: “Generative Pretraining from Pixels”
● Pixels are kinda discrete: just treat each color value like a separate word in your vocabulary!
  ○ Each pixel is commonly represented by a 24-bit value (integers in the range [0, 255] for each of the 3 color channels)
  ○ Vocab size of 2^24 = 16,777,216!
● Who needs that many colors anyway?
  ○ Use a 9-bit representation (integers in the range [0, 7] for each of the 3 color channels)
  ○ Vocab size of 8^3 = 512
● Read pixels in raster order (row by row, left to right) to get the input sequence
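Here is a minimal sketch of that tokenization, assuming the straight per-channel 3-bit quantization described above (the actual paper builds its 512-color palette by k-means clustering of RGB values, but the vocabulary-size math works out the same):

```python
import numpy as np

def image_to_tokens(img):
    """Quantize an (H, W, 3) uint8 image into a 512-word vocabulary and
    flatten it in raster order (row by row, left to right)."""
    # 3 bits per channel: map [0, 255] -> [0, 7]
    q = img.astype(np.int64) >> 5              # (H, W, 3), values in [0, 7]
    # Pack the 3 channels into a single token id in [0, 511]
    tokens = q[..., 0] * 64 + q[..., 1] * 8 + q[..., 2]
    # Row-major flatten = raster-order sequence of length H * W
    return tokens.reshape(-1)

img = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)  # toy 32x32 image
seq = image_to_tokens(img)
print(seq.shape, seq.min(), seq.max())          # (1024,) with ids in [0, 511]
```

This also shows why imageGPT sticks to small resolutions: even a 32×32 image is already a 1,024-token sequence.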