Skip Some Calculations with Human Knowledge
Can we fill in some values of the attention matrix with human knowledge, instead of computing them?
Local Attention / Truncated Attention
• Calculate the attention weight only between a query and its neighboring keys.
• Set every other attention weight to 0.
• Similar to a CNN: each position only sees a small local window.
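The local pattern above can be sketched as a boolean mask over (query, key) pairs. This is a minimal illustration, not the lecture's implementation: the names `local_attention_mask`, `masked_softmax`, and the `window` parameter are assumptions made for this example.

```python
import math

def local_attention_mask(seq_len, window=1):
    """mask[q][k] is True when query q may attend to key k, i.e. |q - k| <= window.

    `window` is a hypothetical parameter; the slide does not fix a window size.
    """
    return [[abs(q - k) <= window for k in range(seq_len)]
            for q in range(seq_len)]

def masked_softmax(scores, mask_row):
    """Softmax over the allowed keys only; masked keys get a weight of exactly 0."""
    allowed = [s for s, keep in zip(scores, mask_row) if keep]
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0
            for s, keep in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

mask = local_attention_mask(seq_len=5, window=1)
# Query 2 with equal raw scores: only keys 1, 2 and 3 receive weight.
weights = masked_softmax([0.0, 0.0, 0.0, 0.0, 0.0], mask[2])
```

The sketch computes every score and then discards the masked ones, which shows the pattern but saves nothing. An efficient implementation would skip the masked dot products entirely.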
Stride Attention
• Attend to keys a fixed number of positions away, skipping the ones in between.
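One way to write a stride pattern as a mask is sketched below. The slide shows only a figure, so the exact rule is an assumption: here a query attends to the keys exactly `stride` positions before and after it, and optionally to itself (`include_self` is a hypothetical flag added so that no row is empty).

```python
def stride_attention_mask(seq_len, stride=2, include_self=True):
    """mask[q][k] is True when key k is exactly `stride` positions from query q.

    Both `stride` and `include_self` are illustrative assumptions,
    not values taken from the slide.
    """
    return [[abs(q - k) == stride or (include_self and q == k)
             for k in range(seq_len)]
            for q in range(seq_len)]

mask = stride_attention_mask(seq_len=6, stride=2)
# Keys that query 3 is allowed to attend to.
allowed_keys = [k for k, keep in enumerate(mask[3]) if keep]
```

Compared with the local pattern, the same number of keys per query covers a wider span of the sequence, at the cost of skipping the immediate neighbors.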
Global Attention
Add special tokens to the original sequence:
• Attend to every token → the special token collects global information
• Attended by every token → every token learns global information through it
special token = "the neighborhood chief among tokens"
No attention between non-special tokens.
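The global pattern can be sketched as a mask in which a (query, key) pair is allowed whenever at least one side is a special token. The assumptions here are that the special tokens are prepended at the start of the sequence and that `num_special` of them are added; the slide specifies neither.

```python
def global_attention_mask(seq_len, num_special=1):
    """Mask for `num_special` special tokens prepended to `seq_len` ordinary tokens.

    mask[q][k] is True when the query or the key is a special token.
    Pairs of two ordinary tokens are never allowed, which follows the
    slide's "no attention between non-special tokens" literally.
    """
    total = num_special + seq_len
    return [[q < num_special or k < num_special for k in range(total)]
            for q in range(total)]

mask = global_attention_mask(seq_len=4, num_special=1)
special_row = mask[0]   # the special token attends to every position
ordinary_row = mask[2]  # an ordinary token attends only to the special token
```

Taken this literally, an ordinary token does not even attend to itself, so information between ordinary tokens can only flow through the special token.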
Many Different Choices …
• Different heads use different patterns.
"Only kids have to choose ..."
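Giving each head its own pattern could look like the sketch below. The head names, the sequence length, and the choice of treating position 0 as the special token are all assumptions for illustration; the slide only states that heads may use different patterns.

```python
def local_mask(n, window=1):
    """Each query attends to keys within `window` positions."""
    return [[abs(q - k) <= window for k in range(n)] for q in range(n)]

def stride_mask(n, stride=2):
    """Each query attends to itself and to keys exactly `stride` positions away."""
    return [[abs(q - k) == stride or q == k for k in range(n)] for q in range(n)]

def global_mask(n, special=(0,)):
    """A pair is allowed when the query or the key is a special position."""
    return [[q in special or k in special for k in range(n)] for q in range(n)]

n = 6
# Hypothetical assignment: one attention pattern per head.
head_masks = {
    "head_0": local_mask(n),
    "head_1": stride_mask(n),
    "head_2": global_mask(n),
}

# Number of (query, key) pairs each head actually computes, out of n * n.
computed_pairs = {name: sum(map(sum, m)) for name, m in head_masks.items()}
```

Each head then computes only a fraction of the full attention matrix, and the heads together cover local, strided, and global interactions.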