Still feeding your ViT all the tokens? Tsinghua & UCLA propose dynamic token sparsification to cut computation at inference time
Paper link: https://arxiv.org/abs/2106.02034
Project link: https://github.com/raoyongming/DynamicViT
01 Introduction
02 Method
2.1 Overview
2.2 Hierarchical Token Sparsification with Prediction Modules
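This heading refers to the lightweight prediction modules that DynamicViT inserts between transformer blocks to score each token and decide which ones to keep at the next stage. Below is a minimal PyTorch sketch of such a module, assuming the paper's recipe of combining a per-token local feature with a globally pooled feature and sampling a near-binary keep decision with Gumbel-Softmax; the class name, layer widths, and helper logic here are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenPredictor(nn.Module):
    """Scores each token and emits a (nearly) binary keep decision by
    combining a local per-token feature with a global pooled feature.
    Sketch only: sizes and names are assumptions, not the official code."""
    def __init__(self, dim):
        super().__init__()
        self.local_proj = nn.Linear(dim, dim // 2)
        self.global_proj = nn.Linear(dim, dim // 2)
        self.head = nn.Linear(2 * (dim // 2), 2)  # logits for [drop, keep]

    def forward(self, x, prev_mask):
        # x: (B, N, C) token features; prev_mask: (B, N) float, 1 = still kept
        local = self.local_proj(x)                               # (B, N, C/2)
        w = prev_mask.unsqueeze(-1)                              # (B, N, 1)
        # pool the global feature only over tokens that are still kept
        pooled = (x * w).sum(1, keepdim=True) / w.sum(1, keepdim=True).clamp(min=1e-6)
        glob = self.global_proj(pooled).expand(-1, x.size(1), -1)
        logits = self.head(torch.cat([local, glob], dim=-1))     # (B, N, 2)
        # Gumbel-Softmax: differentiable sampling of a hard keep/drop decision
        keep = F.gumbel_softmax(logits, hard=True, dim=-1)[..., 1]
        return keep * prev_mask  # once dropped, a token stays dropped
```

Gumbel-Softmax with `hard=True` is what makes the discrete keep/drop choice trainable end to end: the forward pass sees a binary mask while gradients flow through the soft probabilities.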
2.3 End-to-end Optimization with Attention Masking
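Because physically deleting tokens mid-batch would make end-to-end training awkward, the paper keeps all tokens during training and masks the attention matrix instead, so that a dropped token cannot influence the kept ones while every token may still attend to itself. A sketch of that masked softmax, assuming standard multi-head attention tensors (the function and variable names are mine):

```python
import torch

def masked_softmax_attention(q, k, v, keep_mask):
    """Attention where dropped tokens are masked out as keys, so they
    cannot affect kept tokens; self-attention is always allowed, which
    keeps the computation graph differentiable for every token.
    q, k, v: (B, H, N, D); keep_mask: (B, N) float, 1 = keep, 0 = drop."""
    B, H, N, D = q.shape
    scores = (q @ k.transpose(-2, -1)) / D ** 0.5               # (B, H, N, N)
    scores = scores - scores.max(dim=-1, keepdim=True).values   # numerical stability
    eye = torch.eye(N, device=q.device)                         # a token always sees itself
    gate = torch.maximum(keep_mask[:, None, None, :], eye)      # (B, 1, N, N)
    attn = scores.exp() * gate                                  # zero out dropped keys
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return attn @ v
```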
2.4 Training and Inference
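At training time the objective combines the usual classification (and distillation) losses with a ratio loss that holds the fraction of kept tokens at each sparsification stage near a preset target; at inference the soft mask is replaced by actually discarding tokens, e.g. via a top-k selection on the keep scores. A hedged sketch of both pieces, with the per-stage target ratios treated as assumed hyperparameters:

```python
import torch

def ratio_loss(keep_masks, target_ratios):
    """MSE between the realized keep ratio at each stage and its target.
    keep_masks: list of (B, N) soft keep decisions, one per stage."""
    loss = 0.0
    for mask, rho in zip(keep_masks, target_ratios):
        loss = loss + (mask.mean(dim=1) - rho).pow(2).mean()
    return loss / len(keep_masks)

@torch.no_grad()
def prune_tokens(x, scores, keep_ratio):
    """Inference-time pruning: actually drop tokens by keeping the
    top-k tokens ranked by their keep score."""
    B, N, C = x.shape
    k = max(1, int(N * keep_ratio))
    idx = scores.topk(k, dim=1).indices                  # (B, k)
    return x.gather(1, idx.unsqueeze(-1).expand(B, k, C))
```

Dropping tokens only at inference is what realizes the actual speedup: the quadratic attention cost shrinks with the number of surviving tokens, while training still sees full-length sequences through the attention mask above.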
03 Experiments
3.1 Main results
3.2 Comparisons with the state of the art
3.3 Analysis
DynamicViT for model scaling
Visualizations
Comparisons of different sparsification strategies
04 Conclusion
About the Author
Research area: Runs the FightingCV official account; research focuses on multimodal content understanding, with an emphasis on tasks that combine the vision and language modalities and on promoting real-world applications of Vision-Language models.
Zhihu / WeChat official account: FightingCV