NAS + CNN + Transformer = ViT-Res! MIT team open-sources ViT-Res, with accuracy 8.6% higher than DeiT-Ti
Details are as follows:

Paper link: https://arxiv.org/abs/2109.00642
Project link: https://github.com/yilunliao/vit-search

01


02
2.1 Background on Vision Transformer
Tokenization
Position Embedding
MHSA

FFN
LN
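The components listed above (tokenization, position embedding, MHSA, FFN, LN) compose one pre-norm Transformer encoder block: x + MHSA(LN(x)) followed by x + FFN(LN(x)). A minimal NumPy sketch with hypothetical shapes (all dimensions are illustrative, and ReLU stands in for ViT's GELU):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each token over the channel dimension
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def mhsa(x, wq, wk, wv, wo, heads):
    # multi-head self-attention; x: (tokens, dim)
    n, d = x.shape
    dh = d // heads
    q = (x @ wq).reshape(n, heads, dh).transpose(1, 0, 2)
    k = (x @ wk).reshape(n, heads, dh).transpose(1, 0, 2)
    v = (x @ wv).reshape(n, heads, dh).transpose(1, 0, 2)
    att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))
    out = (att @ v).transpose(1, 0, 2).reshape(n, d)
    return out @ wo

def vit_block(x, params, heads=2):
    # pre-norm block: x + MHSA(LN(x)), then x + FFN(LN(x))
    x = x + mhsa(layer_norm(x), *params["attn"], heads)
    h = layer_norm(x)
    x = x + np.maximum(h @ params["w1"], 0) @ params["w2"]  # ReLU FFN
    return x

rng = np.random.default_rng(0)
d, n = 8, 4  # hypothetical: 4 tokens of dimension 8
params = {
    "attn": [rng.normal(size=(d, d)) * 0.1 for _ in range(4)],  # wq, wk, wv, wo
    "w1": rng.normal(size=(d, 4 * d)) * 0.1,
    "w2": rng.normal(size=(4 * d, d)) * 0.1,
}
tokens = rng.normal(size=(n, d))
out = vit_block(tokens, params)
```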

2.2 Residual Spatial Reduction
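ViT-ResNAS inserts spatial reduction between stages to build a multi-stage, CNN-like pyramid, and adds a residual path around the reduction, analogous to ResNet's downsampling shortcut. A NumPy sketch under assumed shapes (the space-to-depth main path and average-pool skip path are illustrative choices, not the paper's exact layers):

```python
import numpy as np

def residual_spatial_reduction(x, w_main, w_skip):
    """Downsample a token grid 2x and widen channels, with a residual path.

    x: (H, W, C) token grid; w_main: (4*C, C_out); w_skip: (C, C_out).
    """
    h, w, c = x.shape
    # main path: space-to-depth (2x2 patches -> 4*C channels), then linear
    main = x.reshape(h // 2, 2, w // 2, 2, c).transpose(0, 2, 1, 3, 4)
    main = main.reshape(h // 2, w // 2, 4 * c) @ w_main
    # skip path: 2x2 average pooling, then project to the new width
    skip = x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3)) @ w_skip
    return main + skip

rng = np.random.default_rng(1)
grid = rng.normal(size=(8, 8, 16))        # hypothetical 8x8 grid, 16 channels
out = residual_spatial_reduction(
    grid,
    rng.normal(size=(64, 32)) * 0.1,      # main: 4*16 -> 32 channels
    rng.normal(size=(16, 32)) * 0.1,      # skip: 16 -> 32 channels
)
```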


2.3 Weight-Sharing NAS with Multi-Architectural Sampling

Algorithm Overview
Search Space
Multi-Architectural Sampling for Super-Network Training
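In weight-sharing NAS, one super-network's weights serve every candidate architecture; multi-architectural sampling draws several sub-networks per training step and aggregates their gradients before a single shared-weight update. A toy NumPy sketch where the "architecture" is just the output width of one shared linear layer (names, widths, and the MSE objective are all illustrative):

```python
import numpy as np

# hypothetical search space: output width of a single shared linear layer
WIDTHS = [4, 8, 16]

def forward(w, x, width):
    # sub-network = first `width` output units of the shared weight
    return x @ w[:, :width]

def grad_mse(w, x, y, width):
    # gradient of mean-squared error w.r.t. the shared weight slice
    err = forward(w, x, width) - y[:, :width]
    g = np.zeros_like(w)
    g[:, :width] = x.T @ err / len(x)
    return g

def multi_arch_step(w, x, y, rng, n_archs=3, lr=0.05):
    # sample several sub-networks per step and average their gradients,
    # so the shared weights get consistent updates across architectures
    g = sum(grad_mse(w, x, y, rng.choice(WIDTHS)) for _ in range(n_archs))
    return w - lr * g / n_archs

rng = np.random.default_rng(2)
w = rng.normal(size=(8, 16)) * 0.1
x = rng.normal(size=(32, 8))
y = rng.normal(size=(32, 16))
loss0 = float(((forward(w, x, 16) - y) ** 2).mean())
for _ in range(50):
    w = multi_arch_step(w, x, y, rng)
loss = float(((forward(w, x, 16) - y) ** 2).mean())
```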



Evolutionary Search
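After super-network training, evolutionary search explores the space with mutation and crossover, keeping the top candidates by fitness each generation. A sketch with a hypothetical search space and a toy fitness function standing in for super-network accuracy under a resource budget:

```python
import random

# hypothetical per-network genes: depth, embedding width, attention heads
SPACE = {"depth": [2, 3, 4, 5], "width": [128, 192, 256], "heads": [2, 4, 8]}

def mutate(arch, rng, p=0.3):
    # resample each gene with probability p
    return {k: (rng.choice(SPACE[k]) if rng.random() < p else v)
            for k, v in arch.items()}

def crossover(a, b, rng):
    # take each gene from one of the two parents
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}

def evolve(fitness, rng, pop_size=16, generations=10, top_k=4):
    pop = [{k: rng.choice(v) for k, v in SPACE.items()} for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:top_k]                     # keep the elite
        children = [mutate(crossover(rng.choice(parents),
                                     rng.choice(parents), rng), rng)
                    for _ in range(pop_size - top_k)]
        pop = parents + children
    return max(pop, key=fitness)

def toy_fitness(arch):
    # stand-in for validation accuracy penalized by a FLOPs budget
    return arch["depth"] * arch["width"] - 50 * arch["heads"]

rng = random.Random(0)
best = evolve(toy_fitness, rng)
```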
2.4 Extra Techniques
Token Labeling with CutMix and Mixup
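Token labeling assigns each token a location-specific soft label, which composes cleanly with CutMix: when a rectangle of tokens is pasted from one image into another, each token keeps the label of the image it came from, while the image-level label is mixed by area. A NumPy sketch (shapes and the helper name are hypothetical):

```python
import numpy as np

def token_cutmix(tokens_a, labels_a, tokens_b, labels_b, rng):
    """Mix two token grids by pasting a random rectangle from b into a.

    tokens_*: (H, W, C); labels_*: (H, W, num_classes) per-token soft labels.
    Per-token labels stay exact under CutMix; only the image-level
    (CLS) label needs the area-based mixing ratio `lam`.
    """
    h, w, _ = tokens_a.shape
    ch, cw = rng.integers(1, h + 1), rng.integers(1, w + 1)     # cut size
    y0, x0 = rng.integers(0, h - ch + 1), rng.integers(0, w - cw + 1)
    tokens = tokens_a.copy()
    labels = labels_a.copy()
    tokens[y0:y0 + ch, x0:x0 + cw] = tokens_b[y0:y0 + ch, x0:x0 + cw]
    labels[y0:y0 + ch, x0:x0 + cw] = labels_b[y0:y0 + ch, x0:x0 + cw]
    lam = 1.0 - (ch * cw) / (h * w)   # mixing ratio for the CLS label
    return tokens, labels, lam
```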
Convolution before Tokenization
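A small convolutional stem before patch tokenization injects local inductive bias into the early layers. A NumPy sketch of a 3x3 conv + ReLU followed by non-overlapping patch tokenization (kernel size, channel counts, and patch size are illustrative):

```python
import numpy as np

def conv3x3(img, kernel):
    # 'same' 3x3 convolution via zero padding
    # img: (H, W, Cin); kernel: (3, 3, Cin, Cout)
    h, w, cin = img.shape
    pad = np.pad(img, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, kernel.shape[-1]))
    for i in range(3):
        for j in range(3):
            out += pad[i:i + h, j:j + w] @ kernel[i, j]
    return out

def tokenize(feat, patch=4):
    # flatten non-overlapping patches into tokens
    h, w, c = feat.shape
    t = feat.reshape(h // patch, patch, w // patch, patch, c)
    t = t.transpose(0, 2, 1, 3, 4)
    return t.reshape(-1, patch * patch * c)

rng = np.random.default_rng(4)
img = rng.normal(size=(16, 16, 3))                # hypothetical 16x16 RGB input
stem = np.maximum(conv3x3(img, rng.normal(size=(3, 3, 3, 8)) * 0.1), 0)
tokens = tokenize(stem, patch=4)                  # 4x4 grid of tokens
```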
03
3.1 Ablation Study
Multi-Stage Network with Residual Connection and Token Labeling

Weight-Sharing NAS with Multi-Architectural Sampling

3.2 Comparison with Related Works

04
About the Author
Research profile: operator of the FightingCV public account; research direction is multimodal content understanding, focusing on tasks that combine the vision and language modalities and on promoting real-world applications of Vision-Language models.
Zhihu / WeChat official account: FightingCV
