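DL (InceptionV2/V3): A detailed guide to the InceptionV2 & InceptionV3 algorithms, covering an introduction (paper overview), architecture details, case applications, and a collection of figures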
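Related articles
DL (GoogleNet): A detailed guide to the GoogleNet (InceptionV1) algorithm, covering an introduction (paper overview), architecture details, case applications, and a collection of figures
DL (BN-Inception): A detailed guide to the BN-Inception algorithm, covering an introduction (paper overview), architecture details, case applications, and a collection of figures
DL (InceptionV2/V3): A detailed guide to the InceptionV2 & InceptionV3 algorithms, covering an introduction (paper overview), architecture details, case applications, and a collection of figures
DL (InceptionV2/V3): Architecture details of the InceptionV2 & InceptionV3 algorithms
DL (InceptionV4/ResNet): A detailed guide to the InceptionV4/Inception-ResNet algorithms, covering an introduction (paper overview), architecture details, case applications, and a collection of figures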
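Introduction to the InceptionV2 & InceptionV3 algorithms (paper overview)
InceptionV2 & InceptionV3 were developed by Google researchers as improvements on the InceptionV1 and BN-Inception network models.
Abstract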
Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we are exploring ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and using less than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error.
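To make the phrase "suitably factorized convolutions" more concrete, here is a minimal sketch that counts convolution weights before and after two common factorizations: a 5×5 convolution replaced by two stacked 3×3 convolutions, and a 3×3 convolution replaced by a 1×3 followed by a 3×1. The helper name `conv_params` and the channel count of 64 are hypothetical choices for illustration only, and the sketch assumes equal input and output channels and ignores biases.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in one convolution layer (biases ignored)."""
    return kernel_h * kernel_w * in_channels * out_channels


channels = 64  # hypothetical channel count, chosen only for illustration

# One 5x5 convolution versus two stacked 3x3 convolutions
# (both cover the same 5x5 receptive field).
full_5x5 = conv_params(5, 5, channels, channels)
two_3x3 = 2 * conv_params(3, 3, channels, channels)
saving_5x5 = 1 - two_3x3 / full_5x5

# One 3x3 convolution versus an asymmetric 1x3 followed by a 3x1.
full_3x3 = conv_params(3, 3, channels, channels)
asymmetric = conv_params(1, 3, channels, channels) + conv_params(3, 1, channels, channels)
saving_3x3 = 1 - asymmetric / full_3x3

print(f"5x5 -> two 3x3: {saving_5x5:.0%} fewer weights")
print(f"3x3 -> 1x3 + 3x1: {saving_3x3:.0%} fewer weights")
```

Because each weight is applied once per output position, the same ratios carry over to multiply-adds when the spatial size of the feature map is unchanged.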
Conclusions
We have provided several design principles to scale up convolutional networks and studied them in the context of the Inception architecture. This guidance can lead to high performance vision networks that have a relatively modest computation cost compared to simpler, more monolithic architectures. Our highest quality version of Inception-v3 reaches 21.2% top-1 and 5.6% top-5 error for single crop evaluation on the ILSVRC 2012 classification, setting a new state of the art. This is achieved with a relatively modest (2.5×) increase in computational cost compared to the network described in Ioffe et al [7]. Still our solution uses much less computation than the best published results based on denser networks: our model outperforms the results of He et al [6], cutting the top-5 (top-1) error by 25% (14%) relative, respectively, while being six times cheaper computationally and using at least five times fewer parameters (estimated). Our ensemble of four Inception-v3 models with multi-crop evaluation reaches 3.5% top-5 error, which represents an over 25% reduction to the best published results and is almost half of the error of the ILSVRC 2014 winning GoogLeNet ensemble.
We have also demonstrated that high quality results can be reached with receptive field resolution as low as 79×79. This might prove to be helpful in systems for detecting relatively small objects. We have studied how factorizing convolutions and aggressive dimension reductions inside neural networks can result in networks with relatively low computational cost while maintaining high quality. The combination of lower parameter count and additional regularization with batch-normalized auxiliary classifiers and label-smoothing allows for training high quality networks on relatively modest sized training sets.
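The label-smoothing regularization mentioned above can be sketched as follows. This is a minimal illustration, assuming the usual uniform-prior formulation in which the one-hot target is mixed with a uniform distribution over K classes: q'(k) = (1 - ε)·δ(k, y) + ε/K. The function names, the 4-class setup, and the example probabilities are hypothetical and chosen only to keep the numbers readable.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Mix a one-hot target with a uniform distribution over all classes."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == true_class else uniform
        for k in range(num_classes)
    ]


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))


# Hypothetical 4-class example: the model is confident in class 2.
probs = [0.05, 0.05, 0.85, 0.05]
hard_target = [0.0, 0.0, 1.0, 0.0]
soft_target = smooth_labels(true_class=2, num_classes=4, epsilon=0.1)

loss_hard = cross_entropy(hard_target, probs)
loss_soft = cross_entropy(soft_target, probs)

print("smoothed target:", soft_target)
print("loss with hard labels:", loss_hard)
print("loss with smoothed labels:", loss_soft)
```

With smoothed targets, a small share of the target mass sits on the wrong classes, so pushing their predicted probabilities toward zero is penalized. This discourages overconfident predictions.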
Paper
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna.
Rethinking the Inception Architecture for Computer Vision
https://arxiv.org/abs/1512.00567
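Architecture details of the InceptionV2 & InceptionV3 algorithms
DL (InceptionV2/V3): Architecture details of the InceptionV2 & InceptionV3 algorithms
Case applications of the InceptionV2 & InceptionV3 algorithms
TF (DD): Printing the shape of one or all convolutional layers in the Inception model
TF (DD): Generating a basic Deep Dream image with the Inception model + GD algorithm
TF (DD): Generating larger, more refined Deep Dream images with the Inception model + GD algorithm
TF (DD): Generating higher-quality Deep Dream images with the Inception model + GD algorithm
TF (DD): The Inception model + GD algorithm: five architecture design ideas
TF (DD): Generating large, high-quality Deep Dream images with a background using the Inception model + GD algorithm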