An introduction to a deep learning paper on point clouds: PointNet

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
Charles R. Qi*, Hao Su*, Kaichun Mo, Leonidas J. Guibas
Stanford University
[arXiv version] [Code on GitHub]

Applications of PointNet. We propose a novel deep net architecture that consumes raw point cloud (set of points) without voxelization or rendering. It is a unified architecture that learns both global and local point features, providing a simple, efficient and effective approach for a number of 3D recognition tasks.

Abstract

Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. This, however, renders data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input. Our network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing. Though simple, PointNet is highly efficient and effective. Empirically, it shows strong performance on par or even better than state of the art. Theoretically, we provide analysis towards understanding of what the network has learnt and why the network is robust with respect to input perturbation and corruption.

Introduction

In this paper we explore deep learning architectures capable of reasoning about 3D geometric data such as point clouds or meshes. Typical convolutional architectures require highly regular input data formats, like those of image grids or 3D voxels, in order to perform weight sharing and other kernel optimizations. Since point clouds or meshes are not in a regular format, most researchers typically transform such data to regular 3D voxel grids or collections of images (e.g. views) before feeding them to a deep net architecture. This data representation transformation, however, renders the resulting data unnecessarily voluminous, while also introducing quantization artifacts that can obscure natural invariances of the data. For this reason we focus on a different input representation for 3D geometry using simply point clouds, and name our resulting deep nets PointNets. Point clouds are simple and unified structures that avoid the combinatorial irregularities and complexities of meshes, and thus are easier to learn from. The PointNet, however, still has to respect the fact that a point cloud is just a set of points and therefore invariant to permutations of its members, necessitating certain symmetrizations in the net computation. Further invariances to rigid motions also need to be considered.

PointNet Architecture

To deal with the unordered input set, the key to our approach is the use of a single symmetric function, max pooling.
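To make the role of the symmetric function concrete, here is a minimal sketch of the core idea (my own illustration, not the authors' released code; layer sizes and names are assumptions): a shared per-point network h is applied to every point independently, and max pooling over the point dimension yields a global feature that does not depend on the input order.

```python
import torch
import torch.nn as nn

class SharedMLP(nn.Module):
    """Applies the same MLP h(.) to every point independently (weights shared across points)."""
    def __init__(self, in_dim=3, out_dim=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, out_dim), nn.ReLU(),
        )

    def forward(self, points):            # points: (batch, n_points, 3)
        return self.net(points)           # (batch, n_points, out_dim)

def global_feature(points, shared_mlp):
    """Order-invariant global descriptor: max pooling is a symmetric function,
    so permuting the input points leaves the result unchanged."""
    per_point = shared_mlp(points)        # (batch, n_points, out_dim)
    return per_point.max(dim=1).values    # (batch, out_dim)

# Quick permutation-invariance check on random data.
if __name__ == "__main__":
    mlp = SharedMLP()
    pts = torch.randn(2, 100, 3)
    perm = torch.randperm(100)
    assert torch.allclose(global_feature(pts, mlp),
                          global_feature(pts[:, perm], mlp), atol=1e-5)
```

Because max is symmetric, shuffling the points cannot change the pooled feature, which is exactly the permutation invariance the abstract refers to.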
Effectively the network learns a set of optimization functions/criteria that select interesting or informative points of the point cloud and encode the reason for their selection. The final fully connected layers of the network aggregate these learnt optimal values into the global descriptor for the entire shape as mentioned above (shape classification) or are used to predict per point labels (shape segmentation). Our input format is easy to apply rigid or affine transformations to, as each point transforms independently. Thus we can add a data-dependent spatial transformer network that attempts to canonicalize the data before the PointNet processes them, so as to further improve the results.
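The data-dependent spatial transformer mentioned above is realized in the paper as a small T-Net that regresses a transformation matrix applied to the input points. Below is a hedged sketch of that idea; the layer sizes and the class name InputTransform are my own choices, not taken from the released code.

```python
import torch
import torch.nn as nn

class InputTransform(nn.Module):
    """Mini-network that predicts a 3x3 alignment matrix from the point cloud itself.
    Layer sizes here are illustrative, not the exact ones from the paper."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.regressor = nn.Linear(256, 9)
        # Start as the identity transform so training begins with unchanged inputs.
        nn.init.zeros_(self.regressor.weight)
        with torch.no_grad():
            self.regressor.bias.copy_(torch.eye(3).flatten())

    def forward(self, points):                          # points: (batch, n, 3)
        per_point = self.encoder(points)                # (batch, n, 256)
        pooled = per_point.max(dim=1).values            # symmetric aggregation, (batch, 256)
        matrix = self.regressor(pooled).view(-1, 3, 3)  # (batch, 3, 3)
        return torch.bmm(points, matrix)                # canonicalized points
```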

Figure 2. PointNet architecture. The classification network takes n points as input, applies input and feature transformations, and then aggregates point features by max pooling. The output is classification scores for m classes. The segmentation network is an extension to the classification net. It concatenates global and local features and outputs per point scores. "mlp" stands for multi-layer perceptron; the numbers in brackets are its layer sizes. Batchnorm is used for all layers with ReLU. Dropout layers are used for the last mlp in the classification net.
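Following the caption, the segmentation branch concatenates the global shape feature back onto every per-point (local) feature and scores each point with another shared MLP. The sketch below illustrates that wiring; the feature and class-count dimensions are assumptions, not the exact numbers from Figure 2.

```python
import torch
import torch.nn as nn

class SegmentationHead(nn.Module):
    """Per-point scoring: concatenate each point's local feature with the shape's
    global feature (broadcast to all n points), then apply a shared MLP.
    Feature sizes are assumptions loosely based on Figure 2."""
    def __init__(self, local_dim=64, global_dim=1024, num_part_classes=50):
        super().__init__()
        self.shared_mlp = nn.Sequential(
            nn.Linear(local_dim + global_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_part_classes),
        )

    def forward(self, local_feats, global_feat):
        # local_feats: (batch, n, local_dim); global_feat: (batch, global_dim)
        n = local_feats.shape[1]
        expanded = global_feat.unsqueeze(1).expand(-1, n, -1)   # (batch, n, global_dim)
        combined = torch.cat([local_feats, expanded], dim=-1)   # (batch, n, local+global)
        return self.shared_mlp(combined)                        # per-point class scores
```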

Object Part Segmentation Results

Figure 3. Part Segmentation Results. We visualize the CAD part segmentation results across all 16 object categories. We show results both for partial simulated Kinect scans (left block) and complete ShapeNet CAD models (right block).

Semantic Segmentation Results

Figure 4. Semantic Segmentation Results. Top row is the input point cloud with color. Bottom row is the output semantic segmentation result (on points), displayed from the same camera viewpoint as the input.

Visualizing What PointNet has Learnt

Figure 5. Point function visualization. Our network learns a collection of point functions that select representative/critical points from an input point cloud. Here, we randomly pick 15 point functions from the 1024 functions in our model and visualize their activation regions.
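One plausible way to reproduce this kind of figure, sketched below under my own assumptions (the function name, the threshold and the demo stand-in function are all hypothetical): densely sample points in the cube the shapes are normalized to, evaluate one learnt per-point function on every sample, and keep the samples whose activation exceeds a threshold; the surviving samples form that function's activation region.

```python
import numpy as np

def activation_region(point_fn, threshold=0.5, samples=200_000, seed=0):
    """Approximate the activation region of one learnt per-point function by
    densely sampling the unit cube and keeping highly-activated samples.
    `point_fn` maps an (m, 3) array of points to an (m,) array of activations."""
    rng = np.random.default_rng(seed)
    grid = rng.uniform(-1.0, 1.0, size=(samples, 3))   # points in the cube [-1, 1]^3
    activations = point_fn(grid)
    return grid[activations > threshold]                # points that "light up" this function

# Hypothetical stand-in for one of the 1024 learnt functions: a soft half-space detector.
demo_fn = lambda pts: 1.0 / (1.0 + np.exp(-5.0 * pts[:, 2]))
region = activation_region(demo_fn)
print(region.shape)   # the kept samples could then be plotted as a point cloud
```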

Figure 6. Visualizing Critical Points and Shape Upper-bound. The first row shows the input point clouds. The second row shows the critical points picked by our PointNet. The third row shows the upper-bound shape for the input: any input point set that falls between the critical point set and the upper-bound set will result in the same classification result.

I came across this paper by chance and think it is a nice example of combining deep learning with 3D point clouds, a combination that seems likely to become increasingly common. I am sharing it here mainly for my own learning (due to limited experimental resources I have not run the code myself), so corrections are welcome. As a final note, the sketch below shows how the critical points in Figure 6 can be extracted from a trained model.
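The sketch is my own construction, consistent with how critical points are described above: a point is critical if it attains the maximum in at least one dimension of the per-point features, so removing any non-critical point leaves the max-pooled global descriptor, and hence the classification, unchanged.

```python
import numpy as np

def critical_point_mask(per_point_features):
    """per_point_features: (n_points, feature_dim) array of per-point activations
    (e.g. n x 1024, taken just before max pooling). A point is critical if it
    achieves the maximum of at least one feature dimension; dropping the other
    points cannot change the max-pooled global feature."""
    winners = per_point_features.argmax(axis=0)          # index of the max point per dimension
    mask = np.zeros(per_point_features.shape[0], dtype=bool)
    mask[np.unique(winners)] = True
    return mask

# Example with random activations standing in for a real network's output.
feats = np.random.rand(1024, 1024)            # 1024 points, 1024-dim per-point features
critical = critical_point_mask(feats)
print(critical.sum(), "critical points out of", feats.shape[0])
```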
