[Part 16] GoogLeNet Inception V1
Some Thoughts
Deeper and wider networks mean more parameters, which greatly increases the risk of overfitting, especially when training data is scarce or some labels are under-represented. They also consume more computation: in practice, whether due to data sparsity or to under-utilization of the enlarged structure (e.g., many weights close to 0), a lot of that computation is wasted.
A brief explanation of sparsity. When the overall feature space is non-linear or even discontinuous:
Learning feature sets that fit local regions of the space improves performance, much as a Maxout network fits a non-linear function with a combination of locally linear pieces. Suppose the feature space consists of N discontinuous local feature subspaces, and every sample is mapped into these N subspaces, activating (or not) the corresponding feature dimensions. Let C1 denote the set of feature dimensions activated by one class of samples and C2 that of another class. When the data volume is limited, improving feature discriminability so the two classes separate well means reducing the overlap between C1 and C2 (measurable, for instance, with the Jaccard distance), i.e., shrinking C1 and C2, which means the corresponding feature dimension sets become sparse. The small sketch below illustrates the overlap measure.
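A minimal sketch (illustrative only) of measuring the overlap of the two activated feature-dimension sets with the Jaccard distance; the sets here are made-up examples, not taken from any real network:

def jaccard_distance(c1, c2):
    """1 - |C1 ∩ C2| / |C1 ∪ C2|; larger means less overlap, i.e. more separable."""
    c1, c2 = set(c1), set(c2)
    return 1.0 - len(c1 & c2) / len(c1 | c2)

C1 = {0, 3, 7, 12}       # feature dimensions activated by class 1 (hypothetical)
C2 = {3, 7, 21, 33, 40}  # feature dimensions activated by class 2 (hypothetical)
print(jaccard_distance(C1, C2))  # 0.714...: the sparser and more disjoint the sets, the closer to 1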
If the feature spaces extracted by convolution kernels of different sizes are viewed as sub-feature-spaces, each sub-space is sparse, and fusing these multi-scale features yields a relatively dense combined space. 1×1, 3×3 and 5×5 kernels are used (not mandatory; other sizes would also work) with stride 1, so padding makes it easy to align the output feature dimensions. Plenty of evidence shows that pooling layers improve the performance of convolutional networks, so a max-pooling path is added as well. The structure matches intuition: visual information is transformed at multiple scales and then aggregated as features for the next stage, just as cues about a person's height, build and age are aggregated before the next judgment. The shape sketch below shows the alignment.
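A minimal Keras sketch (assumed 28×28×192 input) showing that with stride 1 and 'same' padding, 1×1/3×3/5×5 convolutions and 3×3 max pooling all keep the 28×28 spatial size, so their outputs can be concatenated along the depth axis:

from keras.layers import Input, Conv2D, MaxPooling2D, concatenate
from keras.models import Model

x = Input(shape=(28, 28, 192))
b1 = Conv2D(64, (1, 1), strides=1, padding='same', activation='relu')(x)
b2 = Conv2D(128, (3, 3), strides=1, padding='same', activation='relu')(x)
b3 = Conv2D(32, (5, 5), strides=1, padding='same', activation='relu')(x)
b4 = MaxPooling2D((3, 3), strides=1, padding='same')(x)
y = concatenate([b1, b2, b3, b4])  # depth: 64 + 128 + 32 + 192 = 416
print(Model(x, y).output_shape)    # (None, 28, 28, 416)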
Passing a 28×28×192 input directly through 96 5×5 convolutions (stride=1, padding=2) yields a 28×28×96 output with 192×5×5×96 = 460,800 convolution parameters. Borrowing from the NIN network, first applying 32 1×1 convolutions for dimension reduction gives 28×28×32; after the 96 5×5 convolutions (stride=1, padding=2) the output is still 28×28×96, but the total number of convolution parameters drops to 192×1×1×32 + 32×5×5×96 = 82,944, about 1/5.5 of the original, with hardly any loss in accuracy. The arithmetic is checked below.
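A quick sanity check of the parameter-count arithmetic above (weights only, biases ignored):

def conv_params(c_in, k, c_out):
    return c_in * k * k * c_out

direct = conv_params(192, 5, 96)                            # 460800
reduced = conv_params(192, 1, 32) + conv_params(32, 5, 96)  # 6144 + 76800 = 82944
print(direct, reduced, direct / reduced)                    # 460800 82944 ~5.56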
The new network structure is:
(Figure: GoogLeNet network structure)
All convolution layers use the ReLU activation, including the activations after the 1×1 dimension-reduction convolutions.
The fully connected layers are removed in favor of Global Average Pooling, as in NIN, which raises Top-1 accuracy by 0.6%; since the GAP output is tied to the number of classes, a final fully connected layer is still kept to make fine-tuning convenient.
As with ResNet discussed earlier in this series, experiments showed that relatively shallow layers contribute substantially to model quality, so during training two extra classifiers are attached at Inception (4a) and (4d) to strengthen the gradient signal in back-propagation; their most important effect, however, is regularization, a point experimentally confirmed in GoogLeNet v3, which also indirectly confirms the regularizing role of BN in GoogLeNet v2. The losses of these two classifiers are added to the overall loss with a weight of 0.3, and both are removed at inference time; a minimal sketch of wiring such loss weights appears below.
The 1×1 dimension-reduction convolutions in these auxiliary classifiers use 128 kernels.
Their fully connected layer uses 1024 neurons.
A Dropout layer with drop probability 0.7 is applied.
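A minimal sketch (a toy three-output model, not the real GoogLeNet) of how the 0.3 auxiliary loss weights can be wired up in Keras via loss_weights:

from keras.layers import Input, Dense
from keras.models import Model

x = Input(shape=(8,))
h = Dense(16, activation='relu')(x)
aux1 = Dense(10, activation='softmax', name='loss1')(h)
aux2 = Dense(10, activation='softmax', name='loss2')(h)
main = Dense(10, activation='softmax', name='loss3')(h)
model = Model(inputs=x, outputs=[aux1, aux2, main])
model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              loss_weights=[0.3, 0.3, 1.0],  # auxiliary losses weighted 0.3, main loss 1.0
              metrics=['accuracy'])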
The input is a 224×224×3 RGB image. In the figure, "S" means same-padding and "V" means none (valid).
C1 convolution layer: 64 7×7 kernels (stride=2, padding=3), output 112×112×64.
P1 pooling layer: 3×3 max pooling (stride=2), output 56×56×64, where 56 = (112 + 2×1 − 3)//2 + 1 (the code below zero-pads by 1 pixel before a 'valid' pool; equivalently, ceil-mode pooling).
C2 convolution layer: 192 3×3 kernels (stride=1, padding=1), output 56×56×192.
P2 pooling layer: 3×3 max pooling (stride=2), output 28×28×192, where 28 = (56 + 2×1 − 3)//2 + 1. The data then splits into 4 branches and enters Inception (3a).
Inception (3a) consists of 4 parts:
64 1×1 kernels, output 28×28×64;
96 1×1 kernels for dimension reduction, output 28×28×96, followed by 128 3×3 kernels (stride=1, padding=1), output 28×28×128;
16 1×1 kernels for dimension reduction, output 28×28×16, followed by 32 5×5 kernels (stride=1, padding=2), output 28×28×32;
3×3 max pooling (stride=1, padding=1), output 28×28×192, followed by 32 1×1 kernels, output 28×28×32.
Finally the 4 branch outputs are concatenated along the depth dimension, giving 28×28×256 (64+128+32+32); the data then splits into 4 branches again and enters Inception (3b).
Inception (3b) consists of 4 parts:
128 1×1 kernels, output 28×28×128;
128 1×1 kernels for dimension reduction, output 28×28×128, followed by 192 3×3 kernels (stride=1, padding=1), output 28×28×192;
32 1×1 kernels for dimension reduction, output 28×28×32, followed by 96 5×5 kernels (stride=1, padding=2), output 28×28×96;
3×3 max pooling (stride=1, padding=1), output 28×28×256, followed by 64 1×1 kernels, output 28×28×64.
Finally the 4 branch outputs are concatenated along the depth dimension, giving 28×28×480 (128+192+96+64).
The remaining layers follow the same pattern. The small helper below reproduces these size computations.
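A small helper reproducing the output-size computations above; it matches the ZeroPadding2D + 'valid' convolution/pooling pattern used in the code below:

def out_size(size, kernel, stride=1, padding=0):
    return (size + 2 * padding - kernel) // stride + 1

print(out_size(224, 7, stride=2, padding=3))  # 112 (C1)
print(out_size(112, 3, stride=2, padding=1))  # 56  (P1)
print(out_size(56, 3, stride=2, padding=1))   # 28  (P2)
print(64 + 128 + 32 + 32)                     # 256 (Inception 3a concat depth)
print(128 + 192 + 96 + 64)                    # 480 (Inception 3b concat depth)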
Code in Practice
googlenet_inception_v1.py
# -*- coding: utf-8 -*-
from keras.layers import Input, Conv2D, Dense, MaxPooling2D, AveragePooling2D
from keras.layers import Dropout, Flatten, ZeroPadding2D, Reshape, Activation
from keras.layers.merge import concatenate
from keras.models import Model
from keras.regularizers import l1_l2
import tensorflow as tf
import googlenet_custom_layers
def inception_module(name,
input_layer,
num_c_1x1,
num_c_1x1_3x3_reduce,
num_c_3x3,
num_c_1x1_5x5_reduce,
num_p_5x5,
num_c_1x1_reduce):
inception_1x1 = Conv2D(name=name+"/inception_1x1",
filters=num_c_1x1,
kernel_size=(1, 1),
strides=(1, 1),
padding='same',
kernel_initializer='he_normal',
activation='relu',
kernel_regularizer=l1_l2(0.0001))(input_layer)
inception_3x3_reduce = Conv2D(name=name+"/inception_3x3_reduce",
filters=num_c_1x1_3x3_reduce,
kernel_size=(1, 1),
strides=(1, 1),
padding='same',
kernel_initializer='he_normal',
activation='relu',
kernel_regularizer=l1_l2(0.0001))(input_layer)
inception_3x3 = Conv2D(name=name+"/inception_3x3",
filters=num_c_3x3,
kernel_size=(3, 3),
strides=(1, 1),
padding='same',
kernel_initializer='he_normal',
activation='relu',
kernel_regularizer=l1_l2(0.0001))(inception_3x3_reduce)
inception_5x5_reduce = Conv2D(name=name+"/inception_5x5_reduce",
filters=num_c_1x1_5x5_reduce,
kernel_size=(1, 1),
strides=(1, 1),
padding='same',
kernel_initializer='he_normal',
activation='relu',
kernel_regularizer=l1_l2(0.0001))(input_layer)
inception_5x5 = Conv2D(name=name+"/inception_5x5",
filters=num_p_5x5,
kernel_size=(5, 5),
strides=(1, 1),
padding='same',
kernel_initializer='he_normal',
activation='relu',
kernel_regularizer=l1_l2(0.0001))(inception_5x5_reduce)
inception_max_pool = MaxPooling2D(name=name+"/inception_max_pool",
pool_size=(3, 3),
strides=(1, 1),
padding="same")(input_layer)
inception_max_pool_proj = Conv2D(name=name+"/inception_max_pool_project",
filters=num_c_1x1_reduce,
kernel_size=(1, 1),
strides=(1, 1),
padding='same',
kernel_initializer='he_normal',
activation='relu',
kernel_regularizer=l1_l2(0.0001))(inception_max_pool)
print (inception_1x1.get_shape(), inception_3x3.get_shape(), inception_5x5.get_shape(), inception_max_pool_proj.get_shape())
# inception_output = tf.concat(3, [inception_1x1, inception_3x3, inception_5x5, inception_max_pool_proj])
# Note: TensorFlow changed the argument order of tf.concat between versions, so mind your tf and keras versions.
# If needed, change line 1554 of /usr/lib/python×××/site-packages/keras/backend/tensorflow_backend.py from
# return tf.concat([to_dense(x) for x in tensors], axis) to:
# return tf.concat(axis, [to_dense(x) for x in tensors])
inception_output = concatenate([inception_1x1, inception_3x3, inception_5x5, inception_max_pool_proj])
return inception_output
def googLeNet_inception_v1_building(input_shape, output_num, fine_tune=None):
input_layer = Input(shape=input_shape)
# Layer 1: convolution
conv1_7x7 = Conv2D(name="conv1_7x7/2",
filters=64,
kernel_size=(7, 7),
strides=(2, 2),
padding='same',
kernel_initializer='he_normal',
activation='relu',
kernel_regularizer=l1_l2(0.0001))(input_layer)
conv1_zero_pad = ZeroPadding2D(padding=(1, 1))(conv1_7x7)
# Layer 2: max pooling
pool1_3x3 = MaxPooling2D(name="max_pool1_3x3/2",
pool_size=(3, 3),
strides=(2, 2),
padding='valid')(conv1_zero_pad)
# Layer 3: LRN normalization
#pool1_norm1 = tf.nn.lrn(pool1_3x3, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='ax_pool1_3x3/norm1')
pool1_norm1 = googlenet_custom_layers.LRN2D(name='max_pool1_3x3/norm1')(pool1_3x3)
# Layer 4: 1x1 convolution for dimension reduction
conv2_3x3_reduce = Conv2D(name="conv2_3x3_reduce/1",
filters=64,
kernel_size=(1, 1),
padding='same',
kernel_initializer='he_normal',
activation='relu',
kernel_regularizer=l1_l2(0.0001))(pool1_norm1)
# Layer 5: convolution
conv2_3x3 = Conv2D(name="conv2_3x3/1",
filters=192,
kernel_size=(3, 3),
padding='same',
kernel_initializer='he_normal',
activation='relu',
kernel_regularizer=l1_l2(0.0001))(conv2_3x3_reduce)
# Layer 6: LRN normalization
#conv2_norm2 = tf.nn.lrn(conv2_3x3, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='conv2_3x3/norm2')
conv2_norm2 = googlenet_custom_layers.LRN2D(name='conv2_3x3/norm2')(conv2_3x3)
conv2_zero_pad = ZeroPadding2D(padding=(1, 1))(conv2_norm2)
# Layer 7: max pooling
pool2_3x3 = MaxPooling2D(name="max_pool2_3x3",
pool_size=(3, 3),
strides=(2, 2),
padding='valid')(conv2_zero_pad)
# Layer 8: inception 3a
inception_3a = inception_module("inception_3a",pool2_3x3, 64, 96, 128, 16, 32, 32)
# Layer 9: inception 3b
inception_3b = inception_module("inception_3b",inception_3a, 128, 128, 192, 32, 96, 64)
inception_3b_zero_pad = ZeroPadding2D(padding=(1, 1))(inception_3b)
# Layer 10: max pooling
pool3_3x3 = MaxPooling2D(name="max_pool3_3x3/2",
pool_size=(3, 3),
strides=(2, 2),
padding='valid')(inception_3b_zero_pad)
# Layer 11: inception 4a
inception_4a = inception_module("inception_4a",pool3_3x3, 192, 96, 208, 16, 48, 64)
# Layer 12: auxiliary branch, loss1 classifier
loss1_ave_pool = AveragePooling2D(name="loss1/ave_pool",
pool_size=(5, 5),
strides=(3, 3))(inception_4a)
loss1_conv = Conv2D(name="loss1/conv",
filters=128,
kernel_size=(1, 1),
padding='same',
kernel_initializer='he_normal',
activation='relu',
kernel_regularizer=l1_l2(0.0001))(loss1_ave_pool)
loss1_flat = Flatten()(loss1_conv)
loss1_fc = Dense(1024,
activation='relu',
name="loss1/fc",
kernel_regularizer=l1_l2(0.0001))(loss1_flat)
loss1_drop_fc = Dropout(0.7)(loss1_fc)
loss1_classifier = Dense(output_num,
name="loss1/classifier",
kernel_regularizer=l1_l2(0.0001))(loss1_drop_fc)
loss1_classifier_act = Activation('softmax')(loss1_classifier)
# Layer 12 (main path): inception_4b
inception_4b = inception_module("inception_4b",inception_4a, 160, 112, 224, 24, 64, 64)
# Layer 13: inception_4c
inception_4c = inception_module("inception_4c",inception_4b, 128, 128, 256, 24, 64, 64)
# Layer 14: inception_4d
inception_4d = inception_module("inception_4d",inception_4c, 112, 144, 288, 32, 64, 64)
# Layer 15: auxiliary branch, loss2 classifier
loss2_ave_pool = AveragePooling2D(pool_size=(5, 5),
strides=(3, 3),
name='loss2/ave_pool')(inception_4d)
loss2_conv = Conv2D(name="loss2/conv",
filters=128,
kernel_size=(1, 1),
padding='same',
kernel_initializer='he_normal',
activation='relu',
kernel_regularizer=l1_l2(0.0001))(loss2_ave_pool)
loss2_flat = Flatten()(loss2_conv)
loss2_fc = Dense(1024,
activation='relu',
name="loss2/fc",
kernel_regularizer=l1_l2(0.0001))(loss2_flat)
loss2_drop_fc = Dropout(0.7)(loss2_fc)
loss2_classifier = Dense(output_num,
name="loss2/classifier",
kernel_regularizer=l1_l2(0.0001))(loss2_drop_fc)
loss2_classifier_act = Activation('softmax')(loss2_classifier)
# Layer 15 (main path): inception_4e
inception_4e = inception_module("inception_4e",inception_4d, 256, 160, 320, 32, 128, 128)
inception_4e_zero_pad = ZeroPadding2D(padding=(1, 1))(inception_4e)
# Layer 16: max pooling
pool4_3x3 = MaxPooling2D(name="max_pool4_3x3",
pool_size=(3, 3),
strides=(2, 2),
padding='valid')(inception_4e_zero_pad)
# Layer 17: inception_5a
inception_5a = inception_module("inception_5a",pool4_3x3, 256, 160, 320, 32, 128, 128)
# Layer 18: inception_5b
inception_5b = inception_module("inception_5b",inception_5a, 384, 192, 384, 48, 128, 128)
# Layer 19: average pooling
pool5_7x7 = AveragePooling2D(name="ave_pool5_7x7",
pool_size=(7, 7),
strides=(1, 1))(inception_5b)
loss3_flat = Flatten()(pool5_7x7)
pool5_drop_7x7 = Dropout(0.4)(loss3_flat)
# Layer 20: fully connected (classifier) layer
loss3_classifier = Dense(output_num,
name="loss3/classifier",
kernel_regularizer=l1_l2(0.0001))(pool5_drop_7x7)
loss3_classifier_act = Activation('softmax')(loss3_classifier)
googlenet_inception_v1 = Model(name="googlenet_inception_v1",
inputs=input_layer,
outputs=[loss1_classifier_act, loss2_classifier_act, loss3_classifier_act])
if fine_tune:
googlenet_inception_v1.load_weights(fine_tune)
return googlenet_inception_v1
googlenet_custom_layers.py
from keras.layers.core import Layer
import keras.backend as K
class LRN2D(Layer):
"""
This code is adapted from pylearn2.
License at: https://github.com/lisa-lab/pylearn2/blob/master/LICENSE.txt
"""
def __init__(self, alpha=1e-4, k=2, beta=0.75, n=5, **kwargs):
if n % 2 == 0:
raise NotImplementedError("LRN2D only works with odd n. n provided: " + str(n))
super(LRN2D, self).__init__(**kwargs)
self.alpha = alpha
self.k = k
self.beta = beta
self.n = n
def call(self, x, mask=None):
# The original pylearn2-style implementation assumed channels-first data and the
# old Keras 0.x get_output() API; with a TensorFlow backend and channels-last data,
# TF's built-in LRN op computes the same normalization:
# output = x / (k + alpha * sum of squares over n adjacent channels) ** beta
import tensorflow as tf
return tf.nn.local_response_normalization(x,
depth_radius=self.n // 2,
bias=self.k,
alpha=self.alpha,
beta=self.beta)
def get_config(self):
config = {"name": self.__class__.__name__,
"alpha": self.alpha,
"k": self.k,
"beta": self.beta,
"n": self.n}
base_config = super(LRN2D, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
class PoolHelper(Layer):
def __init__(self, **kwargs):
super(PoolHelper, self).__init__(**kwargs)
def call(self, x, mask=None):
# Crops the first row and column (assumes channels-first layout); unused in this script.
return x[:, :, 1:, 1:]
def get_config(self):
config = {}
base_config = super(PoolHelper, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
googlenet_inception_v1-cifar10.py
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import os
from keras.datasets import cifar10
from keras.utils import np_utils
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint
from keras import backend as K
import tensorflow as tf
# Compatibility hack: some old Keras releases reference tf.python.control_flow_ops, which newer TensorFlow removed
tf.python.control_flow_ops = tf
from keras.callbacks import ReduceLROnPlateau, CSVLogger, EarlyStopping
lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=np.sqrt(0.5), cooldown=0, patience=3, min_lr=1e-6)
early_stopper = EarlyStopping(monitor='val_acc', min_delta=0.0005, patience=15)
csv_logger = CSVLogger('googlenet_inception_v1_cifar10.csv')
import googlenet_inception_v1
if __name__ == "__main__":
from keras.utils.vis_utils import plot_model
# Expose only GPU 4 (it then appears to TensorFlow as gpu:0) and let the session
# grow GPU memory as needed; register the session with Keras so all layers use it.
os.environ["CUDA_VISIBLE_DEVICES"] = "4"
with tf.device('/gpu:0'):
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
K.set_session(tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
log_device_placement=True,
gpu_options=gpu_options)))
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
# Define the input data and normalize it
dim = 32
channel = 3
class_num = 10
X_train = X_train.reshape(X_train.shape[0], dim, dim, channel).astype('float32') / 255
X_test = X_test.reshape(X_test.shape[0], dim, dim, channel).astype('float32') / 255
Y_train = np_utils.to_categorical(y_train, class_num)
Y_test = np_utils.to_categorical(y_test, class_num)
# this will do preprocessing and realtime data augmentation
datagen = ImageDataGenerator(
featurewise_center=False, # set input mean to 0 over the dataset
samplewise_center=False, # set each sample mean to 0
featurewise_std_normalization=False, # divide inputs by std of the dataset
samplewise_std_normalization=False, # divide each input by its std
zca_whitening=False, # apply ZCA whitening
rotation_range=25, # randomly rotate images in the range (degrees, 0 to 180)
width_shift_range=0.1, # randomly shift images horizontally (fraction of total width)
height_shift_range=0.1, # randomly shift images vertically (fraction of total height)
horizontal_flip=True, # randomly flip images
vertical_flip=False) # randomly flip images
datagen.fit(X_train)
s = X_train.shape[1:]
print(s)
model = googlenet_inception_v1.googLeNet_inception_v1_building(s,class_num)
model.summary()
#import pdb
#pdb.set_trace()
plot_model(model, to_file="GoogLeNet-Inception-V1.jpg", show_shapes=True)
model.compile(loss='categorical_crossentropy',
optimizer='adadelta',
loss_weights=[0.3, 0.3, 1.0],  # weight the two auxiliary losses by 0.3, as described above
metrics=['accuracy'])
batch_size = 32
nb_epoch = 100
# import pdb
# pdb.set_trace()
checkpoint = ModelCheckpoint("weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5", monitor='val_loss', verbose=0,
save_best_only=False, save_weights_only=False, mode='auto')
# Note: checkpoint (like lr_reducer, early_stopper and csv_logger above) only takes
# effect when passed via callbacks= to fit/fit_generator; the manual loop below does not use it.
for e in range(nb_epoch):
batches = 0
for X_batch, Y_batch in datagen.flow(X_train, Y_train, batch_size=batch_size):
loss = model.train_on_batch(X_batch, [Y_batch,Y_batch,Y_batch]) # note the three outputs
print(loss)
#loss_and_metrics = model.evaluate(X_test, [Y_test,Y_test,Y_test], batch_size=128)
#model.fit(X_test, [Y_test,Y_test,Y_test], batch_size=64)
batches += 1
if batches >= len(X_train) // batch_size:
# we need to break the loop by hand because
# the generator loops indefinitely
break
score = model.evaluate(X_test, [Y_test, Y_test, Y_test], verbose=0)
print('Test score:', score[0])      # total (weighted) loss
print('Test accuracy:', score[-1])  # accuracy of the main (loss3) classifier
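For reference, the manual loop above can also be written with fit_generator, wrapping datagen.flow so each batch carries the three identical targets the three classifiers expect; this also makes the callbacks defined earlier usable. A sketch (mine, not from the original script; with multiple outputs the metric names become per-output, so callbacks monitoring plain 'val_acc' would need their monitor renamed):

def three_target_flow(flow):
    # Duplicate each label batch for the two auxiliary classifiers plus the main one.
    for X_batch, Y_batch in flow:
        yield X_batch, [Y_batch, Y_batch, Y_batch]

model.fit_generator(three_target_flow(datagen.flow(X_train, Y_train, batch_size=batch_size)),
                    steps_per_epoch=len(X_train) // batch_size,
                    epochs=nb_epoch,
                    validation_data=(X_test, [Y_test, Y_test, Y_test]),
                    callbacks=[lr_reducer, csv_logger])  # these monitor 'val_loss', which still exists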