[Series 17] GoogLeNet Inception V2
Batch Normalization (BN) brings several practical benefits: it allows a large learning rate without much concern for optimization problems such as exploding or vanishing gradients; it reduces the model's dependence on the initial weights; it accelerates convergence, and to some extent can replace Dropout (which slows convergence) while still acting as a regularizer that improves generalization; it alleviates activation-function saturation even without ReLU; and it lets each layer learn the scaling (variance) and shift (mean) of the distribution it passes to the next layer.
Some Thoughts
Assumption: let $y$ be the sample label, and let $\{h_1, h_2, \dots, h_L\}$ be the inputs to each layer as a sample $x$ passes through the network.
In theory: the joint distribution $p(x, y)$ should agree with the joint distribution formed with any layer's input, e.g. $p(x, y) = p(h_i, y)$.
In practice: $p(x, y) = p(y \mid x)\,p(x)$, and the conditional part is indeed consistent, i.e. $p(y \mid x) = p(y \mid h_i)$; but because every layer changes the distribution of its input, the marginals disagree, i.e. $p(x) \neq p(h_i)$. Worse, as the network gets deeper, a tiny change in an early layer can cause a huge change in the later layers.
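This drift is easy to observe directly: push a mini-batch through a few randomly initialized layers and watch the per-layer activation statistics change. The NumPy sketch below is mine, not the author's (the 0.5 weight scale is deliberately chosen to make the effect visible):

import numpy as np

rng = np.random.RandomState(0)
h = rng.randn(256, 100)  # a mini-batch of 256 samples with 100 features

for layer in range(1, 6):
    W = rng.randn(100, 100) * 0.5 / np.sqrt(100)  # slightly under-scaled weights
    h = np.tanh(np.dot(h, W))  # this is the input distribution seen by the next layer
    print("layer %d: mean=%+.4f, std=%.4f" % (layer, h.mean(), h.std()))

The printed standard deviation roughly halves at every layer, i.e. $p(h_i)$ keeps changing even though the data $x$ never did.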
The Principle of BN
Training proceeds in mini-batches: for the $m$ samples in a batch we compute the mean and variance and whiten the training data. Whitening removes correlations between features and rescales the data onto a sphere, which both speeds up the optimizer and can improve its accuracy (the original figure gave the intuition: the left panel showed the elongated feasible region before whitening, the right panel the rounded region after whitening). BN then adds a learnable scale and shift so that, when the raw input is more useful for learning, the layer can recover it (faintly reminiscent of residual networks):

$$\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}] + \epsilon}}, \qquad y^{(k)} = \gamma^{(k)} \hat{x}^{(k)} + \beta^{(k)},$$

where setting $\gamma^{(k)} = \sqrt{\mathrm{Var}[x^{(k)}]}$ and $\beta^{(k)} = \mathrm{E}[x^{(k)}]$ recovers the original activations.
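As a concrete reference, here is a minimal NumPy sketch of that batch transform (the function name batch_norm is my own; the formula follows the normalization and scale/shift above):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # normalize each feature over the mini-batch, then scale and shift
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 4) * 3.0 + 5.0  # a shifted, stretched mini-batch
# with gamma = std and beta = mean, BN reproduces its input (up to eps):
y = batch_norm(x, gamma=x.std(axis=0), beta=x.mean(axis=0))
print(np.allclose(y, x, atol=1e-3))  # True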
Moreover, with a large learning rate the traditional approach can land in the saturation regions of the activation function, making gradients explode or vanish during backpropagation. With BN, the scale of the parameters no longer affects the backpropagated gradient; for a scalar $a$ it can be shown that

$$\mathrm{BN}(Wu) = \mathrm{BN}((aW)u), \qquad \frac{\partial\, \mathrm{BN}((aW)u)}{\partial u} = \frac{\partial\, \mathrm{BN}(Wu)}{\partial u}, \qquad \frac{\partial\, \mathrm{BN}((aW)u)}{\partial (aW)} = \frac{1}{a} \cdot \frac{\partial\, \mathrm{BN}(Wu)}{\partial W},$$

so larger weights actually receive smaller gradients, which stabilizes parameter growth.
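The first identity is easy to check numerically: scaling $W$ by any factor $a$ is absorbed by the batch mean and variance. A small sanity check (mine, with the learnable scale/shift dropped for brevity):

import numpy as np

rng = np.random.RandomState(1)
u = rng.randn(64, 10)  # a mini-batch of layer inputs
W = rng.randn(10, 10)
a = 100.0  # an arbitrary rescaling of the weights

def normalize(z, eps=1e-5):  # BN without gamma/beta
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

print(np.allclose(normalize(np.dot(u, W)), normalize(np.dot(u, a * W)), atol=1e-4))  # True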
BN in Convolutional Neural Networks
For convolutional layers, BN normalizes per feature map rather than per activation: the mean and variance are computed jointly over the mini-batch and all spatial positions of a channel, and each channel learns a single pair $(\gamma, \beta)$, so the convolutional (translation-invariant) structure of the layer is preserved.
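A hypothetical NumPy version of this per-channel variant, for NHWC feature maps, might look like:

import numpy as np

def conv_batch_norm(x, gamma, beta, eps=1e-5):
    # statistics over batch and spatial axes: one mean/variance per channel
    mu = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta  # gamma, beta: one pair per channel, shape (C,)

x = np.random.randn(8, 32, 32, 16)  # batch of 8 images, 32x32, 16 channels
y = conv_batch_norm(x, gamma=np.ones(16), beta=np.zeros(16))
print(y.mean(axis=(0, 1, 2)).round(6))  # ~0 for every channel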
Code Practice
import copy
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.pyplot import plot,savefig
from keras.datasets import mnist, cifar10
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten, Reshape
from keras.optimizers import SGD, RMSprop
from keras.utils import np_utils
from keras.regularizers import l2
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D, AveragePooling2D
from keras.callbacks import EarlyStopping
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.normalization import BatchNormalization
import tensorflow as tf
tf.python.control_flow_ops = tf  # compatibility workaround for old Keras with this TensorFlow version
from PIL import Image
def build_LeNet5():
    model = Sequential()
    model.add(Convolution2D(96, 11, 11, border_mode='same', input_shape=(32, 32, 3), dim_ordering='tf'))
    # (1) model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # (2) model.add(BatchNormalization())
    model.add(Activation("tanh"))
    model.add(Convolution2D(120, 1, 1, border_mode='valid'))
    # (3) model.add(BatchNormalization())
    model.add(Flatten())
    model.add(Dense(10))
    model.add(BatchNormalization())
    model.add(Activation("relu"))
    # (4) model.add(Dense(10))
    model.add(Activation('softmax'))
    return model
if __name__ == "__main__":
    from keras.utils.vis_utils import plot_model

    model = build_LeNet5()
    model.summary()
    plot_model(model, to_file="LeNet-5.png", show_shapes=True)

    (X_train, y_train), (X_test, y_test) = cifar10.load_data()  # or: mnist.load_data()
    X_train = X_train.reshape(X_train.shape[0], 32, 32, 3).astype('float32') / 255
    X_test = X_test.reshape(X_test.shape[0], 32, 32, 3).astype('float32') / 255
    Y_train = np_utils.to_categorical(y_train, 10)
    Y_test = np_utils.to_categorical(y_test, 10)

    # this will do preprocessing and realtime data augmentation
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=25,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images horizontally
        vertical_flip=False)  # randomly flip images vertically
    # NOTE: datagen is fitted but not used below; to actually train with augmentation,
    # replace model.fit(...) with model.fit_generator(datagen.flow(X_train, Y_train, batch_size), ...)
    datagen.fit(X_train)

    # training
    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])
    batch_size = 32
    nb_epoch = 8
    model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
              verbose=1, validation_data=(X_test, Y_test))

    score = model.evaluate(X_test, Y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])
Experiment groups: Group 1 uncomments all of (1)-(4); Group 2 uncomments only (4); Group 3 comments out every BatchNormalization layer.
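To run the three groups without editing comments by hand, one hypothetical refactor is to make the BN placements parameters of the builder (a sketch only, using the same Keras 1.x-style API as the script above):

from keras.models import Sequential
from keras.layers.core import Dense, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization

def build_variant(bn_spots=(), extra_dense=False, base_bn=True):
    # bn_spots: subset of {1, 2, 3}, the optional BN positions marked above;
    # extra_dense: enables the Dense layer at position (4);
    # base_bn: keeps or drops the always-on BN before the ReLU
    model = Sequential()
    model.add(Convolution2D(96, 11, 11, border_mode='same',
                            input_shape=(32, 32, 3), dim_ordering='tf'))
    if 1 in bn_spots: model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    if 2 in bn_spots: model.add(BatchNormalization())
    model.add(Activation("tanh"))
    model.add(Convolution2D(120, 1, 1, border_mode='valid'))
    if 3 in bn_spots: model.add(BatchNormalization())
    model.add(Flatten())
    model.add(Dense(10))
    if base_bn: model.add(BatchNormalization())
    model.add(Activation("relu"))
    if extra_dense: model.add(Dense(10))
    model.add(Activation('softmax'))
    return model

# Group 1: build_variant({1, 2, 3}, extra_dense=True)
# Group 2: build_variant(extra_dense=True)
# Group 3: build_variant(base_bn=False)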