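3. Deep Neural Networks in TensorFlow
So far we have looked at the LeNet5 CNN architecture. LeNet5 consists of two convolutional layers followed by fully connected layers, so it can be considered a shallow neural network. Back then (1998), GPUs were not yet used for computation and CPUs were far less powerful, so two convolutional layers were already quite innovative.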
Since then, many other convolutional neural network architectures have been designed.
Examples include the well-known AlexNet architecture (2012) developed by Alex Krizhevsky, the 7-layer ZF Net (2013), and the 16-layer VGGNet (2014).
In 2015, Google published a 22-layer CNN built from Inception modules (GoogLeNet), and Microsoft Research Asia built a 152-layer CNN called ResNet.
With what we have learned so far, let us see how to build the AlexNet and VGGNet16 architectures in TensorFlow.
3.1 AlexNet
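Although LeNet5 was the first ConvNet, it is considered a shallow neural network. It performs well on the MNIST dataset, which consists of 28 x 28 grayscale images, but its performance drops when we try to classify larger images with higher resolution and more classes.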
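The first deep CNN arrived in 2012. It is called AlexNet and was created by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. Compared with more recent architectures AlexNet is fairly simple, but it was highly successful at the time: it won the ImageNet competition with a top-5 test error rate of 15.4% (the runner-up had an error of 26.2%) and started a revolution in deep learning and AI worldwide.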
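It consists of 5 convolutional layers, 3 max-pooling layers, 3 fully connected layers, and 2 dropout layers. The overall architecture is as follows: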
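Layer 0: input image of size 224 x 224 x 3.
Layer 1: convolutional layer with 96 filters (filter_depth_1 = 96) of size 11 x 11 (filter_size_1 = 11) and a stride of 4, with a ReLU activation. It is followed by a max-pooling layer and a local response normalization layer.
Layer 2: convolutional layer with 256 filters (filter_depth_2 = 256) of size 5 x 5 (filter_size_2 = 5) and a stride of 1, with a ReLU activation. It is also followed by a max-pooling layer and a local response normalization layer.
Layer 3: convolutional layer with 384 filters (filter_depth_3 = 384) of size 3 x 3 (filter_size_3 = 3) and a stride of 1, with a ReLU activation.
Layer 4: same as layer 3.
Layer 5: convolutional layer with 256 filters (filter_depth_4 = 256) of size 3 x 3 (filter_size_4 = 3) and a stride of 1, with a ReLU activation. It is followed by the third max-pooling layer.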
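Layers 6-8: the convolutional layers are followed by three fully connected layers. Layers 6 and 7 have 4096 neurons each, and layer 8 is the output layer with one unit per class. The original paper classifies a dataset with 1000 classes, but we will use the oxford17 dataset, which has 17 different classes (of flowers).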
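As a rough check of the layer list above, the following sketch tracks the feature-map shape through the convolutional part of the network. It assumes 'SAME' padding (as used in the model code in this section), under which a layer with stride s maps a side length n to ceil(n / s), and it assumes that the third max-pooling layer follows layer 5. The layer table and the helper name `alexnet_shapes` are illustrative and not part of the original code.

```python
import math

# (layer name, stride, output depth); None keeps the depth of the previous layer.
# Assumption: every layer uses 'SAME' padding, so a side of length n
# becomes ceil(n / stride).
ALEXNET_LAYERS = [
    ("conv1", 4, 96), ("pool1", 2, None),
    ("conv2", 1, 256), ("pool2", 2, None),
    ("conv3", 1, 384),
    ("conv4", 1, 384),
    ("conv5", 1, 256), ("pool5", 2, None),
]

def alexnet_shapes(width=224, height=224, depth=3):
    """Return a list of (layer name, (width, height, depth)) tuples."""
    shapes = [("input", (width, height, depth))]
    for name, stride, out_depth in ALEXNET_LAYERS:
        width = math.ceil(width / stride)
        height = math.ceil(height / stride)
        if out_depth is not None:
            depth = out_depth
        shapes.append((name, (width, height, depth)))
    return shapes

for name, shape in alexnet_shapes():
    print(name, shape)
```

Under these assumptions a 224 x 224 x 3 input ends up as a 7 x 7 x 256 feature map, so the first fully connected layer would see 7 * 7 * 256 = 12544 inputs.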
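Note that this CNN (or any other deep CNN) cannot be used on the MNIST or CIFAR-10 datasets, because the images in those datasets are too small. As we saw earlier, a pooling layer (or a convolutional layer with a stride of 2) halves the image size. AlexNet has 3 max-pooling layers and one convolutional layer with a stride of 4, so the original image size is reduced by a factor of 2^5 = 32. A 28 x 28 MNIST image would shrink to less than one pixel per side.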
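The size argument can be checked with a few lines of arithmetic. The sketch below uses integer division by 2^5, the same rule the weight-initialization code in this section uses for the first fully connected layer. The function name `reduced_size` and the constant names are assumptions made for illustration.

```python
# Three max-pooling layers each halve the image; the stride-4 convolution
# counts as two further halvings, giving 2**5 = 32 in total.
POOL_REDUCTIONS = 3
CONV_REDUCTIONS = 2

def reduced_size(image_size, reductions=POOL_REDUCTIONS + CONV_REDUCTIONS):
    """Side length left after the given number of halvings (integer division)."""
    return image_size // 2 ** reductions

for name, size in [("oxflower17", 224), ("CIFAR-10", 32), ("MNIST", 28)]:
    print(name, size, "->", reduced_size(size))
```

With this rule a 224-pixel side is reduced to 7 pixels, a 32-pixel CIFAR-10 side to a single pixel, and a 28-pixel MNIST side to 0, which leaves nothing for the fully connected layers to work with.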
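We therefore need to load a dataset with larger images, preferably 224 x 224 x 3 (as in the original paper). The 17-category flower dataset, also known as oxflower17, is ideal because it contains images of exactly this size: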
import tflearn.datasets.oxflower17 as oxflower17

# Image dimensions and number of classes of the oxflower17 dataset
ox17_image_width = 224
ox17_image_height = 224
ox17_image_depth = 3
ox17_num_labels = 17

# Load the images together with one-hot encoded labels
train_dataset_, train_labels_ = oxflower17.load_data(one_hot=True)
# Use the first 1000 images for training and the remaining ones for testing
train_dataset_ox17, train_labels_ox17 = train_dataset_[:1000,:,:,:], train_labels_[:1000,:]
test_dataset_ox17, test_labels_ox17 = train_dataset_[1000:,:,:,:], train_labels_[1000:,:]
print('Training set', train_dataset_ox17.shape, train_labels_ox17.shape)
print('Test set', test_dataset_ox17.shape, test_labels_ox17.shape)
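Let us create the weight matrices and the different layers of AlexNet. As we saw before, we need as many weight matrices and bias vectors as there are layers, and the shape of each weight matrix must correspond to the filter size of the layer it belongs to.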
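To make the link between a layer description and its weight shape concrete, the sketch below lists the shapes for the eight AlexNet layers described above and counts their parameters. It assumes that a convolution weight has the shape [filter size, filter size, input depth, output depth], that the feature map entering the first fully connected layer is 7 x 7 x 256, and that there are 17 output classes. The names `ALEXNET_WEIGHT_SHAPES` and `count_parameters` are made up for this illustration.

```python
from math import prod

# Weight shapes derived from the layer list: five convolutional layers
# followed by three fully connected layers.
ALEXNET_WEIGHT_SHAPES = {
    "w1": (11, 11, 3, 96),
    "w2": (5, 5, 96, 256),
    "w3": (3, 3, 256, 384),
    "w4": (3, 3, 384, 384),
    "w5": (3, 3, 384, 256),
    "w6": (7 * 7 * 256, 4096),
    "w7": (4096, 4096),
    "w8": (4096, 17),
}

def count_parameters(weight_shapes):
    """Number of weights plus one bias per output unit, for every layer."""
    counts = {}
    for name, shape in weight_shapes.items():
        counts[name] = prod(shape) + shape[-1]
    return counts

counts = count_parameters(ALEXNET_WEIGHT_SHAPES)
for name, n in counts.items():
    print(name, ALEXNET_WEIGHT_SHAPES[name], n)
print("total", sum(counts.values()))
```

The output makes it easy to see that the first fully connected layer holds most of the parameters, which is why the flattened feature-map size matters so much for memory use.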
import tensorflow as tf  # the code in this section uses the TensorFlow 1.x API

ALEX_PATCH_DEPTH_1, ALEX_PATCH_DEPTH_2, ALEX_PATCH_DEPTH_3, ALEX_PATCH_DEPTH_4 = 96, 256, 384, 256
ALEX_PATCH_SIZE_1, ALEX_PATCH_SIZE_2, ALEX_PATCH_SIZE_3, ALEX_PATCH_SIZE_4 = 11, 5, 3, 3
ALEX_NUM_HIDDEN_1, ALEX_NUM_HIDDEN_2 = 4096, 4096

def variables_alexnet(patch_size1 = ALEX_PATCH_SIZE_1, patch_size2 = ALEX_PATCH_SIZE_2,
                      patch_size3 = ALEX_PATCH_SIZE_3, patch_size4 = ALEX_PATCH_SIZE_4,
                      patch_depth1 = ALEX_PATCH_DEPTH_1, patch_depth2 = ALEX_PATCH_DEPTH_2,
                      patch_depth3 = ALEX_PATCH_DEPTH_3, patch_depth4 = ALEX_PATCH_DEPTH_4,
                      num_hidden1 = ALEX_NUM_HIDDEN_1, num_hidden2 = ALEX_NUM_HIDDEN_2,
                      image_width = 224, image_height = 224, image_depth = 3, num_labels = 17):
    # Convolution weights have the shape [height, width, input depth, output depth]
    w1 = tf.Variable(tf.truncated_normal([patch_size1, patch_size1, image_depth, patch_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([patch_depth1]))
    w2 = tf.Variable(tf.truncated_normal([patch_size2, patch_size2, patch_depth1, patch_depth2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[patch_depth2]))
    w3 = tf.Variable(tf.truncated_normal([patch_size3, patch_size3, patch_depth2, patch_depth3], stddev=0.1))
    b3 = tf.Variable(tf.zeros([patch_depth3]))
    # Layer 4 is the same as layer 3: 3 x 3 filters, 384 -> 384 feature maps
    w4 = tf.Variable(tf.truncated_normal([patch_size3, patch_size3, patch_depth3, patch_depth3], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape=[patch_depth3]))
    # Layer 5 maps 384 -> 256 feature maps
    w5 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth3, patch_depth4], stddev=0.1))
    b5 = tf.Variable(tf.zeros([patch_depth4]))

    # Three max-pooling layers halve the image size three times;
    # the stride-4 convolution counts as two more halvings
    pool_reductions = 3
    conv_reductions = 2
    no_reductions = pool_reductions + conv_reductions
    w6 = tf.Variable(tf.truncated_normal([(image_width // 2**no_reductions) * (image_height // 2**no_reductions) * patch_depth4, num_hidden1], stddev=0.1))
    b6 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w7 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b7 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))

    w8 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b8 = tf.Variable(tf.constant(1.0, shape = [num_labels]))

    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5, 'w6': w6, 'w7': w7, 'w8': w8,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5, 'b6': b6, 'b7': b7, 'b8': b8
    }
    return variables

def model_alexnet(data, variables):
    # Layer 1: convolution (stride 4) -> ReLU -> max pooling -> local response normalization
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 4, 4, 1], padding='SAME')
    layer1_relu = tf.nn.relu(layer1_conv + variables['b1'])
    layer1_pool = tf.nn.max_pool(layer1_relu, [1, 3, 3, 1], [1, 2, 2, 1], padding='SAME')
    layer1_norm = tf.nn.local_response_normalization(layer1_pool)

    # Layer 2: convolution (stride 1) -> ReLU -> max pooling -> local response normalization
    layer2_conv = tf.nn.conv2d(layer1_norm, variables['w2'], [1, 1, 1, 1], padding='SAME')
    layer2_relu = tf.nn.relu(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.max_pool(layer2_relu, [1, 3, 3, 1], [1, 2, 2, 1], padding='SAME')
    layer2_norm = tf.nn.local_response_normalization(layer2_pool)

    # Layers 3 and 4: convolution (stride 1) -> ReLU
    layer3_conv = tf.nn.conv2d(layer2_norm, variables['w3'], [1, 1, 1, 1], padding='SAME')
    layer3_relu = tf.nn.relu(layer3_conv + variables['b3'])
    layer4_conv = tf.nn.conv2d(layer3_relu, variables['w4'], [1, 1, 1, 1], padding='SAME')
    layer4_relu = tf.nn.relu(layer4_conv + variables['b4'])

    # Layer 5: convolution (stride 1) -> ReLU -> max pooling -> local response normalization
    layer5_conv = tf.nn.conv2d(layer4_relu, variables['w5'], [1, 1, 1, 1], padding='SAME')
    layer5_relu = tf.nn.relu(layer5_conv + variables['b5'])
    # Pool the output of layer 5 (the original listing pooled layer4_relu by mistake)
    layer5_pool = tf.nn.max_pool(layer5_relu, [1, 3, 3, 1], [1, 2, 2, 1], padding='SAME')
    layer5_norm = tf.nn.local_response_normalization(layer5_pool)

    # flatten_tf_array is not defined in this section; it is expected to reshape
    # the [batch, height, width, depth] tensor to [batch, height * width * depth]
    flat_layer = flatten_tf_array(layer5_norm)

    # Layers 6 and 7: fully connected -> tanh -> dropout
    layer6_fccd = tf.matmul(flat_layer, variables['w6']) + variables['b6']
    layer6_tanh = tf.tanh(layer6_fccd)
    layer6_drop = tf.nn.dropout(layer6_tanh, 0.5)
    layer7_fccd = tf.matmul(layer6_drop, variables['w7']) + variables['b7']
    layer7_tanh = tf.tanh(layer7_fccd)
    layer7_drop = tf.nn.dropout(layer7_tanh, 0.5)

    # Layer 8: output layer with one logit per class
    logits = tf.matmul(layer7_drop, variables['w8']) + variables['b8']
    return logits
We can now modify the CNN model so that it uses the AlexNet weights and layers to classify images.
3.2 VGG Net-16
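VGG Net was created in 2014 by Karen Simonyan and Andrew Zisserman of the University of Oxford. It contains many more layers (16 to 19), but each layer is simpler in design: all convolutional layers use 3 x 3 filters with a stride of 1, and all max-pooling layers have a stride of 2.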
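So it is a deeper CNN, but a simpler one.
It comes in different configurations with 16 or 19 layers. The difference between the two is whether 3 or 4 convolutional layers are used after the second, third, and fourth max-pooling layers.
The 16-layer configuration (configuration D) seems to give the best results, so let us try to create it in TensorFlow.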
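The regular structure of VGG Net can be written down as a short table of blocks, which is a convenient way to double-check the weight shapes used in the TensorFlow code. The sketch below is an illustration in plain Python: the block tables follow configurations D (16 layers) and E (19 layers), and the names `VGG16_BLOCKS`, `VGG19_BLOCKS`, `conv_weight_shapes`, and `flattened_size` are assumptions introduced here, not part of the original code.

```python
# (number of convolutional layers, number of filters) per block;
# each block is followed by one max-pooling layer with stride 2.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
VGG19_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]

def conv_weight_shapes(blocks, image_depth=3, patch_size=3):
    """Shapes [size, size, input depth, output depth] of all conv weights."""
    shapes = []
    in_depth = image_depth
    for num_convs, out_depth in blocks:
        for _ in range(num_convs):
            shapes.append((patch_size, patch_size, in_depth, out_depth))
            in_depth = out_depth
    return shapes

def flattened_size(blocks, image_width=224, image_height=224):
    """Number of inputs to the first fully connected layer."""
    num_pools = len(blocks)      # one max-pooling layer per block
    last_depth = blocks[-1][1]   # filters in the final block
    return (image_width // 2 ** num_pools) * (image_height // 2 ** num_pools) * last_depth

vgg16_shapes = conv_weight_shapes(VGG16_BLOCKS)
print(len(vgg16_shapes), "convolutional layers + 3 fully connected layers")
for shape in vgg16_shapes:
    print(shape)
print("flattened size:", flattened_size(VGG16_BLOCKS))
```

For configuration D this gives 13 convolutional weight shapes, and with three fully connected layers on top the network has 16 weight layers in total. Five pooling steps reduce a 224 x 224 image to 7 x 7, so the first fully connected layer receives 7 * 7 * 512 = 25088 inputs.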
# The VGGNet neural network
VGG16_PATCH_SIZE_1, VGG16_PATCH_SIZE_2, VGG16_PATCH_SIZE_3, VGG16_PATCH_SIZE_4 = 3, 3, 3, 3
VGG16_PATCH_DEPTH_1, VGG16_PATCH_DEPTH_2, VGG16_PATCH_DEPTH_3, VGG16_PATCH_DEPTH_4 = 64, 128, 256, 512
VGG16_NUM_HIDDEN_1, VGG16_NUM_HIDDEN_2 = 4096, 1000

def variables_vggnet16(patch_size1 = VGG16_PATCH_SIZE_1, patch_size2 = VGG16_PATCH_SIZE_2,
                       patch_size3 = VGG16_PATCH_SIZE_3, patch_size4 = VGG16_PATCH_SIZE_4,
                       patch_depth1 = VGG16_PATCH_DEPTH_1, patch_depth2 = VGG16_PATCH_DEPTH_2,
                       patch_depth3 = VGG16_PATCH_DEPTH_3, patch_depth4 = VGG16_PATCH_DEPTH_4,
                       num_hidden1 = VGG16_NUM_HIDDEN_1, num_hidden2 = VGG16_NUM_HIDDEN_2,
                       image_width = 224, image_height = 224, image_depth = 3, num_labels = 17):
    # Block 1: two convolutional layers with 64 filters
    w1 = tf.Variable(tf.truncated_normal([patch_size1, patch_size1, image_depth, patch_depth1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([patch_depth1]))
    w2 = tf.Variable(tf.truncated_normal([patch_size1, patch_size1, patch_depth1, patch_depth1], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[patch_depth1]))

    # Block 2: two convolutional layers with 128 filters
    w3 = tf.Variable(tf.truncated_normal([patch_size2, patch_size2, patch_depth1, patch_depth2], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape = [patch_depth2]))
    w4 = tf.Variable(tf.truncated_normal([patch_size2, patch_size2, patch_depth2, patch_depth2], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape = [patch_depth2]))

    # Block 3: three convolutional layers with 256 filters
    w5 = tf.Variable(tf.truncated_normal([patch_size3, patch_size3, patch_depth2, patch_depth3], stddev=0.1))
    b5 = tf.Variable(tf.constant(1.0, shape = [patch_depth3]))
    w6 = tf.Variable(tf.truncated_normal([patch_size3, patch_size3, patch_depth3, patch_depth3], stddev=0.1))
    b6 = tf.Variable(tf.constant(1.0, shape = [patch_depth3]))
    w7 = tf.Variable(tf.truncated_normal([patch_size3, patch_size3, patch_depth3, patch_depth3], stddev=0.1))
    b7 = tf.Variable(tf.constant(1.0, shape=[patch_depth3]))

    # Block 4: three convolutional layers with 512 filters
    w8 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth3, patch_depth4], stddev=0.1))
    b8 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))
    w9 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b9 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))
    w10 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b10 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))

    # Block 5: three convolutional layers with 512 filters
    w11 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b11 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))
    w12 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b12 = tf.Variable(tf.constant(1.0, shape=[patch_depth4]))
    w13 = tf.Variable(tf.truncated_normal([patch_size4, patch_size4, patch_depth4, patch_depth4], stddev=0.1))
    b13 = tf.Variable(tf.constant(1.0, shape = [patch_depth4]))

    # Five max-pooling layers, each halving the width and height
    no_pooling_layers = 5
    w14 = tf.Variable(tf.truncated_normal([(image_width // (2**no_pooling_layers)) * (image_height // (2**no_pooling_layers)) * patch_depth4, num_hidden1], stddev=0.1))
    b14 = tf.Variable(tf.constant(1.0, shape = [num_hidden1]))

    w15 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.1))
    b15 = tf.Variable(tf.constant(1.0, shape = [num_hidden2]))

    w16 = tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=0.1))
    b16 = tf.Variable(tf.constant(1.0, shape = [num_labels]))

    variables = {
        'w1': w1, 'w2': w2, 'w3': w3, 'w4': w4, 'w5': w5, 'w6': w6, 'w7': w7, 'w8': w8, 'w9': w9, 'w10': w10,
        'w11': w11, 'w12': w12, 'w13': w13, 'w14': w14, 'w15': w15, 'w16': w16,
        'b1': b1, 'b2': b2, 'b3': b3, 'b4': b4, 'b5': b5, 'b6': b6, 'b7': b7, 'b8': b8, 'b9': b9, 'b10': b10,
        'b11': b11, 'b12': b12, 'b13': b13, 'b14': b14, 'b15': b15, 'b16': b16
    }
    return variables
def model_vggnet16(data, variables):
    layer1_conv = tf.nn.conv2d(data, variables['w1'], [1, 1, 1, 1], padding='SAME')
    layer1_actv = tf.nn.relu(layer1_conv + variables['b1'])
    layer2_conv = tf.nn.conv2d(layer1_actv, variables['w2'], [1, 1, 1, 1], padding='SAME')
    layer2_actv = tf.nn.relu(layer2_conv + variables['b2'])
    layer2_pool = tf.nn.max_pool(layer2_actv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')