ResNet¶
Note
ResNet's residual blocks add an identity-mapping shortcut, which lets us train much deeper networks.
ResNet won the 2015 ImageNet competition.
Structure¶
The figure below compares a regular block with a residual block. The residual block adds an identity-mapping shortcut whose output is added in just before the final activation function:
Residual blocks guarantee that as the network grows deeper, the function spaces stay nested! They also ease vanishing and exploding gradients during backpropagation, which is what makes ResNet both fast and accurate:
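To spell out the nesting argument in one step: a residual block computes f(x) = x + g(x), where g is the learned residual path. If training drives the weights of g toward zero, the block collapses to the identity f(x) = x, so every function the shallower network could represent is still representable after the block is added. The same shortcut carries gradients through the addition unchanged during backpropagation, which is why very deep ResNets remain trainable.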
As we saw in the GoogLeNet section, a 1×1 convolution can adjust the channel count and resolution, so it can play a role similar to the identity mapping. This gives us two kinds of residual blocks:
```python
import torch
from torch import nn
import torch.nn.functional as F


class Residual(nn.Module):
    """ResNet residual block."""

    def __init__(self, input_channels, num_channels, use_1x1conv=False,
                 strides=1):
        # When use_1x1conv=False, we must have input_channels == num_channels
        # and strides == 1, otherwise the shapes mismatch at the addition
        super().__init__()
        # The first conv layer handles the channel/resolution change
        self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3,
                               padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3,
                               padding=1)
        # A 1x1 convolution on the shortcut matches channels and resolution
        if use_1x1conv:
            self.conv3 = nn.Conv2d(input_channels, num_channels,
                                   kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        # BatchNorm sits between each conv layer and its activation
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        # The regular path
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        # Shortcut: 1x1 conv instead of the identity when shapes must change
        if self.conv3:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)
```
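To show how the two kinds of blocks are typically stacked, here is a minimal sketch of a ResNet "stage": the first block uses the 1×1-conv shortcut to halve the resolution and change the channel count, and the remaining blocks keep shapes unchanged. The `resnet_stage` helper and its signature are my own illustration, not part of the text above; the `Residual` class is repeated in compact form only so the snippet runs standalone.

```python
import torch
from torch import nn
import torch.nn.functional as F


class Residual(nn.Module):
    # Compact copy of the residual block above, so this sketch is standalone
    def __init__(self, input_channels, num_channels, use_1x1conv=False,
                 strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3,
                               padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3,
                               padding=1)
        self.conv3 = (nn.Conv2d(input_channels, num_channels, kernel_size=1,
                                stride=strides) if use_1x1conv else None)
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        return F.relu(Y + X)


def resnet_stage(input_channels, num_channels, num_blocks, first_stage=False):
    """Stack residual blocks into one stage (illustrative helper).

    Unless this is the very first stage, block 0 halves the resolution and
    switches channels via the 1x1-conv shortcut; later blocks are identity-
    shortcut blocks that preserve shapes.
    """
    blocks = []
    for i in range(num_blocks):
        if i == 0 and not first_stage:
            blocks.append(Residual(input_channels, num_channels,
                                   use_1x1conv=True, strides=2))
        else:
            blocks.append(Residual(num_channels if i > 0 else input_channels,
                                   num_channels))
    return nn.Sequential(*blocks)


X = torch.rand(1, 64, 56, 56)
stage = resnet_stage(64, 128, 2)
print(stage(X).shape)  # torch.Size([1, 128, 28, 28])
```

The shape check confirms the division of labor: only the first block of the stage changes the tensor shape, so the identity shortcut is valid everywhere else.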