# Pretraining + Fine-Tuning

```{note}
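Using a pretrained model is like standing on the shoulders of giants; many models today follow the pretrain-then-fine-tune pattern.<br/>
In this section we take a model that was trained on Fashion-MNIST and use it as the starting point for training on MNIST.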
```

## Getting the Model and Data

In [1]:
from tensorflow import keras

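# The model trained earlier on Fashion-MNIST
model_A = keras.models.load_model("my_fashion_mnist_model")
# Reuse every pretrained layer except the output layer; the new output layer starts from scratch.
# Note: these layer objects are shared with model_A, so training model_B_on_A also updates model_A.
model_B_on_A = keras.models.Sequential(model_A.layers[:-1])
# Softmax (not sigmoid) is the output activation that matches sparse_categorical_crossentropy over 10 classes
model_B_on_A.add(keras.layers.Dense(10, activation="softmax"))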
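
Because `model_B_on_A` is built from `model_A`'s own layer objects, one way to leave `model_A` untouched is to clone it first and build on the clone. The sketch below illustrates this with a small stand-in network instead of the saved Fashion-MNIST model; the layer sizes and the name `model_A_clone` are assumptions made for the example, not the architecture from the earlier section.

```python
import numpy as np
from tensorflow import keras

# Stand-in for the pretrained model (layer sizes are illustrative assumptions).
model_A = keras.models.Sequential([
    keras.layers.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# clone_model copies only the architecture, so the weights are copied separately.
model_A_clone = keras.models.clone_model(model_A)
model_A_clone.set_weights(model_A.get_weights())

# Build the new model on the clone: reuse everything except the output layer.
model_B_on_A = keras.models.Sequential(model_A_clone.layers[:-1])
model_B_on_A.add(keras.layers.Dense(10, activation="softmax"))

# The reused layers are distinct objects that start from identical weights.
shares_layers = any(layer in model_A.layers for layer in model_B_on_A.layers)
same_start = all(
    np.array_equal(a, b)
    for a, b in zip(model_A.layers[-2].get_weights(),
                    model_B_on_A.layers[-2].get_weights())
)
print(shares_layers, same_start)
```

With this variant, later training of `model_B_on_A` should no longer change `model_A`, at the cost of holding a second copy of the weights in memory.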

In [2]:
# Load the MNIST dataset
(X_train_val, y_train_val), (X_test, y_test) = keras.datasets.mnist.load_data()

X_val, X_train = X_train_val[:5000] / 255., X_train_val[5000:] / 255.
y_val, y_train = y_train_val[:5000], y_train_val[5000:]
X_test = X_test / 255.
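
The split above holds out the first 5,000 training images for validation and rescales pixel values from 0-255 to the range [0, 1]. As a quick sanity check, the same logic can be wrapped in a helper and run on synthetic data. The function name `split_and_scale` and the reduced sample counts are assumptions for this sketch; it does not download MNIST.

```python
import numpy as np

def split_and_scale(X, y, n_val):
    """Hold out the first n_val samples for validation and scale pixels to [0, 1]."""
    X = X.astype("float32") / 255.0
    return (X[n_val:], y[n_val:]), (X[:n_val], y[:n_val])

# Synthetic stand-in with the same layout and dtype as MNIST, but far fewer samples.
rng = np.random.default_rng(0)
X_fake = rng.integers(0, 256, size=(600, 28, 28), dtype=np.uint8)
y_fake = rng.integers(0, 10, size=600, dtype=np.uint8)

(X_tr, y_tr), (X_va, y_va) = split_and_scale(X_fake, y_fake, n_val=50)
print(X_tr.shape, X_va.shape)                 # (550, 28, 28) (50, 28, 28)
print(X_tr.min() >= 0.0, X_tr.max() <= 1.0)   # True True
```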

## Freezing the Pretrained Layers

In [3]:
# Freeze the pretrained layers so only the new output layer is trained
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = False

# Compile (must happen after changing `trainable` for the change to take effect)
model_B_on_A.compile(loss="sparse_categorical_crossentropy",
                     optimizer=keras.optimizers.SGD(learning_rate=1e-2),
                     metrics=["accuracy"])
# Train
history = model_B_on_A.fit(X_train, y_train,
                           epochs=5,
                           validation_data=(X_val, y_val))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
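
To confirm that freezing worked as intended, it can help to count trainable and non-trainable weight tensors before training. The following is a minimal sketch on a small stand-in model; the layer sizes are assumptions chosen for illustration, not the architecture of the saved model.

```python
from tensorflow import keras

# Small stand-in model (layer sizes are illustrative assumptions).
model = keras.models.Sequential([
    keras.layers.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# Freeze everything except the output layer, as in the cell above.
for layer in model.layers[:-1]:
    layer.trainable = False

# Each Dense layer owns two tensors: a kernel and a bias. Flatten owns none.
n_trainable = len(model.trainable_weights)     # output layer only
n_frozen = len(model.non_trainable_weights)    # hidden Dense layer
frozen_flags = [layer.trainable for layer in model.layers]
print(n_trainable, n_frozen, frozen_flags)
```

If the trainable count is larger than expected, the usual cause is that the model was compiled before the `trainable` flags were changed, or that the wrong slice of layers was frozen.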


## Fine-Tuning the Pretrained Layers

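After training with the pretrained layers frozen, we can unfreeze them and fine-tune the whole network. Use a smaller learning rate at this stage so that the updates do not wreck the pretrained weights.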

In [4]:
# Unfreeze the pretrained layers
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = True

# Recompile with a smaller learning rate
model_B_on_A.compile(loss="sparse_categorical_crossentropy",
                     optimizer=keras.optimizers.SGD(learning_rate=1e-3),
                     metrics=["accuracy"])
# Train
history = model_B_on_A.fit(X_train, y_train,
                           epochs=5,
                           validation_data=(X_val, y_val))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
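
The reason for the smaller learning rate during fine-tuning can be seen from a single plain SGD update, `w_new = w - learning_rate * grad`: for the same gradient, the distance the weights move scales linearly with the learning rate. The sketch below uses made-up weights and gradients purely for illustration, and the helpers `sgd_step` and `distance` are hypothetical names, not part of Keras.

```python
def sgd_step(weights, grads, learning_rate):
    """One plain SGD update: w <- w - learning_rate * grad."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]

def distance(a, b):
    """Euclidean distance between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

pretrained = [0.8, -0.3, 0.5]   # made-up "pretrained" weights
grads = [2.0, -1.0, 4.0]        # made-up gradient from one batch

moved_large = distance(pretrained, sgd_step(pretrained, grads, 1e-2))
moved_small = distance(pretrained, sgd_step(pretrained, grads, 1e-3))
ratio = moved_large / moved_small
print(moved_large, moved_small, ratio)
```

With the learning rate cut from `1e-2` to `1e-3`, each step moves the weights roughly ten times less, which keeps the fine-tuned layers close to their pretrained values in the early epochs.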
