
Notes on Xiaotudui's PyTorch Tutorial: The PyTorch Model Training Workflow

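Topics covered: using and modifying existing network models, saving and loading models, a complete model training routine, training on a GPU, and a complete model validation routine.

That covers the basics, though it is still a long way from real-world practice.

The code lives in this GitHub repository: https://github.com/Eclipse-git725/pytorch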

1. Using and modifying existing network models

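Official documentation: https://pytorch.org/vision/stable/index.html
ImageNet: https://pytorch.org/vision/0.9/datasets.html#imagenet
The scipy package must be installed first: pip install scipy
With pretrained=True, the model comes with weights already trained on the dataset (ImageNet); with pretrained=False, the weights are randomly initialized. In torchvision 0.13 and later, pretrained is deprecated in favor of the weights argument.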

import torch
import torchvision
from torch import nn

# Too large to download (147 GB)
# dataset = torchvision.datasets.ImageNet("./dataset", split="train", transform=torchvision.transforms.ToTensor(), download=True)
# dataloader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True, drop_last=True)

vgg16_false = torchvision.models.vgg16(pretrained=False)
# vgg16_true = torchvision.models.vgg16(pretrained=True)
# For debugging, set a breakpoint here
print("ok")
print(vgg16_false)

# Transfer learning: change the structure of an existing network
# Goal: make vgg16 output 10 classes instead of 1000
# Option 1: append a Linear(in_features=1000, out_features=10) layer to the whole model
# vgg16_false.add_module("add_linear", nn.Linear(1000, 10))
# Option 2: append the same layer at the end of the classifier block
# vgg16_false.classifier.add_module("add_linear", nn.Linear(1000, 10))
# Option 3 (used here): replace the last classifier layer with a 10-way output
vgg16_false.classifier[6] = nn.Linear(4096, 10)
print(vgg16_false)
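
To make the difference between the three options concrete, here is a minimal sketch on a tiny stand-in network rather than the real VGG16. The TinyNet class and its layer sizes are hypothetical, chosen only so the example runs quickly without torchvision; it mimics VGG16's layout in that forward calls a classifier block explicitly.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for VGG16: features -> flatten -> classifier."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(4 * 8 * 8, 16),
            nn.ReLU(),
            nn.Linear(16, 1000),  # plays the role of VGG16's classifier[6]
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)


x = torch.randn(2, 3, 8, 8)

# Appending inside the classifier: the new layer runs after the 1000-way output.
appended = TinyNet()
appended.classifier.add_module("add_linear", nn.Linear(1000, 10))
appended_shape = tuple(appended(x).shape)

# Replacing the last classifier layer: the 1000-way layer is gone.
replaced = TinyNet()
replaced.classifier[2] = nn.Linear(16, 10)
replaced_shape = tuple(replaced(x).shape)

# Appending at the top level only registers the layer; this forward() never calls it.
top_level = TinyNet()
top_level.add_module("add_linear", nn.Linear(1000, 10))
top_level_shape = tuple(top_level(x).shape)

print(appended_shape, replaced_shape, top_level_shape)
```

In this sketch the first two variants produce 10 outputs per sample, while the top-level variant still produces 1000, because a module added at the top level shows up in print(model) but is only used if forward calls it.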

2. Saving and loading network models

There are two ways to save a model:

import torch
import torchvision
from torch import nn

vgg16 = torchvision.models.vgg16(pretrained=False)
# Saving method 1: model structure and model parameters
torch.save(vgg16, 'vgg16_method1.pth')

# Saving method 2: model parameters only (officially recommended)
torch.save(vgg16.state_dict(), 'vgg16_method2.pth')

# Pitfall: a custom model saved with method 1
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv2d1 = nn.Conv2d(3, 64, 3)

    def forward(self, x):
        return self.conv2d1(x)

model = Model()
torch.save(model, "model_method1.pth")

There are two matching ways to load a model (for a model you defined yourself, the class definition must be accessible at load time):

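import torch
import torchvision
from torch import nn
# Brings the custom Model class into scope
from model_save import *

# Loading method 1: the whole model (structure + parameters)
# On PyTorch 2.6+, torch.load defaults to weights_only=True, so loading a whole model needs weights_only=False
model = torch.load('vgg16_method1.pth')
# print(model)

# Loading method 2: the file holds only parameters, so this returns a dict, not a model
model = torch.load('vgg16_method2.pth')
# print(model)
# To restore the network, build it first, then pass the dict to load_state_dict
vgg16 = torchvision.models.vgg16(pretrained=False)
vgg16.load_state_dict(torch.load('vgg16_method2.pth'))
# print(vgg16)

# Pitfall
# class Model(nn.Module):
#     def __init__(self):
#         super(Model, self).__init__()
#         self.conv2d1 = nn.Conv2d(3, 64, 3)
#
#     def forward(self, x):
#         return self.conv2d1(x)
# Calling torch.load alone raises an error; the model definition must be written out here or brought in with import
model = torch.load('model_method1.pth')
print(model)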
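
As a quick check that method 2 really restores a model, the following sketch saves a state_dict to a temporary file, loads it into a freshly built instance, and compares the two. SmallModel and the file name are hypothetical, used only to keep the example small; the calls themselves (torch.save, torch.load, state_dict, load_state_dict) are the same ones used above.

```python
import os
import tempfile

import torch
from torch import nn


class SmallModel(nn.Module):
    """Hypothetical small model, used only for this demonstration."""

    def __init__(self):
        super().__init__()
        self.conv2d1 = nn.Conv2d(3, 8, 3)

    def forward(self, x):
        return self.conv2d1(x)


source = SmallModel()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "small_model_state.pth")
    # Save the parameters only.
    torch.save(source.state_dict(), path)
    # Rebuild the structure in code, then fill in the saved parameters.
    restored = SmallModel()
    restored.load_state_dict(torch.load(path))

# Every parameter tensor should be identical after the round trip.
params_match = all(
    torch.equal(a, b)
    for a, b in zip(source.state_dict().values(), restored.state_dict().values())
)

# The two models should also agree on the same input.
x = torch.randn(1, 3, 8, 8)
with torch.no_grad():
    outputs_match = torch.allclose(source(x), restored(x))

print(params_match, outputs_match)
```

Because the structure is rebuilt from code in this approach, the saved file does not depend on where the class was defined, which is what avoids the pitfall shown above.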

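3. A complete model training routine

  1. Load the training and test datasets
  2. Build the neural network
  3. Create the loss function
  4. Create the optimizer
  5. Set up the training-step counter, the test-step counter, and the number of epochs
  6. Train and test, printing the loss, accuracy, and so on, and log to TensorBoard for visualization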
    from torch.utils.tensorboard import SummaryWriter
    import torchvision
    from torch.utils.data import DataLoader
    from model.model import Model
    from torch import nn
    import torch

    train_set = torchvision.datasets.CIFAR10(root="./dataset", train=True, transform=torchvision.transforms.ToTensor(), download=True)
    test_set = torchvision.datasets.CIFAR10(root="./dataset", train=False, transform=torchvision.transforms.ToTensor(), download=True)

    print("Training set size: {}".format(len(train_set)))
    print("Test set size: {}".format(len(test_set)))

    # Load the datasets with DataLoader
    train_dataloader = DataLoader(train_set, batch_size=64, drop_last=True)
    test_dataloader = DataLoader(test_set, batch_size=64, shuffle=True)

    # Build the neural network
    model = Model()

    # Create the loss function
    loss_fn = nn.CrossEntropyLoss()

    # Create the optimizer
    learning_rate = 1e-2
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

    # Bookkeeping for training
    # Number of training steps so far
    total_train_step = 0
    # Number of test passes so far
    total_test_step = 0
    # Number of epochs
    epoch = 5

    # Add TensorBoard logging
    writer = SummaryWriter("logs")


    for i in range(epoch):
        print("----------- Epoch {} training starts ---------".format(i+1))

        # Training phase
        # train() only matters for certain layers, such as Dropout and BatchNorm
        model.train()
        for data in train_dataloader:
            imgs, target = data
            output = model(imgs)
            loss = loss_fn(output, target)

            # Let the optimizer update the model
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_train_step += 1
            if total_train_step % 100 == 0:
                print("Training step: {}, loss: {}".format(total_train_step, loss.item()))
                writer.add_scalar("train_loss", loss.item(), total_train_step)

        # Testing phase
        # eval() only matters for certain layers, such as Dropout and BatchNorm
        model.eval()
        total_test_loss = 0
        # Count of correct predictions; accuracy is a key metric in classification
        total_accuracy = 0
        with torch.no_grad():
            for data in test_dataloader:
                imgs, target = data
                output = model(imgs)
                loss = loss_fn(output, target)
                # .item() keeps plain Python numbers instead of accumulating tensors
                total_test_loss += loss.item()
                # Number of correct predictions in this batch
                accuracy = (output.argmax(1) == target).sum().item()
                total_accuracy += accuracy

        print("Loss on the whole test set: {}".format(total_test_loss))
        print("Accuracy on the whole test set: {}".format(total_accuracy/len(test_set)))
        writer.add_scalar("test_loss", total_test_loss, total_test_step)
        writer.add_scalar("test_accuracy", total_accuracy/len(test_set), total_test_step)
        total_test_step += 1

        # Save the model after every epoch
        torch.save(model, "model/model_{}.pth".format(i))
        print("Model saved")

    writer.close()
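
The accuracy line in the test loop is easier to follow on a small worked example. The logits and labels below are made up for illustration (4 samples, 3 classes); only the argmax, comparison, and sum steps correspond to the loop above.

```python
import torch

# Hypothetical logits: one row per sample, one column per class.
output = torch.tensor([
    [0.1, 2.0, 0.3],
    [1.5, 0.2, 0.1],
    [0.2, 0.1, 3.0],
    [0.9, 0.8, 0.7],
])
# Hypothetical ground-truth labels.
target = torch.tensor([1, 0, 1, 0])

# argmax(1) picks the highest-scoring class along dimension 1 (the class axis).
predicted = output.argmax(1)

# Comparing with the labels gives a boolean tensor; summing it counts the matches.
correct = (predicted == target).sum().item()

# Dividing by the number of samples turns the count into an accuracy.
accuracy = correct / len(target)

print(predicted.tolist(), correct, accuracy)  # [1, 0, 2, 0] 3 0.75
```

In the training script the per-batch counts are summed over the whole test set and divided by len(test_set), which is the same calculation spread across batches.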

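4. Training on a GPU

Only the model, the data (inputs and labels), and the loss function need a .cuda() call. For the model and the loss function the call moves them in place; for tensors it returns a copy, so the result must be assigned back.
Example: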

if torch.cuda.is_available():
    model.cuda()

loss_fn = nn.CrossEntropyLoss()
if torch.cuda.is_available():
    loss_fn.cuda()

if torch.cuda.is_available():
    imgs = imgs.cuda()
    target = target.cuda()
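
An alternative that avoids repeating the is_available() check is to pick a torch.device once and call .to(device) everywhere. The sketch below uses a hypothetical one-layer model and random data just to show the pattern; it falls back to the CPU when no GPU is present.

```python
import torch
from torch import nn

# Choose the device once; everything else refers to this variable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical tiny model and loss, standing in for the real ones.
model = nn.Linear(4, 2).to(device)
loss_fn = nn.CrossEntropyLoss().to(device)

# Hypothetical batch: 8 samples with 4 features, labels in {0, 1}.
imgs = torch.randn(8, 4)
target = torch.randint(0, 2, (8,))

# For tensors, .to() returns a new tensor, so the result must be assigned back.
imgs = imgs.to(device)
target = target.to(device)

output = model(imgs)
loss = loss_fn(output, target)
print(output.device, loss.item())
```

The model, the inputs, and the labels all have to end up on the same device; otherwise the forward pass raises a device-mismatch error.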

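If your machine has no GPU, you can train on one through Google Colab, which offers a free quota.
Registering from the Google app on a phone let me create a Google email account without a VPN and without a phone number.
Once the GPU is enabled in the settings, it works just like Jupyter.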

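5. A complete model validation routine

Take an already-trained model and feed it an input.
If the model was trained on a GPU and is validated on a CPU, load it with map_location: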
model = torch.load("./model/model_4.pth", map_location=torch.device('cpu'))

from PIL import Image
import torchvision
from torchvision import transforms
from torch import nn
import torch

img_path = "./imgs/dog.png"
img = Image.open(img_path)
# PNG files may carry an alpha channel; keep only the 3 RGB channels the model expects
img = img.convert("RGB")
print(img)

# Resize to the 32x32 input size of the network and convert to a tensor
transform = torchvision.transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor()
])

img = transform(img)
print(img.shape)


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, padding=2),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(1024, 64),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        return self.model(x)

model = torch.load("./model/model_4.pth")
print(model)
# Add the batch dimension: the network expects (batch, channels, height, width)
img = torch.reshape(img, (1, 3, 32, 32))
model.eval()
with torch.no_grad():
    output = model(img)
print(output)
# Index of the highest-scoring class
print(output.argmax(1))
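
The printed index is only a class number. Since the model in section 3 was trained on CIFAR-10, the index can be mapped back to a class name. The sketch below uses made-up logits in place of a real model output, and the classes list is written out by hand in CIFAR-10 label order (the same order torchvision exposes as the dataset's classes attribute).

```python
import torch

# CIFAR-10 class names in label order (index 0 to 9).
classes = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]

# Hypothetical logits for a single image, shaped (1, 10) like the model output.
output = torch.tensor([[0.1, -1.2, 0.4, 1.1, 0.0, 3.2, 0.3, 0.5, -0.7, -0.2]])

# softmax turns the raw scores into probabilities that sum to 1.
probs = torch.softmax(output, dim=1)

# The predicted label is the index with the highest score.
index = output.argmax(1).item()
predicted_class = classes[index]

print(index, predicted_class)  # 5 dog
```

With a real model, output would come from model(img) as in the script above; the mapping step stays the same.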