Building a YOLOv5 Object Detection Platform with PyTorch: A Hands-on Guide

Technology 07-10 Source: 强哥分享干货

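YOLOv5 is a single-stage object detection algorithm. It builds on YOLOv4 with several new ideas that substantially improve both speed and accuracy. The main improvements are:

Input: for the training stage, it introduces Mosaic data augmentation, adaptive anchor box computation, and adaptive image scaling;
Backbone: it incorporates ideas from other detection algorithms, mainly the Focus structure and the CSP structure;
Neck: detection networks usually insert some layers between the backbone and the final head output layer; YOLOv5 adds an FPN+PAN structure here;
Head (output layer): the anchor mechanism is the same as in YOLOv4; the main improvements are the GIOU_Loss used during training and DIOU_nms for filtering predicted boxes.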
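To make the GIoU loss named above concrete, here is a minimal plain-Python sketch of the GIoU formula for two boxes in (x_min, y_min, x_max, y_max) format: GIoU = IoU - (area(C) - area(A ∪ B)) / area(C), where C is the smallest box enclosing both boxes, and the loss is 1 - GIoU. It is only an illustration of the formula: the repository computes its box loss on batched tensors (see utils/loss.py), and depending on the release it may use a different IoU variant such as CIoU. The names giou and giou_loss are our own.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero if the boxes do not overlap)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area and plain IoU
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Area of the smallest box C that encloses both boxes
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    # GIoU subtracts the fraction of C that is not covered by the union
    return iou - (c_area - union) / c_area


def giou_loss(pred_box, target_box):
    """GIoU loss: 0 for a perfect match, approaching 2 for distant boxes."""
    return 1.0 - giou(pred_box, target_box)


print(giou((0, 0, 2, 2), (0, 0, 2, 2)))       # 1.0 (identical boxes)
print(giou((0, 0, 2, 2), (1, 1, 3, 3)))       # about -0.079 (partial overlap)
print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # about 1.333 (nonzero even though IoU is 0)
```

Unlike plain IoU, which is 0 for every pair of non-overlapping boxes, GIoU still distinguishes near misses from far misses, which is why it is used as a regression loss.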

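The main advantages of YOLOv5 for image recognition are:

High recognition rate: it uses the PyTorch framework, which makes it easy to train on your own dataset. Compared with the Darknet framework used by YOLOv4, PyTorch has a more complete ecosystem, and recognition accuracy is better.
Low false-positive rate: YOLOv5s recognizes objects at up to 140 FPS, with high accuracy and few false positives.
Readable code: the code is easy to read and integrates a large number of computer vision techniques, which makes it a good resource for algorithm developers to learn from. It can run inference directly on single images, batches of images, videos, and even webcam input.
Easy format conversion: PyTorch weight files can easily be converted to ONNX format for use on Android, then to a format usable by OpenCV, or to an iOS format via CoreML, for direct deployment in mobile apps.
High efficiency: the environment is easy to set up, model training is fast, and batch inference produces real-time results.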

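YOLOv5 Network Architecture

The netron model visualization tool can display the structure of the yolov5s model, as shown below:

[Figure: YOLOv5 model structure]

YOLOv5 Performance Benchmarks

The official YOLOv5 code provides four versions of the detection network: yolov5s, yolov5m, yolov5l, and yolov5x. The figure below shows their benchmark results:

[Figure: YOLOv5 performance]

Yolov5s is the smallest and fastest network, and it also has the lowest AP. If your targets are mostly large objects and speed is the priority, it is still a good choice.
The other three networks progressively deepen and widen this base network: AP keeps improving, but so does the inference cost.
In our experience so far, the yolov5s model is only a dozen or so MB, runs fast, gives solid results in production, and can be used on embedded devices.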

Preparing the Training Environment and Dataset

Installing YOLOv5

# Clone the YOLOv5 source code
git clone https://github.com/ultralytics/yolov5
cd yolov5
# Install the dependencies YOLOv5 needs
pip install -r requirements.txt

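After downloading the YOLOv5 source code, the directory structure looks like this:

[Figure: YOLOv5 source directory structure]

The main parts of the YOLOv5 source tree are:

data: dataset configuration. Training datasets are defined as yaml files; this folder holds both custom dataset definitions and the sample datasets provided by YOLO;
models: source code defining the YOLO models;
runs: output generated by running the YOLO scripts, with train, detect, and val subdirectories;
utils: YOLO utility modules;
detect.py: prediction script; it can run the official pretrained models as well as models trained on a custom dataset;
export.py: export script; it can export the model to formats for several deep learning frameworks, such as ONNX, TensorFlow, and CoreML;
train.py: training script, used to train on a custom dataset;
val.py: validation script, used to measure the accuracy of a trained model;
yolov5s.pt: the official pretrained model weights file;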

To run the YOLOv5 scripts from the CMD command line, add the root directory of the YOLOv5 source code to the PYTHONPATH environment variable, as shown below:

REM Check the PYTHONPATH environment variable
set PYTHONPATH
PYTHONPATH=D:\pyworkspace\yolov5;

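Next, we walk through a simple example of training an object detection model with YOLOv5. The steps are as follows:

Downloading the Dataset

First, we download an open-source object detection dataset, the Caltech 101 dataset. It contains 101 categories (target objects) and provides both the original images and the annotations of the objects in them.

Extract the downloaded dataset so that the dataset folder sits at the same level as the yolov5 folder: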

# parent
# ├── yolov5
# └── datasets
#     └── caltech-101

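The caltech-101 folder contains two subfolders, 101_ObjectCategories and Annotations:

101_ObjectCategories contains one folder per category, each holding a number of training images;
Annotations contains one annotation folder per category, each matching a category folder in 101_ObjectCategories; objects are annotated with bounding boxes;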

Visualizing the Dataset

Pick a training image and its annotation at random to check that the annotation is accurate:

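# Import the required packages
from scipy.io import loadmat
import cv2
from matplotlib import pyplot as plt

# Load the annotation file; box_coord is stored as [top, bottom, left, right]
annot = loadmat("../datasets/caltech-101/labels/110.mat")["box_coord"][0]
top_left_x, top_left_y = annot[2], annot[0]
bottom_right_x, bottom_right_y = annot[3], annot[1]

# Compute the width and height of the annotated box
box_width = bottom_right_x - top_left_x
box_height = bottom_right_y - top_left_y

# Read the training image that matches the annotation file.
# OpenCV loads images as BGR, while matplotlib expects RGB.
img = cv2.imread("../datasets/caltech-101/images/110.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Draw the image with matplotlib
plt.imshow(img)

ax = plt.gca()
# Rectangle takes the top-left corner (x, y), then the width, then the height
ax.add_patch(
    plt.Rectangle(
        (top_left_x, top_left_y),
        box_width, box_height,
        color="red", fill=False, linewidth=1
    )
)

plt.show()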

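[Figure: object detection]

Converting the Dataset

The XY coordinates in the downloaded caltech-101 annotations are defined as follows:

[Figure: target object coordinates]

where (min_x, min_y) = (top_left_x, top_left_y) and (max_x, max_y) = (bottom_right_x, bottom_right_y).

YOLOv5, however, expects training labels in the following format: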

One row per object
Each row is class x_center y_center width height format.
Box coordinates must be in normalized xywh format (from 0 - 1). If your boxes are in pixels, divide x_center and width by image width, and y_center and height by image height.
Class numbers are zero-indexed (start from 0).

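In other words:

one line per annotated object;
each line has the form: class index, space, object center X, space, object center Y, space, object width, space, object height;
the box center coordinates, width, and height must be normalized to the range 0 to 1;
class indices start from 0;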
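These rules fit in two small helper functions. This is a hedged sketch with hypothetical names (xyxy_to_yolo, yolo_to_xyxy) that are not part of YOLOv5: the first turns a pixel box given by its corners into a YOLO label line, and the second reverses the conversion, which is handy for spot-checking converted labels.

```python
def xyxy_to_yolo(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel box (corners) into a YOLO label line.

    Returns "class x_center y_center width height", with the four box values
    normalized to 0-1 by the image width and height.
    """
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def yolo_to_xyxy(line, img_w, img_h):
    """Parse a YOLO label line back into (class_id, x_min, y_min, x_max, y_max) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    x_c, y_c, w, h = (float(v) for v in parts[1:5])
    x_min = (x_c - w / 2) * img_w
    y_min = (y_c - h / 2) * img_h
    x_max = (x_c + w / 2) * img_w
    y_max = (y_c + h / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max


# Example: a 200x100 box with its top-left corner at (100, 50) in a 400x200 image
line = xyxy_to_yolo(0, 100, 50, 300, 150, 400, 200)
print(line)                          # 0 0.500000 0.500000 0.500000 0.500000
print(yolo_to_xyxy(line, 400, 200))  # (0, 100.0, 50.0, 300.0, 150.0)
```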

We only detect the "airplane" category, so we only need to convert the airplane annotations into the YOLO label format:

# Import the required packages
import os
from scipy.io import loadmat
import cv2

# Paths to the caltech-101 airplane images and their annotation files
path_images = "../datasets/caltech-101/101_ObjectCategories/airplanes/"
path_annot = "../datasets/caltech-101/Annotations/Airplanes_Side_2/"
# Output directory for the YOLO label files (same location as before: caltech-101/txt)
path_txt = "../datasets/caltech-101/txt/"
os.makedirs(path_txt, exist_ok=True)

# File names of the images and annotations. os.listdir() returns entries in
# arbitrary order, so sort both lists to keep image i paired with annotation i.
image_paths = sorted(
    f for f in os.listdir(path_images) if os.path.isfile(os.path.join(path_images, f))
)
annot_paths = sorted(
    f for f in os.listdir(path_annot) if os.path.isfile(os.path.join(path_annot, f))
)

# Loop over the annotations and images and convert each box to YOLO format
for i in range(len(annot_paths)):
    # Load the bounding box coordinates from the .mat file
    annot = loadmat(os.path.join(path_annot, annot_paths[i]))
    if "box_coord" not in annot:
        continue

    # box_coord is stored as [top, bottom, left, right]
    annot = annot["box_coord"][0]
    x_min, x_max = float(annot[2]), float(annot[3])
    y_min, y_max = float(annot[0]), float(annot[1])

    # Read the matching image to get its size
    image = cv2.imread(os.path.join(path_images, image_paths[i]))
    height, width = image.shape[:2]

    # Normalized center of the box
    x_center = (x_max + x_min) / 2 / width
    y_center = (y_max + y_min) / 2 / height
    # Normalized width and height of the box
    box_w = abs(x_max - x_min) / width
    box_h = abs(y_max - y_min) / height

    # Save the box in YOLO format: class x_center y_center width height (class 0 = airplane)
    txt_file = os.path.splitext(image_paths[i])[0] + ".txt"
    with open(os.path.join(path_txt, txt_file), mode="w") as f:
        f.write(f"0 {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}\n")

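Finally, organize the converted data into the directory structure below: the airplane images go in images/train, and the generated .txt label files go in labels/train, each with the same base name as its image.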

D:\pyworkspace\datasets>tree cust_dataset
D:\PYWORKSPACE\DATASETS\CUST_DATASET
├─images
│  └─train
└─labels
    └─train

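Preparing the YOLO Training Data

Preparing the YOLO Training Dataset

YOLOv5 defines training datasets in yaml files and reads its training configuration from them, so we need to define a yaml file for our own dataset.

Copy yolov5/data/coco128.yaml to a new file named cust_dataset.yaml and change its contents as follows: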

# Example usage: python train.py --data cust_dataset.yaml
# parent
# ├── yolov5
# └── datasets
#     └── cust_dataset  ← custom dataset root directory

path: ../datasets/cust_dataset  # dataset root dir
train: images/train  # train images (relative to 'path')
val: images/train  # val images (relative to 'path')
test:  # test images (optional)

# Classes
nc: 1  # number of classes
names: ['airplane']  # class names

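The fields in this file mean the following:

path: root directory of the training dataset;
train: path of the training images, relative to path;
val: path of the validation images, relative to path;
test: path of the test images, relative to path (optional);
nc: number of object classes;
names: the class names. Since we only detect airplanes, nc is 1 and names contains the single class 'airplane';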

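Preparing the Pretrained Model

YOLO officially provides the following pretrained models:

[Figure: YOLOv5 model comparison]

The pretrained models compare as follows: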

Model	Size (pixels)	mAP val 0.5:0.95	mAP val 0.5	Speed CPU b1 (ms)	Speed V100 b1 (ms)	Speed V100 b32 (ms)	Params (M)	FLOPs @640 (B)
YOLOv5n	640	28.0	45.7	45	6.3	0.6	1.9	4.5
YOLOv5s	640	37.4	56.8	98	6.4	0.9	7.2	16.5
YOLOv5m	640	45.4	64.1	224	8.2	1.7	21.2	49.0
YOLOv5l	640	49.0	67.3	430	10.1	2.7	46.5	109.1
YOLOv5x	640	50.7	68.9	766	12.1	4.8	86.7	205.7
YOLOv5n6	1280	36.0	54.4	153	8.1	2.1	3.2	4.6
YOLOv5s6	1280	44.8	63.7	385	8.2	3.6	12.6	16.8
YOLOv5m6	1280	51.3	69.3	887	11.1	6.8	35.7	50.0
YOLOv5l6	1280	53.7	71.3	1784	15.8	10.5	76.8	111.4
YOLOv5x6	1280	55.0	72.7	3136	26.2	19.4	140.7	209.8
YOLOv5x6 + TTA	1536	55.8	72.7	-	-	-	-	-

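We use the basic pretrained model YOLOv5s: download the pretrained weights file yolov5s.pt and place it in the yolov5 root directory.

Training on the Custom Dataset

With the preparation above, we have completed two things:

preparing the training dataset;
downloading the pretrained model;

Now we can actually train on our custom dataset. The YOLOv5 source code already provides the training script train.py, which accepts the following arguments: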

def parse_opt(known=False):
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default=ROOT / 'yolov5s.pt', help='initial weights path')
    parser.add_argument('--cfg', type=str, default='', help='model.yaml path')
    parser.add_argument('--data', type=str, default=ROOT / 'data/cust_dataset.yaml', help='dataset.yaml path')
    parser.add_argument('--hyp', type=str, default=ROOT / 'data/hyps/hyp.scratch-low.yaml', help='hyperparameters path')
    parser.add_argument('--epochs', type=int, default=300)
    parser.add_argument('--batch-size', type=int, default=16, help='total batch size for all GPUs, -1 for autobatch')
    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=420, help='train, val image size (pixels)')
    parser.add_argument('--rect', action='store_true', help='rectangular training')
    parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')
    parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
    parser.add_argument('--noval', action='store_true', help='only validate final epoch')
    parser.add_argument('--noautoanchor', action='store_true', help='disable AutoAnchor')
    parser.add_argument('--noplots', action='store_true', help='save no plot files')
    parser.add_argument('--evolve', type=int, nargs='?', const=300, help='evolve hyperparameters for x generations')
    parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
    parser.add_argument('--cache', type=str, nargs='?', const='ram', help='--cache images in "ram" (default) or "disk"')
    parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
    parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')
    parser.add_argument('--optimizer', type=str, choices=['SGD', 'Adam', 'AdamW'], default='SGD', help='optimizer')
    parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
    parser.add_argument('--workers', type=int, default=0, help='max dataloader workers (per RANK in DDP mode)')
    parser.add_argument('--project', default=ROOT / 'runs/train', help='save to project/name')
    parser.add_argument('--name', default='exp', help='save to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--quad', action='store_true', help='quad dataloader')
    parser.add_argument('--cos-lr', action='store_true', help='cosine LR scheduler')
    parser.add_argument('--label-smoothing', type=float, default=0.0, help='Label smoothing epsilon')
    parser.add_argument('--patience', type=int, default=100, help='EarlyStopping patience (epochs without improvement)')
    parser.add_argument('--freeze', nargs='+', type=int, default=[0], help='Freeze layers: backbone=10, first3=0 1 2')
    parser.add_argument('--save-period', type=int, default=-1, help='Save checkpoint every x epochs (disabled if < 1)')
    parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')

    # Weights & Biases arguments
    parser.add_argument('--entity', default=None, help='W&B: Entity')
    parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='W&B: Upload data, "val" option')
    parser.add_argument('--bbox_interval', type=int, default=-1, help='W&B: Set bounding-box image logging interval')
    parser.add_argument('--artifact_alias', type=str, default='latest', help='W&B: Version of dataset artifact to use')

    opt = parser.parse_known_args()[0] if known else parser.parse_args()
    return opt

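Of these arguments, we only need to care about the few that matter for our training; for the rest, a general awareness is enough.

img: size of the training and validation images, in pixels (--img is an alias of --imgsz);
batch: batch size for training and validation (argparse accepts --batch as an abbreviation of --batch-size);
epochs: number of training epochs;
data: the dataset configuration file, i.e. the cust_dataset.yaml defined above;
weights: the pretrained model weights file;
device: whether to use a GPU; give the GPU device number to train on a GPU, or cpu to train on the CPU;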
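If you prefer to launch training from a Python script (for example, to run several experiments in a row), the arguments above can be assembled into a command list. This is a sketch, not part of YOLOv5: build_train_cmd and pick_device are hypothetical helpers, and pick_device mirrors the torch.cuda.is_available() check in the next step, falling back to cpu when PyTorch or CUDA is unavailable.

```python
import sys


def pick_device():
    """Return "0" (first CUDA GPU) if PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "0" if torch.cuda.is_available() else "cpu"


def build_train_cmd(img=512, batch=8, epochs=20, data="data/cust_dataset.yaml",
                    weights="yolov5s.pt", device=None):
    """Build the argument list for yolov5/train.py using the options described above."""
    return [
        sys.executable, "train.py",
        "--img", str(img),
        "--batch", str(batch),
        "--epochs", str(epochs),
        "--data", data,
        "--weights", weights,
        "--device", device if device is not None else pick_device(),
    ]
```

To start a run from the directory that contains the yolov5 checkout: `import subprocess; subprocess.run(build_train_cmd(), cwd="yolov5", check=True)`. Passing a list rather than a single string avoids shell quoting issues on both Windows and Linux.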

Before training, check whether the local PyTorch installation supports the GPU:

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>>

As shown, the local PyTorch build supports GPU acceleration, and one GPU is available.

The training command is:

cd yolov5
python train.py --img 512 --batch 8 --epochs 20 --data data\cust_dataset.yaml --weights yolov5s.pt --device 0

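[Figure: training log]

The command above trains for 20 epochs; all 20 epochs finished in about one hour.

When training starts, YOLO automatically saves visualizations of the first 3 training batches in the runs/train/exp/ directory, so you can see what a batch of training data looks like.

[Figure: training batch]

Use TensorBoard to visualize the metrics during training: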

tensorboard --logdir=runs\train
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.6.0 at http://localhost:6006/ (Press CTRL+C to quit)

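[Figure: training metrics]

The training metrics are shown in the figure above.

When training finishes, the trained model weights are saved as runs/train/exp/weights/best.pt. We can use these weights to run predictions and check how well our own model performs.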

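Predicting with the Trained Model

After training, we can run predictions with our model. Take an arbitrary image from the web, such as this one:

[Figure: test image]

We will run prediction on this image.

The YOLOv5 source code provides a prediction script, detect.py, which accepts the following arguments: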

def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'yolov5s.pt', help='model path(s)')
    parser.add_argument('--source', type=str, default=ROOT / 'data/images', help='file/dir/URL/glob, 0 for webcam')
    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='(optional) dataset.yaml path')
    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
    parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='show results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--save-crop', action='store_true', help='save cropped prediction boxes')
    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --classes 0, or --classes 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--visualize', action='store_true', help='visualize features')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default=ROOT / 'runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--line-thickness', default=3, type=int, help='bounding box thickness (pixels)')
    parser.add_argument('--hide-labels', default=False, action='store_true', help='hide labels')
    parser.add_argument('--hide-conf', default=False, action='store_true', help='hide confidences')
    parser.add_argument('--half', action='store_true', help='use FP16 half-precision inference')
    parser.add_argument('--dnn', action='store_true', help='use OpenCV DNN for ONNX inference')
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    print_args(vars(opt))
    return opt

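Of these arguments, we only care about the two most important ones:

weights: the trained model weights; here we specify the weights we just trained;
source: the input to predict on, which can be an image, a video, a directory, a webcam, and so on.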
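Two other arguments above, --conf-thres (default 0.25) and --iou-thres (default 0.45), control how raw predictions are filtered. Below is a simplified, single-class greedy NMS sketch in plain Python that shows their roles. It is not YOLOv5's implementation, which works on tensors (using torchvision's NMS) and keeps classes separate unless --agnostic-nms is given; it is also plain NMS, not the DIoU-NMS variant mentioned at the start. The names box_iou and greedy_nms are our own.

```python
def box_iou(a, b):
    """IoU of two boxes whose first four items are (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, conf_thres=0.25, iou_thres=0.45):
    """Simplified single-class NMS over (x_min, y_min, x_max, y_max, score) tuples.

    1. drop boxes whose score is not above conf_thres;
    2. visit the remaining boxes from highest to lowest score;
    3. keep a box only if its IoU with every already-kept box is <= iou_thres.
    """
    candidates = sorted((d for d in detections if d[4] > conf_thres),
                        key=lambda d: d[4], reverse=True)
    keep = []
    for det in candidates:
        if all(box_iou(det, k) <= iou_thres for k in keep):
            keep.append(det)
    return keep


dets = [(0, 0, 10, 10, 0.9), (1, 1, 10, 10, 0.8), (20, 20, 30, 30, 0.7), (0, 0, 5, 5, 0.1)]
print(greedy_nms(dets))  # [(0, 0, 10, 10, 0.9), (20, 20, 30, 30, 0.7)]
```

The second box overlaps the first with IoU 0.81, above 0.45, so it is suppressed as a duplicate; the last box is dropped by the confidence threshold before NMS runs. Raising --iou-thres keeps more overlapping boxes; raising --conf-thres discards more low-confidence ones.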

The prediction command is:

python detect.py --weights runs\train\exp\weights\best.pt --source data\images\feiji.jpeg
detect: weights=['runs\train\exp\weights\best.pt'], source=data\images\feiji.jpeg, data=data\coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5  v6.1-190-g4d59f65 Python-3.8.13 torch-1.11.0 CUDA:0 (NVIDIA GeForce MX250, 2048MiB)

Fusing layers...
Model summary: 213 layers, 7012822 parameters, 0 gradients, 15.8 GFLOPs
image 1/1 D:\pyworkspace\yolov5\data\images\feiji.jpeg: 384x640 1 airplane, Done. (0.026s)
Speed: 0.9ms pre-process, 26.0ms inference, 5.0ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs\detect\exp

Here we use the weights we just trained, and the input is the image we found above.
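The log reports the image as 384x640 even though the inference size is 640. That is consistent with letterbox resizing (the "adaptive image scaling" from the introduction): the image is scaled so that it fits inside 640x640, and the shorter side is padded only up to the next multiple of the stride (32) rather than to a full square. Below is a sketch of just the shape arithmetic, written from our understanding of letterbox() in the repository's utils/augmentations.py. letterbox_shape is a hypothetical name, and the 1920x1080 input is a hypothetical example, since the actual size of feiji.jpeg is not given.

```python
def letterbox_shape(img_h, img_w, new_size=640, stride=32, auto=True, scaleup=True):
    """Return the (height, width) an image ends up with after letterbox resizing.

    auto=True pads each side only up to a multiple of `stride` (minimum padding);
    auto=False pads all the way to a new_size x new_size square.
    """
    r = min(new_size / img_h, new_size / img_w)  # scale so the image fits inside new_size
    if not scaleup:
        r = min(r, 1.0)  # only shrink, never enlarge
    new_w, new_h = int(round(img_w * r)), int(round(img_h * r))
    pad_w, pad_h = new_size - new_w, new_size - new_h
    if auto:
        pad_w, pad_h = pad_w % stride, pad_h % stride
    return new_h + pad_h, new_w + pad_w


print(letterbox_shape(1080, 1920))              # (384, 640)
print(letterbox_shape(1080, 1920, auto=False))  # (640, 640)
```

For the 1920x1080 example, the scale factor is 1/3, giving 640x360; padding 360 up to the next multiple of 32 gives 384, so the network processes roughly 40% fewer pixels than a padded 640x640 square.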