How do NPU and CPU running speeds compare? Tests on the MYD-JX8MPQ development board based on the i.MX 8M Plus processor - 电子发烧友网 (Elecfans)

References

https://www.toradex.cn/blog/nxp-imx8ji-yueiq-kuang-jia-ce-shi-machine-learning

IMX-MACHINE-LEARNING-UG.pdf

Image classification on CPU and NPU

cd /usr/bin/tensorflow-lite-2.4.0/examples

Running on the CPU

./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt

INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite

INFO: resolved reporter

INFO: invoked

INFO: average time: 50.66 ms

INFO: 0.780392: 653 military uniform

INFO: 0.105882: 907 Windsor tie

INFO: 0.0156863: 458 bow tie

INFO: 0.0117647: 466 bulletproof vest

INFO: 0.00784314: 835 suit
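Each of the five scores above is a whole multiple of 1/255. For example, 0.780392 ≈ 199/255 and 0.105882 ≈ 27/255. That pattern fits a uint8-quantized model whose raw outputs are divided by 255 before ranking, which is what TFLite's label_image does for uint8 outputs.

The Python sketch below shows that top-N step. Several parts are illustrative assumptions rather than the tool's own code:
- the function name `top_n_uint8`;
- the 0.001 cut-off;
- the 1001-class output vector, rebuilt from the logged scores with every other class set to 0.

```python
def top_n_uint8(raw_scores, labels, n=5, threshold=0.001):
    """Rank a uint8 classifier output; each value is scaled to [0, 1] by dividing by 255."""
    scored = []
    for index, q in enumerate(raw_scores):
        score = q / 255.0  # dequantize 0..255 -> 0.0..1.0
        if score >= threshold:
            scored.append((score, index))
    # Highest score first; the sort is stable, so ties keep index order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(score, index, labels[index]) for score, index in scored[:n]]


# Hypothetical 1001-entry output rebuilt from the CPU log (score * 255);
# all other classes are 0 and their names are placeholders.
raw = [0] * 1001
labels = [f"class_{i}" for i in range(1001)]
for index, q, name in [(653, 199, "military uniform"), (907, 27, "Windsor tie"),
                       (458, 4, "bow tie"), (466, 3, "bulletproof vest"),
                       (835, 2, "suit")]:
    raw[index] = q
    labels[index] = name

if __name__ == "__main__":
    for score, index, name in top_n_uint8(raw, labels):
        print(f"{score:.6g}: {index} {name}")
```

Run as a script, it prints the same five score/index/label values as the CPU log.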

Running with GPU/NPU acceleration

./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt -a 1

INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite

INFO: resolved reporter

INFO: Created TensorFlow Lite delegate for NNAPI.

INFO: Applied NNAPI delegate.

INFO: invoked

INFO: average time: 2.775 ms

INFO: 0.768627: 653 military uniform

INFO: 0.105882: 907 Windsor tie

INFO: 0.0196078: 458 bow tie

INFO: 0.0117647: 466 bulletproof vest

INFO: 0.00784314: 835 suit

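Alternatively, use the VX external delegate. Setting USE_GPU_INFERENCE=0 selects the NPU rather than the GPU:

USE_GPU_INFERENCE=0 ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libvx_delegate.so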

Running with Python

python3 label_image.py

INFO: Created TensorFlow Lite delegate for NNAPI.

Applied NNAPI delegate.

Warm-up time: 6628.5 ms

Inference time: 2.9 ms

0.870588: military uniform

0.031373: Windsor tie

0.011765: mortarboard

0.007843: bow tie

0.007843: bulletproof vest
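The Python run reports two separate figures: a warm-up time (6628.5 ms) and an inference time (2.9 ms). They differ because the first NNAPI invocation is far slower than later ones.

Below is a minimal sketch of that measurement pattern. It is not the code of label_image.py:
- `measure` and its injectable `clock` argument are hypothetical helpers;
- the lambda is only a placeholder for whatever performs one inference, for example the interpreter's `invoke()`.

```python
import time


def measure(run_once, iterations=10, clock=time.perf_counter):
    """Time one warm-up call, then average `iterations` further calls (both in ms)."""
    start = clock()
    run_once()  # first call; on the NPU this one is much slower than the rest
    warmup_ms = (clock() - start) * 1000.0

    start = clock()
    for _ in range(iterations):
        run_once()
    avg_ms = (clock() - start) * 1000.0 / iterations
    return warmup_ms, avg_ms


if __name__ == "__main__":
    # Placeholder workload standing in for one inference call.
    warmup, avg = measure(lambda: sum(range(10000)))
    print(f"Warm-up time: {warmup:.1f} ms")
    print(f"Inference time: {avg:.1f} ms")
```

Separating the first call keeps the one-off cost from inflating the steady-state average.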

Benchmark: CPU, single core

./benchmark_model --graph=mobilenet_v1_1.0_224_quant.tflite

STARTING!

Log parameter values verbosely: [0]

Graph: [mobilenet_v1_1.0_224_quant.tflite]

Loaded model mobilenet_v1_1.0_224_quant.tflite

The input model file size (MB): 4.27635

Initialized session in 15.076ms.

Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.

count=4 first=166743 curr=161124 min=161054 max=166743 avg=162728 std=2347

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.

count=50 first=161039 curr=161030 min=160877 max=161292 avg=161039 std=94

Inference timings in us: Init: 15076, First inference: 166743, Warmup (avg): 162728, Inference (avg): 161039

Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.

Peak memory footprint (MB): init=2.65234 overall=9.00391
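benchmark_model prints its statistics as key=value pairs, in microseconds. A small parser helps when collecting several runs into a table.

The sketch below shows one way to do it; the regex and helper names are illustrative. It also tolerates pairs with a missing separator, such as `max=166743avg=162728`, which can appear when such logs are copied from a web page.

```python
import re

# key=value pairs such as "count=50" or "avg=2634.2" (values may use e-notation).
_FIELD = re.compile(r"([A-Za-z_]+)=([0-9.eE+-]+)")


def parse_benchmark_line(line):
    """Return {field: float} for one benchmark_model statistics line (values in us)."""
    return {key: float(value) for key, value in _FIELD.findall(line)}


def summarize(line):
    """One-line summary in milliseconds, e.g. for pasting into a results table."""
    stats = parse_benchmark_line(line)
    return (f"avg {stats['avg'] / 1000:.2f} ms over {int(stats['count'])} runs "
            f"(std {stats['std'] / 1000:.2f} ms)")


if __name__ == "__main__":
    print(summarize("count=50 first=161039 curr=161030 min=160877 "
                    "max=161292 avg=161039 std=94"))
```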

CPU, multi-core

./benchmark_model --graph=mobilenet_v1_1.0_224_quant.tflite --num_threads=4

The board has 4 cores; setting --num_threads to 4 gave the best performance.

STARTING!

Log parameter values verbosely: [0]

Num threads: [4]

Graph: [mobilenet_v1_1.0_224_quant.tflite]

#threads used for CPU inference: [4]

Loaded model mobilenet_v1_1.0_224_quant.tflite

The input model file size (MB): 4.27635

Initialized session in 2.536ms.

Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.

count=11 first=48722 curr=44756 min=44597 max=49397 avg=45518.9 std=1679

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.

count=50 first=44678 curr=44591 min=44590 max=50798 avg=44965.2 std=1170

Inference timings in us: Init: 2536, First inference: 48722, Warmup (avg): 45518.9, Inference (avg): 44965.2

Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.

Peak memory footprint (MB): init=1.38281 overall=8.69922

GPU/NPU acceleration

./benchmark_model --graph=mobilenet_v1_1.0_224_quant.tflite --num_threads=4 --use_nnapi=true

STARTING!

Log parameter values verbosely: [0]

Num threads: [4]

Graph: [mobilenet_v1_1.0_224_quant.tflite]

#threads used for CPU inference: [4]

Use NNAPI: [1]

NNAPI accelerators available: [vsi-npu]

Loaded model mobilenet_v1_1.0_224_quant.tflite

INFO: Created TensorFlow Lite delegate for NNAPI.

Explicitly applied NNAPI delegate, and the model graph will be completely executed by the delegate.

The input model file size (MB): 4.27635

Initialized session in 3.968ms.

Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.

count=1 curr=6611085

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.

count=369 first=2715 curr=2623 min=2572 max=2776 avg=2634.2 std=20

Inference timings in us: Init: 3968, First inference: 6611085, Warmup (avg): 6.61108e+06, Inference (avg): 2634.2

Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.

Peak memory footprint (MB): init=2.42188 overall=28.4062
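The NNAPI run has a very expensive first inference: 6611085 µs, about 6.6 s, presumably one-off graph preparation for the NPU. After that it takes roughly 2.6 ms per inference.

A rough break-even estimate against the 4-thread CPU run finds the smallest integer N for which first + (N - 1) * npu_avg < N * cpu_avg. The sketch ignores initialization time and the CPU's own slower first run, so treat the result as an order-of-magnitude figure.

```python
import math


def break_even_runs(npu_first_us, npu_avg_us, cpu_avg_us):
    """Smallest N with npu_first + (N - 1) * npu_avg < N * cpu_avg, or None.

    Rearranged: N > (npu_first - npu_avg) / (cpu_avg - npu_avg).
    Returns None when the NPU is not faster per inference in steady state.
    """
    gain_per_run = cpu_avg_us - npu_avg_us
    if gain_per_run <= 0:
        return None
    bound = (npu_first_us - npu_avg_us) / gain_per_run  # N must be strictly greater
    return max(1, math.floor(bound) + 1)


def total_us(first_us, avg_us, n):
    """Total time for n runs when only the first one differs from the average."""
    return first_us + (n - 1) * avg_us


if __name__ == "__main__":
    # NNAPI first inference and average, and the 4-thread CPU average (us), from the logs.
    print(break_even_runs(6611085, 2634.2, 44965.2))  # prints 157
```

With these figures it prints 157. In total elapsed time, the NPU pays off for this model only when one process runs more than about 156 inferences.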

Comparison of results

	CPU	CPU, multi-core (4 threads)	NPU acceleration
Image classification (label_image)	50.66 ms	n/a	2.775 ms
Benchmark (benchmark_model, average)	161039 µs	44965.2 µs	2634.2 µs
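The table is easier to read as ratios. The sketch below only divides the measured latencies; the dictionary names are illustrative.

The label_image and benchmark_model rows come from different tools. Compare ratios within a row, not across rows.

```python
# Steady-state latencies from the comparison table.
BENCHMARK_US = {"cpu_1_thread": 161039.0, "cpu_4_threads": 44965.2, "npu": 2634.2}
LABEL_IMAGE_MS = {"cpu": 50.66, "npu": 2.775}


def speedup(baseline, accelerated):
    """How many times faster `accelerated` is than `baseline` (ratio of latencies)."""
    return baseline / accelerated


def parallel_efficiency(t_one, t_many, threads):
    """Speedup divided by thread count; 1.0 would be perfect scaling."""
    return speedup(t_one, t_many) / threads


if __name__ == "__main__":
    b = BENCHMARK_US
    print(f"benchmark, 4 threads vs 1 thread: {speedup(b['cpu_1_thread'], b['cpu_4_threads']):.2f}x")
    print(f"benchmark, NPU vs 1 thread:       {speedup(b['cpu_1_thread'], b['npu']):.2f}x")
    print(f"benchmark, NPU vs 4 threads:      {speedup(b['cpu_4_threads'], b['npu']):.2f}x")
    print(f"label_image, NPU vs CPU:          {speedup(LABEL_IMAGE_MS['cpu'], LABEL_IMAGE_MS['npu']):.2f}x")
    print("4-thread parallel efficiency:     "
          f"{parallel_efficiency(b['cpu_1_thread'], b['cpu_4_threads'], 4):.0%}")
```

Rounded, the results are:
- 4 threads over 1 thread: about 3.58x;
- NPU over one CPU thread: about 61.13x;
- NPU over four threads: about 17.07x;
- label_image, NPU over CPU: about 18.26x;
- 4-thread parallel efficiency: roughly 90%.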

OpenCV DNN

cd /usr/share/OpenCV/samples/bin

./example_dnn_classification --input=dog416.png --zoo=models.yml squeezenet

Download the models

cd /usr/share/opencv4/testdata/dnn/

python3 download_models_basic.py

Image classification

cd /usr/share/OpenCV/samples/bin

./example_dnn_classification --input=dog416.png --zoo=models.yml squeezenet

Enter the following address in a file browser's address bar to download the file:

ftp://ftp.toradex.cn/Linux/i.MX8/eIQ/OpenCV/Image_Classification.zip

Unzip it to obtain models.yml and squeezenet_v1.1.caffemodel.

cd /usr/share/OpenCV/samples/bin

Copy the files to the /usr/share/OpenCV/samples/bin directory on the board:

$ cp /usr/share/opencv4/testdata/dnn/dog416.png /usr/share/OpenCV/samples/bin/
$ cp /usr/share/opencv4/testdata/dnn/squeezenet_v1.1.prototxt /usr/share/OpenCV/samples/bin/
$ cp /usr/share/OpenCV/samples/data/dnn/classification_classes_ILSVRC2012.txt /usr/share/OpenCV/samples/bin/
$ cd /usr/share/OpenCV/samples/bin/

Image input

./example_dnn_classification --input=dog416.png --zoo=models.yml squeezenet

This fails with the following errors:

root@myd-jx8mp:/usr/share/OpenCV/samples/bin# ./example_dnn_classification --input=dog416.png --zoo=model.yml squeezenet

ERRORS:

Missing parameter: 'mean'

Missing parameter: 'rgb'

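Add the parameters --rgb and --mean=1.

Note that the failing command above passed --zoo=model.yml rather than models.yml. The sample reads these default parameters from the zoo file, so the misspelled file name may be the real cause of the missing parameters.

It still reported errors, so the --mode parameter was added as well: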

root@myd-jx8mp:/usr/share/OpenCV/samples/bin# ./example_dnn_classification --rgb --mean=1 --input=dog416.png --zoo=models.yml squeezenet

[ WARN:0] global /usr/src/debug/opencv/4.4.0.imx-r0/git/modules/videoio/src/cap_gstreamer.cpp (898) open OpenCV | GStreamer warning: unable to query duration of stream

[ WARN:0] global /usr/src/debug/opencv/4.4.0.imx-r0/git/modules/videoio/src/cap_gstreamer.cpp (935) open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1

root@myd-jx8mp:/usr/share/OpenCV/samples/bin# ./example_dnn_classification --rgb --mean=1 --input=dog416.png --zoo=models.yml squeezenet --mode

[ WARN:0] global /usr/src/debug/opencv/4.4.0.imx-r0/git/modules/videoio/src/cap_gstreamer.cpp (898) open OpenCV | GStreamer warning: unable to query duration of stream

[ WARN:0] global /usr/src/debug/opencv/4.4.0.imx-r0/git/modules/videoio/src/cap_gstreamer.cpp (935) open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1
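The mean, scale and rgb values can be passed as --mean/--scale/--rgb or read from models.yml. Either way, they appear to end up as preprocessing arguments to cv::dnn::blobFromImage in the OpenCV sample.

The plain-Python sketch below reproduces only the per-pixel arithmetic of that function, with no resize or crop. It makes these assumptions:
- the input is an HxWx3 BGR image given as nested lists;
- `blob_from_image` is a hypothetical name;
- the mean is applied in output-channel order, which is how blobFromImage appears to treat the mean when swapRB is set.

```python
def blob_from_image(img, mean=(0.0, 0.0, 0.0), scale=1.0, swap_rb=False):
    """Convert an HxWx3 BGR image (nested lists) into a 1x3xHxW blob (nested lists).

    Per-pixel arithmetic only: output channel c = (input channel - mean[c]) * scale,
    where the input channels are taken in reverse (B and R swapped) if swap_rb is True.
    """
    height, width = len(img), len(img[0])
    planes = []
    for c in range(3):
        src = 2 - c if swap_rb else c  # which BGR input channel feeds output channel c
        planes.append([[(img[y][x][src] - mean[c]) * scale for x in range(width)]
                       for y in range(height)])
    return [planes]  # leading batch dimension of 1 (NCHW layout)


if __name__ == "__main__":
    pixel_bgr = [[[10, 20, 30]]]  # hypothetical 1x1 image: B=10, G=20, R=30
    print(blob_from_image(pixel_bgr, mean=(1, 2, 3), scale=0.5, swap_rb=True))
```

For example, a single BGR pixel [10, 20, 30] with mean (1, 2, 3), scale 0.5 and swap_rb=True becomes the channel values 14.5, 9.0 and 3.5.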

Video input

./example_dnn_classification --device=2 --zoo=models.yml squeezenet

Troubleshooting

If the testdata directory does not contain the files, search for them in the Yocto build tree:

lhj@DESKTOP-BINN7F8:~/myd-jx8mp-yocto$ find . -name "dog416.png"

./build-xwayland/tmp/work/cortexa53-crypto-mx8mp-poky-linux/opencv/4.4.0.imx-r0/extra/testdata/dnn/dog416.png

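Then copy the corresponding files to the board:

cd ./build-xwayland/tmp/work/cortexa53-crypto-mx8mp-poky-linux/opencv/4.4.0.imx-r0/extra/testdata/

tar -cvf /mnt/e/dnn.tar ./dnn/

On the board, cd /usr/share/opencv4/testdata (create the directory first if it does not exist).

Import dnn.tar with rz.

Extract it: tar -xvf dnn.tar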
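The two host-side steps, locating a file in the Yocto build tree and packing the dnn directory, can also be scripted. The sketch below uses Python's os.walk and tarfile as an assumed alternative to find and tar -cvf. The helper names and the temporary demo tree are hypothetical.

```python
import os
import tarfile
import tempfile


def find_files(root, name):
    """Like `find root -name name`, but for regular files only: sorted matching paths."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def pack_dir(parent, subdir, tar_path):
    """Like `cd parent && tar -cvf tar_path ./subdir/`; returns the archived member names."""
    with tarfile.open(tar_path, "w") as tar:
        tar.add(os.path.join(parent, subdir), arcname=subdir)
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()  # similar to the list tar -v prints


if __name__ == "__main__":
    # Hypothetical demo tree standing in for .../extra/testdata/dnn
    with tempfile.TemporaryDirectory() as tmp:
        dnn = os.path.join(tmp, "testdata", "dnn")
        os.makedirs(dnn)
        open(os.path.join(dnn, "dog416.png"), "wb").close()
        print(find_files(tmp, "dog416.png"))
        print(pack_dir(os.path.join(tmp, "testdata"), "dnn", os.path.join(tmp, "dnn.tar")))
```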

terminate called after throwing an instance of 'cv::Exception'

what(): OpenCV(4.4.0) /usr/src/debug/opencv/4.4.0.imx-r0/git/samples/dnn/classification.cpp: error: (Assertion failed) !model.empty() in function 'main'

Aborted

lhj@DESKTOP-BINN7F8:~/myd-jx8mp-yocto/build-xwayland$ find . -name classification.cpp

lhj@DESKTOP-BINN7F8:~/myd-jx8mp-yocto/build-xwayland$ cp ./tmp/work/cortexa53-crypto-mx8mp-poky-linux/opencv/4.4.0.imx-r0/packages-split/opencv-src/usr/src/debug/opencv/4.4.0.imx-r0/git/samples/dnn/classification.cpp /mnt/e

lhj@DESKTOP-BINN7F8:~/myd-jx8mp-yocto/build-xwayland$
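The `!model.empty()` assertion means the sample ended up without a model path. Judging from the sample's option handling, this happens when no --model is given and no matching entry is found in the zoo file.

Before rerunning, a quick check that the files used in this walkthrough are actually present can save time. The sketch below is a hypothetical helper. The file list comes from the steps above, and the directory is the one used throughout.

```python
import os

# Files this walkthrough places next to example_dnn_classification.
REQUIRED_FILES = (
    "models.yml",
    "squeezenet_v1.1.caffemodel",
    "squeezenet_v1.1.prototxt",
    "classification_classes_ILSVRC2012.txt",
    "dog416.png",
)


def missing_files(directory, names=REQUIRED_FILES):
    """Return the entries of `names` that are not regular files inside `directory`."""
    return [name for name in names if not os.path.isfile(os.path.join(directory, name))]


if __name__ == "__main__":
    missing = missing_files("/usr/share/OpenCV/samples/bin")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All files for the squeezenet example are present.")
```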

YOLO object detection

cd /usr/share/OpenCV/samples/bin

./example_dnn_object_detection --width=1024 --height=1024 --scale=0.00392 --input=dog416.png --rgb --zoo=models.yml yolo

Download the cfg and weights files from https://pjreddie.com/darknet/yolo/

cd /usr/share/OpenCV/samples/bin/

Import the files downloaded above.

cp /usr/share/OpenCV/samples/data/dnn/object_detection_classes_yolov3.txt /usr/share/OpenCV/samples/bin/

cp /usr/share/opencv4/testdata/dnn/yolov3.cfg /usr/share/OpenCV/samples/bin/

./example_dnn_object_detection --width=1024 --height=1024 --scale=0.00392 --input=dog416.png --rgb --zoo=models.yml yolo
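For reference, --scale=0.00392 is approximately 1/255 (0.0039216), so 8-bit pixel values are mapped to roughly 0 to 1 before inference. After inference, the sample turns the YOLO output rows into boxes and removes duplicates with non-maximum suppression.

The plain-Python sketch below illustrates that post-processing under these assumptions:
- each row is [center_x, center_y, width, height, objectness, class scores...] in frame-relative units;
- the best class score is used as the confidence;
- the 0.5 confidence and 0.4 NMS thresholds are assumed defaults.

It is a simplified, class-agnostic stand-in for cv::dnn::NMSBoxes, not the sample's code, and the input rows are made up.

```python
def decode_rows(rows, frame_w, frame_h, conf_threshold=0.5):
    """Turn YOLO output rows into (left, top, width, height, confidence, class_id) boxes."""
    boxes = []
    for row in rows:
        scores = row[5:]
        class_id = max(range(len(scores)), key=lambda k: scores[k])
        confidence = scores[class_id]
        if confidence > conf_threshold:
            cx, cy = row[0] * frame_w, row[1] * frame_h
            w, h = row[2] * frame_w, row[3] * frame_h
            boxes.append((int(cx - w / 2), int(cy - h / 2), int(w), int(h),
                          confidence, class_id))
    return boxes


def iou(a, b):
    """Intersection over union of two (left, top, width, height, ...) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, iou_threshold=0.4):
    """Greedy, class-agnostic NMS: keep the most confident boxes, drop heavy overlaps."""
    kept = []
    for box in sorted(boxes, key=lambda bx: bx[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    rows = [  # hypothetical network output for a 200x200 frame, 2 classes
        [0.5, 0.5, 0.25, 0.25, 0.9, 0.1, 0.8],
        [0.5625, 0.5, 0.25, 0.25, 0.9, 0.1, 0.7],
        [0.125, 0.125, 0.125, 0.125, 0.9, 0.6, 0.2],
        [0.75, 0.75, 0.25, 0.25, 0.9, 0.3, 0.4],
    ]
    for box in nms(decode_rows(rows, 200, 200)):
        print(box)
```

With the made-up rows:
- the two heavily overlapping class-1 boxes (IoU about 0.61) collapse to the more confident one;
- the small class-0 box survives;
- the low-scoring row is dropped before NMS.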

OpenCV classical machine learning

cd /usr/share/OpenCV/samples/bin

Linear SVM

./example_tutorial_introduction_to_svm
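The tutorial trains a linear SVM on a few 2-D points and colours the plane by the predicted label. Once trained, such a model classifies a point by the sign of w · x + b, and its margin width is 2 / ||w||.

The sketch below shows just those two formulas. The training points and labels are assumed to match those in OpenCV's introduction_to_svm tutorial. The values of w and b are hand-picked for illustration and are not what the SVM would learn.

```python
import math


def decision(w, b, x):
    """Signed score w . x + b of a linear SVM; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the planes w . x + b = +1 and w . x + b = -1, i.e. 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Assumed to match the tutorial's four training points and labels.
points = [(501, 10), (255, 10), (501, 255), (10, 501)]
labels = [1, -1, -1, -1]
# Hand-picked separating plane for illustration only (not a trained SVM).
w, b = (1.0, -1.0), -368.0

if __name__ == "__main__":
    print([predict(w, b, p) for p in points])  # prints [1, -1, -1, -1]
```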

Non-linear SVM

./example_tutorial_non_linear_svms
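This tutorial appears to keep a linear kernel. It handles the non-separable data through the soft-margin parameter C, which trades margin width against training errors. The quantity being minimized is 0.5 * ||w||^2 + C * sum(max(0, 1 - y * (w · x + b))).

The sketch below only evaluates that objective for a given w and b on hypothetical toy data, to show how C weights the hinge losses. It does not train anything.

```python
def soft_margin_objective(w, b, points, labels, C):
    """Primal soft-margin SVM objective: 0.5 * ||w||^2 + C * sum of hinge losses.

    The hinge loss max(0, 1 - y * (w . x + b)) is zero for points on the correct
    side beyond the margin and grows linearly for margin violations and errors.
    """
    regularizer = 0.5 * sum(wi * wi for wi in w)
    hinge_total = 0.0
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hinge_total += max(0.0, 1.0 - y * score)
    return regularizer + C * hinge_total


if __name__ == "__main__":
    # Hypothetical toy data: the third point (label -1) lies on the positive side.
    pts = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
    ys = [1, -1, -1]
    for C in (0.1, 1.0):
        print(C, soft_margin_objective((1.0, 0.0), 0.0, pts, ys, C))
```

With C = 0.1, the misclassified third point adds only 0.15 to the objective (total about 0.65). With C = 1 it adds 1.5 (total 2.0), so a larger C pushes harder toward fitting every point.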

PCA analysis

./example_tutorial_introduction_to_pca ../data/pca_test1.jpg
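The PCA tutorial estimates an object's orientation from its contour points. The first principal component, the eigenvector of the covariance matrix with the largest eigenvalue, gives the main axis. For 2-D points that eigen-decomposition has a closed form, so a dependency-free sketch is possible.

`principal_axis` is a hypothetical helper that works on a plain list of (x, y) points rather than an image. As with any eigenvector, the direction is only defined up to 180 degrees.

```python
import math


def principal_axis(points):
    """PCA of 2-D points: (angle in radians, largest variance, smallest variance).

    Closed-form eigen-decomposition of the 2x2 covariance matrix
    [[sxx, sxy], [sxy, syy]]; the angle is that of the first eigenvector.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    return angle, half_trace + spread, half_trace - spread


if __name__ == "__main__":
    diagonal = [(0, 0), (1, 1), (2, 2), (3, 3)]  # hypothetical points along y = x
    angle, major, minor = principal_axis(diagonal)
    print(math.degrees(angle), major, minor)
```

For the demo points along y = x, it returns an angle of π/4 (45 degrees) with variances 2.5 and 0.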

Logistic regression

./example_cpp_logistic_regression
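OpenCV's ml::LogisticRegression trains by batch or mini-batch gradient descent on the log-loss. The sketch below is a minimal batch-gradient-descent version for 1-D inputs in plain Python, to show what the training loop does. The learning rate, iteration count and toy data are assumptions, not the sample's settings or data.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=0.5, iterations=1000):
    """Batch gradient descent on the mean log-loss for 1-D inputs; returns (w, b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of the mean cross-entropy: averages of (p - y) * x and (p - y).
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


if __name__ == "__main__":
    xs = [-2.0, -1.0, 1.0, 2.0]  # hypothetical 1-D feature
    ys = [0, 0, 1, 1]
    w, b = train_logistic(xs, ys)
    print(f"w={w:.3f} b={b:.3f}", [predict(w, b, x) for x in xs])
```

On the toy data the learned weight is positive and the bias stays near zero, because the data are symmetric. The four points are therefore classified as [0, 0, 1, 1].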

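Disclaimer: The content and images of this article were written by a contributing author or reprinted with authorization from a partner website. The views expressed are solely the author's and do not represent the position of 电子发烧友网. The article and its images are intended only for engineers' learning; in case of infringement or other violations, please contact the site.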

Author: 嵌入式开发 (Embedded Development)
