NPU vs. CPU: how do inference speeds differ? Tests on the MYD-JX8MPQ development board based on the i.MX 8M Plus processor

MYIR Electronics, 2022-05-09 16:46

References

https://www.toradex.cn/blog/nxp-imx8ji-yueiq-kuang-jia-ce-shi-machine-learning

IMX-MACHINE-LEARNING-UG.pdf


Image classification on CPU and NPU

cd /usr/bin/tensorflow-lite-2.4.0/examples

Running on the CPU

./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt

INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite

INFO: resolved reporter

INFO: invoked

INFO: average time: 50.66 ms

INFO: 0.780392: 653 military uniform

INFO: 0.105882: 907 Windsor tie

INFO: 0.0156863: 458 bow tie

INFO: 0.0117647: 466 bulletproof vest

INFO: 0.00784314: 835 suit
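The scores in this log are all multiples of 1/255, which is what one would expect from a quantized model whose output tensor holds uint8 values that are scaled into the 0 to 1 range for display. The following is a minimal sketch of such a top-k step in plain Python. It is not the label_image source: top_k is a hypothetical helper, the 0.001 threshold is an assumption, and the raw values are back-computed from the printed scores (for example 0.780392 x 255 is about 199) rather than read from the real tensor.

```python
import heapq


def top_k(raw_scores, k=5, threshold=0.001):
    """Return the k best (score, class_index) pairs from uint8 outputs.

    raw_scores: sequence of ints in 0..255, one per class.
    Each value is divided by 255 so scores fall in the 0..1 range.
    """
    scaled = ((value / 255.0, index) for index, value in enumerate(raw_scores))
    kept = (pair for pair in scaled if pair[0] >= threshold)
    return heapq.nlargest(k, kept)


# Hypothetical raw output: 1001 classes, all zero except five values
# back-computed from the CPU log above (assumption, for illustration).
raw = [0] * 1001
for index, value in {653: 199, 907: 27, 458: 4, 466: 3, 835: 2}.items():
    raw[index] = value

for score, index in top_k(raw):
    print(f"{score:.6g}: {index}")
```

With these assumed raw values the sketch prints the same five scores and class indices as the CPU run, which suggests the small differences between the CPU and NPU logs are single quantization steps (199 versus 196 out of 255 for the top class).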


Running with GPU/NPU acceleration

./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt -a 1

INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite

INFO: resolved reporter

INFO: Created TensorFlow Lite delegate for NNAPI.

INFO: Applied NNAPI delegate.

INFO: invoked

INFO: average time: 2.775 ms

INFO: 0.768627: 653 military uniform

INFO: 0.105882: 907 Windsor tie

INFO: 0.0196078: 458 bow tie

INFO: 0.0117647: 466 bulletproof vest

INFO: 0.00784314: 835 suit

USE_GPU_INFERENCE=0 ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libvx_delegate.so

Running from Python

python3 label_image.py

INFO: Created TensorFlow Lite delegate for NNAPI.

Applied NNAPI delegate.

Warm-up time: 6628.5 ms

Inference time: 2.9 ms

0.870588: military uniform

0.031373: Windsor tie

0.011765: mortarboard

0.007843: bow tie

0.007843: bulletproof vest
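The output above reports the first (warm-up) inference separately from the steady-state time: with the NNAPI delegate the first call took 6628.5 ms, while later calls took about 2.9 ms. The sketch below shows one way to make that kind of measurement. It is not the actual label_image.py; time_inference and fake_invoke are hypothetical names, and the stand-in workload only exists so the example runs without TensorFlow Lite installed.

```python
import time


def time_inference(invoke, runs=50):
    """Time one warm-up call, then average `runs` further calls.

    invoke: zero-argument callable that performs one inference.
    Returns (warmup_ms, average_ms).
    """
    start = time.perf_counter()
    invoke()  # the first call includes any one-off graph preparation
    warmup_ms = (time.perf_counter() - start) * 1000.0

    start = time.perf_counter()
    for _ in range(runs):
        invoke()
    average_ms = (time.perf_counter() - start) * 1000.0 / runs
    return warmup_ms, average_ms


def fake_invoke():
    """Stand-in workload (assumption); replace with a real inference call."""
    sum(range(1000))


warmup_ms, average_ms = time_inference(fake_invoke, runs=10)
print(f"Warm-up time: {warmup_ms:.1f} ms")
print(f"Inference time: {average_ms:.1f} ms")
```

On the board, the callable would typically be the interpreter's invoke method, called after allocate_tensors() and set_tensor() have prepared the input.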


Benchmark

Single-core CPU run

./benchmark_model --graph=mobilenet_v1_1.0_224_quant.tflite

STARTING!

Log parameter values verbosely: [0]

Graph: [mobilenet_v1_1.0_224_quant.tflite]

Loaded model mobilenet_v1_1.0_224_quant.tflite

The input model file size (MB): 4.27635

Initialized session in 15.076ms.

Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.

count=4 first=166743 curr=161124 min=161054 max=166743 avg=162728 std=2347

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.

count=50 first=161039 curr=161030 min=160877 max=161292 avg=161039 std=94

Inference timings in us: Init: 15076, First inference: 166743, Warmup (avg): 162728, Inference (avg): 161039

Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.

Peak memory footprint (MB): init=2.65234 overall=9.00391

Multi-core CPU run

./benchmark_model --graph=mobilenet_v1_1.0_224_quant.tflite --num_threads=4

With four cores, setting --num_threads to 4 gives the best performance.

STARTING!

Log parameter values verbosely: [0]

Num threads: [4]

Graph: [mobilenet_v1_1.0_224_quant.tflite]

#threads used for CPU inference: [4]

Loaded model mobilenet_v1_1.0_224_quant.tflite

The input model file size (MB): 4.27635

Initialized session in 2.536ms.

Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.

count=11 first=48722 curr=44756 min=44597 max=49397 avg=45518.9 std=1679

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.

count=50 first=44678 curr=44591 min=44590 max=50798 avg=44965.2 std=1170

Inference timings in us: Init: 2536, First inference: 48722, Warmup (avg): 45518.9, Inference (avg): 44965.2

Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.

Peak memory footprint (MB): init=1.38281 overall=8.69922

GPU/NPU acceleration

./benchmark_model --graph=mobilenet_v1_1.0_224_quant.tflite --num_threads=4 --use_nnapi=true

STARTING!

Log parameter values verbosely: [0]

Num threads: [4]

Graph: [mobilenet_v1_1.0_224_quant.tflite]

#threads used for CPU inference: [4]

Use NNAPI: [1]

NNAPI accelerators available: [vsi-npu]

Loaded model mobilenet_v1_1.0_224_quant.tflite

INFO: Created TensorFlow Lite delegate for NNAPI.

Explicitly applied NNAPI delegate, and the model graph will be completely executed by the delegate.

The input model file size (MB): 4.27635

Initialized session in 3.968ms.

Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.

count=1 curr=6611085

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.

count=369 first=2715 curr=2623 min=2572 max=2776 avg=2634.2 std=20

Inference timings in us: Init: 3968, First inference: 6611085, Warmup (avg): 6.61108e+06, Inference (avg): 2634.2

Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.

Peak memory footprint (MB): init=2.42188 overall=28.4062
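The NPU run shows a large gap between the first inference (about 6.6 s) and the steady-state average (about 2.6 ms). When comparing several runs it can be convenient to pull these numbers out of the log automatically. The sketch below parses the "Inference timings in us" summary line; parse_timings is a hypothetical helper written for this article's log format, not part of benchmark_model.

```python
def parse_timings(line):
    """Parse a benchmark_model 'Inference timings in us' summary line.

    Returns a dict mapping each label to a float number of microseconds.
    """
    prefix = "Inference timings in us:"
    if not line.startswith(prefix):
        raise ValueError("not a timing summary line")
    timings = {}
    for part in line[len(prefix):].split(","):
        # Split on the last colon so labels such as "Warmup (avg)" stay whole.
        label, value = part.rsplit(":", 1)
        timings[label.strip()] = float(value)
    return timings


# Summary line copied from the NPU run above.
npu = parse_timings(
    "Inference timings in us: Init: 3968, First inference: 6611085, "
    "Warmup (avg): 6.61108e+06, Inference (avg): 2634.2"
)
ratio = npu["First inference"] / npu["Inference (avg)"]
print(f"First inference is about {ratio:.0f}x slower than steady state")
```

For this log the first inference is roughly 2,500 times slower than the steady-state average, so the one-off warm-up cost matters for workloads that run only a few inferences.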

Results comparison

Test                   CPU         CPU multi-core, multi-thread   NPU accelerated
Image classification   50.66 ms    -                              2.775 ms
Benchmark              161039 us   44965.2 us                     2634.2 us
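To make the comparison concrete, the speedups can be computed directly from the measurements above. The sketch below does only that arithmetic; the dictionary keys are labels chosen here for readability.

```python
# Average inference times from the benchmark runs above, in microseconds.
benchmark_us = {
    "CPU, 1 thread": 161039.0,
    "CPU, 4 threads": 44965.2,
}
npu_us = 2634.2


def speedup(baseline, accelerated):
    """Return how many times faster the accelerated run is."""
    return baseline / accelerated


for name, value in benchmark_us.items():
    print(f"benchmark_model, {name}: NPU is {speedup(value, npu_us):.1f}x faster")

# label_image measurements, in milliseconds: CPU versus NPU.
print(f"label_image: NPU is {speedup(50.66, 2.775):.1f}x faster")

# Four CPU threads versus one.
print(f"CPU 4 threads vs 1 thread: {speedup(161039.0, 44965.2):.1f}x faster")
```

With these figures the NPU is about 61 times faster than a single CPU thread and about 17 times faster than four threads in benchmark_model, and about 18 times faster in label_image. Four threads give roughly a 3.6 times gain over one thread.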

OpenCV DNN

cd /usr/share/OpenCV/samples/bin

./example_dnn_classification --input=dog416.png --zoo=models.yml squeezenet

Downloading the models

cd /usr/share/opencv4/testdata/dnn/

python3 download_models_basic.py

Image classification

cd /usr/share/OpenCV/samples/bin

./example_dnn_classification --input=dog416.png --zoo=models.yml squeezenet

[Image: e2a1f644-c70d-11ec-8521-dac502259ad0.jpg]


Enter the following address in a file browser's address bar to download the archive:

ftp://ftp.toradex.cn/Linux/i.MX8/eIQ/OpenCV/Image_Classification.zip

Extracting it yields the files models.yml and squeezenet_v1.1.caffemodel.

cd /usr/share/OpenCV/samples/bin

Copy the files into the /usr/share/OpenCV/samples/bin directory on the development board.

$ cp /usr/share/opencv4/testdata/dnn/dog416.png /usr/share/OpenCV/samples/bin/
$ cp /usr/share/opencv4/testdata/dnn/squeezenet_v1.1.prototxt /usr/share/OpenCV/samples/bin/
$ cp /usr/share/OpenCV/samples/data/dnn/classification_classes_ILSVRC2012.txt /usr/share/OpenCV/samples/bin/
$ cd /usr/share/OpenCV/samples/bin/
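Since the sample needs several files from different locations, it may help to confirm they are all in place before running it. The sketch below checks for the file names mentioned in the steps above; missing_files is a hypothetical helper, and the list assumes the sample needs exactly these five files.

```python
from pathlib import Path

# File names taken from the copy and download steps above (assumed complete).
REQUIRED_FILES = [
    "models.yml",
    "squeezenet_v1.1.caffemodel",
    "squeezenet_v1.1.prototxt",
    "classification_classes_ILSVRC2012.txt",
    "dog416.png",
]


def missing_files(directory, names=REQUIRED_FILES):
    """Return the names from `names` that are not files in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("/usr/share/OpenCV/samples/bin")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```

Running this on the board before launching example_dnn_classification gives a quick list of anything that still needs to be copied.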

Image input

./example_dnn_classification --input=dog416.png --zoo=models.yml squeezenet

This reports an error:

root@myd-jx8mp:/usr/share/OpenCV/samples/bin# ./example_dnn_classification --input=dog416.png --zoo=model.yml squeezenet

ERRORS:

Missing parameter: 'mean'

Missing parameter: 'rgb'

Add the parameters --rgb and --mean=1.

It still fails, so also add the parameter --mode.

root@myd-jx8mp:/usr/share/OpenCV/samples/bin# ./example_dnn_classification --rgb --mean=1 --input=dog416.png --zoo=models.yml squeezenet

[WARN:0] global /usr/src/debug/opencv/4.4.0.imx-r0/git/modules/videoio/src/cap_gstreamer.cpp (898) open OpenCV | GStreamer warning: unable to query duration of stream

[WARN:0] global /usr/src/debug/opencv/4.4.0.imx-r0/git/modules/videoio/src/cap_gstreamer.cpp (935) open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1

root@myd-jx8mp:/usr/share/OpenCV/samples/bin# ./example_dnn_classification --rgb --mean=1 --input=dog416.png --zoo=models.yml squeezenet --mode

[WARN:0] global /usr/src/debug/opencv/4.4.0.imx-r0/git/modules/videoio/src/cap_gstreamer.cpp (898) open OpenCV | GStreamer warning: unable to query duration of stream

[WARN:0] global /usr/src/debug/opencv/4.4.0.imx-r0/git/modules/videoio/src/cap_gstreamer.cpp (935) open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1

Video input

./example_dnn_classification --device=2 --zoo=models.yml squeezenet

Problems

If the testdata directory contains no files, search for them:

lhj@DESKTOP-BINN7F8:~/myd-jx8mp-yocto$ find . -name "dog416.png"

./build-xwayland/tmp/work/cortexa53-crypto-mx8mp-poky-linux/opencv/4.4.0.imx-r0/extra/testdata/dnn/dog416.png

Then copy the corresponding files to the development board.

cd ./build-xwayland/tmp/work/cortexa53-crypto-mx8mp-poky-linux/opencv/4.4.0.imx-r0/extra/testdata/

tar -cvf /mnt/e/dnn.tar ./dnn/

cd /usr/share/opencv4/testdata (create the directory first if it does not exist)

Use rz to transfer dnn.tar.

Extract it: tar -xvf dnn.tar

terminate called after throwing an instance of 'cv::Exception'

what(): OpenCV(4.4.0) /usr/src/debug/opencv/4.4.0.imx-r0/git/samples/dnn/classification.cpp: error: (Assertion failed) !model.empty() in function 'main'

Aborted

lhj@DESKTOP-BINN7F8:~/myd-jx8mp-yocto/build-xwayland$ find . -name classification.cpp

lhj@DESKTOP-BINN7F8:~/myd-jx8mp-yocto/build-xwayland$ cp ./tmp/work/cortexa53-crypto-mx8mp-poky-linux/opencv/4.4.0.imx-r0/packages-split/opencv-src/usr/src/debug/opencv/4.4.0.imx-r0/git/samples/dnn/classification.cpp /mnt/e

lhj@DESKTOP-BINN7F8:~/myd-jx8mp-yocto/build-xwayland$

YOLO object detection

cd /usr/share/OpenCV/samples/bin

./example_dnn_object_detection --width=1024 --height=1024 --scale=0.00392 --input=dog416.png --rgb --zoo=models.yml yolo

[Image: e2ba8f74-c70d-11ec-8521-dac502259ad0.jpg]


Download the cfg and weights files from https://pjreddie.com/darknet/yolo/.

cd /usr/share/OpenCV/samples/bin/

Copy the files downloaded above into this directory.

cp /usr/share/OpenCV/samples/data/dnn/object_detection_classes_yolov3.txt /usr/share/OpenCV/samples/bin/

cp /usr/share/opencv4/testdata/dnn/yolov3.cfg /usr/share/OpenCV/samples/bin/

./example_dnn_object_detection --width=1024 --height=1024 --scale=0.00392 --input=dog416.png --rgb --zoo=models.yml yolo
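The value --scale=0.00392 in the command above is approximately 1/255. As far as the sample's preprocessing is concerned, this factor multiplies each pixel value, which maps 8-bit values (0 to 255) into roughly the 0 to 1 range that Darknet YOLO models are usually fed. The sketch below only illustrates that arithmetic; normalize is a hypothetical helper and is not the OpenCV implementation.

```python
def normalize(pixels, scale=1 / 255):
    """Multiply each 8-bit pixel value by `scale`."""
    return [value * scale for value in pixels]


# 1/255 rounded to five decimal places gives the value used on the command line.
print(round(1 / 255, 5))

# With the rounded scale, the brightest pixel maps to just under 1.0.
print(normalize([0, 128, 255], scale=0.00392))
```

Using the rounded constant rather than exactly 1/255 changes the result by well under one percent, which is why the shorter value is commonly used on the command line.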

Classical machine learning with OpenCV

cd /usr/share/OpenCV/samples/bin

Linear SVM

./example_tutorial_introduction_to_svm

[Image: e2d1263a-c70d-11ec-8521-dac502259ad0.jpg]

Non-linear SVM

./example_tutorial_non_linear_svms

[Image: e2e33c80-c70d-11ec-8521-dac502259ad0.jpg]

PCA analysis

./example_tutorial_introduction_to_pca ../data/pca_test1.jpg

[Image: e2fa2152-c70d-11ec-8521-dac502259ad0.jpg]

Logistic regression

./example_cpp_logistic_regression

[Image: e310c22c-c70d-11ec-8521-dac502259ad0.jpg]

[Image: e323f9c8-c70d-11ec-8521-dac502259ad0.jpg]

[Image: e3371f58-c70d-11ec-8521-dac502259ad0.jpg]
