eBPF in Practice: Observing virtio-net NIC Queues

Linux阅码场 | Source: Linux阅码场 | 2024-11-14 11:18

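In systems work, the hardest problems are usually those of pinning a fault to one side of a boundary between components, and the boundary between the virtio-net front end and back end is especially difficult. When a packet is sent from the kernel to the virtio-net back end, or from the back end to the kernel, that stretch of the path is hard to observe. Some complex network jitter problems are likely caused by NIC queues that are not working properly. To tackle this class of problem, we used eBPF to extend the observability of NIC queues, so that deciding whether a fault lies in the virtio NIC front end or the back end is no longer a headache.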

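Introduction to the virtio-net front-end and back-end drivers

virtio-net (referred to below as the virtio NIC) usually consists of two components: the virtio driver (also called the virtio front end) and the virtio device (also called the virtio back end). The virtio front end runs in the guest kernel, while the virtio back end can be implemented by the host kernel. A virtio NIC usually supports multiple queues, including transmit queues and receive queues. Each queue is implemented with three rings: the avail ring, the used ring, and the desc ring. We now focus on how the virtio NIC front end transmits and receives packets, to give a better picture of the overall workflow.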

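Packet transmission in the virtio NIC front end

The main steps when the virtio NIC front end transmits a packet are:

a. start_xmit: the transmit entry point of the virtio NIC driver. It first cleans up packets that have already been sent by calling free_old_xmit_skbs, which frees the packets held in the descriptors until the driver's last_used_idx catches up with used->idx, that is, until no completed descriptors remain;

b. xmit_skb: mainly adds the vnet_hdr header to the packet and represents the skb as a scatter-gather list that records the address and length of the packet data;

c. virtqueue_add_outbuf: performs the DMA mapping, adds the addresses and lengths recorded in the scatter-gather list to the desc ring, and increments avail->idx;

d. virtqueue_notify: notifies the back end when the transmit queue holds data.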

[Figure: 7aaa4c30-9069-11ef-a511-92fbcf53809c.png]
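The transmit steps can be sketched with a toy model of the three indices. This is not kernel code: the class name, field names, and the ring-full check are assumptions made for illustration, and only the function names start_xmit and free_old_xmit_skbs come from the text. Indices are treated as free-running 16-bit counters, as in the split virtqueue layout.

```python
from dataclasses import dataclass

MASK16 = 0xFFFF  # ring indices are free-running 16-bit counters


@dataclass
class TxQueueModel:
    """Toy model of one transmit queue (hypothetical, not kernel code)."""
    size: int = 256          # number of descriptors in the ring
    avail_idx: int = 0       # advanced by the front end when it posts a packet
    used_idx: int = 0        # advanced by the back end when it finishes a packet
    last_used_idx: int = 0   # front end's cursor into the used ring
    notified: int = 0        # how many times the back end was notified

    def in_flight(self) -> int:
        """Descriptors posted by the front end and not yet reclaimed."""
        return (self.avail_idx - self.last_used_idx) & MASK16

    def free_old_xmit_skbs(self) -> int:
        """Step a: reclaim every buffer the back end has marked as used."""
        freed = (self.used_idx - self.last_used_idx) & MASK16
        self.last_used_idx = self.used_idx
        return freed

    def start_xmit(self) -> bool:
        """Steps a-d for one packet: reclaim, queue, notify."""
        self.free_old_xmit_skbs()                       # step a
        if self.in_flight() >= self.size:               # assumed ring-full check
            return False
        self.avail_idx = (self.avail_idx + 1) & MASK16  # steps b and c
        self.notified += 1                              # step d
        return True

    def backend_consume(self, n: int) -> int:
        """Back end sends up to n pending packets and marks them used."""
        pending = (self.avail_idx - self.used_idx) & MASK16
        done = min(n, pending)
        self.used_idx = (self.used_idx + done) & MASK16
        return done
```

If backend_consume is never called, avail_idx keeps growing while used_idx and last_used_idx stay put, and start_xmit eventually returns False. That is the kind of stall the metrics later in the article are designed to expose.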

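Packet reception in the virtio NIC front end

The main steps when the virtio NIC front end receives a packet are:

a. NIC hard interrupt: the hard-interrupt handler adds the napi to the CPU's poll list, enables interrupt suppression, and raises the softirq;

b. net_rx_action: the entry function of the network receive softirq;

c. virtnet_poll: the NAPI poll callback of the virtio NIC. If the current queue is a transmit queue, it cleans the transmit queue by running virtnet_poll_cleantx; if it is a receive queue, it receives packets;

d. virtnet_receive: reads packet data from the descriptor ring according to used->idx and updates last_used_idx. The kernel allocates an skb for the packet data and passes it to GRO, which merges packets;

e. try_fill_recv: adds empty buffers to the desc ring and increments avail->idx, so that the receive queue always has memory available;

f. virtqueue_napi_complete: when fewer packets are received than the budget (typically 64), there is no more data to receive. virtqueue_napi_complete is then called to mark this round of NAPI processing as finished, and interrupt suppression is turned off via virtqueue_enable_cb_prepare.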

[Figure: 7adc430c-9069-11ef-a511-92fbcf53809c.png]
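The budget rule in step f can be illustrated with a small model. The function names poll_once and rounds_to_drain are hypothetical, and the model ignores GRO, buffer refill, and interrupts. It only shows when a poll round counts as complete.

```python
def poll_once(pending: int, budget: int = 64):
    """Model one NAPI poll round on a receive queue.

    Returns (received, remaining, napi_complete). The round is complete
    only when fewer packets than the budget were received.
    """
    received = min(pending, budget)
    remaining = pending - received
    napi_complete = received < budget
    return received, remaining, napi_complete


def rounds_to_drain(pending: int, budget: int = 64) -> int:
    """Count poll rounds until a round finishes below the budget."""
    rounds = 0
    while True:
        _, pending, complete = poll_once(pending, budget)
        rounds += 1
        if complete:
            return rounds
```

Note that a round which receives exactly the budget is not treated as complete, even if nothing remains. In this model one more round runs, finds no packets, and only then completes.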

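NIC queue observability

From the analysis above, we know the key parameters of a virtio NIC queue: avail->idx, used->idx, and last_used_idx. With them we can tell exactly how many packets a NIC queue currently holds, and derive the following observability metrics:

a. Transmit queue packet count: the number of packets not yet sent by the virtio NIC back end, computed as avail->idx - used->idx;

b. Receive queue packet count: the number of packets not yet received by the virtio NIC front end, computed as used->idx - last_used_idx;

c. The queue's last_used_idx: indicates how far the virtio NIC back end has progressed in processing packets;

d. Queue saturation: how much of the NIC queue is currently in use, computed as queue packet count / queue length.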
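A minimal sketch of these calculations follows. The type and function names are hypothetical and are not taken from rtrace. The sketch assumes the ring indices are free-running 16-bit counters, so each subtraction is taken modulo 65536 to stay correct when an index wraps around; the formulas in the list omit that detail.

```python
from dataclasses import dataclass

RING_INDEX_MASK = 0xFFFF  # assumption: indices are 16-bit and wrap at 65536


@dataclass(frozen=True)
class VringSnapshot:
    """One sample of a queue's indices (hypothetical container)."""
    avail_idx: int
    used_idx: int
    last_used_idx: int
    queue_len: int


def idx_delta(newer: int, older: int) -> int:
    """Wraparound-safe difference between two ring indices."""
    return (newer - older) & RING_INDEX_MASK


def tx_pending(s: VringSnapshot) -> int:
    """Metric a: packets the back end has not sent yet."""
    return idx_delta(s.avail_idx, s.used_idx)


def rx_pending(s: VringSnapshot) -> int:
    """Metric b: packets the front end has not received yet."""
    return idx_delta(s.used_idx, s.last_used_idx)


def saturation(pending: int, queue_len: int) -> float:
    """Metric d: fraction of the queue in use (multiply by 100 for percent)."""
    return pending / queue_len
```

For example, with avail_idx = 3 and used_idx = 65534 a plain subtraction gives a negative number, while idx_delta returns 5, the number of packets actually outstanding.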

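How it works

We integrated the observability code into rtrace, a network diagnosis and analysis tool in SysAK, the system toolset released by the OpenAnolis community (龙蜥社区). How rtrace itself works will be covered in a later article. For the eBPF code, see: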

https://gitee.com/anolis/sysak/blob/opensource_branch_sync/source/tools/detect/net/rtrace/src/bpf/virtio.bpf.c

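The main steps for collecting virtio NIC queue metrics are:

a. rtrace attaches the eBPF collection program to the kernel functions dev_id_show and dev_port_show;

b. rtrace periodically reads the two files /sys/class/net/[interface]/dev_id and /sys/class/net/[interface]/dev_port. Reading dev_id signals that transmit queue information should be collected, and reading dev_port signals that receive queue information should be collected;

c. Reading these files makes the kernel execute dev_id_show and dev_port_show. Because the eBPF collection program is already attached, the kernel runs it first;

d. The eBPF collection program obtains the struct net_device from the arguments of dev_id_show and dev_port_show, locates the NIC queue's vring through it, and parses avail idx, used idx, the queue length, and last_used_idx out of the vring;

e. The data is sent to rtrace for further processing.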

[Figure: 7af93be2-9069-11ef-a511-92fbcf53809c.png]
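The user-space half of step b, periodically reading the two sysfs files, might look like the sketch below. It is not rtrace's code: the function names, the interval, and the "tx"/"rx" labels are assumptions. It covers only the trigger; attaching the eBPF program and receiving its output are out of scope here.

```python
import time
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")  # standard sysfs location for net devices


def trigger_files(interface: str, root: Path = SYSFS_NET) -> dict:
    """Map each direction to the sysfs file whose read triggers collection."""
    base = root / interface
    return {"tx": base / "dev_id", "rx": base / "dev_port"}


def trigger_once(interface: str, root: Path = SYSFS_NET) -> dict:
    """Read both files once and return their stripped contents.

    The contents are not the metrics. The read is only the trigger that
    makes the kernel run dev_id_show / dev_port_show.
    """
    return {
        direction: path.read_text().strip()
        for direction, path in trigger_files(interface, root).items()
    }


def trigger_periodically(interface: str, rounds: int,
                         interval_s: float = 1.0,
                         root: Path = SYSFS_NET) -> int:
    """Trigger collection `rounds` times, sleeping interval_s after each."""
    done = 0
    for _ in range(rounds):
        trigger_once(interface, root)
        done += 1
        time.sleep(interval_s)
    return done
```

The appeal of this design is that an ordinary file read acts as a cheap, on-demand hook into the kernel: nothing runs on the packet path itself, and the sampling rate is controlled entirely from user space.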

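Fault detection

Below is the NIC queue information collected by rtrace. Each entry has the form saturation/last_used_idx, with one entry per queue.

For transmit queue 1, the saturation and last_used_idx are 0.05%/3593 at 0926 and 0.07%/3593 at 0928. The saturation of the queue is rising, yet last_used_idx stays unchanged over several collection periods. We can therefore conclude that transmit queue 1 has failed.

We then fixed the fault in transmit queue 1. At 0906, its saturation and last_used_idx are 0.00%/3599: no packets linger in the queue any more, and it is back to normal.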

0924
SendQueue 0.05%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/410 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2443 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/148 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/88 0.00%/85 0.00%/52
RecvQueue 0.00%/2805 0.00%/13297 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/200 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8796 0.00%/117 0.00%/301 0.00%/275
0925
SendQueue 0.05%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/410 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2444 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/148 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/89 0.00%/85 0.00%/52
RecvQueue 0.00%/2805 0.00%/13297 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/200 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8796 0.00%/117 0.00%/303 0.00%/275
0926
SendQueue 0.05%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/410 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2444 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/148 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/91 0.00%/85 0.00%/52
RecvQueue 0.00%/2805 0.00%/13297 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/200 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8796 0.00%/117 0.00%/305 0.00%/275
0927
SendQueue 0.07%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/410 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2444 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/148 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/93 0.00%/85 0.00%/52
RecvQueue 0.00%/2805 0.00%/13298 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/200 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8796 0.00%/117 0.00%/307 0.00%/275
0928
SendQueue 0.07%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/414 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2445 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/149 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/96 0.00%/87 0.00%/52
RecvQueue 0.00%/2805 0.00%/13298 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/205 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8797 0.00%/118 0.00%/309 0.00%/275
0929
SendQueue 0.07%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/414 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2445 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/149 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/98 0.00%/87 0.00%/52
RecvQueue 0.00%/2805 0.00%/13298 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/205 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8797 0.00%/118 0.00%/311 0.00%/275
0930
SendQueue 0.07%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/414 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2445 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/149 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/100 0.00%/87 0.00%/52
RecvQueue 0.00%/2805 0.00%/13298 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/205 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8797 0.00%/118 0.00%/313 0.00%/275
// ... omitted
0906
SendQueue 0.00%/3599 0.00%/856 0.00%/4511 0.00%/1602 0.00%/465 0.00%/510 0.00%/3140 0.00%/1352 0.00%/386 0.00%/420 0.00%/1716 0.00%/1766 0.00%/1619 0.00%/448 0.00%/3578 0.00%/2451 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/148 0.00%/149 0.00%/226 0.00%/64 0.00%/109 0.00%/85 0.00%/87 0.00%/56 0.00%/87 0.00%/101 0.00%/103 0.00%/52
RecvQueue 0.00%/2807 0.00%/13299 0.00%/477 0.00%/369 0.00%/12378 0.00%/140 0.00%/223 0.00%/11120 0.00%/355 0.00%/3032 0.00%/142 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/652 0.00%/151 0.00%/505 0.00%/5180 0.00%/205 0.00%/26670 0.00%/170 0.00%/1057 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/230 0.00%/1414 0.00%/8800 0.00%/118 0.00%/327 0.00%/275
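The rule applied by eye above (packets are sitting in the queue while last_used_idx does not move) can be written as a small check. This is a sketch, not rtrace's detection logic: the function names and the three-period threshold are assumptions, and the input is the per-queue saturation/last_used_idx entries from the output.

```python
def parse_entry(token: str):
    """Split one 'saturation%/last_used_idx' entry, e.g. '0.05%/3593'."""
    pct, idx = token.split("%/")
    return float(pct), int(idx)


def is_stalled(samples, min_periods: int = 3) -> bool:
    """Return True if a queue looks stuck.

    samples: entries for ONE queue, oldest first. The queue is flagged when,
    over the last min_periods samples, saturation is non-zero every time
    and last_used_idx never changes.
    """
    if len(samples) < min_periods:
        return False
    recent = [parse_entry(t) for t in samples[-min_periods:]]
    idx_values = {idx for _, idx in recent}
    return len(idx_values) == 1 and all(sat > 0 for sat, _ in recent)


# Transmit queue 1 during the fault (samples 0924 to 0930 in the output)
faulty = ["0.05%/3593", "0.05%/3593", "0.05%/3593",
          "0.07%/3593", "0.07%/3593", "0.07%/3593", "0.07%/3593"]
# Transmit queue 2 over the same samples: index unchanged, but queue empty
idle = ["0.00%/852"] * 7

print(is_stalled(faulty))  # True
print(is_stalled(idle))    # False
```

Requiring non-zero saturation matters: an idle queue also has a constant last_used_idx, so the index alone cannot distinguish "stuck" from "nothing to do".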

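Summary

In a virtio NIC, the front end and back end communicate through shared NIC queues. By observing avail idx, used idx, last_used_idx, and the metrics derived from them, we can evaluate and tune the performance of a virtio NIC. These metrics also give a detailed view of queue state, which helps with troubleshooting and performance tuning.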

Original title: eBPF 技术实践之 virtio-net 网卡队列可观测

Source: WeChat official account Linux阅码场 (WeChat ID: LinuxDev). Please credit the source when reposting.
