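Translator's note
While developing eBPF programs in a Linux virtual machine on a MacBook M2, I found that some LSM-type eBPF programs would not run, even on Ubuntu Server with a 5.15 kernel. Clearly, kernel features differ between aarch64 and x86_64. While trying to pin down those differences I came across this article, which gives a more concrete picture of how LSM eBPF support differs between the two CPU architectures.
Original article
This blog post is a summary of our internal research into `BPF LSM` support on `aarch64` in Linux. If you are not familiar with the kernel codebase, it is hard to know where to start reading the kernel source, so we decided to publish this post and show our approach, since it may help anyone who wants to explore kernel internals.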
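Introduction
On x86_64 we already use BPF LSM, while on aarch64 we rely on Kprobes, so we wanted to know which kernel features are missing before BPF LSM can be used on aarch64.
We have dug into the kernel source many times, but usually we search for something that already exists in order to understand how it works. In this case we were looking for something that does not exist: the code paths that return an error because a feature is not implemented.
Recalling Steven Rostedt's talk on how to start learning the Linux kernel, we started from ftrace (and the tools built on the tracing infrastructure) to see what happens when we load an unsupported BPF program into the kernel.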
The problem
This is the output we get when we try to load a BPF LSM program into an aarch64 Linux 5.15 kernel using our software, pulsar[2]:

    root@pine64-1:/home/exein# ./pulsar-enterprise-exec pulsard
    [2023-02-16T1445Z INFO  pulsar::daemon] Starting module process-monitor
    [2023-02-16T1445Z INFO  pulsar::daemon] Starting module file-system-monitor
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module network-monitor
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module logger
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module rules-engine
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module desktop-notifier
    [2023-02-16T1446Z ERROR pulsar::module_manager] Module error in file-system-monitor: failed program attach lsm path_mknod

    Caused by:
        0: `bpf_raw_tracepoint_open` failed
        1: No error information (os error 524)
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module anomaly-detection
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module malware-detection
    [2023-02-16T1446Z ERROR pulsar::module_manager] Module error in malware-detection: /var/lib/pulsar/malware_detection/models/parameters.json not found
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module platform-connector
    [2023-02-16T1446Z INFO  platform_connector::client] Connected to https://platform-dev-instance.exein.io:8001/
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module threat-response
    [2023-02-16T1446Z ERROR pulsar::module_manager] Module error in network-monitor: failed program attach lsm socket_bind

    Caused by:
        0: `bpf_raw_tracepoint_open` failed
        1: No error information (os error 524)
Pulsar fails with error 524, that is ENOTSUPP, when it tries to load the BPF program attached to the path_mknod LSM hook. Let's dig into the problem.
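Error 524 is not a POSIX errno. ENOTSUPP is defined in the kernel's `include/linux/errno.h` and is not exported through the userspace headers, so userspace has no message for it and the log above only shows the bare number. A minimal sketch of a helper that names it, assuming the value 524 from that kernel header; `describe_bpf_errno` and `KERNEL_ENOTSUPP` are hypothetical names and not part of pulsar:

```c
#include <stdio.h>
#include <string.h>

/* Kernel-internal value from include/linux/errno.h; not exported to userspace. */
#define KERNEL_ENOTSUPP 524

/* Hypothetical helper: write a readable description of an errno into buf. */
static const char *describe_bpf_errno(int err, char *buf, size_t len)
{
    if (err < 0)
        err = -err;    /* accept both 524 and -524 */

    if (err == KERNEL_ENOTSUPP)
        snprintf(buf, len, "ENOTSUPP (%d): operation is not supported by this kernel", err);
    else
        snprintf(buf, len, "%s (%d)", strerror(err), err);
    return buf;
}
```

Called with the code from the log, `describe_bpf_errno(524, buf, sizeof buf)` fills `buf` with a string that starts with `ENOTSUPP (524)`; any other code falls back to whatever `strerror` reports.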
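Note: when we did this research we could not find a prebuilt aarch64 kernel with BPF and BTF enabled, so we had to compile a custom kernel. We also enabled the tracing options and the function_graph plugin required by the tools below.
All experiments were run on a Pine A64 with a custom Armbian[3] image.
These images have a custom kernel with a standard Ubuntu 22.04 LTS (Jammy) userspace.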
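Tools
To investigate the problem we used the following tools:
bpftrace[4]: a BPF-based tool that attaches probes dynamically using a custom C-like language.
trace-cmd[5]: a wrapper around the tracefs filesystem for interacting with the ftrace infrastructure.
To use these tools you need some options enabled in the Linux kernel; see the official documentation for the full requirements.
Note: other tools can do the same job, for example funcgraph and kprobe from perf-tools[6].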
Linux 5.15
Let's now use these tools to see what happens when we try to load our BPF program on kernel 5.15.
From here to the end of the post we use the probe binary instead of pulsar, because it is simpler. To summarize how it works, here is its command-line help:

    exein@pine64-1:~$ ./probe
    Test runner for eBPF programs

    Usage: probe [OPTIONS]

    Commands:
      file-system-monitor  Watch file creations
      process-monitor      Watch process events (fork/exec/exit)
      network-monitor      Watch network events
      help                 Print this message or the help of the given subcommand(s)

    Options:
      -v, --verbose
      -h, --help     Print help
      -V, --version  Print version

In these examples we will try to load the file-system-monitor probe.
Running the following commands shows the function graph of the calls made under __sys_bpf, the entry point of the BPF system call:

    trace-cmd record -p function_graph -g __sys_bpf ./probe file-system-monitor
    trace-cmd report
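The output is a huge function graph, far too large to paste here. Since we hit an error, we are interested in the last functions called before the program stops. Here are the last lines of the trace-cmd report output: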
    ...
    tokio-runtime-w-1666 [003] 1318.058019: funcgraph_entry:               |        bpf_trampoline_link_prog() {
    tokio-runtime-w-1666 [003] 1318.058020: funcgraph_entry:      2.292 us |          bpf_attach_type_to_tramp();
    tokio-runtime-w-1666 [003] 1318.058024: funcgraph_entry:      1.250 us |          mutex_lock();
    tokio-runtime-w-1666 [003] 1318.058028: funcgraph_entry:               |          bpf_trampoline_update() {
    tokio-runtime-w-1666 [003] 1318.058030: funcgraph_entry:               |            kmem_cache_alloc_trace() {
    tokio-runtime-w-1666 [003] 1318.058031: funcgraph_entry:      1.167 us |              should_failslab();
    tokio-runtime-w-1666 [003] 1318.058036: funcgraph_exit:       6.792 us |            }
    tokio-runtime-w-1666 [003] 1318.058039: funcgraph_entry:               |            kmem_cache_alloc_trace() {
    tokio-runtime-w-1666 [003] 1318.058042: funcgraph_entry:      2.750 us |              should_failslab();
    tokio-runtime-w-1666 [003] 1318.058046: funcgraph_exit:       6.417 us |            }
    tokio-runtime-w-1666 [003] 1318.058048: funcgraph_entry:      2.708 us |            bpf_jit_charge_modmem();
    tokio-runtime-w-1666 [003] 1318.058053: funcgraph_entry:               |            bpf_jit_alloc_exec_page() {
    tokio-runtime-w-1666 [003] 1318.058055: funcgraph_entry:               |              bpf_jit_alloc_exec() {
    tokio-runtime-w-1666 [003] 1318.058057: funcgraph_entry:               |                vmalloc() {
    tokio-runtime-w-1666 [003] 1318.058059: funcgraph_entry:               |                  __vmalloc_node() {
    tokio-runtime-w-1666 [003] 1318.058061: funcgraph_entry:               |                    __vmalloc_node_range() {
    tokio-runtime-w-1666 [003] 1318.058064: funcgraph_entry:               |                      __get_vm_area_node.constprop.64() {
    tokio-runtime-w-1666 [003] 1318.058067: funcgraph_entry:               |                        kmem_cache_alloc_node_trace() {
    tokio-runtime-w-1666 [003] 1318.058069: funcgraph_entry:      1.459 us |                          should_failslab();
    tokio-runtime-w-1666 [003] 1318.058073: funcgraph_exit:       6.292 us |                        }
    tokio-runtime-w-1666 [003] 1318.058075: funcgraph_entry:               |                        alloc_vmap_area() {
    tokio-runtime-w-1666 [003] 1318.058077: funcgraph_entry:               |                          kmem_cache_alloc_node() {
    tokio-runtime-w-1666 [003] 1318.058079: funcgraph_entry:      1.167 us |                            should_failslab();
    tokio-runtime-w-1666 [003] 1318.058085: funcgraph_exit:       7.625 us |                          }
    tokio-runtime-w-1666 [003] 1318.058088: funcgraph_entry:               |                          kmem_cache_alloc_node() {
    tokio-runtime-w-1666 [003] 1318.058089: funcgraph_entry:      1.208 us |                            should_failslab();
    tokio-runtime-w-1666 [003] 1318.058092: funcgraph_exit:       4.584 us |                          }
    tokio-runtime-w-1666 [003] 1318.058104: funcgraph_entry:               |                          kmem_cache_free() {
    tokio-runtime-w-1666 [003] 1318.058107: funcgraph_entry:      2.084 us |                            __slab_free();
    tokio-runtime-w-1666 [003] 1318.058110: funcgraph_exit:       5.667 us |                          }
    tokio-runtime-w-1666 [003] 1318.058112: funcgraph_entry:      6.375 us |                          insert_vmap_area.constprop.74();
    tokio-runtime-w-1666 [003] 1318.058119: funcgraph_exit:  +   44.667 us |                        }
    tokio-runtime-w-1666 [003] 1318.058122: funcgraph_exit:  +   58.250 us |                      }
    tokio-runtime-w-1666 [003] 1318.058124: funcgraph_entry:               |                      __kmalloc_node() {
    tokio-runtime-w-1666 [003] 1318.058125: funcgraph_entry:      1.625 us |                        kmalloc_slab();
    tokio-runtime-w-1666 [003] 1318.058128: funcgraph_entry:      1.167 us |                        should_failslab();
    tokio-runtime-w-1666 [003] 1318.058131: funcgraph_exit:       7.208 us |                      }
    tokio-runtime-w-1666 [003] 1318.058133: funcgraph_entry:               |                      alloc_pages() {
    tokio-runtime-w-1666 [003] 1318.058135: funcgraph_entry:      1.583 us |                        get_task_policy.part.48();
    tokio-runtime-w-1666 [003] 1318.058138: funcgraph_entry:      1.500 us |                        policy_node();
    tokio-runtime-w-1666 [003] 1318.058141: funcgraph_entry:      1.209 us |                        policy_nodemask();
    tokio-runtime-w-1666 [003] 1318.058143: funcgraph_entry:               |                        __alloc_pages() {
    tokio-runtime-w-1666 [003] 1318.058145: funcgraph_entry:      1.458 us |                          should_fail_alloc_page();
    tokio-runtime-w-1666 [003] 1318.058147: funcgraph_entry:               |                          get_page_from_freelist() {
    tokio-runtime-w-1666 [003] 1318.058150: funcgraph_entry:      1.583 us |                            prep_new_page();
    tokio-runtime-w-1666 [003] 1318.058153: funcgraph_exit:       5.459 us |                          }
    tokio-runtime-w-1666 [003] 1318.058154: funcgraph_exit:  +   10.542 us |                        }
    tokio-runtime-w-1666 [003] 1318.058155: funcgraph_exit:  +   22.083 us |                      }
    tokio-runtime-w-1666 [003] 1318.058157: funcgraph_entry:               |                      __cond_resched() {
    tokio-runtime-w-1666 [003] 1318.058158: funcgraph_entry:      1.833 us |                        rcu_all_qs();
    tokio-runtime-w-1666 [003] 1318.058161: funcgraph_exit:       4.167 us |                      }
    tokio-runtime-w-1666 [003] 1318.058166: funcgraph_entry:      5.542 us |                      vmap_pages_range_noflush();
    tokio-runtime-w-1666 [003] 1318.058173: funcgraph_exit:  !  112.375 us |                    }
    tokio-runtime-w-1666 [003] 1318.058175: funcgraph_exit:  !  116.000 us |                  }
    tokio-runtime-w-1666 [003] 1318.058176: funcgraph_exit:  !  119.292 us |                }
    tokio-runtime-w-1666 [003] 1318.058177: funcgraph_exit:  !  122.542 us |              }
    tokio-runtime-w-1666 [003] 1318.058179: funcgraph_entry:               |              find_vm_area() {
    tokio-runtime-w-1666 [003] 1318.058180: funcgraph_entry:      1.375 us |                find_vmap_area();
    tokio-runtime-w-1666 [003] 1318.058183: funcgraph_exit:       4.333 us |              }
    tokio-runtime-w-1666 [003] 1318.058185: funcgraph_entry:               |              set_memory_x() {
    tokio-runtime-w-1666 [003] 1318.058186: funcgraph_entry:               |                change_memory_common() {
    tokio-runtime-w-1666 [003] 1318.058188: funcgraph_entry:               |                  find_vm_area() {
    tokio-runtime-w-1666 [003] 1318.058189: funcgraph_entry:      1.333 us |                    find_vmap_area();
    tokio-runtime-w-1666 [003] 1318.058192: funcgraph_exit:       3.875 us |                  }
    tokio-runtime-w-1666 [003] 1318.058193: funcgraph_entry:               |                  vm_unmap_aliases() {
    tokio-runtime-w-1666 [003] 1318.058194: funcgraph_entry:               |                    _vm_unmap_aliases.part.58() {
    tokio-runtime-w-1666 [003] 1318.058196: funcgraph_entry:      1.542 us |                      rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058199: funcgraph_entry:      1.208 us |                      rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058202: funcgraph_entry:      1.166 us |                      rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058205: funcgraph_entry:      1.208 us |                      rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058207: funcgraph_entry:      1.208 us |                      mutex_lock();
    tokio-runtime-w-1666 [003] 1318.058210: funcgraph_entry:               |                      purge_fragmented_blocks_allcpus() {
    tokio-runtime-w-1666 [003] 1318.058212: funcgraph_entry:      1.500 us |                        rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058214: funcgraph_entry:      1.500 us |                        rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058217: funcgraph_entry:      1.500 us |                        rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058220: funcgraph_entry:      1.167 us |                        rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058222: funcgraph_exit:  +   11.917 us |                      }
    tokio-runtime-w-1666 [003] 1318.058224: funcgraph_entry:               |                      __purge_vmap_area_lazy() {
    tokio-runtime-w-1666 [003] 1318.058232: funcgraph_entry:               |                        kmem_cache_free() {
    tokio-runtime-w-1666 [003] 1318.058234: funcgraph_entry:      1.250 us |                          __slab_free();
    tokio-runtime-w-1666 [003] 1318.058237: funcgraph_exit:       4.791 us |                        }
    tokio-runtime-w-1666 [003] 1318.058241: funcgraph_entry:      1.209 us |                        __cond_resched_lock();
    tokio-runtime-w-1666 [003] 1318.058244: funcgraph_exit:  +   19.625 us |                      }
    tokio-runtime-w-1666 [003] 1318.058245: funcgraph_entry:      1.167 us |                      mutex_unlock();
    tokio-runtime-w-1666 [003] 1318.058247: funcgraph_exit:  +   53.042 us |                    }
    tokio-runtime-w-1666 [003] 1318.058248: funcgraph_exit:  +   55.625 us |                  }
    tokio-runtime-w-1666 [003] 1318.058250: funcgraph_entry:               |                  __change_memory_common() {
    tokio-runtime-w-1666 [003] 1318.058251: funcgraph_entry:               |                    apply_to_page_range() {
    tokio-runtime-w-1666 [003] 1318.058253: funcgraph_entry:               |                      __apply_to_page_range() {
    tokio-runtime-w-1666 [003] 1318.058255: funcgraph_entry:      1.250 us |                        pud_huge();
    tokio-runtime-w-1666 [003] 1318.058258: funcgraph_entry:      1.166 us |                        pmd_huge();
    tokio-runtime-w-1666 [003] 1318.058260: funcgraph_entry:      1.208 us |                        change_page_range();
    tokio-runtime-w-1666 [003] 1318.058263: funcgraph_exit:       9.834 us |                      }
    tokio-runtime-w-1666 [003] 1318.058264: funcgraph_exit:  +   12.709 us |                    }
    tokio-runtime-w-1666 [003] 1318.058266: funcgraph_exit:  +   15.459 us |                  }
    tokio-runtime-w-1666 [003] 1318.058268: funcgraph_exit:  +   80.791 us |                }
    tokio-runtime-w-1666 [003] 1318.058270: funcgraph_exit:  +   84.834 us |              }
    tokio-runtime-w-1666 [003] 1318.058272: funcgraph_exit:  !  218.500 us |            }
    tokio-runtime-w-1666 [003] 1318.058274: funcgraph_entry:               |            __alloc_percpu_gfp() {
    tokio-runtime-w-1666 [003] 1318.058276: funcgraph_entry:               |              pcpu_alloc() {
    tokio-runtime-w-1666 [003] 1318.058281: funcgraph_entry:      2.250 us |                mutex_lock_killable();
    tokio-runtime-w-1666 [003] 1318.058290: funcgraph_entry:               |                pcpu_find_block_fit() {
    tokio-runtime-w-1666 [003] 1318.058293: funcgraph_entry:      2.833 us |                  pcpu_next_fit_region.constprop.38();
    tokio-runtime-w-1666 [003] 1318.058299: funcgraph_exit:       9.084 us |                }
    tokio-runtime-w-1666 [003] 1318.058301: funcgraph_entry:               |                pcpu_alloc_area() {
    tokio-runtime-w-1666 [003] 1318.058315: funcgraph_entry:      4.000 us |                  pcpu_block_update_hint_alloc();
    tokio-runtime-w-1666 [003] 1318.058320: funcgraph_entry:      2.208 us |                  pcpu_chunk_relocate();
    tokio-runtime-w-1666 [003] 1318.058324: funcgraph_exit:  +   22.625 us |                }
    tokio-runtime-w-1666 [003] 1318.058327: funcgraph_entry:      1.208 us |                mutex_unlock();
    tokio-runtime-w-1666 [003] 1318.058332: funcgraph_entry:      1.584 us |                pcpu_memcg_post_alloc_hook();
    tokio-runtime-w-1666 [003] 1318.058335: funcgraph_exit:  +   58.833 us |              }
    tokio-runtime-w-1666 [003] 1318.058336: funcgraph_exit:  +   61.834 us |            }
    tokio-runtime-w-1666 [003] 1318.058338: funcgraph_entry:               |            kmem_cache_alloc_trace() {
    tokio-runtime-w-1666 [003] 1318.058339: funcgraph_entry:      1.167 us |              should_failslab();
    tokio-runtime-w-1666 [003] 1318.058342: funcgraph_exit:       4.458 us |            }
    tokio-runtime-w-1666 [003] 1318.058359: funcgraph_entry:               |            bpf_image_ksym_add() {
    tokio-runtime-w-1666 [003] 1318.058360: funcgraph_entry:               |              bpf_ksym_add() {
    tokio-runtime-w-1666 [003] 1318.058363: funcgraph_entry:      1.583 us |                __local_bh_enable_ip();
    tokio-runtime-w-1666 [003] 1318.058366: funcgraph_exit:       5.750 us |              }
    tokio-runtime-w-1666 [003] 1318.058369: funcgraph_exit:       9.834 us |            }
    tokio-runtime-w-1666 [003] 1318.058371: funcgraph_entry:      1.250 us |            arch_prepare_bpf_trampoline();
    tokio-runtime-w-1666 [003] 1318.058373: funcgraph_entry:      2.292 us |            kfree();
    tokio-runtime-w-1666 [003] 1318.058377: funcgraph_exit:  !  348.625 us |          }
    tokio-runtime-w-1666 [003] 1318.058379: funcgraph_entry:      1.250 us |          mutex_unlock();
    tokio-runtime-w-1666 [003] 1318.058382: funcgraph_exit:  !  363.167 us |        }
    tokio-runtime-w-1666 [003] 1318.058384: funcgraph_entry:               |        bpf_link_cleanup() {
    tokio-runtime-w-1666 [003] 1318.058386: funcgraph_entry:               |          bpf_link_free_id.part.30() {
    tokio-runtime-w-1666 [003] 1318.058392: funcgraph_entry:               |            call_rcu() {
    tokio-runtime-w-1666 [003] 1318.058396: funcgraph_entry:      1.834 us |              rcu_segcblist_enqueue();
    tokio-runtime-w-1666 [003] 1318.058401: funcgraph_exit:       9.333 us |            }
    tokio-runtime-w-1666 [003] 1318.058403: funcgraph_entry:      1.542 us |            __local_bh_enable_ip();
    tokio-runtime-w-1666 [003] 1318.058406: funcgraph_exit:  +   19.542 us |          }
    tokio-runtime-w-1666 [003] 1318.058408: funcgraph_entry:               |          fput() {
    tokio-runtime-w-1666 [003] 1318.058409: funcgraph_entry:               |            fput_many() {
    tokio-runtime-w-1666 [003] 1318.058411: funcgraph_entry:               |              task_work_add() {
    tokio-runtime-w-1666 [003] 1318.058414: funcgraph_entry:      1.625 us |                kick_process();
    tokio-runtime-w-1666 [003] 1318.058418: funcgraph_exit:       6.750 us |              }
    tokio-runtime-w-1666 [003] 1318.058419: funcgraph_exit:  +   10.333 us |            }
    tokio-runtime-w-1666 [003] 1318.058420: funcgraph_exit:  +   12.708 us |          }
    tokio-runtime-w-1666 [003] 1318.058422: funcgraph_entry:      2.250 us |          put_unused_fd();
    tokio-runtime-w-1666 [003] 1318.058426: funcgraph_exit:  +   41.416 us |        }
    tokio-runtime-w-1666 [003] 1318.058428: funcgraph_entry:      1.292 us |        mutex_unlock();
    tokio-runtime-w-1666 [003] 1318.058430: funcgraph_entry:      1.250 us |        kfree();
    tokio-runtime-w-1666 [003] 1318.058433: funcgraph_exit:  !  567.458 us |      }
    tokio-runtime-w-1666 [003] 1318.058435: funcgraph_entry:      2.125 us |      __bpf_prog_put.isra.47();
    tokio-runtime-w-1666 [003] 1318.058438: funcgraph_exit:  !  602.291 us |    }
    tokio-runtime-w-1666 [003] 1318.058439: funcgraph_exit:  !  631.791 us |  }

This is the source in `kernel/bpf/trampoline.c` for `bpf_trampoline_update`, the last function executed:

```c
static int bpf_trampoline_update(struct bpf_trampoline *tr)
{
    struct bpf_tramp_image *im;
    struct bpf_tramp_progs *tprogs;
    u32 flags = BPF_TRAMP_F_RESTORE_REGS;
    bool ip_arg = false;
    int err, total;

    tprogs = bpf_trampoline_get_progs(tr, &total, &ip_arg);
    if (IS_ERR(tprogs))
        return PTR_ERR(tprogs);

    if (total == 0) {
        err = unregister_fentry(tr, tr->cur_image->image);
        bpf_tramp_image_put(tr->cur_image);
        tr->cur_image = NULL;
        tr->selector = 0;
        goto out;
    }

    im = bpf_tramp_image_alloc(tr->key, tr->selector);
    if (IS_ERR(im)) {
        err = PTR_ERR(im);
        goto out;
    }

    if (tprogs[BPF_TRAMP_FEXIT].nr_progs ||
        tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)
        flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;

    if (ip_arg)
        flags |= BPF_TRAMP_F_IP_ARG;

    err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
                                      &tr->func.model, flags, tprogs,
                                      tr->func.addr);
    if (err < 0)
        goto out;

    WARN_ON(tr->cur_image && tr->selector == 0);
    WARN_ON(!tr->cur_image && tr->selector);
    if (tr->cur_image)
        /* progs already running at this address */
        err = modify_fentry(tr, tr->cur_image->image, im->image);
    else
        /* first time registering */
        err = register_fentry(tr, im->image);
    if (err)
        goto out;
    if (tr->cur_image)
        bpf_tramp_image_put(tr->cur_image);
    tr->cur_image = im;
    tr->selector++;
out:
    kfree(tprogs);
    return err;
}
```
From the earlier output we can see:

    tokio-runtime-w-1666 [003] 1318.058371: funcgraph_entry:      1.250 us |            arch_prepare_bpf_trampoline();
    tokio-runtime-w-1666 [003] 1318.058373: funcgraph_entry:      2.292 us |            kfree();
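There are no other function calls between arch_prepare_bpf_trampoline and kfree, so the first function most likely returned an error code in the err variable, which sends the code straight to the out label. Let's verify it!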
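Spotting "the call right before kfree" by eye gets tedious on a large report. The sketch below pulls the function name out of one function_graph line so that a small tool could remember the previous name while scanning the report. `trace_func_name` is a hypothetical helper, and the parsing assumes only that the name follows the `|` column, as in the output above:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the function name of one function_graph line into out.
 * Returns 1 on success, 0 for lines without a name (e.g. a closing "}"). */
static int trace_func_name(const char *line, char *out, size_t len)
{
    const char *p = strchr(line, '|');   /* the name follows the '|' column */
    size_t n = 0;

    if (!p || len == 0)
        return 0;
    p++;
    while (*p == ' ' || *p == '\t')
        p++;
    /* kernel symbols may contain '.', e.g. bpf_link_free_id.part.30 */
    while ((isalnum((unsigned char)*p) || *p == '_' || *p == '.') && n + 1 < len)
        out[n++] = *p++;
    out[n] = '\0';
    return n > 0;
}
```

Feeding it the two lines quoted above yields `arch_prepare_bpf_trampoline` and then `kfree`, which is the adjacency the argument relies on.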
Starting bpftrace in a shell as follows, we can capture the return value of arch_prepare_bpf_trampoline and print it to the console:

    bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retval link: %d\n", retval); }'
After starting probe in another terminal, bpftrace printed the following:

    root@pine64-1:/home/exein# bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retval link: %d\n", retval); }'
    Attaching 1 probe...
    retval link: -524
This is because kernel 5.15 has no aarch64 implementation of arch_prepare_bpf_trampoline and falls back to the default placeholder implementation:

    int __weak
    arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *image_end,
                                const struct btf_func_model *m, u32 flags,
                                struct bpf_tramp_links *tlinks,
                                void *orig_call)
    {
        return -ENOTSUPP;
    }
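The `__weak` annotation is what makes this fallback work: the linker keeps the weak definition only when no architecture supplies a strong definition of the same symbol. A userspace sketch of the same mechanism, assuming GCC or Clang and their `__attribute__((weak))`; `arch_prepare_demo`, `try_attach` and `DEMO_ENOTSUPP` are hypothetical names, not kernel code:

```c
/* Same numeric value as the kernel-internal ENOTSUPP. */
#define DEMO_ENOTSUPP 524

/* Generic fallback. A strong definition of arch_prepare_demo() in another
 * object file (the "arch" code) would replace this one at link time. */
__attribute__((weak)) int arch_prepare_demo(void)
{
    return -DEMO_ENOTSUPP;
}

/* Caller in the style of bpf_trampoline_update(): propagate the error. */
static int try_attach(void)
{
    int err = arch_prepare_demo();

    if (err < 0)
        return err;
    return 0;
}
```

Built as a single file there is no strong override, so `try_attach()` returns -524, the same value bpftrace reported on 5.15.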
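So this feature is not supported on this kernel version. The good news is that it was implemented in the 6.x kernels thanks to this patch[7].
Let's move to a 6.x kernel.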
Linux 6.1
If we try to run probe on kernel 6.1, we get the following output:

    root@pine64:/home/exein# ./probe file-system-monitor
    thread 'main' panicked at 'initialization failed: ProgramAttachError { program: "lsm path_mknod", program_error: SyscallError { call: "bpf_raw_tracepoint_open", io_error: Os { code: 524, kind: Uncategorized, message: "No error information" } } }', src/bin/probe.rs43
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
On kernel 6.1 we still get the same error as on 5.15! Let's find out why.
This time, running bpftrace on arch_prepare_bpf_trampoline gives the following output:

    root@pine64:/home/exein# bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retval tplink: %d\n", retval); }'
    Attaching 1 probe...
    retval tplink: 284
So the problem is not here: this function no longer returns an error. Let's go back to the function call graph.
This time we run trace-cmd and skip some functions to get cleaner output:

    trace-cmd record \
        -p function_graph \
        -g bpf_trampoline_link_prog \
        -n bpf_jit_alloc_exec \
        -n kmalloc_trace \
        -n arch_prepare_bpf_trampoline \
        -n generic_handle_domain_irq \
        -n do_interrupt_handler \
        -n irq_exit_rcu \
        ./probe file-system-monitor
trace-cmd report gives the following output:
    root@pine64:/home/exein# trace-cmd report
    CPU 0 is empty
    CPU 1 is empty
    CPU 3 is empty
    cpus=4
    tokio-runtime-w-11886 [002] 193385.056283: funcgraph_entry:               |  bpf_trampoline_link_prog() {
    tokio-runtime-w-11886 [002] 193385.056321: funcgraph_entry: +   15.042 us |    mutex_lock();
    tokio-runtime-w-11886 [002] 193385.056373: funcgraph_entry:               |    __bpf_trampoline_link_prog() {
    tokio-runtime-w-11886 [002] 193385.056395: funcgraph_entry: +   14.833 us |      bpf_attach_type_to_tramp();
    tokio-runtime-w-11886 [002] 193385.056428: funcgraph_entry:               |      bpf_trampoline_update.isra.23() {
    tokio-runtime-w-11886 [002] 193385.056459: funcgraph_entry:      2.917 us |        bpf_jit_charge_modmem();
    tokio-runtime-w-11886 [002] 193385.056531: funcgraph_entry:               |        find_vm_area() {
    tokio-runtime-w-11886 [002] 193385.056540: funcgraph_entry:      3.000 us |          find_vmap_area();
    tokio-runtime-w-11886 [002] 193385.056547: funcgraph_exit:  +   16.208 us |        }
    tokio-runtime-w-11886 [002] 193385.056554: funcgraph_entry:               |        __alloc_percpu_gfp() {
    tokio-runtime-w-11886 [002] 193385.056563: funcgraph_entry:               |          pcpu_alloc() {
    tokio-runtime-w-11886 [002] 193385.056568: funcgraph_entry:      4.875 us |            mutex_lock_killable();
    tokio-runtime-w-11886 [002] 193385.056591: funcgraph_entry:               |            pcpu_find_block_fit() {
    tokio-runtime-w-11886 [002] 193385.056599: funcgraph_entry:      8.625 us |              pcpu_next_fit_region.constprop.38();
    tokio-runtime-w-11886 [002] 193385.056608: funcgraph_exit:  +   17.166 us |            }
    tokio-runtime-w-11886 [002] 193385.056610: funcgraph_entry:               |            pcpu_alloc_area() {
    tokio-runtime-w-11886 [002] 193385.056639: funcgraph_entry:      9.167 us |              pcpu_block_update();
    tokio-runtime-w-11886 [002] 193385.056656: funcgraph_entry:      7.667 us |              pcpu_block_update_hint_alloc();
    tokio-runtime-w-11886 [002] 193385.056671: funcgraph_entry:      7.750 us |              pcpu_chunk_relocate();
    tokio-runtime-w-11886 [002] 193385.056679: funcgraph_exit:  +   69.667 us |            }
    tokio-runtime-w-11886 [002] 193385.056682: funcgraph_entry:      7.042 us |            mutex_unlock();
    tokio-runtime-w-11886 [002] 193385.056703: funcgraph_entry:      2.792 us |            pcpu_memcg_post_alloc_hook();
    tokio-runtime-w-11886 [002] 193385.056712: funcgraph_exit:  !  148.709 us |          }
    tokio-runtime-w-11886 [002] 193385.056719: funcgraph_exit:  !  165.250 us |        }
    tokio-runtime-w-11886 [002] 193385.056866: funcgraph_entry:               |        bpf_image_ksym_add() {
    tokio-runtime-w-11886 [002] 193385.056873: funcgraph_entry:               |          bpf_ksym_add() {
    tokio-runtime-w-11886 [002] 193385.056882: funcgraph_entry:      2.750 us |            __local_bh_disable_ip();
    tokio-runtime-w-11886 [002] 193385.056897: funcgraph_entry:      4.625 us |            __local_bh_enable_ip();
    tokio-runtime-w-11886 [002] 193385.056905: funcgraph_exit:  +   32.459 us |          }
    tokio-runtime-w-11886 [002] 193385.056922: funcgraph_entry:      7.584 us |          perf_event_ksymbol();
    tokio-runtime-w-11886 [002] 193385.056944: funcgraph_exit:  +   78.417 us |        }
    tokio-runtime-w-11886 [002] 193385.057492: funcgraph_entry:               |        set_memory_ro() {
    tokio-runtime-w-11886 [002] 193385.057501: funcgraph_entry:               |          change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057504: funcgraph_entry:               |            find_vm_area() {
    tokio-runtime-w-11886 [002] 193385.057506: funcgraph_entry:      8.875 us |              find_vmap_area();
    tokio-runtime-w-11886 [002] 193385.057518: funcgraph_exit:  +   14.250 us |            }
    tokio-runtime-w-11886 [002] 193385.057522: funcgraph_entry:               |            __change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057531: funcgraph_entry:               |              apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057538: funcgraph_entry:               |                __apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057544: funcgraph_entry: +   12.791 us |                  pud_huge();
    tokio-runtime-w-11886 [002] 193385.057559: funcgraph_entry:      2.708 us |                  pmd_huge();
    tokio-runtime-w-11886 [002] 193385.057574: funcgraph_entry: +   15.125 us |                  change_page_range();
    tokio-runtime-w-11886 [002] 193385.057591: funcgraph_exit:  +   53.792 us |                }
    tokio-runtime-w-11886 [002] 193385.057597: funcgraph_exit:  +   66.083 us |              }
    tokio-runtime-w-11886 [002] 193385.057610: funcgraph_exit:  +   88.125 us |            }
    tokio-runtime-w-11886 [002] 193385.057619: funcgraph_entry:               |            vm_unmap_aliases() {
    tokio-runtime-w-11886 [002] 193385.057622: funcgraph_entry:               |              _vm_unmap_aliases.part.77() {
    tokio-runtime-w-11886 [002] 193385.057625: funcgraph_entry:      9.125 us |                mutex_lock();
    tokio-runtime-w-11886 [002] 193385.057637: funcgraph_entry:      3.084 us |                purge_fragmented_blocks_allcpus();
    tokio-runtime-w-11886 [002] 193385.057643: funcgraph_entry:               |                __purge_vmap_area_lazy() {
    tokio-runtime-w-11886 [002] 193385.057687: funcgraph_entry:               |                  kmem_cache_free() {
    tokio-runtime-w-11886 [002] 193385.057693: funcgraph_entry: +   13.250 us |                    __slab_free();
    tokio-runtime-w-11886 [002] 193385.057705: funcgraph_exit:  +   18.750 us |                  }
    tokio-runtime-w-11886 [002] 193385.057718: funcgraph_entry:      7.416 us |                  __cond_resched_lock();
    tokio-runtime-w-11886 [002] 193385.057733: funcgraph_exit:  +   90.042 us |                }
    tokio-runtime-w-11886 [002] 193385.057741: funcgraph_entry:      2.792 us |                mutex_unlock();
    tokio-runtime-w-11886 [002] 193385.057747: funcgraph_exit:  !  124.666 us |              }
    tokio-runtime-w-11886 [002] 193385.057749: funcgraph_exit:  !  130.291 us |            }
    tokio-runtime-w-11886 [002] 193385.057756: funcgraph_entry:               |            __change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057759: funcgraph_entry:               |              apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057765: funcgraph_entry:               |                __apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057768: funcgraph_entry:      4.125 us |                  pud_huge();
    tokio-runtime-w-11886 [002] 193385.057778: funcgraph_entry:      8.750 us |                  pmd_huge();
    tokio-runtime-w-11886 [002] 193385.057790: funcgraph_entry:      4.625 us |                  change_page_range();
    tokio-runtime-w-11886 [002] 193385.057797: funcgraph_exit:  +   31.958 us |                }
    tokio-runtime-w-11886 [002] 193385.057803: funcgraph_exit:  +   44.375 us |              }
    tokio-runtime-w-11886 [002] 193385.057817: funcgraph_exit:  +   61.208 us |            }
    tokio-runtime-w-11886 [002] 193385.057820: funcgraph_exit:  !  319.292 us |          }
    tokio-runtime-w-11886 [002] 193385.057826: funcgraph_exit:  !  333.667 us |        }
    tokio-runtime-w-11886 [002] 193385.057840: funcgraph_entry:               |        set_memory_x() {
    tokio-runtime-w-11886 [002] 193385.057847: funcgraph_entry:               |          change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057855: funcgraph_entry:               |            find_vm_area() {
    tokio-runtime-w-11886 [002] 193385.057858: funcgraph_entry:      2.917 us |              find_vmap_area();
    tokio-runtime-w-11886 [002] 193385.057870: funcgraph_exit:  +   14.375 us |            }
    tokio-runtime-w-11886 [002] 193385.057876: funcgraph_entry:               |            vm_unmap_aliases() {
    tokio-runtime-w-11886 [002] 193385.057879: funcgraph_entry:               |              _vm_unmap_aliases.part.77() {
    tokio-runtime-w-11886 [002] 193385.057882: funcgraph_entry:      3.959 us |                mutex_lock();
    tokio-runtime-w-11886 [002] 193385.057893: funcgraph_entry:      3.000 us |                purge_fragmented_blocks_allcpus();
    tokio-runtime-w-11886 [002] 193385.057900: funcgraph_entry:      2.791 us |                __purge_vmap_area_lazy();
    tokio-runtime-w-11886 [002] 193385.057907: funcgraph_entry:      2.709 us |                mutex_unlock();
    tokio-runtime-w-11886 [002] 193385.057913: funcgraph_exit:  +   33.708 us |              }
    tokio-runtime-w-11886 [002] 193385.057915: funcgraph_exit:  +   43.000 us |            }
    tokio-runtime-w-11886 [002] 193385.057922: funcgraph_entry:               |            __change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057925: funcgraph_entry:               |              apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057930: funcgraph_entry:               |                __apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057933: funcgraph_entry:      4.292 us |                  pud_huge();
    tokio-runtime-w-11886 [002] 193385.057945: funcgraph_entry:      8.750 us |                  pmd_huge();
    tokio-runtime-w-11886 [002] 193385.057956: funcgraph_entry:      3.958 us |                  change_page_range();
    tokio-runtime-w-11886 [002] 193385.058037: funcgraph_exit:  +   32.083 us |                }
    tokio-runtime-w-11886 [002] 193385.058089: funcgraph_entry:      7.667 us |                irq_enter_rcu();
    tokio-runtime-w-11886 [002] 193385.058233: funcgraph_exit:  !  308.041 us |              }
    tokio-runtime-w-11886 [002] 193385.058239: funcgraph_exit:  !  316.709 us |            }
    tokio-runtime-w-11886 [002] 193385.058247: funcgraph_exit:  !  400.417 us |          }
    tokio-runtime-w-11886 [002] 193385.058255: funcgraph_exit:  !  415.000 us |        }
    tokio-runtime-w-11886 [002] 193385.058555: funcgraph_entry:      8.250 us |        irq_enter_rcu();
    tokio-runtime-w-11886 [002] 193385.058958: funcgraph_entry:               |        kallsyms_lookup_size_offset() {
    tokio-runtime-w-11886 [002] 193385.058974: funcgraph_entry: +   36.333 us |          get_symbol_pos();
    tokio-runtime-w-11886 [002] 193385.059017: funcgraph_exit:  +   59.750 us |        }
    tokio-runtime-w-11886 [002] 193385.059043: funcgraph_entry:               |        kfree() {
    tokio-runtime-w-11886 [002] 193385.059057: funcgraph_entry:      3.000 us |          __kmem_cache_free();
    tokio-runtime-w-11886 [002] 193385.059065: funcgraph_exit:  +   22.833 us |        }
    tokio-runtime-w-11886 [002] 193385.059073: funcgraph_exit:  # 2644.708 us |      }
    tokio-runtime-w-11886 [002] 193385.059079: funcgraph_exit:  # 2706.292 us |    }
    tokio-runtime-w-11886 [002] 193385.059095: funcgraph_entry:      2.792 us |    mutex_unlock();
    tokio-runtime-w-11886 [002] 193385.059101: funcgraph_exit:  # 2870.416 us |  }
This time the program gets past arch_prepare_bpf_trampoline, set_memory_ro and set_memory_x, and the last function we see before kfree is kallsyms_lookup_size_offset.
As we can see in the bpf_trampoline_update function in kernel/bpf/trampoline.c, there is no explicit call to kallsyms_lookup_size_offset:
```c
static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex)
{
    // ... OTHER CODE ...

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
again:
    if ((tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY) &&
        (tr->flags & BPF_TRAMP_F_CALL_ORIG))
        tr->flags |= BPF_TRAMP_F_ORIG_STACK;
#endif

    err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
                                      &tr->func.model, tr->flags, tlinks,
                                      tr->func.addr);
    if (err < 0)
        goto out;

    set_memory_ro((long)im->image, 1);
    set_memory_x((long)im->image, 1);

    WARN_ON(tr->cur_image && tr->selector == 0);
    WARN_ON(!tr->cur_image && tr->selector);
    if (tr->cur_image)
        /* progs already running at this address */
        err = modify_fentry(tr, tr->cur_image->image, im->image, lock_direct_mutex);
    else
        /* first time registering */
        err = register_fentry(tr, im->image);

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
    if (err == -EAGAIN) {
        /* -EAGAIN from bpf_tramp_ftrace_ops_func. Now
         * BPF_TRAMP_F_SHARE_IPMODIFY is set, we can generate the
         * trampoline again, and retry register.
         */
        /* reset fops->func and fops->trampoline for re-register */
        tr->fops->func = NULL;
        tr->fops->trampoline = 0;

        /* reset im->image memory attr for arch_prepare_bpf_trampoline */
        set_memory_nx((long)im->image, 1);
        set_memory_rw((long)im->image, 1);
        goto again;
    }
#endif
    if (err)
        goto out;

    if (tr->cur_image)
        bpf_tramp_image_put(tr->cur_image);
    tr->cur_image = im;
    tr->selector++;
out:
    /* If any error happens, restore previous flags */
    if (err)
        tr->flags = orig_flags;
    kfree(tlinks);
    return err;
}
```

> **Note:** the implementation of `bpf_trampoline_update` differs slightly from the one in kernel 5.15.

The call to `kallsyms_lookup_size_offset` is hidden inside another function. We do not see that function in the function graph because the compiler inlined it.

It looks like `kallsyms_lookup_size_offset` is called by `ftrace_location`:

```c
unsigned long ftrace_location(unsigned long ip)
{
    struct dyn_ftrace *rec;
    unsigned long offset;
    unsigned long size;

    rec = lookup_rec(ip, ip);
    if (!rec) {
        if (!kallsyms_lookup_size_offset(ip, &size, &offset))
            goto out;

        /* map sym+0 to __fentry__ */
        if (!offset)
            rec = lookup_rec(ip, ip + size - 1);
    }

    if (rec)
        return rec->ip;

out:
    return 0;
}
```
ftrace_location is called by register_fentry, and right after that call register_fentry checks the fops field of struct bpf_trampoline *tr:

    /* first time registering */
    static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
    {
        void *ip = tr->func.addr;
        unsigned long faddr;
        int ret;

        faddr = ftrace_location((unsigned long)ip);
        if (faddr) {
            if (!tr->fops)
                return -ENOTSUPP;
            tr->func.ftrace_managed = true;
        }

        if (bpf_trampoline_module_get(tr))
            return -ENOENT;

        if (tr->func.ftrace_managed) {
            ftrace_set_filter_ip(tr->fops, (unsigned long)ip, 0, 1);
            ret = register_ftrace_direct_multi(tr->fops, (long)new_addr);
        } else {
            ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
        }

        if (ret)
            bpf_trampoline_module_put(tr);
        return ret;
    }
Indeed, if tr->fops is NULL while the target address is an ftrace location, the function returns -ENOTSUPP.
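The check can be isolated in a few lines. The sketch below is a reduced userspace model of that first branch only; the `demo_` types and `DEMO_ENOTSUPP` are hypothetical stand-ins for the kernel structures, and everything after the check (module reference, ftrace registration) is left out:

```c
#include <stddef.h>

#define DEMO_ENOTSUPP 524    /* same numeric value as the kernel-internal ENOTSUPP */

struct demo_ftrace_ops { int unused; };

/* Reduced stand-in for struct bpf_trampoline: only the fields the check touches. */
struct demo_trampoline {
    struct demo_ftrace_ops *fops;   /* stays NULL when direct calls are compiled out */
    int ftrace_managed;
};

/* Models the first branch of register_fentry(): a non-zero faddr means
 * ftrace_location() found an ftrace location for the target address. */
static int demo_register_fentry(struct demo_trampoline *tr, unsigned long faddr)
{
    if (faddr) {
        if (!tr->fops)
            return -DEMO_ENOTSUPP;
        tr->ftrace_managed = 1;
    }
    return 0;
}
```

With `fops` left NULL and a non-zero `faddr` the model returns -524, which matches the error probe reports on 6.1; with `fops` set it marks the trampoline as ftrace-managed and continues.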
Let's find out where tr->fops is initialized.
If we are right, the trampoline should be created inside the bpf_trampoline_lookup function:

    static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
    {
        struct bpf_trampoline *tr;
        struct hlist_head *head;
        int i;

        mutex_lock(&trampoline_mutex);
        head = &trampoline_table[hash_64(key, TRAMPOLINE_HASH_BITS)];
        hlist_for_each_entry(tr, head, hlist) {
            if (tr->key == key) {
                refcount_inc(&tr->refcnt);
                goto out;
            }
        }
        tr = kzalloc(sizeof(*tr), GFP_KERNEL);
        if (!tr)
            goto out;
    #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
        tr->fops = kzalloc(sizeof(struct ftrace_ops), GFP_KERNEL);
        if (!tr->fops) {
            kfree(tr);
            tr = NULL;
            goto out;
        }
        tr->fops->private = tr;
        tr->fops->ops_func = bpf_tramp_ftrace_ops_func;
    #endif

        tr->key = key;
        INIT_HLIST_NODE(&tr->hlist);
        hlist_add_head(&tr->hlist, head);
        refcount_set(&tr->refcnt, 1);
        mutex_init(&tr->mutex);
        for (i = 0; i < BPF_TRAMP_MAX; i++)
            INIT_HLIST_HEAD(&tr->progs_hlist[i]);
    out:
        mutex_unlock(&trampoline_mutex);
        return tr;
    }
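After the allocation, the fops field of the trampoline is populated only when CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is set; otherwise kzalloc leaves it NULL. That option depends on HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS, which aarch64 does not provide in this kernel.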
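Whether a given kernel was built with the option can be read from its build configuration. A minimal sketch of such a check, assuming the configuration text has already been loaded into memory (for example from `/boot/config-$(uname -r)` on distributions that ship that file); `kconfig_enabled` is a hypothetical helper:

```c
#include <string.h>

/* Return 1 if `option` is set to "y" in Kconfig-style text, else 0.
 * Lines such as "# CONFIG_FOO is not set" do not match. */
static int kconfig_enabled(const char *config_text, const char *option)
{
    size_t optlen = strlen(option);
    const char *line = config_text;

    while (line && *line) {
        if (strncmp(line, option, optlen) == 0 &&
            strncmp(line + optlen, "=y", 2) == 0 &&
            (line[optlen + 2] == '\n' || line[optlen + 2] == '\0'))
            return 1;
        line = strchr(line, '\n');
        if (line)
            line++;    /* step past the newline to the next line */
    }
    return 0;
}
```

On the aarch64 6.1 kernel discussed here one would expect `kconfig_enabled(cfg, "CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS")` to return 0, since the option cannot be selected on that architecture.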
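Conclusion
As things stand, BPF LSM cannot be used on aarch64 because the ftrace direct calls feature is missing. Fortunately, a patch[8] that enables LSM programs (and other features) on aarch64 has already been merged into the current mainline branch.
These changes are expected to ship in the upcoming 6.4 release of the Linux kernel.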
Reviewing editor: 汤梓红
Original title: Exploring BPF LSM on aarch64 with ftrace
Source: WeChat official account Linux阅码场 (WeChat ID: LinuxDev). Please credit the source when reposting.