Berkeley Packet Filter, known today by the name BPF, is an old technology that has been extended to achieve magic, as many of its supporters will unequivocally state. As we’ll see, this is not without merit.
I first heard of this technology in an old Linux Journal article published way back in the mists of time. I think it was the late 90s, although I didn’t stumble across it until much later (unfortunately, I was unable to find a link). If I recall, I was looking into ways of filtering raw link-layer packets in user space, and BPF enabled early inspection of network traffic right off the network card.
However, I was then distracted by a squirrel, and I never returned to the subject matter.
Recently, though, I have come back to the subject after viewing a number of talks about the kernel technology, mostly by Liz Rice. I highly recommend watching everything that she does.
In a very real way, eBPF allows for an on-the-fly configurable kernel, and some have likened it to the introduction of JavaScript in the browser. Indeed, it has the potential to change how we think about systems operations, in both scope and granularity, and there are already production-ready projects with tooling to assist in that.
Since the kernel controls the entire system, there is no better place from which to observe everything that is happening on the system, including networking functionality. It is the keys to the kingdom.
Ok, enough dramatics. Let’s look at why eBPF has been dubbed a “Linux superpower” by people who put their pants on one leg at a time.
What is eBPF?
Starting with a patch to the Linux 3.15 kernel by Alexei Starovoitov in March 2014, what has come to be known as eBPF (extended BPF) was first added to Linux. This was exciting and groundbreaking because it allowed, essentially, user space code to be run in the kernel to do a seemingly endless number of interesting tasks.
At a high level, eBPF is an in-kernel virtual machine that runs programmable user space code to do observability, instrumentation, security, network filtering (its initial use case), et al. It has ten general-purpose 64-bit registers (plus a read-only frame pointer), and parameters are passed to functions in those virtual machine registers, just as they are on native hardware (CPUs).
There is a new bpf system call that allows developers to register their eBPF program functions with events in the kernel at pre-defined hook points, and the kernel will run these programs when the events are triggered.
For example, you may be interested in tracking a file descriptor as it is used in I/O operations, or dropping network packets that seem suspicious, or monitoring for questionable activity such as chmod syscalls, etc., and you have a nifty eBPF function that has been inserted into the kernel at the prescribed hook. Then, anytime something happens on the machine that goes through the code paths in the kernel where the hooks are defined, the kernel will see if an eBPF program is registered, and, if so, it will call it. You’ve just changed the world, bro!
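To make the "register a function at a hook, and it gets called when the event fires" idea concrete, here is a tiny user-space model in plain Python. It is only an analogy: the names (`attach`, `fire`, `watch_chmod`, the hook strings) are made up for illustration, and the real kernel attaches verified bytecode through the bpf system call rather than Python callables.

```python
from collections import defaultdict

# Toy registry: hook name -> list of attached handler functions.
# All names here are hypothetical; this is not the kernel's API.
hooks = defaultdict(list)


def attach(hook_name, handler):
    """Register a handler to run whenever hook_name fires."""
    hooks[hook_name].append(handler)


def fire(hook_name, ctx):
    """Simulate the kernel reaching a hook point: run every attached handler."""
    return [handler(ctx) for handler in hooks[hook_name]]


def watch_chmod(ctx):
    # Flag world-writable modes as suspicious, otherwise report "ok".
    return "suspicious" if ctx["mode"] & 0o002 else "ok"


attach("sys_enter_chmod", watch_chmod)

# A hook with a handler attached runs it; a hook with nothing attached does nothing.
results = fire("sys_enter_chmod", {"path": "/etc/passwd", "mode": 0o777})
unattached = fire("sys_enter_open", {"path": "/tmp/x"})
print(results, unattached)
```

The point of the sketch is the shape of the interaction: the code path itself never changes, it just checks whether anything is attached and calls it if so.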
Crucially, it is possible to store and access state using eBPF maps. That is, both the eBPF program running in kernel space and the application running in user space have access to these maps, which can be backed by any of the many supported data structures. User space reads and writes this state through the bpf system call.
Here is a partial list of the supported map data structure types:
- BPF_MAP_TYPE_HASH: a hash table
- BPF_MAP_TYPE_ARRAY: an array map, optimized for fast lookup speeds, often used for counters
- BPF_MAP_TYPE_PROG_ARRAY: an array of file descriptors corresponding to eBPF programs; used to implement jump tables and sub-programs to handle specific packet protocols
- BPF_MAP_TYPE_PERCPU_ARRAY: a per-CPU array, used to implement histograms of latency
- BPF_MAP_TYPE_PERF_EVENT_ARRAY: stores pointers to struct perf_event, used to read and store perf event counters
- BPF_MAP_TYPE_LRU_HASH: a hash table that only retains the most recently used items
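As a rough feel for how a map like BPF_MAP_TYPE_LRU_HASH behaves, here is a small pure-Python stand-in. The class name `ToyLruHash` and its `update`/`lookup` methods are my own invention for this sketch, and it evicts in strict least-recently-used order, which is a simplification of whatever the kernel actually does internally.

```python
from collections import OrderedDict


class ToyLruHash:
    """Strict-LRU stand-in for an LRU hash map (illustrative only)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def update(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)      # refresh recency of an existing key
        elif len(self._data) >= self.max_entries:
            self._data.popitem(last=False)   # evict the least recently used key
        self._data[key] = value

    def lookup(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # a lookup also counts as a use
        return self._data[key]


# "Kernel side": count events per process name in a map with room for two keys.
counts = ToyLruHash(max_entries=2)
for comm in ["bash", "sshd", "bash", "cron"]:
    counts.update(comm, (counts.lookup(comm) or 0) + 1)

# "User side": read the shared state back out.
snapshot = {comm: counts.lookup(comm) for comm in ["bash", "sshd", "cron"]}
print(snapshot)
```

Here "sshd" is the entry that gets pushed out when "cron" arrives, because "bash" was touched more recently.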
It’s also worth noting that eBPF has helper functions. This is good, because it provides us with a stable API, which is much better than relying on kernel function names, as they can change with kernel versions.
The original BPF is now referred to as classic BPF, or cBPF. The newer extended BPF is now largely referred to as the non-acronym word “BPF”.
How does eBPF work?
The following is how eBPF works from a developer’s point of view, and it doesn’t go into too much depth about its internal workings. For a really good overview of eBPF internals, see Brendan Gregg’s talk on BPF Internals.
So, I’ve briefly described what eBPF is and why it’s quite possibly the Bee’s Knees. Now, I’ll touch on how you, as presumably someone with 46 chromosomes, can put this to good use.
Most developers, of course, work in user space. And how do user space applications interact with the kernel? Through system calls, of course. Many high-level languages abstract these syscalls from the developer, but they’re still used under the covers. And just as these high-level languages do, there are eBPF tools that have also abstracted away the eBPF syscalls so that we don’t have to worry about them as we write our programs.
For example, loading the program and associating the program with the trace point is something that has to be done in user space, and these tools will do that for you.
So, how does one get their eBPF program into kernel space? Essentially, you write a restricted C program and then a compiler toolchain like LLVM compiles it to eBPF bytecode (i.e., the Clang frontend feeding the LLVM backend). There are also tools that let you embed your restricted C code in a higher-level language, as we’ll see in the Examples section.
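If you’re curious what that bytecode looks like, here is a hedged sketch that hand-encodes the smallest useful program, the equivalent of a function that just returns 0. Each eBPF instruction is 64 bits: an 8-bit opcode, two 4-bit register fields, a 16-bit offset and a 32-bit immediate. The `encode` helper is my own, and the layout shown assumes a little-endian host; in practice Clang and LLVM do all of this for you.

```python
import struct

# Opcode building blocks from the eBPF instruction set.
BPF_ALU64, BPF_JMP = 0x07, 0x05   # instruction classes
BPF_MOV, BPF_EXIT = 0xB0, 0x90    # operations
BPF_K = 0x00                      # operand source: the immediate field


def encode(opcode, dst=0, src=0, off=0, imm=0):
    """Pack one 64-bit eBPF instruction (hypothetical helper, little-endian layout)."""
    regs = (src << 4) | dst       # dst in the low nibble, src in the high nibble
    return struct.pack("<BBhi", opcode, regs, off, imm)


program = (
    encode(BPF_ALU64 | BPF_MOV | BPF_K, dst=0, imm=0)  # r0 = 0 (r0 holds the return value)
    + encode(BPF_JMP | BPF_EXIT)                       # exit
)
print(program.hex())
```

Two instructions, sixteen bytes. Everything a real toolchain emits is a longer sequence of the same fixed-size records.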
Why a “restricted” C program? Because there are certain things that the verifier running in the kernel will check for and not allow you to do, such as write code paths that don’t exit. The kernel will not allow a user-defined eBPF program to hang or crash the kernel.
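To give a flavor of the "code paths that don’t exit" check, here is a toy model in Python. It treats a program as a control-flow graph and rejects it if any path loops back on itself or runs off the end without reaching an exit. The function `toy_verify` and the graph format are assumptions of this sketch; the real verifier does far more (it tracks register and memory state, for one), and newer kernels accept some bounded loops.

```python
def toy_verify(cfg, exits, entry=0):
    """Return True if every path from entry reaches an exit without looping.

    cfg maps an instruction index to the indexes it can jump or fall through to;
    exits is the set of indexes that hold an exit instruction.
    """
    on_path = set()   # instructions on the path currently being explored
    verdict = {}      # memoized result per instruction

    def visit(node):
        if node in on_path:            # back edge: a loop that might never end
            return False
        if node in verdict:
            return verdict[node]
        on_path.add(node)
        successors = cfg.get(node, [])
        if not successors:
            ok = node in exits         # a path may only end at an exit
        else:
            ok = all(visit(nxt) for nxt in successors)
        on_path.discard(node)
        verdict[node] = ok
        return ok

    return visit(entry)


straight = toy_verify({0: [1], 1: [2], 2: []}, exits={2})
branchy = toy_verify({0: [1, 2], 1: [3], 2: [3], 3: []}, exits={3})
looping = toy_verify({0: [1], 1: [0, 2], 2: []}, exits={2})
no_exit = toy_verify({0: [1], 1: []}, exits=set())
print(straight, branchy, looping, no_exit)
```

The straight-line and branching programs pass; the one that jumps backwards and the one with no exit are rejected before they ever run.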
Once the eBPF program is inserted into the kernel by calling the bpf syscall, it is verified by the kernel using static analysis to ensure it’s safe (after all, injecting user space code into kernel space is fraught with danger) and then JIT compiled from the intermediate eBPF bytecode to the machine-specific instruction set, so it runs as fast as natively compiled kernel code and kernel modules.
So, when are the eBPF functions called? Well, eBPF is event-driven, and the functions are called when these events occur, such as when the following hooks are encountered:
- system calls
- function entry/exit
- kernel tracepoints
- network events
If a predefined hook doesn’t exist for a particular kernel function, it is still possible to create a kernel probe (kprobe and kretprobe) or user probe (uprobe and uretprobe) to be called at a function’s entry or return in kernel and user applications, respectively.
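A common reason to pair an entry probe with a return probe is to time a function: stash a timestamp on entry, subtract on return. Here is that pattern modeled in plain Python. The handler names, the dictionary standing in for a hash map, and the hand-fed timestamps are all assumptions made for the sake of a runnable sketch.

```python
start_ns = {}    # stand-in for a hash map keyed by thread ID
latencies = []   # stand-in for results handed to user space


def on_entry(tid, now_ns):
    """What an entry-probe handler might do: remember when the call began."""
    start_ns[tid] = now_ns


def on_return(tid, now_ns):
    """What a return-probe handler might do: compute the elapsed time."""
    began = start_ns.pop(tid, None)
    if began is None:
        return   # no matching entry, e.g. the probe was attached mid-call
    latencies.append((tid, now_ns - began))


# Two overlapping calls on different threads, plus a return with no entry.
on_entry(tid=101, now_ns=1_000)
on_entry(tid=202, now_ns=1_200)
on_return(tid=202, now_ns=1_500)
on_return(tid=101, now_ns=4_000)
on_return(tid=303, now_ns=5_000)
print(latencies)
```

Keying the timestamps by thread ID is what keeps overlapping calls from stepping on each other.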
And, as previously mentioned, user space can get the information it’s interested in out of kernel space through eBPF maps, the key/value data structures described above (using commands that create and modify these maps).
Another great thing is the machine does not need to be restarted once the code is injected into the kernel. The eBPF program will start working immediately!
Weeeeeeeeeeeeeeeeeeeeeeeeeeeeee
Examples
bpftrace
Here is an easy one-line example using bpftrace, taken from the README and used for explication in a great Liz Rice talk entitled A Beginner’s Guide to eBPF Programming with Go.
It runs a script whenever any process on the machine makes a system call, counting the calls per process name. Note that it blocks and quits only when you send the SIGINT interrupt signal (Control-C) to the process:
$ sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
Attaching 1 probe...
^C
@[rpcbind]: 1
@[StreamT~ns #372]: 2
@[wpa_supplicant]: 2
@[Cache2 I/O]: 3
@[rs:main Q:Reg]: 6
@[Renderer]: 6
@[in:imuxsock]: 6
@[auditd]: 7
@[sudo]: 7
@[rtkit-daemon]: 8
@[tracker-miner-f]: 8
@[IndexedDB #297]: 9
@[Xorg:gdrv0]: 9
@[packagekitd]: 11
@[Compositor]: 12
@[StreamTrans #21]: 13
@[MediaTimer #1]: 13
@[gvfs-afc-volume]: 13
@[MediaSu~isor #6]: 13
@[dockerd]: 13
@[MediaSu~isor #8]: 13
@[URL Classifier]: 14
@[StreamTrans #22]: 14
...
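Once you have that output, you’ll probably want the noisiest processes at the top. Here is a small Python sketch that parses lines in the `@[name]: count` shape shown above and sorts them. The `parse_counts` helper and the regular expression are mine, and the sample is a few lines lifted from the run above rather than anything exhaustive.

```python
import re

# Matches lines such as "@[Cache2 I/O]: 3" from the output above.
LINE = re.compile(r"^@\[(?P<comm>.*)\]: (?P<count>\d+)$")


def parse_counts(text):
    """Turn bpftrace's per-key count lines into a {name: count} dictionary."""
    counts = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:   # skip "Attaching 1 probe..." and anything else that isn't a count
            counts[match.group("comm")] = int(match.group("count"))
    return counts


sample = """\
Attaching 1 probe...
@[rpcbind]: 1
@[Cache2 I/O]: 3
@[dockerd]: 13
@[URL Classifier]: 14
"""

counts = parse_counts(sample)
top = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:2]
print(top)
```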
bcc
Make sure that your kernel satisfies the requirements set by the bcc project.
Here is a “Hello, World!” example taken from the bcc project. bcc provides Python bindings and allows you to embed the C code directly into the program:
#!/usr/bin/python
#
# This is a Hello World example that formats output as fields.

from bcc import BPF
from bcc.utils import printb

# define BPF program
prog = """
int hello(void *ctx) {
    bpf_trace_printk("Hello, World!\\n");
    return 0;
}
"""

# load BPF program
b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")

# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))

# format output
while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    except ValueError:
        continue
    except KeyboardInterrupt:
        exit()
    printb(b"%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
chmod that little sucker and run it with escalated privileges, because it inserts code into the kernel. Again, note that it blocks and quits only when you send the SIGINT interrupt signal (Control-C) to the process:
(ebpf) $ chmod 700 main.py
(ebpf) $ sudo ./main.py
TIME(s) COMM PID MESSAGE
78444.455829000 tmux: server 2390 Hello, World!
78444.458999000 tmux: server 2390 Hello, World!
78444.460851000 sh 302895 Hello, World!
78444.460934000 tmux: server 2390 Hello, World!
78444.461730000 sh 302896 Hello, World!
78444.462156000 bash 302897 Hello, World!
78444.462440000 bash 302900 Hello, World!
78444.462787000 sh 302898 Hello, World!
78444.463126000 bash 302899 Hello, World!
78444.463452000 bash 302903 Hello, World!
78444.463928000 bash 302897 Hello, World!
...
You’re gonna want to create a Python virtual environment before downloading any dependencies and running that script, player.
If you get an error similar to the following, it means that you need to install the bcc tools on your machine:
Traceback (most recent call last):
  File "./main.py", line 9, in <module>
    from bcc import BPF
ModuleNotFoundError: No module named 'bcc'
$
$ sudo apt-get install bpfcc-tools linux-headers-$(uname -r)
Summary
This article serves as a general overview of eBPF and hopefully has provided the reader with an understanding of not only what it is but how powerful it can be.
There are other very good reasons to check out this technology. Here are some of them:
- The injected program immediately affects all running containers that share the same kernel.
- Writing an eBPF program is easier than writing a kernel module (and safer).
- You can modify/patch your kernel immediately without having to wait until the change is in the Linux kernel and then in your Linux distribution, which could literally take years.
Of course, there are other projects that I haven’t mentioned but are definitely worth investigating, such as Cilium.