This document tries to summarize and structure what I have learned about the FUSE (Filesystem in Userspace) protocol and Linux kernel internals during the development of gocryptfs.
The Markdown source code of this document is available at https://github.com/rfjakob/the-fuse-wire-protocol - pull requests welcome!
The rendered HTML should always be available at https://nuetzlich.net/the-fuse-wire-protocol/.
To understand how FUSE works, it is important to know what the Linux filesystem stack looks like. FUSE is designed to fit seamlessly into the existing model.
Let’s take unlink("/tmp/foo") on an ext4 filesystem as an example. Like many other system calls, unlink() operates on a file path, while Linux internally operates on dentry (“directory entry”) structs (definition).
Each dentry has a pointer to an inode struct (definition) that is filled by the filesystem (in our example, ext4). Each inode struct in turn contains a list of function pointers in an inode_operations struct (definition).
The overall structure looks like this:
dentry → inode → inode_operations → lookup(), unlink()
The Linux VFS layer splits the path into segments. In our case: /, tmp, foo.
The / (root directory) dentry is created at mount time and serves as the starting point for the recursive walk:
1. The VFS calls lookup("tmp") on the dentry corresponding to / and receives the dentry for tmp.
2. The VFS calls lookup("foo") on the dentry corresponding to tmp and receives the dentry for foo.
3. The VFS calls unlink() on the dentry corresponding to foo.
The lookup() and unlink() functions are, in our example, implemented by the ext4 filesystem.
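The walk described above can be sketched as a toy model. This is not kernel code: the Python classes only borrow the names dentry, inode and inode_operations, and the `children` dictionary is a made-up stand-in for whatever on-disk structure a real filesystem such as ext4 would consult.

```python
# Toy model of the VFS structures described above. Names mirror the kernel
# structs, but everything here is a simplified, hypothetical illustration.

class InodeOperations:
    """Stands in for struct inode_operations (a table of function pointers)."""

    def lookup(self, dir_inode, name):
        # Return the child inode called `name`, or None if it does not exist.
        return dir_inode.children.get(name)

    def unlink(self, dir_inode, name):
        # Remove the directory entry `name` from the directory.
        del dir_inode.children[name]


class Inode:
    def __init__(self, ops):
        self.ops = ops        # reference to the filesystem's operations
        self.children = {}    # only meaningful for directories


class Dentry:
    def __init__(self, name, inode):
        self.name = name
        self.inode = inode


def walk(root, path):
    """Resolve an absolute path to (parent dentry, dentry), segment by segment."""
    parent, current = None, root
    for segment in [s for s in path.split("/") if s]:
        child = current.inode.ops.lookup(current.inode, segment)
        if child is None:
            raise FileNotFoundError(path)
        parent, current = current, Dentry(segment, child)
    return parent, current


def unlink(root, path):
    parent, target = walk(root, path)
    parent.inode.ops.unlink(parent.inode, target.name)


# Build "/", "/tmp" and "/tmp/foo", then remove "/tmp/foo".
ops = InodeOperations()
root = Dentry("/", Inode(ops))
tmp = Inode(ops)
root.inode.children["tmp"] = tmp
tmp.children["foo"] = Inode(ops)

unlink(root, "/tmp/foo")
print(sorted(tmp.children))  # prints [] because foo is gone
```

Swapping in a different `InodeOperations` object is all it takes to change the filesystem behind the same walk, which is the property FUSE relies on.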
For a FUSE filesystem, the functions in inode_operations are implemented in the userspace filesystem. The FUSE module in the Linux kernel provides stub implementations (definition) that forward the requests to the userspace filesystem and convert between kernel API and FUSE wire protocol.
dcache
Translating paths to dentry structs is a performance-critical operation. To avoid calling the filesystem’s lookup() function for each segment, the Linux kernel implements a directory entry cache called dcache.
For local filesystems like ext4, the cached entries never expire. For FUSE filesystems, the default timeout is 1 second, but it can be set to an arbitrary value using the entry_timeout mount option in libfuse (see man 8 fuse) or the EntryTimeout field in go-fuse.
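To make the effect of the entry timeout concrete, here is a minimal sketch of a name cache with an expiry time. It is a hypothetical model, not how the kernel's dcache is implemented: `EntryCache`, `lookup_fn` and the injectable `clock` are names invented for this example.

```python
import time


class EntryCache:
    """Toy (parent, name) -> entry cache with a timeout. Hypothetical model."""

    def __init__(self, lookup_fn, entry_timeout=1.0, clock=time.monotonic):
        self.lookup_fn = lookup_fn          # the filesystem's (slow) lookup
        self.entry_timeout = entry_timeout  # seconds; None means "never expire"
        self.clock = clock
        self.entries = {}                   # (parent, name) -> (entry, expiry)

    def lookup(self, parent, name):
        key = (parent, name)
        cached = self.entries.get(key)
        if cached is not None:
            entry, expires = cached
            if expires is None or self.clock() < expires:
                return entry                # hit: the filesystem is not called
        # Miss or expired entry: ask the filesystem and remember the answer.
        entry = self.lookup_fn(parent, name)
        if self.entry_timeout is None:
            expires = None
        else:
            expires = self.clock() + self.entry_timeout
        self.entries[key] = (entry, expires)
        return entry


# Demo with a fake clock so the result does not depend on real time.
calls = []

def slow_lookup(parent, name):
    calls.append(name)
    return f"{parent}/{name}"

now = [0.0]
cache = EntryCache(slow_lookup, entry_timeout=1.0, clock=lambda: now[0])
cache.lookup("", "tmp")   # miss: calls slow_lookup
cache.lookup("", "tmp")   # hit: served from the cache
now[0] = 2.0              # more than one second later
cache.lookup("", "tmp")   # expired: calls slow_lookup again
print(len(calls))         # prints 2
```

Passing `entry_timeout=None` models the ext4 case in which cached entries never expire.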
The Linux kernel and the userspace filesystem communicate by sending messages through the /dev/fuse device. On the kernel side, message parsing and generation are handled by the FUSE module. On the userspace side, this is usually handled by a FUSE library. libfuse is the reference implementation and is developed in lockstep with the kernel. Alternative FUSE libraries like go-fuse follow the developments in libfuse.
Note: the excellent manual page fuse.4 has more details.
The kernel and userspace each define the message format in corresponding C header files.
Every message from the kernel to userspace starts with the fuse_in_header struct (definition). The most interesting fields are:
- opcode … the operation the kernel wants to perform (a uint32 from enum fuse_opcode)
- nodeid … the file or directory to operate on (arbitrary uint64 identifier)

The opcode defines the data that follows the header. An opcode-specific struct and up to two filenames may follow. A RENAME message uses all of those fields and looks like this:
- fuse_in_header struct
- fuse_rename_in struct
- filename (old name)
- filename (new name)

Whereas an UNLINK message looks like this:
- fuse_in_header struct
- filename

The go-fuse library has two nice tables listing what data follows the header for each opcode. Due to Go naming conventions, the struct names differ slightly from the C names, but the correspondence should be clear enough.
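The message layouts above can be decoded with a few lines of Python. This is a sketch based on my reading of the kernel's fuse.h, not a reference parser: the field order of fuse_in_header (len, opcode, unique, nodeid, uid, gid, pid, 4 bytes of padding), the opcode numbers and the NUL-terminated filenames are stated here as assumptions, and `parse_request` and `build_request` are helper names made up for the example.

```python
import struct

# Assumed layout of fuse_in_header: 40 bytes in the host's native byte order.
IN_HEADER = struct.Struct("=IIQQIII4x")
# Assumed layout of fuse_rename_in: nodeid of the destination directory.
RENAME_IN = struct.Struct("=Q")

FUSE_UNLINK = 10   # opcode values as found in enum fuse_opcode
FUSE_RENAME = 12


def split_names(data):
    """Split a buffer of NUL-terminated names into a list of bytes objects."""
    return data.split(b"\0")[:-1]


def parse_request(buf):
    """Decode the header, then the opcode-specific part that follows it."""
    length, opcode, unique, nodeid, uid, gid, pid = IN_HEADER.unpack_from(buf, 0)
    body = buf[IN_HEADER.size:length]
    if opcode == FUSE_RENAME:
        (newdir,) = RENAME_IN.unpack_from(body, 0)
        old, new = split_names(body[RENAME_IN.size:])
        return {"op": "RENAME", "nodeid": nodeid, "newdir": newdir,
                "old": old, "new": new}
    if opcode == FUSE_UNLINK:
        (name,) = split_names(body)
        return {"op": "UNLINK", "nodeid": nodeid, "name": name}
    return {"op": opcode, "nodeid": nodeid}


def build_request(opcode, nodeid, payload, unique=1):
    """Demo helper: assemble a message the way the kernel side would."""
    length = IN_HEADER.size + len(payload)
    return IN_HEADER.pack(length, opcode, unique, nodeid, 0, 0, 0) + payload


# Rename "old" to "new" inside the root directory (nodeid 1), then unlink "foo".
rename_msg = build_request(FUSE_RENAME, 1, RENAME_IN.pack(1) + b"old\0new\0")
unlink_msg = build_request(FUSE_UNLINK, 1, b"foo\0")

print(parse_request(rename_msg))
print(parse_request(unlink_msg))
```

A real filesystem would read such buffers from /dev/fuse rather than build them itself; the builder exists only so the example runs without a mounted filesystem.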
LOOKUP Opcode
The nodeid field in fuse_in_header identifies which file or directory the operation should be performed on. The kernel has to obtain the nodeid from the userspace filesystem before it can perform any other operation.
The process is the same for in-kernel filesystems: See the section “The Inode Object” in https://www.kernel.org/doc/Documentation/filesystems/vfs.txt.
The LOOKUP opcode allows the kernel to get a nodeid for a filename in a directory. A LOOKUP message looks like this:
- fuse_in_header struct
- filename

The userspace filesystem replies with the nodeid corresponding to the filename in the directory identified by the nodeid in the header. The root directory has a fixed nodeid of 1.
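As a rough illustration, the snippet below decodes a LOOKUP request and builds the simplest possible answer, an error reply. It assumes that replies start with a fuse_out_header (len, error, unique) and that errors are sent as a negated errno with no further payload. A successful reply additionally carries the nodeid and file attributes in a larger struct, which is not shown here. `parse_lookup` and `error_reply` are hypothetical helper names.

```python
import errno
import struct

IN_HEADER = struct.Struct("=IIQQIII4x")   # assumed fuse_in_header layout, 40 bytes
OUT_HEADER = struct.Struct("=IiQ")        # assumed fuse_out_header: len, error, unique

FUSE_LOOKUP = 1    # opcode value from enum fuse_opcode
FUSE_ROOT_ID = 1   # the root directory's fixed nodeid


def parse_lookup(buf):
    """Return (unique, parent nodeid, name) for a LOOKUP message."""
    length, opcode, unique, parent, _uid, _gid, _pid = IN_HEADER.unpack_from(buf, 0)
    if opcode != FUSE_LOOKUP:
        raise ValueError("not a LOOKUP message")
    name = buf[IN_HEADER.size:length].rstrip(b"\0")
    return unique, parent, name


def error_reply(unique, err):
    """Build a header-only reply; `unique` ties it to the request."""
    return OUT_HEADER.pack(OUT_HEADER.size, -err, unique)


# The kernel asks: which nodeid does "tmp" have inside the root directory?
request = IN_HEADER.pack(IN_HEADER.size + 4, FUSE_LOOKUP, 7, FUSE_ROOT_ID,
                         0, 0, 0) + b"tmp\0"
unique, parent, name = parse_lookup(request)

# Pretend the file does not exist and answer with ENOENT.
reply = error_reply(unique, errno.ENOENT)
print(parent, name, OUT_HEADER.unpack(reply))
```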
The nodeid is an arbitrary value chosen by the userspace filesystem, which must remember which file or directory each nodeid corresponds to.
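One way a userspace filesystem could keep track of this mapping is a simple table, sketched below. Identifying files by path is an assumption made for brevity: it ignores renames, hard links and the question of when an entry may be dropped again, and `NodeTable` is a hypothetical name, not part of libfuse or go-fuse.

```python
class NodeTable:
    """Toy nodeid bookkeeping for a path-based userspace filesystem."""

    ROOT_ID = 1   # the root directory's fixed nodeid

    def __init__(self, root_path):
        self.path_by_id = {self.ROOT_ID: root_path}
        self.id_by_path = {root_path: self.ROOT_ID}
        self.next_id = self.ROOT_ID + 1

    def lookup(self, parent_id, name):
        """Return the nodeid for `name` in the directory `parent_id`."""
        parent = self.path_by_id[parent_id]
        path = parent.rstrip("/") + "/" + name
        if path not in self.id_by_path:
            # First time the kernel asks about this file: hand out a new id.
            self.id_by_path[path] = self.next_id
            self.path_by_id[self.next_id] = path
            self.next_id += 1
        return self.id_by_path[path]


table = NodeTable("/")
tmp_id = table.lookup(NodeTable.ROOT_ID, "tmp")
foo_id = table.lookup(tmp_id, "foo")
print(tmp_id, foo_id, table.path_by_id[foo_id])  # prints 2 3 /tmp/foo
```

A second lookup of the same name returns the same nodeid, so later requests that carry only the nodeid can be resolved back to a path.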
fuse(4) — Linux manual page https://man7.org/linux/man-pages/man4/fuse.4.html
Writing a FUSE Filesystem: a Tutorial
Joseph J. Pfeiffer Jr.
https://www.cs.nmsu.edu/~pfeiffer/fuse-tutorial/
Overview of the Linux Virtual File System
Richard Gooch, Pekka Enberg
https://www.kernel.org/doc/Documentation/filesystems/vfs.txt
To FUSE or Not to FUSE: Performance of User-Space File Systems
Vangoor, Tarasov, Zadok; 2017
https://www.usenix.org/system/files/conference/fast17/fast17-vangoor.pdf