| 1 | +--- |
| 2 | +title: "Unikraft Filesystem Stack" |
| 3 | +description: This blog post provides a technical overview of the new VFS stack introduced in Unikraft 0.20.0. |
| 4 | +publishedDate: 2025-09-09 |
| 5 | +image: |
| 6 | +authors: |
| 7 | +- Andrei Tatar |
| 8 | +tags: |
| 9 | +- filesystem |
| 10 | +- vfs |
| 11 | +- libukfs |
| 12 | +- libposix-vfs |
| 13 | +- libvfscore |
| 14 | +--- |
| 15 | + |
| 16 | +# The Unikraft Filesystem Stack |
| 17 | + |
| 18 | +Files play a pivotal role in how applications and the kernel interact. |
| 19 | +As the old adage goes, "everything is a file". |
| 20 | +Indeed, on POSIX systems one can scarcely interact with the broader system without a file of some sort being involved. |
| 21 | +This ubiquity is not accidental, as files offer an appealing abstraction over a large and diverse number of resources external to an application. |
| 22 | +Whether representing persistent storage media, network connections, serial consoles, or kernel state, files are central to applications talking to the outside world. |
| 23 | +Furthermore, all but the most trivial of applications make extensive use of the filesystem -- a tree-like abstraction that maps hierarchies of file names ("paths") to actual files. |
| 24 | + |
| 25 | +In Unikraft, the file(system) stack has traditionally been handled by the fairly monolithic `vfscore` library, whose design and history saddle us with some unfortunate limitations. |
| 26 | +With Unikraft release 0.16.0 Telesto we started addressing these fundamental issues, migrating sockets and pseudofiles to a new, more modular file stack built around `ukfile`. |
| 27 | +Filesystems, however, required more careful consideration (and a lot more dev work) to get right, so we have been hard at work behind the scenes to bring the new VFS stack to life. |
| 28 | +That is, until now. |
| 29 | + |
| 30 | +We are excited to release this modernized filesystem stack as part of [Unikraft 0.20.0 Kiviuq](https://unikraft.org/blog/2025-09-08-unikraft-releases-v0.20.0), bringing with it new features, better performance, and a solid base for future improvements. |
| 31 | + |
| 32 | +## Status Quo, `vfscore` & its Limitations |
| 33 | + |
| 34 | +While vfscore has [quite the storied past](https://unikraft.org/blog/2023-06-09-tales-of-open-source-vfscore) and has served the project well for many years, fundamental limitations of its design have become increasingly apparent over time, limiting and sometimes outright hindering new development. |
| 35 | +Here we attempt to give a non-exhaustive overview of the most relevant of these limitations, following up with how we addressed these in the design of the new stack. |
| 36 | + |
| 37 | +#### Insufficient Abstraction |
| 38 | + |
| 39 | +In vfscore, a file's open state (e.g., `lseek` position) and file descriptor are tightly bound to the file object, appearing as fields in its struct. |
| 40 | +In addition to being a redundant source of truth with the fdtab, this tight coupling suggests a 1:1 relationship that is not really there. |
| 41 | +In truth, files, open file descriptions, and file descriptors are three different concepts, and vfscore's design masks two 1:N relationships -- a file may be referenced by any number of open file descriptions, each of which in turn can be referenced by any number of file descriptors. |
| 42 | +This limitation is addressed in the ukfile stack by `posix-fd` + `posix-fdtab`, with the feature now available to filesystem nodes as well. |
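
To make the two 1:N relationships concrete, here is a minimal sketch in C. All names in it (`struct file`, `struct open_desc`, `fdtab`, the `sketch_*` helpers) are hypothetical and chosen for illustration only; they are not the `posix-fd` or `posix-fdtab` API. The point is simply that the seek position lives in the open file description, so duplicated descriptors share it while independent opens of the same file do not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; NOT the ukfile/posix-fd API. */

/* A file: the underlying object, e.g. a filesystem node. */
struct file {
	int refcount;           /* open file descriptions using this file */
};

/* An open file description: per-open state such as the seek position. */
struct open_desc {
	struct file *file;
	long pos;               /* shared by every fd that aliases this description */
	int refcount;           /* file descriptors pointing here */
};

#define FDTAB_SIZE 8

/* A file descriptor is just an index into this table. */
static struct open_desc *fdtab[FDTAB_SIZE];

/* Install a description in the lowest free slot; returns the fd or -1. */
static int fd_install(struct open_desc *d)
{
	for (int fd = 0; fd < FDTAB_SIZE; fd++) {
		if (!fdtab[fd]) {
			fdtab[fd] = d;
			d->refcount++;
			return fd;
		}
	}
	return -1;
}

/* "open": a NEW description for the file, then a new fd for it. */
static int sketch_open(struct file *f, struct open_desc *storage)
{
	int fd;

	storage->file = f;
	storage->pos = 0;
	storage->refcount = 0;
	fd = fd_install(storage);
	if (fd >= 0)
		f->refcount++;
	return fd;
}

/* "dup": a new fd for the SAME description. */
static int sketch_dup(int fd)
{
	assert(fd >= 0 && fd < FDTAB_SIZE && fdtab[fd]);
	return fd_install(fdtab[fd]);
}

static void sketch_seek(int fd, long pos)
{
	assert(fd >= 0 && fd < FDTAB_SIZE && fdtab[fd]);
	fdtab[fd]->pos = pos;
}

static long sketch_tell(int fd)
{
	assert(fd >= 0 && fd < FDTAB_SIZE && fdtab[fd]);
	return fdtab[fd]->pos;
}
```

In this sketch, seeking through one descriptor is visible through its duplicate, but not through a descriptor obtained by opening the same file a second time.
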
| 43 | + |
| 44 | +#### Files == Paths |
| 45 | + |
| 46 | +In a similar limitation to the above, vfscore views the filesystem as a _reversible_ mapping of paths to files, implying another 1:1 relationship that does not exist in practice. |
| 47 | +Hardlinks are a trivial counterexample to this assumption, and a feature lacking in previous versions. |
| 48 | +Another, more subtle consequence is the inability of vfscore to mount on top of a non-empty directory, or to handle bind mounts. |
| 49 | + |
| 50 | +#### Absolute Lookups |
| 51 | + |
| 52 | +Building on its assumptions about the mapping of paths to files, vfscore treats all lookups as absolute, roughly following two steps: (1) look up the absolute path prefix in the mount table to determine the mount root, and (2) delegate the lookup relative to the mount root to the driver. |
| 53 | +This becomes most unfortunate when doing relative lookups, as the VFS code must spend considerable time building an absolute path before doing anything else, a process that resets and repeats every time a symlink is encountered. |
| 54 | +With the recent proliferation of `*at` syscalls in Linux that focus on relative lookup & operations, coupled with encouragement of their use over their legacy absolute path counterparts, this extra overhead becomes more and more unavoidable. |
| 55 | + |
| 56 | +#### Monolithic Nature |
| 57 | + |
| 58 | +Unlike most Unikraft core libraries, and counter to the unikernel philosophy, vfscore is unusually monolithic, bearing responsibility across many abstraction layers. |
| 59 | +While the inherent complexity of a VFS warrants some level of tight coupling, the amount of vertical integration in vfscore is excessive and the overall architecture would benefit from clearly defined and documented interfaces between layers. |
| 60 | + |
| 61 | +## Unikraft Filesystem Stack |
| 62 | + |
| 63 | +To address vfscore's issues, as well as to lay the groundwork for future development, we introduce the Unikraft filesystem stack, anchored by two core libraries: |
| 64 | + |
| 65 | +- `ukfs` - what is _a_ filesystem; driver registration & lookup |
| 66 | +- `posix-vfs` - what is _the_ filesystem (VFS); all userspace-facing operations |
| 67 | + |
| 68 | +Describing the entire design in detail would take far more than one blog post, but we would like to highlight some of the more pertinent or unique considerations. |
| 69 | + |
| 70 | +### Modularity, Mechanism, and Policy |
| 71 | + |
| 72 | +A first important issue is breaking up vfscore's responsibilities into dedicated orthogonal components. |
| 73 | +Compile-time driver registration, global VFS state, and the fstab loaded at boot are all entirely different concepts that should be separated by defined interfaces. |
| 74 | + |
| 75 | +In deciding where to draw the boundaries between components, we focused on having ukfs drivers provide _mechanism_ -- how to interact with a filesystem -- with higher layers focused on _policy_ -- when to interact and how to interpret the result. |
| 76 | + |
| 77 | +### Cheap Path Handling |
| 78 | + |
| 79 | +In direct contrast to vfscore's lookup logic, operations across the new filesystem stack aim to never copy data unless strictly needed. |
| 80 | +Lookups exclusively use the constant path provided by callers, directly passing (slices of) it down to driver code. |
| 81 | +As a complementary measure, `readlink` is also internally zero-copy, guaranteeing that all lookups can be performed without any temporary buffers. |
| 82 | + |
| 83 | +This mindset goes beyond memory usage, with all filenames or paths in the ukfs API being passed and returned non-terminated along with their length, as opposed to common NUL-terminated C strings. |
| 84 | +In addition to enabling elegant slicing of const strings, this permits us to use a single `str(n)len` at the appropriate abstraction level where C strings are received from userspace, avoiding the current excess of iterations over the same string that would make [Shlemiel the painter](https://www.joelonsoftware.com/2001/12/11/back-to-basics/) proud. |
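
As an illustration of length-carrying paths, the sketch below splits a path into components without copying or modifying a single byte. The `struct path_slice` type and the `slice_*` helpers are hypothetical names invented for this example, not part of the ukfs API; the only `strlen` happens once, where the C string enters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type: pointer plus length, NOT NUL-terminated. */
struct path_slice {
	const char *ptr;
	size_t len;
};

/* Measure the C string exactly once, at the boundary where it is received. */
static struct path_slice slice_from_cstr(const char *s)
{
	struct path_slice p;

	assert(s);
	p.ptr = s;
	p.len = strlen(s);
	return p;
}

/*
 * Split off the next path component.
 * Leading '/' separators are skipped and *rest is advanced past the
 * component. An empty slice (len == 0) means the path is exhausted.
 * The underlying bytes are never copied or written to.
 */
static struct path_slice slice_next(struct path_slice *rest)
{
	struct path_slice comp;

	while (rest->len > 0 && *rest->ptr == '/') {
		rest->ptr++;
		rest->len--;
	}
	comp.ptr = rest->ptr;
	comp.len = 0;
	while (comp.len < rest->len && rest->ptr[comp.len] != '/')
		comp.len++;
	rest->ptr += comp.len;
	rest->len -= comp.len;
	return comp;
}

/* Compare a slice against a C string (helper for demos and tests). */
static int slice_eq(struct path_slice s, const char *lit)
{
	size_t n = strlen(lit);

	return s.len == n && memcmp(s.ptr, lit, n) == 0;
}
```

Because each component is only a view into the caller's constant buffer, passing a sub-path further down costs two machine words rather than an allocation and a copy.
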
| 85 | + |
| 86 | +### Locality & Lookups |
| 87 | + |
| 88 | +On the topic of paths, and again in direct contrast with vfscore, the concept of an "absolute path" is completely foreign to a ukfs driver. |
| 89 | +Indeed, a filesystem driver need not know or care about higher level concepts like `/` or the VFS; its responsibilities begin and end at "how to look up a path below one of its nodes". |
| 90 | +As such, all lookups in `ukfs` are relative to a base node, without exceptions. |
| 91 | +This natively supports relative lookups used by modern syscalls without the compute and space overhead of building a "real absolute path". |
| 92 | + |
| 93 | +This focus on locality goes beyond relative paths: all `ukfs` operations are relative to a target node, and each node is the authoritative source of its "ops table". |
| 94 | +Higher levels (such as `posix-vfs`) are responsible for global concepts like "the filesystem root" required for absolute paths, or "current working directory" required for implicit relative paths. |
| 95 | + |
| 96 | +Mounts in particular are an interesting case, as live filesystems need to know, at least to some degree, whether a node of theirs is a mount point, in which case lookup stops and the condition is signalled. |
| 97 | +What precisely to do in response is entirely up to the caller: whether to traverse the mount point, signal error, or something entirely different, all fall under the umbrella of "policy" and thus outside the scope of what a filesystem driver cares about. |
| 98 | +This separation ensures relative lookups behave as expected after a mount without needing complex bookkeeping on the part of the higher VFS layer. |
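
The split between mechanism and policy at mount points can be sketched as follows. This is a toy model under our own assumptions: `struct node`, `lookup_rel`, and the `LOOKUP_*` status codes are hypothetical and do not mirror the real ukfs interface, where nodes are `ukfile` instances. The lookup is always relative to a base node, and when it steps onto a mount point it stops and reports the unconsumed remainder of the path, leaving the decision to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory node; real ukfs nodes are ukfile instances. */
struct node {
	const char *name;
	size_t name_len;
	struct node *children;  /* first child, if a directory */
	struct node *sibling;   /* next entry in the same directory */
	int is_mountpoint;
};

enum lookup_status { LOOKUP_OK, LOOKUP_NOENT, LOOKUP_MOUNT };

struct lookup_result {
	enum lookup_status status;
	struct node *node;      /* last node reached */
	const char *rest;       /* unconsumed part of the caller's path */
	size_t rest_len;
};

static struct node *find_child(struct node *dir, const char *name, size_t len)
{
	for (struct node *c = dir->children; c; c = c->sibling)
		if (c->name_len == len && memcmp(c->name, name, len) == 0)
			return c;
	return NULL;
}

/*
 * Walk `path` (not NUL-terminated, `len` bytes) starting at `base`.
 * There is no notion of an absolute path here. The base node itself is
 * not checked for being a mount point: where to start is the caller's
 * business.
 */
static struct lookup_result lookup_rel(struct node *base,
				       const char *path, size_t len)
{
	struct lookup_result r = { LOOKUP_OK, base, path, len };
	struct node *next;
	size_t n;

	assert(base && path);
	for (;;) {
		while (r.rest_len > 0 && *r.rest == '/') {
			r.rest++;
			r.rest_len--;
		}
		if (r.rest_len == 0)
			return r;       /* path fully consumed */
		for (n = 0; n < r.rest_len && r.rest[n] != '/'; n++)
			;
		next = find_child(r.node, r.rest, n);
		if (!next) {
			r.status = LOOKUP_NOENT;
			return r;
		}
		r.node = next;
		r.rest += n;
		r.rest_len -= n;
		if (next->is_mountpoint) {
			/* Mechanism ends here; the caller applies policy. */
			r.status = LOOKUP_MOUNT;
			return r;
		}
	}
}
```

A caller playing the role of the VFS layer could react to `LOOKUP_MOUNT` by restarting the lookup with `rest` at the root node of the mounted volume, while another caller might treat the same condition as an error.
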
| 99 | + |
| 100 | +### Driver Templates |
| 101 | + |
| 102 | +The `ukfs` API has all operations output filesystem nodes as raw `ukfile` instances, giving drivers considerable power and freedom to dictate the behaviour of their files. |
| 103 | +But with great power comes great responsibility, one that some drivers may not wish to burden themselves with; a non-exhaustive list of these responsibilities is: |
| 104 | + |
| 105 | +- volume-wide state |
| 106 | +- volume lifetime management |
| 107 | +- driver-internal node representation |
| 108 | +- public runtime state (locks, etc.) |
| 109 | +- lifetime management (refcounting semantics) |
| 110 | +- ukfs runtime volatile state (mounts, etc.) |
| 111 | + |
| 112 | +For such cases, `ukfs` provides driver templates -- code generation macros that provide generic boilerplate code and "impedance match" between the `ukfile`/`ukfs` API and a more natural, bespoke interface for the driver in question. |
| 113 | +This allows a driver to focus on the abstraction layer it most naturally works at, without compromising its own performance or the flexibility of other drivers in the stack. |
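
To give a flavour of what a code generation macro can look like, here is a deliberately tiny sketch covering just one item from the list above, refcounting. The macro `SKETCH_DEFINE_REFCOUNT` and the `myfs_*` names are hypothetical; the actual ukfs templates may be structured quite differently, and this version is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical template macro (NOT the real ukfs template API).
 * Given a driver node type with an `int refcnt` field and a driver
 * supplied destructor, generate acquire/release helpers so the driver
 * does not have to write this boilerplate itself.
 */
#define SKETCH_DEFINE_REFCOUNT(prefix, type, destroy_fn)	\
	static void prefix##_acquire(type *n)			\
	{							\
		assert(n && n->refcnt > 0);			\
		n->refcnt++;					\
	}							\
	static int prefix##_release(type *n)			\
	{							\
		assert(n && n->refcnt > 0);			\
		if (--n->refcnt == 0) {				\
			destroy_fn(n);				\
			return 1;				\
		}						\
		return 0;					\
	}

/* Driver side: only the bespoke parts are written by hand. */
struct myfs_node {
	int refcnt;
	int destroyed;          /* stands in for real cleanup in this demo */
};

static void myfs_destroy(struct myfs_node *n)
{
	n->destroyed = 1;
}

/* Expands to myfs_acquire() and myfs_release(); release returns 1 once
 * the last reference is dropped and the node has been destroyed. */
SKETCH_DEFINE_REFCOUNT(myfs, struct myfs_node, myfs_destroy)
```

Since the generated functions are ordinary static functions specialised for the driver's own node type, the compiler can inline them, which is one way a template can avoid costing the driver any performance.
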
| 114 | + |
| 115 | +## New Libraries |
| 116 | + |
| 117 | +As part of this full-stack release, we introduced several new core libraries: |
| 118 | + |
| 119 | +- `ukfs` -- filesystem API; compile-time driver registration; runtime driver lookup |
| 120 | +- `ukfs-ramfs` -- memory-resident volatile filesystem |
| 121 | +- `ukfs-devfs` -- dedicated ramfs for special/device files |
| 122 | +- `posix-vfs` -- Virtual File System (VFS) API |
| 123 | +- `posix-vfs-fstab` -- mount filesystems at boot |
| 124 | +- `uksparsebuf` -- utility lib for managing sparse buffers; used by filesystem drivers |
| 125 | +- `ukpod` -- utility lib for managing demand-paged memory decoupled from `ukvmem`; used by filesystem drivers |
| 126 | + |
| 127 | +Their `README.md` files offer a more detailed explanation of their design for the technically curious, as well as pointing to the relevant API headers for the _very_ technically curious. |
| 128 | + |
| 129 | +## Limitations |
| 130 | + |
| 131 | +While we encourage users to migrate to the new VFS stack, there are two important limitations to take into account at this time: |
| 132 | + |
| 133 | +- No shimming with `vfscore` -- unlike existing logic in `posix-fdtab`, which seamlessly shims between legacy vfscore files and new ukfiles, there is no similar support for `ukfs` and `vfscore` filesystems to coexist in the same build. A user must choose one VFS stack or the other; this is especially relevant given the next limitation. |
| 134 | +- No persistent drivers -- this release does not include `ukfs` drivers for any host-persistent filesystems (equivalent to legacy `9pfs`). Users of these should stick to vfscore for now. |
| 135 | + |
| 136 | +## Ending Thoughts |
| 137 | + |
| 138 | +The new VFS stack included in 0.20 is the culmination of almost 2 years of development and marks an important milestone -- real-world applications running entirely on the new stack. |
| 139 | +This is merely the groundwork for more to come, and we are excited to continue the work on more features, performance improvements, and the long-awaited deprecation and retirement of vfscore. |