Commit 6acb5e0

andreittr authored and StefanJum committed
blog: Add blog post on new vfs stack
Signed-off-by: Andrei Tatar <ttr@unikraft.io>
1 parent f8e4cea commit 6acb5e0

1 file changed

+139
-0
lines changed

@@ -0,0 +1,139 @@
1+
---
2+
title: "Unikraft Filesystem Stack"
3+
description: This blog post provides a technical overview of the new VFS stack introduced in Unikraft 0.20.0.
4+
publishedDate: 2025-09-09
5+
image:
6+
authors:
7+
- Andrei Tatar
8+
tags:
9+
- filesystem
10+
- vfs
11+
- libukfs
12+
- libposix-vfs
13+
- libvfscore
14+
---
15+
16+
# The Unikraft Filesystem Stack
17+
18+
Files play a pivotal role in how applications and the kernel interact.
19+
As the old adage goes, "everything is a file".
20+
Indeed, on POSIX systems one can scarcely interact with the broader system without a file of some sort being involved.
21+
This ubiquity is not accidental, as files offer an appealing abstraction over a large and diverse number of resources external to an application.
22+
Whether representing persistent storage media, network connections, serial consoles, or kernel state, files are central to applications talking to the outside world.
23+
Furthermore, all but the most trivial of applications make extensive use of the filesystem -- a tree-like abstraction that maps hierarchies of file names ("paths") to actual files.
24+
25+
In Unikraft the file(system) stack has been traditionally handled by the fairly monolithic `vfscore` library, whose design and history saddle us with some unfortunate limitations.
26+
With Unikraft release 0.16.0 Telesto we started addressing these fundamental issues, migrating sockets and pseudofiles to a new, more modular file stack built around `ukfile`.
27+
Filesystems, however, required more careful consideration (and a lot more dev work) to get right, so we have since been hard at work behind the scenes to bring the new VFS stack to life.
28+
That is, until now.
29+
30+
We are excited to release this modernized filesystem stack as part of [Unikraft 0.20.0 Kiviuq](https://unikraft.org/blog/2025-09-08-unikraft-releases-v0.20.0), bringing with it new features, better performance, and a solid base for future improvements.
31+
32+
## Status Quo, `vfscore` & its Limitations
33+
34+
While vfscore has [quite the storied past](https://unikraft.org/blog/2023-06-09-tales-of-open-source-vfscore) and has served the project well for many years, fundamental limitations of its design have become increasingly apparent over time, limiting and sometimes outright hindering new development.
35+
Here we attempt to give a non-exhaustive overview of the most relevant of these limitations, following up with how we addressed these in the design of the new stack.
36+
37+
#### Insufficient Abstraction
38+
39+
In vfscore, a file's open state (e.g., `lseek` position) and file descriptor are tightly bound to the file object, appearing as fields in its struct.
40+
In addition to being a redundant source of truth with the fdtab, this tight coupling suggests a 1:1 relationship that is not really there.
41+
In truth, files, open file descriptions, and file descriptors are three different concepts, and vfscore's design masks two 1:N relationships -- a file may be referenced by any number of open file descriptions, each of which in turn can be referenced by any number of file descriptors.
42+
This limitation is addressed in the ukfile stack by `posix-fd` + `posix-fdtab`, with the feature now available to filesystem nodes as well.
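
To make the two 1:N relationships concrete, here is a minimal sketch in C. All type and function names (`struct file`, `struct open_file`, `sketch_open`, `sketch_dup`, and so on) are hypothetical and chosen for illustration; they are not the `ukfile`, `posix-fd`, or `posix-fdtab` API. The point is only where the seek position lives and who shares it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types for illustration; not the ukfile/posix-fd API. */
struct file {                /* the underlying object (e.g., an inode) */
	int open_refs;       /* number of open file descriptions using it */
};

struct open_file {           /* open file description: per-open() state */
	struct file *file;
	long pos;            /* seek position lives here, not in the file */
	int fd_refs;         /* number of descriptors pointing here */
};

#define FDTAB_SIZE 16
static struct open_file *fdtab[FDTAB_SIZE]; /* fd -> open file description */

/* Place a description in the lowest free slot; returns the fd or -1. */
int fd_install(struct open_file *of)
{
	for (int fd = 0; fd < FDTAB_SIZE; fd++) {
		if (!fdtab[fd]) {
			fdtab[fd] = of;
			of->fd_refs++;
			return fd;
		}
	}
	return -1;
}

/* open(): a new description (fresh position) for an existing file. */
int sketch_open(struct file *f)
{
	struct open_file *of;
	int fd;

	assert(f != NULL);
	of = calloc(1, sizeof(*of));
	if (!of)
		return -1;
	of->file = f;
	fd = fd_install(of);
	if (fd < 0) {
		free(of);
		return -1;
	}
	f->open_refs++;
	return fd;
}

/* dup(): a new descriptor that shares the same description. */
int sketch_dup(int fd)
{
	assert(fd >= 0 && fd < FDTAB_SIZE && fdtab[fd] != NULL);
	return fd_install(fdtab[fd]);
}

/* lseek(fd, off, SEEK_SET), reduced to its essence. */
long sketch_seek(int fd, long off)
{
	assert(fd >= 0 && fd < FDTAB_SIZE && fdtab[fd] != NULL);
	fdtab[fd]->pos = off;
	return off;
}

long sketch_tell(int fd)
{
	assert(fd >= 0 && fd < FDTAB_SIZE && fdtab[fd] != NULL);
	return fdtab[fd]->pos;
}
```

In this sketch, seeking through one descriptor is visible through its `sketch_dup` copy, while a second `sketch_open` of the same file keeps an independent position. Storing the position or the descriptor number in `struct file` itself could not express either case.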
43+
44+
#### Files == Paths
45+
46+
In a similar limitation to the above, vfscore views the filesystem as a _reversible_ mapping of paths to files, implying another 1:1 relationship that does not exist in practice.
47+
Hardlinks are a trivial counterexample to this assumption, and a feature lacking in previous versions.
48+
Another, more subtle consequence is the inability of vfscore to mount on top of a non-empty directory, or to handle bind mounts.
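
The hardlink counterexample can be sketched in a few lines of C. The structures below (`struct node`, `struct dir_sketch`) are hypothetical and exist only to show that the name-to-node mapping is many-to-one; they do not reflect how any Unikraft driver stores directories.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory structures, for illustration only. */
struct node {
	int nlink;               /* how many directory entries name this node */
};

struct dirent_sketch {
	const char *name;
	struct node *target;
};

#define DIR_MAX 8
struct dir_sketch {
	struct dirent_sketch ents[DIR_MAX];
	size_t count;
};

/* Add a name for an existing node: this is all a hardlink is. */
int dir_link(struct dir_sketch *d, const char *name, struct node *n)
{
	assert(d != NULL && name != NULL && n != NULL);
	if (d->count == DIR_MAX)
		return -1;
	d->ents[d->count].name = name;
	d->ents[d->count].target = n;
	d->count++;
	n->nlink++;
	return 0;
}

/* The forward mapping (name -> node) is well defined... */
struct node *dir_lookup(const struct dir_sketch *d, const char *name)
{
	assert(d != NULL && name != NULL);
	for (size_t i = 0; i < d->count; i++)
		if (strcmp(d->ents[i].name, name) == 0)
			return d->ents[i].target;
	return NULL;
}

/* ...but the reverse (node -> name) may have many answers. */
size_t dir_count_names(const struct dir_sketch *d, const struct node *n)
{
	size_t hits = 0;

	assert(d != NULL);
	for (size_t i = 0; i < d->count; i++)
		if (d->ents[i].target == n)
			hits++;
	return hits;
}
```

Once two names resolve to the same node, any design that stores "the path" of a file has to pick one arbitrarily, which is why the mapping cannot be treated as reversible.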
49+
50+
#### Absolute Lookups
51+
52+
Building on its assumptions about the mapping of paths to files, vfscore treats all lookups as absolute, roughly following two steps: (1) look up the absolute path prefix in the mount table to determine the mount root, and (2) delegate the lookup relative to the mount root to the driver.
53+
This becomes most unfortunate when doing relative lookups, as the VFS code must spend considerable time building an absolute path before doing anything else, a process that resets and repeats every time a symlink is encountered.
54+
With the recent proliferation of `*at` syscalls in Linux that focus on relative lookup & operations, coupled with encouragement of their use over their legacy absolute path counterparts, this extra overhead becomes more and more unavoidable.
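
Step (1) of such a scheme can be illustrated with a generic longest-prefix match over a mount table. This is a sketch of the general technique under assumed names (`struct mount_sketch`, `mount_match`), not vfscore's actual code; it assumes mount prefixes are absolute and carry no trailing slash, except for the root mount `/`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mount table entry; not vfscore's actual structure. */
struct mount_sketch {
	const char *prefix;   /* absolute mount point, e.g. "/dev" */
};

/*
 * Step 1: find the mount whose prefix is the longest match for an
 * absolute path. On success, *rest points at the remainder that step 2
 * would hand to the driver.
 */
const struct mount_sketch *mount_match(const struct mount_sketch *tab,
				       size_t n, const char *abspath,
				       const char **rest)
{
	const struct mount_sketch *best = NULL;
	size_t best_len = 0;

	assert(abspath != NULL && abspath[0] == '/' && rest != NULL);
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(tab[i].prefix);
		int is_root = (len == 1); /* the "/" mount matches everything */

		if (strncmp(abspath, tab[i].prefix, len) != 0)
			continue;
		/* Reject partial component matches: "/dev" vs "/device". */
		if (!is_root && abspath[len] != '/' && abspath[len] != '\0')
			continue;
		if (!best || len > best_len) {
			best = &tab[i];
			best_len = len;
		}
	}
	if (best) {
		*rest = abspath + (best_len == 1 ? 0 : best_len);
		while (**rest == '/')   /* drop separators before the remainder */
			(*rest)++;
	}
	return best;
}
```

Note that the function only accepts absolute paths. A relative path must first be joined with the current working directory into a fresh absolute string before it can even enter this step, and that is the overhead described above.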
55+
56+
#### Monolithic Nature
57+
58+
Unlike most Unikraft core libraries, and counter to the unikernel philosophy, vfscore is unusually monolithic, bearing responsibility across many abstraction layers.
59+
While the inherent complexity of a VFS warrants some level of tight coupling, the amount of vertical integration in vfscore is excessive and the overall architecture would benefit from clearly defined and documented interfaces between layers.
60+
61+
## Unikraft Filesystem Stack
62+
63+
To address vfscore's issues, as well as to lay the groundwork for future development, we introduce the Unikraft filesystem stack, anchored by two core libraries:
64+
65+
- `ukfs` - what is _a_ filesystem; driver registration & lookup
66+
- `posix-vfs` - what is _the_ filesystem (VFS); all userspace-facing operations
67+
68+
Describing the entire design in detail would take far more than one blog post, but we would like to highlight some of the more pertinent or unique considerations.
69+
70+
### Modularity, Mechanism, and Policy
71+
72+
A first important issue is breaking up vfscore's responsibilities into dedicated orthogonal components.
73+
Compile-time driver registration, global VFS state, and the fstab loaded at boot are all entirely different concepts that should be separated by defined interfaces.
74+
75+
Informing the decision on where to draw boundaries between components, we focused on having ukfs drivers provide _mechanism_ -- how to interact with a filesystem -- with higher layers focused on _policy_ -- when to interact and how to interpret the result.
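
As a rough illustration of the mechanism/policy split, the sketch below keeps a driver table fixed at build time and looks drivers up by name at runtime. Everything here is an assumption made for the example: the descriptor layout, the function names, and the plain static array standing in for whatever registration mechanism `ukfs` actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver descriptor; the real ukfs API is not shown here. */
struct fs_driver_sketch {
	const char *name;
	int (*mount)(const char *source); /* mechanism: bring up a volume */
};

/* Placeholder drivers that always succeed. */
int ramfs_mount_sketch(const char *source) { (void)source; return 0; }
int devfs_mount_sketch(const char *source) { (void)source; return 0; }

/* Stand-in for compile-time registration: a table fixed at build time. */
static const struct fs_driver_sketch drivers[] = {
	{ "ramfs", ramfs_mount_sketch },
	{ "devfs", devfs_mount_sketch },
};

/* Runtime lookup by name; returns NULL for unknown filesystem types. */
const struct fs_driver_sketch *driver_find(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++)
		if (strcmp(drivers[i].name, name) == 0)
			return &drivers[i];
	return NULL;
}

/* Policy lives above the driver: here, "unknown types are an error". */
int mount_by_name(const char *fstype, const char *source)
{
	const struct fs_driver_sketch *drv = driver_find(fstype);

	return drv ? drv->mount(source) : -1;
}
```

The drivers only know how to mount; deciding which driver to use, when to call it, and what a failure means is left to the caller.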
76+
77+
### Cheap Path Handling
78+
79+
In direct contrast to vfscore's lookup logic, operations across the new filesystem stack aim to never copy data unless strictly needed.
80+
Lookups exclusively use the constant path provided by callers, directly passing (slices of) it down to driver code.
81+
As a complementary measure, `readlink` is also internally zero-copy, guaranteeing that all lookups can be performed without any temporary buffers.
82+
83+
This mindset goes beyond memory usage, with all filenames or paths in the ukfs API being passed and returned non-terminated along with their length, as opposed to common NUL-terminated C strings.
84+
In addition to enabling elegant slicing of const strings, this permits us to use a single `str(n)len` at the appropriate abstraction level where C strings are received from userspace, avoiding the current excess of iterations over the same string that would make [Shlemiel the painter](https://www.joelonsoftware.com/2001/12/11/back-to-basics/) proud.
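
A minimal sketch of length-delimited path handling follows. The `struct path_slice` type and both functions are hypothetical names for this example, not part of the `ukfs` API. The sketch shows how components can be peeled off a caller's constant path without copying, writing NUL bytes, or re-measuring the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A borrowed, non-terminated view into a caller's path (hypothetical). */
struct path_slice {
	const char *ptr;
	size_t len;
};

/*
 * Split the first component off *rest without copying or writing.
 * Returns 1 and fills *comp, or 0 when no components remain.
 */
int path_next(struct path_slice *rest, struct path_slice *comp)
{
	const char *p, *end, *start;

	assert(rest != NULL && comp != NULL);
	p = rest->ptr;
	end = p + rest->len;
	while (p < end && *p == '/')   /* skip separators */
		p++;
	if (p == end) {
		rest->ptr = end;
		rest->len = 0;
		return 0;
	}
	start = p;
	while (p < end && *p != '/')   /* scan one component */
		p++;
	comp->ptr = start;
	comp->len = (size_t)(p - start);
	rest->ptr = p;
	rest->len = (size_t)(end - p);
	return 1;
}

/* The one place a C string is measured: the boundary with the caller. */
struct path_slice path_from_cstr(const char *s)
{
	struct path_slice out;

	assert(s != NULL);
	out.ptr = s;
	out.len = strlen(s);
	return out;
}
```

Because every slice carries its own length, lower layers never need to call `strlen` again, and a component in the middle of a path can be handed to a driver as-is even though it is not NUL-terminated.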
85+
86+
### Locality & Lookups
87+
88+
On the topic of paths, and again in direct contrast with vfscore, the concept of an "absolute path" is completely foreign to a ukfs driver.
89+
Indeed, a filesystem driver need not know or care about higher level concepts like `/` or the VFS; its responsibilities begin and end at "how to look up a path below one of its nodes".
90+
As such, all lookups in `ukfs` are relative to a base node, without exceptions.
91+
This natively supports relative lookups used by modern syscalls without the compute and space overhead of building a "real absolute path".
92+
93+
This focus on locality goes beyond relative paths: all `ukfs` operations are relative to a target node, and each node is the authoritative source of its "ops table".
94+
Higher levels (such as `posix-vfs`) are responsible for global concepts like "the filesystem root" required for absolute paths, or "current working directory" required for implicit relative paths.
95+
96+
Mounts in particular are an interesting case, as live filesystems need to know, at least to some degree, whether a node of theirs is a mount point, in which case lookup stops and the condition is signalled.
97+
What precisely to do in response is entirely up to the caller: whether to traverse the mount point, signal error, or something entirely different, all fall under the umbrella of "policy" and thus outside the scope of what a filesystem driver cares about.
98+
This separation ensures relative lookups behave as expected after a mount without needing complex bookkeeping on the part of the higher VFS layer.
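
The following sketch puts these ideas together: a lookup that is always relative to a base node, dispatches through that node's own ops table, and stops to signal when it reaches a mount point. All names (`struct node_sketch`, `lookup_rel`, `LOOKUP_MOUNT`, the toy driver) are hypothetical; the real `ukfs` interface is documented in the library's README and headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct node_sketch;

/* Hypothetical per-node operations table. */
struct node_ops_sketch {
	/* Find the child named name[0..len) directly under base. */
	struct node_sketch *(*child)(struct node_sketch *base,
				     const char *name, size_t len);
};

struct node_sketch {
	const struct node_ops_sketch *ops; /* each node names its own ops */
	int is_mountpoint;                 /* state set by higher layers */
	const char *name;
	struct node_sketch *kids[4];
};

enum lookup_status { LOOKUP_OK, LOOKUP_NOENT, LOOKUP_MOUNT };

/*
 * Walk path[0..len) relative to base. Stops early at a mount point,
 * reporting how many bytes were consumed so the caller can resume.
 */
enum lookup_status lookup_rel(struct node_sketch *base, const char *path,
			      size_t len, struct node_sketch **out,
			      size_t *consumed)
{
	size_t i = 0;

	assert(base != NULL && out != NULL && consumed != NULL);
	while (i < len) {
		size_t start;

		while (i < len && path[i] == '/')
			i++;
		if (i == len)
			break;
		start = i;
		while (i < len && path[i] != '/')
			i++;
		base = base->ops->child(base, path + start, i - start);
		if (!base)
			return LOOKUP_NOENT;
		if (base->is_mountpoint) {
			*out = base;
			*consumed = i;
			return LOOKUP_MOUNT; /* what happens next is policy */
		}
	}
	*out = base;
	*consumed = len;
	return LOOKUP_OK;
}

/* Toy driver: children are stored inline in the node. */
struct node_sketch *toy_child(struct node_sketch *base,
			      const char *name, size_t len)
{
	for (size_t k = 0; k < 4 && base->kids[k]; k++) {
		const char *n = base->kids[k]->name;

		if (strlen(n) == len && memcmp(n, name, len) == 0)
			return base->kids[k];
	}
	return NULL;
}

const struct node_ops_sketch toy_ops = { toy_child };
```

On `LOOKUP_MOUNT`, a caller in the VFS layer could swap in the root node of the mounted volume and call `lookup_rel` again with the unconsumed remainder of the path, or refuse to cross the mount at all. The driver never needs to know which of these happened.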
99+
100+
### Driver Templates
101+
102+
The `ukfs` API has all operations output filesystem nodes as raw `ukfile` instances, giving drivers considerable power and freedom to dictate the behaviour of their files.
103+
But with great power comes great responsibility, one that some drivers may not wish to burden themselves with; a non-exhaustive list of these responsibilities is:
104+
105+
- volume-wide state
106+
- volume lifetime management
107+
- driver-internal node representation
108+
- public runtime state (locks, etc.)
109+
- lifetime management (refcounting semantics)
110+
- ukfs runtime volatile state (mounts, etc.)
111+
112+
For such cases, `ukfs` provides driver templates -- code generation macros that provide generic boilerplate code and "impedance match" between the `ukfile`/`ukfs` API and a more natural, bespoke interface for the driver in question.
113+
This allows a driver to focus on the abstraction layer it most naturally works at, without compromising its performance, nor the flexibility of other drivers in the stack.
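
To give a flavour of what a code generation macro of this kind might look like, here is a deliberately small sketch that generates refcounting boilerplate around a driver-defined payload. The macro name, its parameters, and the generated functions are all invented for this example and should not be read as the actual `ukfs` templates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical template: given a driver's own node type and cleanup
 * function, generate a refcounted wrapper plus new/acquire/release.
 * The generated release function returns 1 if it dropped the last
 * reference (and therefore ran the cleanup), 0 otherwise.
 */
#define NODE_TEMPLATE_SKETCH(prefix, payload_t, cleanup_fn)              \
	struct prefix##_node {                                           \
		int refcnt;                                              \
		payload_t payload;                                       \
	};                                                               \
	struct prefix##_node *prefix##_new(void)                         \
	{                                                                \
		struct prefix##_node *n = calloc(1, sizeof(*n));         \
		if (n)                                                   \
			n->refcnt = 1;                                   \
		return n;                                                \
	}                                                                \
	void prefix##_acquire(struct prefix##_node *n)                   \
	{                                                                \
		assert(n != NULL && n->refcnt > 0);                      \
		n->refcnt++;                                             \
	}                                                                \
	int prefix##_release(struct prefix##_node *n)                    \
	{                                                                \
		assert(n != NULL && n->refcnt > 0);                      \
		if (--n->refcnt > 0)                                     \
			return 0;                                        \
		cleanup_fn(&n->payload);                                 \
		free(n);                                                 \
		return 1;                                                \
	}

/* The driver only writes what is specific to it. */
struct toy_payload {
	size_t size;
};

static int toy_cleanups; /* counts cleanup calls, for demonstration */

void toy_cleanup(struct toy_payload *p)
{
	(void)p;
	toy_cleanups++;
}

/* One line instantiates the boilerplate for this driver. */
NODE_TEMPLATE_SKETCH(toy, struct toy_payload, toy_cleanup)
```

A driver that wants different lifetime semantics simply does not use the template and implements the lower-level interface directly, so the convenience costs other drivers nothing.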
114+
115+
## New Libraries
116+
117+
As part of this full-stack release, we introduced several new core libraries:
118+
119+
- `ukfs` -- filesystem API; compile-time driver registration; runtime driver lookup
120+
- `ukfs-ramfs` -- memory-resident volatile filesystem
121+
- `ukfs-devfs` -- dedicated ramfs for special/device files
122+
- `posix-vfs` -- Virtual File System (VFS) API
123+
- `posix-vfs-fstab` -- mount filesystems at boot
124+
- `uksparsebuf` -- utility lib for managing sparse buffers; used by filesystem drivers
125+
- `ukpod` -- utility lib for managing demand-paged memory decoupled from `ukvmem`; used by filesystem drivers
126+
127+
Their `README.md` files offer a more detailed explanation of their design for the technically curious, as well as pointing to the relevant API headers for the _very_ technically curious.
128+
129+
## Limitations
130+
131+
While we encourage users to migrate to the new VFS stack, there are two important limitations to take into account at this time:
132+
133+
- No shimming with `vfscore` -- unlike existing logic in `posix-fdtab`, which seamlessly shims between legacy vfscore files and new ukfiles, there is no similar support for `ukfs` and `vfscore` filesystems to coexist in the same build. A user must choose one VFS stack or the other; this point is especially relevant given the next limitation.
134+
- No persistent drivers -- this release does not include `ukfs` drivers for any host-persistent filesystems (equivalent to legacy `9pfs`). Users of these should stick to vfscore for now.
135+
136+
## Ending Thoughts
137+
138+
The new VFS stack included in 0.20 is the culmination of almost 2 years of development and marks an important milestone -- real-world applications running entirely on the new stack.
139+
This is merely the groundwork for more to come, and we are excited to continue the work on more features, performance improvements, and the long-awaited deprecation and retirement of vfscore.
