Building a root-only system

Embedded systems that run Linux usually only need to run a small number of specific processes. These tasks are typically run by the root user alone, with full permissions. As controversial as it may seem, an option to build a root-only Linux kernel may prove to be a valuable feature for some applications.

The kernel API for retrieving group and user ids is based on these two functions:

static inline uid_t __kuid_val(kuid_t uid)
static inline gid_t __kgid_val(kgid_t gid)

These two functions return the actual uid/gid number from the kuid_t/kgid_t structures. As all the permission checks are done using wrappers over these functions, a sensible idea is to make them always return the root uid/gid (0) in a root-only system. This way, a large amount of code can be shed by the compiler's constant folding.
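To make the folding argument concrete, here is a minimal userspace sketch. It is not the actual patch: every name prefixed with sketch_ or SKETCH_ is hypothetical, and the types are simplified stand-ins for the kernel's kuid_t helpers.

```c
#include <stdbool.h>

/* Hypothetical build switch standing in for a root-only config option. */
#ifndef SKETCH_ROOT_ONLY
#define SKETCH_ROOT_ONLY 1
#endif

/* Simplified stand-ins for uid_t and kuid_t (assumed layout). */
typedef unsigned int sketch_uid_t;
typedef struct { sketch_uid_t val; } sketch_kuid_t;

/* Plays the role of __kuid_val(): unwrap the numeric id. */
static inline sketch_uid_t sketch_kuid_val(sketch_kuid_t uid)
{
#if SKETCH_ROOT_ONLY
    (void)uid;
    return 0;           /* every id is root, known at compile time */
#else
    return uid.val;
#endif
}

/* A wrapper-style check built on top of the accessor. */
static inline bool sketch_uid_is_root(sketch_kuid_t uid)
{
    return sketch_kuid_val(uid) == 0;
}

/* Hypothetical permission check: 0 on success, -1 on refusal. */
static inline int sketch_may_do_privileged_op(sketch_kuid_t caller)
{
    if (sketch_uid_is_root(caller))
        return 0;
    /* Unreachable when SKETCH_ROOT_ONLY is 1, so it can be dropped. */
    return -1;
}
```

With SKETCH_ROOT_ONLY set to 1, the comparison in sketch_uid_is_root() becomes 0 == 0, so an optimizing compiler can reduce sketch_may_do_privileged_op() to a plain "return 0" and discard the refusal path.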

Because many of the permission checks compare against uid/gid 0, the code handling the non-zero case will never be executed, so it can be removed. As the bloat-o-meter script shows, this change on its own removes around 25k from the final kernel image. Considering that a tiny build is around 1000k uncompressed, this apparently trivial change decreases the kernel size by about 2.5%.

The patch implementing this change also removes code that is useful only in multi-user systems, such as uid and gid related syscalls and capabilities. If the community sees value in this change, it should be included in the next release.

Identifying syscalls (part 2)

The first problem with the approach in the previous post is that objdump is arch-specific, so disassembling for ARM, for example, would need a different objdump. This is why, in order to find all the system calls made in userspace, it is better to use nm, whose output includes all the calls to libc.

To keep a list consisting only of syscalls, we intersect the output of nm with a list obtained from a simple grep in kernel/sys_ni.c, which gives us all the syscalls that can be conditionally compiled. This filters the first list, so we end up with something similar to:

[‘uselib’, ‘io_submit’, ‘io_setup’, ‘madvise’] (1)

list of all syscalls from kernel/sys_ni.c (2)

(2) \ ((1) ∩ (2)) => [list of all syscalls that we don’t need to compile in]
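Since (2) \ ((1) ∩ (2)) is the same set as (2) \ (1), the filtering boils down to a set difference. A minimal sketch in C follows, assuming both lists are already in memory as arrays of strings; the function names are hypothetical and this is not the tool's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if name appears in list[0..n-1], 0 otherwise. */
static int sketch_contains(const char *const *list, size_t n,
                           const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Copy into out[] every entry of optional[] (list 2, from sys_ni.c)
 * that does not appear in used[] (list 1, from nm).
 * out must have room for n_optional pointers.
 * Returns the number of entries written.
 */
static size_t sketch_unneeded(const char *const *used, size_t n_used,
                              const char *const *optional, size_t n_optional,
                              const char **out)
{
    size_t n_out = 0;

    for (size_t i = 0; i < n_optional; i++)
        if (!sketch_contains(used, n_used, optional[i]))
            out[n_out++] = optional[i];
    return n_out;
}
```

The linear search is quadratic overall, which should be acceptable for lists of a few hundred names; sorting both lists first would be the obvious refinement.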

Furthermore, we need to match each syscall with the corresponding symbols that compile it out. This is obtained by parsing all source files and Makefiles in the kernel tree, following these steps:

– use a stack in order to know between which ifdef and endif a syscall is defined;

– keep a dictionary where the key is the syscall and the values are all the symbols that it depends on and the conditionals between them;
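The stack step can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the tool's actual code: it looks up a single syscall instead of filling a dictionary, only understands #ifdef, #ifndef, #if, #else and #endif (no #elif), ignores Makefiles, and all sketch_ names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_DEPTH 32
#define SKETCH_MAX_COND  128

/*
 * Scan C source text for the SYSCALL_DEFINEn(<syscall>, ...) line and
 * join the enclosing preprocessor conditions with " && " into out.
 * Returns the nesting depth, or -1 if the syscall is not found or the
 * nesting is deeper than SKETCH_MAX_DEPTH.
 */
static int sketch_syscall_guards(const char *src, const char *syscall,
                                 char *out, size_t outsz)
{
    char stack[SKETCH_MAX_DEPTH][SKETCH_MAX_COND];
    char buf[256], tmp[SKETCH_MAX_COND];
    size_t slen = strlen(syscall);
    int depth = 0;

    while (*src) {
        /* Copy one line into buf (overlong lines are truncated). */
        size_t len = strcspn(src, "\n");
        size_t n = len < sizeof(buf) - 1 ? len : sizeof(buf) - 1;
        memcpy(buf, src, n);
        buf[n] = '\0';
        src += len;
        if (*src == '\n')
            src++;

        const char *p = buf + strspn(buf, " \t");
        const char *prefix = NULL, *cond = NULL;

        if (strncmp(p, "#ifdef ", 7) == 0) {
            prefix = ""; cond = p + 7;
        } else if (strncmp(p, "#ifndef ", 8) == 0) {
            prefix = "!"; cond = p + 8;
        } else if (strncmp(p, "#if ", 4) == 0) {
            prefix = ""; cond = p + 4;
        }

        if (cond) {
            /* Opening directive: push its condition on the stack. */
            if (depth >= SKETCH_MAX_DEPTH)
                return -1;
            snprintf(stack[depth++], SKETCH_MAX_COND, "%s%s", prefix, cond);
        } else if (strncmp(p, "#else", 5) == 0 && depth > 0) {
            /* Negate the innermost condition. */
            snprintf(tmp, sizeof(tmp), "!(%s)", stack[depth - 1]);
            strcpy(stack[depth - 1], tmp);
        } else if (strncmp(p, "#endif", 6) == 0 && depth > 0) {
            depth--;    /* closing directive: pop */
        } else {
            const char *def = strstr(p, "SYSCALL_DEFINE");
            const char *open = def ? strchr(def, '(') : NULL;

            if (open && strncmp(open + 1, syscall, slen) == 0 &&
                (open[1 + slen] == ',' || open[1 + slen] == ')')) {
                /* Found it: the stack holds every enclosing guard. */
                out[0] = '\0';
                for (int i = 0; i < depth; i++) {
                    if (i)
                        strncat(out, " && ", outsz - strlen(out) - 1);
                    strncat(out, stack[i], outsz - strlen(out) - 1);
                }
                return depth;
            }
        }
    }
    return -1;
}
```

For a syscall defined inside two nested #ifdef blocks the function returns 2 and writes something like "CONFIG_A && CONFIG_B" (placeholder symbol names) into out. A dictionary version would call the same logic once per file and store the joined string under each syscall name it encounters.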

With all of this done, we can easily combine the results and obtain two simple lists [1]. The output is only a suggestion, as opposed to automatically setting the given symbols to ‘no’, for two reasons:


– some of those symbols that can be set to ‘no’ (considering syscalls) may compile out some code that is useful for the developer;

– the obtained Kconfig options can have dependencies which need to be solved by hand.