This fixes the majority of the VM tests, which were broken because
`/dev/vda` (or any other block device) couldn't be mounted:
machine # mounting /dev/vda on /...
machine # mount: mounting /dev/vda on /mnt-root/ failed: No such device[ 2.820976] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
machine # [ 2.821757] CPU: 0 PID: 1 Comm: init Not tainted 5.10.72 #1-NixOS
machine # [ 2.821757] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
machine # [ 2.821757] Call Trace:
machine # [ 2.821757] dump_stack+0x6b/0x83
machine # [ 2.821757] panic+0x101/0x2c8
machine # [ 2.821757] do_exit.cold+0x14/0xb3
machine # [ 2.821757] do_group_exit+0x33/0xa0
machine # [ 2.821757] __x64_sys_exit_group+0x14/0x20
machine # [ 2.821757] do_syscall_64+0x33/0x40
machine # [ 2.821757] entry_SYSCALL_64_after_hwframe+0x44/0xa9
machine # [ 2.821757] RIP: 0033:0x7f67ec2800f6
machine # [ 2.821757] Code: 00 4c 8b 0d 2c 5d 11 00 eb 19 66 2e 0f 1f 84 00 00 00 00 00 89 d7 89 f0 0f 05 48 3d 00 f0 ff ff 77 22 f4 89 d7 44 89 c0 0f 05 <48> 3d 00 f0 ff ff 76 e2 f7 d8 64 41 89 01 eb da 66 2e 0f 1f 84 00
machine # [ 2.821757] RSP: 002b:00007fff8f5a71d8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
machine # [ 2.821757] RAX: ffffffffffffffda RBX: 0000000000699704 RCX: 00007f67ec2800f6
machine # [ 2.821757] RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
machine # [ 2.821757] RBP: 0000000000000004 R08: 00000000000000e7 R09: ffffffffffffff80
machine # [ 2.821757] R10: 00007f67ec33f3e0 R11: 0000000000000202 R12: 000000000000000b
machine # [ 2.821757] R13: 00007fff8f5a75a8 R14: 0000000000000000 R15: 00000000004fc198
machine # [ 2.821757] Kernel Offset: 0x31e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
machine # [ 2.821757] Rebooting in 1 seconds..
This happened because the kernel failed to load modules such as `ext4`
from `boot.initrd.availableKernelModules`[1] on demand, e.g. during a
`mount(2)` syscall. The problem is that `kmod` isn't linked against
`libpthread.so.0` anymore because libpthread got merged into `libc.so.6`
(the .so still exists, though), yet `kmod` still needs it at runtime:
machine # newfstatat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/x86_64", 0x7ffd951114c0, 0) = -1 ENOENT (No such file or directory)
machine # openat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
machine # newfstatat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/x86_64", 0x7ffd951114c0, 0) = -1 ENOENT (No such file or directory)
machine # openat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
machine # newfstatat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib", 0x7ffd951114c0, 0) = -1 ENOENT (No such file or directory)
machine # openat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
machine # writev(2, [{iov_base="/nix/store/kdc9n48ksdc1a8y8w512w"..., iov_len=69}, {iov_base=": ", iov_len=2}, {iov_base="error while loading shared libra"..., iov_len=36}, {iov_base=": ", iov_len=2}, {iov_base="libpthread.so.0", iov_len=15}, {iov_base=": ", iov_len=2}, {iov_base="cy
machine # ) = 184
machine # exit_group(127) = ?
machine # +++ exited with 127 +++
machine # mount: mounting /dev/vda on /mnt-root/ failed: No such device
machine # [ 19.167180] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
machine # [ 19.167711] CPU: 0 PID: 1 Comm: init Not tainted 5.10.72 #1-NixOS
This is not a problem
* inside stage-1 because `LD_LIBRARY_PATH` points to `$out/lib` of
extra-utils where `libpthread.so.0` also exists.
* on a running system because `${pkgs.glibc}/lib` is part of kmod's
rpath.
However, this is a problem when the kernel itself calls `modprobe` (in
our case `kmod`) to load modules, since the kernel doesn't know about
`LD_LIBRARY_PATH`. Also, the rpath reference was removed.
To work around this, the kernel's `modprobe`
(i.e. `/proc/sys/kernel/modprobe`) now points to a wrapper which
explicitly declares `LD_LIBRARY_PATH`. We can't use `makeWrapper` here
because `modprobe` itself must not be renamed. Otherwise, `kmod` (which
is the link-target of `modprobe`) won't work because it expects
`argv[0] == "modprobe"` to perform modprobe's tasks.
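To illustrate why the name matters, here is a minimal sketch of
`argv[0]`-based dispatch as commonly used by multi-call binaries. It is
not kmod's actual code; `applet_from_argv0`, `base_name` and the
`enum applet` values are hypothetical names made up for this example:

```c
#include <string.h>

/* Hypothetical set of personalities a multi-call binary could take. */
enum applet { APPLET_KMOD, APPLET_MODPROBE, APPLET_UNKNOWN };

/* Hypothetical helper: return the part of `path` after the last '/'. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Decide which tool to act as, based solely on the invocation name. */
enum applet applet_from_argv0(const char *argv0)
{
    const char *name = base_name(argv0);

    if (strcmp(name, "modprobe") == 0)
        return APPLET_MODPROBE;
    if (strcmp(name, "kmod") == 0)
        return APPLET_KMOD;
    /* Any other name (e.g. a renamed binary) is not recognised. */
    return APPLET_UNKNOWN;
}
```

With this kind of dispatch, a binary invoked under a different name (for
example a hypothetical `.modprobe-wrapped`) would no longer behave like
`modprobe`, which is why the wrapper has to leave the original name
intact.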
[1] https://nixos.org/manual/nixos/stable/options.html#opt-boot.initrd.availableKernelModules
This build was also broken by a libc macro that no longer expands to a
number and thus can't be used in places where a constant value is needed:
automount.c:86:37: error: initializer element is not constant
Failing Hydra build: https://hydra.nixos.org/build/153253104
The issue was that `SIGSTKSZ` isn't an actual const anymore and thus
can't be used to define sizes of static variables such as
static char foo[SIGSTKSZ];
since this results in compiler errors such as
error: array bound is not an integer constant before ']' token
Fedora worked around this by hard-coding the value in `catch`. Since
this is mainly a testing framework and there's no other fix for v1
(which we should eventually remove entirely in favor of v2 anyway), I
guess this is a good-enough fix.
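For reference, code that doesn't want to hard-code the value can usually
query it at runtime instead. The following is a minimal sketch of that
alternative, not the patch applied here; `install_alt_stack` is a
hypothetical helper name:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate and install an alternate signal stack.
 * SIGSTKSZ is only evaluated at runtime here, so it doesn't matter
 * whether it expands to a literal or to a sysconf() call.
 * The buffer is intentionally kept for the lifetime of the process.
 * Returns 0 on success and -1 on failure.
 */
int install_alt_stack(void)
{
    size_t size = (size_t) SIGSTKSZ;
    stack_t ss;

    ss.ss_sp = malloc(size);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = size;
    ss.ss_flags = 0;

    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }
    return 0;
}
```

The trade-off is a heap allocation instead of a static buffer, which is
probably why hard-coding was the less invasive choice for `catch` v1.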
Failing Hydra build: https://hydra.nixos.org/build/152455108
This is a problem that seems to be related to the most recent `gdb`
update in staging from 10.2 to 11.1[1] where `gdb` started to fail
during build with the following message:
checking for stdlib.h... In file included from /nix/store/vf96x4h90fm7bwf5zvfx8zb82fm1p21j-glibc-2.34-5-dev/include/signal.h:328,
from ../../gnulib/import/signal.h:52,
from targ-map.c:7:
targ-map.c:412:17: error: initializer element is not constant
412 | { "SIGSTKSZ", SIGSTKSZ, TARGET_SIGSTKSZ },
| ^~~~~~~~
targ-map.c:412:17: note: (near initialization for 'cb_init_signal_map[18].host_val')
I couldn't find any patches in the upstream repo or in other distros
(according to Repology we seem to be the only distro trying to ship
`gdb-11` with `glibc-2.34`), so I bisected `gdb` and found the culprit,
which seems to be commit `a0e674c1`[2].
It seems as if the entire `sim/`-subtree is now built by default if
`--enable-targets=all` is set (which we do for cross debugging). However
it also generates a file called `targ-map.c` referencing `SIGSTKSZ`
assuming that it's const, although this is not the case anymore with
`glibc-2.34`[3].
Since I don't really understand what precisely is going on in there and
there are no patches available, I decided to switch back to the 10.2
behavior here and disable the feature by specifying `--disable-sim` as a
configure flag.
Failing Hydra build: https://hydra.nixos.org/build/153893135
[1] 43b96f66ef
[2] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a0e674c1ce2c877426f8a861c5294c535c5d49e6
[3] see https://sourceware.org/pipermail/libc-alpha/2021-August/129718.html
Rather than `$BASH`, `glibc` now hardcodes `/bin/bash` as the
interpreter[1] in several scripts (including `ldd`).
This is a problem because NixOS doesn't have `/bin/bash`, so relevant
programs such as `ldd(1)` won't work properly without this change. So
far we set `BASH` to `/bin/sh` to avoid a runtime dependency on the
`bash` from the bootstrap tools.
Considering that this was only done as an "improvement" to their
build-system and not because they wanted to use some bashisms here (the
variable was always called `BASH` and we still used `/bin/sh` anyways),
I'd consider this to be relatively safe.
[1] 5188a9d0265cc6f7235a8af1d31ab02e4a24853d
To quote the release-notes[1]:
> When _DYNAMIC_STACK_SIZE_SOURCE or _GNU_SOURCE are defined,
> PTHREAD_STACK_MIN is no longer constant and is redefined to
> sysconf(_SC_THREAD_STACK_MIN). This supports dynamic sized register
> sets for modern architectural features like Arm SVE.
This basically means that if the above applies, `#if PTHREAD_STACK_MIN > 0`
won't compile anymore because `PTHREAD_STACK_MIN` isn't a hard-coded
number, but `__sysconf (__SC_THREAD_STACK_MIN_VALUE)`[2].
The issue (for Boost 1.69, 1.70 and 1.72; the other versions seem OK)
was reported upstream, but only for Solaris[3]. The corresponding
patches[4] seem to work for us as well.
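As an illustration of the general pattern (not the upstream patch
itself), a preprocessor check on `PTHREAD_STACK_MIN` can be replaced by
a runtime comparison, which works whether the macro is a literal or a
function call. `clamp_stack_size` is a hypothetical name:

```c
#define _GNU_SOURCE
#include <limits.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical helper: make sure a requested thread stack size is not
 * smaller than the platform minimum. PTHREAD_STACK_MIN is used in a
 * normal expression, so it may expand to a sysconf() call.
 */
size_t clamp_stack_size(size_t requested)
{
    size_t min = (size_t) PTHREAD_STACK_MIN;

    return requested < min ? min : requested;
}
```

The result could then be passed to `pthread_attr_setstacksize()`.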
Failing Hydra build: https://hydra.nixos.org/build/150926294
[1] https://sourceware.org/pipermail/libc-alpha/2021-August/129718.html
[2] See `${pkgs.glibc.dev}/include/bits/pthread_stack_min-dynamic.h`
[3] https://github.com/boostorg/thread/issues/283
[4] https://github.com/conan-io/conan-center-index/pull/361
The flag "-D__nonnull\\(params\\)=" leads to a compilation failure, e.g.
in the configure phase:
configure:21131: gcc -c -D__nonnull\(params\)= conftest.c >&5
<command-line>: warning: ISO C99 requires whitespace after the macro name
<command-line>: error: stray '\' in program
<command-line>: error: expected ',' or ';' before '(' token
<command-line>: error: stray '\' in program
According to the commit this isn't even needed on Linux.
I confirmed that this is an (expected) glibc-2.34 issue by checking
that
* the issue doesn't occur with gcc 10/11 on a recent glibc-2.33 staging.
* the issue DOES occur in a Docker container with Fedora Rawhide (which
has glibc 2.34 and gcc 11).
Linking via `-lpthread` (or `-pthread`) is not needed anymore as of
`glibc-2.34` because all the functionality is part of `libc.so.6` and
`libpthread.so.0` only exists for backwards-compatibility.
However, e.g. `gcc` (`libgomp` to be precise) expects a `libpthread.so`
to link against, otherwise the configure script will fail. As already
stated in the glibc release-notes, it is to be expected that a lot
more applications will have issues with this, so I decided to re-add
`libpthread.so` as well.
For `librt.so.1`, the same thing is needed to make sure that Perl still
compiles:
/nix/store/d6y5r7m93x14bmgn2p75fannz39jz66f-binutils-2.35.1/bin/ld: cannot find -lrt
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:490: ../../lib/auto/Time/HiRes/HiRes.so] Error 1
make[1]: Leaving directory '/build/perl-5.34.0/dist/Time-HiRes'
This fix is needed to work around linker-errors such as
undefined reference to `__libc_csu_fini'
which I got in almost every derivation that is part of stage2. The
reason is that the startup-code was simplified[1] and thus
`__libc_csu_fini` doesn't exist anymore.
A workable solution is to use the newer libc, which links properly, in
stage3. This actually seems expected given the rationale for stage3:
# Construct a third stdenv identical to the 2nd, except that this
# one uses the rebuilt Glibc from stage2. It still uses the recent
# binutils and rest of the bootstrap tools, including GCC.
So this patch overrides the libraries inside `gcc-unwrapped` (which is
essentially the bootstrap tools and thus also contains the libc used in
stage3) with the shared objects from the freshly built libc from
stage2.
[1] https://sourceware.org/pipermail/libc-alpha/2021-March/123079.html