What is this?
See Nix Adventures Part 1 for the introduction to all of this.
Adventure: Standing up a stable-diffusion-webui server on a new host
This is going to loosely be broken up into two parts:
- Build an x86_64-linux or i686-linux image for the new host.
- Improve upon the existing stable-diffusion-webui Nix Flakes setup such that the server can be installed and configured via Nix itself.
Preparing the Host
Preparing the host entails getting cross compilation working on the host repository. Since I’ve done that earlier for a Raspberry Pi image, I want this new host’s base image to leverage that prior work. As part of doing this, I expect to refactor the repository a bit such that reusable bits needn’t be repeated, and there are clean, free standing modules that can be included a la carte for new hosts.
I was tempted to call this a build repository, but that’s not very accurate at all. It does builds, yes, but building isn’t really its purpose. Its purpose is to declare the state of hosts on my network. Depending on the state of the hosts, this could entail building.
Refactor the Host repository
On our last adventure, I created a generator for a Raspberry Pi compatible image with this:
packages.aarch64-linux = {
iron = nixos-generators.nixosGenerate {
format = "sd-aarch64";
modules = [
./iron-configuration.nix
];
system = "aarch64-linux";
};
};
I am going to try just ad-libbing in some things for an x86_64-linux platform
instead, since it’s going to an x86-64 host and NixOS is Linux. The code
below will sit adjacent to the declaration for the iron host. This host is
called lithium.
packages.x86_64-linux = {
lithium = nixos-generators.nixosGenerate {
format = "sd-x86_64";
modules = [
./lithium-configuration.nix
];
system = "x86_64-linux";
};
};
One thing I want to do is start refactoring bits that don’t need to be specific to a given host. One example is the configuration that declares my user and its SSH key. This is that section of interest:
users.users = {
logan = {
# TODO: You can set an initial password for your user.
# If you do, you can skip setting a root password by passing
# '--no-root-passwd' to nixos-install.
# Be sure to change it (using passwd) after rebooting!
initialPassword = "lolno";
isNormalUser = true;
openssh.authorizedKeys.keys = [
"ssh-rsa AAAAB3NzaC1yD2EAAAADAQABAAACAQOx2dxH8oP1406bie6eO3HB6fin4NY01laNiWRqcNsrRl6/M6e80wiTnG9u0Walb3JXegyqrHKIlFgvcrn2Tg/y944akJ/XqrcLPn3vwTcCV6XGI/1hPdcN0V156pbbnTS/T9y9btO+QJvELOjT4dET6HixBeBpGhLM95cirOrJjT2C6VVBYTGdAu3eKwCeDsjQtfKOHp9Huv0c1i57Fb13iTU1u0+L2o+LMYpS8YNbcBOgzx9FyyjvA/KuEVcyt2raVpbJv6nOP9ynz7a1Ja3Y2tgQwC6XCMpgKYHDYxaJhJbWjv9cxwq4zSzBr8yrlDKooqvpp9fTdOBAWF4R2MI2wb01yaaTlqPDcATBl5+Xu+SvxYf9wBt6wFIbv0baf1WtDDE7u9d2K/MJhShK9p45AQPTbmoYw7fzeMQOLdZNdZdXIOHWd17IJi2T+WnnO9hL1x+M5uZUlFlk0jGu0NP/YmHuWjGxxL7AIO1hH2q7ZHq7tzM+8sV6tjfGePwALFXSBBSGn2czgtfKzEVRFHBQajPco0g9zFWvi5ZfmU4QAkWOrQQFLEYK4IE0e1gR9Dsnqdm5tiYkCdVlapbG9jWdIBAgOCMj2bBXn+YObCrbVHW4wNo5OR6nec+b6miCuG23ue/o5j2L64kE16n1+hGx/Bbm0Adif4vw8zXVhAmxvQ== logan@scandium"
];
extraGroups = [
# Allow this user to sudo.
"wheel"
];
};
};
I can refactor this to a logan.nix by moving it to its own file and
surrounding it with a function:
{ ... }: {
users.users = {
logan = {
# TODO: You can set an initial password for your user.
# If you do, you can skip setting a root password by passing
# '--no-root-passwd' to nixos-install.
# Be sure to change it (using passwd) after rebooting!
initialPassword = "lolno";
isNormalUser = true;
openssh.authorizedKeys.keys = [
"ssh-rsa AAAAB3NzaC1yD2EAAAADAQABAAACAQOx2dxH8oP1406bie6eO3HB6fin4NY01laNiWRqcNsrRl6/M6e80wiTnG9u0Walb3JXegyqrHKIlFgvcrn2Tg/y944akJ/XqrcLPn3vwTcCV6XGI/1hPdcN0V156pbbnTS/T9y9btO+QJvELOjT4dET6HixBeBpGhLM95cirOrJjT2C6VVBYTGdAu3eKwCeDsjQtfKOHp9Huv0c1i57Fb13iTU1u0+L2o+LMYpS8YNbcBOgzx9FyyjvA/KuEVcyt2raVpbJv6nOP9ynz7a1Ja3Y2tgQwC6XCMpgKYHDYxaJhJbWjv9cxwq4zSzBr8yrlDKooqvpp9fTdOBAWF4R2MI2wb01yaaTlqPDcATBl5+Xu+SvxYf9wBt6wFIbv0baf1WtDDE7u9d2K/MJhShK9p45AQPTbmoYw7fzeMQOLdZNdZdXIOHWd17IJi2T+WnnO9hL1x+M5uZUlFlk0jGu0NP/YmHuWjGxxL7AIO1hH2q7ZHq7tzM+8sV6tjfGePwALFXSBBSGn2czgtfKzEVRFHBQajPco0g9zFWvi5ZfmU4QAkWOrQQFLEYK4IE0e1gR9Dsnqdm5tiYkCdVlapbG9jWdIBAgOCMj2bBXn+YObCrbVHW4wNo5OR6nec+b6miCuG23ue/o5j2L64kE16n1+hGx/Bbm0Adif4vw8zXVhAmxvQ== logan@scandium"
];
extraGroups = [
# Allow this user to sudo.
"wheel"
];
};
};
}
It needs to be a function because that’s what’s expected in the modules list.
I’ve divined that by looking at the only module I have so far. A common idiom
I’ve seen in Nix is where there is a special API invoked, and that API provides
all of the dependency injection for the function. We don’t need any of it here,
so we can use ... for the entirety of the argument list.
And then include it in both places with:
packages.aarch64-linux = {
iron = nixos-generators.nixosGenerate {
format = "sd-aarch64";
modules = [
./logan.nix
./iron-configuration.nix
];
system = "aarch64-linux";
};
};
packages.x86_64-linux = {
lithium = nixos-generators.nixosGenerate {
format = "sd-x86_64";
modules = [
./logan.nix
./lithium-configuration.nix
];
system = "x86_64-linux";
};
};
I can do the same thing with the sshd configuration. The end result looks
like this for sshd.nix:
{ ... }: {
# This sets up an SSH server.
services.openssh = {
enable = true;
settings = {
# Forbid root login through SSH.
PermitRootLogin = "no";
# Use keys only. Remove if you want to SSH using password (not
# recommended).
PasswordAuthentication = false;
};
};
}
With the host configuration expanding just a tad:
packages.aarch64-linux = {
iron = nixos-generators.nixosGenerate {
format = "sd-aarch64";
modules = [
./logan.nix
./sshd.nix
./iron-configuration.nix
];
system = "aarch64-linux";
};
};
packages.x86_64-linux = {
lithium = nixos-generators.nixosGenerate {
format = "sd-x86_64";
modules = [
./logan.nix
./sshd.nix
./lithium-configuration.nix
];
system = "x86_64-linux";
};
};
At some point I might bundle a std-linux-env.nix or something similar that
comes with all of these, because I never expect them to change. I do like
pulling in these modules a la carte for now.
To get a start, I’ll copy over some configuration from iron-configuration.nix
and clean things up as I go. My first material configuration for
lithium-configuration.nix is this:
# This is the NixOS configuration for lithium.proton. It is drawn from the
# example here:
# https://github.com/Misterio77/nix-starter-configs/blob/main/minimal/nixos/configuration.nix
{
config,
inputs,
lib,
pkgs,
...
}: {
imports = [
./hardware-configuration.nix
];
}
I noticed hardware-configuration.nix has this:
{
fileSystems."/" = {
# Must match what sd-image expects exactly. This is found by trying to run
# anything and then encountering an error.
device = "/dev/disk/by-label/NIXOS_SD";
fsType = "ext4";
};
nixpkgs.hostPlatform = "aarch64-linux";
}
The aarch64-linux declaration doesn’t work for my x86_64-linux image I’m
about to create. But it’s easy enough to make this a function and simply pass
the system down into it. The new version becomes:
##
# Declares which file systems to use on the storage medium the host will boot
# from.
##
{ system } : {
fileSystems."/" = {
# Must match what sd-image expects exactly. This is found by trying to run
# anything and then encountering an error.
device = "/dev/disk/by-label/NIXOS_SD";
fsType = "ext4";
};
nixpkgs.hostPlatform = system;
}
I may want to declare some swap space or something, but this is fine for now.
Now I have to refactor the consuming code around it. For good measure, I’ve
renamed this partitions.nix. My lithium-configuration.nix (which is still
incomplete) now looks like this:
# This is the NixOS configuration for lithium.proton. It is drawn from the
# example here:
# https://github.com/Misterio77/nix-starter-configs/blob/main/minimal/nixos/configuration.nix
{
config,
inputs,
lib,
pkgs,
system,
...
}: {
imports = [
# Parenthesized so the list holds one module, not a path plus an attrset.
(import ./partitions.nix { inherit system; })
];
}
Notably, I’ve added the system variable and then passed it to partitions.nix
(previously hardware-configuration.nix) as a variable of the same name.
Now adding in other boilerplate, I get:
# This is the NixOS configuration for lithium.proton. It is drawn from the
# example here:
# https://github.com/Misterio77/nix-starter-configs/blob/main/minimal/nixos/configuration.nix
{
config,
inputs,
lib,
pkgs,
system,
...
}: {
imports = [
# Parenthesized so the list holds one module, not a path plus an attrset.
(import ./partitions.nix { inherit system; })
];
# This will additionally add your inputs to the system's legacy channels.
# Making legacy nix commands consistent as well, awesome!
nix.nixPath = ["/etc/nix/path"];
environment.etc =
lib.mapAttrs'
(name: value: {
name = "nix/path/${name}";
value.source = value.flake;
})
config.nix.registry;
nix.settings = {
# Enable flakes and new 'nix' command.
experimental-features = "nix-command flakes";
# Deduplicate and optimize nix store.
auto-optimise-store = true;
};
# Hostname is not an FQDN.
networking.hostName = "lithium";
# https://nixos.wiki/wiki/FAQ/When_do_I_update_stateVersion
system.stateVersion = "23.05";
}
Some of these settings I can break out further. I’m not even sure how much I
want some of these, so breaking them out makes it easier for me to do so
universally later. I suspect the nixPath stuff might go at some point. These
aren’t my comments and they feel very much like they are working around some
rough edges of earlier days of Nix.
My new nix-path.nix:
{ config, lib, ... }: {
# This will additionally add your inputs to the system's legacy channels.
# Making legacy nix commands consistent as well, awesome!
nix.nixPath = ["/etc/nix/path"];
environment.etc =
lib.mapAttrs'
(name: value: {
name = "nix/path/${name}";
value.source = value.flake;
})
config.nix.registry;
}
An unoriginally named nix.nix:
{ ... }: {
nix.settings = {
# Enable flakes and new 'nix' command.
experimental-features = "nix-command flakes";
# Deduplicate and optimize nix store.
auto-optimise-store = true;
};
}
My final lithium-configuration.nix looks like this:
# This is the NixOS configuration for lithium.proton. It is drawn from the
# example here:
# https://github.com/Misterio77/nix-starter-configs/blob/main/minimal/nixos/configuration.nix
{
inputs,
lib,
pkgs,
system,
...
}: {
imports = [
# Parenthesized so the list holds one module, not a path plus an attrset.
(import ./partitions.nix { inherit system; })
];
# Hostname is not an FQDN.
networking.hostName = "lithium";
# https://nixos.wiki/wiki/FAQ/When_do_I_update_stateVersion
system.stateVersion = "23.05";
}
My hosts are now entirely composed, with the host configuration proper only having what is unique to each one.
packages.aarch64-linux = {
iron = nixos-generators.nixosGenerate {
format = "sd-aarch64";
modules = [
./logan.nix
./nix.nix
./nix-path.nix
./sshd.nix
./iron-configuration.nix
];
system = "aarch64-linux";
};
};
packages.x86_64-linux = {
lithium = nixos-generators.nixosGenerate {
format = "sd-x86_64";
modules = [
./logan.nix
./nix.nix
./nix-path.nix
./sshd.nix
./lithium-configuration.nix
];
system = "x86_64-linux";
};
};
And now the real test: to emit this to a disk. I’ve spent the better part of a week playing with image generation using the majority of my laptop’s RAM, draining its battery, and doing everything on the CPU. That is about to end!
I found out some of my prior tools where I’d added set -euo pipefail didn’t
work because in Bash, unsetting a variable and setting an empty array are
actually the same thing. I’ve made some adjustments to fix them. In addition,
I’ve added the capability to detect USB drives. Before, it was just SD cards.
The Bash scripts for these are getting sufficiently complex, and I will probably
want to create a Rust program for some of these tasks soon. My fluency in Nix
is increasing and so I’m feeling better about using that as a starting point for
distributing my Rust tools.
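The empty-array problem can be sketched as follows. On Bash releases older than 4.4 (macOS still ships 3.2), expanding an empty array under set -u fails with an "unbound variable" error, just as an unset variable would. The names extra_args and count_args below are hypothetical and only there for illustration; this is a sketch of one common workaround, not a copy of the actual scripts.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints how many arguments it received.
count_args() {
  echo "$#"
}

# An empty array. On Bash older than 4.4, a plain "${extra_args[@]}" would
# abort at this point with "unbound variable" because of `set -u`.
extra_args=()

# The ${name[@]+...} form expands to nothing when the array is empty, and to
# each element as its own word otherwise. It behaves the same on old and new Bash.
count_args ${extra_args[@]+"${extra_args[@]}"}

# Elements containing spaces stay intact as single words.
extra_args=(--host "lithium proton")
count_args ${extra_args[@]+"${extra_args[@]}"}
```

The trade-off is readability: the expansion is noisy, which is one more nudge toward moving these tools out of Bash.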
Fixing all of the errors
I’ve been trying to actually build the image with:
./image-create.sh --host lithium
But I’m getting this error:
error: a 'x86_64-linux' with features {} is required to build '/nix/store/brgsnjl1jcsl775sjyiwi3h58pjnl0si-loopback.cfg.drv', but I am a 'aarch64-linux' with features {benchmark, big-parallel, nixos-test, uid-range}
nixos-generators#219 looks really promising, but my attempts have not been fruitful.
So I’ve added this to the container script for image-create.sh:
echo 'extra-platforms = x86_64-linux aarch64-linux aarch64-darwin' >> /etc/nix/nix.conf
I tried a nix flake update and ran it again. The podman VM doesn’t always
come up, and since I got this update it seems to be worse. That’s a little
concerning.
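One thing worth guarding against with the echo ... >> /etc/nix/nix.conf approach: if the script ever runs twice against the same container, the line gets appended twice. A minimal sketch of an idempotent append, assuming a hypothetical helper named append_once (it is not part of image-create.sh as shown):

```bash
# Hypothetical helper: append a line ($1) to a file ($2) only if that exact
# line is not already present.
append_once() {
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string.
  # If the file does not exist yet, grep fails and the line is written.
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Assumed usage inside the container script:
# append_once 'extra-platforms = x86_64-linux aarch64-linux aarch64-darwin' /etc/nix/nix.conf
```

This only compares whole lines, so changing the platform list still adds a second extra-platforms line rather than editing the first.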
That said, I’m getting a somewhat new error, which could’ve been very easy to miss:
error: a 'i686-linux' with features {} is required to build '/nix/store/k6q4p5b5zqgwd3kbpkgwganh76v4hbnk-x86_64-unknown-linux-gnu-pkg-config-wrapper-0.29.2.drv', but I am a 'aarch64-linux' with features {benchmark, big-parallel, nixos-test, uid-range}
The last thing I added was the x86_64-linux so I am guessing that is what it
was looking for.
If I update the extra-platforms to be:
echo 'extra-platforms = i686-linux x86_64-linux aarch64-linux aarch64-darwin' \
>> /etc/nix/nix.conf
I get:
error: builder for '/nix/store/kvx0wrig78nibc3k0g4p1qd557fp9ivs-console-env.drv' failed with exit code 1;
last 3 log lines:
> qemu-x86_64-static: /nix/store/3rh4x7j32p5v0kmrm8vcqfd5vj626w9k-perl-5.38.2/bin/perl: Unable to find a guest_base to satisfy all guest address mapping requirements
> 0000000000000000-0000000000000fff
> 0000000000400000-000000000040401f
For full logs, run 'nix log /nix/store/kvx0wrig78nibc3k0g4p1qd557fp9ivs-console-env.drv'.
That led me to qemu#2082 and subsequently qemu#1255. Based on 1255, I think j
I’ve been pulling threads where I can here. I’ve created a custom image which
installs the qemu tools from another image:
FROM multiarch/qemu-user-static:latest as qemu
FROM nixos/nix
COPY --from=qemu /usr/bin/qemu-* /usr/bin
CMD ["sleep", "1"]
But I think this is wrong. I understand that I need a container with qemu running on it, but why isn’t that container a NixOS container? I’m having trouble knowing/stipulating what’s on the container without actually starting an interactive shell like an animal.
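For looking inside an image without an interactive shell, a one-shot command in a throwaway container is usually enough. This is a rough sketch: peek_image and the CONTAINER_RUNTIME override are names I made up for illustration, and it assumes the image has no entrypoint that would swallow the command.

```bash
# Hypothetical helper: run a single command in a throwaway container and
# print its output, instead of opening an interactive shell.
# CONTAINER_RUNTIME is an assumed override; it defaults to podman.
peek_image() {
  image="$1"
  shift
  # --rm removes the container again once the command exits.
  "${CONTAINER_RUNTIME:-podman}" run --rm "$image" "$@"
}

# Assumed usage (the image name is a placeholder):
# peek_image my-nix-image sh -c 'ls /usr/bin/qemu-*'
```

Wrapping the glob in sh -c matters: otherwise the host shell expands it before the container ever sees it.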
Let’s re-approach this.
Direct builds
Struggle with the container
The Cross-compile packages section in this wiki article has something very promising:
The following command will cross compile the tinc package for the aarch64 CPU architecture from a different architecture (e.g. x86_64).
$ nix-build '<nixpkgs>' \
  --arg crossSystem \
  '(import <nixpkgs> {}).lib.systems.examples.aarch64-multiplatform' \
  -A tinc
You can add your own specifications, or look at existing ones, in nixpkgs/lib/systems/examples.nix.
This seems to have taken my system over an hour to build, which is a little nutty but I suppose makes sense if there’s an entire Linux build ecosystem it has to create from scratch. I suppose that means that my local machine now has all of that stuff cached though.
I thought I had read somewhere that crossSystem was deprecated, but I didn’t capture that anywhere. Something to keep in mind as we continue down this route.
If I adapt that example to my build, I get:
nix-build '<nixpkgs>' --arg crossSystem '.#lithium'
error: syntax error, unexpected '.'
       at «string»:1:1:
            1| .
             | ^
I moved some stuff around - mostly flailing. I have read before that the .#thing is Nix Flake specific syntax or notation, so perhaps I have to explicitly enable Flakes. I thought I had that configured globally but my foray into nix-darwin may have stomped on my user-specific configuration.
nix \
  --extra-experimental-features nix-command \
  --extra-experimental-features flakes \
  build --arg crossSystem '.#lithium'
error: syntax error, unexpected '.'
       at «string»:1:1:
            1| .
             | ^
How did this work before? Searching for this error yields nothing related.
Eventually I tried:
nix build '.#lithium'
warning: Git tree '/Users/logan/dev/blog' is dirty
error: flake 'git+file:///Users/logan/dev/blog' does not provide attribute 'packages.aarch64-darwin.lithium', 'legacyPackages.aarch64-darwin.lithium' or 'lithium'
Alright, I think this means that the --arg crossSystem was somehow fouling up further argument interpretation (or at least fouling up my understanding of how it’s supposed to behave). This new result is promising, and one I know how to address. I have lithium declared under aarch64-linux which isn’t going to work from this perspective. One thing to keep in mind is that the attribute sets I have in outputs are heavily predicated on the platform and architecture. It needs to be under aarch64-darwin while I am building from this machine.
An aside: When I started on this journey a while back with Flakes, I was a little shocked that I had to specify each platform + architecture combination I wanted to support. I’ve come to realize though that Nix has a rich landscape with which I can compose or override build settings. So in that way, it’s pretty easy to specify which platforms are supported and have modules and such to achieve reuse. Unfortunately it doesn’t work with the unsupported platforms environment variable, but it does hold the promise that the build may not ever work. I don’t know if there’s a way to communicate “this can never work” versus “it might work but I haven’t explored that” or even “it should work but I don’t have resources to verify that”.
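The lookup in that error message is driven by the system of the machine running nix build. A small sketch of how the local system string is put together, using a hypothetical nix_system helper of my own (Nix computes this itself; this only mirrors the naming):

```bash
# Hypothetical helper: translate `uname -s` / `uname -m` output into the
# system string Nix uses for flake outputs (for example aarch64-darwin).
nix_system() {
  kernel="$1"   # e.g. Darwin or Linux
  machine="$2"  # e.g. arm64, aarch64, x86_64

  case "$machine" in
    arm64) machine="aarch64" ;;  # macOS reports arm64; Nix calls it aarch64
  esac

  case "$kernel" in
    Darwin) echo "${machine}-darwin" ;;
    Linux)  echo "${machine}-linux" ;;
    *)      echo "unsupported kernel: $kernel" >&2; return 1 ;;
  esac
}

# The attribute that a bare `nix build '.#lithium'` looks for on this machine:
echo "packages.$(nix_system "$(uname -s)" "$(uname -m)").lithium"
```

A flake installable can also name the full attribute path, as in nix build '.#packages.x86_64-linux.lithium', which sidesteps the lookup. It does not make the build itself work on a machine that cannot run x86_64-linux builders.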
nix build '.#lithium'
warning: Git tree '/Users/logan/dev/proton-nix' is dirty
error: builder for '/nix/store/rc9flq697nllbfczwxxnaczk5fimsb0j-X-Restart-Triggers-systemd-binfmt.drv' failed with exit code 126;
last 1 log lines:
> /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: cannot execute binary file
For full logs, run 'nix log /nix/store/rc9flq697nllbfczwxxnaczk5fimsb0j-X-Restart-Triggers-systemd-binfmt.drv'.
error: 1 dependencies of derivation '/nix/store/grwhri94w0zj0srv4p58fsnlq7ivfylw-unit-systemd-binfmt.service.drv' failed to build
error: 1 dependencies of derivation '/nix/store/wxjwfw0836a7p26gk99c6sqhhl0nsnnv-system-units.drv' failed to build
error: 1 dependencies of derivation '/nix/store/1768yij62f1x6dslv007z6iwgq0pspy5-etc.drv' failed to build
error: 1 dependencies of derivation '/nix/store/ak7gyj97m24krqh5lxyn4zd0h1xpsk94-nixos-system-lithium-24.05.20240215.a4d4fe8.drv' failed to build
error: 1 dependencies of derivation '/nix/store/h41y504h42v0xrfq6i3z0m0j5di8jysm-closure-info.drv' failed to build
error: 1 dependencies of derivation '/nix/store/35ikws0vq9v4hvnagz2bdfrbmbpgqm41-efi-directory.drv' failed to build
error: 1 dependencies of derivation '/nix/store/xw22x1f04k37v1d2h3sarn726w49jk5p-isolinux.cfg-in.drv' failed to build
error: 1 dependencies of derivation '/nix/store/fa5rd5fwvc89rdij41klkhnmya7qsmgg-nixos.iso.drv' failed to build
I noticed with my nix.nix module, buildPlatform is still aarch64-linux and so that needs correction to aarch64-darwin.
nix build '.#lithium'
warning: Git tree '/Users/logan/dev/proton-nix' is dirty
error: builder for '/nix/store/7cyzbdfc8d9ql30l1l15d72x11mdfmdf-etc-modprobe.d-nixos.conf.drv' failed with exit code 126;
last 1 log lines:
> /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: cannot execute binary file
For full logs, run 'nix log /nix/store/7cyzbdfc8d9ql30l1l15d72x11mdfmdf-etc-modprobe.d-nixos.conf.drv'.
error: builder for '/nix/store/kcfpzpaxv6zj431zfwz0rg1sy5j2din9-loopback.cfg.drv' failed with exit code 126;
last 1 log lines:
> /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: cannot execute binary file
For full logs, run 'nix log /nix/store/kcfpzpaxv6zj431zfwz0rg1sy5j2din9-loopback.cfg.drv'.
error: builder for '/nix/store/wq8pnjw6pxdpahk3dbhvwl40plf6bqq3-mounts.sh.drv' failed with exit code 126;
last 1 log lines:
> /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: cannot execute binary file
For full logs, run 'nix log /nix/store/wq8pnjw6pxdpahk3dbhvwl40plf6bqq3-mounts.sh.drv'.
error: builder for '/nix/store/zbm51nr0vm9gqh7cdn7zx74zp1m9k6ca-users-groups.json.drv' failed with exit code 126;
last 1 log lines:
> /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: cannot execute binary file
For full logs, run 'nix log /nix/store/zbm51nr0vm9gqh7cdn7zx74zp1m9k6ca-users-groups.json.drv'.
error: 1 dependencies of derivation '/nix/store/fa5rd5fwvc89rdij41klkhnmya7qsmgg-nixos.iso.drv' failed to build
Per nixos-generators#187 I tried:
nix build --builders 'ssh-ng://nix@yasmin.dse.in.tum.de x86_64-linux' '.#lithium'
warning: Git tree '/Users/logan/dev/proton-nix' is dirty
error: builder for '/nix/store/7cyzbdfc8d9ql30l1l15d72x11mdfmdf-etc-modprobe.d-nixos.conf.drv' failed with exit code 126;
last 1 log lines:
> /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: cannot execute binary file
For full logs, run 'nix log /nix/store/7cyzbdfc8d9ql30l1l15d72x11mdfmdf-etc-modprobe.d-nixos.conf.drv'.
error: builder for '/nix/store/kcfpzpaxv6zj431zfwz0rg1sy5j2din9-loopback.cfg.drv' failed with exit code 126;
last 1 log lines:
> /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: cannot execute binary file
For full logs, run 'nix log /nix/store/kcfpzpaxv6zj431zfwz0rg1sy5j2din9-loopback.cfg.drv'.
error: builder for '/nix/store/wq8pnjw6pxdpahk3dbhvwl40plf6bqq3-mounts.sh.drv' failed with exit code 126;
last 1 log lines:
> /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: cannot execute binary file
For full logs, run 'nix log /nix/store/wq8pnjw6pxdpahk3dbhvwl40plf6bqq3-mounts.sh.drv'.
error: builder for '/nix/store/zbm51nr0vm9gqh7cdn7zx74zp1m9k6ca-users-groups.json.drv' failed with exit code 126;
last 1 log lines:
> /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: cannot execute binary file
For full logs, run 'nix log /nix/store/zbm51nr0vm9gqh7cdn7zx74zp1m9k6ca-users-groups.json.drv'.
error: 1 dependencies of derivation '/nix/store/fa5rd5fwvc89rdij41klkhnmya7qsmgg-nixos.iso.drv' failed to build
So no change. I’m wracking my brain here to no effect. Should I try to go back to the Docker container to do a build?
I felt like I had so little control over that environment that I would just be right back where I was. My nix invocation hasn’t made any material changes.
This post about making a remote build has a kind of promise to it:
just a small neat note,
that your flake.nix and configuration.nix, doesn’t have to sit etc/nixos , they can be anywhere that is supported flake repo type (git/svn/mercurial), as flakes are hermetically sealed (AFAIUSI). A system configuration can be built from anywhere now allowing you to do funky things with the flake URI. However to switch to the configuration, you will need to be stoopid user (superuser).
nix flake show github:nixinator/nothing/
nixos-rebuild dry-activate --flake github:nixinator/nothing/#z620
so you can build my ‘machine’. on yours. If that doesn’t blow new users minds, especially the infrastructure as code people… i don’t know want can.
This post will probably prompt me to do some janitorial work on my configs… or ‘nix shaving’ as we like to term it.
flake makes truly sharable operating system configurations possible, which last time i looked has never been possible.
However, hardware brings impurity, so my gfx card, network , containing that to impurity is something i’ve got to think about long term, but for a fleet of cattle, it’s a not really a problem.
I don’t think this works for me though because I need a bootable image as the output. Though it is great and this is one of the things that I really like about Nix.
I have come across some things about customizing a docker image. I could also just create a containerized-bootstrap configuration that gets loaded into the container at build time and use that to control the environment. This doesn’t feel like a good path to me.
Out of desperation, I ran image-create.sh again. Please don’t think poorly of me.
While I’m waiting for that to happen, one of the roadblocks I’ve encountered is the binfmt not being available on my machine. I want to add that to the container somehow.
I actually have to look up where the NixOS main Nix file is located. This just goes to show where my entire existence with Nix is: Not on the main operating system but instead as a guest on macOS. The path is /etc/nixos/configuration.nix.
And of course the VM won’t start for podman-machine. Great.
I’ve tried removing the VM. I’ve added a devShells to my flake.nix to ensure a specific version of podman and it looks like this:
...
outputs = { self, nixpkgs, nixos-generators, ... }: let
  # This should be more localized.
  pkgs = nixpkgs.legacyPackages.aarch64-darwin;
in {
  crossSystem = true;
  devShells.aarch64-darwin.default = pkgs.mkShell {
    packages = [
      pkgs.podman
    ];
  };
...
And I know the flake.lock will get populated and lock the version in place. That way if I get it working, it should stay working. If I wanted to take it a step further, I’d drop in some configuration that would allow me to keep it separate from any potential system podman usage I would have.
This had the effect of bumping me from podman 4.8.2 to 4.9.3, but still no joy. Now I must attempt the shameful thing I have seen others do: reboot.
Well now I’ve restarted my ~~Windows~~ macOS machine as a desperate maneuver to remove any sort of gremlins that might be keeping the VM from starting up. It failed and my precious uptime was a pointless sacrifice.
I found podman#20776 and I recall folks using qemu on their systems directly, and not just in the container. The layers are deep here. I vaguely recall having to install qemu separately before. Putting it into my devShells, I can see it’s about a 1GB download. Sheesh. Another podman machine init --log-level DEBUG and a 600MB download later, it takes about two solid minutes for the VM to boot and do a lot of work, and then my local podman to connect.
I kind of hope there’s some caching to take advantage of there.
There is!
I had to fix some things in my image-create.sh script. My in-house container was labeled qemu-nix when I meant to have it completely renamed to qemu-nix-personal to avoid any public naming collisions, but I hadn’t gotten all of the references.
I quickly run into this problem:
Error: can only create exec sessions on running containers: container state improper
The Internet is full of “derp make sure the container is started” but how did this work before? I did upgrade a minor point release. SemVer strikes again, apparently. Remember that SemVer promises that minor releases should not cause backward incompatible breaks. If only intention alone were enough to ensure compatibility and stability (SemVer thinks it is).
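That error means an exec was attempted against a container that was not in the running state. A minimal sketch of checking the state before calling exec, assuming hypothetical helpers container_status and require_running and the same assumed CONTAINER_RUNTIME override (neither is in my script as shown):

```bash
# Hypothetical helper: print the state of a container (e.g. running, exited).
# CONTAINER_RUNTIME is an assumed override; it defaults to podman.
container_status() {
  "${CONTAINER_RUNTIME:-podman}" inspect --format '{{.State.Status}}' "$1"
}

# Hypothetical helper: succeed only if the named container is running.
require_running() {
  status="$(container_status "$1")" || return 1
  if [ "$status" != "running" ]; then
    echo "container $1 is '$status', not running" >&2
    return 1
  fi
}

# Assumed usage before an exec:
# require_running nix-run && podman exec nix-run true
```

A container whose main command has already finished shows up as exited here, which is a different problem from one that never started.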
Oh I do have podman start in my scripts. I wrongly blamed SemVer, this time. But it didn’t work? Checking back, I see:
Container ID: nix-run
With the relevant code being:
container_name="nix-run"
# ...snip...
image_label='qemu-nix-personal'
podman build -t qemu-nix-personal --file Dockerfile .
podman container ls -a | grep $container_name > /dev/null || \
  podman create -t --name $container_name -w /workdir \
  -v $PWD:/workdir qemu-nix-personal
container_id=$(podman start $container_name)
echo "Container ID: $container_id"
echo "Executing script: $script "
And container_name is nix-run. How did the container_name become the container_id? I do see a UUID-like identifier in the spew right next to Container ID: ... but I’m not sure what it’s for. A little extra logging:
image_label='qemu-nix-personal'
podman build -t qemu-nix-personal --file Dockerfile .
podman container ls -a | grep $container_name > /dev/null || \
  podman create -t --name $container_name -w /workdir \
  -v $PWD:/workdir qemu-nix-personal
echo "Starting $container_name..."
container_id=$(podman start $container_name)
echo "Container ID: $container_id"
echo "Executing script: $script "
This amounts to:
Starting nix-run...
Container ID: nix-run
Executing script: <snip>
Error: can only create exec sessions on running containers: container state improper
nix-run
So it’s not getting transposed somehow. I flailed around a lot at this point. Using podman start didn’t make a lot of sense to me when I really tried to understand what was going on. I changed the podman start ... podman exec into podman run and that works much better.
I can see what’s in /usr/bin/qemu* now.
I can’t see /etc/nixos/configuration.nix though. This Stack Overflow answer says the image isn’t a Nix image but an Alpine image. Sigh. Another user shares some code to bootstrap a container like this with a Nix configuration:
git clone --branch release-17.03 https://github.com/nixos/nixpkgs $HOME/nixpkgs
mkdir -p $HOME/nix-config
nixos-generate-config --dir $HOME/nix-config
nixos-install -I nixos-config=$HOME/nix-config/configuration.nix -I nixpkgs=$HOME/nixpkgs
But I need NixOS proper, because I need to control its configuration so I can use binfmt.
I have to say that this has gotten pretty frustrating. It’s possible the problem is on my end but it’s not obvious that it is. I’m going in circles regarding how to kick off a build. It kind of makes sense that I can’t necessarily create an ISO image from aarch64-darwin if creating that ISO necessitates running some of the things it creates (mostly dependencies). It’s not just a single, static compilation. This makes sense to me. What I’m not understanding is the path forward. Perhaps this simply cannot be accomplished on the wrong architecture, but that seems weird because I am seeing other people emit aarch64 images and builds from x86_64. I just can’t find out how to go the other way. qemu and binfmt are the watchwords here, and yet I don’t feel like I have the total control required to use them. I’m not very familiar with the C build ecosystem and I get the impression if I knew more, this would make more sense to me.
I can’t set binfmt without an actual NixOS system of some kind because that requires both a Linux kernel setting and a NixOS-proper system itself running. Even with containers, I have neither of those. The nixos/nix image is for running Nix and not NixOS, all on Alpine. So using binfmt on the kernel level is out. How important is that? The Wikipedia article on binfmt_misc shows it is simply part of the Linux kernel, and describes it as a sort of shebang for binary executables. I can see if it is enabled by inspecting /proc/sys/fs/binfmt_misc/status, and there’s also /proc/sys/fs/binfmt_misc/* which holds various individual formats. Reading status reports enabled or disabled; writing 1 to it enables binfmt_misc, 0 disables it, and -1 clears the registered formats. I added this to my script in image-create.sh:
tail -n +1 /proc/sys/fs/binfmt_misc/*
I use tail because it prints the name of the file when multiple files are involved. That way I should get a nice file-name + value combination. But I get:
tail: cannot open '/proc/sys/fs/binfmt_misc/*' for reading: No such file or directory
Of course I’m wondering why the multiarch/qemu-user-static image doesn’t have this. I check the README and see the description:
multiarch/qemu-user-static is to enable an execution of different multi-architecture containers by QEMU and binfmt_misc. Here are examples with Docker 3.
The binfmt_misc is right there. I was going to install it via my Dockerfile and then it clicks:
FROM multiarch/qemu-user-static:latest as qemu
FROM nixos/nix
COPY --from=qemu /usr/bin/qemu-* /usr/bin
COPY . /workdir
CMD ["sleep", "infinity"]
Oh right, I’m not running that image. I’m running the nix image with some stuff yanked from qemu-user-static. Maybe I can also copy the binfmt_misc files? I know we’re getting into kernel level stuff, and that’s firmly outside of container territory. Still, it’s just files… right? Let’s give it a shot.
FROM multiarch/qemu-user-static:latest as qemu
FROM nixos/nix
COPY --from=qemu /usr/bin/qemu-* /usr/bin
COPY --from=qemu /proc/sys/fs/binfmt_misc /proc/sys/fs/binfmt_misc
COPY . /workdir
CMD ["sleep", "infinity"]
And then I get:
Error: building at STEP "COPY --from=qemu /proc/sys/fs/binfmt_misc /proc/sys/fs/binfmt_misc": checking on sources under "/var/home/core/.local/share/containers/storage/overlay/ab39a17dbb861445876ff08d6d13ccf9cf2617ec6a81696481b535301310c2a1/merged": copier: stat: "/proc/sys/fs/binfmt_misc": no such file or directory

Ugh. Maybe I just need to read more. A lot more. That qemu-user-static README has a lot of stuff in there I didn’t understand well as I just jumped in. Apparently it has some kind of from-to notation I can use in the image name to get what I want. The entities in this are architectures. So I should be able to go from aarch64 to x86_64. I just need to change the tag from the lazy latest to qemu-user-static:aarch65-x86_64. I probably don’t even need to try copying the binfmt_misc files.

Error: creating build container: initializing source docker://multiarch/qemu-user-static:aarch65-x86_64: reading manifest aarch65-x86_64 in docker.io/multiarch/qemu-user-static: manifest unknown

Alright. Sure enough, I can’t find an aarch64 in the from portion anywhere in their Dockerhub tags. Well, the documentation says I can use multiarch/qemu-user-static:$to_arch so let’s just try x86_64.

[1/2] STEP 1/1: FROM multiarch/qemu-user-static:x86_64 AS qemu
Resolving "multiarch/qemu-user-static" using unqualified-search registries (/etc/containers/registries.conf.d/999-podman-machine.conf)
Trying to pull docker.io/multiarch/qemu-user-static:x86_64...
Getting image source signatures
Copying blob sha256:5822a1c91f704793666e9975a33f4041298b1221f5ac80aff67ea866300f64fa
Copying config sha256:ad2074fe564f645bba6172cd06d2c49771431ed4009d6726bc2145510d4e911b
Writing manifest to image destination
WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
--> ad2074fe564f
[2/2] STEP 1/4: FROM nixos/nix
[2/2] STEP 2/4: COPY --from=qemu /usr/bin/qemu-* /usr/bin
Error: building at STEP "COPY --from=qemu /usr/bin/qemu-* /usr/bin": checking on sources under "/var/home/core/.local/share/containers/storage/overlay/3999e542bb3199035ee740da6a25a788e0a6263a38ce1fb80bd2b3222c88363f/merged": Rel: can't make relative to /var/home/core/.local/share/containers/storage/overlay/3999e542bb3199035ee740da6a25a788e0a6263a38ce1fb80bd2b3222c88363f/merged; copier: stat: ["/usr/bin/qemu-*"]: no such file or directory

So it pulled the image, but the base image is the wrong architecture as amd64 (and thus I expect nothing to work), and also the files I need are no longer present.

From some reading in their issues, I find this comment:
Hi @Darshcg! This repository is for amd64/x86_64 hosts only (#77). For other host archs, you can use dbhi/qus:
images are provided for each of seven host architectures officially supported by Docker, Inc. or built by official images: amd64, i386, arm64v8, arm32v7, arm32v6, s390x and ppc64le.
I… why isn’t that front-and-center on their README? Why is this industry so allergic to writing things down? After looking some more, it is in the README but it’s not presented early in the documentation. I quickly counted three issues that all have the same problem. My personal policy is to treat questions like those as opportunities to improve documentation. Instead of answering the occasional question with “why didn’t you read the entire document?”, just improve (by adding, rewording, or reorganizing) the documentation until the questions cease. When the headline is “these are images that go from any architecture to any architecture” and the fine print says “only if the from-architecture is x86_64 or amd64”, one can understand why there is misunderstanding. It’s like how Microsoft claimed .NET was cross-platform back during its inception. It runs on all versions of Windows! See? Cross-platform.
The qus documentation on what images to use is not inspiring:
Manifests are provided for the following hosts: amd64, arm64v8, arm32v7, arm32v6, i386, s390x or ppc64le. That is, any of the target architectures provided by QEMU can be used on any of those hosts.
No x86_64. This seems to claim that they support everything that qemu does. This would imply that qemu doesn’t support x86_64 as a destination, but other things I have come across suggest that can’t be right (folks on aarch64 claim to have x86_64 compatibility). qus#22 suggests this will work just fine though. That said, I can’t find the images on Dockerhub.

I did some more digging around and came up with this as my Dockerfile:

FROM nixos/nix
COPY . /workdir
CMD ["sleep", "infinity"]

Not much, yeah? That’s because I need to bootstrap the Podman VM with:

podman run --rm --privileged aptman/qus --static -- --persistent x86_64

Which I put into with-podman.sh.

Now I get:

error: builder for '/nix/store/23n6mw7qvl7w6c9pmgmwzi5gpwg0qjkl-stdenv-linux.drv' failed with exit code 1;
       last 1 log lines:
       > error: executing '/nix/store/xiicriwhj094ax7w50jzkmv32gzcdqkd-bash-5.2p26/bin/bash': Exec format error
       For full logs, run 'nix log /nix/store/23n6mw7qvl7w6c9pmgmwzi5gpwg0qjkl-stdenv-linux.drv'.

I think maybe this is some progress? I haven’t gotten this specific error yet. Before it was cannot execute binary file. So that makes me think at least some of the machinery I want is in place.
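Exec format error generally means nothing claimed the binary's format, so one sanity check is whether a handler for x86_64 is actually registered wherever the build runs. Here is a minimal sketch that lists the entries under binfmt_misc, which is what my earlier tail attempt was after. The name binfmt_entries is made up for illustration, and it assumes each entry file starts with an enabled or disabled line.

```shell
# List registered binfmt_misc handlers and whether each is enabled.
# binfmt_entries is a hypothetical helper name, not part of any tool.
binfmt_entries() {
    dir=${1:-/proc/sys/fs/binfmt_misc}
    found=0
    for entry in "$dir"/*; do
        # An unmatched glob stays literal; this is the case that made tail fail.
        [ -f "$entry" ] || continue
        name=${entry##*/}
        # register and status are control files, not handlers.
        case "$name" in register|status) continue ;; esac
        state=
        # Assumption: the first line of an entry is "enabled" or "disabled".
        IFS= read -r state < "$entry" || true
        printf '%s: %s\n' "$name" "$state"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no handlers registered under $dir" >&2
        return 1
    fi
}

# Usage: binfmt_entries            (inspects /proc/sys/fs/binfmt_misc)
#        binfmt_entries /some/dir  (inspects a copy, handy for testing)
```

If this reports no handlers inside the container or VM doing the build, the qemu binaries being present on disk won't matter, since the kernel has no instruction to use them.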
darwin.linux-builder to the rescue
Then I found nixpkgs#238596 and wait a second! There’s a nixos/qemu-vm… image? I dug around and found qemu-vm.nix and it has this as documentation:

# This module creates a virtual machine from the NixOS configuration.
# Building the `config.system.build.vm' attribute gives you a command
# that starts a KVM/QEMU VM running the NixOS configuration defined in
# `config'. By default, the Nix store is shared read-only with the
# host, which makes (re)building VMs very efficient.

This is manna from the nixpkgs! Using this, I should be able to put the VM in the exact state I want it to be in before I try anything (like running this builder). I shouldn’t need to use a container at all, in fact… I basically can say “this is my build VM, and here’s its configuration”, which tucks nicely into the flake.lock. This is exciting! I don’t know if it will work as a path forward, but it should help. I should be able to pull in qemu derivations with ease. I can set binfmt! What an oasis to come upon. My despair was palpable.

I dove into the material for running this builder VM locally. There’s actually official documentation on the darwin.linux-builder and via this macOS Linux Builder post, I found out there are some settings I can use on nix-darwin that do a lot of the setup necessary to get it going; it’s more or less a one-liner.

I don’t have the exact order of things here, and unfortunately the VM is somewhat stateful. I’ll try to document what I can. This is what builds the VM:

nix run nixpkgs#darwin.linux-builder

The VM being built is a somewhat manual step, but perhaps some other thing can realize it. Keep in mind if anything changes or you attempt to build a different kind of VM, this can foul up the derivation. Advice I have seen in multiple places (that I don’t have on hand) says to “remove the VM” but provides no instructions. I was able to do so with a nix store gc. It’s using a wrecking ball to drive a nail, but it does the job. Perhaps better advice will become available.

A builder in Nix is a special host which is configured to accept build commands from a different Nix host. Essentially it gets used in NixOps (Nix Operations) type setups, where different kinds of builds for different architectures must be emitted, or multi-platform tests must be run. The VM needs to be configured as a builder. This means the VM needs to run SSH, expose it on a port, and have keys registered with it. The nix-darwin module does all of this with the following settings:

nix = {
  # There may be additional configuration in this attribute set. This is the
  # minimum for what we need here.
  linux-builder.enable = true;
  settings = {
    experimental-features = [ "nix-command" "flakes" ];
    # Action: Update to use your user as needed, in case you aren't also a Logan.
    # Trust my user so we can open SSH on port 22 for using the Nix builder.
    # It cannot be overridden as of 2024-02-18. This is demanded in
    # https://nixos.org/manual/nixpkgs/unstable/#sec-darwin-builder but
    # explained here:
    # https://github.com/Gabriella439/macos-builder?tab=readme-ov-file
    extra-trusted-users = [ "logan" ];
    # Trusting @admin is demanded by the darwin.linux-builder package.
    trusted-users = [ "@admin" ];
  };
};

Performing the following will apply those settings:

nix run nix-darwin -- switch --flake ~/dev/dotfiles/nix

In addition, the nix-darwin module configures this via a launchd daemon. You can inspect the daemon here:

sudo launchctl list org.nixos.linux-builder

{
	"LimitLoadToSessionType" = "System";
	"Label" = "org.nixos.linux-builder";
	"OnDemand" = false;
	"LastExitStatus" = 0;
	"PID" = 19592;
	"Program" = "/bin/sh";
	"ProgramArguments" = (
		"/bin/sh";
		"-c";
		"/bin/wait4path /nix/store && exec /nix/store/sf0vk5w0clqmwp07p5m0w3lxl1sc150s-linux-builder-start";
	);
};

The presence of a PID may imply that it is already running?
I ran into this error at some point. I believe it is because somehow the VM changed as I described above. I had gone through several different sets of instructions, so the changed-VM-issue is very likely.
Formatting '/tmp/nix-vm.S35Kv5pSsw/store.img', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1561329664 backing_file=/nix/store/lyd8dji7qbsk3kp0fy7aiiv07igc1qz9-nix-store-image/nixos.qcow2 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16 [ 0.034887] armv8-pmu pmu: hw perfevents: failed to probe PMU! <<< NixOS Stage 1 >>> loading module virtio_balloon... loading module virtio_console... loading module virtio_rng... loading module dm_mod... running udev... Starting systemd-udevd version 255.2 kbd_mode: KDSKBMODE: Inappropriate ioctl for device Gstarting device mapper and LVM... waiting for device /dev/disk/by-label/nixos to appear....................... Timed out waiting for device /dev/disk/by-label/nixos, trying to mount anyway. mounting /dev/disk/by-label/nixos on /... [ 21.655501] /dev/disk/by-label/nixos: Can't open blockdev mount: mounting /dev/disk/by-label/nixos on /mnt-root/ failed: No such file or directory An error occurred in stage 1 of the boot process, which must mount the root filesystem on `/mnt-root' and then start stage 2. Press one of the following keys: r) to reboot immediately *) to ignore the error and continue rRebooting... [ 28.407770] reboot: Restarting system [ 0.033519] armv8-pmu pmu: hw perfevents: failed to probe PMU! <<< NixOS Stage 1 >>> loading module virtio_balloon... loading module virtio_console... loading module virtio_rng... loading module dm_mod... running udev... Starting systemd-udevd version 255.2 kbd_mode: KDSKBMODE: Inappropriate ioctl for device Gstarting device mapper and LVM... waiting for device /dev/disk/by-label/nixos to appear.......^C................ Timed out waiting for device /dev/disk/by-label/nixos, trying to mount anyway. mounting /dev/disk/by-label/nixos on /... 
[ 21.900472] /dev/disk/by-label/nixos: Can't open blockdev mount: mounting /dev/disk/by-label/nixos on /mnt-root/ failed: No such file or directory An error occurred in stage 1 of the boot process, which must mount the root filesystem on `/mnt-root' and then start stage 2. Press one of the following keys: r) to reboot immediately *) to ignore the error and continue ^C*Continuing... mount: can't find /mnt-root/ in /proc/mounts mounting certs on /etc/ssl/certs... checking /dev/disk/by-label/nix-store... fsck (busybox 1.36.1) [fsck.ext4 (1) -- /mnt-root/nix/.ro-store] fsck.ext4 -a /dev/disk/by-label/nix-store nix-store: clean, 46478/95424 files, 284784/381184 blocks mounting /dev/disk/by-label/nix-store on /nix/.ro-store... mounting shared on /tmp/shared... mounting xchg on /tmp/xchg... mounting keys on /var/keys... mounting overlay filesystem on /nix/store... BusyBox v1.36.1 () multi-call binary. Usage: switch_root [-c CONSOLE_DEV] NEW_ROOT NEW_INIT [ARGS] Free initramfs and switch to another root fs: chroot to NEW_ROOT, delete all in /, move NEW_ROOT to /, execute NEW_INIT. PID must be 1. NEW_ROOT must be a mountpoint. -c DEV Reopen stdio to DEV after switch [ 50.120061] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x00000100 [ 50.120291] CPU: 0 PID: 1 Comm: switch_root Not tainted 6.1.78 #1-NixOS [ 50.120547] Hardware name: linux,dummy-virt (DT) [ 50.120738] Call trace: [ 50.120845] dump_backtrace+0xe0/0x134 [ 50.121005] show_stack+0x20/0x2c [ 50.121145] dump_stack_lvl+0x64/0x80 [ 50.121309] dump_stack+0x18/0x34 [ 50.121444] panic+0x17c/0x350 [ 50.121579] make_task_dead+0x0/0x190 [ 50.121733] do_group_exit+0x3c/0xa0 [ 50.121887] __wake_up_parent+0x0/0x40 [ 50.122040] invoke_syscall+0x50/0x120 [ 50.122194] el0_svc_common.constprop.0+0x4c/0xf4 [ 50.122384] do_el0_svc+0x34/0xcc [ 50.122521] el0_svc+0x34/0xd4 [ 50.122651] el0t_64_sync_handler+0x114/0x120 [ 50.122831] el0t_64_sync+0x18c/0x190 [ 50.122983] Kernel Offset: 0x4ee1cc200000 from 0xffff800008000000 [ 50.123236] PHYS_OFFSET: 0xffff9a3c00000000 [ 50.123416] CPU features: 0x00000,00010091,66927723 [ 50.123673] Memory Limit: none [ 50.123804] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 ]--- [1] 15699 terminated nix run nixpkgs#darwin.linux-builder Terminate the process by using =ps= in another tab and issuing a normal =kill= to the qemu process. =ps= without args or =grep= is sufficient.When encountering this error, there is something wrong with the VM and it is stateful. Advice is to “remove” the image, but no specific instructions are given. Run
nix store gc to clean it up. This would be a good enhancement to contribute back to the documentation.

Now that I have what I think is an on-demand linux-builder service for macOS, I should be able to just build a VM image and Nix will take care of the rest. I have seen advice that I might need to tune the VM for different resources, but I figure I can do that lazily. All I really care about at this point are halts. The image building is a one-off for me.
I don’t know how to make sure the builder is present in a given build. However I do know I’ll need some additional tuning to make this work. I’ve added this to my darwin.nix and applied it:

...
nix = {
  linux-builder = {
    enable = true;
    config = {
      boot.binfmt.emulatedSystems = [ "x86_64-linux" ];
    };
  };
};
...

I still see the issue:

nix build '.#lithium'
warning: Git tree '/Users/logan/dev/proton-nix' is dirty
error: builder for '/nix/store/2kx4di5f5qjqhhvgy3k2zxcbrwxylgkl-builder.pl.drv' failed with exit code 126;
       last 1 log lines:
       > /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: /nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin/bash: cannot execute binary file
...

From my last nix-darwin-switch call, I don’t think my changes got applied because the image wasn’t touched as far as I can tell. I expected that to take a while longer but it was very quick. Reading the blog earlier, I can see that I need to do a nixos-rebuild on that VM. But it requires knowing its name first. To get it, I ran:

cat /etc/nix/machines | sed -E 's/- [a-zA-Z0-9=]+$/- secret-ssh-key-maybe/'

ssh://builder@linux-builder aarch64-linux /etc/nix/builder_ed25519 1 1 kvm,benchmark,big-parallel - secret-ssh-key-maybe

I ran this through sed so I needn’t worry about a key getting leaked if it happens to be a secret one. I can see its name is linux-builder. In hindsight, this is what I named it in the darwin.nix. Great! An adapted version of the command from the blog is:

nixos-rebuild switch \
  --fast \
  --target-host linux-builder \
  --use-remote-sudo \
  --use-substitutes

And this doesn’t work because I’m on macOS where nixos-rebuild is not on my PATH because this isn’t NixOS. Perhaps I can just SSH to the host and run the command there, sans target arguments?

$ ssh builder@linux-builder
The authenticity of host 'linux-builder ([127.0.0.1]:31022)' can't be established.
ED25519 key fingerprint is SHA256:73nCAX7tESRWJ4ZN8RkOlqB+0bgxKVmbNRUcFPbXMkE.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'linux-builder' (ED25519) to the list of known hosts.
(builder@linux-builder) Password:

Close. Trying with my username has the same result. Perhaps I can bootstrap it like I did my Raspberry Pi earlier?
I add this to the
config, which becomes:nix = { linux-builder = { enable = true; config = { boot.binfmt.emulatedSystems = [ "x86_64-linux" ]; users.users = { logan = { # TODO: You can set an initial password for your user. # If you do, you can skip setting a root password by passing # '--no-root-passwd' to nixos-install. # Be sure to change it (using passwd) after rebooting! initialPassword = "lolno"; isNormalUser = true; openssh.authorizedKeys.keys = [ "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQOx2dxH8oP1406bie6eO3HB6fin4NY01laNiWRqcNsrRl6/M6e80wiTnG9u0Walb3JXegyqrHKIlFgvcrn2Tg/y944akJ/XqrcLPn3vwTcCV6XGI/1hPdcN0V156pbbnTS/T9y9btO+QJvELOjT4dET6HixBeBpGhLM95cirOrJjT2C6VVBYTGdAu3eKwCeDsjQtfKOHp9Huv0c1i57Fb13iTU1u0+L2o+LMYpS8YNbcBOgzx9FyyjvA/KuEVcyt2raVpbJv6nOP9ynz7a1Ja3Y2tgQwC6XCMpgKYHDYxaJhJbWjv9cxwq4zSzBr8yrlDKooqvpp9fTdOBAWF4R2MI2wb01yaaTlqPDcATBl5+Xu+SvxYf9wBt6wFIbv0baf1WtDDE7u9d2K/MJhShK9p45AQPTbmoYw7fzeMQOLdZNdZdXIOHWd17IJi2T+WnnO9hL1x+M5uZUlFlk0jGu0NP/YmHuWjGxxL7AIO1hH2q7ZHq7tzM+8sV6tjfGePwALFXSBBSGn2czgtfKzEVRFHBQajPco0g9zFWvi5ZfmU4QAkWOrQQFLEYK4IE0e1gR9Dsnqdm5tiYkCdVlapbG9jWdIBAgOCMj2bBXn+YObCrbVHW4wNo5OR6nec+b6miCuG23ue/o5j2L64kE16n1+hGx/Bbm0Adif4vw8zXVhAmxvQ== logan@scandium" ]; extraGroups = [ # Allow this user to sudo. "wheel" ]; }; }; }; };And look at that:
$ nix-darwin-switch
warning: Git tree '/Users/logan/dev/dotfiles' is dirty
building the system configuration...
warning: Git tree '/Users/logan/dev/dotfiles' is dirty
Password:
user defaults...
setting up user launchd services...
Show the ~/Library folder...
Set dock magnification...
Set dock magnification size...
Define dock icon function...
Choose and order dock icons
setting up /Applications/Nix Apps...
setting up pam...
applying patches...
setting up /etc...
system defaults...
setting up launchd services...
reloading service org.nixos.linux-builder
reloading nix-daemon...
waiting for nix-daemon
configuring networking...
configuring keyboard...
Set disk image verification...
Avoid creating .DS_Store files on network volumes...
Set the warning before emptying the Trash...
Require password immediately after sleep or screen saver begins...
Allow apps from anywhere...
$ ssh linux-builder
[logan@nixos:~]$

I am tickled! Let’s get our bearings. I have a NixOS VM running on macOS which is totally controlled via Nix and has a proper configuration.nix. This allows me to do things I couldn’t do before like set binfmt.emulatedSystems. From there I should be able to build NixOS images for x86_64-linux. My next steps are roughly:
- Verify I can compile and run x86_64-linux binaries. This will allow me to build an x86_64-linux image, since these images require running some of the things being built.
- Indicate to nixosGenerate that the build is actually going to be run against linux-builder.
- Generate the image.
- dd the image to the USB drive adapter (which has the lithium boot disk in it).
- Slap the disk in the lithium machine. Power it on, connect it on the network (via a CAT 5 cable).
- SSH to the host to test login.
- Author a Nix configuration for stable-diffusion-webui that allows configuration.
- Apply the configuration to lithium.
- Crank out ML generated images to my heart’s content.
Easy!
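For the first step in that list, it helps to be able to tell what architecture a given binary was built for. A minimal sketch that reads the ELF header directly, in case the file utility isn't around on a bare-bones host. It assumes a little-endian ELF file; elf_machine is a made-up helper name, and only three e_machine values are recognized here.

```shell
# Print the target architecture recorded in an ELF header.
# elf_machine is a hypothetical helper; it assumes little-endian ELF files.
elf_machine() {
    # Bytes 0-3 are the ELF magic: 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d '[:space:]')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # Bytes 18-19 hold e_machine, least significant byte first.
    machine=$(od -An -tx1 -j18 -N2 "$1" | tr -d '[:space:]')
    case "$machine" in
        3e00) echo "x86_64" ;;
        b700) echo "aarch64" ;;
        0300) echo "i386" ;;
        *)    echo "unknown($machine)" ;;
    esac
}

# Usage: elf_machine "$(readlink -f "$(command -v hello)")"
```

Running an x86_64 binary successfully on an aarch64 host and seeing x86_64 from a check like this would together show that emulation is doing its job.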
Trying to get something going from my SSH session is proving difficult.
nix is not on my PATH, and /nix/var/nix/profiles/per-user/ is empty. This really is just a bare bones builder. Well, let’s treat it that way. I should be able to declare an x86_64-linux package exists on the host, and then maybe run it.

I wound up making this:
{ nixpkgs, lib, ... }: let cross-architecture-test-pkgs = import nixpkgs { system = "x86_64-linux"; }; linux-builder-pkgs = import nixpkgs { system = "aarch64-linux"; }; in { boot.binfmt.emulatedSystems = [ "i686-linux" "x86_64-linux" ]; environment.systemPackages = [ linux-builder-pkgs.file cross-architecture-test-pkgs.hello ]; nixpkgs.buildPlatform = { system = "aarch64-linux"; }; users.users = { logan = { # TODO: You can set an initial password for your user. # If you do, you can skip setting a root password by passing # '--no-root-passwd' to nixos-install. # Be sure to change it (using passwd) after rebooting! initialPassword = "lolno"; isNormalUser = true; openssh.authorizedKeys.keys = [ "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQOx2dxH8oP1406bie6eO3HB6fin4NY01laNiWRqcNsrRl6/M6e80wiTnG9u0Walb3JXegyqrHKIlFgvcrn2Tg/y944akJ/XqrcLPn3vwTcCV6XGI/1hPdcN0V156pbbnTS/T9y9btO+QJvELOjT4dET6HixBeBpGhLM95cirOrJjT2C6VVBYTGdAu3eKwCeDsjQtfKOHp9Huv0c1i57Fb13iTU1u0+L2o+LMYpS8YNbcBOgzx9FyyjvA/KuEVcyt2raVpbJv6nOP9ynz7a1Ja3Y2tgQwC6XCMpgKYHDYxaJhJbWjv9cxwq4zSzBr8yrlDKooqvpp9fTdOBAWF4R2MI2wb01yaaTlqPDcATBl5+Xu+SvxYf9wBt6wFIbv0baf1WtDDE7u9d2K/MJhShK9p45AQPTbmoYw7fzeMQOLdZNdZdXIOHWd17IJi2T+WnnO9hL1x+M5uZUlFlk0jGu0NP/YmHuWjGxxL7AIO1hH2q7ZHq7tzM+8sV6tjfGePwALFXSBBSGn2czgtfKzEVRFHBQajPco0g9zFWvi5ZfmU4QAkWOrQQFLEYK4IE0e1gR9Dsnqdm5tiYkCdVlapbG9jWdIBAgOCMj2bBXn+YObCrbVHW4wNo5OR6nec+b6miCuG23ue/o5j2L64kE16n1+hGx/Bbm0Adif4vw8zXVhAmxvQ== logan@scandium" ]; extraGroups = [ # Allow this user to sudo. "wheel" ]; }; }; nix.settings = { extra-platforms = [ "aarch64-linux" "i686-linux" "x86_64-linux" ]; }; }It took me a bit to get here. The current point of interest is in the declaration of
linux-builder-pkgs and cross-architecture-test-pkgs. These are both using nixpkgs but from a different system setting. The first is for the linux-builder itself, so it uses aarch64-linux (my physical architecture, but Linux instead of macOS / darwin). The other is x86_64-linux.

Once that got built, I was able to do this as a proof that I could produce and run different binaries:

$ ssh linux-builder
Last login: Wed Feb 21 06:55:28 2024 from 10.0.2.2
[logan@nixos:~]$ hello
Hello, world!
[logan@nixos:~]$ file $(readlink -f $(which hello))
/nix/store/63l345l7dgcfz789w1y93j1540czafqh-hello-2.12.1/bin/hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/cyrrf49i2hm1w7vn2j945ic3rrzgxbqs-glibc-2.38-44/lib/ld-linux-x86-64.so.2, for GNU/Linux 3.10.0, not stripped

In the example above I run hello on the VM. Then I identify the hello binary as being an x86_64-linux binary in spite of the fact that I’m running aarch64-darwin natively, and aarch64-linux on the VM.

One thing to keep in mind that I sunk a lot of time into: The nix store ping [1] is not very helpful in determining what’s going on. For the longest time, I had this:

nix store ping --store 'ssh-ng://linux-builder'

I found this was due to me declaring builders in my darwin.nix. The builders definition was out of sync with the actual configuration, and the nix-darwin linux-builder module is smart enough to populate everything from top to bottom. You can see what’s going on by inspecting /etc/nix/machines. The remote builder documentation describes the format very well, and it is plain text. What helped me figure this out was eventually running a nix build with --builders set to the entire line.

Another critical part was also working with linux-builder configuration options that weren’t fully propagated to the buildMachines settings. I wound up having to add my own option (protocol) and stitch that together with nix-darwin#873 and nix-darwin#816, one of which was only two days old at my first approach. Here is the branch for all of them together.

nix build \
  '.#lithium' \
  --builders 'ssh-ng://builder@linux-builder i686-linux,x86_64-linux,aarch64-linux /etc/nix/builder_ed25519 1 1 kvm,benchmark,big-parallel - lolno'

And I was able to observe transient status updates of sending build information to the remote builder (linux-builder). Even with that, nix store ping still shows Trusted: 0 so I guess this can’t be relied upon.

nix store ping --store 'ssh-ng://linux-builder'
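Since a --builders value and a line of /etc/nix/machines share the same whitespace-separated format, a quick way to see why a build is or isn't dispatched to the builder is to look at the second field, the comma-separated list of systems. My original machines line only listed aarch64-linux. This is a minimal sketch with made-up function names, and the field order is my reading of the remote builder documentation.

```shell
# Print the systems field (second column) of a builder line.
# Both helpers are hypothetical names, not Nix commands.
builder_systems() {
    read -r _uri systems _rest <<EOF
$1
EOF
    printf '%s\n' "$systems"
}

# Succeed when the builder line advertises the given system.
builder_supports() {
    case ",$(builder_systems "$1")," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage:
#   builder_supports "$(head -n 1 /etc/nix/machines)" x86_64-linux \
#       && echo "builder advertises x86_64-linux"
```

With the line from my earlier cat of /etc/nix/machines this reports no x86_64-linux support, while the --builders value above passes.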
Diving into the Nix store C++ code
I went on a bit of a tangent trying to figure out what the deal was with
Trusted: 0 and I dove into the C++ code. For posterity I’ve left it here, but the end result is that I didn’t really learn anything solid. It was difficult to follow as my C++ has atrophied and I never really picked up “industry” style C++, let alone kept up with the last 20 years of changes in practice since then. nix#3927 (a pull request that is ~4 years old as of this writing) would have probably kept me from having to do as much of a deep dive on all of this. Maybe I can try out carrying through the requested changes sometime.

So basically ssh:// will never report a trust value, per the code in src/libstore/legacy-ssh-store.cc:

/**
 * The legacy ssh protocol doesn't support checking for trusted-user.
 * Try using ssh-ng:// instead if you want to know.
 */
std::optional<TrustedFlag> isTrustedClient()
{
    return std::nullopt;
}

More spelunking - I found src/libstore/remote-store-connection.hh declares RemoteStore::Connection which has a remoteTrustsUs field on it. This is the field that describes what eventually comes back on the nix store ping invocation. The only place that gets set to a meaningful value is in src/libstore/remote-store.cc in initConnection as this:

if (GET_PROTOCOL_MINOR(conn.daemonVersion) >= 35) {
    conn.remoteTrustsUs = WorkerProto::Serialise<std::optional<TrustedFlag>>::read(*this, conn);
} else {
    // We don't know the answer; protocol to old.
    conn.remoteTrustsUs = std::nullopt;
}

This is basically an undocumented requirement for establishing trust. Back from my earlier ping:

nix store ping --store 'ssh-ng://linux-builder'

I have only 18 as my minor version and it wants 35 or more. This seems off. The whole “minor” thing seems off. I would expect a more sophisticated version check but I don’t see it here. Let’s unpack this more. GET_PROTOCOL_MAJOR and GET_PROTOCOL_MINOR are defined as:

#define GET_PROTOCOL_MAJOR(x) ((x) & 0xff00)
#define GET_PROTOCOL_MINOR(x) ((x) & 0x00ff)

This just looks at the number and keeps either the upper byte or the lower byte of its low 16 bits. Note the major mask doesn’t shift, so a major version of 1 comes back as 0x100 rather than 1. The value being inspected is conn.daemonVersion, which is defined in the same RemoteStore::Connection struct.

/**
 * Worker protocol version used for the connection.
 *
 * Despite its name, I think it is actually the maximum version both
 * sides support. (If the maximum doesn't exist, we would fail to
 * establish a connection and produce a value of this type.)
 */
WorkerProto::Version daemonVersion;

Near as I can tell, the entire version is just an integer. It doesn’t check for major versions or use “good” version checking because the code before it will throw errors if the major version is not aligned. That still kind of implies a major version increase (and thus a minor version reset) would break this logic.

Based on the masks, I think that there’s just a single integer being serialized and sent over the wire, of which only the low 16 bits matter, and then they are using bit masks to split it into two numbers.

Another part of this too is the nullopt I think is treated differently than not being trusted at all. I don’t recall where I spotted that but I will revise if I see it again. My next lead is the TrustedFlag.
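To convince myself of what the two masks do, here is the same arithmetic as a shell sketch. The value 0x0123 is a made-up example standing for a protocol version of 1.35, and 0x0112 stands for 1.18; I have not confirmed what value a real daemon reports. The function names are mine.

```shell
# Mirror GET_PROTOCOL_MAJOR / GET_PROTOCOL_MINOR from the macros quoted earlier.
protocol_major() { echo $(( $1 & 0xff00 )); }
protocol_minor() { echo $(( $1 & 0x00ff )); }

# Mirror the ">= 35" gate that decides whether the trust flag is read at all.
can_report_trust() { [ "$(protocol_minor "$1")" -ge 35 ]; }

# Examples with hypothetical version values:
#   protocol_major 0x0123    major stays unshifted, so this is 0x0100
#   protocol_minor 0x0123    low byte only, 0x23
#   can_report_trust 0x0112  fails, since the low byte 0x12 is 18
```

The major value comes out unshifted, which is fine for equality checks against another masked value but would surprise anyone expecting a plain 1.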
Building the lithium image
Now we’re back to trying to build the image directly. My invocation is roughly:
nix build '.#lithium'

It takes a very long time.
@ nix build '.#lithium' warning: Git tree '/Users/logan/dev/proton-nix' is dirty error: build of '/nix/store/cj68c90lfwmwb21szzzqwwizl4f4ah9v-libiberty-13.2.0.drv' on 'ssh-ng://builder@linux-builder' failed: builder for '/nix/store/cj68c90lfwmwb21szzzqwwizl4f4ah9v-libiberty-13.2.0.drv' failed with exit code 1; last 1 log lines: > error: executing '/nix/store/xiicriwhj094ax7w50jzkmv32gzcdqkd-bash-5.2p26/bin/bash': Exec format error For full logs, run 'nix-store -l /nix/store/cj68c90lfwmwb21szzzqwwizl4f4ah9v-libiberty-13.2.0.drv'. error: builder for '/nix/store/cj68c90lfwmwb21szzzqwwizl4f4ah9v-libiberty-13.2.0.drv' failed with exit code 1 error: 1 dependencies of derivation '/nix/store/8c8ihnblzigyyi2gnw9wcgb97jkjskwn-libgcc-x86_64-unknown-linux-gnu-13.2.0.drv' failed to build error: 1 dependencies of derivation '/nix/store/3wyv75x3v4ghpwvr9yhvpn1zkgjhdm6g-glibc-x86_64-unknown-linux-gnu-2.38-44.drv' failed to build error: 1 dependencies of derivation '/nix/store/9vqsbcczfi33w57kmyy8w38cmqq04qz4-x86_64-unknown-linux-gnu-gcc-wrapper-13.2.0.drv' failed to build error: 1 dependencies of derivation '/nix/store/3k5q46wkrhcw7g7q0qdc0pp486yfdb6v-stdenv-linux.drv' failed to build error: 1 dependencies of derivation '/nix/store/xkmz3vkvdkw2p8vr95qwmf3zd3hyq3gi-vim-x86_64-unknown-linux-gnu-9.1.0004.drv' failed to build error: 1 dependencies of derivation '/nix/store/9r1ms037gywc03iawf7wr3lg5xflq6q1-xxd-vim-x86_64-unknown-linux-gnu-9.1.0004.drv' failed to build error: 1 dependencies of derivation '/nix/store/d9b1ldsar1kkiqzz4wsz2gbzg2k4wrsc-stub-ld-x86_64-unknown-linux-musl.drv' failed to build error: 1 dependencies of derivation '/nix/store/h8ha70j2kfsgyga0pvfvs0pyds935m2k-nixos-tmpfiles.d.drv' failed to build error: 1 dependencies of derivation '/nix/store/9wlq85n4c053f1hppd2bkcvv5lli9506-tmpfiles.d.drv' failed to build error: 1 dependencies of derivation '/nix/store/jdphdpgciii1mlkjqqrf8w8cqb5k5gpf-etc.drv' failed to build error: 1 dependencies of derivation 
'/nix/store/x9d4gncb56wczwagcaw2lsfg2l3skbsb-nixos-system-lithium-24.05.20240218.b98a4e1.drv' failed to build error: 1 dependencies of derivation '/nix/store/sdppyws2y15v5chwhzcswsnfyzsac9p3-closure-info.drv' failed to build error: 1 dependencies of derivation '/nix/store/0ngpb04bfvm2jx5sf4sby922qwcggzan-efi-directory.drv' failed to build error: 1 dependencies of derivation '/nix/store/6f09jjmmpyi7waims49b2nizgbvisf5v-isolinux.cfg-in.drv' failed to build error: 1 dependencies of derivation '/nix/store/0y5xb00q266h183dxvclyf111l7abdv0-nixos.iso.drv' failed to buildOh and my pull request was merged!
I need to clean this up: I made some changes in my flake.nix for my network repository (where I’m doing all of this work that isn’t just trying to make a linux-builder).

I think too much configuration for building the linux-builder, left over from earlier attempts, was still inside my network repository. My nix.nix file is now:

{ system, buildPlatform }: {
  nix.settings = {
    experimental-features = "nix-command flakes";
    auto-optimise-store = true;
  };
  nixpkgs = {
    hostPlatform = {
      inherit system;
    };
  };
}

And even
buildPlatformshould be removed. I got much further on the run:~/dev/proton-nix on main|●2✚8?6 logan@scandium 1 [14:24:29] 0s @ nix build '.#lithium' warning: Git tree '/Users/logan/dev/proton-nix' is dirty error: build of '/nix/store/4vkjwadzg9r4679rrgyaa99gnz4mwisv-dbus-1.drv' on 'ssh-ng://builder@linux-builder' failed: builder for '/nix/store/4vkjwadzg9r4679rrgyaa99gnz4mwisv-dbus-1.drv' failed with exit code 1; last 3 log lines: > qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20} > /build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 16: 28 Segmentation fault (core dumped) grep -q '[^[:space:]]' "$out/system.conf" > "/nix/store/fkq9rmmkf6x82qwz60qkbvx0ramxg5kd-dbus-1/system.conf" was generated incorrectly and is empty, try building again. For full logs, run 'nix-store -l /nix/store/4vkjwadzg9r4679rrgyaa99gnz4mwisv-dbus-1.drv'. error: builder for '/nix/store/4vkjwadzg9r4679rrgyaa99gnz4mwisv-dbus-1.drv' failed with exit code 1 error: 1 dependencies of derivation '/nix/store/3sg7jhsgclf3appqjqrlvvl3w18przd7-etc.drv' failed to build error: 1 dependencies of derivation '/nix/store/xn22jl3rfawfy3q11g61jz98gjwyc73p-nixos-system-lithium-24.05.20240218.b98a4e1.drv' failed to build error: 1 dependencies of derivation '/nix/store/f7afhxpy9zawz3l1dlql1nzkbh5s7r9l-closure-info.drv' failed to build error: 1 dependencies of derivation '/nix/store/nhbb7xqdvqcb1ywk4443lyykkfyn8w83-efi-directory.drv' failed to build error: 1 dependencies of derivation '/nix/store/8d9y0s3yh9acxpy5c97rv7jxci9jklzy-isolinux.cfg-in.drv' failed to build error: 1 dependencies of derivation '/nix/store/di2gnywrx9ifksjw4b8zlx3jh2zgm6w4-nixos.iso.drv' failed to buildI’ve heard the name DBus many times, but this is the first time I’ve been forced to reckon with it. This is what I got from the main site:
D-Bus is a message bus system, a simple way for applications to talk to one another. In addition to interprocess communication, D-Bus helps coordinate process lifecycle; it makes it simple and reliable to code a “single instance” application or daemon, and to launch applications and daemons on demand when their services are needed.
In addition, I found the error message in nixpkgs#pkgs/development/libraries/dbus/make-dbus-conf.nix. The blame shows the error handling added via this commit, and the commit message is:
makeDBusConf: fail if xsltproc generates empty files

A few people have reported empty files in /etc/dbus-1 which can cause obscure issues. With this change, users can retry and get non-empty files.

can be tested with `makeDBusConf { suidHelper = ""; serviceDirectories = []; }`
and adding
```
rm $out/session.conf
echo -n "" > $out/session.conf
echo "" > $out/session.conf
```
Unfortunately, building again produces the exact same error (not surprising there). I don’t understand what high load would have to do with it, based on the comments I saw. My system wasn’t under high load.
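For reference, the check that trips here is the `grep -q '[^[:space:]]' "$out/system.conf"` line from the log: it passes only when the file contains at least one non-whitespace character. Below is a minimal sketch of that same test, assuming a hypothetical helper name (`is_effectively_empty`) and throwaway files made with `mktemp`; it is not the nixpkgs code itself. One thing worth noticing in the log is that `grep` is the process that segfaulted, so the check can report failure without the file necessarily being empty.

```shell
#!/bin/sh
# Hypothetical helper (not from nixpkgs): succeeds (exit 0) when FILE
# contains no non-whitespace characters, i.e. it is empty or blank.
is_effectively_empty() {
  ! grep -q '[^[:space:]]' "$1"
}

# Throwaway files to exercise the check.
demo_dir=$(mktemp -d)
printf '' > "$demo_dir/empty.conf"
printf '<busconfig/>\n' > "$demo_dir/ok.conf"

for f in "$demo_dir/empty.conf" "$demo_dir/ok.conf"; do
  if is_effectively_empty "$f"; then
    echo "$(basename "$f"): empty"
  else
    echo "$(basename "$f"): has content"
  fi
done

rm -rf "$demo_dir"
```

If `grep` itself cannot run, any check built on its exit status fails the same way an empty file would, which may be why retrying changes nothing.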
Based on looking at other folks’ configuration.nix files, it makes me wonder if I’m running a little too bare bones. I do have sshd running, but maybe I need more? The Pi didn’t need this.

Looking at my Pi (
iron.proton), I see:[logan@iron:~]$ cat /etc/systemd/system.conf [Manager] ManagerEnvironment=LOCALE_ARCHIVE='/run/current-system/sw/lib/locale/locale-archive' PATH='/nix/store/rv6q4vlvzqdhg1hhh38x65qjf7m2zhm6-zfs-user-2.2.2/bin:/nix/store/c7ridj401dp7g39c4y89nmag67np5744-xfsprogs-6.4.0-bin/bin:/nix/store/p1k3wkjkd84g980rf0ryzxzaxsr79w6l-dosfstools-4.2/bin:/nix/store/q8f410f6absndg70zc06mg114mx81qq3-mtools-4.0.43/bin:/nix/store/r0ak1lfz4nai2nfklm9rn2bwpik6xyfv-reiserfsprogs-3.6.27/bin:/nix/store/si1gm6gi82yvs8v6134fb6fncwdwcawz-ntfs3g-2022.10.3/bin:/nix/store/n83c11khj6dpngxkhlhv7l4scgbgmxxb-jfsutils-1.1.15/bin:/nix/store/z3dmw30pk8y4c37rg3f6mgz3fqh2xy34-f2fs-tools-1.16.0/bin:/nix/store/y97i9plwxkzvyzzj2w2bw9cggn8bz1r7-e2fsprogs-1.47.0-bin/bin:/nix/store/45y3b1inniswp5krd816s5ad98llsgvb-cifs-utils-7.0/bin:/nix/store/maswinc6x0zpa9nk737c51f6gzccjigz-btrfs-progs-6.6.3/bin:/nix/store/p1k3wkjkd84g980rf0ryzxzaxsr79w6l-dosfstools-4.2/bin:/nix/store/75jfmjkn5chksgn5y58srk5bqp47srjl-util-linux-minimal-2.39.2-bin/bin' TZDIR='/etc/zoneinfo' DefaultCPUAccounting=yes DefaultIOAccounting=yes DefaultBlockIOAccounting=yes DefaultIPAccounting=yes DefaultLimitCORE=infinitySo it got something during its generation. I’ve done updates and changed how things are populated since then. I suppose I could try again.
@ nix build '.#iron' warning: Git tree '/Users/logan/dev/proton-nix' is dirty error: build of '/nix/store/ikk3ksnm1a0fv2yn55599w0bz7mvafac-kernel-modules.drv' on 'ssh-ng://builder@linux-builder' failed: error: getting status of '/nix/store/ajpqgdpg1bhr2bl2q99p5vw66f1ri0bv-linux-6.1.78': No such file or directory error: builder for '/nix/store/ikk3ksnm1a0fv2yn55599w0bz7mvafac-kernel-modules.drv' failed with exit code 1 error: 1 dependencies of derivation '/nix/store/byq6aylsvxzn8sp1xh98h46zjf93ib62-nixos-system-iron-24.05.20240218.b98a4e1.drv' failed to build error: 1 dependencies of derivation '/nix/store/wjal9cfwl2322p485m8sgaax34vsgsj1-ext4-fs.img.zst.drv' failed to build error: 1 dependencies of derivation '/nix/store/82j5rvzpn2j5y09a65qlakmc7la9x8d5-nixos-sd-image-24.05.20240218.b98a4e1-aarch64-linux.img.drv' failed to buildWell, I broke it. I wish I had some indication as to how I’ve broken it. A mystery to me with Nix is trying to debug broken derivations. I know there’s a log, but oftentimes the log is exactly what I see here.
That said, I learned I can repeat the -v argument several times, the way ssh allows. So I can do -vvvv for a fourth level of verbosity. With -vvvv, it’s an enormous spew, and still not enough apparently:
<huge snip>
building of '/nix/store/yc4imxjj33scgyk0h1ajhdbs3ax0wsvg-zpool-sync-shutdown.drv^out' from .drv file: woken up
building of '/nix/store/9q7f15hbrcs8a4kdgqvg5ricv1rj57w1-sshd.conf-settings.drv^out' from .drv file: trying to build
locking path '/nix/store/diwbqs556c8wkc0jfb89pz4i3h4234c5-sshd.conf-settings'
lock acquired on '/nix/store/diwbqs556c8wkc0jfb89pz4i3h4234c5-sshd.conf-settings.lock'
removing invalid path '/nix/store/diwbqs556c8wkc0jfb89pz4i3h4234c5-sshd.conf-settings'
considering building on remote machine 'ssh-ng://builder@linux-builder'
hook reply is 'postpone'
wait for a while
lock released on '/nix/store/diwbqs556c8wkc0jfb89pz4i3h4234c5-sshd.conf-settings.lock'
building of '/nix/store/yc4imxjj33scgyk0h1ajhdbs3ax0wsvg-zpool-sync-shutdown.drv^out' from .drv file: trying to build
locking path '/nix/store/8sb3md4gcjkpnyjvlnd07g3k8gvlv8a3-zpool-sync-shutdown'
lock acquired on '/nix/store/8sb3md4gcjkpnyjvlnd07g3k8gvlv8a3-zpool-sync-shutdown.lock'
removing invalid path '/nix/store/8sb3md4gcjkpnyjvlnd07g3k8gvlv8a3-zpool-sync-shutdown'
considering building on remote machine 'ssh-ng://builder@linux-builder'
hook reply is 'postpone'
wait for a while
lock released on '/nix/store/8sb3md4gcjkpnyjvlnd07g3k8gvlv8a3-zpool-sync-shutdown.lock'
waiting for the upload lock to 'ssh-ng://builder@linux-builder'...
copying dependencies to 'ssh-ng://builder@linux-builder'...
querying info about missing paths...
copying 0 paths...
querying info about missing paths...
killing process 2938
error: build of '/nix/store/ikk3ksnm1a0fv2yn55599w0bz7mvafac-kernel-modules.drv' on 'ssh-ng://builder@linux-builder' failed: error: getting status of '/nix/store/ajpqgdpg1bhr2bl2q99p5vw66f1ri0bv-linux-6.1.78': No such file or directory
<lesser but still huge snip>
If I go up to 6… I see no difference in output.
I did find Substituters on a remote builder for a flake causes build to fail with “no such file” which seems very similar to my problem, but there’s no assistance there, nor are there any clues as to permutations I can try. I followed the post to nixpkgs#126141 and read through it. It sounds fixed, kind of? I switched up my search terms a little and found Nix flakes /nix/store/***-source no such file or directory, where the fix for someone was a cleaning of the Nix store (with the posted command). I tried it out:
[logan@nixos:~]$ sudo nix-store --repair --verify --check-contents
[sudo] password for logan:
reading the Nix store...
checking path existence...
path '/nix/store/99rfnrws0zz4bv1m2c7favncqd2archk-kernel-modules' disappeared, but it still has valid referrers!
copying path '/nix/store/99rfnrws0zz4bv1m2c7favncqd2archk-kernel-modules' from 'https://cache.nixos.org'...
path '/nix/store/ajpqgdpg1bhr2bl2q99p5vw66f1ri0bv-linux-6.1.78' disappeared, but it still has valid referrers!
copying path '/nix/store/ajpqgdpg1bhr2bl2q99p5vw66f1ri0bv-linux-6.1.78' from 'https://cache.nixos.org'...
...
Okay, those two paths are where I was seeing problems! This seems very promising. Then I see:
error: cannot repair path '/nix/store/ib6nig1xpkb975mqrqbsg1sfj1x2lind-nix-store-image' path '/nix/store/kc9qgi31ii73064bpg8x95vmbhs2fqcv-login.pam' was modified! expected hash 'sha256:033bsp8yfri5vsja1ncj5avb07010w6nz5bw0kaid821b0jhwlbq', got 'sha256:0ip26j2h11n1kgkz36rl4akv694yz65hr72q4kv4b3lxcbi65b3p' copying path '/nix/store/kc9qgi31ii73064bpg8x95vmbhs2fqcv-login.pam' from 'https://cache.nixos.org'... path '/nix/store/myfrx2c9f91c41wd9yrg0c2z9d85qhjs-nix-store-image' was modified! expected hash 'sha256:0dykdqm04pcnmmp1k72vg96i56bi3041vxlpi4q6kmyqxa8db40p', got 'sha256:1bpn28kc9n32q1p59q1b6rsfngma4mjsnfxs0hqags410xh2s6zk' error: cannot repair path '/nix/store/myfrx2c9f91c41wd9yrg0c2z9d85qhjs-nix-store-image' path '/nix/store/vgh5kp5gc9zrxzm5pzq7mpyyd721iqdd-nix-store-image' was modified! expected hash 'sha256:09qsrvbc39vsb74svbfjnl6wwnvhawsj30y05d10w4h1svav0gkv', got 'sha256:0y37psc8sm48xvk5ral209801hhpm1lrd5bdcbwcala2kagkhmjs' error: cannot repair path '/nix/store/vgh5kp5gc9zrxzm5pzq7mpyyd721iqdd-nix-store-image' path '/nix/store/zdfn2mgvsyh7prh5d9bxgsvyi94mvsm8-nix-store-image' was modified! expected hash 'sha256:0anxffszglan61ys2wfj127w9z6gvp5c2k3cfripw6javm55n4w5', got 'sha256:1nkasy3fmmdxsyvbzq0fii2h7r985fmsbi1b05srik88jpw089iy' error: cannot repair path '/nix/store/zdfn2mgvsyh7prh5d9bxgsvyi94mvsm8-nix-store-image' warning: not all store errors were fixedUh oh. Per /nix/store corrupted, I tried a
nix-collect-garbage (I have no nixos-rebuild on my PATH for the VM). That freed some 10GB or so of data. Then I ran the repair again, and it came back clean:
[logan@nixos:~]$ sudo nix-store --repair --verify --check-contents
reading the Nix store...
checking path existence...
checking link hashes...
checking store hashes...
And then, about 5 minutes later, the build succeeds!
$ nix build '.#iron' warning: Git tree '/Users/logan/dev/proton-nix' is dirtyOkay so now I have a baseline, working image, for a different system.
$ ls -alh result lrwxr-xr-x 1 logan 99 Feb 24 17:32 result -> /nix/store/pxp6pv1mc71p6v65xz4kbivsxw1ry2hj-nixos-sd-image-24.05.20240218.b98a4e1-aarch64-linux.img $ ls -alh $(readlink -f result)/sd-image total 931M dr-xr-xr-x 3 root 96 Dec 31 1969 . dr-xr-xr-x 4 root 128 Dec 31 1969 .. -r--r--r-- 1 root 931M Dec 31 1969 nixos-sd-image-24.05.20240218.b98a4e1-aarch64-linux.img.zstNot bad - just a gig. But it’s for the wrong system! If I attempt to build
lithium again, I get the same issue.

Ugh, I’ve been ignoring this issue, primarily because I saw it spamming the build logs but everything seemed fine with it:
qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20}
But I guess this is a real error. I don’t know why it was showing up for other things or how it could’ve impacted results there. My first find is nixpkgs#69158 wherein they say set virtualisation.graphics = false but I don’t know where it goes. Also the issue appears to no longer be reproducible. Out of desperation, I ran nix flake update on both systems. Then I realized that might not be the best thing to do, since I didn’t check to see if these two repositories even shared the same nixpkgs - they don’t. I put them both on master. proton-nix, my network repository I’m building here, was the out-of-date one.

Now I get:
@ nix build '.#lithium' warning: Git tree '/Users/logan/dev/proton-nix' is dirty error: build of '/nix/store/cdklsyyhal5fa920yvm0r9wzk4lllrr8-lazy-options.json.drv' on 'ssh-ng://builder@linux-builder' failed: builder for '/nix/store/cdklsyyhal5fa920yvm0r9wzk4lllrr8-lazy-options.json.drv' failed with exit code 1; last 7 log lines: > qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20} > /build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 26: 10 Segmentation fault (core dumped) /nix/store/62fm7r6175k2bgvhc19hn9bdhw90wa3n-nix-2.18.1/bin/nix-instantiate --show-trace --eval --json --strict --argstr libPath "$libPath" --argstr pkgsLibPath "$pkgsLibPath" --argstr nixosPath "$nixosPath" --arg modules "import $modulesPath" --argstr stateVersion "24.05" --argstr release "24.05" $nixosPath/lib/eval-cacheable-options.nix > $out > Cacheable portion of option doc build failed. > Usually this means that an option attribute that ends up in documentation (eg `default` or `description`) depends on the restricted module arguments `config` or `pkgs`. > > Rebuild your configuration with `--show-trace` to find the offending location. Remove the references to restricted arguments (eg by escaping their antiquotations or adding a `defaultText`) or disable the sandboxed build for the failing module by setting `meta.buildDocsInSandbox = false`. > For full logs, run 'nix-store -l /nix/store/cdklsyyhal5fa920yvm0r9wzk4lllrr8-lazy-options.json.drv'. 
error: builder for '/nix/store/cdklsyyhal5fa920yvm0r9wzk4lllrr8-lazy-options.json.drv' failed with exit code 1 error: 1 dependencies of derivation '/nix/store/wj3kla5s3ag3x9vfkl7pb18w313z3n6d-options.json.drv' failed to build error: 1 dependencies of derivation '/nix/store/pwkwhixf6wxpgqlhk9x8q873a1k1swsx-nixos-configuration-reference-manpage.drv' failed to build error: 1 dependencies of derivation '/nix/store/igh3ckyrvacj4f5k3s9liclgci90wbnj-nixos-manual-html.drv' failed to build error: 1 dependencies of derivation '/nix/store/29l475x5bsgvlgbxdlsd3hldl7lnjc84-system-path.drv' failed to build error: 1 dependencies of derivation '/nix/store/r7l7nrdzfcxbn8rb4ygzzmbbdd8fkfbd-nixos-system-lithium-24.05.20240225.72804e7.drv' failed to build error: 1 dependencies of derivation '/nix/store/7qxsqifjb21cg0wjyh2ng86hcs2m169m-closure-info.drv' failed to build error: 1 dependencies of derivation '/nix/store/qwbg901jihiknccrslsmrm53dbv7l06d-efi-directory.drv' failed to build error: 1 dependencies of derivation '/nix/store/8fmi2kqr0mzilq8bamfmafxk255b3qcc-isolinux.cfg-in.drv' failed to build error: 1 dependencies of derivation '/nix/store/bwwqkqd3mmfl81pclgafjr5wln4i3s44-nixos.iso.drv' failed to buildI don’t know if this is further or not, but I take it to be progress. This is the second time I’ve come across suggestions that
callPackagemagically does the thing. I did try refactoring to that earlier but ran into trouble with dependency injection and passing variables around. I’ll have to try again. Here’s the quick refactor:packages.aarch64-darwin.iron = (pkgs.callPackage ./iron.nix { inherit nixos-generators self; }); packages.aarch64-darwin.lithium = (pkgs.callPackage ./lithium.nix { inherit nixos-generators self; });And the before, for reference:
packages.aarch64-darwin.iron = (pkgs.callPackage ./iron.nix { inherit nixos-generators self; }); packages.aarch64-darwin.lithium = (pkgs.callPackage ./lithium.nix { inherit nixos-generators self; });And magically it does! Also I had no trouble with the refactor this time. I don’t know what the difference is in my attempts. Maybe the
inherit? I sawlazy-optionsfly by in the build output. It’s not something I can capture - Nix just rewrites the line. It’s taking a long time onxscreensaverand its dependencies, which I find a bit puzzling but I’m willing to let it go. It got past that! Sorry but we’re doing the play by play. Okay I’m wrong. I’m going to stop watching this pot come to a boil.@ nix build '.#lithium' warning: Git tree '/Users/logan/dev/proton-nix' is dirty error: build of '/nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv' on 'ssh-ng://builder@linux-builder' failed: builder for '/nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv' failed with exit code 139; last 10 log lines: > patching /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/libattr.so.1... > patching /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/libresolv.so.2... > patching /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/libcrypto.so.3... > patching /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/libdl.so.2... > patching /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/libpam.so.0... > patching /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/libgcrypt.so.20... > testing patched programs... > qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20} > /build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 113: 14022 Done $out/bin/ash -c 'echo hello world' > 14023 Segmentation fault (core dumped) | grep "hello world" For full logs, run 'nix-store -l /nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv'. 
error: builder for '/nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv' failed with exit code 1 error: 1 dependencies of derivation '/nix/store/h5amvj1wmvy9hq76hb3wxmj5ds5yid74-stage-1-init.sh.drv' failed to build error: 1 dependencies of derivation '/nix/store/w34whiss31s2wi23pjfp5kihk4vbhm42-initrd-linux-6.1.79.drv' failed to build error: 1 dependencies of derivation '/nix/store/bwwqkqd3mmfl81pclgafjr5wln4i3s44-nixos.iso.drv' failed to buildThe dreaded
qemu SIGSEGV again - I don’t know whether QEMU just prints that whenever anything it’s emulating hits an error. That the build gets quite far and is clearly running things, but then suddenly doesn’t, tells me that something fishy is going on. For funsies I ran it again. It takes a couple of minutes doing things, but fails in the same place. nixpkgs#60088 states I can run coredumpctl with no special privileges to see recent core dumps. And it works! It’s a big list but here’s the latest:
Sun 2024-02-25 02:30:33 UTC 1285 30001 30000 SIGSEGV none /nix/store/ysikdn59kp6fmn2gh935shgx8ma8cr1l-qemu-8.2.1/bin/qemu-x86_64 -
Sun 2024-02-25 02:40:05 UTC 18915 30001 30000 SIGSEGV none /nix/store/ysikdn59kp6fmn2gh935shgx8ma8cr1l-qemu-8.2.1/bin/qemu-x86_64 -
Sun 2024-02-25 02:49:50 UTC 33061 30001 30000 SIGSEGV none /nix/store/ysikdn59kp6fmn2gh935shgx8ma8cr1l-qemu-8.2.1/bin/qemu-x86_64 -
This doesn’t tell me much on its own. This is all new to me, so someone with better operations chops will probably see this as boring.

ulimit -c is recommended to see if the core dumps would get clipped from a lack of size.
[logan@nixos:~]$ ulimit -c
unlimited
So we’re good there. The none above is referencing the core file, which is discouraging. But running coredumpctl info with no arguments seems to pick up the prior core dump.
[logan@nixos:~]$ coredumpctl info
PID: 33061 (qemu-x86_64)
UID: 30001 (nixbld1)
GID: 30000 (nixbld)
Signal: 11 (SEGV)
Timestamp: Sun 2024-02-25 02:49:50 UTC (10min ago)
Command Line: /nix/store/ysikdn59kp6fmn2gh935shgx8ma8cr1l-qemu-8.2.1/bin/qemu-x86_64 -0 grep -- /nix/store/11b3chszacfr9liy829xqknzp3q88iji-gnugrep-3.11/bin/grep $'hello world'
Executable: /nix/store/ysikdn59kp6fmn2gh935shgx8ma8cr1l-qemu-8.2.1/bin/qemu-x86_64
Control Group: /system.slice/nix-daemon.service
Unit: nix-daemon.service
Slice: system.slice
Boot ID: b5f8013281994021ae0eee3327c3ea65
Machine ID: 7a2b258dc0504f08aad6645b40de04bf
Hostname: localhost
Storage: none
Message: Process 33061 (qemu-x86_64) of user 30001 terminated abnormally without generating a coredump.
“Something crashed” isn’t very helpful. I can see how this would be helpful in other circumstances, but not my current one.
I can reproduce the segfault through my SSH session on the VM directly:
[logan@nixos:~]$ /nix/store/11b3chszacfr9liy829xqknzp3q88iji-gnugrep-3.11/bin/grep --help qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20} Segmentation fault (core dumped)Since I got the file there, let’s take a look at it, but line wrapped:
[logan@nixos:~]$ file /nix/store/11b3chszacfr9liy829xqknzp3q88iji-gnugrep-3.11/bin/grep /nix/store/11b3chszacfr9liy829xqknzp3q88iji-gnugrep-3.11/bin/grep: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/8mc30d49ghc8m5z96yz39srlhg5s9sjj-glibc-2.38-44/lib/ld-linux-x86-64.so.2, for GNU/Linux 3.10.0, not strippedNow let’s compare it to our pre-installed
hellopackage from earlier:[logan@nixos:~]$ file $(readlink -f $(which hello)) /nix/store/x42qkfvxxy17d2vk39010fcwacv5fb6j-hello-x86_64-unknown-linux-gnu-2.12.1/bin/hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/dkhhp26aj1s28b9hdy4y2d4qcmj1s6n5-glibc-x86_64-unknown-linux-gnu-2.38-44/lib/ld-linux-x86-64.so.2, for GNU/Linux 3.10.0, not strippedLet’s lay these one atop another so we can see how they differ, sans paths:
grep - ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/8mc30d49ghc8m5z96yz39srlhg5s9sjj-glibc-2.38-44/lib/ld-linux-x86-64.so.2, for GNU/Linux 3.10.0, not stripped hello - ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/dkhhp26aj1s28b9hdy4y2d4qcmj1s6n5-glibc-x86_64-unknown-linux-gnu-2.38-44/lib/ld-linux-x86-64.so.2, for GNU/Linux 3.10.0, not strippedThe material difference is:
grep - /nix/store/8mc30d49ghc8m5z96yz39srlhg5s9sjj-glibc-2.38-44/lib/ld-linux-x86-64.so.2
hello - /nix/store/dkhhp26aj1s28b9hdy4y2d4qcmj1s6n5-glibc-x86_64-unknown-linux-gnu-2.38-44/lib/ld-linux-x86-64.so.2
So they are different builds of glibc (both are 2.38-44, but they live at different store paths and only one carries the x86_64-unknown-linux-gnu infix), which seems wrong. I’ve seen this kind of almost-platform notation of x86_64-unknown-linux-gnu before. I recall seeing some documentation about it, but I didn’t walk away knowing when it would come up. Apparently that time is now.

hello comes from this nixpkgs:
linux-builder-pkgs = import nixpkgs {
  system = "aarch64-linux";
  crossSystem = {
    config = "x86_64-unknown-linux-gnu";
  };
};
So it makes sense that the x86_64-unknown-linux-gnu version of glibc was used.

Let’s modify the
callPackageinvocation to take a similar set of configured packages. Before I had:packages.aarch64-darwin.lithium = (pkgs.callPackage ./lithium.nix { inherit nixos-generators self; });Now I have:
packages.aarch64-darwin.lithium = (pkgs.callPackage ./lithium.nix (let linux-builder-pkgs = import nixpkgs { system = "aarch64-linux"; crossSystem = { config = "x86_64-unknown-linux-gnu"; }; }; in { inherit nixos-generators self; nixpkgs = linux-builder-pkgs; }));The
nixpkgspart is a guess, because I think that’s how it gets imported. When I run with this, I get the same error and also it just jumps in trying to build the same broken derivation as before. I would expect this kind of change to be more fundamental and require many more packages to be rebuilt. I did some digging, because I don’t just want to guess. I found an example in nixos-generators#172 which usespkgsso I tried that, but still no joy. I don’t think I’m providing what’s needed here, even if I feel like I’m on the right track.The pull request related to the ticket makes me think I’m actually not on the right track, because it looks like setting
pkgswas removed from examples and it usessystem. So this should all be taken care of already. Why isn’t it then? I came across nixos-generators#257 which looks similar to my situation, but I’m already subscribed to it.Per nixos-generators#202 I have:
[logan@nixos:~]$ command cat /proc/sys/fs/binfmt_misc/x86_64-linux enabled interpreter /run/binfmt/x86_64-linux flags: P offset 0 magic 7f454c4602010100000000000000000002003e00 mask fffffffffffefe00fffffffffffffffffeffffff [logan@nixos:~]$ command cat /proc/sys/fs/binfmt_misc/i686-linux enabled interpreter /run/binfmt/i686-linux flags: P offset 0 magic 7f454c4601010100000000000000000002000600 mask fffffffffffefe00fffffffffffffffffeffffffThis also appears healthy:
[logan@nixos:~]$ sudo systemctl status systemd-binfmt
[sudo] password for logan:
● systemd-binfmt.service - Set Up Additional Binary Formats
Loaded: loaded (/etc/systemd/system/systemd-binfmt.service; enabled; preset: enabled)
Drop-In: /nix/store/z6fx9sd33cbhr5q8dzj551vs24j20lhv-system-units/systemd-binfmt.service.d
└─overrides.conf
Active: active (exited) since Sun 2024-02-25 02:29:54 UTC; 1h 20min ago
Docs: man:systemd-binfmt.service(8)
man:binfmt.d(5)
https://docs.kernel.org/admin-guide/binfmt-misc.html
https://www.freedesktop.org/wiki/Software/systemd/APIFileSystems
Process: 586 ExecStart=/nix/store/m3snx62c90imgqqh2axpba6yvc3ycw9b-systemd-255.2/lib/systemd/systemd-binfmt (code=exited, status=0/SUCCESS)
Main PID: 586 (code=exited, status=0/SUCCESS)
IP: 0B in, 0B out
CPU: 3ms
Feb 25 02:29:54 nixos systemd[1]: Starting Set Up Additional Binary Formats...
Feb 25 02:29:54 nixos systemd[1]: Finished Set Up Additional Binary Formats.
Well, now I’ve gone full circle back to Ian’s blog which inspired me to write all of this (even if it was a different post). Ian even breaks down a bit of the main Cross Compilation document that seems both very detailed and very beyond me. His breakdown did help a bit. I now better understand the “magic” behind callPackage and its role in this whole cross compilation ordeal.

I think I might need to sit on this a bit. There are a few ideas that are bouncing around in my head:
- I might be assuming that nixos-generate is doing everything correctly but it might not be. There are some details hidden from me and I should check up on it. It might also just help me to understand the topic better.
- I could try more permutations of setting pkgs and nixpkgs until I get what I want. I can visit more platform things as well.
- I don’t understand why I’ve gotten far with some packages but not others. A quick wc -l on extra-utils/bin shows that grep doesn’t come first and there are 400+ executables in there. Why grep? What made it not get compiled correctly? What made the other packages get compiled correctly? Perhaps this is part of the “things needing to be cleaned up” that Ryan’s blog / cross compilation document alludes to. It could also be that grep is the first one that has its own tests, which is where this fails.
  [logan@nixos:~]$ ls /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin | wc -l
  424
- How can I do the equivalent of --arg crossSystem with Nix Flake? My understanding is that any nix build invocation using --arg is rejected or ignored because it’s impure. I’ve had no luck on my searches. This might require spelunking in nixpkgs. If I can see how it’s set or where it’s read from, I could know where to put it.
  a. I’ve had this complaint before, but now I put it in writing: Nix and its coconspirators (such as nixpkgs, Flakes, etc.) really need to have a formal schema or type signature for their main means of consumption. I understand that some stuff is custom, or that other libraries may append to the schema, but that’s frankly a poor excuse not to have one. I may document it just because I’m so very tired of searching for “nix configuration.nix example” and turning up empty. It would probably require some immense searching on my part because I really don’t know what it is. Maybe I can rely upon Cunningham’s Law by saying I have the right schema and others will smugly correct me. But I will be the smug one in the end because I laid a Cunningham Trap - a term I just coined.
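One way to chase the “Why grep?” question from the list above is to look at what each entry in extra-utils/bin actually resolves to, rather than only counting entries. This is a minimal sketch, assuming a hypothetical helper name (`list_link_targets`) and that `readlink -f` is available on the builder (it has worked there in earlier commands):

```shell
#!/bin/sh
# Hypothetical helper: for every entry in DIR, print "name -> target",
# where target is the basename of the file the entry finally resolves to.
list_link_targets() {
  for entry in "$1"/*; do
    # Skip the unexpanded glob when DIR is empty.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    target=$(readlink -f "$entry")
    printf '%s -> %s\n' "$(basename "$entry")" "$(basename "$target")"
  done
}

# When run as a script, inspect the directory given as the first argument.
if [ -n "${1:-}" ]; then
  list_link_targets "$1"
fi
```

Saved under a made-up name such as list-targets.sh, it could be pointed at /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin, and piping the output through `sort -k3` would group entries that share a target.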
I can run other commands in there, such as
mvandvi:[logan@nixos:~]$ /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/mv --help BusyBox v1.36.1 () multi-call binary. Usage: mv [-finT] SOURCE DEST or: mv [-fin] SOURCE... { -t DIRECTORY | DIRECTORY } Rename SOURCE to DEST, or move SOURCEs to DIRECTORY -f Don't prompt before overwriting -i Interactive, prompt before overwrite -n Don't overwrite an existing file -T Refuse to move if DEST is a directory -t DIR Move all SOURCEs into DIR [logan@nixos:~]$ /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/vi --help BusyBox v1.36.1 () multi-call binary. Usage: vi [-c CMD] [-R] [-H] [FILE]... Edit FILE -c CMD Initial command to run ($EXINIT and ~/.exrc also available) -R Read-only -H List available featuresThe
BusyBox v1.36.1 () multi-call binary. is an interesting aspect to this. grep is under gnugrep and not directly under extra-utils. Also this is surprising to me:
[logan@nixos:~]$ readlink -f /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/mv
/nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/busybox
[logan@nixos:~]$ readlink -f /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/vi
/nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/busybox
This leads me to think that there’s this binary wad of sorts called busybox that looks at $0 and then figures out what executable it really wants to invoke, which seems pretty weird to me. That said, much of this stuff looks like shell built-ins so I guess that makes a little bit of sense. Meanwhile, grep is under gnugrep and thus doesn’t get the same benefit.
[logan@nixos:~]$ file $(readlink -f /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/vi)
/nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/busybox: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/ld-linux-x86-64.so.2, BuildID[sha1]=c44b890a362e4ca6825d6828834a79dd1d9120c7, for GNU/Linux 3.10.0, stripped
To compare the three now:
vi - /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/ld-linux-x86-64.so.2 grep - /nix/store/8mc30d49ghc8m5z96yz39srlhg5s9sjj-glibc-2.38-44/lib/ld-linux-x86-64.so.2 hello - /nix/store/dkhhp26aj1s28b9hdy4y2d4qcmj1s6n5-glibc-x86_64-unknown-linux-gnu-2.38-44/lib/ld-linux-x86-64.so.2So
extra-utils is fine but glibc (no suffix) is not.

I did some searching on crossSystem and it looks like it really can just be an attribute passed when creating pkgs. I’ve come to understand that nixpkgs is the generic, big blob of packages, whereas pkgs is the ready-to-use and configured instance of a nixpkgs. As such I’ve brought this back, with the added linux-builder-pkgs.pkgsCross.gnu64, which matches the known, working configuration used in linux-builder to cross-compile and run the hello package there.
packages.aarch64-darwin.lithium = (pkgs.callPackage ./lithium.nix (let
  linux-builder-pkgs = import nixpkgs {
    system = "x86_64";
    crossSystem = {
      config = "x86_64-unknown-linux-gnu";
    };
  };
in {
  inherit nixos-generators self;
  pkgs = linux-builder-pkgs.pkgsCross.gnu64;
}));
From that I get:
@ nix build '.#lithium' warning: Git tree '/Users/logan/dev/proton-nix' is dirty error: build of '/nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv' on 'ssh-ng://builder@linux-builder' failed: builder for '/nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv' failed with exit code 139; last 10 log lines: > patching /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/libattr.so.1... > patching /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/libresolv.so.2... > patching /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/libcrypto.so.3... > patching /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/libdl.so.2... > patching /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/libpam.so.0... > patching /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/libgcrypt.so.20... > testing patched programs... > qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20} > /build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 113: 14022 Done $out/bin/ash -c 'echo hello world' > 14023 Segmentation fault (core dumped) | grep "hello world" For full logs, run 'nix-store -l /nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv'. error: builder for '/nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv' failed with exit code 1 error: 1 dependencies of derivation '/nix/store/h5amvj1wmvy9hq76hb3wxmj5ds5yid74-stage-1-init.sh.drv' failed to build error: 1 dependencies of derivation '/nix/store/w34whiss31s2wi23pjfp5kihk4vbhm42-initrd-linux-6.1.79.drv' failed to build error: 1 dependencies of derivation '/nix/store/bwwqkqd3mmfl81pclgafjr5wln4i3s44-nixos.iso.drv' failed to buildOkay back to this error again. Here’s what’s funny: I can find and run
grepnow:[logan@nixos:~]$ file $(readlink -f /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/grep) /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/busybox: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/ld-linux-x86-64.so.2, BuildID[sha1]=c44b890a362e4ca6825d6828834a79dd1d9120c7, for GNU/Linux 3.10.0, stripped [logan@nixos:~]$ /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/grep --help BusyBox v1.36.1 () multi-call binary. Usage: grep [-HhnlLoqvsrRiwFE] [-m N] [-A|B|C N] { PATTERN | -e PATTERN... | -f FILE... } [FILE]... Search for PATTERN in FILEs (or stdin) -H Add 'filename:' prefix -h Do not add 'filename:' prefix -n Add 'line_no:' prefix -l Show only names of files that match -L Show only names of files that don't match -c Show only count of matching lines -o Show only the matching part of line -q Quiet. Return 0 if PATTERN is found, 1 otherwise -v Select non-matching lines -s Suppress open and read errors -r Recurse -R Recurse and dereference symlinks -i Ignore case -w Match whole words only -x Match whole lines only -F PATTERN is a literal (not regexp) -E PATTERN is an extended regexp -m N Match up to N times per file -A N Print N lines of trailing context -B N Print N lines of leading context -C N Same as '-A N -B N' -e PTRN Pattern to match -f FILE Read pattern from fileAnd it’s using the
busyboxstuff, same as the other things in this package.The failure with
ash -c 'echo hello world'is worth a look.[logan@nixos:~]$ ls /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/ash /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/ash [logan@nixos:~]$ /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/ash ~ $ [logan@nixos:~]$ /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/ash -c 'echo hello world' hello world [logan@nixos:~]$ file $(readlink -f /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/ash) /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/busybox: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/lib/ld-linux-x86-64.so.2, BuildID[sha1]=c44b890a362e4ca6825d6828834a79dd1d9120c7, for GNU/Linux 3.10.0, strippedSo I can enter the
ash shell, and also run the exact same thing as the test and it works just fine. Is it possible I’m not running the same ash? This one is also running busybox, so I shouldn’t be surprised that it also works. I found the offending line in nixpkgs/nixos/modules/system/boot/stage-1.nix. Oh, I forgot the grep that’s part of the same test line. Okay let’s try that all together:

[logan@nixos:~]$ /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/ash -c 'echo hello world' | \
  /nix/store/r5y3v5cw94yrxks70j140k01na6ksxqa-extra-utils/bin/grep "hello world"
hello world

We’re still functional here. I tried the build again with
--keep-failed and --debug but I can’t see that the build was retained anywhere. I did try to find the derivation file on linux-builder and came up empty:

[logan@nixos:~]$ ls /nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv
ls: cannot access '/nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv': No such file or directory

But on my macOS host that runs the builder VM:
~/dev/proton-nix on main|✚9?5 logan@scandium 1 [13:22:19] 149s
$ ls /nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv
/nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv

Now wait a second. This should be on
linux-builder. Why is it here? Highlighting the first line from the error:

error: build of '/nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv' on 'ssh-ng://builder@linux-builder' failed: builder for '/nix/store/84gjlw62njz86cbdq3gvdnhjbj6363rj-extra-utils.drv' failed with exit code 139;

It says it is building
extra-utils on linux-builder. But extra-utils is a vast bundle of other packages. Perhaps some of the packages got built on macOS?

I can’t really confirm that in any way. I know the derivation information is just this big blob of attributes and has not necessarily been realized yet, so seeing the
.drv on my macOS store and not on linux-builder isn’t necessarily a smoking gun.

I’ve been flailing at this point, and haven’t documented all of the dead ends I’ve tried. I found Cross Build x86_64-ami on aarch64 using nixos-generators which points to a
make-build-image. I confirmed that nixos-generators is using it, but under some more scrutiny I noticed that iso is not covered via make-build-image, so I decided to change my format to raw and try again. Now I’m back at the empty system.conf error from before. But this time I have a debug output and did some inspecting. Sure enough, extra-utils is built successfully. So having this empty system.conf file is preferable to the prior error, probably.

I have to admit that I’m really exhausted at this point. I feel persistent stress. I’ve sunk many hours into this over the course of many weeks. External situations are becoming more demanding of my attention. I just want to move on, and I’m considering abandoning this course. I feel like I’ve seen other “working” configurations out there which must solve this problem somehow, but I’m also starting to think that everyone just bootstraps their system with the NixOS installer and then goes from there. I did this with the Raspberry Pi though, so why I can’t do it here is flabbergasting.
Okay, that’s my emotional dump. Let’s re-center and consider how to debug this problem with the
system.conf.

In the
make-dbus-conf.nix file, I see XSLT - XML based XML transformers. Ugh. Looking back at the error:

qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20}
/build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 16: 28 Segmentation fault (core dumped) grep -q '[^[:space:]]' "$out/system.conf"
"/nix/store/87r0l2r106hw8q7wa94klff0i809yx3v-dbus-1/system.conf" was generated incorrectly and is empty, try building again.

There’s a segfault - not “
grep returned non-zero”. We got a core dump. I don’t recall if it’s the same core dump I was trying to view before. I see:

PID: 140517 (qemu-x86_64)
UID: 30001 (nixbld1)
GID: 30000 (nixbld)
Signal: 11 (SEGV)
Timestamp: Sun 2024-02-25 07:45:42 UTC (16min ago)
Command Line: /nix/store/ysikdn59kp6fmn2gh935shgx8ma8cr1l-qemu-8.2.1/bin/qemu-x86_64 -0 grep -- /nix/store/11b3chszacfr9liy829xqknzp3q88iji-gnugrep-3.11/bin/grep -q $'[^[:space:]]' /nix/store/87r0l2r106hw8q7wa94klff0i809yx3v-dbus-1/system.conf
Executable: /nix/store/ysikdn59kp6fmn2gh935shgx8ma8cr1l-qemu-8.2.1/bin/qemu-x86_64
Control Group: /system.slice/nix-daemon.service
Unit: nix-daemon.service
Slice: system.slice
Boot ID: b5f8013281994021ae0eee3327c3ea65
Machine ID: 7a2b258dc0504f08aad6645b40de04bf
Hostname: localhost
Storage: none
Message: Process 140517 (qemu-x86_64) of user 30001 terminated abnormally without generating a coredump.

Note,
less chopped off the word, and my typical muscle memory fails me there. But typing -S causes it to wrap the lines and I can see the full, exploded command now. The command checks out, but that could’ve been concealing some issue.

[logan@nixos:~]$ /nix/store/11b3chszacfr9liy829xqknzp3q88iji-gnugrep-3.11/bin/grep --help
qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20}
Segmentation fault (core dumped)

Oh look, it’s a
grep I can’t run. Again.

[logan@nixos:~]$ file /nix/store/11b3chszacfr9liy829xqknzp3q88iji-gnugrep-3.11/bin/grep
/nix/store/11b3chszacfr9liy829xqknzp3q88iji-gnugrep-3.11/bin/grep: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/8mc30d49ghc8m5z96yz39srlhg5s9sjj-glibc-2.38-44/lib/ld-linux-x86-64.so.2, for GNU/Linux 3.10.0, not stripped

With the busted
glibc. So I read the error wrong initially. It’s not that the file is empty - well I don’t know if it’s empty but I suspect it’s not. The current problem is that grep segfaults. I’m still flailing. Adding GC_DONT_GC=1 to the nix build invocation per nix#4246 does nothing observably different.

The file is
/pkgs/os-specific/linux/minimal-bootstrap/gnugrep/default.nix. The “minimal bootstrap” part is interesting. The commits adding this file don’t really tell me anything about it, and there are no comments. I imagine it was added for a reason, but for all I know that reason was not a good one. It could be superstition. It could be critical. Leaving behind a lack of documentation is akin to leaving behind a minefield whose mine locations are undocumented. The only way to clear them is to have something step on them. I shall provide that foot today by replacing this mini-grep with the proper one.

The parent has a comment at least:
# Prevent using top-level attrs to protect against introducing dependency on
# non-bootstrap packages by mistake. Any top-level inputs must be explicitly
# declared here.

I also see:
gcc-latest = callPackage ./gcc/latest.nix {
  gcc = gcc8;
  gnumake = gnumake-musl;
  gnutar = gnutar-latest;
  # FIXME: not sure why new gawk doesn't work
  gawk = gawk-mes;
};

Repeated in a few places. In what way does “new
gawk” not work? No additional information - the commit body is empty, with the title being:

minimal-bootstrap.gcc-latest: init at 13.2.0
And that doesn’t help me. I understand this is probably part of a greater chain of commits but I haven’t had a chance to chase all of it down. There’s just so much information out there and it’s not strung together nicely for an outsider like myself to piece it all together. But, I am getting there.
I do feel like this is getting closer to the issue though.
glibc is different because it’s using this minimal version. Inspecting the default.nix for this minimal-bootstrap gnugrep, I see it’s using something called tinycc-mes. I know that musl is for the musl-libc and that’s the fancy new libc replacement that is super tiny and used by Alpine. Some quick searching indicates that GNU Mes is a “Scheme interpreter and C compiler for bootstrapping the GNU System”. Okay that all makes sense to me. But I don’t want it. I already have a fork of nixpkgs. I want to feed it the same glibc my other grep is getting, bloat be damned. I don’t care if I have to literally build everything from scratch so long as it works.

I spent some time trying to figure out how to point my local Flake at my local
nixpkgs and eventually came across this Reddit comment with the path:/foo/bar notation. Now I have:

nixpkgs.url = "path:/Users/logan/dev/nixpkgs";

This works with
nix flake update but not the actual build:

setting 'packages.aarch64-darwin.lithium.drvPath' to failed
error:
  … in the condition of the assert statement
    at /nix/store/syirv6wi0cyhipaxq8c47l3fvm9aqdii-source/lib/customisation.nix:267:17:
    266| in commonAttrs // {
    267| drvPath = assert condition; drv.drvPath;
       | ^
    268| outPath = assert condition; drv.outPath;
  … while calling the 'seq' builtin
    at /nix/store/syirv6wi0cyhipaxq8c47l3fvm9aqdii-source/lib/customisation.nix:58:32:
    57| newDrv = derivation (drv.drvAttrs // (f drv));
    58| in flip (extendDerivation (seq drv.drvPath true)) newDrv (
      | ^
    59| { meta = drv.meta or {};
  (stack trace truncated; use '--show-trace' to show the full trace)
error: A definition for option `environment.etc."nix/path/nixpkgs".source' is not of type `path'. Definition values:
  - In `/nix/store/wszp622jc6l3gzsj7556ny2pwcxfl2mf-source/nix-path.nix': null

I give up and just point it to the GitHub fork, and I’ll just commit+push every file change I must make. Wow, these are horrible ergonomics. I understand that this is kind of what
overlays are for, but I don’t know how to overlay something that is built to be completely independent from the rest of nixpkgs. Maybe I should try that anyways. But first, let’s get it going the hard way. After another nix flake update, I get the same error. Huh? Maybe I do need to make this into an overlay. From some quick poking around nixpkgs itself, I might’ve made that into a bigger deal than it really is. It looks like minimal-bootstrap is the package that gets added to all-packages. So it really should work as an overlay, I think.

As an aside, I’ve gone from about 100GB free on my main disk down to 11GB over the last couple of days. I’ve heard about people saying their disk space is exhausted but I hadn’t encountered it until now. I run
nix-collect-garbage and I get back some 20GB. Hmm. I can account for another 18GB from miscellaneous activities. I can breathe a little better at least. I suspect a great deal of space is going to my own local copy of nixpkgs, which is a heavy pull for git.

This is what I think will do it:
# Overlay arguments arrive in the order `final: prev:`.
final: prev: {
  minimal-bootstrap = prev.minimal-bootstrap.override {
    gnugrep = prev.callPackage ./gnugrep {
      bash = prev.minimal-bootstrap.bash_2_05;
      gnumake = prev.minimal-bootstrap.gnumake;
      tinycc = prev.tinycc-mes;
    };
  };
}

This is mostly just a copy of the
gnugrep assignment, with some prev sprinkled in there to reference back into the minimal-bootstrap package. The callPackage to ./gnugrep is unfortunate but not a huge hassle. It just needs a copy of gnugrep.nix sitting locally. I can spot that. Also I need to remember: I’ve added files to the flake, so I need to add them to git! I expect this build will take longer since I ran a nix-collect-garbage.

The result:
setting 'packages.aarch64-darwin.lithium.drvPath' to failed
error:
  … in the condition of the assert statement
    at /nix/store/776s0yzlk1mlf00xpk9cnwyjndgb1fkw-source/lib/customisation.nix:267:17:
    266| in commonAttrs // {
    267| drvPath = assert condition; drv.drvPath;
       | ^
    268| outPath = assert condition; drv.outPath;
  … while calling the 'seq' builtin
    at /nix/store/776s0yzlk1mlf00xpk9cnwyjndgb1fkw-source/lib/customisation.nix:58:32:
    57| newDrv = derivation (drv.drvAttrs // (f drv));
    58| in flip (extendDerivation (seq drv.drvPath true)) newDrv (
      | ^
    59| { meta = drv.meta or {};
  (stack trace truncated; use '--show-trace' to show the full trace)
error: A definition for option `environment.etc."nix/path/nixpkgs".source' is not of type `path'. Definition values:
  - In `/nix/store/2ids7add75kj4ldqnkk5vdjdkxpbb0h5-source/nix-path.nix': null

Now wait a minute - I thought this was from my switching to
nixpkgs. Some things that immediately spring to mind:

- My Nix store is corrupt on linux-builder, again. Easy to prove:
  [logan@nixos:~]$ nix-store --verify
  reading the Nix store...
  checking path existence...
- I’ve somehow moved past the error with my adjustments, and am onto another, real error. I can test that by removing the overlay. Removing the overlay does nothing. Removing the overlays list does nothing.
- Maybe it’s actually a legitimate error? I just don’t know what I changed to fix it. I have been running nix flake update and commits are trickling in. I suppose any of those could’ve changed things. Unfortunately I haven’t been studious about committing my flake.lock and other work, so it would be difficult to roll back to test.
Let’s go deeper into thinking this is a real error and not some transitive error. It looks like I’m getting past the
dbus-1 stuff:

checking access to '/nix/store/776s0yzlk1mlf00xpk9cnwyjndgb1fkw-source/pkgs/development/libraries/dbus/make-dbus-conf.nix'
evaluating file '/nix/store/776s0yzlk1mlf00xpk9cnwyjndgb1fkw-source/pkgs/development/libraries/dbus/make-dbus-conf.nix'
performing daemon worker op: 7
copied source '/nix/store/776s0yzlk1mlf00xpk9cnwyjndgb1fkw-source/pkgs/development/libraries/dbus/make-system-conf.xsl' -> '/nix/store/n536iaha2b8kzm7dcjiy8b4h8aijbbw6-make-system-conf.xsl'
performing daemon worker op: 7
copied source '/nix/store/776s0yzlk1mlf00xpk9cnwyjndgb1fkw-source/pkgs/development/libraries/dbus/make-session-conf.xsl' -> '/nix/store/iqfsj1zscjdxrm6dxlsr6yz3560wwlh2-make-session-conf.xsl'

So maybe this is a new error. Heh, look at this gem:
~/dev/proton-nix on main|✚9?7 logan@scandium 1 [16:13:31] 6s
$ nix store --verify
error: unrecognised flag '--verify'
Try 'nix --help' for more information.
~/dev/proton-nix on main|✚9?7 logan@scandium 1 [16:13:56] 1s
@ nix-store --verify
reading the Nix store...
checking path existence...

So
nix store is not the same as nix-store? Sigh.

Okay so breaking down this error:
error: A definition for option `environment.etc."nix/path/nixpkgs".source' is not of type `path'. Definition values:
  - In `/nix/store/gzrxzy00mhdhrp059xgngkprg0bii50p-source/nix-path.nix': null

This means there is an expression that is saying “I want you to write a file called
nix/path/nixpkgs”, but I think the value is supposed to be a file name and not a qualified path. I did a search in my repository for that string and sure enough, I have it in nix-path.nix:

# This will additionally add your inputs to the system's legacy channels.
# Making legacy nix commands consistent as well, awesome!
nix.nixPath = ["/etc/nix/path"];

I’d yanked this code from someone else’s VM setup. It’s very likely not needed. The comment doesn’t make sense to me either, but I’d preserved it in hopes that it would later make sense. It still doesn’t make sense. Let’s just get rid of it, since it appears to be causing a problem. In fact, now that I look at the rest of the code in the file, I can see that it’s all interconnected. I shouldn’t include this file at all. I’ll remove it from the
modules listing on the hosts I have. Here’s the whole thing, for reference:

{
  config,
  lib,
  ...
}: {
  # This will additionally add your inputs to the system's legacy channels.
  # Making legacy nix commands consistent as well, awesome!
  nix.nixPath = ["/etc/nix/path"];
  environment.etc =
    lib.mapAttrs'
    (name: value: {
      name = "nix/path/${name}";
      value.source = value.flake;
    })
    config.nix.registry;
}

Now I get:
error: build of '/nix/store/g9aqal322zaggxvac5gpqa4jl0m0cl9k-lazy-options.json.drv' on 'ssh-ng://builder@linux-builder' failed: builder for '/nix/store/g9aqal322zaggxvac5gpqa4jl0m0cl9k-lazy-options.json.drv' failed with exit code 1;
last 7 log lines:
> qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20}
> /build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 26: 10 Segmentation fault (core dumped) /nix/store/62fm7r6175k2bgvhc19hn9bdhw90wa3n-nix-2.18.1/bin/nix-instantiate --show-trace --eval --json --strict --argstr libPath "$libPath" --argstr pkgsLibPath "$pkgsLibPath" --argstr nixosPath "$nixosPath" --arg modules "import $modulesPath" --argstr stateVersion "24.05" --argstr release "24.05" $nixosPath/lib/eval-cacheable-options.nix > $out
> Cacheable portion of option doc build failed.
> Usually this means that an option attribute that ends up in documentation (eg `default` or `description`) depends on the restricted module arguments `config` or `pkgs`.
>
> Rebuild your configuration with `--show-trace` to find the offending location. Remove the references to restricted arguments (eg by escaping their antiquotations or adding a `defaultText`) or disable the sandboxed build for the failing module by setting `meta.buildDocsInSandbox = false`.
>
For full logs, run 'nix-store -l /nix/store/g9aqal322zaggxvac5gpqa4jl0m0cl9k-lazy-options.json.drv'.

The error message sounds like it would be really helpful if it were triggered on the common error it was built to address. That is not my case. I see more of the
QEMU internal SIGSEGV. It looks like nix-instantiate is the problem now.

[logan@nixos:~]$ file $(readlink -f /nix/store/62fm7r6175k2bgvhc19hn9bdhw90wa3n-nix-2.18.1/bin/nix-instantiate)
/nix/store/62fm7r6175k2bgvhc19hn9bdhw90wa3n-nix-2.18.1/bin/nix: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/8mc30d49ghc8m5z96yz39srlhg5s9sjj-glibc-2.38-44/lib/ld-linux-x86-64.so.2, BuildID[sha1]=f91c4b74df9835b6f530d8f3fdd0625d90c7c35b, for GNU/Linux 3.10.0, not stripped

So it’s the same
glibc I’ve been having trouble with. But now it’s an actual Nix command! Oof. Okay what about that overlay again? I need to set it as part of linux-builder-pkgs. I’d stupidly put it in the let...in block, which won’t do anything. Fixed with this:

But still no joy. As I spelunk deeper I do find this amusing gem:
ghost on May 5, 2023
Can somebody please explain to me why it's called blood-elf? I've figured out all the other bizzare names (kaem, M2-Planet, MES) but this one still eludes me.

emilytrau on May 8, 2023
<oriansj> emilytrau[m]: it is called blood-elf because it kills the dwarf (stub) problem we had. [Because our ouput files needed generated dwarf stubs needed for objdump -d to get function names, which is what blood-elf produces

But silly names aside, I’m not really sure where to go next here.
Further reading into Emily Trau’s work reveals that this is pretty cutting edge stuff. This confirms my suspicions that folks are likely just bootstrapping as a separate step. I’m not sure what the etiquette is here. I’m not entitled to any help, let alone the highly skilled work required to pull this stuff off. Even if most of the work is Emily’s, it’s still a community-supported activity that many folks could weigh in on. Perhaps most importantly: I feel like there’s more I can do to educate myself here. I know little about the C and C++ toolchains, let alone how they apply here. But knowing how they work is key to all of this. I’m also a little bit of a Nix baby, and I’m only going to get better at it by diving in.
I’d like to start tracking the dates going forward, because I think it helps tell part of the story. I’ve been working on this for about two to three weeks now, and probably about a 2-3 hour daily average. This is hard stuff!
Okay, so with some resolve steeled, let’s go back into this. The
glibc-2.38-44/lib/ld-linux-x86-64.so.2 library just isn’t working in my context. I can reliably cause the segfault by invoking binaries built with the bootstrapping mechanism.

Some of the big actors here, all of which I’ve looked up briefly:
glibc : The GNU implementation of the C standard library.

QEMU : (Quick Emulator) - Emulates other platforms via some translation and some virtualization (like how VMs work).

ELF : Executable and Linkable Format. It’s a generalized format for executable files, “object code”, libraries, and core dumps. It’s able to handle different platforms and architectures.
  a. ELF binaries have a header which contains meta-information about the executable. I gather this is how file is able to tell me about the binary.
  b. I have file working so I probably don’t need to have a decomposed understanding of the header for this endeavor.

Object code : This is compiler output. If you compile a C file (such as foo.c) you will get a foo.o file as its output. This is before any sort of linking is done. I don’t know how object code differs from machine code, but I suspect it isn’t relevant here.

ABI : Application Binary Interface. This is like an API for machines. So basically, it defines how one piece of compiled software communicates with another at the machine level. A common occurrence is for a program to call a library. For example, it would define byte size for numbers during a call, the address of the call itself, etc. This term has come up a lot in this space, so I think it’s good to call out.
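To make the ELF header entry above a little more concrete, here is a minimal Python sketch that decodes the 64-byte file header of a 64-bit little-endian ELF binary with only the standard library. The names parse_elf64_header, build_sample_header, and FIELDS are my own illustration and not part of any tool mentioned here, and the sample header is hand-built with illustrative values rather than read from a real file.

```python
import struct

# ELF64 file header layout (little-endian): a 16-byte e_ident followed by
# fixed-width fields. "<" disables padding, so the struct is exactly 64 bytes.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

# Field names in on-disk order, following the ELF specification's naming.
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry",
    "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
    "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_elf64_header(data: bytes) -> dict:
    """Decode the file header of a 64-bit little-endian ELF image."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("need at least 64 bytes")
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian")
    return dict(zip(FIELDS, ELF64_HEADER.unpack_from(data)))


def build_sample_header() -> bytes:
    """Hand-built header for illustration only (hypothetical values)."""
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    return ELF64_HEADER.pack(
        ident,
        2,         # e_type: EXEC
        62,        # e_machine: x86-64
        1,         # e_version
        0x402810,  # e_entry
        64,        # e_phoff: program headers start right after this header
        58432,     # e_shoff
        0,         # e_flags
        64, 56, 13, 64, 30, 29,  # header/entry sizes, counts, string index
    )


if __name__ == "__main__":
    header = parse_elf64_header(build_sample_header())
    for name in FIELDS[1:]:
        print(f"{name:12} {header[name]:#x}")
```

To try it on a real binary, pass the first 64 bytes of the file, for example parse_elf64_header(open(path, "rb").read(64)). One detail the layout makes visible: in a 64-bit header the eight bytes at file offset 0x20 hold e_phoff (the program header table offset), and e_shoff follows at 0x28.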
Let’s take a look at our error again to see if it makes any more sense:
qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20}

Both
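MAPERR and 0x20 are notable to me now. I recall seeing in the ELF Wikipedia article a section on the file header and it has lots of addresses. Additionally, the diagram shows a “Mapping” step in the loading process. I believe this corresponds to MAPERR. So I dive in further. 0x20 has “Points to the start of the section header table.” in its description. So I think we’re onto something. I should be careful though, because the “program” section also has a 0x20 which is “Size in bytes of the segment in the file image. May be 0.”, and the “section”… section is also “Size in bytes of the section in the file image. May be 0.” - I think we’re probably good with the “file” section. Two caveats on this hunch: that description belongs to the 32-bit column of the table (in a 64-bit header, 0x20 is the pointer to the program header table and the section header pointer sits at 0x28), and MAPERR is the signal code SEGV_MAPERR, which means the faulting memory address was not mapped. So addr=0x20 may simply be a memory address close to null rather than a file offset.

That was a really helpful exercise! I also found I can use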
readelf to get more information than file. readelf is not available on linux-builder, but that’s an easy fix. I did some searching around and I guess it’s in binutils-unwrapped. I went down a rabbit hole trying to get command-not-found or nix-index working on linux-builder but I think it requires a working nix-channel setup, which I do not have currently. A quick invocation of readelf shows that it won’t just take a path - it needs to be told what to print. I tried --all first, but it was massive. I tried --file-header next and that was much more sensible:

[logan@nixos:~]$ readelf --file-header $(readlink -f /nix/store/62fm7r6175k2bgvhc19hn9bdhw90wa3n-nix-2.18.1/bin/nix-instantiate)
ELF Header:
  Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class: ELF64
  Data: 2's complement, little endian
  Version: 1 (current)
  OS/ABI: UNIX - System V
  ABI Version: 0
  Type: DYN (Position-Independent Executable file)
  Machine: Advanced Micro Devices X86-64
  Version: 0x1
  Entry point address: 0x99b20
  Start of program headers: 64 (bytes into file)
  Start of section headers: 2937344 (bytes into file)
  Flags: 0x0
  Size of this header: 64 (bytes)
  Size of program headers: 56 (bytes)
  Number of program headers: 13
  Size of section headers: 64 (bytes)
  Number of section headers: 33
  Section header string table index: 32

By itself, nothing really stands out to me here. But let’s look at the working executable,
hello:

[logan@nixos:~]$ readelf --file-header $(readlink -f $(which hello))
ELF Header:
  Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class: ELF64
  Data: 2's complement, little endian
  Version: 1 (current)
  OS/ABI: UNIX - System V
  ABI Version: 0
  Type: EXEC (Executable file)
  Machine: Advanced Micro Devices X86-64
  Version: 0x1
  Entry point address: 0x402810
  Start of program headers: 64 (bytes into file)
  Start of section headers: 58432 (bytes into file)
  Flags: 0x0
  Size of this header: 64 (bytes)
  Size of program headers: 56 (bytes)
  Number of program headers: 13
  Size of section headers: 64 (bytes)
  Number of section headers: 30
  Section header string table index: 29

Nothing really stands out to me here. I could start using more arguments (like
--program-headers). I’m feeling around blind here, but one last try:

[logan@nixos:~]$ readelf --program-headers $(readlink -f /nix/store/62fm7r6175k2bgvhc19hn9bdhw90wa3n-nix-2.18.1/bin/nix-instantiate)

Elf file type is DYN (Position-Independent Executable file)
Entry point 0x99b20
There are 13 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz             Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000002d8 0x00000000000002d8 R      0x8
  INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                 0x0000000000000053 0x0000000000000053 R      0x1
      [Requesting program interpreter: /nix/store/8mc30d49ghc8m5z96yz39srlhg5s9sjj-glibc-2.38-44/lib/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000062e48 0x0000000000062e48 R      0x1000
  LOAD           0x0000000000063000 0x0000000000063000 0x0000000000063000
                 0x000000000013063d 0x000000000013063d R E    0x1000
  LOAD           0x0000000000194000 0x0000000000194000 0x0000000000194000
                 0x000000000006e39d 0x000000000006e39d R      0x1000
  LOAD           0x0000000000202be8 0x0000000000203be8 0x0000000000203be8
                 0x000000000002a378 0x000000000002aae0 RW     0x1000
  DYNAMIC        0x000000000022a4d0 0x000000000022b4d0 0x000000000022b4d0
                 0x0000000000000300 0x0000000000000300 RW     0x8
  NOTE           0x0000000000000370 0x0000000000000370 0x0000000000000370
                 0x0000000000000040 0x0000000000000040 R      0x8
  NOTE           0x00000000000003b0 0x00000000000003b0 0x00000000000003b0
                 0x0000000000000044 0x0000000000000044 R      0x4
  GNU_PROPERTY   0x0000000000000370 0x0000000000000370 0x0000000000000370
                 0x0000000000000040 0x0000000000000040 R      0x8
  GNU_EH_FRAME   0x00000000001cd448 0x00000000001cd448 0x00000000001cd448
                 0x00000000000063a4 0x00000000000063a4 R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000 RW     0x10
  GNU_RELRO      0x0000000000202be8 0x0000000000203be8 0x0000000000203be8
                 0x0000000000029418 0x0000000000029418 R      0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01 .interp
   02 .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03 .init .plt .plt.got .text .fini
   04 .rodata .eh_frame_hdr .eh_frame .gcc_except_table
   05 .init_array .fini_array .data.rel.ro .dynamic .got .data .bss
   06 .dynamic
   07 .note.gnu.property
   08 .note.gnu.build-id .note.ABI-tag
   09 .note.gnu.property
   10 .eh_frame_hdr
   11
   12 .init_array .fini_array .data.rel.ro .dynamic .got

[logan@nixos:~]$ readelf --program-headers $(readlink -f $(which hello))

Elf file type is EXEC (Executable file)
Entry point 0x402810
There are 13 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz             Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000002d8 0x00000000000002d8 R      0x8
  INTERP         0x0000000000000318 0x0000000000400318 0x0000000000400318
                 0x000000000000006c 0x000000000000006c R      0x1
      [Requesting program interpreter: /nix/store/dkhhp26aj1s28b9hdy4y2d4qcmj1s6n5-glibc-x86_64-unknown-linux-gnu-2.38-44/lib/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000001648 0x0000000000001648 R      0x1000
  LOAD           0x0000000000002000 0x0000000000402000 0x0000000000402000
                 0x0000000000006a31 0x0000000000006a31 R E    0x1000
  LOAD           0x0000000000009000 0x0000000000409000 0x0000000000409000
                 0x0000000000002200 0x0000000000002200 R      0x1000
  LOAD           0x000000000000bad0 0x000000000040cad0 0x000000000040cad0
                 0x00000000000005bc 0x0000000000000788 RW     0x1000
  DYNAMIC        0x000000000000bbd8 0x000000000040cbd8 0x000000000040cbd8
                 0x0000000000000210 0x0000000000000210 RW     0x8
  NOTE           0x0000000000000388 0x0000000000400388 0x0000000000400388
                 0x0000000000000040 0x0000000000000040 R      0x8
  NOTE           0x00000000000003c8 0x00000000004003c8 0x00000000004003c8
                 0x0000000000000020 0x0000000000000020 R      0x4
  GNU_PROPERTY   0x0000000000000388 0x0000000000400388 0x0000000000400388
                 0x0000000000000040 0x0000000000000040 R      0x8
  GNU_EH_FRAME   0x0000000000009cd4 0x0000000000409cd4 0x0000000000409cd4
                 0x0000000000000384 0x0000000000000384 R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000 RW     0x10
  GNU_RELRO      0x000000000000bad0 0x000000000040cad0 0x000000000040cad0
                 0x0000000000000530 0x0000000000000530 R      0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01 .interp
   02 .interp .note.gnu.property .note.ABI-tag .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03 .init .plt .text .fini
   04 .rodata .eh_frame_hdr .eh_frame
   05 .init_array .fini_array .data.rel.ro .dynamic .got .data .bss
   06 .dynamic
   07 .note.gnu.property
   08 .note.ABI-tag
   09 .note.gnu.property
   10 .eh_frame_hdr
   11
   12 .init_array .fini_array .data.rel.ro .dynamic .got

The thing that stands out is the
INTERP (which I assume is “interpreter”) field. This doesn’t give me any new information for the value, but the name of the field (INTERP / interpreter) can help me refine future queries.

A possible tangent: While I was fiddling around with
command-not-found and nix-index, I cleaned up the darwin.nix invocation to use the internal callPackage that is called for everything in modules. This is what it looks like now:

darwinConfigurations."scandium" = darwin.lib.darwinSystem {
  inherit system;
  modules = [
    home-manager.darwinModules.home-manager
    # Before I was using a curried function to pass these things in, but
    # the _module.args idiom is how I can ensure these values get passed
    # via the internal callPackage mechanism for darwinSystem on these
    # modules. We want callPackage because it does automatic "splicing"
    # of nixpkgs to achieve cross-system compiling. I don't know that we
    # need to use this at this point, but making it all consistent has
    # value.
    {
      _module.args.linux-builder-enabled = true;
      _module.args.nixpkgs = nixpkgs;
    }
    ./darwin.nix
  ];
};

My next run of
nix-darwin-switch seemed to pull down a lot. So it probably had an effect, but with another run I don’t see any changes:

error: build of '/nix/store/g9aqal322zaggxvac5gpqa4jl0m0cl9k-lazy-options.json.drv' on 'ssh-ng://builder@linux-builder' failed: builder for '/nix/store/g9aqal322zaggxvac5gpqa4jl0m0cl9k-lazy-options.json.drv' failed with exit code 1;
last 7 log lines:
> qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20}
> /build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 26: 10 Segmentation fault (core dumped) /nix/store/62fm7r6175k2bgvhc19hn9bdhw90wa3n-nix-2.18.1/bin/nix-instantiate --show-trace --eval --json --strict --argstr libPath "$libPath" --argstr pkgsLibPath "$pkgsLibPath" --argstr nixosPath "$nixosPath" --arg modules "import $modulesPath" --argstr stateVersion "24.05" --argstr release "24.05" $nixosPath/lib/eval-cacheable-options.nix > $out
> Cacheable portion of option doc build failed.
> Usually this means that an option attribute that ends up in documentation (eg `default` or `description`) depends on the restricted module arguments `config` or `pkgs`.
>
> Rebuild your configuration with `--show-trace` to find the offending location. Remove the references to restricted arguments (eg by escaping their antiquotations or adding a `defaultText`) or disable the sandboxed build for the failing module by setting `meta.buildDocsInSandbox = false`.
>
For full logs, run 'nix-store -l /nix/store/g9aqal322zaggxvac5gpqa4jl0m0cl9k-lazy-options.json.drv'.

I thought it could especially because
callPackage does some stuff to “splice” pkgs based on system, hostPlatform, buildPlatform, and targetPlatform according to some things I’ve read but no longer have links for on hand.

As part of jumping around a lot I noticed that the failing command contains two
nix-instantiate calls. These are separate nix-instantiate executables sitting in the store! But they are just two separate symlinks to the same nix binary:

[logan@nixos:~]$ file $(readlink -f /nix/store/62fm7r6175k2bgvhc19hn9bdhw90wa3n-nix-2.18.1/bin/nix-instantiate)
/nix/store/62fm7r6175k2bgvhc19hn9bdhw90wa3n-nix-2.18.1/bin/nix: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/8mc30d49ghc8m5z96yz39srlhg5s9sjj-glibc-2.38-44/lib/ld-linux-x86-64.so.2, BuildID[sha1]=f91c4b74df9835b6f530d8f3fdd0625d90c7c35b, for GNU/Linux 3.10.0, not stripped
[logan@nixos:~]$ file $(readlink -f /nix/store/62fm7r6175k2bgvhc19hn9bdhw90wa3n-nix-2.18.1/bin/nix-instantiate)
/nix/store/62fm7r6175k2bgvhc19hn9bdhw90wa3n-nix-2.18.1/bin/nix: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/8mc30d49ghc8m5z96yz39srlhg5s9sjj-glibc-2.38-44/lib/ld-linux-x86-64.so.2, BuildID[sha1]=f91c4b74df9835b6f530d8f3fdd0625d90c7c35b, for GNU/Linux 3.10.0, not stripped

I’ve tried setting
hostPlatform, buildPlatform, and targetPlatform to no avail. I don’t feel great about ticking that box, though: I set them on the attribute set passed to import nixpkgs, and I don’t know if that’s where they should go. The best examples I can find suggest that they hang off of stdenv. So how does one customize stdenv? I don’t think I can in this case, due to the nature of how minimal-bootstrap.nix works. It’s also over my head at the moment. My motivation is waning. I believe for now I’m just going to have to download an installer.

I go to Creating a NixOS live CD on the official Wiki. I see this pretty quickly:
{ config, pkgs, ... }:
{
  imports = [
    <nixpkgs/nixos/modules/installer/cd-dvd/installation-cd-minimal.nix>

    # Provide an initial copy of the NixOS channel so that the user
    # doesn't need to run "nix-channel --update" first.
    <nixpkgs/nixos/modules/installer/cd-dvd/channel.nix>
  ];
  environment.systemPackages = [ pkgs.neovim ];
}

Okay so the installer is imported. And the
cd-dvd channel. But also there’s neovim sitting there. Why is neovim there? Wait a second. Wait… Is this what I think it is? The documentation has testing instructions with SSH. It just needs some additional configuration.

{
  ...
  # Enable SSH in the boot process.
  systemd.services.sshd.wantedBy = pkgs.lib.mkForce [ "multi-user.target" ];
  users.users.root.openssh.authorizedKeys.keys = [
    "ssh-ed25519 AaAeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee username@host"
  ];
  ...
}

So this is initial state, deployed by an installer.
Deep. Breaths.
Okay but this is actually good in a way. I don’t have to write directly to a disk but instead I can just boot up through the installer and I guess it just works? Or perhaps this is just my bootstrap state, where I can then run the installer for NixOS itself.
It takes some nudging things around but eventually I arrive at:
# flake.nix:
nixosConfigurations.lithium-installer = nixpkgs.lib.nixosSystem (import ./lithium.nix {
  inherit nixpkgs;
  pkgs = import nixpkgs {
    system = "x86_64";
  };
});
packages.aarch64-darwin.lithium-installer = self
  .nixosConfigurations
  .lithium-installer
  .config
  .system
  .build
  .isoImage
;

# lithium.nix:
{ nixpkgs, ... } : let
  system = "x86_64-linux";
in {
  inherit system;
  modules = [
    # self.nixosModules.vm
    "${nixpkgs}/nixos/modules/installer/cd-dvd/installation-cd-minimal.nix"
    ./logan.nix
    (import ./nix.nix { inherit system; buildPlatform = "aarch64-linux"; })
    # ./nix-path.nix
    ./sshd.nix
    (import ./lithium-configuration.nix { inherit system; })
  ];
}

# sshd.nix (just mkForce added):
{ lib, ... }: {
  # This setups a SSH server.
  services.openssh = {
    enable = true;
    settings = {
      # Forbid root login through SSH. ISO installer configurations will turn
      # this on, but we don't want that since we're using our own, blessed
      # settings.
      PermitRootLogin = lib.mkForce "no";
      # Use keys only. Remove if you want to SSH using password (not
      # recommended).
      PasswordAuthentication = false;
    };
  };
}

My new invocation is:
nix build '.#lithium-installer' --debug

My reward:
error: build of '/nix/store/g9aqal322zaggxvac5gpqa4jl0m0cl9k-lazy-options.json.drv' on 'ssh-ng://builder@linux-builder' failed: builder for '/nix/store/g9aqal322zaggxvac5gpqa4jl0m0cl9k-lazy-options.json.drv' failed with exit code 1;
last 7 log lines:
> qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20}
> /build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 26: 10 Segmentation fault (core dumped) /nix/store/62fm7r6175k2bgvhc19hn9bdhw90wa3n-nix-2.18.1/bin/nix-instantiate --show-trace --eval --json --strict --argstr libPath "$libPath" --argstr pkgsLibPath "$pkgsLibPath" --argstr nixosPath "$nixosPath" --arg modules "import $modulesPath" --argstr stateVersion "24.05" --argstr release "24.05" $nixosPath/lib/eval-cacheable-options.nix > $out
> Cacheable portion of option doc build failed.
> Usually this means that an option attribute that ends up in documentation (eg `default` or `description`) depends on the restricted module arguments `config` or `pkgs`.
>
> Rebuild your configuration with `--show-trace` to find the offending location. Remove the references to restricted arguments (eg by escaping their antiquotations or adding a `defaultText`) or disable the sandboxed build for the failing module by setting `meta.buildDocsInSandbox = false`.
For full logs, run 'nix-store -l /nix/store/g9aqal322zaggxvac5gpqa4jl0m0cl9k-lazy-options.json.drv'.

The exact same place as before. I spend hours trying to disable
documentation, which is where this lazy-options.json thing is coming from. I try:

# In lithium-configuration.nix:
nixpkgs.overlays = [
  (prev: final: {
    nixos-configuration-reference-manpage =
      builtins.trace "lithium-configuration overlay for nixos-configuration-reference-manpage"
        prev.stdenv.mkDerivation {
          name = "nixos-configuration-reference-manpage";
        };
    documentation =
      builtins.trace "lithium-configuration overlay for documentation"
        prev.documentation.overrideAttrs {
          baseOptionsJSON = null;
        };
    ocumentation =
      builtins.trace "lithium-configuration overlay for ocumentation"
        prev.ocumentation.overrideAttrs {
          baseOptionsJSON = null;
        };
    # documentation = prev.stdenv.mkDerivation {
    #   name = "documentation";
    # };
    # # So the package may not event exist?
    # ocumentation = prev.stdenv.mkDerivation {
    #   name = "documentation";
    # };
  })
];
documentation.enable = false;
documentation.nixos.enable = false;
documentation.doc.enable = false;
documentation.info.enable = false;

# In flake.nix:
nixosConfigurations.lithium-installer = nixpkgs.lib.nixosSystem (import ./lithium.nix {
  inherit nixpkgs;
  pkgs = import nixpkgs {
    overlays = [
      (prev: final: {
        nixos-configuration-reference-manpage =
          builtins.trace "flake.nix overlay for nixos-configuration-reference-manpage"
            prev.stdenv.mkDerivation {
              name = "nixos-configuration-reference-manpage";
            };
        documentation =
          builtins.trace "flake.nix overlay for documentation"
            prev.documentation.overrideAttrs {
              baseOptionsJSON = null;
            };
        ocumentation =
          builtins.trace "flake.nix overlay for ocumentation"
            prev.ocumentation.overrideAttrs {
              baseOptionsJSON = null;
            };
        # documentation = prev.stdenv.mkDerivation {
        #   name = "documentation";
        # };
        # # So the package may not event exist?
        # ocumentation = prev.stdenv.mkDerivation {
        #   name = "documentation";
        # };
      })
    ];
    system = "x86_64";
  };
});

Nothing. Nada. Zilch. But this gives me some output at least:
{ nixpkgs, ... } : let
  system = "x86_64-linux";
in builtins.trace "lithium itself" {
  inherit system;
  modules = [
    # self.nixosModules.vm
    "${nixpkgs}/nixos/modules/installer/cd-dvd/installation-cd-minimal.nix"
    ./logan.nix
    (import ./nix.nix { inherit system; buildPlatform = "aarch64-linux"; })
    # ./nix-path.nix
    ./sshd.nix
    (import ./lithium-configuration.nix { inherit system; })
  ];
}

Where a
builtins.trace is added to the top-level value. I see zero evidence that my overlays are used. The documented documentation.enable doesn’t prevent this evaluation apparently. Types. Types! Ugh I have seen Nix maintainers argue against types and I just can’t agree with them here. I have no idea what this wants from me, and I have nothing to guide me. I’m moving past frustrated.

Okay so wait - I might have built an image. I decided to move things back a bit. It’s all inline.
# In flake.nix:
nixosConfigurations.demo-installer = nixpkgs.lib.nixosSystem (let
  system = "x86_64-linux";
  pkgs = import nixpkgs {
    # overlays = [
    #   (final: prev: {
    #     nixos-configuration-reference-manpage =
    #       builtins.traceVerbose "flake.nix overlay for nixos-configuration-reference-manpage"
    #         prev.stdenv.mkDerivation {
    #           name = "nixos-configuration-reference-manpage";
    #         };
    #   })
    # ];
    inherit system;
  };
in builtins.traceVerbose "demo-installer" (nixpkgs.lib.nixosSystem {
  inherit system;
  modules = [
    {
      environment.systemPackages = [ pkgs.hello ];
      # nixpkgs.overlays = [
      #   (final: prev: {
      #     documentation =
      #       builtins.traceVerbose "nixos-configuration overlay for documentation"
      #         prev.documentation.overrideAttrs {
      #           baseOptionsJSON = null;
      #         };
      #   })
      # ];
    }
  ];
})
);

When the overlays were uncommented, I still didn’t see evidence they were used. But I can save that for another day if I can get the thing to actually work. Let’s slowly refactor it to make it look more like
lithium.nix, or take things out of lithium.nix to assist the process.

I’ve been chasing this one down for about two hours:
@ nix build '.#nixosConfigurations.demo-installer.config.system.build.isoImage' --trace-verbose
warning: Git tree '/Users/logan/dev/proton-nix' is dirty
trace: { modules = [ "/nix/store/zjavgzkmcrk3d8fn1y4qjh4xbsgyafih-source/nixos/modules/installer/cd-dvd/installation-cd-minimal.nix" { environment = { systemPackages = [ ]; }; } ]; system = "x86_64-linux"; }
error:
  … from call site
  at /nix/store/zjavgzkmcrk3d8fn1y4qjh4xbsgyafih-source/flake.nix:23:11:
    22|         nixosSystem = args:
    23|           import ./nixos/lib/eval-config.nix (
      |           ^
    24|             {
  error: function 'anonymous lambda' called with unexpected argument 'type'
  at /nix/store/zjavgzkmcrk3d8fn1y4qjh4xbsgyafih-source/nixos/lib/eval-config.nix:11:1:
    10| # types.submodule instead of using eval-config.nix
    11| evalConfigArgs@
      | ^
    12| { # !!! system can be set modularly, would be nice to remove,

Which translates to “you passed me an attribute set and not an actual module - a function that takes the
callPackage dependency injection and returns an attribute set of NixOS module values”. What? You didn’t see that in there either? Sigh. I know it’s open source, and really I should open a ticket at the very least.

I’ve run it again with the fix (I’ll post below), and it’s still broken. I have seen some documentation stating that this won’t work with cross compiling, but I don’t see that here. It still would be really nice to just outright override and disable that cursed
documentation package.

Then I look in
installation-cd-minimal.nix and I see it. The cause of my woes. My nemesis.

# This module defines a small NixOS installation CD. It does not
# contain any graphical stuff.
{ lib, ... }:
{
  imports = [
    ../../profiles/minimal.nix
    ./installation-cd-base.nix
  ];

  # Causes a lot of uncached builds for a negligible decrease in size.
  environment.noXlibs = lib.mkOverride 500 false;

  documentation.man.enable = lib.mkOverride 500 true;

  # Although we don't really need HTML documentation in the minimal installer,
  # not including it may cause annoying cache misses in the case of the NixOS manual.
  documentation.doc.enable = lib.mkOverride 500 true;

  fonts.fontconfig.enable = lib.mkOverride 500 false;

  isoImage.edition = lib.mkOverride 500 "minimal";
}

Hah! I have found it! And I know its secret now. Now I will be the victor.
I just add these in one of the modules:
documentation.enable = pkgs.lib.mkForce false;
documentation.man.enable = pkgs.lib.mkForce true;
documentation.nixos.enable = pkgs.lib.mkForce false;
documentation.doc.enable = pkgs.lib.mkForce false;
documentation.info.enable = pkgs.lib.mkForce false;

And now I am rewarded with this:
error: build of '/nix/store/6x9pvkg0524d9svsw550x2yxdy88lyi6-dbus-1.drv' on 'ssh-ng://builder@linux-builder' failed: builder for '/nix/store/6x9pvkg0524d9svsw550x2yxdy88lyi6-dbus-1.drv' failed with exit code 1;
last 3 log lines:
> qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20}
> /build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 16: 28 Segmentation fault (core dumped) grep -q '[^[:space:]]' "$out/system.conf"
> "/nix/store/b9bwmqmf9kyqxlrxv6c3i79kcx8dz6hh-dbus-1/system.conf" was generated incorrectly and is empty, try building again.
For full logs, run 'nix-store -l /nix/store/6x9pvkg0524d9svsw550x2yxdy88lyi6-dbus-1.drv'.
error: builder for '/nix/store/6x9pvkg0524d9svsw550x2yxdy88lyi6-dbus-1.drv' failed with exit code 1
error: 1 dependencies of derivation '/nix/store/wdnrm4a0gh9149n6kwj6x00kzxsjz3hz-etc.drv' failed to build
error: 1 dependencies of derivation '/nix/store/j6apv44ii2006di74xvz8jakks9p33pb-nixos-system-nixos-24.05.20240227.860a2c5.drv' failed to build
error: 1 dependencies of derivation '/nix/store/q8vix9gkak19bpdp3v0zx3sqybbdvfp9-closure-info.drv' failed to build
error: 1 dependencies of derivation '/nix/store/bpvydp70wa48dv8vfmr7k6zj4fgjz6br-efi-directory.drv' failed to build
error: 1 dependencies of derivation '/nix/store/2zm7l69w1qn1l6wvhg2g1nazsp0b48vp-isolinux.cfg-in.drv' failed to build
error: 1 dependencies of derivation '/nix/store/rpd4gjakj3dwjm11131d9395q9r14abs-nixos-24.05.20240227.860a2c5-x86_64-linux.iso.drv' failed to build

This might be hard to fix, because I don’t have working overlays. My overlays are probably not even printing anything because it’s a lazy evaluation, and there’s another
pkgs being injected. I look at iso-image.nix and it’s got a NixOS configuration-like scheme in there. This stands out to me:

grubPkgs = if config.boot.loader.grub.forcei686 then pkgs.pkgsi686Linux else pkgs;

forcei686 defaults to false and I don’t see anything else setting it in all of nixpkgs. I want to try anyway. Of course, I get cryptic errors when trying this out.

In a fit of unbridled nerd rage, I copied all of the ISO image-making files and the profile components they relied upon. I was able to override
makeDBusConf (which was not called make-dbus-conf after all). Part of the problem was that I was getting the package name wrong, and finding the right one required searching through the nixpkgs code. There has got to be a better way to glean this information! Well, there isn’t actually. But there should be a better way and I don’t think that’s been a focus yet.

It takes about 50 minutes to build on my laptop. I understand the complaint about time but all I care about right now is that it works.
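For anyone hitting the same naming problem, here is a minimal sketch of one way to look for a top-level attribute name without grepping the nixpkgs source. It is not what was run for this post. It assumes a `<nixpkgs>` entry on `NIX_PATH` (inside a flake, the nixpkgs input would be imported instead), and the file name and the regular expression are purely illustrative.

```nix
# find-dbus-attrs.nix (hypothetical file name)
# Evaluate with: nix-instantiate --eval --strict find-dbus-attrs.nix
let
  # Assumption: <nixpkgs> resolves through NIX_PATH.
  pkgs = import <nixpkgs> { system = "x86_64-linux"; };

  # True when an attribute name contains "dbus" in any casing.
  mentionsDbus = name: builtins.match ".*[Dd][Bb][Uu][Ss].*" name != null;
in
# attrNames only lists the names, so no package is evaluated or built here.
builtins.filter mentionsDbus (builtins.attrNames pkgs)
```

On the nixpkgs revision used here, the resulting list would be expected to include makeDBusConf alongside dbus itself. It only covers top-level attributes, so names nested in package sets would still need a search of the source.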
I’ll have to go back and post the code. There’s also a lot of miscellaneous files floating around that won’t be easy to track completely in this post. Thus I will put this in a simplified repository. I haven’t done that yet. I’m exhausted. I haven’t tested if the ISO will work or not yet.
But look at this! Look at it!
$ ssh lithium.proton
The authenticity of host 'lithium.proton (192.168.254.38)' can't be established.
ED25519 key fingerprint is SHA256:ZBnxylGMlP5RA129wQm7x84DkFRMofbgiExaZZU5snY.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'lithium.proton' (ED25519) to the list of known hosts.

[logan@lithium:~]$

I’m exhausted and will come back later.
- I might be assuming that
Deploying Arbitrary Changes to the Host
I have the image created and a bootable machine with a lot of hacks in place. The next step is to make it easy to roll out changes to this host. I don’t want to burn an image every time! I could go over to the machine, set up my git SSH keys, clone the repository, and do
nixos-rebuild switch constantly to make the changes. To avoid constant commit+push+pull iteration, Emacs’ Tramp works great for editing files over SSH - it’s almost totally transparent, even for using Magit. This doesn’t feel like Nix to me though. Fortunately I believe there are solutions for this, namely deploy-rs. It aims to allow me to deploy to any system I have SSH access to.

I got it all set up pretty quickly, at least as the documentation states I should. Here’s what I have in
outputs:

# This is some boilerplate that validates the deploy-rs settings.
checks = builtins.mapAttrs
  (system: deployLib: deployLib.deployChecks self.deploy)
  deploy-rs.lib
;
deploy.nodes.lithium.profiles.system = {
  path = deploy-rs.lib.x86_64-linux.activate.nixos
    self.nixosConfigurations.lithium
  ;
};

And… I have no idea how to run it. These don’t work:
~/dev/proton-nix on main|✚3?2 logan@scandium 130 [09:21:05] 29577s
$ nix build '.#deploy.nodes.lithium'
warning: Git tree '/Users/logan/dev/proton-nix' is dirty
error: flake output attribute 'deploy.nodes.lithium' is not a derivation or path

~/dev/proton-nix on main|✚3?2 logan@scandium 1 [09:21:27] 0s
$ nix build '.#deploy.nodes.lithium.profiles.system'
warning: Git tree '/Users/logan/dev/proton-nix' is dirty
error: flake output attribute 'deploy.nodes.lithium.profiles.system' is not a derivation or path

Uh. The README states I should run this:
nix run github:serokell/deploy-rs your-flake

Surely they don’t mean for me to run my entire flake? And surely not from their remote location? I would like to have an installed package for this, or better - just something that I run from the flake and it Just Works.
This doesn’t work, unsurprisingly:
~/dev/proton-nix on main|✚3?2 logan@scandium 1 [09:21:44] 0s
$ nix build
warning: Git tree '/Users/logan/dev/proton-nix' is dirty
error: flake 'git+file:///Users/logan/dev/proton-nix' does not provide attribute 'packages.aarch64-darwin.default' or 'defaultPackage.aarch64-darwin'

Using their actual command results in a long install, and then:
~/dev/proton-nix on main|✚3?2 logan@scandium 1 [09:26:59] 0s
$ nix run github:serokell/deploy-rs
🚀 ℹ️ [deploy] [INFO] Running checks for flake in .
warning: Git tree '/Users/logan/dev/proton-nix' is dirty
error:
  … while checking flake output 'checks'
  at /nix/store/ys2lfkymyfycnm5wwhs9q90z8qlhwx4f-source/flake.nix:45:7:
    44|       # This is some boilerplate that validates the deploy-rs settings.
    45|       checks = builtins.mapAttrs
      |       ^
    46|         (system: deployLib: deployLib.deployChecks self.deploy)
  … while checking the derivation 'checks.aarch64-darwin.deploy-schema'
  at «none»:0: (source not available)
  (stack trace truncated; use '--show-trace' to show the full trace)
  error: attribute 'lithium' missing
  at /nix/store/ys2lfkymyfycnm5wwhs9q90z8qlhwx4f-source/flake.nix:51:11:
    50|         path = deploy-rs.lib.x86_64-linux.activate.nixos
    51|           self.nixosConfigurations.lithium
      |           ^
    52|         ;
🚀 ❌ [deploy] [ERROR] Failed to check deployment: Nix checking command resulted in a bad exit code: Some(1)

Okay fair enough. So I’ll add the
nixosConfiguration for lithium, which I’d prepared for and just missed the final step:

nixosConfigurations.lithium = pkgs.callPackages ./lithium.nix {};

Leaving my total addition to be:
# This is some boilerplate that validates the deploy-rs settings.
checks = builtins.mapAttrs
  (system: deployLib: deployLib.deployChecks self.deploy)
  deploy-rs.lib
;
deploy.nodes.lithium.profiles.system = {
  path = deploy-rs.lib.x86_64-linux.activate.nixos
    self.nixosConfigurations.lithium
  ;
};
devShells.aarch64-darwin.default = pkgs.mkShell {
  packages = [];
};
nixosConfigurations.lithium = pkgs.callPackages ./lithium.nix {};
nixosConfigurations.lithium-installer = (let
  pkgs = import nixpkgs {
    overlays = overlays-fix-cross-build-issues;
    system = "x86_64-linux";
  };
in pkgs.callPackage ./proton-image-base.nix {
  inherit nixpkgs pkgs;
});

~/dev/proton-nix on main|✚3?2 logan@scandium 1 [09:29:35] 85s
@ nix run github:serokell/deploy-rs
🚀 ℹ️ [deploy] [INFO] Running checks for flake in .
warning: Git tree '/Users/logan/dev/proton-nix' is dirty
error:
  … while checking flake output 'checks'
  at /nix/store/n0zfp8mj9kgjx2c73sh3hixmy71xfgi3-source/flake.nix:45:7:
    44|       # This is some boilerplate that validates the deploy-rs settings.
    45|       checks = builtins.mapAttrs
      |       ^
    46|         (system: deployLib: deployLib.deployChecks self.deploy)
  … while checking the derivation 'checks.aarch64-darwin.deploy-schema'
  at «none»:0: (source not available)
  (stack trace truncated; use '--show-trace' to show the full trace)
  error: function 'anonymous lambda' called without required argument 'nixpkgs'
  at /nix/store/n0zfp8mj9kgjx2c73sh3hixmy71xfgi3-source/lithium.nix:1:1:
    1| { nixpkgs, ... } : let
     | ^
    2|   system = "x86_64-linux";
🚀 ❌ [deploy] [ERROR] Failed to check deployment: Nix checking command resulted in a bad exit code: Some(1)

Types. Types. Types. Please give me types!
nixosConfigurations.lithium = pkgs.callPackages ./lithium.nix { inherit nixpkgs; };

@ nix run github:serokell/deploy-rs
🚀 ℹ️ [deploy] [INFO] Running checks for flake in .
warning: Git tree '/Users/logan/dev/proton-nix' is dirty
trace: warning: system.stateVersion is not set, defaulting to 24.05. Read why this matters on https://nixos.org/manual/nixos/stable/options.html#opt-system.stateVersion.
error:
  … while checking flake output 'checks'
  at /nix/store/7qg6v52fszl40x8pjdq20m65rrkzv94w-source/flake.nix:45:7:
    44|       # This is some boilerplate that validates the deploy-rs settings.
    45|       checks = builtins.mapAttrs
      |       ^
    46|         (system: deployLib: deployLib.deployChecks self.deploy)
  … while checking the derivation 'checks.aarch64-darwin.deploy-schema'
  at «none»:0: (source not available)
  (stack trace truncated; use '--show-trace' to show the full trace)
  error: Failed assertions:
  - The ‘fileSystems’ option does not specify your root file system.
  - You must set the option ‘boot.loader.grub.devices’ or 'boot.loader.grub.mirroredBoots' to make the system bootable.
🚀 ❌ [deploy] [ERROR] Failed to check deployment: Nix checking command resulted in a bad exit code: Some(1)

This is a little perplexing, but it might also be a NixOS configuration thing. I uncomment my
partitions.nix, add it to my lithium.nix modules, and try again. Same error. Looking into the error makes me think things are way off here. I decide to look at what’s mounted on lithium.proton:

[logan@lithium:~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        1.2G     0  1.2G   0% /dev
tmpfs            12G     0   12G   0% /dev/shm
tmpfs           5.9G  4.9M  5.9G   1% /run
tmpfs            12G  1.1M   12G   1% /run/wrappers
tmpfs            12G   33M   12G   1% /
/dev/root       969M  969M     0 100% /iso
/dev/loop0      924M  924M     0 100% /nix/.ro-store
tmpfs            12G  8.0K   12G   1% /nix/.rw-store
overlay          12G  8.0K   12G   1% /nix/store
tmpfs           2.4G  4.0K  2.4G   1% /run/user/1001
tmpfs           2.4G  4.0K  2.4G   1% /run/user/1000

It looks like I still have more work to do with the image. Perhaps using the “installer” is the wrong thing to do? I’m a lot more familiar now with how the installer is being made and what’s going on (though there’s still a vast amount I don’t know). Still, I should be able to slice and dice this until there’s no more “installer” and instead it’s just a Nix image copied to disk.
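For reference, the two failed assertions above are asking for a root file system and a boot loader target. A minimal sketch of a module that declares both might look like the following. The module name, the disk label, the file system type, and the device path are all placeholders rather than values taken from lithium, and whether this alone clears the error in this setup is unverified.

```nix
# disk-layout.nix (hypothetical module name)
{ ... }:
{
  # Root file system. Assumes a partition labelled "nixos" formatted as ext4.
  fileSystems."/" = {
    device = "/dev/disk/by-label/nixos";
    fsType = "ext4";
  };

  # Where GRUB installs itself. Assumes a BIOS-booted disk at /dev/sda.
  boot.loader.grub.enable = true;
  boot.loader.grub.devices = [ "/dev/sda" ];

  # Addresses the stateVersion warning printed in the same run.
  system.stateVersion = "24.05";
}
```

The df output shows why an installer-derived configuration has none of this: the root is a tmpfs and the store is an overlay on top of the ISO, so there is no on-disk root for the assertions to find.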
I don’t have all of the refactors to put here but I’ll try to whip up something. Basically I rewrote my own
bootstrap-minimal.nix which is just installation-cd-minimal.nix and installation-cd-base.nix rammed together. The mkImageMediaOverride calls have been removed, since they were preventing me from creating partitions on my own. I suspect those calls are why we have a bootable installer. There might be more installer cruft hanging around, but “it works” is vastly preferable to over-polishing this.

I quickly ran into this issue:
error: A definition for option `networking.hostName' is not of type `string matching the pattern ^$|^[[:alnum:]]([[:alnum:]_-]{0,61}[[:alnum:]])?$'. Definition values:
- In `/nix/store/zjavgzkmcrk3d8fn1y4qjh4xbsgyafih-source/flake.nix': <derivation hostname-net-tools-2.10>

In comes
traceVal to help see what’s going on there.

(import ./image-base-module.nix {
  hostname = pkgs.lib.debug.traceVal hostname;
  inherit system;
})

And then we see:
@ nix build '.#lithium-bootable' --show-trace warning: Git tree '/Users/logan/dev/proton-nix' is dirty trace: { __ignoreNulls = true; __structuredAttrs = false; all = <CODE>; args = <CODE>; buildCommand = <CODE>; buildInputs = <CODE>; builder = <CODE>; cmakeFlags = <CODE>; configureFlags = <CODE>; depsBuildBuild = <CODE>; depsBuildBuildPropagated = <CODE>; depsBuildTarget = <CODE>; depsBuildTargetPropagated = <CODE>; depsHostHost = <CODE>; depsHostHostPropagated = <CODE>; depsTargetTarget = <CODE>; depsTargetTargetPropagated = <CODE>; doCheck = <CODE>; doInstallCheck = <CODE>; drvAttrs = { __ignoreNulls = true; __structuredAttrs = false; args = <CODE>; buildCommand = <CODE>; buildInputs = <CODE>; builder = <CODE>; cmakeFlags = <CODE>; configureFlags = <CODE>; depsBuildBuild = <CODE>; depsBuildBuildPropagated = <CODE>; depsBuildTarget = <CODE>; depsBuildTargetPropagated = <CODE>; depsHostHost = <CODE>; depsHostHostPropagated = <CODE>; depsTargetTarget = <CODE>; depsTargetTargetPropagated = <CODE>; doCheck = <CODE>; doInstallCheck = <CODE>; enableParallelBuilding = true; enableParallelChecking = <CODE>; enableParallelInstalling = <CODE>; mesonFlags = <CODE>; name = <CODE>; nativeBuildInputs = <CODE>; outputs = [ "out" ]; passAsFile = <CODE>; patches = <CODE>; preferLocalBuild = true; propagatedBuildInputs = <CODE>; propagatedNativeBuildInputs = <CODE>; stdenv = { __extraImpureHostDeps = <CODE>; all = <CODE>; allowedRequisites = <CODE>; args = <CODE>; bootstrapTools = { all = <CODE>; args = [ "ash" "-e" /nix/store/zjavgzkmcrk3d8fn1y4qjh4xbsgyafih-source/pkgs/stdenv/linux/bootstrap-tools/scripts/unpack-bootstrap-tools.sh ]; builder = { all = <CODE>; builder = "builtin:fetchurl"; drvAttrs = { builder = "builtin:fetchurl"; executable = true; impureEnvVars = [ "http_proxy" "https_proxy" "ftp_proxy" "all_proxy" "no_proxy" ]; name = "busybox"; outputHash = "sha256-QrTEnQTBM1Y/qV9odq8irZkQSD9uOMbs2Q5NgCvKCNQ="; outputHashAlgo = ""; outputHashMode = "recursive"; 
preferLocalBuild = true; system = "builtin"; unpack = false; url = "http://tarballs.nixos.org/stdenv/x86_64-unknown-linux-gnu/82b583ba2ba2e5706b35dbe23f31362e62be2a9d/busybox"; urls = [ "http://tarballs.nixos.org/stdenv/x86_64-unknown-linux-gnu/82b583ba2ba2e5706b35dbe23f31362e62be2a9d/busybox" ]; }; drvPath = <CODE>; executable = true; impureEnvVars = «repeated»; name = "busybox"; out = «repeated»; outPath = "/nix/store/p9wzypb84a60ymqnhqza17ws0dvlyprg-busybox"; outputHash = "sha256-QrTEnQTBM1Y/qV9odq8irZkQSD9uOMbs2Q5NgCvKCNQ="; outputHashAlgo = ""; outputHashMode = "recursive"; outputName = "out"; preferLocalBuild = true; system = "builtin"; type = "derivation"; unpack = false; url = "http://tarballs.nixos.org/stdenv/x86_64-unknown-linux-gnu/82b583ba2ba2e5706b35dbe23f31362e62be2a9d/busybox"; urls = «repeated»; }; drvAttrs = { args = «repeated»; builder = «repeated»; hardeningUnsupportedFlags = [ "fortify3" "zerocallusedregs" ]; isGNU = true; langC = true; langCC = true; name = "bootstrap-tools"; system = "x86_64-linux"; tarball = { all = <CODE>; builder = "builtin:fetchurl"; drvAttrs = { builder = "builtin:fetchurl"; executable = false; impureEnvVars = [ "http_proxy" "https_proxy" "ftp_proxy" "all_proxy" "no_proxy" ]; name = "bootstrap-tools.tar.xz"; outputHash = "sha256-YQlr088HPoVWBU2jpPhpIMyOyoEDZYDw1y60SGGbUM0="; outputHashAlgo = ""; outputHashMode = "flat"; preferLocalBuild = true; system = "builtin"; unpack = false; url = "http://tarballs.nixos.org/stdenv/x86_64-unknown-linux-gnu/82b583ba2ba2e5706b35dbe23f31362e62be2a9d/bootstrap-tools.tar.xz"; urls = [ "http://tarballs.nixos.org/stdenv/x86_64-unknown-linux-gnu/82b583ba2ba2e5706b35dbe23f31362e62be2a9d/bootstrap-tools.tar.xz" ]; }; drvPath = <CODE>; executable = false; impureEnvVars = «repeated»; name = "bootstrap-tools.tar.xz"; out = «repeated»; outPath = "/nix/store/2pizl7lq4awa7p9bklr8037yh1sca0hg-bootstrap-tools.tar.xz"; outputHash = "sha256-YQlr088HPoVWBU2jpPhpIMyOyoEDZYDw1y60SGGbUM0="; outputHashAlgo = 
""; outputHashMode = "flat"; outputName = "out"; preferLocalBuild = true; system = "builtin"; type = "derivation"; unpack = false; url = "http://tarballs.nixos.org/stdenv/x86_64-unknown-linux-gnu/82b583ba2ba2e5706b35dbe23f31362e62be2a9d/bootstrap-tools.tar.xz"; urls = «repeated»; }; }; drvPath = <CODE>; hardeningUnsupportedFlags = «repeated»; isGNU = true; langC = true; langCC = true; name = "bootstrap-tools"; out = { all = <CODE>; args = «repeated»; builder = «repeated»; drvAttrs = «repeated»; drvPath = <CODE>; hardeningUnsupportedFlags = «repeated»; isGNU = true; langC = true; langCC = true; name = "bootstrap-tools"; out = «repeated»; outPath = "/nix/store/j8ca78l3vdfdwnsq3bjmamwjkhi8wazg-bootstrap-tools"; outputName = "out"; system = "x86_64-linux"; tarball = «repeated»; type = "derivation"; }; outPath = "/nix/store/j8ca78l3vdfdwnsq3bjmamwjkhi8wazg-bootstrap-tools"; outputName = "out"; passthru = { isFromBootstrapFiles = true; }; system = "x86_64-linux"; tarball = «repeated»; type = "derivation"; }; buildPlatform = { aesSupport = false; avx2Support = false; avx512Support = false; avxSupport = false; canExecute = <LAMBDA>; config = "x86_64-unknown-linux-gnu"; darwinArch = "x86_64"; darwinMinVersion = "10.12"; darwinMinVersionVariable = null; darwinPlatform = null; darwinSdkVersion = "10.12"; efiArch = "x64"; emulator = <LAMBDA>; emulatorAvailable = <LAMBDA>; extensions = { executable = ""; library = ".so"; sharedLibrary = ".so"; staticLibrary = ".a"; }; fma4Support = false; fmaSupport = false; gcc = { }; hasSharedLibraries = true; is32bit = false; is64bit = true; isAarch = false; isAarch32 = false; isAarch64 = false; isAbiElfv2 = false; isAlpha = false; isAndroid = false; isArmv7 = false; isAvr = false; isBSD = false; isBigEndian = false; isCompatible = <LAMBDA>; isCygwin = false; isDarwin = false; isEfi = true; isElf = true; isFreeBSD = false; isGenode = false; isGhcjs = false; isGnu = true; isILP32 = false; isJavaScript = false; isLinux = true; isLittleEndian = 
true; isLoongArch64 = false; isM68k = false; isMacOS = false; isMacho = false; isMicroBlaze = false; isMinGW = false; isMips = false; isMips32 = false; isMips64 = false; isMips64n32 = false; isMips64n64 = false; isMmix = false; isMsp430 = false; isMusl = false; isNetBSD = false; isNone = false; isOpenBSD = false; isOr1k = false; isPower = false; isPower64 = false; isRedox = false; isRiscV = false; isRiscV32 = false; isRiscV64 = false; isRx = false; isS390 = false; isS390x = false; isSparc = false; isSparc64 = false; isStatic = false; isSunOS = false; isUClibc = false; isUnix = true; isVc4 = false; isWasi = false; isWasm = false; isWindows = false; isi686 = false; isiOS = false; isx86 = true; isx86_32 = false; isx86_64 = true; libDir = "lib64"; libc = "glibc"; linker = "bfd"; linux-kernel = { autoModules = true; baseConfig = "defconfig"; name = "pc"; target = "bzImage"; }; linuxArch = "x86_64"; parsed = { _type = "system"; abi = { _type = "abi"; assertions = [ { assertion = <LAMBDA>; message = "The \"gnu\" ABI is ambiguous on 32-bit ARM. Use \"gnueabi\" or \"gnueabihf\" instead.\n"; } { assertion = <LAMBDA>; message = "The \"gnu\" ABI is ambiguous on big-endian 64-bit PowerPC. 
Use \"gnuabielfv2\" or \"gnuabielfv1\" instead.\n"; } ]; name = "gnu"; }; cpu = { _type = "cpu-type"; arch = "x86-64"; bits = 64; family = "x86"; name = "x86_64"; significantByte = { _type = "significant-byte"; name = "littleEndian"; }; }; kernel = { _type = "kernel"; execFormat = { _type = "exec-format"; name = "elf"; }; families = { }; name = "linux"; }; vendor = { _type = "vendor"; name = "unknown"; }; }; qemuArch = "x86_64"; rust = { cargoEnvVarTarget = "X86_64_UNKNOWN_LINUX_GNU"; cargoShortTarget = "x86_64-unknown-linux-gnu"; isNoStdTarget = false; platform = { arch = "x86_64"; os = "linux"; target-family = [ "unix" ]; vendor = "unknown"; }; rustcTarget = "x86_64-unknown-linux-gnu"; rustcTargetSpec = "x86_64-unknown-linux-gnu"; }; rustc = { }; sse3Support = false; sse4_1Support = false; sse4_2Support = false; sse4_aSupport = false; ssse3Support = false; system = "x86_64-linux"; ubootArch = "x86_64"; uname = { processor = "x86_64"; release = null; system = "Linux"; }; useAndroidPrebuilt = false; useiOSPrebuilt = false; }; builder = <CODE>; cc = null; defaultBuildInputs = <CODE>; defaultNativeBuildInputs = <CODE>; disallowedRequisites = <CODE>; drvAttrs = { allowedRequisites = <CODE>; args = <CODE>; builder = <CODE>; defaultBuildInputs = <CODE>; defaultNativeBuildInputs = <CODE>; disallowedRequisites = <CODE>; initialPath = <CODE>; name = "stdenv-linux"; preHook = <CODE>; setup = /nix/store/zjavgzkmcrk3d8fn1y4qjh4xbsgyafih-source/pkgs/stdenv/generic/setup.sh; shell = <CODE>; system = <CODE>; }; drvPath = <CODE>; extraBuildInputs = <CODE>; extraNativeBuildInputs = <CODE>; extraSandboxProfile = ""; fetchurlBoot = <CODE>; hasCC = false; hostPlatform = «repeated»; initialPath = <CODE>; is32bit = <CODE>; is64bit = <CODE>; isAarch32 = <CODE>; isAarch64 = <CODE>; isBSD = <CODE>; isBigEndian = <CODE>; isCygwin = <CODE>; isDarwin = <CODE>; isFreeBSD = <CODE>; isLinux = <CODE>; isMips = <CODE>; isOpenBSD = <CODE>; isSunOS = <CODE>; isi686 = <CODE>; isx86_32 = <CODE>; 
isx86_64 = <CODE>; meta = <CODE>; mkDerivation = <CODE>; name = "stdenv-linux"; out = { all = <CODE>; allowedRequisites = <CODE>; args = <CODE>; builder = <CODE>; defaultBuildInputs = <CODE>; defaultNativeBuildInputs = <CODE>; disallowedRequisites = <CODE>; drvAttrs = «repeated»; drvPath = <CODE>; initialPath = <CODE>; name = "stdenv-linux"; out = «repeated»; outPath = <CODE>; outputName = "out"; preHook = <CODE>; setup = /nix/store/zjavgzkmcrk3d8fn1y4qjh4xbsgyafih-source/pkgs/stdenv/generic/setup.sh; shell = <CODE>; system = <CODE>; type = "derivation"; }; outPath = <CODE>; outputName = "out"; override = <CODE>; overrideDerivation = <CODE>; overrides = <LAMBDA>; passthru = <CODE>; preHook = <CODE>; setup = /nix/store/zjavgzkmcrk3d8fn1y4qjh4xbsgyafih-source/pkgs/stdenv/generic/setup.sh; shell = <CODE>; shellDryRun = <CODE>; shellPackage = <CODE>; system = <CODE>; targetPlatform = «repeated»; tests = <CODE>; type = "derivation"; }; strictDeps = <CODE>; system = <CODE>; userHook = <CODE>; }; drvPath = <CODE>; enableParallelBuilding = true; enableParallelChecking = <CODE>; enableParallelInstalling = <CODE>; inputDerivation = <CODE>; mesonFlags = <CODE>; meta = <CODE>; name = <CODE>; nativeBuildInputs = <CODE>; out = <CODE>; outPath = <CODE>; outputName = "out"; outputs = «repeated»; override = <CODE>; overrideAttrs = <CODE>; overrideDerivation = <CODE>; passAsFile = <CODE>; passthru = { provider = <CODE>; }; patches = <CODE>; preferLocalBuild = true; propagatedBuildInputs = <CODE>; propagatedNativeBuildInputs = <CODE>; provider = <CODE>; stdenv = «repeated»; strictDeps = <CODE>; system = <CODE>; type = "derivation"; userHook = <CODE>; }
error:

That’s definitely not a string! But how did that happen? Let’s look at my passing mechanism:
    nixosConfigurations.lithium-bootable = (let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        overlays = overlays-fix-cross-build-issues;
      };
    in
      pkgs.callPackage ./proton-image-base.nix {
        _module.args.hostname = "lithium";
        _module.args.buildPlatform = "aarch64-linux";
        inherit system nixpkgs pkgs;
      });

The _module.args idiom is how you can inject dependencies into the callPackage dependency management. I want that to be there because callPackage does some special “splicing” with pkgs (their term, not mine), and that helps with cross system compilation.

Since I don’t know all of the dependencies available to callPackage and getting this information is kind of difficult, I try to see if something else is setting hostname. I don’t really know a way of seeing that either. Not definitively. I can just use a different variable name, which will give a sense of that value being occupied. I chose hostName just for the moment, but if it works I will rename it to something more self-descriptive like hostname-is-already-taken-in-weird-ways.

Now I see this:
    error: evaluation aborted with the following error message: 'lib.customisation.callPackageWith: Function called without required argument "hostName" at /nix/store/qmjmszmziysmlxvanxrm5hbb1c16g954-source/proton-image-base.nix:11, did you mean "hostname"?'

I double checked that all of the files in the call chain have been saved. All of the values are consistently renamed to be hostName where they need to be. The stack points to the parameter list for the function in proton-image-base. I think this means that _module.args doesn’t work the way I think it does. I’ve tried both forms:

    pkgs.callPackage ./proton-image-base.nix {
      _module.args.hostName = "lithium";
      _module.args.buildPlatform = "aarch64-linux";
      inherit system nixpkgs pkgs;
    }

And:

    pkgs.callPackage ./proton-image-base.nix {
      _module.args = {
        hostName = "lithium";
        buildPlatform = "aarch64-linux";
      };
      inherit system nixpkgs pkgs;
    }

But there is no change.
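One reading of that error, offered as a guess rather than a diagnosis: callPackage hands the attribute set it is given straight to the function as arguments, so an attribute named _module never turns into an argument called hostName. _module.args is an option of the NixOS module system, and it only takes effect where modules are evaluated. A minimal sketch, assuming proton-image-base.nix is a plain function whose parameter list includes hostName and buildPlatform, passes them as ordinary arguments:

```nix
# Sketch only: assumes ./proton-image-base.nix has roughly the shape
#   { hostName, buildPlatform, system, nixpkgs, pkgs, ... }: ...
# Attributes in the second argument of callPackage are passed to that
# function directly and take precedence over anything auto-filled from pkgs.
pkgs.callPackage ./proton-image-base.nix {
  hostName = "lithium";
  buildPlatform = "aarch64-linux";
  inherit system nixpkgs pkgs;
}
```

If the values are needed inside NixOS modules further down, the function could forward them there, for example through _module.args in a module it constructs.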
home-manager#1642 suggests that the @inputs binding could somehow be involved. I don’t reference it anywhere, so I remove @inputs. Same result. As I continue to look around, I think there’s got to be a less “magical” way of handling this. I can surround the module with a function - it becomes curried. But that doesn’t work with callPackage from my earlier efforts. Instead I could try using overlays to inject values directly into pkgs.

This becomes my entire change:
    nixosConfigurations.lithium-bootable = (let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        overlays = overlays-fix-cross-build-issues ++ [
          (final: prev: {
            hostName = "lithium";
            buildPlatform = "aarch64-linux";
          })
        ];
      };
    in
      pkgs.callPackage ./proton-image-base.nix {
        # _module.args = {
        #   hostName = "lithium";
        #   buildPlatform = "aarch64-linux";
        # };
        inherit system nixpkgs pkgs;
      }
    );

And now I see lithium in the traceVal. Okay good! Maybe this is what I’ll do from here on, if I can.

I expect this build will take a while, so I’ll go off to do something else. I have to push back on fatigue today so I don’t run away from this whole endeavor, screaming I’ll never touch it again.
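The overlay works because callPackage fills in function arguments by name from the package set, so any attribute added to pkgs becomes available as an argument. The same mechanism may explain the earlier surprise: as far as I can tell, nixpkgs already defines a package named hostname, so an argument by that name would receive a derivation rather than a string. A sketch of one way to lower that collision risk, where protonHost is a made-up name, groups the injected values under a single attribute:

```nix
# Sketch only: `protonHost` is a hypothetical attribute name, chosen so it is
# unlikely to collide with an existing package such as pkgs.hostname.
(final: prev: {
  protonHost = {
    hostName = "lithium";
    buildPlatform = "aarch64-linux";
  };
})
```

proton-image-base.nix would then take protonHost in its parameter list and read protonHost.hostName, leaving the rest of the pkgs namespace alone.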
Oh now I see:

    @ nix build '.#lithium-bootable' --show-trace
    warning: Git tree '/Users/logan/dev/proton-nix' is dirty
    trace: lithium
    error: build of '/nix/store/q635cz668i46y1fcwa4xb1nn28w91jzs-extra-utils.drv' on 'ssh-ng://builder@linux-builder' failed: builder for '/nix/store/q635cz668i46y1fcwa4xb1nn28w91jzs-extra-utils.drv' failed with exit code 139;
           last 10 log lines:
           > patching /nix/store/w4pndrk0511z6ms6kcw631z446vrkj7z-extra-utils/lib/libattr.so.1...
           > patching /nix/store/w4pndrk0511z6ms6kcw631z446vrkj7z-extra-utils/lib/libresolv.so.2...
           > patching /nix/store/w4pndrk0511z6ms6kcw631z446vrkj7z-extra-utils/lib/libcrypto.so.3...
           > patching /nix/store/w4pndrk0511z6ms6kcw631z446vrkj7z-extra-utils/lib/libdl.so.2...
           > patching /nix/store/w4pndrk0511z6ms6kcw631z446vrkj7z-extra-utils/lib/libpam.so.0...
           > patching /nix/store/w4pndrk0511z6ms6kcw631z446vrkj7z-extra-utils/lib/libgcrypt.so.20...
           > testing patched programs...
           > qemu-x86_64: QEMU internal SIGSEGV {code=MAPERR, addr=0x20}
           > /build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 116: 14031 Done $out/bin/ash -c 'echo hello world'
           > 14032 Segmentation fault (core dumped) | grep "hello world"
           For full logs, run 'nix-store -l /nix/store/q635cz668i46y1fcwa4xb1nn28w91jzs-extra-utils.drv'.
    error: builder for '/nix/store/q635cz668i46y1fcwa4xb1nn28w91jzs-extra-utils.drv' failed with exit code 1
    error: 1 dependencies of derivation '/nix/store/rph8ffy8w1pg6bwrikgcyflxgh8dwdpl-stage-1-init.sh.drv' failed to build
    error: 1 dependencies of derivation '/nix/store/abbbcawpjagpj5wj9b0ff5rqjn42h08h-initrd-linux-6.1.79.drv' failed to build
    error: 1 dependencies of derivation '/nix/store/r5xgcs803pdhjl34yk8kvpfm620y030n-nixos-24.05.20240227.860a2c5-x86_64-linux.iso.drv' failed to build

This looks very familiar. This comes from
minimal-bootstrap.nix, which I think has been causing me problems for some time. I find it in all-packages.nix to confirm its name, and then add it to my overlays thusly:

    minimal-bootstrap = prev.lib.recurseIntoAttrs (import ./minimal-bootstrap-fixed.nix {
      # inherit (stdenv) buildPlatform hostPlatform;
      inherit (stdenv);
      buildPlatform = system;
      hostPlatform = "x86_64-linux";
      config = prev.config;
      lib = prev.lib;
      # inherit lib config;
      # fetchurl = import ../build-support/fetchurl/boot.nix {
      #   inherit (prev.stdenv.buildPlatform) system;
      # };
      # checkMeta = callPackage ../stdenv/generic/check-meta.nix { };
    });

And then minimal-bootstrap-fixed.nix is:

    { lib
    , config
    , buildPlatform
    , hostPlatform
    , fetchurl
    , checkMeta
    }:
    {}

Yep. It does nothing now. I strongly believe I don’t need it anyways, because its job is to create a kind of “minimal” environment where as little stuff gets pulled in as possible. But it circumvents my stdenv stuff, and QEMU does not like that.

But that still gives me the same error. Something is wonky here. I look around for the derivation on the stack
stage-1-init.sh and find nixos/modules/system/boot/stage-1.nix and see this damning header comment:

    # This module builds the initial ramdisk, which contains an init
    # script that performs the first stage of booting the system: it loads
    # the modules necessary to mount the root file system, then calls the
    # init in the root file system to start the second boot stage.

I don’t think that’s what I want at all - this is still in installer territory. But this module is probably fine on its own - I need to go higher in the stack to see what’s pulling this in. Perhaps I can glean more context there.

stage-1-init.sh is just a derivation created inside the same file. That file is then brought in by module-list.nix. That then traces back to eval-config.nix.

How did I get back here? I had it install… stuff. But why does making it not an installer suddenly bring me back into this misconfigured stdenv issue?

I haven’t really confirmed that my overlay is doing the job. I make my overlay into this:
    minimal-bootstrap = builtins.trace "minimal-bootstrap-override used" {};

And the trace does not appear. I have left the lithium trace, so I know tracing is still happening.

I tried a bunch of things. I went off alone, without you, dear reader. I dived into rage, frustration, and lots of stuff that seems to have a strict assumption that you aren’t running Nix on aarch64. I tried nixos-anywhere. I tried deploy-rs. I tried nixos-rebuild switch --target-host ... and none of it works because it makes some steep assumptions, requires me to jump through configuration hoops that I shouldn’t, or it mandates that you cannot cross compile.

It’s a little infuriating that I can build an ISO somehow but not the actual image itself. If I just understood the tooling a little better this could be different.
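Back on the trace that never appeared: a missing trace message is ambiguous, because Nix evaluates lazily and builtins.trace only fires when the value it wraps is actually forced. The sketch below uses a made-up attribute set (fakePkgs) standing in for pkgs, and is meant to be evaluated with something like nix-instantiate --eval --strict. It shows an attribute that is defined without its trace necessarily printing:

```nix
# Sketch only: `fakePkgs` stands in for a package set with an overlay applied.
let
  fakePkgs = {
    minimal-bootstrap = builtins.trace "minimal-bootstrap-override used" { };
    somethingElse = "built without touching minimal-bootstrap";
  };
in {
  # Never forces minimal-bootstrap, so this attribute alone prints no trace.
  untouched = fakePkgs.somethingElse;

  # builtins.seq forces minimal-bootstrap first, so the trace is printed.
  forced = builtins.seq fakePkgs.minimal-bootstrap fakePkgs.somethingElse;
}
```

So a silent trace could mean the overlay is not applied, or it could mean that nothing in the failing build asks for minimal-bootstrap through the overlaid pkgs. Forcing the attribute by hand is one way to tell the two apart.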
So let’s do this like an animal. I can get what I think is a bootable ISO going. So we’ll boot to the machine. Hopefully we can do this headlessly but I don’t know the boot order and getting a monitor going might be complicated with my physical setup. I could know for sure just by wiping the disk. This involves opening it up again.
I’m able to use my image-deploy.sh once again and it flawlessly writes everything to a USB drive. I can scp all of my proton-nix repository over and just run nixos-rebuild switch --flake '.#lithium' and that should install it? If not I might have to use nixos-install - I need to look into that. The switch invocation does need git, I find out, so I add that to the installer’s pkgs and run it again.

I can’t run nixos-install because there’s no configuration.nix. I can’t use nixos-rebuild --target-host because there’s no nixos-config on my local machine. I’ve seen mention of nixos-generate-config but I have just a name in this context from nixos-generators:

After booting, if you intend to use nixos-switch, consider using nixos-generate-config.
I might try making an sd-card-x86_64-linux.

Yesterday I tried a bunch of things, but recorded very little of it. It’s probably for the best, because I am very irritable with Nix, its community, and its ecosystem at this point. I still keep thinking I could demand my money back, but oh, right, this is open source and I paid nothing. But still, I’ve sunk an enormous amount of my personal time into this and much of it feels wasted. I’ve learned a lot about Nix, but I’ve also learned some really bad things, like I should outright ignore the documentation and just go look in the code. This bodes very ill for Nix, and I hope the values held there change soon. I’ve thought about contributing some documentation, but I have to fucking reverse engineer anything I’d like to document. This is why when I design things, I start with the documentation.
Today, I have something going with nixos-anywhere, which I’ve already forked in an attempt to get it to work. My settings for the image have caused a lot of problems - namely that I can’t SSH to root directly. It’s common practice to use a special user instead, but Nix’s install tooling demands it must be run as root for my purposes. Right now I’m awaiting another build of the installer (I’ve done three today) just to try out some configuration edits.

I was really hoping to spin up many more machines. Maybe once I have lithium up, I can leverage it as an x86_64-linux builder.

I’ve done an immense amount of reading and trying out various permutations. I’ve learned that BIOS and UEFI are different things entirely (and by definition, mutually exclusive). I’ve learned this computer’s older motherboard saying “UEFI BIOS” is both nonsensical and a flat out lie.
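For what it’s worth, the BIOS versus UEFI distinction shows up in NixOS as a choice between boot loader options. A minimal sketch, assuming a legacy BIOS machine whose boot disk is /dev/sda (a placeholder device path, not taken from lithium), with the UEFI alternative left in comments:

```nix
# Sketch only: /dev/sda is a placeholder for the real boot disk.
{ ... }:
{
  # Legacy BIOS: GRUB is installed onto the disk itself.
  boot.loader.grub.enable = true;
  boot.loader.grub.device = "/dev/sda";

  # UEFI alternative (use instead of the two lines above):
  # boot.loader.systemd-boot.enable = true;
  # boot.loader.efi.canTouchEfiVariables = true;
}
```

Picking the block that matches how the firmware really boots is the part that matters. A configuration written for one mode generally will not boot in the other.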
I’m going to save some of my energy for updating some of the documentation in nixos-anywhere. I think a good part of putting code together is explaining why, and it would’ve saved enormous spelunking on my part. Folks should still be curious and learn things, but for a disk formatting and installation tool, having some information on how to troubleshoot booting problems as well as context to various settings would be helpful, and the examples are the right place to do some of that.

I can finally move onto making the stable-diffusion-webui derivation!

This day was mostly spent with me learning how to configure NixOS and the painful amount of detail I have to know about the physical machine. I had to learn that BIOS and UEFI are two different and mutually exclusive things. I learned that UEFI is often misused, as is BIOS. That my motherboard boots with “UEFI BIOS” is… sad. It’s actually just BIOS. Once I sorted that out, I was able to get the boot going.

I’ve been trying to install NVIDIA drivers and the like and running into problems. I was having similar, sporadic build issues found in nixpkgs#206213 wherein the reporter found out they were having faulty memory issues. It was only similar in the error message and that packages seemed to have trouble at random. I’d also seen problems with hash consistency in the nix store, which were also sporadic.
I happen to have memtest installed on a thumb drive just for this kind of thing. Once I got the machine to boot on the drive, it immediately reported memory failures. I’d also noticed my four sticks of 8GB each were actually registering up to 24GB total, instead of the expected 32GB. Memtest is actually really good for this - it shows slots and can report meta information about the memory sticks. It turned out my “slot 3” wasn’t even there, according to Memtest. Upon further inspection, I found this:
Remember to insert the picture!
A bent pin!
Physical memory exists on a memory bus. This means data being written to or read from the memory must travel along the bus. The “bus” is similar to a metaphor of a real bus. One could imagine the route a bus takes across a city to get where it’s going, making stops along the way. On a memory bus, data moving around must go on the bus and it will make a “stop” along each entity of the bus. This gives every device on the bus an opportunity to act on the data in some way. In the case of a physical fault, this could mean corrupting the memory along the way, even if the motherboard didn’t think there was a proper stick of memory in the slot anyways. At this level, it’s really all just 1s and 0s, so we’re talking about changes in electrical charge.
It’s a wonder this machine boots at all.
Memory issues are fixed. I bent the pin back into place.
A quick dump:

    nix run github:LoganBarnett/nixos-anywhere/disko-with-sudo --refresh -- --flake '.#lithium' root@lithium.proton --build-on-remote --debug

My branch isn’t strictly necessary, but these are the instructions required to push the build from my machine to it, while the machine sits in the installer.
If it can’t find diskoScript, it’s not really diskoScript that’s missing, but that it can’t resolve to a system. Strangely it will try something - I don’t know how it finds lithium. My flake.nix is quite big with lots of attempts at things, so it could just be my setup. Anyways, this error means that I don’t have an aarch64-darwin for lithium. It has to be there, even though I’m not building it there at all. This means potential duplication.

To clean this up, I need to figure out how to do parameters to callPackage consistently (and document them) and then also refactor some things so I don’t have so much duplication. I think it had me chasing some weird errors earlier because I had an old nixosConfigurations.aarch64-darwin.lithium.

I also need to set up my SSH key for root.

I was thinking, this installer is really just a rescue/boot disk for me. I should give it a unique name. This would help my bootstrapping process immensely. It could have a plain text root password because it’s briefly transient (though I still want to encrypt), and then get an SSH key set up.
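Those last two to-dos map onto a handful of NixOS options. A minimal sketch of a module for the rescue image, assuming a recent NixOS release (the services.openssh.settings form), with a placeholder key and a made-up host name:

```nix
# Sketch only: the host name and the key string are placeholders.
{ ... }:
{
  # A distinct name so the rescue/installer image is easy to recognize.
  networking.hostName = "proton-rescue";

  services.openssh.enable = true;
  # Let root log in with a key, but not with a password.
  services.openssh.settings.PermitRootLogin = "prohibit-password";

  users.users.root.openssh.authorizedKeys.keys = [
    "ssh-ed25519 AAAA...placeholder logan@example"
  ];
}
```

With a key in place for root, tools that insist on connecting as root could reach the installer without a password ever being set.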
There is also nix store info, but I don’t know which is the preferred one - just that one is deprecated. I cannot run nix store info on my host and attempts to enable it via experimental-features have failed for me.