Bisecting the Linux Kernel with NixOS
cross-posted from: https://beehaw.org/post/551377
Recently my kernel started to panic every time I awoke my monitors from sleep. This seemed to be a regression; it worked one day, then I received a kernel upgrade from upstream, and the next time I was operating my machine it would crash when I came back to it.
After being annoyed for a bit, I realized this was a great time to learn how to bisect the kernel with git, find the problem, and either report it upstream or patch it out of my kernel! I thought this would be useful to someone else in the future, so here we are.
Step #1: Clone the kernel
I grabbed Linus' tree from https://github.com/torvalds/linux with
git clone git@github.com:torvalds/linux.git
Step #2: Start a bisect.
If you're not familiar with a bisect, it's a process by which you tell git, "this commit was fine", and "this commit was broken", and it will help you test the commits in-between to find the one that introduced the problem.
You start this by running `git bisect start`, and then you provide a tag or commit ID for the good and the bad kernel with `git bisect good ...` and `git bisect bad ...`.
I knew my issue didn't occur on the 5.15 kernel series, but did start with my NixOS upgrade to 6.1. But I didn't know precisely where, so I aimed a little broader... I figured an extra test or two would be better than missing the problem. 😬
git bisect start
git bisect good v5.15
git bisect bad master
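If you want to see the mechanics before pointing this at the kernel, here is a small sketch on a throwaway repository. Everything in it (the ten commits, the BROKEN marker file, the test command) is made up for the demo. It also uses `git bisect run` to automate the good/bad marking, which a kernel bisect like mine can't do, since every step there needs a reboot and a manual check.

```shell
#!/bin/sh
set -eu

# Throwaway repository with ten commits; commit 7 introduces the "bug",
# represented here by a file named BROKEN (purely illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Bisect Demo"
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "$i" > counter
  if [ "$i" -eq 7 ]; then touch BROKEN; fi
  git add -A
  git commit -q -m "commit $i"
done

first_commit=$(git rev-list --max-parents=0 HEAD)

# Same three commands as above, with the demo's own good/bad endpoints.
git bisect start >/dev/null
git bisect bad HEAD >/dev/null
git bisect good "$first_commit" >/dev/null

# The test command exits 0 for a good commit and 1 for a bad one.
git bisect run sh -c 'test ! -e BROKEN' >/dev/null

# refs/bisect/bad points at the first bad commit once the bisect is done.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $culprit"
git bisect reset >/dev/null 2>&1
```

With ten commits this takes three or four probes; the number of steps grows with the logarithm of the range, which is why aiming a little broad only costs a test or two.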
Step #3: Replace your kernel with that version
In an ideal world, I would have been able to test this in a VM. But it was a graphics problem with my video card and connected monitors, so I went straight for testing this on my desktop to ensure it was easy to reproduce and accurate.
Testing a mid-release kernel with NixOS is pretty easy! All you have to do is override your kernel package, and NixOS will handle building it for you... here's an example from my bisect:
boot.kernelPackages = pkgs.linuxPackagesFor (pkgs.linux_6_2.override { # (#4) make sure this matches the major version of the kernel as well
  argsOverride = rec {
    src = pkgs.fetchFromGitHub {
      owner = "torvalds";
      repo = "linux";
      # (#1) -> put the bisect revision here
      rev = "7484a5bc153e81a1740c06ce037fd55b7638335c";
      # (#2) -> clear the sha; run a build, get the sha, populate the sha
      sha256 = "sha256-nr7CbJO6kQiJHJIh7vypDjmUJ5LA9v9VDz6ayzBh7nI=";
    };
    dontStrip = true;
    # (#3) `head Makefile` from the kernel and put the right version numbers here
    version = "6.2.0";
    modDirVersion = "6.2.0-rc2";
    # (#5) `nixos-rebuild boot`, reboot, test.
  };
});
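The version numbers for (#3) can also be pulled out of the Makefile with a few lines of shell instead of by eye. This is only a sketch: the helper names are mine, the demo Makefile is a made-up stand-in for the real header, and mapping the first line to version and the second to modDirVersion is an assumption based on the example above.

```shell
#!/bin/sh
set -eu

# Print the value of one "NAME = value" assignment from a kernel Makefile.
makefile_field() {
  sed -n "s/^$2 *= *//p" "$1" | head -n 1
}

# Print two lines: the base version and the full release string.
kernel_versions() {
  v=$(makefile_field "$1" VERSION)
  p=$(makefile_field "$1" PATCHLEVEL)
  s=$(makefile_field "$1" SUBLEVEL)
  e=$(makefile_field "$1" EXTRAVERSION)
  echo "$v.$p.$s"
  echo "$v.$p.$s$e"
}

# Demo against a stand-in Makefile header; in a real checkout, point
# kernel_versions at the Makefile in the root of the kernel tree.
demo=$(mktemp)
cat > "$demo" <<'EOF'
# SPDX-License-Identifier: GPL-2.0
VERSION = 6
PATCHLEVEL = 2
SUBLEVEL = 0
EXTRAVERSION = -rc2
EOF
versions=$(kernel_versions "$demo")
echo "$versions"
rm -f "$demo"
```

For the demo header this prints 6.2.0 and then 6.2.0-rc2, matching the two values in the config above.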
Getting this defined requires a couple of intermediate steps...
Step #3.1 -- put the version that `git bisect` asked me to test in (#1)
Step #3.2 -- clear out `sha256`
Step #3.3 -- run a `nixos-rebuild boot`; the build stops with a hash mismatch that reports the real sha256
Step #3.4 -- grab the sha256 and put it into the `sha256` field (#2)
Step #3.5 -- make sure the major version matches at (#3) and (#4)
Then run `nixos-rebuild boot`.
Step #4: Test!
Reboot into the new kernel, and test whatever is broken. For me I was able to set up a simple test protocol: `xset dpms force off` to blank my screens, wait 30 seconds, and then wake them. If my kernel panicked then it was a fail.
Step #5: Repeat the bisect
Go into the linux source tree and run `git bisect good` or `git bisect bad` depending on whether the test succeeded. Return to step #3.
Step #6: Revert it!
For my case, I eventually found a single commit that introduced the problem, and I was able to revert it from my local kernel. This involves leaving a kernel patch in my NixOS config like this:
boot.kernelPatches = [
  {
    patch = ./revert-bb2ff6c27b.patch;
    name = "revert-bb2ff6c27b";
  }
];
This probably isn't the greatest long-term solution, but it gets my desktop stable and I'm happy with that for now.
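One way to produce a revert patch like that is to diff the offending commit against its parent, in that order. The sketch below shows the idea on a throwaway repository; the file name and commit contents are invented for the demo, and it is not necessarily how the patch above was generated.

```shell
#!/bin/sh
set -eu

# Throwaway repository: one good commit, then one that "regresses".
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Revert Demo"
echo "works" > driver.c
git add driver.c
git commit -q -m "good state"
echo "panics" > driver.c
git commit -q -a -m "regression"

# Diffing the commit against its parent, in that order, yields the
# reverse of the commit, i.e. a patch that undoes it.
culprit=$(git rev-parse --short HEAD)
patch_file="revert-$culprit.patch"
git diff "$culprit" "$culprit^" > "$patch_file"

# Confirm the patch applies to the current tree, then apply it.
git apply --check "$patch_file"
git apply "$patch_file"
result=$(cat driver.c)
echo "driver.c now contains: $result"
```

In the kernel tree the same `git diff` line, given the commit the bisect landed on, would write out the file that boot.kernelPatches points at.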
Profit!
Oh super helpful, been meaning to investigate linux-tkg for my gaming setup. How long was the build time for the kernel on your system?
Just yesterday I was thinking about the reverse... Wouldn't it be nice if one could bisect the Nix configuration? Well, technically you already can if you keep it in git, but a 'nixos-rebuild bisect' would be a nice shortcut, especially for config problems that aren't noticed immediately. One of those "whoa, since when did my exotic, seldom-used hardware stop working..." moments.
Nix does store some previous system generations, so it would be as easy as using `nixos rollback` (or whatever the command is) to bisect them. A wrapper around this would be nice.
The problem here is that it would only give you the update that broke it. You likely want to go further and identify the offending change in either your config or nixpkgs.
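For what it's worth, the search over generations is just a binary search on the generation number. A rough sketch in shell, where is_good is a made-up stand-in for "boot that generation and check the hardware", and the generation numbers are invented. In practice each probe needs a reboot, so a real wrapper would have to keep its state between boots rather than run as one loop.

```shell
#!/bin/sh
set -eu

# Binary search between a known-good and a known-bad generation number.
# Usage: first_bad_generation GOOD BAD TEST_COMMAND...
# TEST_COMMAND is called with a generation number and must exit 0 if
# that generation is good.
first_bad_generation() {
  good=$1
  bad=$2
  shift 2
  while [ $((bad - good)) -gt 1 ]; do
    mid=$(( (good + bad) / 2 ))
    if "$@" "$mid"; then
      good=$mid
    else
      bad=$mid
    fi
  done
  echo "$bad"
}

# Hypothetical check: pretend everything before generation 42 works.
is_good() {
  [ "$1" -lt 42 ]
}

culprit=$(first_bad_generation 30 50 is_good)
echo "first bad generation: $culprit"
```

That still only narrows it down to the update that broke things; finding the offending change inside it is a second bisect over the config or nixpkgs history.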