Conditional PAM modules

I spent some time trying to set up PAM so that I could authenticate (locally for now) using my Yubikey. There are several resources out there on how to do that, so I won’t discuss the Yubikey setup itself, but I thought I’d drop a note about how I ended up doing “conditional” PAM entries.

Specifically, I wanted to achieve the following:

  • log in as a regular user (which has no sudo powers) using either the Yubikey in challenge-response mode (and configured to require a button press), or the user’s password
  • log in as root using both the Yubikey and the root password

The difficulty here is that the pam_yubico module is sufficient for a regular user, but required for root. After a few failed attempts in which I tried to use pam_rootok.so, I ended up with the following PAM configuration (using pam_succeed_if.so user = root) in /etc/pam.d/system-local-login:

# yubico module switcher: requisite for root, sufficient for non-root
# ("ignore=ignore" is *required*, otherwise the lightdm greeter fails on pam_setcred() with something about "PAM dispatch" and "ignore")
auth        [success=ok ignore=ignore default=2]    pam_succeed_if.so quiet user = root
# root
auth        requisite   pam_yubico.so mode=challenge-response chalresp_path=/var/lib/yubico
auth        [default=1] pam_permit.so
# non-root
auth        sufficient  pam_yubico.so mode=challenge-response chalresp_path=/var/lib/yubico

auth        include     system-login
account     include     system-login
password    include     system-login
session     include     system-login

It feels like assembly: if the user is not root, skip over the 2 entries for root users and end up on the sufficient entry for pam_yubico; if it is root, go on with the requisite entry (like required but failing early), then jump over the sufficient entry for non-root users. It’s not exactly pretty though.
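
To make the jumps easier to follow, here is a toy shell model of the four auth entries above. It is only a sketch of the control flow, not of PAM itself: the function name walk_auth_stack, the user alice and the ok/fail outcome handed in for pam_yubico are all made up for illustration.

```sh
#!/bin/sh
# Toy model of the four auth entries above; this is not a PAM implementation.
# $1: user name
# $2: hypothetical pam_yubico outcome, "ok" or "fail"
walk_auth_stack() {
    user=$1
    yubico=$2
    entry=1
    while [ "$entry" -le 4 ]; do
        case $entry in
            1)  # pam_succeed_if user = root: success=ok, default=2
                if [ "$user" = root ]; then
                    entry=2
                else
                    entry=4              # jump over the next 2 entries
                fi ;;
            2)  # requisite pam_yubico: a failure ends the stack right away
                if [ "$yubico" = ok ]; then
                    entry=3
                else
                    echo "denied"
                    return 0
                fi ;;
            3)  # [default=1] pam_permit: jump over the next entry
                entry=5 ;;
            4)  # sufficient pam_yubico: success ends the stack, failure is ignored
                if [ "$yubico" = ok ]; then
                    echo "granted by yubikey"
                    return 0
                fi
                entry=5 ;;
        esac
    done
    echo "falls through to system-login"
}

# Print the outcome for every combination of user kind and Yubikey result.
for user in root alice; do
    for result in ok fail; do
        printf '%-5s yubico=%-4s -> %s\n' "$user" "$result" "$(walk_auth_stack "$user" "$result")"
    done
done
```

For root with a working Yubikey the stack falls through to system-login, which is where the password gets asked. For a regular user the password is only needed when the Yubikey step fails.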

Debugging Ncurses to fix a Mutt segfault on Gentoo

This all started after a regular package upgrade on my Gentoo box:

$ mutt
Segmentation fault

Meh, another breakage. This one already had a bug filed on the Gentoo bug tracker: #651552. However, there was no fix available yet. So time to rebuild with debug enabled (-ggdb3) and sources installed, disable PaX on the binary, and whip out GDB:

$ gdb /usr/bin/mutt
Reading symbols from /usr/bin/mutt...Reading symbols from /usr/lib/debug//usr/bin/mutt.debug...done.
done.
(gdb) run
Starting program: /usr/bin/mutt

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff77e6236 in _nc_setupscreen_sp (spp=spp@entry=0x7fffffffbea8, slines=<optimized out>, scolumns=<optimized out>,
    output=output@entry=0x7ffff7dd25c0 <_IO_2_1_stdout_>, filtered=<optimized out>, slk_format=slk_format@entry=0)
    at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:475
475             if (sp->_default_fg >= MaxColors) {
(gdb) bt
#0  0x00007ffff77e6236 in _nc_setupscreen_sp (spp=spp@entry=0x7fffffffbea8, slines=<optimized out>, scolumns=<optimized out>,
    output=output@entry=0x7ffff7dd25c0 <_IO_2_1_stdout_>, filtered=<optimized out>, slk_format=slk_format@entry=0)
    at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:475
#1  0x00007ffff77e0d2d in newterm_sp (sp=<optimized out>, name=name@entry=0x7fffffffde0f "screen-256color",
    ofp=ofp@entry=0x7ffff7dd25c0 <_IO_2_1_stdout_>, ifp=ifp@entry=0x7ffff7dd1860 <_IO_2_1_stdin_>)
    at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_newterm.c:222
#2  0x00007ffff77e11db in newterm (name=name@entry=0x7fffffffde0f "screen-256color", ofp=0x7ffff7dd25c0 <_IO_2_1_stdout_>,
    ifp=0x7ffff7dd1860 <_IO_2_1_stdin_>) at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_newterm.c:355
#3  0x00007ffff77dc15a in initscr () at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_initscr.c:89
#4  0x000055555556b823 in start_curses () at main.c:296
#5  main (argc=1, argv=0x7fffffffc8b8, environ=<optimized out>) at main.c:584
(gdb) p sp->_default_fg
$1 = 12
(gdb) p MaxColors
Cannot access memory at address 0x6e69746e99

Hmm, weird. What is MaxColors?

A bit of cscope-ing in the ncurses source later, it turns out to be this:

#ifdef USE_TERM_DRIVER
#define MaxColors      InfoOf(sp).maxcolors
#define NumLabels      InfoOf(sp).numlabels
#else
#define MaxColors      max_colors
#define NumLabels      num_labels
#endif

Alright, it’s a macro. Back to GDB then:

(gdb) info macro MaxColors
Defined at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:55
#define MaxColors max_colors
(gdb) macro expand MaxColors
expands to: (((sp) ? ((sp)->_term ? (sp)->_term : cur_term) : cur_term))->type2. Numbers[13]
(gdb) p sp
$2 = (SCREEN *) 0x55555586a240
(gdb) p sp->_term
$3 = (TERMINAL *) 0x55555586a8a0
(gdb) p sp->_term->type2
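$4 = {term_names = 0x0, str_table = 0x21 <error: Cannot access memory at address 0x21>,
  Booleans = 0x75712f656d6f682f <error: Cannot access memory at address 0x75712f656d6f682f>, Numbers = 0x6e69746e65, Strings = 0x0,
  ext_str_table = 0x61 <error: Cannot access memory at address 0x61>, ext_Names = 0x75712f656d6f682f, num_Booleans = 28261,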
  num_Numbers = 26996, num_Strings = 12142, ext_Booleans = 29742, ext_Numbers = 29285, ext_Strings = 26989}
(gdb) ptype sp->_term
type = struct term {
    TERMTYPE type;
    short Filedes;
    struct termios Ottyb;
    struct termios Nttyb;
    int _baudrate;
    char *_termname;
    TERMTYPE2 type2;
} *
(gdb) p *sp->_term
$5 = {type = {term_names = 0x55555586bf50 "screen-256color|GNU Screen with 256 colors",
    str_table = 0x55555586bf50 "screen-256color|GNU Screen with 256 colors", Booleans = 0x55555586c290 "", Numbers = 0x55555586c2d0,
    Strings = 0x55555586c330, ext_str_table = 0x55555586d130 "\033(B", ext_Names = 0x55555586d1f0, num_Booleans = 47,
    num_Numbers = 40, num_Strings = 446, ext_Booleans = 3, ext_Numbers = 1, ext_Strings = 32}, Filedes = 1, Ottyb = {c_iflag = 17664,
    c_oflag = 5, c_cflag = 191, c_lflag = 35387, c_line = 0 '\000',
    c_cc = "\003\034\177\025\004\000\001\000\021\023\032\377\022\017\027\026\377", '\000' <repeats 14 times>, c_ispeed = 15,
    c_ospeed = 15}, Nttyb = {c_iflag = 17664, c_oflag = 5, c_cflag = 191, c_lflag = 35387, c_line = 0 '\000',
    c_cc = "\003\034\177\025\004\000\001\000\021\023\032\377\022\017\027\026\377", '\000' <repeats 14 times>, c_ispeed = 15,
    c_ospeed = 15}, _baudrate = 38400, _termname = 0x55555586af40 "screen-256color", type2 = {term_names = 0x0,
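    str_table = 0x21 <error: Cannot access memory at address 0x21>,
    Booleans = 0x75712f656d6f682f <error: Cannot access memory at address 0x75712f656d6f682f>, Numbers = 0x6e69746e65, Strings = 0x0,
    ext_str_table = 0x61 <error: Cannot access memory at address 0x61>, ext_Names = 0x75712f656d6f682f, num_Booleans = 28261,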
    num_Numbers = 26996, num_Strings = 12142, ext_Booleans = 29742, ext_Numbers = 29285, ext_Strings = 26989}}

Okay, so sp->_term->type2 is full of crap, while all the other fields in sp->_term look fine. So perhaps something smashed the end of sp->_term?

Let’s track down how it’s built, then we can set a watch on it to catch the rogue write. Working our way up the backtrace and looking at the ncurses code, it turns out to be allocated in TINFO_SETUP_TERM(), called by newterm_sp().

(gdb) b lib_setup.c:711
Breakpoint 1 at 0x7ffff63957d9: lib_setup.c:711. (2 locations)
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/mutt

Breakpoint 1, _nc_setupterm (tname=tname@entry=0x7fffffffe18f "screen-256color", Filedes=1, errret=errret@entry=0x7fffffffc23c,
    reuse=reuse@entry=0) at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/tinfo/lib_setup.c:711
711             termp = typeCalloc(TERMINAL, 1);
(gdb) n
713             if (termp == 0) {
(gdb) p *termp
$6 = {type = {term_names = 0x0, str_table = 0x0, Booleans = 0x0, Numbers = 0x0, Strings = 0x0, ext_str_table = 0x0, ext_Names = 0x0,
    num_Booleans = 0, num_Numbers = 0, num_Strings = 0, ext_Booleans = 0, ext_Numbers = 0, ext_Strings = 0}, Filedes = 0, Ottyb = {
    c_iflag = 0, c_oflag = 0, c_cflag = 0, c_lflag = 0, c_line = 0 '\000', c_cc = '\000' <repeats 31 times>, c_ispeed = 0,
    c_ospeed = 0}, Nttyb = {c_iflag = 0, c_oflag = 0, c_cflag = 0, c_lflag = 0, c_line = 0 '\000', c_cc = '\000' <repeats 31 times>,
    c_ispeed = 0, c_ospeed = 0}, _baudrate = 0, _termname = 0x0}
(gdb) ptype
type = struct term {
    TERMTYPE type;
    short Filedes;
    struct termios Ottyb;
    struct termios Nttyb;
    int _baudrate;
    char *_termname;
}

Now that gets weirder: the allocated TERMINAL structure doesn’t have the type2 field that was filled with garbage at the time of the crash! That explains the segfault, however, and now we must understand why the definition changed.

Time to take another look at the source. Turns out the structure definition is generated by include/MKterm.h.awk.in, and ends up in include/term.h:

    print  "typedef struct term {       /* describe an actual terminal */"
    print  "    TERMTYPE    type;       /* terminal type description */"
    print  "    short   Filedes;    /* file description being written to */"
    print  "    TTY     Ottyb;      /* original state of the terminal */"
    print  "    TTY     Nttyb;      /* current state of the terminal */"
    print  "    int     _baudrate;  /* used to compute padding */"
    print  "    char *  _termname;  /* used for termname() */"
    if (@NCURSES_EXT_COLORS@) {
    print  "    TERMTYPE2   type2;      /* extended terminal type description */"
    }
    print  "} TERMINAL;"

Quick check in GDB:

(gdb) p NCURSES_EXT_COLORS
$7 = 0
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff77e6236 in _nc_setupscreen_sp (spp=spp@entry=0x7fffffffc968, slines=<optimized out>, scolumns=<optimized out>,
    output=output@entry=0x7ffff7dd25c0 <_IO_2_1_stdout_>, filtered=<optimized out>, slk_format=slk_format@entry=0)
    at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:475
475             if (sp->_default_fg >= MaxColors) {
(gdb) p NCURSES_EXT_COLORS
$8 = 20180127

Indeed, the value of NCURSES_EXT_COLORS changes… That’s super-weird. What’s more, it’s not an #ifdef in the structure definition, it’s processed at ncurses compile time by AWK. So there should be only a single definition possible for struct term.

It took me some more time spelunking in the ncurses internals and build system, chasing ghosts from m4 through AWK to C, till I stumbled upon it:

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/mutt

Breakpoint 1, _nc_setupterm (tname=tname@entry=0x7fffffffe28f "screen-256color", Filedes=1, errret=errret@entry=0x7fffffffc33c,
    reuse=reuse@entry=0) at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/tinfo/lib_setup.c:711
711             termp = typeCalloc(TERMINAL, 1);
(gdb) info macro NCURSES_EXT_COLORS
Defined at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1-abi_x86_64.amd64/ncurses/ncurses/../include/ncurses_def.h:729
  included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1-abi_x86_64.amd64/ncurses/ncurses/../include/ncurses_cfg.h:205
  included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/curses.priv.h:56
  included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/tinfo/lib_setup.c:44
#define NCURSES_EXT_COLORS 0
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff77e6236 in _nc_setupscreen_sp (spp=spp@entry=0x7fffffffc968, slines=<optimized out>, scolumns=<optimized out>,
    output=output@entry=0x7ffff7dd25c0 <_IO_2_1_stdout_>, filtered=<optimized out>, slk_format=slk_format@entry=0)
    at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:475
475             if (sp->_default_fg >= MaxColors) {
(gdb) info macro NCURSES_EXT_COLORS
Defined at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1-abi_x86_64.amd64/ncursesw/ncurses/../include/curses.h:424
  included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/curses.priv.h:325
  included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:43
#define NCURSES_EXT_COLORS 20180127

Upon allocation, the macro NCURSES_EXT_COLORS was defined in the ncurses source. Upon segfaulting access, it was defined in the ncursesw source…

(gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
0x00007ffff7dd7c70  0x00007ffff7df54d0  Yes (*)     /lib64/ld-linux-x86-64.so.2
0x00007ffff7a26680  0x00007ffff7b83f6b  Yes         /lib64/libc.so.6
0x00007ffff77ce450  0x00007ffff77facd9  Yes         /lib64/libncursesw.so.6
0x00007ffff7595190  0x00007ffff75af034  Yes         /lib64/libtinfo.so.6
0x00007ffff732d570  0x00007ffff736ce36  Yes         /usr/lib64/libssl.so.1.1
0x00007ffff6ecb000  0x00007ffff7078414  Yes         /usr/lib64/libcrypto.so.1.1
0x00007ffff6c479a0  0x00007ffff6c5b8c3  Yes (*)     /usr/lib64/libsasl2.so.3
0x00007ffff7fb3cc0  0x00007ffff7fc25fe  Yes         /usr/lib64/liblmdb.so.0
0x00007ffff6a11fe0  0x00007ffff6a17511  Yes (*)     /usr/lib64/libidn.so.11
0x00007ffff67c8500  0x00007ffff67fc8b9  Yes (*)     /usr/lib64/libgpgme.so.11
0x00007ffff65bde20  0x00007ffff65beeba  Yes (*)     /lib64/libdl.so.2
0x00007ffff638e290  0x00007ffff63a8ab4  Yes         /lib64/libtinfow.so.6
0x00007ffff7f983c0  0x00007ffff7fa7ac9  Yes (*)     /lib64/libz.so.1
0x00007ffff6163af0  0x00007ffff6174ead  Yes (*)     /lib64/libpthread.so.0
0x00007ffff5f4c760  0x00007ffff5f58113  Yes (*)     /usr/lib64/libassuan.so.0
0x00007ffff5d35a50  0x00007ffff5d41599  Yes (*)     /usr/lib64/libgpg-error.so.0
(*): Shared library is missing debugging information.

There you have it: Mutt was linked against both libtinfo and libtinfow/libncursesw, which have differing values of NCURSES_EXT_COLORS. Apparently the loader chose to resolve the required tinfo symbols using libtinfo, thus the crash, since the TERMINAL structure allocated in libtinfo was incompatible with the TERMINAL structure manipulated by libncursesw.
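
One way to spot this kind of double linking without a debugger might be to look at the output of ldd. The following is a rough sketch, assuming ldd's usual "name => path (address)" lines. check_mixed_curses is a made-up helper name, and it only knows about the four library names involved here.

```sh
#!/bin/sh
# Reads `ldd` output on stdin and reports when both the narrow (libncurses,
# libtinfo) and the wide (libncursesw, libtinfow) variants are pulled in.
check_mixed_curses() {
    awk '
        {
            name = $1
            sub(/^.*\//, "", name)      # drop any directory part
            sub(/\.so.*$/, "", name)    # libtinfow.so.6 -> libtinfow
            seen[name] = 1
        }
        END {
            narrow = ""; wide = ""
            if ("libncurses"  in seen) narrow = narrow " libncurses"
            if ("libtinfo"    in seen) narrow = narrow " libtinfo"
            if ("libncursesw" in seen) wide = wide " libncursesw"
            if ("libtinfow"   in seen) wide = wide " libtinfow"
            if (narrow != "" && wide != "")
                print "mixed:" narrow " vs" wide
            else
                print "ok"
        }
    '
}

# usage: ldd /usr/bin/mutt | check_mixed_curses
```

With the library list shown above, this would flag libtinfo on one side and libncursesw plus libtinfow on the other.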

The rest is history, however I can’t stress enough how useful -ggdb3 has proven in debugging this issue. I’ve never had to debug code as macro-ridden as the ncurses code, and having GDB able to give me that much info on all macros was an incredible boon. :)

Minor annoyances while installing GitLab

I’ve installed GitLab on my server, and it hasn’t exactly been a smooth ride. So here goes, what went wrong in my case — and how to fix it.

First off, I should mention the server is an LXC container running 64-bit Debian 9 (Stretch).

Prerequisites install failures

Following along the GitLab install guide, the second step is to download a script from https://packages.gitlab.com and blindly pipe it into sudo bash, then… wait, what?

Well, since I’m not one for running as root random shit downloaded over the Internet without prior extensive scrutiny, I first took a look at the script. Basically it:

  1. checks the host distro
  2. installs curl
  3. installs debian-archive-keyring
  4. installs apt-transport-https
  5. fetches an APT sources configuration file from the GitLab repo
  6. fetches the GitLab package repo’s key
  7. runs apt-get update

Sadly, there is very little in the way of error-checking in this script. More specifically, none of the apt-get invocations have their return code checked, so any error that occurs goes unnoticed.

In my case, what failed was step 4 (since both curl and debian-archive-keyring were already installed, otherwise they would have failed too). This was because I use etckeeper with stricter settings than the default, by which it will refuse to install stuff if there are uncommitted changes in /etc.
There were uncommitted changes in my /etc, so etckeeper failed the install attempt, which was silently ignored by GitLab’s script. Thus in the end, even though I got the success message from the end of the script, nothing much had actually happened.

This was easily fixed by just properly committing changes in /etc before running the script, and also afterwards (since the script does drop its sources configuration file in /etc/apt).
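
For what it’s worth, the missing error handling would not take much. Here is a minimal sketch of a wrapper that refuses to carry on silently; run_checked is a name I made up, it is not something GitLab’s script contains.

```sh
#!/bin/sh
# Run one installation step and report a failure instead of ignoring it.
# The exit status of the wrapped command is passed back to the caller.
run_checked() {
    "$@" && return 0
    status=$?
    printf 'step failed (exit %s): %s\n' "$status" "$*" >&2
    return "$status"
}

# Hypothetical use for steps 2 to 4 of the list above:
#   for pkg in curl debian-archive-keyring apt-transport-https; do
#       run_checked apt-get install -y "$pkg" || exit 1
#   done
```

With something like this in place, the etckeeper refusal would have stopped the script at step 4 instead of being followed by a success message.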

APT proxy issues

The last step in the above script is to run apt-get update to fetch the InRelease file from the GitLab package repo. On my box, this failed with the following errors:

Ign:7 https://packages.gitlab.com/gitlab/gitlab-ee/debian stretch InRelease
Err:8 https://packages.gitlab.com/gitlab/gitlab-ee/debian stretch Release
  Received HTTP code 403 from proxy after CONNECT
Reading package lists... Done
E: The repository 'https://packages.gitlab.com/gitlab/gitlab-ee/debian stretch Release' does no longer have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.

Easy enough, this must be because of my apt-cacher-ng proxy which is not configured for HTTPS proxying.
So I added the following to my APT proxy configuration:

Acquire::https::Proxy::packages.gitlab.com DIRECT;

But the error remained, even though reaching the file directly in a browser worked fine… After much fiddling with APT’s proxy configuration, and much cursing, but still no joy, I brought out the big guns: strace.

And there it was:

read(6, "103 Redirect\nURI: https://packages.gitlab.com/gitlab/gitlab-ee/debian/dists/stretch/InRelease\nNew-URI: https://packages-gitlab-com.s3-accelerate.amazonaws.com/7/11/debian/dists/stretch/InRelease?AWSAccessKeyId=AKIAJ74R7IHMTQVGFCEA&Signature=oo10HdjIUUaV5Ms2OPTS7hPUsPo=&Expires=1516749086\n\n", 64000) = 290
[…]
read(13, "400 URI Failure\nURI: https://packages-gitlab-com.s3-accelerate.amazonaws.com/7/11/debian/dists/stretch/InRelease?AWSAccessKeyId=AKIAJ74R7IHMTQVGFCEA&Signature=oo10HdjIUUaV5Ms2OPTS7hPUsPo=&Expires=1516749086\nMessage: Received HTTP code 403 from proxy after CONNECT\n\n", 64000) = 265

Indeed, in the browser, I had noticed the redirect to packages-gitlab-com.s3-accelerate.amazonaws.com. However, its significance with regard to my proxy issue had eluded me: while the no-proxy directive I had set in the APT configuration did prevent going through the proxy when reaching out to packages.gitlab.com, it didn’t apply to the redirected URL at packages-gitlab-com.s3-accelerate.amazonaws.com.

So at last the solution became clear:

# GitLab repo (first domain redirects to the second, so both need the proxy bypass)
Acquire::https::Proxy::packages.gitlab.com DIRECT;
Acquire::https::Proxy::packages-gitlab-com.s3-accelerate.amazonaws.com DIRECT;

And indeed, apt-get update finally managed to fetch the GitLab repo’s InRelease file.
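
In hindsight, strace was not strictly needed to find the second host: the response headers already name it. As a sketch, assuming the headers are fetched with something like curl -sI, the following made-up helper turns every Location header into a bypass line in the same style as above. The first host still has to be added by hand, since it never appears in a Location header.

```sh
#!/bin/sh
# Reads HTTP response headers on stdin and prints one APT proxy-bypass line
# per distinct https redirect target host.
bypass_lines_for_redirects() {
    sed -n 's|^[Ll]ocation:[[:space:]]*https://\([^/:[:space:]]*\).*|\1|p' |
        sort -u |
        while IFS= read -r host; do
            printf 'Acquire::https::Proxy::%s DIRECT;\n' "$host"
        done
}

# usage:
#   curl -sI https://packages.gitlab.com/gitlab/gitlab-ee/debian/dists/stretch/InRelease \
#       | bypass_lines_for_redirects
```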

sysctl settings failure

After downloading and installing the gitlab-ee package, GitLab tries to configure itself.
Several sysctl-related failures happened during this step, all due to the fact that in an LXC container, /proc/sys is mounted read-only and hence sysctl variables can’t be set from inside the container. This is a common issue and it even has its own entry in GitLab’s list of common installation problems. The solution is to gather those parameters after GitLab fails to set them, and actually set them on the host:

# Gitlab settings (see forge:/opt/gitlab/embedded/etc/90-omnibus-gitlab*)
kernel.shmall = 4194304
kernel.shmmax = 17179869184
kernel.sem = 250 32000 32 262
net.core.somaxconn = 1024

Another solution mentioned at the end of this GitHub issue would be to use dpkg-divert to override the installed sysctl configuration files, but I didn’t try this.

Some more info on the sysctl variables being set:

  • shmmax and shmall: a blog post, and this extract from the Linux kernel’s Documentation/sysctl/kernel.txt:
    shmall:
    
    This parameter sets the total amount of shared memory pages that
    can be used system wide. Hence, SHMALL should always be at least
    ceil(shmmax/PAGE_SIZE).
    
    If you are not sure what the default PAGE_SIZE is on your Linux
    system, you can run the following command:
    
    # getconf PAGE_SIZE
    
    ==============================================================
    
    shmmax:
    
    This value can be used to query and set the run time limit
    on the maximum shared memory segment size that can be created.
    Shared memory segments up to 1Gb are now supported in the
    kernel.  This value defaults to SHMMAX.
    

    also, from the Linux kernel’s include/uapi/linux/shm.h:

    /*
     * SHMMNI, SHMMAX and SHMALL are default upper limits which can be
     * modified by sysctl. The SHMMAX and SHMALL values have been chosen to
     * be as large possible without facilitating scenarios where userspace
     * causes overflows when adjusting the limits via operations of the form
     * "retrieve current limit; add X; update limit". It is therefore not
     * advised to make SHMMAX and SHMALL any larger. These limits are
     * suitable for both 32 and 64-bit systems.
     */
    #define SHMMIN 1            /* min shared seg size (bytes) */
    #define SHMMNI 4096             /* max num of segs system wide */
    #define SHMMAX (ULONG_MAX - (1UL << 24)) /* max shared seg size (bytes) */
    #define SHMALL (ULONG_MAX - (1UL << 24)) /* max shm system wide (pages) */
    #define SHMSEG SHMMNI           /* max shared segs per process */
    
  • sem: a random wiki page, and the Linux kernel’s include/uapi/linux/sem.h:
    /*
     * SEMMNI, SEMMSL and SEMMNS are default values which can be
     * modified by sysctl.
     * The values has been chosen to be larger than necessary for any
     * known configuration.
     *
     * SEMOPM should not be increased beyond 1000, otherwise there is the
     * risk that semop()/semtimedop() fails due to kernel memory fragmentation when
     * allocating the sop array.
     */
    
    
    #define SEMMNI  32000           /* <= IPCMNI  max # of semaphore identifiers */
    #define SEMMSL  32000           /* <= INT_MAX max num of semaphores per id */
    #define SEMMNS  (SEMMNI*SEMMSL) /* <= INT_MAX max # of semaphores in system */
    #define SEMOPM  500            /* <= 1 000 max num of ops per semop call */
    #define SEMVMX  32767           /* <= 32767 semaphore maximum value */
    #define SEMAEM  SEMVMX          /* adjust on exit max value */
    
  • somaxconn: a NASA wiki page, and this extract from the Linux kernel’s Documentation/networking/ip-sysctl.txt:
    somaxconn - INTEGER
        Limit of socket listen() backlog, known in userspace as SOMAXCONN.
        Defaults to 128.  See also tcp_max_syn_backlog for additional tuning
        for TCP sockets.
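
As a quick sanity check of the shmall/shmmax relation quoted from the kernel documentation (SHMALL should be at least ceil(shmmax/PAGE_SIZE)), here is a small sketch. min_shmall is a made-up helper; the page size defaults to whatever getconf PAGE_SIZE reports, and 4096 is passed explicitly below as an assumption about the host.

```sh
#!/bin/sh
# Smallest kernel.shmall (in pages) consistent with a given kernel.shmmax
# (in bytes): ceil(shmmax / PAGE_SIZE).
# $1: shmmax in bytes
# $2: page size in bytes (optional, defaults to the running system's)
min_shmall() {
    shmmax=$1
    page_size=${2:-$(getconf PAGE_SIZE)}
    echo $(( ($shmmax + $page_size - 1) / $page_size ))
}

# GitLab's shmmax with 4 KiB pages: prints 4194304
min_shmall 17179869184 4096
```

So with 4 KiB pages, the kernel.shmall value GitLab asks for is exactly the minimum matching its kernel.shmmax.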
    

OpenSSL 1.1.0 and the plague of implicit function declarations

I’m currently rebuilding my Gentoo packages after switching to the (hard-masked) dev-libs/openssl-1.1.0g. The OpenSSL 1.1.0 branch has been available for a while now, and brings with it a lot of sane-looking changes to the OpenSSL API, like opaque structures that don’t let anyone go poking around their insides, and less kludgy handling of library initialization, threading and locking.

However, it doesn’t seem to have had a great adoption rate in the FLOSS ecosystem, and a lot of Gentoo packages fail to build against it. OpenSSL 1.1.0 does have some sort of compatibility layer for its previous API, but this layer must be enabled at build time, both for OpenSSL and for the packages that depend on it: --api=1.0.0 with OpenSSL’s configure script, and OPENSSL_API_COMPAT defined to 0x10000000L (e.g. -DOPENSSL_API_COMPAT=0x10000000L) for code that uses it. (This is as I understand it at this point in time – I may very well be wrong here.)
From what I’ve seen fixing failed builds, a lot of the patches available from upstream mostly fix missing includes, renamed functions and opaque structure accesses. Deprecated APIs usually are still called, so I’m guessing my OpenSSL either doesn’t have the right API compatibility layer, or these packages build with different flags than I do.

Anyway, I’ve set out to have my entire Gentoo @world set build fine against OpenSSL 1.1.0, and I’ve already filed several bugs and patches on the Gentoo bug tracker.

The way I proceed is simple: rebuild packages that depend on OpenSSL 1.1.0, and fix failures by patching the code, hopefully correctly. Then when the build passes again, file a bug on the Gentoo tracker and submit the patch, and also submit the patch upstream if that makes sense. Porting to the new OpenSSL 1.1 API doesn’t involve many changes in code logic, so I deem a passing build to be a good indicator of successful porting.
However, I’ve been bitten a few times now by passing builds that in fact failed at runtime, or rather a bit earlier than actual runtime: at load time. Taking a closer look, I found out that this was because GCC had the distasteful behavior of not erroring out on calls to functions which had not been declared, instead merely issuing a warning. When code used e.g. SSLeay_add_all_algorithms(), which no longer exists in OpenSSL 1.1.0, GCC would just print a warning, assume the function returned an integer, and keep on compiling.
But obviously at load time, this symbol somehow had to get resolved, which failed. Thus my problem: an issue was being detected but ignored at build time, resulting in failure at runtime. :(

At first upon noticing this, I began to compulsively load every shared object that was built using Python’s ctypes.CDLL with RTLD_NOW. Thus any unresolved symbol would cause the shared object to fail loading. This worked, but was very error-prone as I’d often miss some shared objects in the build output, and on top of that it wasn’t exactly fast.
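
A shell-only alternative to the ctypes approach could be ldd -r, which performs relocations and reports missing functions. This is a sketch rather than what I actually ran: it assumes the glibc message format "undefined symbol: NAME (file)", unresolved_symbols is a made-up helper, and libfoo.so stands for whatever shared object the build produced.

```sh
#!/bin/sh
# Reads `ldd -r` output on stdin and prints each unresolved symbol once.
unresolved_symbols() {
    sed -n 's/^undefined symbol:[[:space:]]*\([^[:space:]]*\).*/\1/p' | sort -u
}

# usage: ldd -r ./libfoo.so 2>&1 | unresolved_symbols
```

Like the ctypes approach, this still requires finding every shared object in the build output, so it shares the same weakness.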

Therefore, I’ve decided to tackle this issue at its source: GCC. Obviously turning every single warning into an error with -Werror is not really an option as I don’t want to go about fixing the shitload of warnings that packages sometimes emit when built. But GCC is kind enough to provide -Werror-implicit-function-declaration, which turns the specific warning I loathe into an error. That’s exactly what I want! I must admit I can’t see any real reason why I would want calling an undeclared function to be anything less than an error. Even more so when said function returns a value.
Sadly, adding this flag to my system-wide Portage CFLAGS didn’t turn out so well. Indeed, it seems configure scripts will often try to compile stuff without the proper includes, even for functions as basic as exit(). So, nope.

Searching some more, I stumbled upon this post by Flameeyes with very similar concerns. So I tried adding -Wl,--no-undefined to my LDFLAGS, and lo and behold:

/var/tmp/portage/net-misc/openssh-7.6_p1-r3/temp/cctStwQR.ltrans0.ltrans.o: In function `main':
:(.text.startup+0x50d): undefined reference to `OpenSSL_add_all_algorithms'
collect2: error: ld returned 1 exit status
make: *** [Makefile:182: ssh-keysign] Error 1

We’ll see how things turn out as I rebuild more packages. If needed, I’ll remove the flag for specific packages with package.env.

Cheap torification in C with self-applied LD_PRELOAD

So I was playing around with a lab for exploring the OpenSSL C API, and wanted to add a command-line flag to make connections go through Tor, so that I could e.g. fetch SSL certificates from .onion addresses.
An easy way to do it would just have been to use torify, but I figured, why not have it as a command line flag so it shows in the help text of my lab program? This way I’ll see it and I’ll be more likely to use it.

I thought of two options:

  1. make connections go through the SOCKS5 proxy Tor spawns on my computer on port 9050
  2. use libtorsocks which is the library preloaded by torify

The SOCKS5 route required implementing a custom SOCKS5 BIO since OpenSSL itself does not provide SOCKS5 proxying support, and I wasn’t even sure the DNS request would use the proxy too. So I went for the libtorsocks route.

The constraints I had were:

  • have the Tor option as a runtime switch: no compile-time linking against libtorsocks
  • use the system’s libssl: no compiling both a non-Tor and a Tor-enabled libssl, and dynamically dlopen()ing the correct one at runtime
  • use libssl as much as possible even for lower-level network stuff: no control over the calls to gethostbyname(), socket() and all other functions overridden by libtorsocks

At first I thought of using dlopen() to load libtorsocks, but I couldn’t find any way to make libssl not use the symbols that were resolved by the loader upon startup of my lab program. So next I thought it would be neat if I could LD_PRELOAD myself with libtorsocks.

Since handling LD_PRELOAD is the work of the loader, it meant once my program was started and the --tor flag was processed, there was no way it could apply LD_PRELOAD on itself… but it would be applied to its children!
I didn’t even need to fork() since I didn’t need the parent to live on: simply set LD_PRELOAD to point to libtorsocks, then execve() and the loader would kick in and my program would be re-spawned with torification built in.

I’ve tried it and it works like a charm.
The only gotcha is that if for some reason libtorsocks can’t be found or loaded, the loader simply prints an error but carries on, and the program ends up not being torified. However this is easily fixed: as I’m re-exec-ing myself, the second run of the program has the exact same command line arguments as the first. This means that if it sees the --tor option, it can first check if LD_PRELOAD is set, and then:

  • if it’s not, set it and execve()
  • if it is, use dlsym() to get the address of a symbol exported only by libtorsocks to ensure it is actually loaded (the GNU version of dlsym() has a pseudo-handle RTLD_DEFAULT that searches the global symbols of the executable and its dependencies)

In the end, the code looks as follows:

if (args.tor) {
    if (getenv("LD_PRELOAD") == NULL) {
        // not preloaded, let's restart
        setenv("LD_PRELOAD", TORSOCKS_LIB, 1);
        setenv("TORSOCKS_ISOLATE_PID", "1", 0); /* torsocks --isolate */
        (void) execvp(argv[0], argv);
        // return from execvp() => error
        DIE(3, "--tor failed: %s\n", strerror(errno));
    } else {
        // already preloaded, ensure it worked
        (void) dlerror();                                       /* clear error status */
        (void) dlsym(RTLD_DEFAULT, "tsocks_connect_to_tor");    /* don't care about the actual symbol address */
        if (dlerror() != NULL) {
            DIE(3, "--tor preloading error\n");
        }
    }
}

And now I can happily switch between Tor and non-Tor SSL certificate fetching:

$ ./x509 --sni duckduckgo.com
Fetching certificate from duckduckgo.com... ✓
  version = 3
  serial# = 0AD9B00801718556792CD5C6FC12421D
  businessCategory = Private Organization, jurisdictionC = US, jurisdictionST = Delaware, serialNumber = 5019303, street = 20 Paoli Pike, postalCode = 19301, C = US, ST = Pennsylvania, L = Paoli, O = "Duck Duck Go, Inc.", CN = duckduckgo.com
  C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 Extended Validation Server CA
  sigalg = sha256WithRSAEncryption
  SAN = DNS:duckduckgo.com
  SAN = DNS:www.duckduckgo.com
  notBefore = May 30 00:00:00 2017 GMT
  notAfter = Jun  8 12:00:00 2018 GMT

$ ./x509 --sni 3g2upl4pq6kufc4m.onion
Fetching certificate from 3g2upl4pq6kufc4m.onion... 139904841508672:error:20087002:BIO routines:BIO_lookup:system lib:crypto/bio/b_addr.c:693:Name or service not known
error: can't connect to 3g2upl4pq6kufc4m.onion:443

$ ./x509 --sni --tor 3g2upl4pq6kufc4m.onion
Fetching certificate from 3g2upl4pq6kufc4m.onion... ✓
  version = 3
  serial# = 034392BD5A5F5FF930609512157EA17F
  businessCategory = Private Organization, jurisdictionC = US, jurisdictionST = Delaware, serialNumber = 5019303, C = US, ST = Pennsylvania, L = Paoli, O = "Duck Duck Go, Inc.", CN = 3g2upl4pq6kufc4m.onion
  C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 Extended Validation Server CA
  sigalg = sha256WithRSAEncryption
  SAN = DNS:3g2upl4pq6kufc4m.onion
  SAN = DNS:www.3g2upl4pq6kufc4m.onion
  notBefore = Nov 30 00:00:00 2017 GMT
  notAfter = Jan 25 12:00:00 2019 GMT

Restoring package.use from currently installed packages

So I screwed up while re-installing my Gentoo system. Backup was not set, and I had not been committing my /etc/portage in a while as it kept changing. I was in the process of migrating my single-file package.use to a directory, and I made a mistake: end result, my package.use was gone, and any backups of it were too old to be relevant.

So, time to rebuild the file from scratch!

First off, emerge -pvO [package] will tell Portage to pretend to reinstall a package, but assuming all dependencies are fulfilled. This is great as Portage doesn’t waste any time trying to reconcile the default USE flags with your installed ones. It just shows you which USE flags it will enable (i.e. your currently installed package has it disabled) or disable (i.e. it’s currently enabled) for this package, marked with a *.
So this is the tool we need to figure out the USE flags to enable/disable for each package in our package.use.

Now we need to do this for all installed packages. Portage stores state about installed packages in /var/db/pkg, neatly sorted by category and package with version number.
So all that’s needed is to iterate over these and grab the USE flags to enable/disable from the output of emerge -pvO.
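
As a minimal sketch of that iteration (the function name and the optional path argument are made up for illustration, and it assumes every category/package-version directory under the database is an installed package), the atoms to feed to emerge could be listed like this:

```shell
# Print one "=category/package-version" atom per installed package.
# vdb defaults to Portage's database; pass another path to try it on a copy.
list_installed_atoms() {
    vdb="${1:-/var/db/pkg}"
    for dir in "$vdb"/*/*/; do
        [ -d "$dir" ] || continue        # unmatched glob: nothing installed
        dir="${dir%/}"                   # strip the trailing slash
        printf '=%s\n' "${dir#"$vdb"/}"  # keep only category/package-version
    done
}
```

Each printed line can then be passed to emerge -pvO one at a time.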

For some packages, Portage will throw errors, e.g. if a package only has Python 2.7 as PYTHON_SINGLE_TARGET but you have selected Python 3.x in your make.conf. This is why you can’t do emerge -pvO $(printf ' =%s' */*): Portage will check for errors before printing the list of packages with USE flags. You need to iterate over all packages one by one, and later on update package.use to fix any errors.

At this point we’ve got a list of packages with USE changes displayed among unchanged USE flags. We then only need to filter these to only show USE changes (and other variables too, e.g. PYTHON_TARGETS). After that it’s just a matter of formatting the output so we can directly create a package.use file from it.
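
To make the filtering step concrete, here is a small sketch (the function name is hypothetical, and it only handles plain flag* and -flag* markers, not parenthesised or %-marked flags). It takes the contents of one variable as printed by emerge -pv and keeps only the starred flags, inverted so that they describe what is currently installed:

```shell
# Print the flags marked with a trailing "*", one per line, inverted so that
# they describe the currently installed state rather than the pending change.
installed_flag_changes() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | while read -r flag; do
        case "$flag" in
            -*\*)   # "-foo*": would be disabled, so foo is currently enabled
                flag=${flag#-}; flag=${flag%\*}
                printf '%s\n' "$flag" ;;
            *\*)    # "foo*": would be enabled, so foo is currently disabled
                flag=${flag%\*}
                printf '%s\n' "-$flag" ;;
        esac
    done
}
```

For instance, installed_flag_changes 'acl -bzip2* lzma* -static' prints bzip2 and -lzma, which are the two entries that belong in package.use.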

Without further ado, the script.
Pardon the wonky formatting and lack of comments, it used to be a one-liner:

#!/bin/bash

cd /var/db/pkg || exit 2
files=(*/*)
for p in "${@:-${files[@]}}"
do
    vars="$(emerge -pvO1 ="$p" | sed -n -e 's/[^=]* \([^= ]\+=".*"\).*/\1/p')" || exit 1
    [[ -n "$vars" ]] && printf '=%s %s\n' "$p" "$vars"
done | sed -n -e '/\*/ p' | while read -r atom rest
do
    printf %s/%s%s $(qatom "$atom" | cut -d' ' -f 1,2,5)
    grep -o '[^ =]\+="[^"]\+"' <<<"$rest" | sed -e 's/\([^=]\+\)="\([^"]\+\)"/\1 \2/' | while read -r type allflags
    do
        flags=()
        while read -r flag
        do
            [[ "$flag" == -* ]] && flags+=("${flag:1:-1}") || flags+=("-${flag::-1}")
        done < <(grep -o '[^ *]\+\*' <<<"$allflags")
        (( ${#flags[@]} > 0 )) && {
            [[ "$type" != USE ]] && printf ' %s:' "$type"
            printf ' %s' "${flags[@]}"
        }
    done
    printf ' # %s\n' "$atom"
done | sort

It prints a package.use file to stdout, and Portage errors (that will need to be fixed) to stderr.
Note that it doesn’t handle slots, so you’ll need to scour the output for duplicate package names with different flags.
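
A quick way to spot those duplicates, as a sketch that assumes the first space-separated field of each generated line is the package name (the function name is made up):

```shell
# List package names that appear on more than one line of a generated
# package.use, read from the given files or from stdin.
duplicate_packages() {
    cut -d' ' -f1 "$@" | sort | uniq -d
}
```

Any name it prints needs a manual look, since the flags probably differ between slots.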

Unwrapping a LZ4-compressed kernel

So I’m reinstalling my Gentoo system from scratch, and I want it to boot with UEFI and Secure Boot. That means I want to embed the kernel’s initramfs into the kernel image, so that the signature-checking performed by the firmware covers both the kernel and the initramfs.

Roughly following Sakaki’s awesome EFI install guide, I got to the point where I’ve got a kernel with its embedded initramfs – courtesy of genkernel. However now I want to double-check that the contents of the initramfs are fine. Also, since I like to make things harder for myself, I want to check from the actual initramfs embedded in the kernel image – not from the copy that’s still sitting in /var/tmp/genkernel.

First off, I’ve never tried to pick apart a kernel image before, so this is gonna be… exploratory, shall we say. A bit of googling quickly got me there, so I try the binwalk approach:

$ binwalk /boot/kernel

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             Microsoft executable, portable (PE)
4178666       0x3FC2EA        MySQL MISAM compressed data file Version 2
5317291       0x5122AB        mcrypt 2.5 encrypted data, algorithm: "E", keysize: 18609 bytes, mode: ")",
7809361       0x772951        Certificate in DER format (x509 v3), header length: 4, sequence length: 18
7809365       0x772955        Certificate in DER format (x509 v3), header length: 4, sequence length: 18
7809369       0x772959        Certificate in DER format (x509 v3), header length: 4, sequence length: 18
7809373       0x77295D        Certificate in DER format (x509 v3), header length: 4, sequence length: 18
7809377       0x772961        Certificate in DER format (x509 v3), header length: 4, sequence length: 18
7809381       0x772965        Certificate in DER format (x509 v3), header length: 4, sequence length: 17
8283104       0x7E63E0        xz compressed data
8471664       0x814470        Unix path: /505V/F505/F707/F717/P8
8549858       0x8275E2        ELF, 64-bit LSB processor-specific,
8839487       0x86E13F        SHA256 hash constants, little endian
9228073       0x8CCF29        Certificate in DER format (x509 v3), header length: 4, sequence length: 1342
9240733       0x8D009D        Executable script, shebang: "/bin/ash"
9399653       0x8F6D65        SHA256 hash constants, little endian
9400048       0x8F6EF0        xz compressed data
9819275       0x95D48B        xz compressed data
11495515      0xAF685B        Copyright string: "Copyright (C) 2009 Red Hat, Inc. All !"
11698317      0xB2808D        CramFS filesystem, big endian size 5399878 hole_support CRC 0x2B007379, edition 1937113192, 1885762304 blocks, 1239979513 files
17206081      0x1068B41       lzop compressed data,b09,
17252766      0x107419E       xz compressed data
18215265      0x115F161       Unix path: /lib/gcc/x86_64-pc-linux-gnu/5.4.0"
18633646      0x11C53AE       SHA256 hash constants, little endian
18788285      0x11EAFBD       SHA256 hash constants, little endian

Ok so my kernel starts off with a PE executable, that seems reasonable, must be the EFI stub. However afterwards… MySQL MISAM? mcrypt? /505V/F505/F707/F717/P8? CramFS filesystem with 1239979513 files?
Something isn't right. I quickly look up the text bits (copyright string, shebang) with hexdump, and notice they are both isolated strings interspersed with other bits of binary and text, i.e. not part of actual scripts. I guess this must be a side effect of the compression. I also guess that a lot of the patterns reported by binwalk must be garbage because binwalk didn't pick up the LZ4-compressed blob(s?).

So, looks like I must find another way. Time to look for the LZ4 magic pattern (there has to be one, right?) myself. Reading the specs for the LZ4 frame format, I try to grep for the LZ4 magic:

$ grep -abo $'\x04\x22\x4d\x18' /boot/kernel
$ grep -abo $'\x18\x4d\x22\x04' /boot/kernel        # let's try big endian too, just in case

Nothing. Meh. :(

Ok, perhaps the kernel is not using LZ4's frame format (it happens). So let's have a look at how compression is performed. Skimming through the occurrences of lz4 and lz4c in the kernel sources yields some interesting results:

scripts/extract-ikconfig
64:try_decompress '\002\041\114\030' xyy 'lz4 -d -l'

scripts/gen_initramfs_list.sh
263:                && compr="lz4 -l -9 -f"

scripts/Makefile.lib
373:    lz4c -l -c1 stdin stdout && $(call size_append, $(filter-out FORCE,$^))) > $@ || \

What's that -l flag? From the lz4(1) man page:

      -l     Use Legacy format (typically for Linux Kernel compression)

This smells good! Turns out this format is mentioned a little bit further down the page on the LZ4 frame format. It has a different magic, so let's try to grep for it:

$ grep -abo $'\x02\x21\x4c\x18' /boot/kernel
17332:!L
18902071:!L
18902178:!L
18902255:!L
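
The same search can be wrapped in a small helper. This is only a sketch: the function name is made up, and it assumes GNU grep (for -b combined with -o). The C locale makes grep treat the pattern as raw bytes:

```shell
# Print the byte offset of every LZ4 legacy magic (02 21 4c 18) in a file.
lz4_legacy_offsets() {
    magic=$(printf '\002\041\114\030')
    LC_ALL=C grep -abo -F -- "$magic" "$1" | cut -d: -f1
}
```

Run against /boot/kernel, it should list the same offsets as the grep above, without the trailing garbage.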

Neat! Let's uncompress the blob:

$ dd if=/boot/kernel bs=17332 skip=1 | lz4 -dlc >/tmp/kernel-image
$ file /tmp/kernel-image
/tmp/kernel-image: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=5e2c94abac046f5e20c55ccba9e07208eba5830b, stripped, with debug_info
$ binwalk /tmp/kernel-image
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             ELF, 64-bit LSB executable, AMD x86-64, version 1 (SYSV)
14684320      0xE010A0        Linux kernel version "4.9.11-hardened (root@sysresccd) (gcc version 5.4.0 (Gentoo Hardened 5.4.0-r3 p1.3, pie-0.6.5) ) #4 SMP Sun Feb 26 23:55:23 CET "
14830216      0xE24A88        gzip compressed data, maximum compression, from Unix, NULL date (1970-01-01 00:00:00)
15155136      0xE73FC0        CRC32 polynomial table, little endian
15301919      0xE97D1F        Unix path: /370R4V/370R5E/3570RE/370R5V
16988114      0x10337D2       Unix path: /arch/x86/include/asm/irqflags.h
[...]
22949458      0x15E2E52       Unix path: /arch/x86/platform/efi/efi.c
22991298      0x15ED1C2       Unix path: /drivers/firmware/efi/memattr.c
22991458      0x15ED262       Unix path: /drivers/firmware/efi/memmap.c
23052576      0x15FC120       Certificate in DER format (x509 v3), header length: 4, sequence length: 1342
23154632      0x1614FC8       ASCII cpio archive (SVR4 with no CRC), file name: ".", file name length: "0x00000002", file size: "0x00000000"
23154744      0x1615038       ASCII cpio archive (SVR4 with no CRC), file name: "proc", file name length: "0x00000005", file size: "0x00000000"
23154860      0x16150AC       ASCII cpio archive (SVR4 with no CRC), file name: "var", file name length: "0x00000004", file size: "0x00000000"
[...]
44243788      0x2A31B4C       ASCII cpio archive (SVR4 with no CRC), file name: "usr/lib64", file name length: "0x0000000A", file size: "0x00000003"
44243912      0x2A31BC8       ASCII cpio archive (SVR4 with no CRC), file name: "TRAILER!!!", file name length: "0x0000000B", file size: "0x00000000"

Eureka! Those ASCII cpio archive entries are the files from the initramfs. The cpio archive that holds them is uncompressed, as it is embedded into the kernel and thus already benefits from the compression of the kernel itself.
Reading cpio(5) (specifically the section on the "New ASCII Format", newc), it looks like a cpio archive is just a collection of files, each one with its cpio header. So the first entry at 0x1614fc8 must be the start of the archive:

$ dd if=/tmp/kernel-image bs=$((0x1614fc8)) skip=1 of=/tmp/initramfs.cpio
$ file /tmp/initramfs.cpio
/tmp/initramfs.cpio: ASCII cpio archive (SVR4 with no CRC)
$ mkdir /tmp/initramfs
$ cpio -idHnewc --no-absolute-filenames -D /tmp/initramfs </tmp/initramfs.cpio
41191 blocks

And there it is! The initramfs is now successfully extracted from the kernel, ready to be scrutinized.
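
For the curious, the newc header described in cpio(5) is simple enough to decode by hand: a 6-byte magic (070701) followed by 13 fields of 8 hex digits each, 110 bytes in total, then the NUL-terminated file name. A rough sketch that reads the first entry of an archive (the function name is made up, and it only looks at the very first header):

```shell
# Print the name and size of the first entry in a newc cpio archive.
newc_first_entry() {
    header=$(dd if="$1" bs=110 count=1 2>/dev/null)
    case "$header" in
        070701*) ;;
        *) echo "not a newc archive" >&2; return 1 ;;
    esac
    filesize=$(printf '%s' "$header" | cut -c55-62)    # 7th field
    namesize=$(printf '%s' "$header" | cut -c95-102)   # 12th field
    # namesize counts the trailing NUL, so read one byte less
    name=$(dd if="$1" bs=1 skip=110 count=$(( 0x$namesize - 1 )) 2>/dev/null)
    printf '%s %d\n' "$name" "$(( 0x$filesize ))"
}
```

This matches what binwalk reported for the first entry: a name length of 0x00000002 for ".", i.e. one character plus the NUL.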

Samba with LTO… not!

So my Gentoo now builds with LTO, mostly. A number of packages fail to compile properly with LTO, so I disable it on a per-package basis using package.env.

Samba (4.2.11) is one of those packages. I get the exact same failure as reported in this thread, which remains unanswered to date.
So I disabled LTO for Samba, but the build kept failing. Though not at the same stage: instead of failing during the compile stage, it now failed during the configure stage. Looking at the log in /var/tmp/portage/net-fs/samba-4.2.11/work/samba-4.2.11-abi_x86_64.amd64/bin/config.log, I saw this:


[1/2] Compiling test.c
['x86_64-pc-linux-gnu-gcc', '-march=core-avx-i', '-O2', '-flto=8', '-pipe', '-fno-lto', '-MD', '-march=core-avx-i', '-flto=8', '-fno-strict-aliasing', '-I/usr/local/include', '-I/usr/include/python2.7', '-I/usr/include/et', '-D_SAMBA_BUILD_=4', '-DHAVE_CONFIG_H=1', '-D_GNU_SOURCE=1', '-D_XOPEN_SOURCE_EXTENDED=1', '../test.c', '-c', '-o', 'default/test_1.o']
[2/2] Linking default/testprog
/usr/lib/gcc/x86_64-pc-linux-gnu/5.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: default/test_1.o: plugin needed to handle lto object
/usr/lib/gcc/x86_64-pc-linux-gnu/5.3.0/../../../../lib64/Scrt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status

As expected from my /etc/portage/env/no-lto environment file, the -flto=8 in CFLAGS was overridden by a following -fno-lto. However, where did the last -flto=8 come from?

I noticed that the error did appear at least a few times towards the end of the file, but not at the beginning. So I searched for the first occurrence, and found those lines just above:


Checking for Python version >= 2.5.0
ok 2.7.11
Configuration returned from '/usr/bin/python2.7':
python_prefix = '/usr'
python_SO = '.so'
python_SYSLIBS = '-lm'
python_LDFLAGS = '-Wl,-O1 -Wl,--as-needed -march=core-avx-i -O2 -flto=8 -pipe -L.'
python_SHLIBS = '-lpthread -ldl  -lutil'
python_LIBDIR = '/usr/lib64'
python_LIBPL = '/usr/lib64/python2.7/config'
INCLUDEPY = '/usr/include/python2.7'
Py_ENABLE_SHARED = 1
MACOSX_DEPLOYMENT_TARGET = ''

It seemed I had found my culprit: Python was indeed compiled with LTO, and Samba, which uses the Waf build system, appended Python’s flags after its own CFLAGS/LDFLAGS, re-enabling LTO.
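
The ordering problem is easy to reproduce on paper. The sketch below (a made-up helper, not part of Waf or Portage) assumes that the last -flto, -flto=N or -fno-lto on the command line wins, which is how GCC normally treats conflicting -f options:

```shell
# Report whether LTO ends up enabled for a list of compiler flags,
# assuming the last LTO-related flag on the command line wins.
lto_state() {
    state=off
    for flag in "$@"; do
        case "$flag" in
            -flto|-flto=*) state=on ;;
            -fno-lto)      state=off ;;
        esac
    done
    printf '%s\n' "$state"
}
```

With the flags from config.log, lto_state -O2 -flto=8 -pipe -fno-lto -MD -flto=8 prints on, whereas dropping the trailing -flto=8 gives off.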

So I took a look at Waf’s handling of Python flags, and found the code that prints the above text around buildtools/wafadmin/Tools/python.py:180 in the Samba source tree.
Poking around, I found this interesting snippet a few lines below:


    # Allow some python overrides from env vars for cross-compiling
    os_env = dict(os.environ)

    override_python_LDFLAGS = os_env.get('python_LDFLAGS', None)
    if override_python_LDFLAGS is not None:
        conf.log.write("python_LDFLAGS override from environment = %r\n" % (override_python_LDFLAGS))
        python_LDFLAGS = override_python_LDFLAGS

Let’s try it out, using the same code as in Waf’s python.py to get the value of python_LDFLAGS:


$ python_LDFLAGS="$(python2 <<< "from distutils.sysconfig import get_config_var ; print get_config_var('LDFLAGS')") -fno-lto" \
> ebuild /usr/portage/net-fs/samba/samba-4.2.11.ebuild compile

It works!

And looking at the log, there is now indeed a new line below all the python_* variables:


python_LDFLAGS override from environment = '-Wl,-O1 -Wl,--as-needed -march=core-avx-i -O2 -flto=8 -pipe -L. -fno-lto'

So there you go: after adding this python_LDFLAGS variable override to my /etc/portage/env/no-lto environment file, Samba builds fine without LTO. :)

$TERM issues with tmux and vim

tmux 2.1 is out (well it’s been out for a few months now), and brings with it a shiny new terminfo file: tmux-256color.

Unfortunately, switching my $TERM from screen-256color to tmux-256color inside tmux doesn’t work as expected: when starting Vim, … nothing happens. It just sits there, not showing Vim, not giving me back my prompt, and ignoring any amount of head-bashing on the keyboard (most notably Ctrl-C and Ctrl-Z).
However, if I resize the pane (either by resizing the terminal window or by spawning a new pane inside tmux), then Vim comes to life (and all the head-bashed keystrokes happen).

tl;dr: I don’t know how to fix it (yet) except by reverting $TERM to screen-256color.

So here is what I gathered:

  • use another terminal (e.g. xterm instead of urxvt): FAIL
  • TERM=tmux-256color vim outside tmux: FAIL
  • vim --noplugin: FAIL
  • other curses programs (mutt, irssi, …) work just fine
  • infocmp /usr/share/terminfo/{s/screen,t/tmux}-256color lists the following differences (with meaning according to terminfo(5)):
    has_status_line hs: F:T
    dis_status_line dsl: NULL, '\E]0;\007'
    from_status_line fsl: NULL, '^G'
    exit_italics_mode ritm: NULL, '\E[23m'
    exit_standout_mode rmso: '\E[23m', '\E[27m'
    enter_italics_mode sitm: NULL, '\E[3m'
    enter_standout_mode smso: '\E[3m', '\E[7m'
    to_status_line tsl: NULL, '\E]0;'

    Nothing ground-breaking. :(

I’m kinda stuck now, and my Google-fu has failed me so far. Perhaps debugging/tracing Vim would help.
Anyway, I’ll let it rest for a while, and come back to it later.

chroot rm-ing disaster recovery

So you’re setting up your chroot, mounting or bind-mounting important parts of your filesystem (commonly /dev, /sys, /proc), you do some work, and then you realize you missed something and you have to start over.
No biggie, it’s just a chroot, let’s … rm -rf /path/to/chroot it. But you forgot to unmount mounts in the chroot!

Disaster.

From the triplet of mounts above, /proc and /sys should survive the event unscathed. But /dev is another story… And your system is crippled (no /dev/tty hurts useful tools like, say, sudo).

First of all: don’t worry, you’re not the first (this I know for a fact), and surely not the last (this I can only guess) to do this.
So now, assuming that for some reason rebooting the machine is not an option, how to recover?

Well it turns out to be relatively easy: just restart udev. How to actually do this depends on your system.
This should get you back on your feet.

But there still remains some cleanup to be done, since all running services started prior to the blunder are still referencing the “old” /dev. They all need to be restarted. You can get a list with lsof /path/to/chroot/dev: they are the ones with a (deleted) label in the output.

There may still be some missing nodes in /dev. In my case, both my encrypted block devices were missing from /dev/mapper. To recover those, I had to manually create the nodes with mknod /dev/mapper/name b major minor. Fill in the blanks using the output of dmsetup info.
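
To avoid typos, the mknod commands can be generated rather than typed. This is a sketch under assumptions: the function name is made up, and it expects name:major:minor lines on stdin, which is what dmsetup info -c --noheadings -o name,major,minor should produce (check the output on your system first). It only prints the commands so they can be reviewed before running them:

```shell
# Read "name:major:minor" lines on stdin and print the matching mknod
# commands instead of running them.
mapper_mknod_commands() {
    while IFS=: read -r name major minor; do
        [ -n "$name" ] || continue   # skip empty lines
        printf 'mknod /dev/mapper/%s b %s %s\n' "$name" "$major" "$minor"
    done
}
```

Usage would look like dmsetup info -c --noheadings -o name,major,minor | mapper_mknod_commands, keeping only the lines for nodes that are actually missing.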

Now just be more careful next time!
I, for one, took the safety precaution of adding alias rm=NOOP to the end of root’s .bashrc.