Un-bricking Debian: apt-get crash in non-critical packages

I’ve got an old Asus Eee netbook that I run Debian on, since as much as I love Gentoo, I wouldn’t want to compile anything on that Atom N280 – and I’m too lazy to set up distcc.
Unfortunately, the beast tends to fall victim to chronic MCEs (machine check exceptions), almost systematically during apt-get operations. Reproducibility is always top-notch: trying to resume the upgrade triggers another MCE at the exact same stage as the previous one. Top-notch indeed, but quite brutal too.

This often leaves the system a bit wonky: it is halfway through an upgrade – i.e., in an inconsistent state – and keeps crashing if I try to complete the upgrade. The degree of wonkiness depends on which package upgrade failed: anything from a minor, non-vital package to, say, an X server package. The system may even be left crippled if the crash occurred while upgrading the kernel image, but this will be the subject of another post (I’ll write about it the next time it happens – hopefully not too soon).

So this is the sorry state the system is in after a crash:

$ apt-get upgrade
# says the previous operation borked and I should run dpkg --configure -a
$ dpkg --configure -a
# sort of resumes the upgrade but eventually panics in the process

The procedure that I’ve been using to work around this kind of breakage is the following:

$ dpkg -r <crashing package>
# remove the culprit
$ dpkg --configure -a
# configure all remaining packages (hopefully these won't trigger a crash)
$ apt-get download <crashing package>
$ dpkg -i <crashing package's .deb archive>
# download and reinstall the crashing package (it usually installs fine at this point)
# (the .deb archive must be downloaded anew as often the cached one is corrupted)
# (installing through dpkg prevents apt-get from marking the package as manually installed)

Took me some time (and quite a bunch of MCEs) to figure out the exact sequence, but thanks to this I’m now pretty effective at recovering from those apt-get MCEs.
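
For convenience, the sequence above can be wrapped in a small shell function. This is only a sketch: `run` and `recover_pkg` are hypothetical names of my own, not part of dpkg or apt, and the function assumes that `apt-get download` leaves exactly one matching .deb archive in the current directory. Setting DRY_RUN=1 prints the commands instead of running them, which is a safer way to check what would happen on a machine that likes to crash.

```shell
# Hypothetical helper (not part of dpkg or apt): wraps the four commands
# of the recovery procedure. With DRY_RUN=1 the commands are only printed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

recover_pkg() {
    pkg=$1
    # 1. Remove the package whose upgrade triggers the crash.
    run dpkg -r "$pkg" || return 1
    # 2. Configure everything else that was left half-installed.
    run dpkg --configure -a || return 1
    # 3. Fetch a fresh .deb into the current directory (the cached one
    #    is often corrupted), then install it with dpkg, not apt-get.
    run apt-get download "$pkg" || return 1
    # Assumption: exactly one matching archive in the current directory.
    run dpkg -i ./"$pkg"_*.deb
}
```

Usage would be `DRY_RUN=1; recover_pkg <crashing package>` to preview, then the same call as root with DRY_RUN unset to actually run it.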

GNU patch and end-of-file newline (or lack thereof)

So I was trying to version bump the ebuild for dev-tcltk/tcllib-1.15-r1 into dev-tcltk/tcllib-1.16 (I needed tcl::chan::string 1.0.2 instead of 1.0.1).

tcllib comes with a cartload of man pages, some of whose names conflict with Tcl ones. Thus, as is the case for dev-tcltk/tcllib-1.15-r1, I needed to remove/rename them in the ebuild. Those that were present in tcllib-1.15 were already taken care of, but there were a few new files to remove as well. Namely:

  • tcllib-tcllib_1_16/embedded/man/files/modules/coroutine/coroutine.n
  • tcllib-tcllib_1_16/embedded/man/files/modules/virtchannel_base/string.n
  • tcllib-tcllib_1_16/embedded/man/files/modules/virtchannel_base/variable.n
  • tcllib-tcllib_1_16/embedded/man/files/modules/virtchannel_transform/zlib.n

I created a patch file with the contents of all four files, with each line prefixed with a “-” and the right headers, and applied the patch: it failed. Apparently, to patch, my patch looked as if it had already been applied, and forcing it to be applied (or reverse-applied) failed with no useful error message whatsoever.
It took me quite a while to figure out that what I was missing was this line at the end of each patch hunk:

\ No newline at end of file

These four files had no end-of-file newline, and without the above indication in the patch, patch would bail out because it’s picky (which is fine – but I’ve seen more explicit error reporting).

The moral here is that I should remember not to piece together my patch files by hand: letting diff handle that would have saved me some headaches…
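
Letting diff do the work could look something like the sketch below. The function name `make_removal_patch` is made up for illustration, and this is not what the ebuild actually does. It simply diffs the file against /dev/null, so diff itself emits the “\ No newline at end of file” marker whenever the file needs it.

```shell
# Hypothetical helper: print a unified diff that deletes the given file.
# diff adds the "\ No newline at end of file" marker on its own when the
# file's last line is not newline-terminated.
make_removal_patch() {
    diff -u "$1" /dev/null
    status=$?
    # diff exits with 1 when the inputs differ, which is expected here;
    # only an exit status above 1 signals real trouble.
    [ "$status" -le 1 ]
}
```

For instance, `make_removal_patch embedded/man/files/modules/coroutine/coroutine.n >> remove-manpages.patch` (the patch file name is an arbitrary example) would append one correctly terminated hunk per file. The paths in the generated headers may still need adjusting to whatever strip level the patch is applied with.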

Note: On the upside, dev-tcltk/tcllib-1.16 is now available in my overlay.

Locating program strings in memory

Disclaimer: I’m a total newbie to all this executable file format stuff. Slowly learning!

The strings command lists all sufficiently long printable character strings in a file.
I recently found myself needing to locate a string found in an ELF executable in the memory of the running program. strings did its job just fine in reporting the location of the string in the executable:

$ strings --radix=x ./bin.x86_64
[snip]
 143fd5 really interesting string
[snip]

However, strings merely operates on binary data and doesn’t care whether it is an ELF executable, a dump of random data or even a plain text file. So I had to find out how to map this file offset to a memory address in the running program.

Time to head to Wikipedia for a quick readup on the ELF file format:

[Figure: Elf-layout--en.svg. An ELF file has two views: the program header shows the segments used at run-time, whereas the section header lists the set of sections of the binary.]
“Elf-layout--en” by Surueña, own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons.

So it turns out the location of the string is pretty simple to figure out, and doesn’t even require the program to be run. The silver bullet here is readelf, which provides all kinds of information on the contents of an ELF file:

$ readelf --program-headers ./bin.x86_64

Elf file type is EXEC (Executable file)
Entry point 0x406840
There are 8 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001c0 0x00000000000001c0  R E    8
  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000001af658 0x00000000001af658  R E    200000
  LOAD           0x00000000001af658 0x00000000007af658 0x00000000007af658
                 0x0000000000002ce8 0x0000000000008418  RW     200000
  DYNAMIC        0x00000000001afd18 0x00000000007afd18 0x00000000007afd18
                 0x0000000000000270 0x0000000000000270  RW     8
  NOTE           0x000000000000021c 0x000000000040021c 0x000000000040021c
                 0x0000000000000020 0x0000000000000020  R      4
  GNU_EH_FRAME   0x00000000001709d0 0x00000000005709d0 0x00000000005709d0
                 0x000000000000b5fc 0x000000000000b5fc  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     8

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame .gcc_except_table
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .eh_frame_hdr
   07

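In each entry, the first column (Offset, with FileSiz below it) gives the offset and size of the segment in the file, while the second column (VirtAddr, with MemSiz below it) gives the location and size of the segment in memory (i.e., at runtime).
So my string at file offset 0x143fd5 was located inside the first LOAD segment (file range [0x000000,0x1af658)), mapped to memory range [0x400000,0x5af658). Since that segment starts at file offset 0 and is mapped at 0x400000, the address of the string is its file offset plus 0x400000.
Hence the location of the string in memory: 0x543fd5.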
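
The arithmetic generalizes to any LOAD segment: subtract the segment’s file offset, add its virtual address, and make sure the offset actually falls within the segment’s FileSiz. Here is a minimal sketch of that computation; the function name `file_offset_to_vaddr` is my own, and the values have to be copied by hand from the readelf output.

```shell
# Hypothetical helper: map a file offset to a virtual address, given one
# LOAD segment's Offset, VirtAddr and FileSiz as printed by readelf.
# Usage: file_offset_to_vaddr OFFSET SEG_OFFSET SEG_VADDR SEG_FILESZ
file_offset_to_vaddr() {
    off=$(($1))
    seg_off=$(($2))
    seg_vaddr=$(($3))
    seg_filesz=$(($4))
    # The offset must lie in [seg_off, seg_off + seg_filesz).
    if [ "$off" -lt "$seg_off" ] || [ "$off" -ge $((seg_off + seg_filesz)) ]; then
        echo "offset is not inside this segment" >&2
        return 1
    fi
    printf '0x%x\n' $((off - seg_off + seg_vaddr))
}
```

With the first LOAD segment above, `file_offset_to_vaddr 0x143fd5 0x0 0x400000 0x1af658` prints 0x543fd5.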

This can be confirmed easily with other tools:

$ objdump --full-contents --file-offsets --section=.rodata --start-address=0x543fd5 --stop-address=$((0x543fd5 + 26)) ./bin.x86_64

./bin.x86_64:     file format elf64-x86-64

Contents of section .rodata:  (Starting at file offset: 0x143fd5)
 543fd5 726561 6c6c7920 696e7465 72657374 69 really interesti
 543fe5 6e6720 73747269 6e6700               ng string.
$ gdb ./bin.x86_64
(gdb) x/s 0x543fd5
0x543fd5:       "really interesting string"

Note: I do not have the slightest idea how all of this plays with ASLR.

Emerge failure: dev-lisp/clisp-2.49-r8

dev-lisp/clisp-2.49-r8 fails to emerge with sys-libs/ncurses[tinfo] (required for CUDA), complaining that it does not find tgetent:

configure: ** checks for libraries
checking for library containing tgetent... no
configure: error: in `/var/tmp/portage/dev-lisp/clisp-2.49-r8/work/clisp-2.49/builddir':
configure: error: despite --with-readline, GNU readline was not found (try --with-libreadline-prefix)
See `config.log' for more details.
 * ERROR: dev-lisp/clisp-2.49-r8::gentoo failed (configure phase):
 *   ./configure failed
 *
 * Call stack:
 *     ebuild.sh, line  93:  Called src_configure
 *   environment, line 2282:  Called die
 * The specific snippet of code:
 *       ${configure} || die "./configure failed";
 *
 * If you need support, post the output of `emerge --info '=dev-lisp/clisp-2.49-r8::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=dev-lisp/clisp-2.49-r8::gentoo'`.
 * The complete build log is located at '/var/tmp/portage/dev-lisp/clisp-2.49-r8/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/dev-lisp/clisp-2.49-r8/temp/environment'.
 * Working directory: '/var/tmp/portage/dev-lisp/clisp-2.49-r8/work/clisp-2.49'
 * S: '/var/tmp/portage/dev-lisp/clisp-2.49-r8/work/clisp-2.49'

This is Gentoo bug #497600.
The bug is also referenced upstream as CLISP bug #665 on SourceForge. The fix is included in the bug report, and has been applied to the CLISP repository (in src/m4/termcap.m4 – the configure scripts have not yet been regenerated).

The fix only needs to be applied to the sources unpacked by the ebuild:

$ ebuild /usr/portage/dev-lisp/clisp/clisp-2.49-r8.ebuild prepare
$ sed -e "s/for ac_lib in '' ncurses termcap; do/for ac_lib in '' ncurses termcap tinfo; do/" -i /var/tmp/portage/dev-lisp/clisp-2.49-r8/work/clisp-2.49/src/configure
$ sed -e "s/for ac_lib in '' ncurses termcap; do/for ac_lib in '' ncurses termcap tinfo; do/" -i /var/tmp/portage/dev-lisp/clisp-2.49-r8/work/clisp-2.49/modules/readline/configure
$ ebuild /usr/portage/dev-lisp/clisp/clisp-2.49-r8.ebuild merge clean

Fossil admin password

When cloning a repository with the Fossil SCM, the fossil command-line tool outputs something along these lines:

admin-user: quentin (password is "1fa55b")

I wondered whether I needed to note down yet another password, and what it was required for. So I googled a bit, and it turns out (from the Password Management page in the Fossil documentation) this password is used by the repository’s web interface and the Fossil sync protocol.

My Fossil version stores it in cleartext in the repository database, meaning it can easily be retrieved from the command line:

$ fossil version
This is fossil version 1.27 [13ad130920] 2013-09-11 11:43:49 UTC
$ sqlite3 repo.fossil
sqlite> .schema user
CREATE TABLE user(
  uid INTEGER PRIMARY KEY,
  login TEXT UNIQUE,
  pw TEXT,
  cap TEXT,
  cookie TEXT,
  ipaddr TEXT,
  cexpire DATETIME,
  info TEXT,
  mtime DATE,
  photo BLOB
);
sqlite> SELECT login,pw,info FROM user;                -- Whole user table:
quentin|1fa55b|
anonymous|F463AD50A48DE1C2|Anon
nobody||Nobody
developer||Dev
reader||Reader
sqlite> SELECT pw FROM user WHERE login='quentin';     -- More targeted query:
1fa55b

Future versions of Fossil may no longer store the credentials as plain text, but rather as 40-character SHA1 hashes. Retrieving the password would then no longer be possible. However, it could still be reset:

$ sqlite3 repo.fossil
sqlite> UPDATE user SET pw='some-cleartext-password' WHERE login='quentin';
$ fossil test-hash-passwords repo.fossil     # Convert to SHA1-hashed passwords again

Note that the cleartext password must NOT be 40 characters long so as not to be mistaken for a (most likely invalid) password hash.
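
To avoid tripping over that 40-character rule (or over a quote in the password), the UPDATE statement can be generated by a small guard function. This is just a sketch: `fossil_reset_pw_sql` is a hypothetical name, not a Fossil command. It only prints the SQL and leaves running it to sqlite3.

```shell
# Hypothetical helper (not a Fossil command): print the UPDATE statement
# that resets a user's password, refusing 40-character passwords.
# Usage: fossil_reset_pw_sql LOGIN PASSWORD
fossil_reset_pw_sql() {
    login=$1
    pw=$2
    # A 40-character value would be mistaken for a SHA1 password hash.
    if [ "${#pw}" -eq 40 ]; then
        echo "refusing a 40-character cleartext password" >&2
        return 1
    fi
    # Double any single quotes so both values are valid SQL string literals.
    login=$(printf '%s' "$login" | sed "s/'/''/g")
    pw=$(printf '%s' "$pw" | sed "s/'/''/g")
    printf "UPDATE user SET pw='%s' WHERE login='%s';\n" "$pw" "$login"
}
```

It would be used as `fossil_reset_pw_sql quentin 'some-cleartext-password' | sqlite3 repo.fossil`, followed by the same `fossil test-hash-passwords repo.fossil` as above.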

Bonus: The project code that Fossil uses when generating the password hashes is given by the fossil info command. It is also printed when cloning a repository.