Debugging Ncurses to fix a Mutt segfault on Gentoo

This all started after a regular package upgrade on my Gentoo box:

$ mutt
Segmentation fault

Meh, another breakage. This one already had a bug filed on the Gentoo bug tracker: #651552. However, there was no fix available yet. So time to rebuild with debug enabled (-ggdb3) and sources installed, disable PaX on the binary, and whip out GDB:

$ gdb /usr/bin/mutt
Reading symbols from /usr/bin/mutt...Reading symbols from /usr/lib/debug//usr/bin/mutt.debug...done.
done.
(gdb) run
Starting program: /usr/bin/mutt

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff77e6236 in _nc_setupscreen_sp (spp=spp@entry=0x7fffffffbea8, slines=, scolumns=,
    output=output@entry=0x7ffff7dd25c0 , filtered=, slk_format=slk_format@entry=0)
    at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:475
475             if (sp->_default_fg >= MaxColors) {
(gdb) bt
#0  0x00007ffff77e6236 in _nc_setupscreen_sp (spp=spp@entry=0x7fffffffbea8, slines=, scolumns=,
    output=output@entry=0x7ffff7dd25c0 , filtered=, slk_format=slk_format@entry=0)
    at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:475
#1  0x00007ffff77e0d2d in newterm_sp (sp=, name=name@entry=0x7fffffffde0f "screen-256color",
    ofp=ofp@entry=0x7ffff7dd25c0 , ifp=ifp@entry=0x7ffff7dd1860 )
    at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_newterm.c:222
#2  0x00007ffff77e11db in newterm (name=name@entry=0x7fffffffde0f "screen-256color", ofp=0x7ffff7dd25c0 ,
    ifp=0x7ffff7dd1860 ) at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_newterm.c:355
#3  0x00007ffff77dc15a in initscr () at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_initscr.c:89
#4  0x000055555556b823 in start_curses () at main.c:296
#5  main (argc=1, argv=0x7fffffffc8b8, environ=) at main.c:584
(gdb) p sp->_default_fg
$1 = 12
(gdb) p MaxColors
Cannot access memory at address 0x6e69746e99

Hmm, weird. What is MaxColors?

A bit of cscope-ing in the ncurses source later, it turns out to be this:

#ifdef USE_TERM_DRIVER
#define MaxColors      InfoOf(sp).maxcolors
#define NumLabels      InfoOf(sp).numlabels
#else
#define MaxColors      max_colors
#define NumLabels      num_labels
#endif

Alright, it’s a macro. Back to GDB then:

(gdb) info macro MaxColors
Defined at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:55
#define MaxColors max_colors
(gdb) macro expand MaxColors
expands to: (((sp) ? ((sp)->_term ? (sp)->_term : cur_term) : cur_term))->type2. Numbers[13]
(gdb) p sp
$2 = (SCREEN *) 0x55555586a240
(gdb) p sp->_term
$3 = (TERMINAL *) 0x55555586a8a0
(gdb) p sp->_term->type2
$4 = {term_names = 0x0, str_table = 0x21 ,
  Booleans = 0x75712f656d6f682f , Numbers = 0x6e69746e65, Strings = 0x0,
  ext_str_table = 0x61 , ext_Names = 0x75712f656d6f682f, num_Booleans = 28261,
  num_Numbers = 26996, num_Strings = 12142, ext_Booleans = 29742, ext_Numbers = 29285, ext_Strings = 26989}
(gdb) ptype sp->_term
type = struct term {
    TERMTYPE type;
    short Filedes;
    struct termios Ottyb;
    struct termios Nttyb;
    int _baudrate;
    char *_termname;
    TERMTYPE2 type2;
} *
(gdb) p *sp->_term
$5 = {type = {term_names = 0x55555586bf50 "screen-256color|GNU Screen with 256 colors",
    str_table = 0x55555586bf50 "screen-256color|GNU Screen with 256 colors", Booleans = 0x55555586c290 "", Numbers = 0x55555586c2d0,
    Strings = 0x55555586c330, ext_str_table = 0x55555586d130 "\033(B", ext_Names = 0x55555586d1f0, num_Booleans = 47,
    num_Numbers = 40, num_Strings = 446, ext_Booleans = 3, ext_Numbers = 1, ext_Strings = 32}, Filedes = 1, Ottyb = {c_iflag = 17664,
    c_oflag = 5, c_cflag = 191, c_lflag = 35387, c_line = 0 '\000',
    c_cc = "\003\034\177\025\004\000\001\000\021\023\032\377\022\017\027\026\377", '\000' , c_ispeed = 15,
    c_ospeed = 15}, Nttyb = {c_iflag = 17664, c_oflag = 5, c_cflag = 191, c_lflag = 35387, c_line = 0 '\000',
    c_cc = "\003\034\177\025\004\000\001\000\021\023\032\377\022\017\027\026\377", '\000' , c_ispeed = 15,
    c_ospeed = 15}, _baudrate = 38400, _termname = 0x55555586af40 "screen-256color", type2 = {term_names = 0x0,
    str_table = 0x21 ,
    Booleans = 0x75712f656d6f682f , Numbers = 0x6e69746e65, Strings = 0x0,
    ext_str_table = 0x61 , ext_Names = 0x75712f656d6f682f, num_Booleans = 28261,
    num_Numbers = 26996, num_Strings = 12142, ext_Booleans = 29742, ext_Numbers = 29285, ext_Strings = 26989}}

Okay, so sp->_term->type2 is full of crap, but all the other fields in sp->_term look fine. So perhaps something smashed the end of sp->_term?

Let’s track down how it’s built, then we can set a watch on it to catch the rogue write. Working our way up the backtrace and looking at the ncurses code, it turns out to be allocated in TINFO_SETUP_TERM(), called by newterm_sp().

(gdb) b lib_setup.c:711
Breakpoint 1 at 0x7ffff63957d9: lib_setup.c:711. (2 locations)
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/mutt

Breakpoint 1, _nc_setupterm (tname=tname@entry=0x7fffffffe18f "screen-256color", Filedes=1, errret=errret@entry=0x7fffffffc23c,
    reuse=reuse@entry=0) at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/tinfo/lib_setup.c:711
711             termp = typeCalloc(TERMINAL, 1);
(gdb) n
713             if (termp == 0) {
(gdb) p *termp
$6 = {type = {term_names = 0x0, str_table = 0x0, Booleans = 0x0, Numbers = 0x0, Strings = 0x0, ext_str_table = 0x0, ext_Names = 0x0,
    num_Booleans = 0, num_Numbers = 0, num_Strings = 0, ext_Booleans = 0, ext_Numbers = 0, ext_Strings = 0}, Filedes = 0, Ottyb = {
    c_iflag = 0, c_oflag = 0, c_cflag = 0, c_lflag = 0, c_line = 0 '\000', c_cc = '\000' , c_ispeed = 0,
    c_ospeed = 0}, Nttyb = {c_iflag = 0, c_oflag = 0, c_cflag = 0, c_lflag = 0, c_line = 0 '\000', c_cc = '\000' ,
    c_ispeed = 0, c_ospeed = 0}, _baudrate = 0, _termname = 0x0}
(gdb) ptype
type = struct term {
    TERMTYPE type;
    short Filedes;
    struct termios Ottyb;
    struct termios Nttyb;
    int _baudrate;
    char *_termname;
}

Now it gets weirder: the allocated TERMINAL structure doesn’t have the type2 field that was filled with garbage at the time of the crash! That explains the segfault, but now we need to understand why the definition changed.

Time to take another look at the source. Turns out the structure definition is generated by include/MKterm.h.awk.in, and ends up in include/term.h:

    print  "typedef struct term {       /* describe an actual terminal */"
    print  "    TERMTYPE    type;       /* terminal type description */"
    print  "    short   Filedes;    /* file description being written to */"
    print  "    TTY     Ottyb;      /* original state of the terminal */"
    print  "    TTY     Nttyb;      /* current state of the terminal */"
    print  "    int     _baudrate;  /* used to compute padding */"
    print  "    char *  _termname;  /* used for termname() */"
    if (@NCURSES_EXT_COLORS@) {
    print  "    TERMTYPE2   type2;      /* extended terminal type description */"
    }
    print  "} TERMINAL;"

Quick check in GDB:

(gdb) p NCURSES_EXT_COLORS
$7 = 0
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff77e6236 in _nc_setupscreen_sp (spp=spp@entry=0x7fffffffc968, slines=, scolumns=,
    output=output@entry=0x7ffff7dd25c0 , filtered=, slk_format=slk_format@entry=0)
    at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:475
475             if (sp->_default_fg >= MaxColors) {
(gdb) p NCURSES_EXT_COLORS
$8 = 20180127

Indeed, the value of NCURSES_EXT_COLORS changes… That’s super-weird. What’s more, it’s not an #ifdef in the structure definition; it’s resolved by AWK when ncurses is built. So there should be only one possible definition of struct term.

It took me some more time spelunking in the ncurses internals and build system, chasing ghosts from m4 through AWK to C, till I stumbled upon it:

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/mutt

Breakpoint 1, _nc_setupterm (tname=tname@entry=0x7fffffffe28f "screen-256color", Filedes=1, errret=errret@entry=0x7fffffffc33c,
    reuse=reuse@entry=0) at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/tinfo/lib_setup.c:711
711             termp = typeCalloc(TERMINAL, 1);
(gdb) info macro NCURSES_EXT_COLORS
Defined at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1-abi_x86_64.amd64/ncurses/ncurses/../include/ncurses_def.h:729
  included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1-abi_x86_64.amd64/ncurses/ncurses/../include/ncurses_cfg.h:205
  included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/curses.priv.h:56
  included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/tinfo/lib_setup.c:44
#define NCURSES_EXT_COLORS 0
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff77e6236 in _nc_setupscreen_sp (spp=spp@entry=0x7fffffffc968, slines=, scolumns=,
    output=output@entry=0x7ffff7dd25c0 , filtered=, slk_format=slk_format@entry=0)
    at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:475
475             if (sp->_default_fg >= MaxColors) {
(gdb) info macro NCURSES_EXT_COLORS
Defined at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1-abi_x86_64.amd64/ncursesw/ncurses/../include/curses.h:424
  included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/curses.priv.h:325
  included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:43
#define NCURSES_EXT_COLORS 20180127

Upon allocation, the macro NCURSES_EXT_COLORS was defined in the ncurses source. Upon segfaulting access, it was defined in the ncursesw source…

(gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
0x00007ffff7dd7c70  0x00007ffff7df54d0  Yes (*)     /lib64/ld-linux-x86-64.so.2
0x00007ffff7a26680  0x00007ffff7b83f6b  Yes         /lib64/libc.so.6
0x00007ffff77ce450  0x00007ffff77facd9  Yes         /lib64/libncursesw.so.6
0x00007ffff7595190  0x00007ffff75af034  Yes         /lib64/libtinfo.so.6
0x00007ffff732d570  0x00007ffff736ce36  Yes         /usr/lib64/libssl.so.1.1
0x00007ffff6ecb000  0x00007ffff7078414  Yes         /usr/lib64/libcrypto.so.1.1
0x00007ffff6c479a0  0x00007ffff6c5b8c3  Yes (*)     /usr/lib64/libsasl2.so.3
0x00007ffff7fb3cc0  0x00007ffff7fc25fe  Yes         /usr/lib64/liblmdb.so.0
0x00007ffff6a11fe0  0x00007ffff6a17511  Yes (*)     /usr/lib64/libidn.so.11
0x00007ffff67c8500  0x00007ffff67fc8b9  Yes (*)     /usr/lib64/libgpgme.so.11
0x00007ffff65bde20  0x00007ffff65beeba  Yes (*)     /lib64/libdl.so.2
0x00007ffff638e290  0x00007ffff63a8ab4  Yes         /lib64/libtinfow.so.6
0x00007ffff7f983c0  0x00007ffff7fa7ac9  Yes (*)     /lib64/libz.so.1
0x00007ffff6163af0  0x00007ffff6174ead  Yes (*)     /lib64/libpthread.so.0
0x00007ffff5f4c760  0x00007ffff5f58113  Yes (*)     /usr/lib64/libassuan.so.0
0x00007ffff5d35a50  0x00007ffff5d41599  Yes (*)     /usr/lib64/libgpg-error.so.0
(*): Shared library is missing debugging information.

There you have it: Mutt was linked against both libtinfo and libtinfow/libncursesw, which are built with different values of NCURSES_EXT_COLORS. Apparently the loader resolved the required tinfo symbols from libtinfo, hence the crash: the TERMINAL structure allocated in libtinfo was incompatible with the TERMINAL structure manipulated by libncursesw.

The rest is history. Still, I can’t stress enough how useful -ggdb3 proved in debugging this issue. I’d never had to debug code as macro-ridden as ncurses, and having GDB give me that much information about every macro was an incredible boon. :)
