This all started after a regular package upgrade on my Gentoo box:
$ mutt
Segmentation fault
Meh, another breakage. This one already had a bug filed on the Gentoo bug tracker: #651552. However, there was no fix available yet. So time to rebuild with debug info enabled (-ggdb3) and sources installed, disable PaX on the binary, and whip out GDB:
$ gdb /usr/bin/mutt
Reading symbols from /usr/bin/mutt...Reading symbols from /usr/lib/debug//usr/bin/mutt.debug...done.
done.
(gdb) run
Starting program: /usr/bin/mutt
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff77e6236 in _nc_setupscreen_sp (spp=spp@entry=0x7fffffffbea8, slines=, scolumns=,
output=output@entry=0x7ffff7dd25c0 , filtered=, slk_format=slk_format@entry=0)
at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:475
475 if (sp->_default_fg >= MaxColors) {
(gdb) bt
#0 0x00007ffff77e6236 in _nc_setupscreen_sp (spp=spp@entry=0x7fffffffbea8, slines=, scolumns=,
output=output@entry=0x7ffff7dd25c0 , filtered=, slk_format=slk_format@entry=0)
at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:475
#1 0x00007ffff77e0d2d in newterm_sp (sp=, name=name@entry=0x7fffffffde0f "screen-256color",
ofp=ofp@entry=0x7ffff7dd25c0 , ifp=ifp@entry=0x7ffff7dd1860 )
at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_newterm.c:222
#2 0x00007ffff77e11db in newterm (name=name@entry=0x7fffffffde0f "screen-256color", ofp=0x7ffff7dd25c0 ,
ifp=0x7ffff7dd1860 ) at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_newterm.c:355
#3 0x00007ffff77dc15a in initscr () at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_initscr.c:89
#4 0x000055555556b823 in start_curses () at main.c:296
#5 main (argc=1, argv=0x7fffffffc8b8, environ=) at main.c:584
(gdb) p sp->_default_fg
$1 = 12
(gdb) p MaxColors
Cannot access memory at address 0x6e69746e99
Hmm, weird. What is MaxColors?
After a bit of cscope-ing in the ncurses source, it turns out to be this:
#ifdef USE_TERM_DRIVER
#define MaxColors InfoOf(sp).maxcolors
#define NumLabels InfoOf(sp).numlabels
#else
#define MaxColors max_colors
#define NumLabels num_labels
#endif
Alright, it’s a macro. Back to GDB then:
(gdb) info macro MaxColors
Defined at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:55
#define MaxColors max_colors
(gdb) macro expand MaxColors
expands to: (((sp) ? ((sp)->_term ? (sp)->_term : cur_term) : cur_term))->type2. Numbers[13]
(gdb) p sp
$2 = (SCREEN *) 0x55555586a240
(gdb) p sp->_term
$3 = (TERMINAL *) 0x55555586a8a0
(gdb) p sp->_term->type2
$4 = {term_names = 0x0, str_table = 0x21 ,
Booleans = 0x75712f656d6f682f , Numbers = 0x6e69746e65, Strings = 0x0,
ext_str_table = 0x61 , ext_Names = 0x75712f656d6f682f, num_Booleans = 28261,
num_Numbers = 26996, num_Strings = 12142, ext_Booleans = 29742, ext_Numbers = 29285, ext_Strings = 26989}
(gdb) ptype sp->_term
type = struct term {
TERMTYPE type;
short Filedes;
struct termios Ottyb;
struct termios Nttyb;
int _baudrate;
char *_termname;
TERMTYPE2 type2;
} *
(gdb) p *sp->_term
$5 = {type = {term_names = 0x55555586bf50 "screen-256color|GNU Screen with 256 colors",
str_table = 0x55555586bf50 "screen-256color|GNU Screen with 256 colors", Booleans = 0x55555586c290 "", Numbers = 0x55555586c2d0,
Strings = 0x55555586c330, ext_str_table = 0x55555586d130 "\033(B", ext_Names = 0x55555586d1f0, num_Booleans = 47,
num_Numbers = 40, num_Strings = 446, ext_Booleans = 3, ext_Numbers = 1, ext_Strings = 32}, Filedes = 1, Ottyb = {c_iflag = 17664,
c_oflag = 5, c_cflag = 191, c_lflag = 35387, c_line = 0 '\000',
c_cc = "\003\034\177\025\004\000\001\000\021\023\032\377\022\017\027\026\377", '\000' , c_ispeed = 15,
c_ospeed = 15}, Nttyb = {c_iflag = 17664, c_oflag = 5, c_cflag = 191, c_lflag = 35387, c_line = 0 '\000',
c_cc = "\003\034\177\025\004\000\001\000\021\023\032\377\022\017\027\026\377", '\000' , c_ispeed = 15,
c_ospeed = 15}, _baudrate = 38400, _termname = 0x55555586af40 "screen-256color", type2 = {term_names = 0x0,
str_table = 0x21 ,
Booleans = 0x75712f656d6f682f , Numbers = 0x6e69746e65, Strings = 0x0,
ext_str_table = 0x61 , ext_Names = 0x75712f656d6f682f, num_Booleans = 28261,
num_Numbers = 26996, num_Strings = 12142, ext_Booleans = 29742, ext_Numbers = 29285, ext_Strings = 26989}}
Okay, so sp->_term->type2 is full of crap, while all the other fields in sp->_term look fine. So perhaps something smashed the end of sp->_term?
Let’s track down how it’s built, then we can set a watchpoint on it to catch the rogue write. Working our way up the backtrace and looking at the ncurses code, it turns out to be allocated in TINFO_SETUP_TERM(), called by newterm_sp().
(gdb) b lib_setup.c:711
Breakpoint 1 at 0x7ffff63957d9: lib_setup.c:711. (2 locations)
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/mutt
Breakpoint 1, _nc_setupterm (tname=tname@entry=0x7fffffffe18f "screen-256color", Filedes=1, errret=errret@entry=0x7fffffffc23c,
reuse=reuse@entry=0) at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/tinfo/lib_setup.c:711
711 termp = typeCalloc(TERMINAL, 1);
(gdb) n
713 if (termp == 0) {
(gdb) p *termp
$6 = {type = {term_names = 0x0, str_table = 0x0, Booleans = 0x0, Numbers = 0x0, Strings = 0x0, ext_str_table = 0x0, ext_Names = 0x0,
num_Booleans = 0, num_Numbers = 0, num_Strings = 0, ext_Booleans = 0, ext_Numbers = 0, ext_Strings = 0}, Filedes = 0, Ottyb = {
c_iflag = 0, c_oflag = 0, c_cflag = 0, c_lflag = 0, c_line = 0 '\000', c_cc = '\000' , c_ispeed = 0,
c_ospeed = 0}, Nttyb = {c_iflag = 0, c_oflag = 0, c_cflag = 0, c_lflag = 0, c_line = 0 '\000', c_cc = '\000' ,
c_ispeed = 0, c_ospeed = 0}, _baudrate = 0, _termname = 0x0}
(gdb) ptype
type = struct term {
TERMTYPE type;
short Filedes;
struct termios Ottyb;
struct termios Nttyb;
int _baudrate;
char *_termname;
}
Now that gets weirder: the allocated TERMINAL structure doesn’t have the type2 field that was filled with garbage at the time of the crash! That explains the segfault, since the crashing code reads type2 from memory that was never part of the allocation. Now we must understand why the definition changed.
Time to take another look at the source. Turns out the structure definition is generated by include/MKterm.h.awk.in, and ends up in include/term.h:
print "typedef struct term { /* describe an actual terminal */"
print " TERMTYPE type; /* terminal type description */"
print " short Filedes; /* file description being written to */"
print " TTY Ottyb; /* original state of the terminal */"
print " TTY Nttyb; /* current state of the terminal */"
print " int _baudrate; /* used to compute padding */"
print " char * _termname; /* used for termname() */"
if (@NCURSES_EXT_COLORS@) {
print " TERMTYPE2 type2; /* extended terminal type description */"
}
print "} TERMINAL;"
Quick check in GDB:
(gdb) p NCURSES_EXT_COLORS
$7 = 0
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff77e6236 in _nc_setupscreen_sp (spp=spp@entry=0x7fffffffc968, slines=, scolumns=,
output=output@entry=0x7ffff7dd25c0 , filtered=, slk_format=slk_format@entry=0)
at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:475
475 if (sp->_default_fg >= MaxColors) {
(gdb) p NCURSES_EXT_COLORS
$8 = 20180127
Indeed, the value of NCURSES_EXT_COLORS changes… That’s super-weird. What’s more, it’s not an #ifdef in the structure definition; it’s processed at ncurses compile time by AWK. So there should be only a single definition possible for struct term…
It took me some more time spelunking in the ncurses internals and build system, chasing ghosts from m4 through AWK to C, till I stumbled upon it:
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/mutt
Breakpoint 1, _nc_setupterm (tname=tname@entry=0x7fffffffe28f "screen-256color", Filedes=1, errret=errret@entry=0x7fffffffc33c,
reuse=reuse@entry=0) at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/tinfo/lib_setup.c:711
711 termp = typeCalloc(TERMINAL, 1);
(gdb) info macro NCURSES_EXT_COLORS
Defined at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1-abi_x86_64.amd64/ncurses/ncurses/../include/ncurses_def.h:729
included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1-abi_x86_64.amd64/ncurses/ncurses/../include/ncurses_cfg.h:205
included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/curses.priv.h:56
included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/tinfo/lib_setup.c:44
#define NCURSES_EXT_COLORS 0
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff77e6236 in _nc_setupscreen_sp (spp=spp@entry=0x7fffffffc968, slines=, scolumns=,
output=output@entry=0x7ffff7dd25c0 , filtered=, slk_format=slk_format@entry=0)
at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:475
475 if (sp->_default_fg >= MaxColors) {
(gdb) info macro NCURSES_EXT_COLORS
Defined at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1-abi_x86_64.amd64/ncursesw/ncurses/../include/curses.h:424
included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/curses.priv.h:325
included at /usr/src/debug/sys-libs/ncurses-6.1-r1/ncurses-6.1/ncurses/base/lib_set_term.c:43
#define NCURSES_EXT_COLORS 20180127
Upon allocation, the macro NCURSES_EXT_COLORS was defined in the ncurses source. Upon segfaulting access, it was defined in the ncursesw source…
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
0x00007ffff7dd7c70 0x00007ffff7df54d0 Yes (*) /lib64/ld-linux-x86-64.so.2
0x00007ffff7a26680 0x00007ffff7b83f6b Yes /lib64/libc.so.6
0x00007ffff77ce450 0x00007ffff77facd9 Yes /lib64/libncursesw.so.6
0x00007ffff7595190 0x00007ffff75af034 Yes /lib64/libtinfo.so.6
0x00007ffff732d570 0x00007ffff736ce36 Yes /usr/lib64/libssl.so.1.1
0x00007ffff6ecb000 0x00007ffff7078414 Yes /usr/lib64/libcrypto.so.1.1
0x00007ffff6c479a0 0x00007ffff6c5b8c3 Yes (*) /usr/lib64/libsasl2.so.3
0x00007ffff7fb3cc0 0x00007ffff7fc25fe Yes /usr/lib64/liblmdb.so.0
0x00007ffff6a11fe0 0x00007ffff6a17511 Yes (*) /usr/lib64/libidn.so.11
0x00007ffff67c8500 0x00007ffff67fc8b9 Yes (*) /usr/lib64/libgpgme.so.11
0x00007ffff65bde20 0x00007ffff65beeba Yes (*) /lib64/libdl.so.2
0x00007ffff638e290 0x00007ffff63a8ab4 Yes /lib64/libtinfow.so.6
0x00007ffff7f983c0 0x00007ffff7fa7ac9 Yes (*) /lib64/libz.so.1
0x00007ffff6163af0 0x00007ffff6174ead Yes (*) /lib64/libpthread.so.0
0x00007ffff5f4c760 0x00007ffff5f58113 Yes (*) /usr/lib64/libassuan.so.0
0x00007ffff5d35a50 0x00007ffff5d41599 Yes (*) /usr/lib64/libgpg-error.so.0
(*): Shared library is missing debugging information.
There you have it: Mutt was linked against both libtinfo and libtinfow/libncursesw, which have differing values of NCURSES_EXT_COLORS. Apparently the loader chose to resolve the required tinfo symbols using libtinfo. Hence the crash: the TERMINAL structure allocated in libtinfo was incompatible with the TERMINAL structure manipulated by libncursesw.
The rest is history. Still, I can’t stress enough how useful -ggdb3 proved in debugging this issue. I’ve never had to debug code as macro-ridden as ncurses, and having GDB able to give me that much info on all macros was an incredible boon. :)