OpenSSL 1.1.0 and the plague of implicit function declarations

I’m currently rebuilding my Gentoo packages after switching to the (hard-masked) dev-libs/openssl-1.1.0g. The OpenSSL 1.1.0 branch has been available for a while now, and brings with it a lot of sane-looking changes to the OpenSSL API, like opaque structures that don’t let anyone go poking around their insides, and less kludgy handling of library initialization, threading and locking.

However, it doesn’t seem to have had a great adoption rate in the FLOSS ecosystem, and a lot of Gentoo packages fail to build against it. OpenSSL 1.1.0 does have some sort of compatibility layer for its previous API, but this layer must be enabled at build time, both for OpenSSL and for the packages that depend on it: --api=1.0.0 with OpenSSL’s configure script, and OPENSSL_API_COMPAT defined to 0x10000000L for code that uses it (either #define OPENSSL_API_COMPAT 0x10000000L before the OpenSSL includes, or -DOPENSSL_API_COMPAT=0x10000000L on the compiler command line). (This is as I understand it at this point in time; I may very well be wrong here.)
From what I’ve seen while fixing failed builds, most of the patches available from upstream fix missing includes, renamed functions and accesses to now-opaque structures. Deprecated APIs are usually still called, so I’m guessing my OpenSSL either doesn’t have the right API compatibility layer, or these packages build with different flags than I do.

Anyway, I’ve set out to have my entire Gentoo @world set build fine against OpenSSL 1.1.0, and I’ve already filed several bugs and patches on the Gentoo bug tracker.

The way I proceed is simple: rebuild packages that depend on OpenSSL, and fix failures by patching the code, hopefully correctly. Then when the build passes again, file a bug on the Gentoo tracker and submit the patch, and also submit the patch upstream if that makes sense. Porting to the new OpenSSL 1.1 API doesn’t involve many changes in code logic, so I deem a passing build to be a good indicator of successful porting.
However, I’ve been bitten a few times now by passing builds that in fact failed at runtime, or rather a bit earlier than actual runtime: at load time. Taking a closer look, I found out that this was because GCC had the distasteful behavior of not erroring out on calls to functions which had not been declared, instead merely issuing a warning. When code used e.g. SSLeay_add_all_algorithms(), which no longer exists in OpenSSL 1.1.0, GCC would just print a warning, assume the function returned an integer, and keep on compiling.
But obviously at load time, this symbol somehow had to get resolved, which failed. Thus my problem: an issue was being detected but ignored at build time, resulting in failure at runtime. :(
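
Since the warning does show up in the build output, one way to catch it without touching compiler flags is to scan the build log for it. Below is a minimal sketch, not something I actually used: the function name find_implicit_declarations is made up for illustration, and the pattern assumes GCC’s usual English wording, "warning: implicit declaration of function", with either ASCII or typographic quotes around the function name.

```python
import re

# Assumed shape of the GCC diagnostic (English locale), e.g.
#   file.c:12:5: warning: implicit declaration of function 'foo' [-Wimplicit-function-declaration]
# The quote characters around the name depend on the locale, hence the character classes.
IMPLICIT_DECL = re.compile(
    r"^(?P<where>.+?): warning: implicit declaration of function "
    r"[‘'`](?P<name>\w+)[’']"
)


def find_implicit_declarations(log_text):
    """Return (location, function name) pairs for each implicit-declaration warning."""
    hits = []
    for line in log_text.splitlines():
        match = IMPLICIT_DECL.match(line)
        if match:
            hits.append((match.group("where"), match.group("name")))
    return hits
```

Feeding it the text of a build log gives the list of undeclared functions to look at, even though the build itself reported success.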

When I first noticed this, I began to compulsively load every shared object that had just been built, using Python’s ctypes.CDLL in RTLD_NOW mode. That way, any unresolved symbol would cause the shared object to fail loading. This worked, but was very error-prone as I’d often miss some shared objects in the build output, and on top of that it wasn’t exactly fast.
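
For reference, that manual check can be sketched roughly as follows. This is a reconstruction of the idea rather than the exact commands I ran: the helper name check_shared_objects and the file name filter are assumptions. Note that RTLD_NOW comes from the os module, and that loading a library runs its constructors, so this should only be pointed at code you trust. Plugins that expect symbols from their host program will also show up as failures.

```python
import ctypes
import os
from pathlib import Path


def check_shared_objects(root):
    """Try to load every shared object under root with immediate symbol binding.

    Returns a list of (path, error message) pairs for the objects that failed to load.
    """
    failures = []
    for path in sorted(Path(root).rglob("*")):
        # Keep only regular files named like libfoo.so or libfoo.so.1.2
        is_shared_object = path.suffix == ".so" or ".so." in path.name
        if not (is_shared_object and path.is_file()):
            continue
        try:
            # RTLD_NOW asks the dynamic loader to resolve every symbol right away,
            # so a missing function fails here instead of at its first call.
            ctypes.CDLL(str(path), mode=os.RTLD_NOW)
        except OSError as exc:
            failures.append((str(path), str(exc)))
    return failures
```

Calling check_shared_objects() on the directory a package was installed into returns one entry per library that could not be loaded, along with the loader’s error message, which names the missing symbol.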

Therefore, I’ve decided to tackle this issue at its source: GCC. Obviously turning every single warning into an error with -Werror is not really an option as I don’t want to go about fixing the shitload of warnings that packages sometimes emit when built. But GCC is kind enough to provide -Werror-implicit-function-declaration, which turns the specific warning I loathe into an error. That’s exactly what I want! I must admit I can’t see any real reason why I would want calling an undeclared function to be anything less than an error. Even more so when said function returns a value.
Sadly, adding this flag to my system-wide Portage CFLAGS didn’t turn out so well. Indeed, it seems configure scripts will often try to compile stuff without the proper includes, even for functions as basic as exit(). So, nope.

Searching some more, I stumbled upon this post by Flameeyes, who had very similar concerns. So I tried adding -Wl,--no-undefined to my LDFLAGS, and lo and behold:

/var/tmp/portage/net-misc/openssh-7.6_p1-r3/temp/cctStwQR.ltrans0.ltrans.o: In function `main':
:(.text.startup+0x50d): undefined reference to `OpenSSL_add_all_algorithms'
collect2: error: ld returned 1 exit status
make: *** [Makefile:182: ssh-keysign] Error 1

We’ll see how things turn out as I rebuild more packages. If needed, I’ll remove the flag for specific packages with package.env.
