Locating program strings in memory

Disclaimer: I’m a total newbie to all this executable file format stuff. Slowly learning!

The strings command lists all sufficiently long printable character strings in a file.
I recently needed to find where a string from an ELF executable ends up in the memory of the running program. strings did its job just fine in reporting the location of the string in the executable:

$ strings --radix=x ./bin.x86_64
[snip]
 143fd5 really interesting string
[snip]

However, strings merely operates on binary data and doesn’t care whether it is an ELF executable, a dump of random data or even a plain text file. So I had to find out how to map this file offset to a memory address in the running program.
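
To make that point concrete, here is a rough Python sketch of what strings does: it scans raw bytes for runs of printable characters and reports where each run starts, with no knowledge of the file format. The name find_strings and the sample bytes are made up for illustration, and the real tool handles more cases than this.

```python
import re

def find_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of at least min_len printable ASCII bytes."""
    # Printable ASCII plus tab; an approximation of what strings accepts.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

# Hypothetical sample data: one long string surrounded by non-printable bytes.
blob = b"\x00\x01really interesting string\x00\xff\x02ab\x00"
for offset, text in find_strings(blob):
    print(f"{offset:7x} {text}")
```

The offsets it reports are plain positions in the byte buffer, which is exactly why they still need to be translated into memory addresses.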

Time to head to Wikipedia for a quick readup on the ELF file format:

[Figure: layout of an ELF file (Elf-layout--en.svg)]

An ELF file has two views: The program header shows the segments used at run-time, whereas the section header lists the set of sections of the binary.

“Elf-layout--en” by Surueña, own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons.

So it turns out the location of the string is pretty simple to figure out, and doesn’t even require the program to be run. The silver bullet here is readelf, which provides all kinds of information on the contents of an ELF file:

$ readelf --program-headers ./bin.x86_64

Elf file type is EXEC (Executable file)
Entry point 0x406840
There are 8 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001c0 0x00000000000001c0  R E    8
  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000001af658 0x00000000001af658  R E    200000
  LOAD           0x00000000001af658 0x00000000007af658 0x00000000007af658
                 0x0000000000002ce8 0x0000000000008418  RW     200000
  DYNAMIC        0x00000000001afd18 0x00000000007afd18 0x00000000007afd18
                 0x0000000000000270 0x0000000000000270  RW     8
  NOTE           0x000000000000021c 0x000000000040021c 0x000000000040021c
                 0x0000000000000020 0x0000000000000020  R      4
  GNU_EH_FRAME   0x00000000001709d0 0x00000000005709d0 0x00000000005709d0
                 0x000000000000b5fc 0x000000000000b5fc  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     8

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame .gcc_except_table
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .eh_frame_hdr
   07

In each entry, the Offset and FileSiz columns give the position and size of the segment in the file, while the VirtAddr and MemSiz columns give its location and size in memory (i.e., at runtime).
So my string at file offset 0x143fd5 was located inside the first LOAD segment (file range [0x000000, 0x1af658)), mapped to memory range [0x400000, 0x5af658).
Hence the location of the string in memory: 0x143fd5 - 0x000000 + 0x400000 = 0x543fd5.
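
Here is the same arithmetic as a small Python sketch. The segment table is copied by hand from the two LOAD entries in the readelf output above; the names LOAD_SEGMENTS and file_offset_to_vaddr are mine, and the sketch assumes the binary really is loaded at the addresses given in its program headers.

```python
# (file offset, virtual address, size in file) of the two LOAD segments,
# copied by hand from the readelf output.
LOAD_SEGMENTS = [
    (0x000000, 0x400000, 0x1af658),
    (0x1af658, 0x7af658, 0x002ce8),
]

def file_offset_to_vaddr(offset, segments=LOAD_SEGMENTS):
    """Return the virtual address for a file offset, or None if no LOAD segment contains it."""
    for seg_offset, seg_vaddr, seg_filesz in segments:
        # Only bytes actually present in the file count (half-open range).
        if seg_offset <= offset < seg_offset + seg_filesz:
            return seg_vaddr + (offset - seg_offset)
    return None

print(hex(file_offset_to_vaddr(0x143fd5)))  # 0x543fd5
```

An offset that falls outside every LOAD segment yields None, since such bytes are not mapped into memory at all.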

This can be confirmed easily with other tools:

$ objdump --full-contents --file-offsets --section=.rodata --start-address=0x543fd5 --stop-address=$((0x543fd5 + 26)) ./bin.x86_64

./bin.x86_64:     file format elf64-x86-64

Contents of section .rodata:  (Starting at file offset: 0x143fd5)
 543fd5 726561 6c6c7920 696e7465 72657374 69 really interesti
 543fe5 6e6720 73747269 6e6700               ng string.

$ gdb ./bin.x86_64
(gdb) x/s 0x543fd5
0x543fd5:       "really interesting string"
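
If readelf is not at hand, the LOAD entries can also be pulled out of the file with a few lines of Python. This is only a sketch: it assumes a little-endian 64-bit ELF file (like this x86_64 binary), does no bounds checking, and load_segments is a name I made up.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments

def load_segments(data: bytes):
    """Return (offset, vaddr, filesz, memsz) for each PT_LOAD entry of a little-endian ELF64 image."""
    # e_ident: magic, then class (2 = 64-bit) and data encoding (1 = little-endian).
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # In the ELF64 header, e_phoff sits at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    segments = []
    for i in range(e_phnum):
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
        (p_type, _flags, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_offset, p_vaddr, p_filesz, p_memsz))
    return segments

# Usage, with the path from this post:
# with open("./bin.x86_64", "rb") as f:
#     for seg in load_segments(f.read()):
#         print([hex(v) for v in seg])
```

Each returned tuple has the same Offset, VirtAddr, FileSiz and MemSiz values that readelf prints for a LOAD entry.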

Note: I do not have the slightest idea how all of this plays with ASLR.
