.eh_frame
When gcc generates code that handles exceptions, it produces tables that
describe how to unwind the stack. These tables are found in the .eh_frame
section. The format of the .eh_frame
section is very similar to the format of
a DWARF .debug_frame
section. Unfortunately, it is not precisely identical. I
don’t know of any documentation which describes this format. The following
should be read in conjunction with the relevant section of the DWARF standard,
available from http://dwarfstd.org.
The .eh_frame
section is a sequence of records. Each record is either a CIE
(Common Information Entry) or an FDE (Frame Description Entry). In general
there is one CIE per object file, and each CIE is associated with a list of
FDEs. Each FDE is typically associated with a single function. The CIE and the
FDE together describe how to unwind to the caller if the current instruction
pointer is in the range covered by the FDE.
There should be exactly one FDE covering each instruction which may be being
executed when an exception occurs. By default an exception can only occur
during a function call or a throw. When using the -fnon-call-exceptions
gcc
option, an exception can also occur on most memory references and floating
point operations. When using -fasynchronous-unwind-tables
, the FDE will cover
every instruction, to permit unwinding from a signal handler.
The general format of a CIE or FDE starts as follows:
- Length of record. Read 4 bytes. If they are not
0xffffffff
, they are the length of the CIE or FDE record. Otherwise the next 64 bits holds the length, and this is a 64-bit DWARF format. This is like.debug_frame
. - A 4 byte ID. For a CIE this is 0. For an FDE it is the byte offset from this
field to the start of the CIE with which this FDE is associated. The byte
offset goes to the length record of the CIE. A positive value goes backward;
that is, you have to subtract the value of the ID field from the current byte
position to get the CIE position. This differs from
.debug_frame
in that the offset is relative rather than being an offset into the.debug_frame
section.
A CIE record continues as follows:
- 1 byte CIE version. As of this writing this should be 1 or 3.
- NUL terminated augmentation string. This is a sequence of characters. Very old versions of gcc used the string “eh” here, but I won’t document that. This is described further below.
- Code alignment factor, an unsigned LEB128 (LEB128 is a DWARF encoding for
numbers which I won’t describe here). This should always be 1 for
.eh_frame
. - Data alignment factor, a signed LEB128. This is a constant factored out of
offset instructions, as in
.debug_frame
. - The return address register. In CIE version 1 this is a single byte; in CIE version 3 this is an unsigned LEB128. This indicates which column in the frame table represents the return address.
The next fields of the CIE depend on the augmentation string.
- If the augmentation string starts with ‘z’, we now find an unsigned LEB128 which is the length of the augmentation data, rounded up so that the CIE ends on an address boundary. This is used to skip to the end of the augmentation data if an unrecognized augmentation character is seen.
- If the next character in the augmentation string is ‘L’, the next byte in the
CIE is the LSDA (Language Specific Data Area) encoding. This is a
DW_EH_PE_xxx
value (described later). The default isDW_EH_PE_absptr
. - If the next character in the augmentation string is ‘R’, the next byte in the
CIE is the FDE encoding. This is a
DW_EH_PE_xxx
value. The default isDW_EH_PE_absptr
. - The character ‘S’ in the augmentation string means that this CIE represents a stack frame for the invocation of a signal handler. When unwinding the stack, signal stack frames are handled slightly differently: the instruction pointer is assumed to be before the next instruction to execute rather than after it.
- If the next character in the augmentation string is ‘P’, the next byte in the
CIE is the personality encoding, a
DW_EH_PE_xxx
value. This is followed by a pointer to the personality function, encoded using the personality encoding. I’ll describe the personality function some other day.
The remaining bytes are an array of DW_CFA_xxx
opcodes which define the
initial values for the frame table. This is then followed by DW_CFA_nop
padding bytes as required to match the total length of the CIE.
An FDE starts with the length and ID described above, and then continues as follows.
- The starting address to which this FDE applies. This is encoded using the FDE encoding specified by the associated CIE.
- The number of bytes after the start address to which this FDE applies. This is encoded using the FDE encoding.
- If the CIE augmentation string starts with ‘z’, the FDE next has an unsigned LEB128 which is the total size of the FDE augmentation data. This may be used to skip data associated with unrecognized augmentation characters.
- If the CIE does not specify
DW_EH_PE_omit
as the LSDA encoding, the FDE next has a pointer to the LSDA, encoded as specified by the CIE.
The remaining bytes in the FDE are an array of DW_CFA_xxx
opcodes which set
values in the frame table for unwinding to the caller.
The DW_EH_PE_xxx
encodings describe how to encode values in a CIE or FDE. The
basic encoding is as follows:
DW_EH_PE_absptr = 0x00
: An absolute pointer. The size is determined by whether this is a 32-bit or 64-bit address space, and will be 32 or 64 bits.DW_EH_PE_omit = 0xff
: The value is omitted.DW_EH_PE_uleb128 = 0x01
: The value is an unsigned LEB128.DW_EH_PE_udata2 = 0x02
,DW_EH_PE_udata4 = 0x03
,DW_EH_PE_udata8 = 0x04
: The value is stored as unsigned data with the specified number of bytes.DW_EH_PE_signed = 0x08
: A signed number. The size is determined by whether this is a 32-bit or 64-bit address space. I don’t think this ever appears in a CIE or FDE in practice.DW_EH_PE_sleb128 = 0x09
: A signed LEB128. Not used in practice.DW_EH_PE_sdata2 = 0x0a
,DW_EH_PE_sdata4 = 0x0b
,DW_EH_PE_sdata8 = 0x0c
: The value is stored as signed data with the specified number of bytes. Not used in practice.
In addition the above basic encodings, there are modifiers.
DW_EH_PE_pcrel = 0x10
: Value is PC relative.DW_EH_PE_textrel = 0x20
: Value is text relative.DW_EH_PE_datarel = 0x30
: Value is data relative.DW_EH_PE_funcrel = 0x40
: Value is relative to start of function.DW_EH_PE_aligned = 0x50
: Value is aligned: padding bytes are inserted as required to make value be naturally aligned.DW_EH_PE_indirect = 0x80
: This is actually the address of the real value.
If you follow all that, and also read up on .debug_frame
, then you have
enough information to unwind the stack at runtime, e.g. to implement glibc’s
backtrace function. Later I’ll describe the LSDA and the personality function,
which work together to implement exception catching on top of stack unwinding.