Linkers part 15
COMDAT sections
In C++ there are several constructs which do not clearly live in a single place. Examples are inline functions defined in a header file, virtual tables, and typeinfo objects. There must be only a single instance of each of these constructs in the final linked program (actually we could probably get away with multiple copies of a virtual table, but the others must be unique since it is possible to take their address). Unfortunately, there is not necessarily a single object file in which they should be generated. These types of constructs are sometimes described as having vague linkage.
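To make the three constructs concrete, here is a small sketch of what such a header might contain. The names Shape, Square, address_of_f1, is_square and demo are made up for illustration; only f1 comes from the discussion that follows.

```cpp
#include <cassert>
#include <typeinfo>

// Imagine everything below lives in a header included by many source files.

// An inline function: any translation unit that cannot inline every call
// (or that takes the address) may emit its own out-of-line copy.
inline int f1(int x) { return x + 1; }

// Classes whose virtual functions are all defined in the class body: there
// is no single obvious object file for the virtual table or typeinfo object.
struct Shape {
    virtual ~Shape() {}
    virtual int sides() const { return 0; }
};

struct Square : Shape {
    int sides() const override { return 4; }
};

// Taking the address means an out-of-line copy of f1 must exist somewhere.
using IntFn = int (*)(int);
inline IntFn address_of_f1() { return &f1; }

// typeid on a polymorphic reference consults the typeinfo object of the
// dynamic type.
inline bool is_square(const Shape& s) { return typeid(s) == typeid(Square); }

// Small self-check tying the pieces together.
inline void demo() {
    Square sq;
    Shape plain;
    assert(address_of_f1()(41) == 42);
    assert(is_square(sq));
    assert(!is_square(plain));
}
```

Every source file that includes this header may produce its own copy of f1, of the two virtual tables, and of the two typeinfo objects, yet the program must behave as if there were exactly one of each.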
Linkers implement these features by using COMDAT sections (there may be other approaches, but this is the only one I know of). COMDAT sections are a special type of section. Each COMDAT section has a special string. When the linker sees multiple COMDAT sections with the same special string, it will only keep one of them.
For example, when the C++ compiler sees an inline function f1 defined in a
header file, but the compiler is unable to inline the function in all uses
(perhaps because something takes the address of the function), the compiler
will emit f1 in a COMDAT section associated with the string f1. After the
linker sees a COMDAT section f1, it will discard all subsequent f1 COMDAT
sections.
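The keep-the-first-one rule can be sketched as a toy model. This is not how any real linker is written; InputSection, link_sections and demo_keep_first are hypothetical names, and a section here is reduced to three strings.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, heavily simplified model of an input section.
struct InputSection {
    std::string object_file;  // which object file the section came from
    std::string comdat_key;   // empty means an ordinary section, always kept
    std::string contents;     // stand-in for the section's bytes
};

// Visit sections in input order; keep the first COMDAT section seen for
// each key and discard every later one with the same key.
std::vector<InputSection> link_sections(const std::vector<InputSection>& inputs) {
    std::unordered_set<std::string> seen_keys;
    std::vector<InputSection> kept;
    for (const InputSection& sec : inputs) {
        if (!sec.comdat_key.empty()) {
            // insert().second is false when the key was already present.
            if (!seen_keys.insert(sec.comdat_key).second) {
                continue;  // duplicate COMDAT section: discard
            }
        }
        kept.push_back(sec);
    }
    return kept;
}

// Two object files both include the header that defines f1.
inline void demo_keep_first() {
    std::vector<InputSection> kept = link_sections({
        {"a.o", "f1", "code for f1 from a.o"},
        {"a.o", "",   "code for main"},
        {"b.o", "f1", "code for f1 from b.o"},
    });
    assert(kept.size() == 2);
    assert(kept[0].object_file == "a.o");  // the first f1 wins
}
```

Note that the model never looks at the contents of a discarded section, which is exactly what makes the problem described next possible.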
This obviously raises the possibility that there will be two entirely different
inline functions named f1, defined in different header files. This would be
an invalid C++ program, violating the One Definition Rule (often abbreviated
ODR). Unfortunately, if no source file included both header files, the
compiler would be unable to diagnose the error. And, unfortunately, the linker
would simply discard the duplicate COMDAT sections, and would not notice the
error either. This is an area where some improvements are needed (at least in
the GNU tools; I don’t know whether any other tools diagnose this error
correctly).
The Microsoft PE object file format provides COMDAT sections. These sections can be marked so that duplicate COMDAT sections which do not have identical contents cause an error. That is not as helpful as it seems, as different compiler options may cause valid duplicates to have different contents. The string associated with a COMDAT section is stored in the symbol table.
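A rough model of the identical-contents check shows both what it catches and why it can fire on valid programs. This is only a sketch under simplifying assumptions: ComdatSection, check_exact_match and demo_exact_match are hypothetical names, and the strings stand in for real section bytes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified COMDAT section.
struct ComdatSection {
    std::string object_file;
    std::string key;       // the string associated with the section
    std::string contents;  // stand-in for the section's bytes
};

// Return one diagnostic for each duplicate whose contents differ from the
// first section seen with the same key.
std::vector<std::string> check_exact_match(const std::vector<ComdatSection>& inputs) {
    std::map<std::string, const ComdatSection*> first_seen;
    std::vector<std::string> errors;
    for (const ComdatSection& sec : inputs) {
        auto result = first_seen.emplace(sec.key, &sec);
        const ComdatSection* first = result.first->second;
        if (!result.second && first->contents != sec.contents) {
            errors.push_back("COMDAT '" + sec.key + "' differs between " +
                             first->object_file + " and " + sec.object_file);
        }
    }
    return errors;
}

// The same valid f1, compiled with and without optimization, still trips
// the check because the generated bytes differ.
inline void demo_exact_match() {
    std::vector<ComdatSection> inputs = {
        {"a.o", "f1", "unoptimized code for f1"},
        {"b.o", "f1", "optimized code for f1"},
    };
    assert(check_exact_match(inputs).size() == 1);
}
```

The check cannot tell a genuine ODR violation from the same function compiled with different options; both just look like differing bytes.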
Before I learned about the Microsoft PE format, I introduced a different type
of COMDAT section into the GNU ELF linker, following a suggestion from Jason
Merrill. Any section whose name starts with “.gnu.linkonce.” is a COMDAT
section. The associated string is simply the section name itself. Thus the
inline function f1 would be put into the section “.gnu.linkonce.f1”. This
simple implementation works well enough, but it has a flaw in that some
functions require data in multiple sections; e.g., the instructions may be in
one section and associated static data may be in another section. Since
different instances of the inline function may be compiled differently, the
linker cannot reliably and consistently discard duplicate data (I don’t know
how the Microsoft linker handles this problem).
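The flaw is easier to see in a toy model where each section name is judged on its own. The names NamedSection, is_linkonce, link_by_name and demo_linkonce are hypothetical, and the section name “.gnu.linkonce.f1.data” is invented for the example; only the “.gnu.linkonce.” prefix rule comes from the description above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified input section: just its origin and its name.
struct NamedSection {
    std::string object_file;
    std::string name;
};

// A section is link-once if its name starts with ".gnu.linkonce.".
bool is_linkonce(const std::string& name) {
    const std::string prefix = ".gnu.linkonce.";
    return name.compare(0, prefix.size(), prefix) == 0;
}

// Keep the first section for each link-once name. Every name is decided
// independently; nothing ties a function's code to its data.
std::vector<NamedSection> link_by_name(const std::vector<NamedSection>& inputs) {
    std::set<std::string> seen;
    std::vector<NamedSection> kept;
    for (const NamedSection& sec : inputs) {
        if (is_linkonce(sec.name) && !seen.insert(sec.name).second) {
            continue;  // same name already kept: discard
        }
        kept.push_back(sec);
    }
    return kept;
}

// a.o compiled f1 as code only; b.o compiled it as code plus static data.
inline void demo_linkonce() {
    std::vector<NamedSection> kept = link_by_name({
        {"a.o", ".gnu.linkonce.f1"},
        {"b.o", ".gnu.linkonce.f1"},
        {"b.o", ".gnu.linkonce.f1.data"},  // invented name for illustration
    });
    // The code comes from a.o, but b.o's data survives without its code.
    assert(kept.size() == 2);
    assert(kept[0].object_file == "a.o");
    assert(kept[1].object_file == "b.o");
}
```

In this run the output mixes pieces from two differently compiled instances of f1, which is the inconsistency that treating each section separately cannot avoid.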
Recent versions of ELF introduce section groups. These implement an officially
sanctioned version of COMDAT in ELF, and avoid the problem of “.gnu.linkonce”
sections. I described these briefly in an earlier blog entry. A special section
of type SHT_GROUP contains a list of section indices in the group. The group
is retained or discarded as a whole. The string associated with the group is
found in the symbol table. Putting the string in the symbol table makes it
awkward to retrieve, but since the string is generally the name of a symbol it
means that the string only needs to be stored once in the object file; this is
a minor optimization for C++ in which symbol names may be very long.
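A sketch of how a group might be decoded and resolved follows. In the ELF gABI the contents of an SHT_GROUP section are an array of 32-bit words: the first is a flag word (GRP_COMDAT is 0x1) and the rest are section header indices. The types and functions below (SectionGroup, decode_group, resolve_group, demo_groups) are hypothetical, and the signature string is passed in directly rather than looked up through the symbol table as a real linker would do.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

constexpr std::uint32_t kGrpComdat = 0x1;  // GRP_COMDAT from the ELF gABI

// Hypothetical decoded form of one SHT_GROUP section.
struct SectionGroup {
    std::string signature;               // name of the group's signature symbol
    std::uint32_t flags;                 // first word of the section contents
    std::vector<std::uint32_t> members;  // remaining words: section indices
};

// Word 0 is the flag word; every following word is a member section index.
SectionGroup decode_group(const std::string& signature,
                          const std::vector<std::uint32_t>& words) {
    assert(!words.empty());
    SectionGroup g;
    g.signature = signature;
    g.flags = words[0];
    g.members.assign(words.begin() + 1, words.end());
    return g;
}

// Return the section indices to keep from this group: all of them or none.
std::vector<std::uint32_t> resolve_group(const SectionGroup& g,
                                         std::set<std::string>& seen) {
    bool is_comdat = (g.flags & kGrpComdat) != 0;
    if (is_comdat && !seen.insert(g.signature).second) {
        return {};  // a group with this signature was already kept
    }
    return g.members;
}

// Two object files each carry a COMDAT group for f1 (code plus data).
inline void demo_groups() {
    std::set<std::string> seen;
    SectionGroup first = decode_group("f1", {kGrpComdat, 5, 6});
    SectionGroup second = decode_group("f1", {kGrpComdat, 9, 10});
    assert(resolve_group(first, seen).size() == 2);  // kept as a whole
    assert(resolve_group(second, seen).empty());     // dropped as a whole
}
```

Because the code and data sections travel together in one group, the mixing problem of “.gnu.linkonce” sections does not arise.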
More tomorrow.