Linkers part 15

COMDAT sections

In C++ there are several constructs which do not clearly live in a single place. Examples are inline functions defined in a header file, virtual tables, and typeinfo objects. There must be only a single instance of each of these constructs in the final linked program (actually we could probably get away with multiple copies of a virtual table, but the others must be unique since it is possible to take their address). Unfortunately, there is not necessarily a single object file in which they should be generated. These types of constructs are sometimes described as having vague linkage.

Linkers implement these features by using COMDAT sections (there may be other approaches, but this is the only I know of). COMDAT sections are a special type of section. Each COMDAT section has a special string. When the linker sees multiple COMDAT sections with the same special string, it will only keep one of them.

For example, when the C++ compiler sees an inline function f1 defined in a header file, but the compiler is unable to inline the function in all uses (perhaps because something takes the address of the function), the compiler will emit f1 in a COMDAT section associated with the string f1. After the linker sees a COMDAT section f1, it will discard all subsequent f1 COMDAT sections.

This obviously raises the possibility that there will be two entirely different inline functions named f1, defined in different header files. This would be an invalid C++ program, violating the One Definition Rule (often abbreviated ODR). Unfortunately, if no source file included both header files, the compiler would be unable to diagnose the error. And, unfortunately, the linker would simply discard the duplicate COMDAT sections, and would not notice the error either. This is an area where some improvements are needed (at least in the GNU tools; I don’t know whether any other tools diagnose this error correctly).

The Microsoft PE object file format provides COMDAT sections. These sections can be marked so that duplicate COMDAT sections which do not have identical contents cause an error. That is not as helpful as it seems, as different compiler options may cause valid duplicates to have different contents. The string associated with a COMDAT section is stored in the symbol table.

Before I learned about the Microsoft PE format, I introduced a different type of COMDAT sections into the GNU ELF linker, following a suggestion from Jason Merrill. Any section whose name starts with “.gnu.linkonce.” is a COMDAT section. The associated string is simply the section name itself. Thus the inline function f1 would be put into the section “.gnu.linkonce.f1”. This simple implementation works well enough, but it has a flaw in that some functions require data in multiple sections; e.g., the instructions may be in one section and associated static data may be in another section. Since different instances of the inline function may be compiled differently, the linker can not reliably and consistently discard duplicate data (I don’t know how the Microsoft linker handles this problem).

Recent versions of ELF introduce section groups. These implement an officially sanctioned version of COMDAT in ELF, and avoid the problem of “.gnu.linkonce” sections. I described these briefly in an earlier blog entry. A special section of type SHT_GROUP contains a list of section indices in the group. The group is retained or discarded as a whole. The string associated with the group is found in the symbol table. Putting the string in the symbol table makes it awkward to retrieve, but since the string is generally the name of a symbol it means that the string only needs to be stored once in the object file; this is a minor optimization for C++ in which symbol names may be very long.

More tomorrow.