Commonly the linker is only invoked once. A linear list of input files can be specified for symbol resolution; there are flags for looping through the linker inputs. But for more sophisticated software composition, it could be beneficial to nest linker invocations.
For that, the linker needs to export to a format in which not all symbols are resolved yet. Would it be possible for the linker to produce object files? You could also create a static library, but that is just a container of separate object files that are not yet linked to each other. You could also create a dynamic library, but that has everything already prepared for the runtime linker. Is there a way to output the linking result as an object file, so that for subsequent linking phases it's as if it had come from a single translation unit handed to the compiler?
Of course that wouldn't be useful if it had the same effect as linking in one step (besides parallelization and different runtime space behaviour). As the linker's task is to resolve symbol references, I expect this to be useful for combining some symbols of the input object files so that they are no longer exposed in the output object file. (This would presumably enable optimizations like inlining that were previously not possible because the symbol was defined outside of the respective translation unit. Maybe that's possible by combining it with -flto.) This concept could also be used to get nested namespaces without putting them in the language.
For illustration, here is some example pseudo-code. In a real case, there would of course be many more symbols that need to stay unaffected by the nested linking. This would preclude using -Bsymbolic and shared objects.
a.c
int a = 0;
static int b = 0;
static void foo ()
{
    ...
}
void bar ()
{
    ...
    foo ();
    ...
}
b.c
extern int a;
static int b = 5;
static void foo ()
{
    ...
}
void baz ()
{
    ...
    foo ();
    bar ();
    ...
}
c.c
int a = 0;
static int b = 0;
static void foo ()
{
    ...
}
void bar ()
{
    ...
    foo ();
    ...
}
d.c
extern int a;
static int b = 5;
static void foo ()
{
    ...
}
void baz2 ()
{
    ...
    foo ();
    bar ();
    ...
}
They should be built as indicated by the following diagram.
a.c -> a.o (local: b, foo; global: a, bar)
b.c -> b.o (local: b, foo; global: a, baz)
a.o b.o -> x.o (local: a, bar; global: baz)
c.c -> c.o (local: b, foo; global: a, bar)
d.c -> d.o (local: b, foo; global: a, baz2)
c.o d.o -> y.o (local: a, bar; global: baz2)
x.o y.o -> foo (executable) (local: baz, baz2)
In the original translation units, both foo and b have internal linkage. After the first linking steps, they should be changed to no linkage so that they no longer occur in the symbol tables. In the result of the first linker invocation, bar and a should have internal linkage, i.e. they should no longer be exported from x.o. (Can this be specified by a linker script?) This exposes potential for inlining, because they previously had external linkage. Can this happen here?
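To make the intended result concrete, here is a minimal sketch of what x.o should behave like if a.c and b.c could have been written by hand as one translation unit. The file name x_equivalent.c, the a_/b_ prefixes and the function bodies are assumptions made up for this illustration (the originals are elided above); the renaming is only needed because the two file-local b and foo definitions would otherwise clash in a single file, which is exactly what the nested linking step is supposed to avoid.

```c
/* x_equivalent.c: hypothetical hand-merged stand-in for x.o.
 * The a_ and b_ prefixes and all function bodies are invented placeholders. */

/* From a.c: file-local there, still hidden here. */
static int a_b = 0;
static int a_foo(void) { return a_b + 1; }

/* From a.c: external in a.o, meant to be internal after the nested link. */
static int a = 0;
static int bar(void) { return a_foo() + a; }

/* From b.c: file-local there, still hidden here. */
static int b_b = 5;
static int b_foo(void) { return b_b; }

/* The only symbol x.o is meant to export. */
int baz(void) { return b_foo() + bar(); }
```

Because bar and a are static in this merged form, a compiler is free to inline bar into baz, which is the optimization potential asked about above.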
In most cases a single linking step is sufficient. However, in some cases it isn't:
Having different overlapping, but not identical, linkage for different symbols. This is an extension of the case outlined above. If there are static variables in the original translation unit, but you also want to have some static variables in a larger translation unit, it's not possible to achieve that in a single linking step. You can't just compose the files as larger translation units, because that will conflict with the other static symbols you want to keep in smaller translation units. With a single linker step, at most three levels are possible: per-translation-unit/per-library/per-program.
Using object files compiled from different languages. Their symbols can't have no linkage, because they should be accessible across languages, but you can't compile them as a single translation unit. When they have internal/external linkage, they will conflict with symbols from other translation units. This means that when they are linked together, the symbols could be changed to no linkage, so that they behave as if they had been a single translation unit all along.
The first might indicate that the project has no clear boundaries, since the symbol linkage needs to overlap or doesn't fit into per-translation-unit/per-library/per-program. The latter can be worked around to some extent with inline assembly.
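For reference, the three levels from the first case can be sketched in a single C file as follows. This assumes GCC or Clang on an ELF target, since the visibility attribute is a compiler extension and not part of standard C; the variable and function names are hypothetical.

```c
/* Per-translation-unit: internal linkage, invisible to every other object file. */
static int tu_counter = 0;

/* Per-library: external linkage, so other object files can refer to it,
 * but hidden visibility keeps it out of the exported symbols of the final
 * shared object or executable (GCC/Clang extension, assumed here). */
__attribute__((visibility("hidden"))) int lib_counter = 0;

/* Per-program: external linkage with default visibility. */
int program_counter = 0;

/* Hypothetical helper that touches one variable at each level. */
int touch_all_levels(void)
{
    tu_counter += 1;
    lib_counter += 1;
    program_counter += 1;
    return tu_counter + lib_counter + program_counter;
}
```

Anything between these levels, such as a symbol shared by a.o and b.o but by nothing else in the same library, has no direct spelling here, which is the gap the nested linking is meant to fill.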
Still, to me nested linking seems to be useful and obviously technically possible. Is it true that this is uncommon? Why?
Are there papers about this, since it's hardly a novel idea?
Are there build systems that support this approach?
It was pointed out that I use the terms linkage and translation unit incorrectly.
Linkage according to Wikipedia:
[...] linkage describes how names can or can not refer to the same entity throughout the whole program or one single translation unit.
In this question, the linkage of symbols should be affected by the linking steps at symbol granularity, so that a name's linkage can cover more than a single translation unit but less than the whole program.
Wikipedia again about translation units:
a translation unit (or more casually a compilation unit) is the ultimate input to a C or C++ compiler from which an object file is generated
The linking behaviour is described by comparison with how a single translation unit would produce the same behaviour. The nested linker invocations aren't needed when it is possible to have the symbols in a single translation unit in the first place. This question is intended for the cases when that is not possible, e.g. due to overlapping linkage or when using different languages (see above).
I also referred to the GOT and PLT carelessly in the comments. I meant to describe the behaviour of the exposed symbol list, which in the case of dynamic linking is the GOT/PLT. I didn't intend to imply that this question is only about dynamic linking.
bar and a cannot, in general, be changed to internal linkage because there might be other translation units in the program that refer to them.