
Commonly the linker is only invoked once. A linear list of input files can be specified for symbol resolution; there are flags for looping through the linker inputs. But for more sophisticated software composition, it could be beneficial to nest linker invocations.

For that, the linker needs to export to a format in which not all symbols are resolved yet. Would it be possible for the linker to produce object files? You could also create a static library, but that is just a container of different object files; they are not yet linked to each other. You could also create a dynamic library, but this has everything already prepared for the runtime linker. Is there a way to output the linking result to object files, so that for subsequent linking phases it's as if it had been a translation unit to the compiler?

Of course that wouldn't be useful if it had the same effect as linking in one step (apart from parallelization and different runtime space behaviour). As the linker's task is to resolve symbol references, I expect this to be useful for combining some symbols of the input object files so that they are no longer exposed in the output object file. (This would presumably enable optimizations such as inlining that were previously not possible because the symbol was defined outside the respective translation unit. Maybe that's possible by combining it with -flto.) This concept could also be used to get nested namespaces without putting them in the language.


For illustration here is some example pseudo-code. In a real case, there would be of course many more symbols that need to stay unaffected by the nested linking. This would preclude using -Bsymbolic and shared objects.

a.c

int a = 0;

static int b = 0;

static void foo(void)
{
    /* ... */
}

void bar(void)
{
    /* ... */
    foo();
    /* ... */
}

b.c

extern int a;
void bar(void);

static int b = 5;

static void foo(void)
{
    /* ... */
}

void baz(void)
{
    /* ... */
    foo();
    bar();
    /* ... */
}

c.c

int a = 0;

static int b = 0;

static void foo(void)
{
    /* ... */
}

void bar(void)
{
    /* ... */
    foo();
    /* ... */
}

d.c

extern int a;
void bar(void);

static int b = 5;

static void foo(void)
{
    /* ... */
}

void baz2(void)
{
    /* ... */
    foo();
    bar();
    /* ... */
}

They should be built as indicated by the following diagram.

a.c -> a.o  (local: b, foo; global: a, bar)
b.c -> b.o  (local: b, foo; global: a, baz)
a.o b.o -> x.o (local: a, bar; global: baz)

c.c -> c.o  (local: b, foo; global: a, bar)
d.c -> d.o  (local: b, foo; global: a, baz2)
c.o d.o -> y.o (local: a, bar; global: baz2)

x.o y.o -> foo (executable)  (local: baz, baz2)

In the first translation units, both foo and b have internal linkage. After the first linking steps, they should be changed to no linkage so they don't occur in the symbol tables. In the result of the first linker invocation bar and a should have internal linkage, i.e. so they don't occur in the symbol table of x.o. (Can this be specified by a linker script?) This exposes potential for inlining, because they previously had external linkage. Can this happen here?


In most cases a single linking step is sufficient. However in some cases this isn't possible:

  • Having different overlapping, but not identical, linkage for different symbols. This is an extension of the case outlined above. If there are static variables in the original translation unit, but you also want to have some static variables in a larger translation unit, it's not possible to achieve that in a single linking step. You can't just compose the files as larger translation units, because that will conflict with the other static symbols you want to be in smaller translation units. With a single linker step, at most three levels are possible: per-translation-unit/per-library/per-program.

  • Using object files compiled from different languages. Their symbols can't have no linkage, because they should be accessible across languages, but you can't compile them as a single translation unit. When they have internal/external linkage, they will conflict with symbols from other translation units. This means that when they are linked together, the symbols could be changed to no linkage so that they behave as if they had been a single translation unit all along.

The first might indicate that the project has no clear boundaries, since the symbol linkage needs to overlap or doesn't fit into per-translation-unit/per-library/per-program. The latter can be worked around to some extent with inline assembly.

Still, to me nested linking seems to be useful and obviously technically possible. Is it true that this is uncommon? Why?

Are there papers about this, since it's hardly a novel idea?

Are there build systems that have support for this approach?



It was pointed out that I use the terms linkage and translation unit incorrectly.
Linkage according to Wikipedia:

[...] linkage describes how names can or can not refer to the same entity throughout the whole program or one single translation unit.

In this question the linkage of symbols should be affected by the linking steps on symbol granularity, so that the linkage is larger than a single translation unit, but smaller than the whole program.

Wikipedia again about translation units:

a translation unit (or more casually a compilation unit) is the ultimate input to a C or C++ compiler from which an object file is generated

The linking behaviour is described by comparison with how a single translation unit would result in the same behaviour. The nested linker invocations aren't needed when it is possible to have the symbols in a single translation unit in the first place. This question is intended for the cases when that is not possible, e.g. due to overlapping linkage or when using different languages (see above).

I also referred to the GOT and PLT carelessly in the comments. I meant to describe the behaviour of the exposed symbol list, which in the case of dynamic linking is the GOT/PLT. I didn't intend to imply that this question is only about dynamic linking.

  • This is really what static libraries are for. Linking with a static library is equivalent to linking with the individual object files in the library archive. Commented Jul 21 at 16:00
  • And why is that a problem? Linking with a single static library file, or a single combined object file, there should not be any difference. This smells of an XY problem. Commented Jul 21 at 16:11
  • In C standardese, "translation" means what you're probably more used to calling "compilation". A "translation unit" is not about the representation of the result as a single file. It is about what source was translated (compiled) as a combined unit. Commented Jul 21 at 20:20
  • Moreover, in your example, bar and a cannot, in general, be changed to internal linkage because there might be other translation units in the program that refer to them. Commented Jul 21 at 20:25
  • My edits change nothing, your post is an unclear jumble. Please edit to ask 1 specific researched non-duplicate question. Yes, the compiler doesn't take object files, so what this post says doesn't make sense. I'm done. Commented Jul 23 at 11:05

3 Answers

This seems like a duplicate of combine two GCC compiled .o object files into a third .o file, right?

The top answer over there says yes, with GNU ld you can use ld --relocatable a.o b.o -o c.o. I can neither confirm nor deny whether that's (still) true, nor whether any linkers besides GNU ld (e.g. lld, mold, gold) support a similar option, but I would guess so, because it seems like a pretty obvious and easy feature to add.

(FYI, I found the duplicate by googling "gnu ld combine .o file"; it's the top hit.)


7 Comments

Guess I suck at googling then, I only found things about combining C++ and C.
Can that optimise and inline symbols?
AFAIK, GNU ld is not an optimizing linker: it doesn't have any notion of "performing inlining [or outlining] optimization" at the linker level. But I see no fundamental incompatibility between being an optimizing linker and this "produce a .o file as output" feature.
Not sure if it's the same but most compilers support LTO: stackoverflow.com/q/23736507/5382650
I don't mean real optimisation; yes, that's possible with LTO. I mean bypassing the PLT/GOT, because the symbol now isn't exported.
ld -r is a venerable option — it was in 7th Edition Unix back in the late 70s. It is probably available still on most Unix-like platforms. POSIX doesn't define the behaviour of the ld command, so that isn't a help.
“This seems like a duplicate of …” - So flag it as a duplicate. Don’t answer a duplicate with an answer that mentions it’s a duplicate.

Can the linker produce object files?

I assume you are talking about the GNU/Linux linker, ld, in use with a GCC or Clang Linux toolchain.

Yes, it can produce object files, per the manual:

-r
--relocatable
    Generate relocatable output—i.e., generate an output file that can in turn serve as input to ld. 
    ...

and as illustrated by the top answer to @Quuxplusone's proposed duplicate.

But that's not what you really want to know. You really want to know if a relocatable (-r) linkage can optimise external references between object files a.o and b.o into local references in a combined object file c.o "with potential inlining so that they don't occur in the symbol table of c.o." And this question seems to harbour some confusion.

To enable link-time optimisations you must pass -flto to the link command. And before that you must also pass -flto to the compile commands, prompting the compiler to emit object files in an intermediate representation enabling link-time optimisations to be applied when such object files are linked into an output file. A relocatable linkage per se does not enact any link-time optimisation.

Localisation of a function - converting its binding from strongly or weakly global to local - does not imply inlining it, and inlining it does not require it to be local. Being local and being inline are independent function attributes.
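The independence of binding and inlining can be seen even within a single translation unit. The following is only an illustrative sketch: the function names are hypothetical, and `__attribute__((noinline))` is a GCC/Clang extension assumed here purely to make the point visible.

```c
/* Hypothetical names; a sketch assuming GCC or Clang. */

/* Local binding (static), yet explicitly kept out of line:
 * being local does not make a function inline. */
static __attribute__((noinline)) int add_one(int x)
{
    return x + 1;
}

/* Global binding (external linkage). The compiler may still inline
 * this body into callers in the same translation unit when optimising,
 * while also emitting the global symbol for other object files. */
int twice(int x)
{
    return x * 2;
}

/* Calls one local and one global function. */
int combine(int x)
{
    return add_one(x) + twice(x);
}
```

Compiled with something like gcc -O2 -c, the object file would typically still list add_one as a LOCAL function symbol and twice as a GLOBAL one, whatever the compiler decided about inlining.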

It seems you are not really interested in inlining but in whether a relocatable linkage combining a.o and b.o into c.o will localise the external references and definitions in either of these files wherever a reference in one of the files has a definition in the other one, as per your example function bar and integer a.

It won't. A relocatable linkage of a.o and b.o into c.o outputs a c.o that defines just the same global symbols, with just the same definitions, as a.o + b.o. It would not be fit for purpose if it didn't.

The linker will only ever localise an input global definition when it is assured that the output definition is excused from reference in any subsequent linkage, including runtime linkage by the dynamic linker. There are only a few such scenarios, arising from the use of the -fvisibility=hidden compile option, or of the __attribute__((visibility("hidden"))) declaration qualifier in-source, or the whole-program optimisation options -flto or -fwhole-program. All of them result in the localisation of definitions in a linker output file that is a program or shared library, not an object file.
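For reference, the in-source qualifier mentioned above looks roughly like this. It is a sketch assuming GCC or Clang on an ELF target; `scale` and `api_area` are hypothetical names used only for illustration.

```c
/* Hypothetical names; a sketch assuming GCC or Clang on ELF. */

/* Hidden visibility: the symbol is still global in the object file,
 * so other object files in the same link can reference it, but it is
 * not exported from the resulting program or shared library. */
__attribute__((visibility("hidden")))
int scale(int x)
{
    return x * 3;
}

/* Default visibility: remains available to users of the library. */
__attribute__((visibility("default")))
int api_area(int w, int h)
{
    return scale(w) * h;
}
```

Note that the attribute changes nothing in a relocatable (-r) link; the localisation happens only when the final program or shared library is produced.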

If you wish to produce an object file in which you localise a global definition compiled in another object file, then you must resort to objcopy, as per:

$ objcopy --localize-symbol=foo in.o out.o

For the objective that interests you - localising a symbol within an object file that combines a set of original object files, one of which globally defines the symbol while the rest make references to that definition - you need to perform a relocatable linkage of the original object files and then apply objcopy --localize-symbol=foo ... to the output file of that linkage to produce another one in which foo is localised. Here is an illustration inspired by your pseudo-code.

Source files:

$ tail -n +1 *.c
==> a.c <==
#include <stdio.h>

int a = 0;

static int b = 0;

static void foo(void)
{
    printf("%d\n",a);
}


void bar(void)
{
    foo();
    printf("%d\n",b);
}

==> b.c <==
#include <stdio.h>
extern void bar(void);

extern int a;

static int b = 5;

static inline void foo(void)
{
    printf("%d\n",b);
}

void baz(void)
{
    foo();
    bar();
}

==> main.c <==
extern void baz(void);
extern void bar(void);

int main(void)
{
    baz();
    bar();
    return 0;
}

==> other_bar.c <==
#include <stdio.h>

int a = 42;

void bar(void)
{
    printf("a = %d in the other %s\n",a,__func__);
}
    

Compile the 2 source files in which we want to localise definitions of bar and a:

$ gcc -c a.c b.c

Relocatable link:

$ gcc -r -o ab.o a.o b.o

Localise bar and a:

$ objcopy --localize-symbol=bar --localize-symbol=a ab.o c.o

Compile the other source files:

$ gcc -c main.c other_bar.c

And link this program:

$ gcc -o prog main.o other_bar.o c.o

whose symbol table now contains local definitions of bar and a from c.o:

$ readelf -sW prog | grep -P 'LOCAL.*(\sa|\sbar)'
    14: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS a.c
    20: 000000000000401c     4 OBJECT  LOCAL  DEFAULT   26 a
    21: 00000000000011b7    44 FUNC    LOCAL  DEFAULT   16 bar
    

as well as global definitions from other_bar.o:

$ readelf -sW prog | grep -P 'GLOBAL.*(\sa|\sbar)'
    33: 0000000000001162    46 FUNC    GLOBAL DEFAULT   16 bar
    42: 0000000000004010     4 OBJECT  GLOBAL DEFAULT   25 a
    

and runs:

$ ./prog
5
0
0
a = 42 in the other bar

But this is unnecessary

Editing of binaries between compilation and linkage - with objcopy or other binary editors - is to be shunned except as a last resort, and it is not the last resort when, as in your case, you just want to preempt multiple definition errors in a linkage where some function is called that in turn externally references a definition of foo and some other function is or may be called that externally references a different definition of foo. You can achieve this without binary editing by using regular dynamic linkage together with the linker option -Bsymbolic:

-Bsymbolic
    When creating a shared library, bind references to global symbols to the 
    definition within the shared library, if any. Normally, it is possible for 
    a program linked against a shared library to override the definition within 
    the shared library. This option is only meaningful on ELF platforms which 
    support shared libraries.

Like so:

Make shared library libbaz.so

$ gcc -c -fPIC a.c b.c
$ gcc -shared -o libbaz.so a.o b.o -Wl,-Bsymbolic

Compile the other source files:

$ gcc -c main.c other_bar.c

Link the program:

$ gcc -o prog main.o other_bar.o -L. -lbaz -Wl,-rpath='$ORIGIN'

which succeeds, and runs identically:

$ ./prog
5
0
0
a = 42 in the other bar

The call to baz() in the program is resolved to the definition in libbaz.so, which calls its internal definition of bar and references its internal definition of a. The call to bar() in the program is resolved to the definition in other_bar.o and references that file's definition of a.

4 Comments

Thanks, that's helpful. Your proposed alternative has the drawback, that it only works with dynamic linking, which isn't always intended. Also I think -Bsymbolic works per linking unit, not per symbol so it wouldn't really work. This was a minimal example, of course there are a lot of other symbols, which shouldn't be affected.
Your comments are correct. For control of a symbol's visibility outside a particular linkage unit on a per-symbol basis, you can use the apparatus of -fvisibility=[default|hidden] together with __attribute__((visibility("hidden|default"))), but as with -Bsymbolic the linkage unit must be a shared library. This dynamic visibility apparatus is what the toolchain offers you to compartmentalise linkage namespaces. For static linkages, if you don't like the global namespace you have to hack it (objcopy) or change the source.
Does that visibility attribute have any meaning after compiling? I.e. can it be changed after the compilation, or is that basically what objcopy is used for?
-fvisibility=[...] is an option for the compiler, not linker, and there is no linker or objcopy option to change just the dynamic visibility of a compiler-generated symbol.

You can create a static library, but that is like a container of different object files, they are not yet linked to each other.

Yes, and in fact, creating static libraries is not even a function of the linker.

You can also create a dynamic library, but this has everything already prepared for the runtime linker.

Yes, and this is a useful thing to do.

Is there a way to output the linking result to object files as if it would have been a translation unit to the compiler.

Note well that "translation unit" is a compile-time concept with implications for the scope and linkage of some identifiers. And this "linkage" is a language-level concept. Although related to the operation of your toolchain's linker, it is not something that the linker can manipulate. Often compilation will produce an object file that corresponds to a particular translation unit, but such an object file is not itself a translation unit.
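To make the language-level terms concrete, here is a small C sketch of the three kinds of linkage as they appear inside one translation unit. The identifiers are hypothetical and chosen only for illustration.

```c
/* Hypothetical identifiers illustrating C linkage kinds. */

/* Internal linkage: the name refers to this object only inside
 * this translation unit. */
static int call_count = 0;

/* External linkage: other translation units may refer to it. */
int shared_total = 0;

/* Internal linkage applies to functions as well. */
static int bump(void)
{
    return ++call_count;
}

/* A function with external linkage that touches all three kinds. */
int record(int amount)
{
    int doubled = amount * 2;   /* no linkage: a block-scope object */
    shared_total += doubled;
    return bump();
}
```

These properties are fixed by the declarations at compile time; the object file merely carries their consequences as local or global symbols.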

Some linkers can combine multiple relocatable objects [files] to produce another relocatable object [file]. GNU ld can do this, for example, and its docs refer to it as "partial linking". But those linkers that can do this are among the ones producing object file formats that are not directly tied to the language level scope and linkage properties. Partial linking does not change which symbol references will resolve to which symbol definitions. For most intents and purposes, it is equivalent to creating a static library.

The use-case would be that you have two translation units potentially from different languages, with all their isolation of local linkage, optimized for themself, and then link them together potentially also optimizing across the original separate translation units and taking into account symbols that have now local linkage.

I've never heard the term "linkage" used that way before, probably because it conflicts with the related, but distinct, usage of the same term at the C (and C++) language level. And perhaps that's part of your issue. Your "linkage" has to do with the contents of relocatable object files and shared objects, including both shared libraries and dynamic executables, not, directly, with translation units. From comments, your idea is to improve function-call performance by having direct calls instead of calls mediated by a GOT or PLT, but

  1. that's a way premature optimization, and
  2. what you've asked for is not necessary for producing it.

We'll suppose that your ultimate build target is a shared library, a dynamically linked executable, or both, because if you're targeting statically linked anything then dynamic symbol resolution via a GOT / PLT is not relevant anyway. When you build a shared object, it is the static linker that creates its GOT. The linker has everything it needs to link references between the contributing object files exactly the same way it would do if it were creating a relocatable object (as you asked) instead, so creating a relocatable object separately does not gain you anything. At least not anything serving your stated purpose.

  • Having different name scope (linkage) for different symbols. As I wrote, like if you have static variables for some translation unit first and then compile to an object file, but want to have a larger translation unit for another static variable. But you can't just put the files into larger translation units, because that will conflict with the other static symbols you want smaller translation units for.

Linkers operate on object files, not translation units. Generally speaking, it is among their key objectives to implement the language-level semantics associated with the contents of the objects they work with, and to avoid altering those semantics. The programmer determines the scope and (language-level) linkage of various identifiers according to their needs. The linker works with that; it does not redefine these after the fact.

  • Using object files compiled from different languages. They can't have local linkage, because you can't compile them as a single translation unit, but when they should be accessible across languages. Then when they are linked together, the symbols could be changed to local linkage and then linked as if they have been a single translation unit all along.

The contents of different relocatable objects used to build one shared object absolutely can have what you are calling "local linkage" relative to each other, meaning that references from one to another are resolved at static link time. This isn't really a question either way of whether (external) symbol definitions and references appear in the same object file.
