Identifiers with leading underscore defined in a linker script

Question

Per the C standard, section 7.1.3 "Reserved identifiers":

All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.

and

If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined.

I understand that the referenced 7.1.4 is relevant here because of its footnote 187, which states:

Because external identifiers and some macro names beginning with an underscore are reserved, implementations may provide special semantics for such names. For example, the identifier _BUILTIN_abs could be used to indicate generation of in-line code for the abs function. Thus, the appropriate header could specify
#define abs(x) _BUILTIN_abs(x)

Now, the common pattern (as shown in many reputable online resources, including numerous SO questions/answers) is to define linker script symbols with leading underscores, such as

  .bss :
  {
    . = ALIGN(4);
    _bss_start = .;
    *(.bss)
    *(.bss*)
    . = ALIGN(4);
    _bss_end = .;
  } >RAM

and then refer to these symbols in the C code by declaring like

extern char _bss_start[];
extern char _bss_end[];

or similar. From the cited standard excerpts I gather that these declarations are undefined behavior. Is that correct, or am I misreading something? Is there a proper naming convention for linker-exported symbols that does not violate the C standard? (I understand that we can simply use ordinary C naming, but apparently there is some reason people insist on the special naming.)

Using code with undefined behavior does not violate the C standard. It is undefined by the standard, not forbidden by the standard. If your C implementation (including the linker) defines what happens, then it is defined behavior in your C implementation.
– Eric Postpischil, Commented Jun 10, 2024 at 14:17
The entire concept of a linker script is outside the scope of the C standard, but supposing that you have an implementation that documents the external symbols _bss_start and _bss_end as usable by application code, then using them makes your program have implementation-defined behavior, not undefined behavior. However, any such usage becomes undefined behavior if the application is moved to an implementation that doesn't provide and document those symbols.
– zwol, Commented Jun 10, 2024 at 14:19
@zwol Does it mean that a linker script is an acceptable "extension" to the implementation when it comes to the mainstream compilers (as in gcc and friends)? For instance, if I have a static analyzer complaining about this code, can I rightfully consider it a false positive?
– Eugene Sh., Commented Jun 10, 2024 at 14:22
@EricPostpischil Understood. But does the code above constitute UB as per the standard, or is the linker script considered a part of the "implementation"?
– Eugene Sh., Commented Jun 10, 2024 at 14:24
You're being much more pedantic than I think is appropriate for this conversation. Do you agree that if the symbols defined by the Fortran object file aren't in the C application namespace then the program-as-a-whole definitely does have undefined behavior per the C standard? Note that "does have undefined behavior per the C standard" is, in my view, a significantly stronger statement than "the behavior of the program is not defined by the C standard alone".
– zwol, Commented Jun 10, 2024 at 16:37

Lundin · Accepted Answer · 2024-06-28 13:02:11Z

A leading underscore is reserved because such names might be used by the "implementation", that is, the compiler plus its libraries.

The part of the code using extern char _bss_start[]; etc. would be the "CRT" start-up code, which is often considered part of the implementation, particularly if delivered together with the compiler. In that case it need not be written in conforming C; in fact, it is often written at least partially in assembler.

For a conforming C compiler to produce conforming C programs, all objects of static storage duration need to be initialized before main() is called (see C17 5.1.2). That in turn means a CRT is needed to do the work, particularly on ROM-based computers like microcontrollers. For the CRT to perform static initialization in this case, it needs access to the mentioned symbols. But the CRT might as well be written in assembler, using some manner of "import" command.
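As a rough illustration of what such start-up code does, here is a minimal sketch in C. The function name crt_init_static_storage and its parameters are hypothetical, not taken from any real CRT. It takes the region boundaries as arguments so that it builds without a linker script; a real CRT would pass addresses obtained from linker-provided symbols such as _bss_start and _bss_end, and might avoid memcpy/memset this early in start-up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical CRT helper (illustration only).
 * Copies the initial values of .data from their load image (e.g. in ROM)
 * to their run-time location in RAM, then zero-fills the .bss range. */
void crt_init_static_storage(char *data_dst, const char *data_src,
                             size_t data_len,
                             char *bss_begin, char *bss_limit)
{
    /* Initialized objects: copy their initial values into RAM. */
    memcpy(data_dst, data_src, data_len);

    /* Zero-initialized objects: clear [bss_begin, bss_limit). */
    memset(bss_begin, 0, (size_t)(bss_limit - bss_begin));
}
```

With the linker script from the question, the last two arguments would be _bss_start and _bss_end; the symbols for the .data load and run addresses are not shown in the question and would have to be defined similarly.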

Other compilers/linkers that are not of the gcc flavour sometimes solve this by providing a name such as bss as some manner of pre-defined symbol, which is accessible to the CRT without being treated as a declared variable with a certain linkage.

No matter the compiler, I see no apparent need to name the segment starting with an underscore unless this is intentionally meant to be used by the implementation alone.
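
For application code that needs such symbols, a sketch with names outside the reserved namespace could look like the following. The names bss_region_start, bss_region_end, USE_LINKER_SYMBOLS and fake_bss are all hypothetical; the sketch assumes the linker script assigns the two symbols the same way the question's script assigns _bss_start and _bss_end. Without USE_LINKER_SYMBOLS defined, a host-side stand-in array is used so the code builds anywhere.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef USE_LINKER_SYMBOLS
/* Assumed to be assigned in the linker script, e.g. bss_region_start = .; */
extern char bss_region_start[];
extern char bss_region_end[];
#else
/* Host-side stand-in so the sketch builds without a custom linker script. */
static char fake_bss[64];
#define bss_region_start (fake_bss)
#define bss_region_end   (fake_bss + sizeof fake_bss)
#endif

/* Only the addresses of the symbols are meaningful, never their contents.
 * The difference is taken on integer values because the two symbols are
 * distinct objects as far as the C language is concerned. */
size_t bss_region_size(void)
{
    return (size_t)((uintptr_t)bss_region_end - (uintptr_t)bss_region_start);
}
```

Since these identifiers do not begin with an underscore, declaring them does not touch the namespace reserved by 7.1.3, although their definition still comes from outside the C program.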
