Is the C compiler allowed to emit spaghetti assembly for the sake of optimization? How can I get the compiler to perform such an optimization? [duplicate]

Question

So I have the following code:

float param1 = SOME_VALUE;  
switch (State)
{
    case A:
    {
        foo(param1);
        statement1;
        break;
    }
    case B:
    {
        bar();
        statement2;
        break;
    }
    case C:
    {      
        float param2 = OTHER_VALUE;
        switch (Expression)
        {
            case D:
            {
                baz(param2);
                break;
            }
            case E:
            {
                foo(param2);
                statement1;
                break;
            }
            case F:
            {
                bar();
                statement2;
                break;
            }
        }
        break;
    }
}

SOME_VALUE and OTHER_VALUE are not constants; they are placeholders that illustrate that the parameters are set every time this code is executed.

Is the compiler allowed to relocate case A to a temporary label between the end of case D and the beginning of case E with the following code:

param2 = param1;

that then falls through into case E? Case A and case E effectively do the same thing, and setting param2 to a different value has no effect outside of its use as the argument to foo().

The code in the compiler's intermediate representation would then look like this:

float param1 = SOME_VALUE;  
switch (State)
{
    case A:
    {
        goto NewA;
    }
    case B:
    {
        bar();
        statement2;
        break;
    }
    case C:
    {      
        float param2 = OTHER_VALUE;
        switch (Expression)
        {
            case D:
            {
                baz(param2);
                break;
            }
            NewA:
            {
                param2 = param1;
            }
            /* FALL-THROUGH */
            case E:
            {
                foo(param2);
                statement1;
                break;
            }
            case F:
            {
                bar();
                statement2;
                break;
            }
        }
        break;
    }
}

This is effectively creating spaghetti code in the emitted assembly, but the functionality is the same.

I am building with -O3 and -flto with GCC 14.3. The ability to debug this code is already forgone.

Right now, bar() and baz() are being inlined with -flto but foo() is not, presumably because there are two calls to foo() in this construct. I would like foo() to have a single call site so that it can then be inlined.

I can force the compiler to inline foo() using __attribute__((always_inline)), but then it inlines the function at both call sites, which significantly increases code size; i.e., the compiler is not performing the optimization I described.

I am working on a hard real-time embedded system where every cycle counts; force-inlining saves about a dozen cycles. The application is high-frequency control for a power converter.

EDIT: I suppose my question now has a second component: how do I get the compiler to perform this optimization without manually writing spaghetti code? Are there optimization flags I am missing? Each case block also contains additional independent statements (they do not affect any function call or variable), but those statements are identical for case A and case E, as indicated in the code blocks. Is the optimizer simply not sophisticated enough to perform this redundant-code factoring? Or is it because redundant-code factoring and function inlining are performed as separate passes, so the optimizer cannot find this optimization because neither transformation on its own is worthwhile? If optimized in the way I described above, the code would run faster without increasing in size.
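One way to get a single call site without goto is to let the cases that share case A's tail record the argument and a flag, and then make the call once after the switch. This is a hedged sketch, not a guaranteed fix: the enums, the stub bodies of foo(), bar() and baz(), and the names dispatch, foo_arg and run_foo are hypothetical stand-ins for the question's placeholders, and statement1/statement2 are left as comments. GCC considers static functions with exactly one call site for inlining even when they are large (-finline-functions-called-once, enabled at -O1 and above), but whether GCC 14.3 actually inlines foo() here, and whether it folds away the flag test, has to be checked in the generated assembly.

```c
/* Hypothetical stand-ins for the question's placeholders: the stubs only
   count calls so that the control flow can be checked. */
enum state { A, B, C };
enum expr  { D, E, F };

static int   foo_calls, bar_calls, baz_calls;
static float last_foo_arg;

static void foo(float x) { foo_calls++; last_foo_arg = x; }
static void bar(void)    { bar_calls++; }
static void baz(float x) { (void)x; baz_calls++; }

void dispatch(enum state s, enum expr e, float some_value, float other_value)
{
    float param1  = some_value;
    float foo_arg = 0.0f;  /* argument for the single foo() call below */
    int   run_foo = 0;     /* set by the cases that share case A's tail */

    switch (s) {
    case A:
        foo_arg = param1;
        run_foo = 1;
        break;
    case B:
        bar();
        /* statement2; */
        break;
    case C: {
        float param2 = other_value;
        switch (e) {
        case D:
            baz(param2);
            break;
        case E:
            foo_arg = param2;  /* case E reuses case A's path with param2 */
            run_foo = 1;
            break;
        case F:
            bar();
            /* statement2; */
            break;
        }
        break;
    }
    }

    if (run_foo) {             /* the only call site of foo() */
        foo(foo_arg);
        /* statement1; */
    }
}
```

The price is one extra assignment plus a test of run_foo on the shared path; in exchange, foo() appears only once in the function body.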

Also, in response to @chqrlie's answer: foo() is used in the same way in both case A and case E. foo() is a fairly large function, at 4 KB. I cannot combine case A and case E because the use of foo() in case E requires the assignment of param2.

An alternative would be to shove all of case C into case A and have a separate if block handle case C and its branch cases D and F, with E falling through to case A's behavior, but that would arguably look worse.
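The alternative described above can be sketched as follows, again with hypothetical enums and stub bodies standing in for the question's placeholders (dispatch_alt is an illustrative name). Case C's inner dispatch runs first, D and F return early, and E overwrites the argument and falls through to case A's code, so foo() again has a single call site. Whether this reads worse than the goto version is a matter of taste; the generated code still needs to be inspected.

```c
/* Hypothetical stand-ins for the question's placeholders: the stubs only
   count calls so that the control flow can be checked. */
enum state { A, B, C };
enum expr  { D, E, F };

static int   foo_calls, bar_calls, baz_calls;
static float last_foo_arg;

static void foo(float x) { foo_calls++; last_foo_arg = x; }
static void bar(void)    { bar_calls++; }
static void baz(float x) { (void)x; baz_calls++; }

void dispatch_alt(enum state s, enum expr e, float some_value, float other_value)
{
    float arg = some_value;              /* param1, used by case A */

    if (s == B) {
        bar();
        /* statement2; */
        return;
    }
    if (s == C) {
        float param2 = other_value;
        switch (e) {
        case D: baz(param2); return;
        case F: bar(); /* statement2; */ return;
        case E: arg = param2; break;     /* fall through to case A's code */
        }
    }

    /* Reached for case A and for case C/E: the only call site of foo(). */
    foo(arg);
    /* statement1; */
}
```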

Side note: case B and case F are also identical in behavior, but I am not as concerned with optimizing that as it is not part of the critical path. The compiler is currently not factoring and optimizing that either.

Finally, one option is simply not to spend time fretting over every last cycle. I've already optimized the critical path down to <1200 cycles, so another 12 cycles is only a 1% improvement. My original concern was building enough cycle margin so that the inevitable scope creep doesn't slow the critical path down to the point where it no longer hits its computational targets; this has happened a few times on previous projects I've worked on.

The 'as-if' rule lets it do anything it likes as long as the target program behaves correctly. Are you going to keep reposting this question? Please don't. – user207421, Commented Nov 24 at 5:26

Yes, it CAN, but that doesn't mean it WILL (and we can't check that with your C-like pseudocode). Use tools like godbolt.org to check how your code gets optimized by the compiler. – CDnX, Commented Nov 24 at 5:59

It depends on the type and scope of param2, so without that variable declaration visible, the question cannot be answered. – Lundin, Commented Nov 24 at 7:42

@Lundin No, it doesn't depend on the type and scope of param2, and the question has already been answered. – user207421, Commented Nov 25 at 7:04

@user207421 Because you say so or because of any actual arguments? It does depend on the type and scope, because if param2 is declared at file scope then any write to it is a "needed side effect", which would block optimizations as per C23 5.2.2.4. If it is a local variable, however, then side effects may be optimized out as long as they do not affect any code below in that same function. The type may also matter for aliasing-related optimizations, especially if there are pointers. – Lundin, Commented Nov 25 at 7:27

chqrlie · Accepted Answer · 2025-11-24 07:09:35Z


The compiler is allowed to generate any assembly code that implements the behavior specified by the C Standard.

In this case, it is a palatable optimisation to factor the call to foo(). The overhead is minimal: one extra variable assignment and one jump, which may not even be executed in practice. The downside is indeed that it makes debugging less precise, as you can no longer distinguish the calls to foo() to break on a specific one.

You mention that inlining foo() significantly increases code size, so it is likely the call overhead is small compared to the actual execution time inside foo().

To optimize both code size and execution time, you can try to split the foo() function into an inline version that handles the most common cases and calls another (non-inlined) function for the more complex but likely less common cases.
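A minimal sketch of that split, assuming a hypothetical foo() that takes and returns a float; foo_slow(), the [0, 1] fast-path range and the 0.5 gain are illustrative only and not taken from the question. The common path is a small static inline function that can be inlined at every call site, while the rare path is kept out of line with GCC's noinline and cold attributes so that only one copy of it exists; __builtin_expect hints which branch is the common one.

```c
/* Rare path (hypothetical): kept out of line so only one copy exists, and
   marked cold so GCC can place it away from the hot code. Here it simply
   saturates out-of-range inputs. */
static __attribute__((noinline, cold)) float foo_slow(float x)
{
    return (x < 0.0f) ? 0.0f : 0.5f;
}

/* Common path (hypothetical): small enough to inline at every call site. */
static inline float foo(float x)
{
    if (__builtin_expect(x >= 0.0f && x <= 1.0f, 1))
        return 0.5f * x;
    return foo_slow(x);
}
```

How much this helps depends on how often the slow path actually runs; the split only pays off if the inlined fast path covers the per-cycle work in the control loop.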
