6

Our school tasks us with reproducing the atoi function. Many students (myself included) do it in a way that causes an overflow in the case of INT_MIN, which wraps around neatly so the function still works. We could say it's a "controlled" overflow.

However, I'm reading that signed overflow in C is "undefined" in principle (not sure what that means). As such, I'm not sure that there can ever be a "controlled" signed overflow. Maybe our function could break in extreme cases?

But which extreme cases? I know of -ftrapv and -fsanitize=signed-integer-overflow, but as of now I couldn't explain or justify why anyone would realistically ever compile a project with these flags for any other reason than to explicitly show bad practice.

I'm looking for any realistic applications of any flags, or any potential cases where our function would break horribly and crash our entire project if it were ever included in a library and used. Or... maybe it would be fine?

I'm referring to https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Signed-Overflow.html which says "For signed integers, the result of overflow in C is in principle undefined, meaning that anything whatsoever could happen."

Here is an example of the function that overflows:

int my_atoi(const char *s)
{
    int neg;
    int res;
    size_t  i;

    i = 0;
    res = 0;
    neg = 1;
    while (s[i] == ' ' || (s[i] >= 9 && s[i] <= 13))
        i++;
    if (s[i] == '+' || s[i] == '-')
    {
        if (s[i] == '-')
            neg = -1;
        i++;
    }
    while (s[i] >= '0' && s[i] <= '9')
    {
        res = res * 10 + s[i] - '0'; /* overflows INT_MAX on the last digit of "-2147483648" */
        i++;
    }
    return (res * neg); /* and if res wrapped to INT_MIN, INT_MIN * -1 overflows again */
}

Compiling with -fsanitize=signed-integer-overflow shows:

my_atoi.c:20:18: runtime error: signed integer overflow: 2147483640 + 56 cannot be represented in type 'int'
my_atoi.c:23:14: runtime error: signed integer overflow: -2147483648 * -1 cannot be represented in type 'int'

It can be fixed easily by moving the negative multiplication up three lines, into the digit accumulation (and then returning res directly):

res = res * 10 + neg * (s[i] - '0');
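
For reference, a sketch of the whole function with that change applied (renamed my_atoi_fixed here only to distinguish it from the original):

int my_atoi_fixed(const char *s)
{
    int neg = 1;
    int res = 0;
    size_t  i = 0;

    while (s[i] == ' ' || (s[i] >= 9 && s[i] <= 13))
        i++;
    if (s[i] == '+' || s[i] == '-')
    {
        if (s[i] == '-')
            neg = -1;
        i++;
    }
    while (s[i] >= '0' && s[i] <= '9')
    {
        /* apply the sign digit by digit: a negative result is built directly,
           so "-2147483648" never produces an intermediate value above INT_MAX */
        res = res * 10 + neg * (s[i] - '0');
        i++;
    }
    return (res); /* no final res * neg, so no INT_MIN * -1 */
}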

  • Not sure I understand the question. If the function indeed overflows a signed int, it invokes UB (undefined behavior), which should [always] be avoided. You already mentioned the way to avoid it, so what are you actually missing? Commented Oct 8 at 11:08
  • There is no "controlled" signed integer overflow in standard C. Undefined behaviour means that anything could happen. Always avoid it; it is incorrect to allow signed integers to wrap. You can emulate signed integer overflow safely by checking the arguments to an operation for whether it would cause an overflow, and then generating the equivalent answer. "I couldn't explain or justify why [...] explicitly show bad practice": it's not just bad practice, it is incorrect. These flags help catch incorrect code. Commented Oct 8 at 11:09
  • Our assignment tells us explicitly to ignore cases where the char * would contain numbers too big for an int, but it doesn't really define what we should or shouldn't do in the case of INT_MIN. As such, I assume we have to handle it as cleanly as possible, since it fits within a signed int. I know how to do it cleanly, yes, but what I'm missing is a way to show other students WHY you really, really shouldn't overflow a signed int if you have no good reason to/if it's not a case you're ignoring on purpose as per the assignment. Is there a way to show that, or can I only ever just tell them "it's bad"? Commented Oct 8 at 11:21
  • If you use a larger type, you can work with defined behaviour rather than hoping that it will 'work'. The C standard does not say what should happen in the case of overflow caused by input error, although MSVC's version returns 0. If you just use a larger type to build the number, you can also detect overflow, so why not do that? And if so, return 0 so that the behaviour is at least consistent. I am not a believer in throwing one's hands in the air and allowing rubbish to occur; do something reasonable. I would use the larger type anyway. Commented Oct 8 at 11:59
  • C23 has macros declared by #include <stdckdint.h> for checked integer addition, subtraction, and multiplication that seem similar to your "controlled" operations. Basically, they do what most implementations do in overflow situations (wrap the result value around), but also return a value to indicate whether there was an overflow or not. Commented Oct 8 at 15:59

5 Answers

8

The "why" mainly originates around signed integer representations that are not twos complement, such as sign-magnitude and ones complement.

On architectures that use these representations, it is possible that signed integer overflow can cause a hardware trap. Because of this, the C standard made signed integer overflow undefined behavior. The odds of finding such a system these days are pretty small, however; so much so that starting with the C23 standard, two's complement is the only allowed representation for signed integers. Even so, signed integer overflow remains undefined behavior.

Another thing about undefined behavior is that modern optimizing compilers will assume undefined behavior doesn't happen and perform optimizations based on that assumption. So it's possible that the compiler can generate code that would cause signed integer overflow to behave in a way that differs from unsigned "overflow" (which is actually a well-defined wraparound).

If you want to ensure that signed integers wrap around the same way unsigned integers do, GCC has the -fwrapv option, which explicitly defines that behavior. Then you won't get any unexpected results such as the example in Lundin's answer.
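
For instance, a minimal sketch to experiment with (the printed value assumes a 32-bit int and is only guaranteed when compiling with -fwrapv, e.g. gcc -O2 -fwrapv):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    volatile int x = INT_MAX; /* volatile keeps the compiler from folding the sum away */
    int y = x + 1;            /* UB by default; defined wraparound with -fwrapv */
    printf("%d\n", y);        /* with -fwrapv and 32-bit int, prints -2147483648 */
    return 0;
}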

5 Comments

I'm not sure I understood everything but at least I learned about different signed integer representations! I can't find any optimization flag for gcc that breaks the overflowing function (for INT_MIN at least). I take it that this signed overflow thing is a very minor problem that likely won't ever cause trouble for our beginner or even intermediate-level projects?
@m0d1nst4ll3r if you want to write robust code, then start out in the way you will continue throughout your programming career, and not make value judgements about whether poor code should be 'ok'.
"On architectures that use these representations, it is possible that signed integer overflow can cause a hardware trap." --> still even with 2's complement, C (and C23) allows overflow to exit code as it is UB.
I'd argue that C23 should have defined signed wraparound.
I am certain defined signed wraparound was considered, yet that would have defeated various optimizations and obliged compiler changes/verifications. Without strong agreement for defined signed wraparound, overflow remains UB.
5

Mainly the problem is this: if you add two positive signed numbers and there is an overflow, then a 2's complement CPU would wrap the number around to a negative value. However, C used to support various other exotic signedness formats that behaved differently. Therefore overflow (and underflow) was categorized as undefined behavior, meaning anything can happen. That in turn means the compiler is free to make all manner of assumptions, like for example "addition of two positive numbers can never result in a negative number".

Take this as an example of how undefined behavior causes bugs:

#include <stdio.h>
#include <limits.h>
#include <stdlib.h>

int main (int argc, char* argv[])
{
  int a = atoi(argv[1]);
  int b = atoi(argv[2]);
  if(a < 0 || b < 0)   /* past this point, the compiler may assume a >= 0 and b >= 0 */
    return 0;

  int n = a + b;       /* signed overflow (undefined behavior) when a + b > INT_MAX */

  if(n>0)
    printf("%d is > 0\n", n);
}

With 32 bit int, I pass the command line arguments 2147483647 1. (The reason why I'm doing it with command line is to block various compile-time optimizations.) Then compile with gcc -O3.

The compiler concludes:

  • At the point of addition, a and b are both positive or we wouldn't have gotten there.
  • Thus n>0 is always true so that comparison can be removed.

And then the program output is: -2147483648 is > 0 - the printf statement got executed even though it shouldn't be possible.

Now if I remove optimizations, that doesn't happen, because the compiler then includes the comparison.


As for what can be done about it: if you use unsigned numbers instead of signed ones, C no longer calls this overflow but wrap-around. Unsigned numbers always behave in a well-defined way, since they simply wrap around.
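
A quick sketch of the difference (illustrative only):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int u = UINT_MAX;
    printf("%u\n", u + 1u);  /* well-defined: wraps around to 0 */

    /* int i = INT_MAX;
     * i + 1;                   undefined behavior: not a guaranteed wrap */
    return 0;
}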

3 Comments

Thanks! That's a great demonstration :) I'll be sure to use it myself
"Thus n>0 is always true so that comparison can be removed." is more like "Thus n>=0 is always true ..."
"Thus n>0 is always true so that comparison can be removed." is strange in that n may be zero and the compare is still needed, even with optimized compilation.
4

How bad is a "controlled" signed int overflow and can it even be "controlled"?

Do not assume common compiler results will stay the same in the future when dealing with undefined behavior (UB). Compilers have a history of taking advantage of UB found in old weak code. Don't rely on it, or else the code carries a latent (and expensive) bug for the future.


Many students (myself included) do it in a way that causes an overflow in the case of INT_MIN ...

This can easily be avoided. So the bad overflow becomes a non-issue.

With very little re-work, OP's code can be adjusted to avoid the UB when the input string encodes INT_MIN.

The key idea is that rather than accumulate the result res as a positive intermediate value, accumulate it as a negative one. This takes advantage of the fact that the range [INT_MIN...0] is larger than [0...INT_MAX].

int my_atoi(const char *s)
{
    int neg;
    int res;
    size_t  i;

    i = 0;
    res = 0;
    //neg = 1;
    neg = -1;
    while (s[i] == ' ' || (s[i] >= 9 && s[i] <= 13))
        i++;
    if (s[i] == '+' || s[i] == '-')
    {
        if (s[i] == '-')
            // neg = -1;
            neg = 1;
        i++;
    }
    while (s[i] >= '0' && s[i] <= '9')
    {
        // res = res * 10 + s[i] - '0';
        res = res * 10 - (s[i] - '0');
        i++;
    }
    return (res * neg);
}

Other improvements are possible too: no UB on any overflow, locale-aware white-space handling, better names, ...


3

Note that C23 includes the header <stdckdint.h> (§7.20) which defines three type-generic macros:

#include <stdckdint.h>
bool ckd_add(type1 *result, type2 a, type3 b);
bool ckd_sub(type1 *result, type2 a, type3 b);
bool ckd_mul(type1 *result, type2 a, type3 b);

You can use these to check whether the computation will overflow.

If these type-generic macros return false, the value assigned to *result correctly represents the mathematical result of the operation. Otherwise, these type-generic macros return true. In this case, the value assigned to *result is the mathematical result of the operation wrapped around to the width of *result.

There isn't a macro for division because there's only one case that can produce overflow (division by zero is undefined behaviour, of course). If you're dealing with the int type, dividing INT_MIN by -1 overflows.
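
As a rough sketch of how these macros could be used in an atoi-style digit loop (this assumes a C23 compiler; the helper name and structure are only illustrative):

#include <stdbool.h>
#include <stdckdint.h>

/* Accumulate decimal digits into an int, reporting overflow through *ok. */
int accumulate_digits(const char *s, bool *ok)
{
    int res = 0;
    *ok = true;
    while (*s >= '0' && *s <= '9') {
        bool overflowed = ckd_mul(&res, res, 10);
        overflowed |= ckd_add(&res, res, *s - '0');
        if (overflowed) {
            *ok = false; /* res holds the wrapped value; the caller decides what to return */
            break;
        }
        s++;
    }
    return res;
}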

1 Comment

Wow I didn't know dividing INT_MIN by -1 was a problem! And I also didn't know about these macros. Very nice, thank you :)
1

Signed integer overflow should be avoided because the compiler can make assumptions about it never occurring, which in some cases can have unexpected consequences.

For example, this simplistic overflow checking code does not work:

    int n = 0;
    /* p is assumed to point at the digit string being converted */
    while (*p >= '0' && *p <= '9') {
        n = n * 10 + (*p - '0');
        if (n < 0) n = INT_MAX;  /* naive check: only reachable through signed overflow (UB) */
        p++;
    }

The compiler may assume that, given the above code, n cannot become negative without undefined behavior, hence the if (n < 0) n = INT_MAX; can be ignored.

For this reason, it is recommended to check for potential overflow before the operation.

Here is a sample implementation of atoi() with reliable overflow control:

#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>

int my_atoi(const char *s) {
    int n = 0;

    /* skip optional initial white space */
    while (isspace((unsigned char)*s))
        s++;

    if (*s == '-') {
        /* convert negative number: use negative digit values */
        s++;
        if (!isdigit((unsigned char)*s)) {
            errno = ERANGE;
            return 0;
        }
        while (isdigit((unsigned char)*s)) {
            int d = '0' - *s++;
            /* check for potential arithmetic overflow */
            if (n < INT_MIN / 10 || (n == INT_MIN / 10 && d < INT_MIN % 10)) {
                errno = ERANGE;
                n = INT_MIN;
                break;
            }
            n = n * 10 + d;
        }
    } else {
        /* ignore optional positive sign */
        if (*s == '+')
            s++;
        if (!isdigit((unsigned char)*s)) {
            errno = ERANGE;
            return 0;
        }
        while (isdigit((unsigned char)*s)) {
            int d = *s++ - '0';
            /* check for potential arithmetic overflow */
            if (n > INT_MAX / 10 || (n == INT_MAX / 10 && d > INT_MAX % 10)) {
                errno = ERANGE;
                n = INT_MAX;
                break;
            }
            n = n * 10 + d;
        }
    }
    return n;
}

int main(int argc, char *argv[]) {
    for (int i = 1; i < argc; i++) {
        errno = 0;
        int n = my_atoi(argv[i]);
        printf("\"%s\" -> %d%s\n", argv[i], n, errno ? " (error)" : "");
    }
    return 0;
}

The function sets errno to ERANGE if the string is not a number (returning 0) or if the number is outside the range of int (returning the range limit of the same sign).

2 Comments

Some thoughts: to be more like strto...(), maybe set errno. Maybe also set errno for non-digit strings like "-".
Done. Sadly, the only portable values for errno are EDOM, EILSEQ and ERANGE.
