2

Now I try to improve my knowledge of pointers reading "Understanding and Using C Pointers" by Richard Reese.

Here's one code example from this book concerning realloc() function.

char* getLine(void) {
    const size_t sizeIncrement = 10;
    char* buffer = malloc(sizeIncrement);
    char* currentPosition = buffer;
    size_t maximumLength = sizeIncrement;
    size_t length = 0;
    int character;

    if(currentPosition == NULL) { return NULL; }

    while(1) {
        character = fgetc(stdin);

        if(character == '\n') { break; }

        if(++length >= maximumLength) {
            char *newBuffer = realloc(buffer, maximumLength += sizeIncrement);

            if(newBuffer == NULL) {
                free(buffer);
                return NULL;
            }

            currentPosition = newBuffer + (currentPosition - buffer);
            buffer = newBuffer;
        }

        *currentPosition++ = character;
    }

    *currentPosition = '\0';
    return buffer;
}

The main idea is to read all symbols into the buffer until we meet \n.

We don't know the total number of symbols to read so it's reasonable to use realloc() function to expand buffer periodically.

So, to expand buffer we use:

char *newBuffer = realloc(buffer, maximumLength += sizeIncrement);

In this case realloc() returns newBuffer pointer to the expanded buffer.

After that, if realloc() was invoked successfully, currentPosition is recalculated as:

currentPosition = newBuffer + (currentPosition - buffer);

QUESTION:

Is it valid to recalculate currentPosition in such way?

As I know, after realloc() invocation buffer pointer is invalidated. (See, for example, this). Any access to the buffer pointer leads to the undefined behaviour. So... where am I wrong?

11
  • Rokyan Is it indeed the code as it is written in the book? If so then it is a bad code and I am sure the book aslo is bad. Commented Jan 27, 2016 at 20:23
  • Well it is invalidated if you deference it, but the code does not. It just computes the new pointer. The previous pointer still retains its value. Commented Jan 27, 2016 at 20:27
  • If you really want to improve your knowledge don't call a pointer =>> Array Commented Jan 27, 2016 at 20:27
  • @VladfromMoscow If you want you can check. May be I missed some blank lines... Commented Jan 27, 2016 at 20:30
  • 1
    Related thread Commented Jan 27, 2016 at 21:34

4 Answers 4

6

This code causes undefined behaviour:

currentPosition = newBuffer + (currentPosition - buffer);

After passing a pointer to realloc, that pointer variable (and all other pointers based on that pointer) become indeterminate, which is the same status that an uninitialized variable has.

Reference: C11 6.2.4/2:

[...] The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.

Then, doing pointer arithmetic on an invalid pointer causes undefined behaviour, C11 6.5.6/8:

When an expression that has integer type is added to or subtracted from a pointer, [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined

The pointer operand doesn't point to an object at that time. The object it used to point to has already been freed.

In fact, evaluating the pointer at all may cause undefined behaviour, since an indeterminate value may be a trap representation. (Imagine a system where loading a value into an address register also performs a hardware check that the address belongs to this process). Refs: C11 3.19.2, 6.2.6.1/5:

If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined


The correct way to write the code would have been:

if(++length >= maximumLength)
{
    size_t currentOffset = currentPosition - buffer;

    char *newBuffer = realloc(......
    // ...

    currentPosition = newBuffer + currentOffset;
    buffer = newBuffer;
}

(Personally I would use the offset the whole way , instead of currentPosition, to avoid this problem entirely)

Sign up to request clarification or add additional context in comments.

22 Comments

I think you're wrong there. You would be correct if the code dereferenced the pointer, but it does not. Since the value of buffer cannot change in the call (since it is call by value), the variable is guaranteed to have the original value of buffer. This allows for simple pointer arithmetic to determine what the correct offset is to add to the new newBuffer memory region.
@DavidHoelzer the C standard disagrees with you
@CareyGregory the arithmetic is not perfectly valid (see the second quote in my answer)
Here is a patent for hardware trapping in response to a register containing an invalid pointer value. Here is a description of how a MMU can allow for the same pointer to be valid at one time, and invalid at a later time. Another useful feature that the C standard allows is for the compiler to set freed pointers to a known invalid representation (e.g. 0xCDCDCDCD); so that if a program goes on to use a freed pointer then it is obvious what happened.
@CareyGregory the C standard doesn't guarantee that at all (see my first quote). For example the compiler could translate free(p); to a call to the free library function followed by setting p to null (or some other value that would be useful for debugging).
|
4

Yes. Here's why.

currentPosition and buffer, as they are being used in the expression currentPosition = newBuffer + (currentPosition - buffer);, are simply being used for their arithmetic value. At no time after the realloc is buffer dereferenced.

When you call realloc you are correct that the pointer must no longer be relied on as a pointer into the memory region of the buffer. However, the actual address value in the pointer is not changed by the call.

Comments

1

My five cents.:)

For starters I think that the program shown in the book is bad.

It does not check whether fgetc returns EOF.

And it does not return NULL as usual when the end of the file is encountered and no data has been read.

This drawback does not allow to use the function for example the following way

while ( ( buffer = getLine() ) != NULL )
{
    //...
    free( buffer );
}

Also there are too many variables. The same logic can be performed with less variables.

So I would write the function the following way

#include <stdlib.h>
#include <stdio.h>

char * getLine( void ) 
{
    const size_t SIZE_INCREMENT = 10;

    char  *buffer = NULL;
    size_t length = 0;

    int c = fgetc( stdin );

    if ( c != EOF )
    {        
        while ( 1 )
        {
            if ( length % SIZE_INCREMENT == 0 )
            {
                char *tmp = realloc( buffer, length + SIZE_INCREMENT );
                if ( !tmp )
                {
                    free( buffer );
                    buffer = NULL;
                    break;
                }
                else
                {
                    buffer = tmp;
                }
            }

            if ( c == EOF || c == '\n' ) break;

            buffer[length++] = c;

            c = fgetc( stdin );
        }            

        if ( buffer ) buffer[length] = '\0';
    }

    return buffer;
}

int main( void )
{
    char *s;

    while ( ( s = getLine() ) != NULL )
    {        
        puts( s );
        free( s );
    }

    return 0;
}

If to enter for example

This is first line
This is second line

then the output will echo the input

This is first line
This is second line

As for the discussion whether this statement

currentPosition = newBuffer + (currentPosition - buffer);

is well-defined then in my opinion it is well-defined.

According to the C Standard (6.5.6 Additive operators)

6 The result of the binary - operator is the difference resulting from the subtraction of the second operand from the first.

Though the object pointed to by the pointers is not already alive nevertheless the pointers contain valid values relative to the pointer arithmetic that refered elements of the same array. So if to apply the subtruction the result will be a valid value of elements between these two pointers that the non-alive array had. The subtraction operator just does the arithmetic using integer values and taking into account the type of the pointers like

( second_pointer - first_pointer ) / sizeof( *first_pointer )

13 Comments

@EdgarRokyan The pointer passed to the function is passed by value. The original pointer keeps its value. Nothing can occur with it.
Now I totally agree with @M.M answer. After realloc() invocation buffer pointer doesn't point to the real object. In general we can't use it even in subtraction cause it's not guaranteed to be valid. Do you agree?
@EdgarRokyan See my answer one more. This binary operator performs subtraction of the operands. The operands in this case have valid values for the operation because the both pointers refered elements of the same array. Nothing was occurred with these values. They are valid values for the subtraction.
@EdgarRokyan Otherwise you should think that for example 4 - 2 can equal to any value or have some indeterminate value.:)
Vlad, I want to say that there's no array when we do subtraction. It's well explained in M.M's answer. But it looks like you don't agree with him...
|
0

The code uses a pattern which, on many platforms, would be naturally supported by any implementation which does not go out of its way to do otherwise. Unfortunately, some implementations will occasionally break such code in the name of "optimization", even though the value of any practical optimizations obtained thereby would pale in comparison to the efficiency loss that would often result from requiring programmers to use alternative approaches.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.