3

This is a follow-up to a previous question. I assumed that the pointer arithmetic I originally used would cause undefined behavior. However, a colleague told me that this usage is actually well defined. The following is a simplified example:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct StructA {
    int a;
} StructA;

typedef struct StructB {
    StructA a;
    StructA* b;
} StructB;

int main() {
    StructB* original = (StructB*)malloc(sizeof(StructB));
    original->a.a = 5;
    original->b = &original->a;  // points into *original

    StructB* copy = (StructB*)malloc(sizeof(StructB));
    memcpy(copy, original, sizeof(StructB));  // copy->b still points into *original
    free(original);
    // The two lines in question: rebase copy->b from *original onto *copy.
    ptrdiff_t offset = (char*)copy - (char*)original;
    StructA* a = (StructA*)((char*)(copy->b) + offset);
    printf("%i\n", a->a);
    free(copy);
}

According to §5.7 ¶5 of the C++11 spec:

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

I assumed that the following part of the code:

ptrdiff_t offset = (char*)copy - (char*)original;
StructA* a = (StructA*)((char*)(copy->b) + offset);

causes undefined behavior, since it:

  1. subtracts two pointers that point into different arrays
  2. produces, via the offset calculation, a pointer that no longer points into the same array.

Does this cause undefined behavior, or do I misinterpret the C++ specification? Does the same apply in C as well?

Edit:

Following the comments, I assume the following modification would still be undefined behavior, because it uses a pointer to an object whose lifetime has ended:

ptrdiff_t offset = (char*)(copy->b) - (char*)original;
StructA* a = (StructA*)((char*)copy + offset);

Would it be defined if I stored an offset instead of a pointer?

typedef struct StructB {
    StructA a;
    ptrdiff_t b_offset;
} StructB;

int main() {
    StructB* original = (StructB*)malloc(sizeof(StructB));
    original->a.a = 5;
    original->b_offset = (char*)&(original->a) - (char*)original;

    StructB* copy = (StructB*)malloc(sizeof(StructB));
    memcpy(copy, original, sizeof(StructB));
    free(original);
    StructA* a = (StructA*)((char*)copy + copy->b_offset);
    printf("%i\n", a->a);
    free(copy);
}
  • 5
    Your assumption is correct, for the exact reasons you mentioned. And yes, the same applies to C. Commented May 7, 2021 at 10:50
  • 2
    Also, there is no StructB object at the malloc'ed location. You need to create one using placement new. Commented May 7, 2021 at 10:53
  • 2
    @maxbachmann no, you allocated memory but didn't create the object Commented May 7, 2021 at 11:02
  • 3
    Stop using that colleague and get a new one. Commented May 7, 2021 at 11:24
  • 2
    @Jean-BaptisteYunès: That is not a good basis for reasoning here. Quite commonly, there is a guarantee that the allocations differ by a multiple of the size of the structure: The size of the structure is either eight bytes (two four-byte members) or 16 (the pointer is eight bytes with eight-byte alignment), and malloc returns addresses that are multiples of 16. To conclude the behavior is undefined, it is sufficient to observe original is invalid after the memory it points to is freed, and, also, the subtraction is not between pointers into the same array. Commented May 7, 2021 at 12:02

2 Answers

7

It is undefined behavior because there are severe restrictions on what can be done with pointer arithmetic.

Undefined Behavior in Subtraction

ptrdiff_t offset = (char*)copy - (char*)original;

The subtraction of your two pointers is undefined behavior:

When two pointer expressions P and Q are subtracted, the type of the result is an implementation-defined signed integral type; [...]

  • (5.1) If P and Q both evaluate to null pointer values, the result is 0.
  • (5.2) Otherwise, if P and Q point to, respectively, array elements i and j of the same array object x, the expression P - Q has the value i − j.
  • (5.3) Otherwise, the behavior is undefined.

See https://eel.is/c++draft/expr.add#5

So subtracting one pointer from another is undefined behavior unless both are null or both point to elements of the same array.
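For contrast, here is a minimal sketch of the case that rule (5.2) does cover. The function names are hypothetical and only serve as illustration; both pointers in each function refer to the same array (or one past its last element), so the subtraction yields the difference of the indices.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: both operands point to elements of the same array,
// so the result is the index difference i - j.
std::ptrdiff_t distance_within_array() {
    int values[4] = {10, 20, 30, 40};
    int* first = &values[0];
    int* third = &values[2];
    assert(third > first);  // relational comparison within one array is defined too
    return third - first;
}

// Hypothetical helper: a one-past-the-end pointer may also take part
// in the subtraction, as long as it is never dereferenced.
std::ptrdiff_t distance_to_one_past_end() {
    int values[4] = {10, 20, 30, 40};
    int* one_past_end = values + 4;
    return one_past_end - values;
}
```

Subtracting pointers into two separate malloc'ed blocks, as in the question, falls outside this case and lands in (5.3).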

Undefined Behavior in C

The C standard has similar restrictions:

(8) [...] If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression.

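(Paragraph 7 treats a pointer to an object that is not an array element as a pointer to the first element of an array of length one, so the same limits apply to non-array pointer addition.)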

(9) When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; [...]

See §6.5.6 Additive Operators in the C11 standard (n1570).

Using Data Member Pointers Instead

A clean and type-safe solution in C++ would be to use data member pointers.

typedef struct StructB {
    StructA a;
    StructA StructB::*b_offset;
} StructB;

int main() {
    StructB* original = (StructB*) malloc(sizeof(StructB));
    original->a.a = 5;
    original->b_offset = &StructB::a;

    StructB* copy = (StructB*) malloc(sizeof(StructB));
    memcpy(copy, original, sizeof(StructB));
    free(original);
    printf("%i\n", (copy->*(copy->b_offset)).a);
    free(copy);
}
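To illustrate why this survives copying, here is a small sketch without malloc. The types Inner and Outer and the helper functions are hypothetical names, not part of the question's code. A data member pointer designates a member of the class rather than an address inside one particular object, so a copied value applies to the copy without any rebasing.

```cpp
#include <cassert>

struct Inner {
    int value;
};

struct Outer {
    Inner first;
    Inner second;
    Inner Outer::* selected;  // designates a member of Outer, not an address
};

// Hypothetical helper: resolve the stored member pointer against any Outer.
int read_selected(const Outer& o) {
    assert(o.selected != nullptr);
    return (o.*(o.selected)).value;
}

int copy_and_read() {
    Outer original{{5}, {7}, &Outer::second};
    Outer copy = original;       // the member pointer is copied as-is
    original.second.value = 99;  // changing the original does not affect the copy
    return read_selected(copy);  // reads copy.second.value
}
```

Here copy_and_read() reads the member of the copy, not of the original, which is the property the question was trying to achieve with offsets.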

Notes

The standard citations are from a C++ draft. The C++11 standard you cited does not appear to place any looser restrictions on pointer arithmetic; it is just formatted differently. See the C++11 standard (n3337).


4 Comments

As far as I understand your answer, this means that even my last approach, which stores the offset, would cause undefined behavior. Is there any way to implement this without copying the struct and updating all pointers?
@maxbachmann you can use data member pointers in C++ to avoid all pointer arithmetic.
@JanSchultke Regarding "Undefined Behavior in Addition": Does CWG 1314 not state that the pointer addition would be legal here, as long as the result is not out of bounds?
@Becon yes, I was not aware of this exemption at the time of writing the answer. To be fair, while the issue is closed as "it's sufficiently clear that it's okay", the standard wording doesn't really support that opinion. P1839R7 should put an end to the confusion.
4

The Standard explicitly provides that in situations it characterizes as Undefined Behavior, implementations may behave "in a documented fashion characteristic of the environment". According to the Rationale, the intention of such characterization was, among other things, to identify avenues of "conforming language extension"; the question of when implementations support such "popular extensions" was a Quality of Implementation issue best left to the marketplace.

Many implementations intended and/or configured for low-level programming on commonplace platforms extend the language by specifying that the following equivalences hold, for any pointers p and q of type T* and integer expression i:

  • The bit patterns of p, (uintptr_t)p, and (intptr_t)p are identical.
  • p+i is equivalent to (T*)((uintptr_t)p + (uintptr_t)i * sizeof (T))
  • p-i is equivalent to (T*)((uintptr_t)p - (uintptr_t)i * sizeof (T))
  • p-q is equivalent to ((uintptr_t)p - (uintptr_t)q) / sizeof (T) in all cases where the division would have no remainder.
  • p>q is equivalent to (uintptr_t)p > (uintptr_t)q and likewise for all other relational and comparison operators.
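As a rough sketch of what the second and fourth equivalences mean, the following compares ordinary pointer arithmetic with the integer model. The function names are hypothetical, and the pointers deliberately stay inside one array so that the pointer side is well defined. The result of converting an integer back to a pointer is implementation-defined, so whether these functions return true is an assumption about a flat-address platform, not something the Standard guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical check: p + i versus (T*)((uintptr_t)p + (uintptr_t)i * sizeof(T)).
bool addition_matches_integer_model() {
    int values[8] = {};
    int* p = &values[1];
    std::ptrdiff_t i = 3;
    int* by_pointer = p + i;
    int* by_integer = reinterpret_cast<int*>(
        reinterpret_cast<std::uintptr_t>(p) +
        static_cast<std::uintptr_t>(i) * sizeof(int));
    return by_pointer == by_integer;
}

// Hypothetical check: q - p versus ((uintptr_t)q - (uintptr_t)p) / sizeof(T).
bool subtraction_matches_integer_model() {
    int values[8] = {};
    int* p = &values[1];
    int* q = &values[6];
    std::uintptr_t byte_gap =
        reinterpret_cast<std::uintptr_t>(q) - reinterpret_cast<std::uintptr_t>(p);
    assert(byte_gap % sizeof(int) == 0);  // the equivalence only applies without remainder
    return (q - p) == static_cast<std::ptrdiff_t>(byte_gap / sizeof(int));
}
```

The interesting cases for this answer are the ones where the pointer side is not defined by the Standard, such as pointers into different allocations; there the integer model only describes what such implementations choose to document.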

The Standard does not recognize any category of implementations that always uphold those equivalences, as distinct from those that do not, in part because its authors did not wish to portray as "inferior" implementations for unusual platforms where upholding such equivalences would be impractical. Instead, they expected that the equivalences would be upheld on implementations where that would make sense, and that programmers would know when they were targeting such implementations. Someone writing memory-management code for the 68000, or for small-model 8086 (where such equivalences would naturally hold), could write code that would run interchangeably on other systems where those equivalences hold, but someone writing memory-management code for large-model 8086 would need to design it explicitly for that platform because those equivalences do not hold there (pointers are 32 bits, but individual objects are limited to 65520 bytes and most pointer operations act only upon the bottom 16 bits of a pointer).

Unfortunately, even on platforms where such equivalences would normally hold, some kinds of optimizations may yield corner-case behaviors that differ from those otherwise implied by those equivalences. Commercial compilers generally uphold the Spirit of C principle "don't prevent the programmer from doing what needs to be done", and can be configured to uphold the equivalences even when most optimizations are enabled. The gcc and clang C compilers, however, don't allow such control over semantics. When all optimizations are disabled, they will uphold those equivalences on commonplace platforms, but there is no other optimization setting that will prevent them from making inferences that would be inconsistent with them.
