Calculate the size of the non-standard-layout structure with open array

Question

It seems it is a trivial task, but I cannot find the way to do it properly without UBs and still have a compile time validation.

I need to implement the linked list of different types. The problem is that I cannot find how to calculate the memory size needed for the MultipleValuesEntry structure, that contains a flexible array inside.

In case if I use offsetof, I get a warning.

#include <cstddef>

enum class EntryType
{
   SingleElement,
   MultipleElements
};

struct Entry
{
   Entry* next_;
   EntryType type_;
};

struct SingleValueEntry : Entry
{
   int value_;
};

struct MultipleValuesEntry : Entry
{
   int values_[1]; // < Open array

   static constexpr size_t estimateSize(size_t valuesCount) noexcept
   {
      return offsetof(MultipleValuesEntry, values_)
                     + sizeof(values_[0]) * valuesCount;
      //      ^^^^^ ‘offsetof’ within non-standard-layout type warning
   }
};

The reason why I want to have a flexible array and not for instance std::vector, is I want the memory for the flexible part be allocated close the the rest of the data members.

In order to solve the warning, instead of inheritance I can have EntryType as a first data member in the MultipleValuesEntry.

struct MultipleValuesEntry
{
   Entry entry_;
   int values_[1]; // < Open array

   static constexpr size_t estimateSize(size_t valuesCount) noexcept
   {
      return offsetof(MultipleValuesEntry, values_)
                     + sizeof(values_[0]) * valuesCount;
   }
};

void process(Entry* entry)
{
   if (entry->type_ == EntryType::MultipleElements)
   {
      auto mpe = reinterpret_cast<MultipleValuesEntry*>(entry); // pointer-interconvertible

      // consume `mpe`
   }
}

Then when I need to consume the entry, I can perform a reinterpret_cast from EntryType* to MultipleValuesEntry* because two pointers are pointer-interconvertible. The problem is there is no compile time validation that it is safe to perform reinterpret_cast. If someone in future change the structure layout, they won't get a compile time error and app will crash in production.

So, the only choice I've got is to implement my own offsetof macro, but it seems it is not that trivial. There are a lot of pitfalls.

There are a lot of articles around. Chat GPT suggests to use std::aligned_storage:

   static constexpr size_t estimateSize(size_t valuesCount) noexcept
   {
      // Calculate the offset manually using aligned storage
      using StorageType = std::aligned_storage<sizeof(MultipleValuesEntry), alignof(MultipleValuesEntry)>::type;
      StorageType storage;
      auto* entryPtr = reinterpret_cast<MultipleValuesEntry*>(&storage);
      size_t values_offset = reinterpret_cast<char*>(entryPtr->values_) - reinterpret_cast<char*>(entryPtr);

      // Calculate the total size
      return values_offset + sizeof(values_[0]) * valuesCount;
   }

I'm not sure if it is legal to access the uninitialized storage as a MultipleValuesEntry. I do not trust this tool. It often suggests some b.sh..t and when I point on issue, it apologies and suggests another one.

So from the low level (ASM) point of view it is piece of cake, but C++ is not that trivial. Please help.

What you call "open array" is already UB in standard C++, although it is conditionally supported in C standard and may work with some C++ compilers. Consider changing your design. — Eugene
– Eugene, Commented Aug 2, 2024 at 14:19
@Eugene Even for C it is done the wrong way: In C a so-called "flexible array member" must be declared as an array without any size in the brackets. — user17732522
– user17732522, Commented Aug 2, 2024 at 14:22
Allocate a byte buffer big enough for the struct (without the final flexible array member), and what would have been the "stretchy buffer". Make the struct non-move-able, non-copy-able (...because that would slice the stretchy buffer). Use a byte pointer to the struct object's this, then add the offset of the sizeof the struct to that byte pointer. There's the start of your stretchy buffer! You'll need to carefully destruct the object, and then delete[] the buffer. All very persnickety, and lots of caveats. But can be done without UB. — Eljay
– Eljay, Commented Aug 2, 2024 at 14:28
"Chat GPT suggests to use std::alligned_storage..." I wouldn't recommend trusting/using chatgpt generated code. — Richard
– Richard, Commented Aug 2, 2024 at 14:29
How about allocate "raw memory" and do placement new? (mostly as you will do when reimplementing std::vector. — Jarod42
– Jarod42, Commented Aug 2, 2024 at 16:27

user17732522 · Accepted Answer · 2024-08-02 14:26:42Z

4

The whole concept has undefined behavior in C++.

There is generally no valid way to get from a pointer to a MultipleValuesEntry beyond it's fixed compile-time size. There are specific exceptions, e.g. when MultipleValuesEntry itself is first member of a standard-layout class object that also contains all the entries after it, but they don't help you for an actually dynamic number of entries.

In C this is allowed as so-called "flexible array member", but such a member must be declared as an array with unknown size, i.e. without a specified size in the brackets.

In C++ there is no equivalent to this. Some compilers support flexible array members as a language extension, but if they do there will be a lot of limitations in how the feature interacts with other C++ features, in particular inheritance. In any case, the standard does not consider this extension and therefore you won't find it defining any of your attempts to use this pattern.

answered Aug 2, 2024 at 14:26

user17732522

78.1k3 gold badges82 silver badges147 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Dmytro Ovdiienko Over a year ago

I use values_[1] because VS complains. Otherwise I would use values_[].

user17732522 Over a year ago

@DmytroOvdiienko That's because MSVC does not support this extension.

Pepijn Kramer Over a year ago

values_[] is still bad! In C++ you should not work with datastructures of unknown size.

463035818_is_not_an_ai Over a year ago

@DmytroOvdiienko "I use values_[1] because VS complains." every time you silence a warning / error without fixing the root cause a kitten dies somewhere. If your compiler warns/errors you better take it serious.

Dmytro Ovdiienko Over a year ago

@463035818_is_not_an_ai Apparently VS2022 supports [] and emits no warnings

Eljay · Accepted Answer · 2024-08-02 15:33:50Z

1

It can be done in C++, without UB. But you will have to manage the Make and Destroy of the backing store buffer of the "open array" object yourself. This is not idiomatic C++ code, and I strongly advise against doing something like this. Instead use std::vector.

Unless you've profiled the performance and have found that cache friendly locality has a measurable improvement that warrants using something like this approach.

Real code should also prohibit copy-ctor, move-ctor, copy-assignment, move-assignment. They won't work right.

Compiler explorer link.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include<cassert>

using std::size_t;

namespace {
enum class EntryType {
    SingleElement,
    MultipleElements
};

struct Entry {
    Entry* next_{};
    EntryType type_{EntryType::SingleElement};
    virtual ~Entry() = default;
    virtual void print() = 0;
};

struct SingleValueEntry : Entry {
    SingleValueEntry(int value_) : value{value_} {}
    int value;
    void print() override { std::cout << value << "\n"; }
};

struct MultipleValuesEntry : Entry {
    MultipleValuesEntry(size_t size_, int value_) : size{size_} {
        for (auto p{ begin() }, e{ end() }; p != e; ++p) {
            // Placement new each int.  Not placement array new[].
            new (p)int{value_};
        }
    }

    ~MultipleValuesEntry() override {
        for (auto p{ begin() }, e{ end() }; p != e; ++p) {
            // Placement delete each int.  Not placement array delete[].
            // p->~int(); // Hmmm, does `int` have a placement destructor syntax?  Just let it drop on the floor, since its destructor is super-trivial no-op?
        }
    }

    size_t size;
    int dummy_[1]; // Not used.  Kind of needed for appropriate alignment padding, which could be done with a static_assert instead.

    void set_values(int value_) {
        std::fill(begin(), end(), value_);
    }

    auto operator[](size_t i) -> int& {
        assert(i < size);
        return begin()[i];
    }

    auto begin() -> int* {
        void* addr{ this };
        auto ptr{ static_cast<std::byte*>(addr) };
        ptr += sizeof(*this);
        return reinterpret_cast<int*>(ptr);
    }

    auto end() -> int* {
        auto ptr{ begin() + size };
        return ptr;
    }

    static auto Make(size_t value_array_size_, int value_) -> MultipleValuesEntry* {
        assert(value_array_size_ > 1);
        auto byte_size{ estimate_size(value_array_size_) };
        auto ptr = new std::byte[byte_size];
        auto result = new (ptr)MultipleValuesEntry{value_array_size_, value_};
        return result;
    }

    static void Destroy(MultipleValuesEntry* ptr) {
        ptr->~MultipleValuesEntry();
        auto byte_ptr{ reinterpret_cast<std::byte*>(ptr) };
        delete[] byte_ptr;
    }

    void print() override {
        for (auto p{ begin() }, e{ end() }; p != e; ++p) {
            std::cout << *p << " ";
        }

        std::cout << "\n";
    }

private:
    static constexpr auto estimate_size(size_t value_array_size_) noexcept -> size_t {
        assert(value_array_size_ > 1);
        auto size{ sizeof(MultipleValuesEntry) + sizeof(int) * value_array_size_ };
        return size;
    }
};

void print(Entry* p) {
    while (p) {
        p->print();
        p = p->next_;
    }
}
} // anon

int main() {
    auto sve{ new SingleValueEntry{100} };
    auto mve{ MultipleValuesEntry::Make(9, 81) };
    auto sve2{ new SingleValueEntry{200} };

    // Connect the entries.
    sve->next_ = mve;
    mve->next_ = sve2;

    // Print them!
    std::cout << "Original...\n";
    print(sve);

    // Tweak values.
    mve->set_values(86);
    (*mve)[4] = 444;

    // Print them!
    std::cout << "\nAfter tweaking...\n";
    print(sve);

    // Show is over.  Elvis has left the building.  Everyone go home.
    delete sve;
    MultipleValuesEntry::Destroy(mve);
    delete sve2;
}

answered Aug 2, 2024 at 15:33

Eljay

5,5313 gold badges19 silver badges32 bronze badges

14 Comments

Dmytro Ovdiienko Over a year ago

Why don't you use new (arena) T[size] for the values after MultipleValuesEntry ?

user17732522 Over a year ago

I am not saying that this won't work in practice on current compilers, but this is also simply UB. In particular reinterpret_cast<int*>(ptr) doesn't point to an int object. To point to an int object you would first need to std::launder the pointer. But std::launder is not allowed, because it has a precondition that it shall not make any more bytes reachable than are reachable through the original pointer value. The rules are written so that a compiler can optimize under the assumption that one cannot reach outside of the object from this other than via stored pointers.

Dmytro Ovdiienko Over a year ago

@user17732522 You can look it at as a construction of two objects inside one storage (MultipleValuesEntry and an array). Basically I can allocate the space for two objects and do placement new on both of them. That should be legal

user17732522 Over a year ago

@DmytroOvdiienko That part is fine, but one cannot use this to access the other object.

Eljay Over a year ago

@DmytroOvdiienko • I don't use new (arena) T[size] because of stackoverflow.com/q/15254/4641116

|

Collectives™ on Stack Overflow

Calculate the size of the non-standard-layout structure with open array

2 Answers 2

5 Comments

14 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

5 Comments

14 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related