0

It seems it is a trivial task, but I cannot find the way to do it properly without UBs and still have a compile time validation.

I need to implement the linked list of different types. The problem is that I cannot find how to calculate the memory size needed for the MultipleValuesEntry structure, that contains a flexible array inside.

In case if I use offsetof, I get a warning.

#include <cstddef>

enum class EntryType
{
   SingleElement,
   MultipleElements
};

struct Entry
{
   Entry* next_;
   EntryType type_;
};

struct SingleValueEntry : Entry
{
   int value_;
};

struct MultipleValuesEntry : Entry
{
   int values_[1]; // < Open array

   static constexpr size_t estimateSize(size_t valuesCount) noexcept
   {
      return offsetof(MultipleValuesEntry, values_)
                     + sizeof(values_[0]) * valuesCount;
      //      ^^^^^ ‘offsetof’ within non-standard-layout type warning
   }
};

The reason why I want to have a flexible array and not for instance std::vector, is I want the memory for the flexible part be allocated close the the rest of the data members.

In order to solve the warning, instead of inheritance I can have EntryType as a first data member in the MultipleValuesEntry.

struct MultipleValuesEntry
{
   Entry entry_;
   int values_[1]; // < Open array

   static constexpr size_t estimateSize(size_t valuesCount) noexcept
   {
      return offsetof(MultipleValuesEntry, values_)
                     + sizeof(values_[0]) * valuesCount;
   }
};

void process(Entry* entry)
{
   if (entry->type_ == EntryType::MultipleElements)
   {
      auto mpe = reinterpret_cast<MultipleValuesEntry*>(entry); // pointer-interconvertible

      // consume `mpe`
   }
}

Then when I need to consume the entry, I can perform a reinterpret_cast from EntryType* to MultipleValuesEntry* because two pointers are pointer-interconvertible. The problem is there is no compile time validation that it is safe to perform reinterpret_cast. If someone in future change the structure layout, they won't get a compile time error and app will crash in production.

So, the only choice I've got is to implement my own offsetof macro, but it seems it is not that trivial. There are a lot of pitfalls.

There are a lot of articles around. Chat GPT suggests to use std::aligned_storage:

   static constexpr size_t estimateSize(size_t valuesCount) noexcept
   {
      // Calculate the offset manually using aligned storage
      using StorageType = std::aligned_storage<sizeof(MultipleValuesEntry), alignof(MultipleValuesEntry)>::type;
      StorageType storage;
      auto* entryPtr = reinterpret_cast<MultipleValuesEntry*>(&storage);
      size_t values_offset = reinterpret_cast<char*>(entryPtr->values_) - reinterpret_cast<char*>(entryPtr);

      // Calculate the total size
      return values_offset + sizeof(values_[0]) * valuesCount;
   }

I'm not sure if it is legal to access the uninitialized storage as a MultipleValuesEntry. I do not trust this tool. It often suggests some b.sh..t and when I point on issue, it apologies and suggests another one.

So from the low level (ASM) point of view it is piece of cake, but C++ is not that trivial. Please help.

11
  • 3
    What you call "open array" is already UB in standard C++, although it is conditionally supported in C standard and may work with some C++ compilers. Consider changing your design. Commented Aug 2, 2024 at 14:19
  • 2
    @Eugene Even for C it is done the wrong way: In C a so-called "flexible array member" must be declared as an array without any size in the brackets. Commented Aug 2, 2024 at 14:22
  • Allocate a byte buffer big enough for the struct (without the final flexible array member), and what would have been the "stretchy buffer". Make the struct non-move-able, non-copy-able (...because that would slice the stretchy buffer). Use a byte pointer to the struct object's this, then add the offset of the sizeof the struct to that byte pointer. There's the start of your stretchy buffer! You'll need to carefully destruct the object, and then delete[] the buffer. All very persnickety, and lots of caveats. But can be done without UB. Commented Aug 2, 2024 at 14:28
  • 1
    "Chat GPT suggests to use std::alligned_storage..." I wouldn't recommend trusting/using chatgpt generated code. Commented Aug 2, 2024 at 14:29
  • 1
    How about allocate "raw memory" and do placement new? (mostly as you will do when reimplementing std::vector. Commented Aug 2, 2024 at 16:27

2 Answers 2

4

The whole concept has undefined behavior in C++.

There is generally no valid way to get from a pointer to a MultipleValuesEntry beyond it's fixed compile-time size. There are specific exceptions, e.g. when MultipleValuesEntry itself is first member of a standard-layout class object that also contains all the entries after it, but they don't help you for an actually dynamic number of entries.

In C this is allowed as so-called "flexible array member", but such a member must be declared as an array with unknown size, i.e. without a specified size in the brackets.

In C++ there is no equivalent to this. Some compilers support flexible array members as a language extension, but if they do there will be a lot of limitations in how the feature interacts with other C++ features, in particular inheritance. In any case, the standard does not consider this extension and therefore you won't find it defining any of your attempts to use this pattern.

Sign up to request clarification or add additional context in comments.

5 Comments

I use values_[1] because VS complains. Otherwise I would use values_[].
@DmytroOvdiienko That's because MSVC does not support this extension.
values_[] is still bad! In C++ you should not work with datastructures of unknown size.
@DmytroOvdiienko "I use values_[1] because VS complains." every time you silence a warning / error without fixing the root cause a kitten dies somewhere. If your compiler warns/errors you better take it serious.
@463035818_is_not_an_ai Apparently VS2022 supports [] and emits no warnings
1

It can be done in C++, without UB. But you will have to manage the Make and Destroy of the backing store buffer of the "open array" object yourself. This is not idiomatic C++ code, and I strongly advise against doing something like this. Instead use std::vector.

Unless you've profiled the performance and have found that cache friendly locality has a measurable improvement that warrants using something like this approach.

Real code should also prohibit copy-ctor, move-ctor, copy-assignment, move-assignment. They won't work right.

Compiler explorer link.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include<cassert>

using std::size_t;

namespace {
enum class EntryType {
    SingleElement,
    MultipleElements
};

struct Entry {
    Entry* next_{};
    EntryType type_{EntryType::SingleElement};
    virtual ~Entry() = default;
    virtual void print() = 0;
};

struct SingleValueEntry : Entry {
    SingleValueEntry(int value_) : value{value_} {}
    int value;
    void print() override { std::cout << value << "\n"; }
};

struct MultipleValuesEntry : Entry {
    MultipleValuesEntry(size_t size_, int value_) : size{size_} {
        for (auto p{ begin() }, e{ end() }; p != e; ++p) {
            // Placement new each int.  Not placement array new[].
            new (p)int{value_};
        }
    }

    ~MultipleValuesEntry() override {
        for (auto p{ begin() }, e{ end() }; p != e; ++p) {
            // Placement delete each int.  Not placement array delete[].
            // p->~int(); // Hmmm, does `int` have a placement destructor syntax?  Just let it drop on the floor, since its destructor is super-trivial no-op?
        }
    }

    size_t size;
    int dummy_[1]; // Not used.  Kind of needed for appropriate alignment padding, which could be done with a static_assert instead.

    void set_values(int value_) {
        std::fill(begin(), end(), value_);
    }

    auto operator[](size_t i) -> int& {
        assert(i < size);
        return begin()[i];
    }

    auto begin() -> int* {
        void* addr{ this };
        auto ptr{ static_cast<std::byte*>(addr) };
        ptr += sizeof(*this);
        return reinterpret_cast<int*>(ptr);
    }

    auto end() -> int* {
        auto ptr{ begin() + size };
        return ptr;
    }

    static auto Make(size_t value_array_size_, int value_) -> MultipleValuesEntry* {
        assert(value_array_size_ > 1);
        auto byte_size{ estimate_size(value_array_size_) };
        auto ptr = new std::byte[byte_size];
        auto result = new (ptr)MultipleValuesEntry{value_array_size_, value_};
        return result;
    }

    static void Destroy(MultipleValuesEntry* ptr) {
        ptr->~MultipleValuesEntry();
        auto byte_ptr{ reinterpret_cast<std::byte*>(ptr) };
        delete[] byte_ptr;
    }

    void print() override {
        for (auto p{ begin() }, e{ end() }; p != e; ++p) {
            std::cout << *p << " ";
        }

        std::cout << "\n";
    }

private:
    static constexpr auto estimate_size(size_t value_array_size_) noexcept -> size_t {
        assert(value_array_size_ > 1);
        auto size{ sizeof(MultipleValuesEntry) + sizeof(int) * value_array_size_ };
        return size;
    }
};

void print(Entry* p) {
    while (p) {
        p->print();
        p = p->next_;
    }
}
} // anon

int main() {
    auto sve{ new SingleValueEntry{100} };
    auto mve{ MultipleValuesEntry::Make(9, 81) };
    auto sve2{ new SingleValueEntry{200} };

    // Connect the entries.
    sve->next_ = mve;
    mve->next_ = sve2;

    // Print them!
    std::cout << "Original...\n";
    print(sve);

    // Tweak values.
    mve->set_values(86);
    (*mve)[4] = 444;

    // Print them!
    std::cout << "\nAfter tweaking...\n";
    print(sve);

    // Show is over.  Elvis has left the building.  Everyone go home.
    delete sve;
    MultipleValuesEntry::Destroy(mve);
    delete sve2;
}

14 Comments

Why don't you use new (arena) T[size] for the values after MultipleValuesEntry ?
I am not saying that this won't work in practice on current compilers, but this is also simply UB. In particular reinterpret_cast<int*>(ptr) doesn't point to an int object. To point to an int object you would first need to std::launder the pointer. But std::launder is not allowed, because it has a precondition that it shall not make any more bytes reachable than are reachable through the original pointer value. The rules are written so that a compiler can optimize under the assumption that one cannot reach outside of the object from this other than via stored pointers.
@user17732522 You can look it at as a construction of two objects inside one storage (MultipleValuesEntry and an array). Basically I can allocate the space for two objects and do placement new on both of them. That should be legal
@DmytroOvdiienko That part is fine, but one cannot use this to access the other object.
@DmytroOvdiienko • I don't use new (arena) T[size] because of stackoverflow.com/q/15254/4641116
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.