
I'm interested in possible implementation approaches for a rather special variant of a list, with the following requirements:

  • Efficient inverse lookup ("index of"): "give me the index of the given element"
    • This is the crucial and most non-standard requirement (hence its mention in the title), but it's not the only one.
  • Efficient lookup: "give me the element's value at the given index"
  • Efficient insertion: "insert a value before/after a given index" or "insert a given value before/after element"
  • Efficient deletion: "remove element at the given index" or "remove the given element"

When I say an element (as opposed to a value), I refer to list elements in a general, abstract sense. In the context of a possible implementation, this doesn't have to literally mean an element's value†, but rather some indirect means, like a non-invalidating C++ iterator or another form of a handle. A list's user could store such an iterator/handle for later use when inserting something into the list.

When I say efficient, I assume that iterating over all of the list's elements is unacceptable. Acceptable time complexities include O(1) and O(log(N)).


† Finding the index of a given value, assuming it's present in the list (even in a hypothetical case of a list implementation disallowing duplicates), in sub-linear time is in all likelihood impossible unless we compromise the performance of insertion/removal, but I'd gladly be proven wrong.
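For concreteness, here is a deliberately naive O(n) reference implementation of the requirements (all names are made up for illustration). It pins down the intended semantics of element handles, namely that `index_of` on a stored handle keeps working across insertions and removals, without attempting the O(log n) complexity targets:

```python
# Naive reference implementation: illustrates the *semantics* (handles stay
# valid across insertions/removals), not the O(log n) complexity targets.
# All names here are hypothetical, not from any standard library.

class Handle:
    """Opaque, non-invalidating reference to a single list element."""
    def __init__(self, value):
        self.value = value

class HandleList:
    def __init__(self):
        self._items = []                   # list of Handle objects

    def insert(self, index, value):
        """Insert before `index`; returns a handle the caller may keep."""
        handle = Handle(value)
        self._items.insert(index, handle)  # O(n) here; O(log n) is the goal
        return handle

    def get(self, index):
        return self._items[index].value

    def index_of(self, handle):
        return self._items.index(handle)   # O(n) here; O(log n) is the goal

    def remove(self, index):
        del self._items[index]
```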

  • @ScottHunter Assuming that you mean a list implementation that maintains a helper dictionary/map mapping the respective indices to the elements occupying those indices, then this has O(n) insertion/removal time complexity. Commented Nov 21 at 16:25
  • With those strange requirements, this looks a lot more like a faulty concept than an abstract search for a specific list that can handle all that. My raw suggestion would be to marry a TreeMap with a LinkedList, or find an implementation for that. I've seen some a long time ago, but never used them myself. But my better suggestion is: post the reason/context where you need that. Commented Nov 21 at 16:28
  • "I imagine that a good answer would either link to ... or, alternatively, discuss" Stack Overflow is not designed to provide links or discussions. Commented Nov 21 at 16:28
  • en.wikipedia.org/wiki/Order_statistic_tree (after you remove the search tree comparison stuff) or en.wikipedia.org/wiki/Rope_(data_structure) might fit the bill Commented Nov 21 at 16:42
  • How should index_of handle duplicates? Commented Nov 22 at 3:49

1 Answer


The usual strategy to get a 'have your cake and eat it too' structure, really embracing the whole O(log n) theme (because everything is O(log n) with this kind of structure), is buckets-in-buckets, or skip lists.

For the latter, check Wikipedia; various implementations are available.

The bucket-in-bucket idea:

Let's say you have a bib (bucket-in-bucket) data structure that currently holds 1000 elements.

The data structure wraps an array that is hardcoded to have exactly 64 slots. Each slot represents one of three different notions:

  • It is a value
  • It is blank
  • It is, itself, another bib node.

In addition, each node knows a bit about its internals. For example, it knows how 'large' it is. An empty node trivially knows it has 0 elements, as does a node consisting solely of values and blanks. A node that contains a mix of all three also knows its size. You could calculate it by asking each slot for its size (a blank is 0, a value is 1, and a node is asked recursively), but nodes keep track of this instead.
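A minimal sketch of such a node (field and constant names are my own invention, not from any library), with the cached size and the per-slot size query:

```python
# Sketch of one bucket-in-bucket node. A slot is BLANK, a plain value, or
# a child Node; `size` caches the total element count beneath this node,
# so no subtree ever has to be re-counted.

BLANK = object()   # sentinel marking an empty slot
FANOUT = 64        # hardcoded array width

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.slots = [BLANK] * FANOUT
        self.size = 0                    # cached, kept up to date on mutation

    def slot_size(self, i):
        """Element count of slot i: 0 for blank, 1 for a value, cached size for a child."""
        s = self.slots[i]
        if s is BLANK:
            return 0
        if isinstance(s, Node):
            return s.size                # O(1) thanks to the cache
        return 1
```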

To look up, say, the element at index 505, the top node asks its first slot (let's say it's a node) how large it is, which is an O(1) operation. It answers 300. Then it asks the next: 200. Then the next: 50. Great, the element at index 5 of the third child (505 - 300 - 200 = 5) is what you want, so the top node recursively calls nodes[2].get(5). This entire operation is O(log n).
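The descent above can be sketched like this (the node layout is hypothetical: 64 slots, each blank, a value, or a child node with a cached size). Scanning the 64 slots is O(1) per level, so the whole lookup is O(log n):

```python
# Index lookup over a bucket-in-bucket tree, skipping whole subtrees via
# cached sizes. Node layout is an assumption made for this sketch.

BLANK = object()
FANOUT = 64

class Node:
    def __init__(self):
        self.slots = [BLANK] * FANOUT
        self.size = 0

def get(node, index):
    """Return the element at `index` beneath `node`."""
    for slot in node.slots:
        if slot is BLANK:
            continue
        span = slot.size if isinstance(slot, Node) else 1
        if index < span:
            return get(slot, index) if isinstance(slot, Node) else slot
        index -= span                 # skip this slot's elements entirely
    raise IndexError("index out of range")
```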

Inserting is similarly cheap, though a bit more convoluted: you need to go on a spree updating all parent nodes. This is a two-way data structure: nodes don't just have a ref to their up-to-64 children, they also have a ref to their parent, so they can propagate size differences upward. Any such update is trivially O(log n) (the number of parent nodes that need to be touched is ~log base 64 of n).
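The upward propagation can be sketched as follows (field names are hypothetical). Only the chain of ancestors is touched, so one insert or removal costs about log base-64 of n size updates:

```python
# Size propagation after an insert/removal in a two-way bucket-in-bucket
# tree: only the ancestor chain is updated, so the cost is the tree depth.

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.size = 0

def propagate_size(node, delta):
    """Add `delta` to `node`'s cached size and to every ancestor's."""
    while node is not None:
        node.size += delta
        node = node.parent
```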

In theory the structure can become idiotically unbalanced, where each node consists of one child node plus 63 empty slots, but this is rare. One could write a rebalancer, which either rebuilds a completely new structure, or at least asks each node to decide for itself whether it is too unbalanced and needs to do some packing, which in turn uses recursion, just like essentially all operations on bib nodes do.

This does not yet tackle fast inverse lookup. To do that, you also manage separate hash sets. This is the complicated part: you can't just map values to indices, because then a removal or insertion in the middle would require an O(n) amount of updates to that map of indices. Instead, each node keeps a hash set containing each and every value stored anywhere under it.

Ask a bib node for the index of a value, and it returns -1 if its hash set does not contain the value (an O(log n) or O(1) operation depending on the hash implementation). If it does contain it, the node first scans its array for direct values; if the value is there, it can trivially return the index (O(1): that's at most 64 operations, each either 'are you X?' or 'how many elements do you have?', which as established is O(1)). If it doesn't find the value directly, it asks each node child. For each node child, either [A] it returns in O(1) when the value is not there, or [B] it takes a little longer, but that's okay because at most one child per level will do that. That still means lookup-index-by-value is O(log n). Note, however, that the memory load is considerable: the top node's hash set contains everything; there are a lot of hash sets in this data structure.
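Sketched in code (node layout is hypothetical, and a small fanout keeps the example readable), the per-node sets let every non-containing subtree be rejected in O(1), so the search descends into at most one child per level:

```python
# index-of-value using per-node hash sets: a node whose set lacks the value
# answers "not here" immediately. Layout is an assumption for this sketch.

BLANK = object()
FANOUT = 4        # small fanout just for the sketch; the text uses 64

class Node:
    def __init__(self):
        self.slots = [BLANK] * FANOUT
        self.size = 0
        self.values = set()    # every value stored anywhere under this node

def index_of(node, value):
    if value not in node.values:
        return -1                          # O(1) rejection of this subtree
    offset = 0
    for slot in node.slots:
        if slot is BLANK:
            continue
        if isinstance(slot, Node):
            sub = index_of(slot, value)
            if sub != -1:
                return offset + sub        # first occurrence, left to right
            offset += slot.size
        else:
            if slot == value:
                return offset
            offset += 1
    return -1    # unreachable if the sets are kept consistent
```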

This system supports having the same value at different locations in the structure just fine; an operation to find the first index is O(log n), and an operation to find them all can be O(n) if literally every element in the structure has this value. But if it's a handful, the performance characteristic remains O(m log n), where m is the number of times the value occurs in the structure.

It does it all; everything is O(log n). Unfortunately, that includes the number of times any given value shows up across the many, many hash sets this data structure internally uses: that answer is O(log n) too (namely, the hash set of the node the value is directly in, plus every ancestor node's hash set, and that holds for every value).
