2

The context and the problem below are only examples that can help to visualize the question.

Context: Let's say that I'm continously generating random binary vectors G with length 1x64 (whose values are either 0 or 1).

Problem: I don't want to check vectors that I've already checked, so I want to create a kind of table that can identify what vectors are already generated before.

So, how can I identify each vector in an optimized way?

My first idea was to convert the binary vectors into decimal numbers. Due to the maximum length of the vectors, I would need 2^64 = 1.8447e+19 numbers to encode them. That's huge, so I need an alternative.

I thought about using hexadecimal coding. In that case, if I'm not wrong, I would need nchoosek(16+16-1,16) = 300540195 elements, which is also huge.

So, there are better alternatives? For example, a kind of hash function that can identify that vectors without repeating values?

4
  • 2
    Are you going to actually generate 2^64 different vectors? If it's fewer than 2^32, I'd just encode them as uint64 Commented Jan 10, 2017 at 23:12
  • 1
    The only hash table I know of that's implemented in MATLAB is containers.Map. I don't know how efficient a map will be in this instance, but it's worth a try. Trying to create even a bitset of 2^64 bits is probably overly memory-intensive. Commented Jan 10, 2017 at 23:16
  • @Suever I'mt not performing an exhaustive search, so it is quite improbable that I'll reach the 2^32. Thanks for the sugerence! Commented Jan 11, 2017 at 8:37
  • @beaker Effectively, containers.Map is what I wanted! Commented Jan 11, 2017 at 10:35

1 Answer 1

1

So you have 64 bit values (or vectors) and you need a data structure in order to efficiently check if a new value is already existing?

Hash sets or binary trees come to mind, depending on if ordering is important or not.

Matlab has a hash table in containers.Map.

Here is a example:

tic;
n = 1e5; % number of random elements
keys = uint64(rand(n, 1) * 2^64); % random uint64

% check and add key if not already existing (using a containers.Map)
map = containers.Map('KeyType', 'uint64', 'ValueType', 'logical');
for i = 1 : n
    key = keys(i);
    if ~isKey(map, key)
        map(key) = true;
    end
end
toc;

However, depending on why you really need that and when you really need to check, the Matlab function unique might also be something for you.

Just throwing out duplicates once at the end like:

tic;
unique_keys = unique(keys);
toc;

is in this example 300 times faster than checking every time.

Sign up to request clarification or add additional context in comments.

1 Comment

Awesome answer! containers.Map adapts perfectly to what I wanted. In addition, it allows me to set double values asigned to each key :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.