
I'm looking for an implementation of hash code to use alongside IReadOnlySet<T>.SetEquals.

.NET's HashCode type appears to be order sensitive, so it is not a good fit.

var random = new Random();
var items = random.GetItems(Enumerable.Range(0, 100).ToArray(), 100);

for (var i = 0; i < 10; i++)
{
    if (i > 0)
    {
        random.Shuffle(items);
    }

    var combined = new HashCode();

    foreach (var item in items)
    {
        combined.Add(item);
    }

    Console.WriteLine("Hash code is {0}", combined.ToHashCode());
}

Hash code is -1745381383
Hash code is 206620979
Hash code is 1544865526
Hash code is 877430619
Hash code is 1668984788
Hash code is 54187377
Hash code is -758239719
Hash code is 1005287804
Hash code is 614467421
Hash code is -954645367

How can I generate the same hash code from components combined in an arbitrary order?

3 Comments
  • Still pretty unsatisfied with the current answers. Notably, .NET's HashCode struct elects to use the xxHash algorithm ostensibly because of its better performance or reduced collisions compared with naive aggregation using binary operators. Ideally, I would like to obtain similar benefits without order sensitivity. Commented Jun 13, 2024 at 2:25
  • Stumbled across HashSet.CreateSetComparer() which appears to implement the XOR solutions suggested below. There was also a more robust framework proposal to address this problem that was mysteriously closed. The plot thickens... Commented Jun 13, 2024 at 2:30
  • Your choice of algorithms boils down to those that operate on a sorted collection and those that don't. For the latter, there are not many order-independent Int32 aggregation operators. Depending on your data, the simple XOR algorithm might be too simple; I've suggested an improved version in my updated answer. Commented Jun 13, 2024 at 5:17

3 Answers


With Sorting

If T is comparable, then you could counteract HashCode’s order sensitivity by sorting the items before combining their hash codes:

public class UnorderedCollectionComparer<T> : IEqualityComparer<IReadOnlyCollection<T>>
{
    public bool Equals(IReadOnlyCollection<T>? x, IReadOnlyCollection<T>? y)
    {
        // ...
    }

    public int GetHashCode(IReadOnlyCollection<T> items)
    {
        HashCode hashCode = new();

        foreach (T item in items.OrderBy(item => item))
        {
            hashCode.Add(item);
        }

        return hashCode.ToHashCode();
    }
}

If T isn’t comparable (or even if it is), then you could sort and combine the hash codes of the items:

public int GetHashCode(IReadOnlyCollection<T> items)
{
    HashCode hashCode = new();

    foreach (int itemHashCode in items
        .Select(item => item?.GetHashCode() ?? 0)
        .OrderBy(itemHashCode => itemHashCode))
    {
        hashCode.Add(itemHashCode);
    }

    return hashCode.ToHashCode();
}

But sorting is an O(n log n) operation, so you may want to consider these options only if n is small or you’re able to cache the result.
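As a quick sanity check, here's a sketch of the sort-then-combine approach as a standalone helper (`SortedHash` is a hypothetical name, not part of the comparer above), showing that reordering the items doesn't change the result:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Combine item hash codes in sorted order, so the input order is irrelevant.
static int SortedHash<T>(IReadOnlyCollection<T> items)
{
    HashCode hashCode = new();
    foreach (int h in items.Select(item => item?.GetHashCode() ?? 0).OrderBy(x => x))
    {
        hashCode.Add(h);
    }
    return hashCode.ToHashCode();
}

var a = new[] { 3, 1, 2 };
var b = new[] { 2, 3, 1 };
Console.WriteLine(SortedHash(a) == SortedHash(b)); // True
```

Because the hash codes are fed to HashCode in sorted order, any permutation of the same items produces the same final hash.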

Without Sorting

If sorting isn’t feasible, then you could consider the following order-independent hash functions:

  1. items.Count
  2. Sum of each item.GetHashCode()
  3. Product of each item.GetHashCode() (in practice, you’d want to ignore zero values)
  4. Bitwise XOR of each item.GetHashCode()
  5. Minimum of each item.GetHashCode()
  6. Maximum of each item.GetHashCode()

However, each of these hash functions, used alone, could perform poorly in certain common situations. For example, for the two that Tim Schmelter mentioned in his answer:

  1. items.Count — This will perform poorly if there are many sets with few possible sizes.

  2. Bitwise XOR of each item.GetHashCode() — This will perform poorly if there are many sets but all the item hash codes have a small number of possible bits.

    To illustrate the problem, suppose your sets are Int32 subsets of {0, 1, ..., 15}. There are 2^16 = 65,536 possible such subsets. However, Int32.GetHashCode returns the Int32 value itself, and the result of XORing a set of integers between 0 and 15 will again be an integer between 0 and 15. So the 65,536 possible subsets have only 16 possible hash codes—quite bad.

    Fortunately, we can alleviate this problem by multiplying each item hash code by a large constant before performing the XOR.
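To make the collapse described above concrete, here's a small standalone sketch (not part of the comparer) that enumerates all 2^16 subsets of {0, 1, ..., 15} and counts the distinct plain-XOR hash codes:

```csharp
using System;
using System.Collections.Generic;

var distinct = new HashSet<int>();

// Each bit of the mask says whether that element is in the subset.
for (int mask = 0; mask < 1 << 16; mask++)
{
    int xor = 0;
    for (int bit = 0; bit < 16; bit++)
    {
        if ((mask & (1 << bit)) != 0)
        {
            xor ^= bit; // Int32.GetHashCode() is the value itself
        }
    }
    distinct.Add(xor);
}

Console.WriteLine(distinct.Count); // 16
```

Only 16 distinct values come out, confirming that plain XOR wastes almost all of the hash space for this kind of data.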

Depending on the characteristics of your data, combining several of these hash functions with HashCode could make it more likely that different inputs yield different outputs, which is desirable for a hash function.

Here’s a sample implementation that combines the first four hash functions from the list above:

public int GetHashCode(IReadOnlyCollection<T> items)
{
    int sum = 0;
    int product = 1;
    int xor = 0;

    foreach (T item in items)
    {
        if (item != null)
        {
            int itemHashCode = item.GetHashCode();
            if (itemHashCode != 0)
            {
                unchecked
                {
                    sum += itemHashCode;
                    product *= itemHashCode;
                    xor ^= itemHashCode * -0x61C88647; // 0x9E3779B9
                }
            }
        }
    }

    HashCode hashCode = new();
    hashCode.Add(items.Count);
    hashCode.Add(sum);
    hashCode.Add(product);
    hashCode.Add(xor);
    return hashCode.ToHashCode();
}

For Int32 subsets of {0, 1, ..., 15}, this implementation yields 65,536 different hash codes—the best possible.

Essentially the same approach is taken by Scala (a fact I discovered after writing this post).

Improved XOR Implementation

If you prefer a simpler and faster option at the expense of lower hash quality, you could try the following implementation:

public int GetHashCode(IReadOnlyCollection<T> items)
{
    int hashCode = 0;

    foreach (T item in items)
    {
        unchecked
        {
            hashCode ^= ((item?.GetHashCode() ?? 0) ^ int.MinValue) * -0x61C88647;
        }
    }

    return hashCode;
}

(The XOR with int.MinValue ensures that any zero item hash codes aren’t simply ignored. You can delete this if your data likely won’t have zero item hash codes.)

For Int32 subsets of {0, 1, ..., 15}, this implementation yields 32,768 different hash codes, which is a little worse than the above implementation that combines four different hash functions, but is much better than HashSet<T>.CreateSetComparer.


3 Comments

I don’t consider “sort it first” an answer to this question, since it doesn’t combine hash codes in an order-insensitive way.
I think I would prefer to use the standard comparer that ships with the framework.
"Sort it first" combines hash codes insensitive to the order of the original collection, which satisfies your requirement of "an implementation of hash code to use alongside IReadOnlySet<T>.SetEquals". If, in the next release of .NET, Microsoft changed the implementation of HashSet<T>.CreateSetComparer to sort the items, it would still be a valid comparer (and could produce better hash codes than simple XOR).
1

You could use a custom IEqualityComparer<IList<T>>

public class IgnoreOrderComparer<T> : IEqualityComparer<IList<T>>
{
    public bool Equals(IList<T>? x, IList<T>? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null)) return false;
        if (ReferenceEquals(y, null)) return false;
        if (x.Count != y.Count) return false;

        ISet<T> set = x as ISet<T> ?? new HashSet<T>(x);
        ISet<T> set2 = y as ISet<T> ?? new HashSet<T>(y);
        return set.SetEquals(set2);
    }

    public int GetHashCode(IList<T> obj)
    {
        return obj.Count;
    }
}

Here GetHashCode always returns the same value for collections of the same size.

var random = new Random();
int[] items = random.GetItems(Enumerable.Range(0, 100).ToArray(), 100);
List<int> items2 = [..items]; // create a copy to compare it after shuffling
var comparer = new IgnoreOrderComparer<int>();
for (var i = 0; i < 10; i++)
{
    if (i > 0)
    {
        random.Shuffle(items);
    }

    bool areEqual = comparer.Equals(items, items2);
    Console.WriteLine("Hash code is {0}, equal collection? {1}", comparer.GetHashCode(items), areEqual);
}

Results:

Hash code is 100, equal collection? True
Hash code is 100, equal collection? True
Hash code is 100, equal collection? True
Hash code is 100, equal collection? True
Hash code is 100, equal collection? True
Hash code is 100, equal collection? True
Hash code is 100, equal collection? True
Hash code is 100, equal collection? True
Hash code is 100, equal collection? True
Hash code is 100, equal collection? True

Edit: You could get fewer collisions with this implementation of GetHashCode using the XOR operator, so it should produce better hash codes than the Count approach:

public int GetHashCode(IList<T>? list)
{
    if (list == null) return 0;
    int hashCode = 0;
    foreach (T i in list)
    {
        hashCode ^= i?.GetHashCode() ?? int.MinValue;
    }
    return hashCode;
}

1 Comment

This comparer already exists in the framework: learn.microsoft.com/en-us/dotnet/api/…
0

Use HashSet<T>.CreateSetComparer(), which relies on XOR to aggregate the hash codes of the elements.
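As a usage sketch (the values here are just illustrative), the comparer makes both equality and hashing depend only on set contents:

```csharp
using System;
using System.Collections.Generic;

var comparer = HashSet<int>.CreateSetComparer();
var a = new HashSet<int> { 1, 2, 3 };
var b = new HashSet<int> { 3, 2, 1 };

Console.WriteLine(comparer.Equals(a, b)); // True
Console.WriteLine(comparer.GetHashCode(a) == comparer.GetHashCode(b)); // True

// The comparer also lets a dictionary key on set contents rather than reference.
var lookup = new Dictionary<HashSet<int>, string>(comparer) { [a] = "found" };
Console.WriteLine(lookup[b]); // found
```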

1 Comment

Just keep in mind that, as explained in my answer, HashSet<T>.CreateSetComparer returns poor hash codes for sets of small integers, like new HashSet<int> { 1, 2, 3 }.
