With Sorting
If T is comparable, then you could counteract HashCode’s order sensitivity by sorting the items before combining their hash codes:
using System;
using System.Collections.Generic;
using System.Linq;

public class UnorderedCollectionComparer<T> : IEqualityComparer<IReadOnlyCollection<T>>
{
    public bool Equals(IReadOnlyCollection<T>? x, IReadOnlyCollection<T>? y)
    {
        // ...
    }

    public int GetHashCode(IReadOnlyCollection<T> items)
    {
        HashCode hashCode = new();
        foreach (T item in items.OrderBy(item => item))
        {
            hashCode.Add(item);
        }
        return hashCode.ToHashCode();
    }
}
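For illustration, here's how such a comparer might be plugged into a hash-based collection (a hypothetical usage sketch, assuming the elided Equals is implemented as an order-insensitive element comparison):

var comparer = new UnorderedCollectionComparer<int>();
var lookup = new Dictionary<IReadOnlyCollection<int>, string>(comparer);
lookup[new[] { 1, 2, 3 }] = "abc";
Console.WriteLine(lookup[new[] { 3, 2, 1 }]); // prints "abc" if Equals is order-insensitive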
If T isn’t comparable (or even if it is), then you could sort and combine the hash codes of the items:
public int GetHashCode(IReadOnlyCollection<T> items)
{
    HashCode hashCode = new();
    foreach (int itemHashCode in items
        .Select(item => item?.GetHashCode() ?? 0)
        .OrderBy(itemHashCode => itemHashCode))
    {
        hashCode.Add(itemHashCode);
    }
    return hashCode.ToHashCode();
}
But sorting is an O(n log n) operation, so you may want to consider these options only if n is small or you’re able to cache the result.
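For example, a wrapper along these lines (a minimal sketch with a hypothetical name, assuming the wrapped collection is never mutated after construction) could compute the hash code once and reuse it:

public sealed class HashCachingCollection<T>
{
    private readonly IReadOnlyCollection<T> _items;
    private readonly IEqualityComparer<IReadOnlyCollection<T>> _comparer;
    private int? _cachedHashCode;

    public HashCachingCollection(
        IReadOnlyCollection<T> items,
        IEqualityComparer<IReadOnlyCollection<T>> comparer)
    {
        _items = items;
        _comparer = comparer;
    }

    // Pays the O(n log n) cost at most once; safe only if _items is not mutated.
    public int GetOrComputeHashCode() =>
        _cachedHashCode ??= _comparer.GetHashCode(_items);
}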
Without Sorting
If sorting isn’t feasible, then you could consider the following order-independent hash functions:
- items.Count
- Sum of each item.GetHashCode()
- Product of each item.GetHashCode() (in practice, you'd want to ignore zero values)
- Bitwise XOR of each item.GetHashCode()
- Minimum of each item.GetHashCode()
- Maximum of each item.GetHashCode()
However, each of these hash functions, used alone, could perform poorly in certain common situations. For example, for the two that Tim Schmelter mentioned in his answer:
- items.Count: this will perform poorly if there are many sets with few possible sizes.
- Bitwise XOR of each item.GetHashCode(): this will perform poorly if there are many sets but all the item hash codes have a small number of possible bits.
To illustrate the problem, suppose your sets are Int32 subsets of {0, 1, ..., 15}. There are 2^16 = 65,536 possible such subsets. However, Int32.GetHashCode returns the Int32 value itself, and the result of XORing a set of integers between 0 and 15 will again be an integer between 0 and 15. So the 65,536 possible subsets have only 16 possible hash codes, which is quite bad.
Fortunately, we can alleviate this problem by multiplying each item hash code by a large constant before performing the XOR.
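To see the effect, you could run a small harness like this (a sketch, e.g. in a console program; it enumerates all 65,536 subsets as bitmasks and counts the distinct XOR-based hash codes, with and without the multiplication):

using System;
using System.Collections.Generic;

Console.WriteLine(CountDistinctHashCodes(value => value)); // 16, as described above
Console.WriteLine(CountDistinctHashCodes(value => unchecked(value * -0x61C88647))); // a much larger count

static int CountDistinctHashCodes(Func<int, int> mixItemHashCode)
{
    var distinct = new HashSet<int>();
    for (int subset = 0; subset < (1 << 16); subset++)
    {
        // Treat each set bit of the bitmask as an element of the subset.
        int xor = 0;
        for (int value = 0; value < 16; value++)
        {
            if ((subset & (1 << value)) != 0)
            {
                xor ^= mixItemHashCode(value);
            }
        }
        distinct.Add(xor);
    }
    return distinct.Count;
}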
Depending on the characteristics of your data, combining several of these hash functions with HashCode could make it more likely that different inputs yield different outputs, which is desirable for a hash function.
Here’s a sample implementation that combines the first four hash functions from the list above:
public int GetHashCode(IReadOnlyCollection<T> items)
{
    int sum = 0;
    int product = 1;
    int xor = 0;
    foreach (T item in items)
    {
        if (item != null)
        {
            int itemHashCode = item.GetHashCode();
            if (itemHashCode != 0)
            {
                unchecked
                {
                    sum += itemHashCode;
                    product *= itemHashCode;
                    xor ^= itemHashCode * -0x61C88647; // == unchecked((int)0x9E3779B9)
                }
            }
        }
    }
    HashCode hashCode = new();
    hashCode.Add(items.Count);
    hashCode.Add(sum);
    hashCode.Add(product);
    hashCode.Add(xor);
    return hashCode.ToHashCode();
}
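As a quick sanity check, the combined hash is order-insensitive (hypothetical usage, assuming this method lives in the comparer class shown earlier):

var comparer = new UnorderedCollectionComparer<int>();
Console.WriteLine(
    comparer.GetHashCode(new[] { 1, 2, 3 }) ==
    comparer.GetHashCode(new[] { 3, 1, 2 })); // True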
For Int32 subsets of {0, 1, ..., 15}, this implementation yields 65,536 different hash codes—the best possible.
Essentially the same approach is taken by Scala (a fact I discovered after writing this post).
Improved XOR Implementation
If you prefer a simpler and faster option at the expense of lower hash quality, you could try the following implementation:
public int GetHashCode(IReadOnlyCollection<T> items)
{
    int hashCode = 0;
    foreach (T item in items)
    {
        unchecked
        {
            hashCode ^= ((item?.GetHashCode() ?? 0) ^ int.MinValue) * -0x61C88647;
        }
    }
    return hashCode;
}
(The XOR with int.MinValue ensures that any zero item hash codes aren’t simply ignored. You can delete this if your data likely won’t have zero item hash codes.)
For Int32 subsets of {0, 1, ..., 15}, this implementation yields 32,768 different hash codes, which is a little worse than the above implementation that combines four different hash functions, but is much better than HashSet<T>.CreateSetComparer.
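For reference, the built-in comparer mentioned above is obtained via HashSet<T>.CreateSetComparer (a real API; the snippet below just demonstrates it):

var setComparer = HashSet<int>.CreateSetComparer();
var a = new HashSet<int> { 1, 2, 3 };
var b = new HashSet<int> { 3, 2, 1 };
// True: also order-insensitive, but per the comparison above it
// produces fewer distinct hash codes on data like this.
Console.WriteLine(setComparer.GetHashCode(a) == setComparer.GetHashCode(b));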
The HashCode struct elects to use the xxHash algorithm, ostensibly because of its better performance or reduced collisions compared with naive aggregation using binary operators. Ideally, I would like to obtain similar benefits without order sensitivity.