
I benchmarked HashMap's retainAll with JMH to find out which call order is faster.

Two HashMaps, A and B:

  • Size: A > B; A has 300~400 entries, B has 100~200.
  • Most of B's elements are also in A, so B - A is very small, around 10~20 elements.

Test: get the intersection of the two HashMaps, in one of two ways (illustrated right after the list):

  1. A.keySet().retainAll(B.keySet())

  2. B.keySet().retainAll(A.keySet())
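For example, option 1 mutates A in place: keySet() is a live view of the map, so retaining keys on it removes entries from the backing map. A minimal sketch with made-up keys:

import java.util.HashMap;
import java.util.Map;

public class RetainAllDemo {
    public static void main(String[] args) {
        Map<String, Integer> a = new HashMap<>(Map.of("x", 1, "y", 2, "z", 3));
        Map<String, Integer> b = new HashMap<>(Map.of("y", 20, "z", 30, "w", 40));

        // keySet() is a view: removing keys from it removes the map entries
        a.keySet().retainAll(b.keySet());

        System.out.println(a);   // {y=2, z=3} (iteration order may vary)
    }
}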

// JMH benchmark (class wrapper and imports added for completeness)
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(value = 1, jvmArgs = {"-Xms4G", "-Xmx4G"})
@Warmup(iterations = 5, time = 5)
@Measurement(iterations = 10, time = 5)
public class RetainAllBenchmark {

    // MyClass is the question's value type (its definition is not shown)
    private Map<String, MyClass> bigOne, smallOne;

    private final int testIteration = 1000;

    @Benchmark
    public void smallToBig() {
        // iterate the small map's keys, probe the big map's keys
        for (int i = 0; i < testIteration; i++) {
            smallOne.keySet().retainAll(bigOne.keySet());
        }
    }

    @Benchmark
    public void bigToSmall() {
        // iterate the big map's keys, probe the small map's keys
        for (int i = 0; i < testIteration; i++) {
            bigOne.keySet().retainAll(smallOne.keySet());
        }
    }
}
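The @Setup is not shown above; here is a sketch of what it could look like, reconstructed from the dataset description in the comments under the answer (Random keys, every second key also inserted into the small map, plus a few keys only in the small map). The sizes, seed, and key shape are illustrative, not the question's actual values:

// Illustrative setup only -- the question does not include the real one.
// Level.Invocation rebuilds the maps before each benchmark call, because
// retainAll mutates them in place; the question's actual setup level is unknown.
// Requires: java.util.HashMap, java.util.Random, org.openjdk.jmh.annotations.Level
@Setup(Level.Invocation)
public void setUp() {
    bigOne = new HashMap<>();
    smallOne = new HashMap<>();
    Random random = new Random(42);          // fixed seed for repeatability
    for (int i = 0; i < 350; i++) {          // A: 300~400 entries
        String key = String.valueOf(random.nextLong());
        bigOne.put(key, new MyClass());      // MyClass: no-arg ctor assumed
        if (i % 2 == 1) {
            smallOne.put(key, new MyClass()); // every second key also goes in B
        }
    }
    for (int i = 0; i < 15; i++) {           // B - A: 10~20 extra entries
        smallOne.put("extra-" + i, new MyClass());
    }
}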

The retainAll method is inherited from AbstractCollection: it iterates over the receiver and calls contains on the argument, and HashMap's contains is O(1). So iterating the map should account for most of the cost, which means iterating the smaller map (smallToBig) should be faster.
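For reference, AbstractCollection.retainAll is essentially the following loop (simplified; the exact OpenJDK code varies by version):

// Simplified shape of AbstractCollection.retainAll: iterate over THIS
// collection and drop every element the argument does not contain.
public boolean retainAll(Collection<?> c) {
    boolean modified = false;
    Iterator<E> it = iterator();
    while (it.hasNext()) {
        if (!c.contains(it.next())) {
            it.remove();   // on a keySet view this removes the map entry
            modified = true;
        }
    }
    return modified;
}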

I expected the former (smallToBig) to be faster, but the actual results were different.

Result (JMH output screenshot omitted): bigToSmall was consistently faster than smallToBig.

I ran it many times and always got the same result. Could you tell me why Big.retainAll(Small) is faster?

Thanks for helping.

  • Are you really trying to conclude which is faster from a single test scenario? Did you notice how large the error values are, compared to the difference between the two results? Did you actually read the text after “REMEMBER:”?

1 Answer


contains has O(1) complexity on average and O(n) in the worst case. EDIT: since Java 8 the worst case is O(log n), because buckets with many collisions are converted to balanced trees!

For Big.retainAll(Small) you do contains on a small set and many removes on a big set. For Small.retainAll(Big) you do contains on a larger set and, as you said, almost no removes (and a remove is certainly more expensive than a contains anyway).

So, if your benchmarks are correct, this could only mean that your contains complexity is closer to O(n), and this is why smallToBig takes more time. Try some different keys, Integer for example, to see if the same thing happens.
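To see what a degraded contains would look like: String hash codes can collide systematically. "Aa" and "BB" both hash to 2112, and concatenations of equal-length colliding strings collide again, so keys like the ones below (hypothetical, just to illustrate the worst case) all land in one bucket:

// "Aa" and "BB" have the same hashCode (31*'A' + 'a' == 31*'B' + 'B' == 2112),
// and concatenations of equal-length colliding strings collide too, so all
// four keys below share one HashMap bucket -- the worst case for contains.
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode());   // true

        Map<String, Integer> m = new HashMap<>();
        for (String p : new String[]{"Aa", "BB"})
            for (String q : new String[]{"Aa", "BB"})
                m.put(p + q, 0);   // AaAa, AaBB, BBAa, BBBB: same bucket
    }
}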

EDIT: I just ran into an analysis of Java's String hashCode and how it can be exploited to run attacks. This slide came up (the source is the Princeton algorithms course):

(Slide image not reproduced here.)


4 Comments

Thanks a lot. My test dataset is made with Random() plus a uniqueness check: a for loop inserts every key into Big, inserts it into Small for every second i, and then puts some extra entries into Small. As for complexity: even though the map is a HashMap, maybe my knowledge is wrong, but can contains really cost O(n)? I thought computing the hash code and indexing into the table is all HashMap's contains does.
You can follow this discussion for more info: stackoverflow.com/questions/8923251/…
Ahh, thanks a lot. So because my key is a String, which wraps a character array, computing its hash code takes O(String.length()), right? (See the sketch after these comments.)
I'm just making assumptions. As I said, "if the benchmarks are correct", then it means your contains method is slow, so you have a bad hash code. The hash code can be good, but if your keys follow some pattern that somehow makes them end up in the same bucket, it amounts to the same thing. That is only if your benchmarks are OK, though; it could be some other factor, for example your machine doing things in the background that affect the measurements. So what I propose is: use different keys to see if anything changes, and then we revise.
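On the hash-cost question above: String.hashCode is essentially the loop below, so the first computation costs O(length). Note, though, that the real method caches the result in a field, so repeated calls on the same String are O(1):

// What java.lang.String.hashCode essentially computes; the real method
// caches the result after the first call, so the O(length) cost is paid once.
static int stringHash(String s) {
    int h = 0;
    for (int i = 0; i < s.length(); i++) {
        h = 31 * h + s.charAt(i);
    }
    return h;
}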
