
I benchmarked HashMap's retainAll with JMH to find out which call order is faster.

Two HashMaps, A and B:

  • Size: A > B; A has 300~400 entries, B has 100~200.
  • Most of B's elements are also in A, so B - A is very small, around 10~20 elements.

Test: get the intersection of the two HashMaps, in one of two ways (illustrated right after the list):

  1. A.keySet().retainAll(B.keySet())

  2. B.keySet().retainAll(A.keySet())
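For example, option 1 mutates A in place: keySet() is a live view of the map, so retaining keys on it removes entries from the backing map. A minimal sketch with made-up keys:

import java.util.HashMap;
import java.util.Map;

public class RetainAllDemo {
    public static void main(String[] args) {
        Map<String, Integer> a = new HashMap<>(Map.of("x", 1, "y", 2, "z", 3));
        Map<String, Integer> b = new HashMap<>(Map.of("y", 20, "z", 30, "w", 40));

        // keySet() is a view: removing keys from it removes the map entries
        a.keySet().retainAll(b.keySet());

        System.out.println(a);   // {y=2, z=3} (iteration order may vary)
    }
}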

// JMH benchmark (class wrapper and imports added for completeness)
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(value = 1, jvmArgs = {"-Xms4G", "-Xmx4G"})
@Warmup(iterations = 5, time = 5)
@Measurement(iterations = 10, time = 5)
public class RetainAllBenchmark {

    // MyClass is the question's value type (its definition is not shown)
    private Map<String, MyClass> bigOne, smallOne;

    private final int testIteration = 1000;

    @Benchmark
    public void smallToBig() {
        // iterate the small map's keys, probe the big map's keys
        for (int i = 0; i < testIteration; i++) {
            smallOne.keySet().retainAll(bigOne.keySet());
        }
    }

    @Benchmark
    public void bigToSmall() {
        // iterate the big map's keys, probe the small map's keys
        for (int i = 0; i < testIteration; i++) {
            bigOne.keySet().retainAll(smallOne.keySet());
        }
    }
}
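The @Setup is not shown above; here is a sketch of what it could look like, reconstructed from the dataset description in the comments under the answer (Random keys, every second key also inserted into the small map, plus a few keys only in the small map). The sizes, seed, and key shape are illustrative, not the question's actual values:

// Illustrative setup only -- the question does not include the real one.
// Level.Invocation rebuilds the maps before each benchmark call, because
// retainAll mutates them in place; the question's actual setup level is unknown.
// Requires: java.util.HashMap, java.util.Random, org.openjdk.jmh.annotations.Level
@Setup(Level.Invocation)
public void setUp() {
    bigOne = new HashMap<>();
    smallOne = new HashMap<>();
    Random random = new Random(42);          // fixed seed for repeatability
    for (int i = 0; i < 350; i++) {          // A: 300~400 entries
        String key = String.valueOf(random.nextLong());
        bigOne.put(key, new MyClass());      // MyClass: no-arg ctor assumed
        if (i % 2 == 1) {
            smallOne.put(key, new MyClass()); // every second key also goes in B
        }
    }
    for (int i = 0; i < 15; i++) {           // B - A: 10~20 extra entries
        smallOne.put("extra-" + i, new MyClass());
    }
}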

The retainAll method is inherited from AbstractCollection: it iterates over the receiver and calls contains on the argument, and HashMap's contains is O(1). So iterating the map should account for most of the cost, which means iterating the smaller map (smallToBig) should be faster.
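For reference, AbstractCollection.retainAll is essentially the following loop (simplified; the exact OpenJDK code varies by version):

// Simplified shape of AbstractCollection.retainAll: iterate over THIS
// collection and drop every element the argument does not contain.
public boolean retainAll(Collection<?> c) {
    boolean modified = false;
    Iterator<E> it = iterator();
    while (it.hasNext()) {
        if (!c.contains(it.next())) {
            it.remove();   // on a keySet view this removes the map entry
            modified = true;
        }
    }
    return modified;
}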

I expected the former (smallToBig) to be faster, but the actual results were different.

Result (JMH output screenshot omitted): bigToSmall was consistently faster than smallToBig.

I ran it many times and always got the same result. Could you tell me why Big.retainAll(Small) is faster?

Thanks for helping.

  • Are you really trying to conclude which is faster from a single test scenario? Did you notice how large the error values are, compared to the difference between the two results? Did you actually read the text after “REMEMBER:”?

1 Answer


contains has O(1) complexity on average and O(n) in the worst case. EDIT: since Java 8 the worst case is O(log n), because buckets with many collisions are converted to balanced trees!

For Big.retainAll(Small) you do contains on a small set and many removes on a big set. For Small.retainAll(Big) you do contains on a larger set and, as you said, almost no removes (and a remove is certainly more expensive than a contains anyway).

So, if your benchmarks are correct, this could only mean that your contains complexity is closer to O(n), and this is why smallToBig takes more time. Try some different keys, Integer for example, to see if the same thing happens.
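To see what a degraded contains would look like: String hash codes can collide systematically. "Aa" and "BB" both hash to 2112, and concatenations of equal-length colliding strings collide again, so keys like the ones below (hypothetical, just to illustrate the worst case) all land in one bucket:

// "Aa" and "BB" have the same hashCode (31*'A' + 'a' == 31*'B' + 'B' == 2112),
// and concatenations of equal-length colliding strings collide too, so all
// four keys below share one HashMap bucket -- the worst case for contains.
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode());   // true

        Map<String, Integer> m = new HashMap<>();
        for (String p : new String[]{"Aa", "BB"})
            for (String q : new String[]{"Aa", "BB"})
                m.put(p + q, 0);   // AaAa, AaBB, BBAa, BBBB: same bucket
    }
}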

EDIT: I just ran into an analysis of Java's String hashCode and how it can be exploited to run attacks. This slide came up (the source is the Princeton algorithms course):

(Slide image not reproduced here.)


4 Comments

Thanks a lot. My test dataset is made with Random() plus a uniqueness check: a for loop inserts every key into Big, inserts it into Small for every second i, and then puts some extra entries into Small. As for complexity: even though the map is a HashMap, maybe my knowledge is wrong, but can contains really cost O(n)? I thought computing the hash code and indexing into the table is all HashMap's contains does.
You can follow this discussion for more info: stackoverflow.com/questions/8923251/…
Ahh, thanks a lot. So because my key is a String, which wraps a character array, computing its hash code takes O(String.length()), right? (See the sketch after these comments.)
I'm just making assumptions. As I said, "if the benchmarks are correct", then it means your contains method is slow, so you have a bad hash code. The hash code can be good, but if your keys follow some pattern that somehow makes them end up in the same bucket, it amounts to the same thing. That is only if your benchmarks are OK, though; it could be some other factor, for example your machine doing things in the background that affect the measurements. So what I propose is: use different keys to see if anything changes, and then we revise.
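On the hash-cost question above: String.hashCode is essentially the loop below, so the first computation costs O(length). Note, though, that the real method caches the result in a field, so repeated calls on the same String are O(1):

// What java.lang.String.hashCode essentially computes; the real method
// caches the result after the first call, so the O(length) cost is paid once.
static int stringHash(String s) {
    int h = 0;
    for (int i = 0; i < s.length(); i++) {
        h = 31 * h + s.charAt(i);
    }
    return h;
}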
