3

I made a median filter algorithm and I want to optimize it. Currently it's taking around 1 second to filter 2MM lines(a file read into an ArrayList elements) and I am trying to reduce it to less(maybe half the time?) I'm using ArrayLists for my algorithm and minimized the use of nested loops as well to avoid an increase in time, however I still can't achieve lower than 0.98 seconds tops.

Here's a code snippet that does the median filter:

//Start Filter Algorithm 2
        int index=0;
        while(index<filterSize){
            tempElements.add(this.elements.get(index+counter)); //Add element to a temporary arraylist
            index+=1;
            if(index==filterSize){
                outputElements.add(tempElements.get((filterSize-1)/2)); //Add median Value to output ArrayList
                tempElements.clear(); //Clear temporary ArrayList
                index = 0; //Reset index
                counter+=1; //Counter increments by 1 to move to start on next element in elements ArrayList                    
            }
            if(elementsSize-counter <filterSize){
                break; //Break if there is not enough elements for the filtering to work
            }
        }

What's happening is that I'm looping through the elements arraylist for the filterSize I provided. Then I add the elements to a temporary(tempElements) arraylist, sort it using Collections.sort()(this is what I want to avoid), find the median value and add it to my final output arraylist. Then I clear the tempElements arraylist and keep going through my loop until I cannot filter anymore due to the lack of elements(less than filterSize).

I'm just looking for a way to optimize it and get it faster. I tried to use a TreeSet but I cannot get the value at an index from it.

Thanks

2 Answers 2

9

The Java Collections.sort() implementation is as fast as it gets when it comes to sorting (dual pivot quick sort).

The problem here is not in the nitty gritty details but the fact that you are sorting at all! You only need to find the median and there are linear algorithms for that (sorting is log-linear). See selection for some inspiration. You might need to code it yourself since I don't think the java library has any public implementation available.

The other thing I suggest is to use a fixed size array (created once) instead of an ArrayList. Since you know the size of the filter beforehand that will give you a small speed boost.

Also I don't see how avoiding for loops helps performance in any way. Unless you profiled it and proved that it's the right thing to do, I would just write the most readable code possible.

Finally, TreeSet or any other kind of sorted data structure won't help either because the time complexity is log-linear for n insertions.

Sign up to request clarification or add additional context in comments.

Comments

2

As an alternative to Giovanni Botta's excellent answer:

Suppose that you have an array [7, 3, 8, 4, 6, 6, 2, 4, 6] and a filterSize of 4. Then our first temp array will be [7, 3, 8, 4] and we can sort it to get [3, 4, 7, 8]. When we compute our next temporary array, we can do it in linear (or better?) time as follows:

  1. remove 7
  2. insert 6 in sorted position

We can repeat this for all temp arrays after the initial sort. If you're spending a lot of time sorting subarrays, this might not be a bad way to go. The trick is that it increases required storage since you need to remember the order in which to remove the entries, but that shouldn't be a big deal (I wouldn't think).

2 Comments

Hard to say which one is optimal, this one or the selection based one. In this case you are doing a total of n between comparisons and assignments per iteration; you could write selection to only do one assignment (by keeping track of the first element to remove, as in a circular buffer), then you do at most n between comparisons and assignments. I think this is a case where profiling and a lot of test data will tell you which algorithm is faster.
I shall do both algorithms and report back which one is faster. I have a good set of data to test the algorithms. Thanks a lot for the help

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.