
I am looking for a way to compare two files (especially large files) in S3, within the same bucket, using the AWS SDK for Java.

I do not need to scan the whole bucket for duplicates; as I understand it, Athena would be a good fit for finding all duplicates in a bucket. I only need to compare two specific files (objects) in S3 and nothing else.

Is there a better way than downloading the data locally? I know I can verify the MD5 hashes, but even if they match I would still need to download both files and compare them to confirm they are really identical. Downloading two large files from S3 just for that is quite inefficient.
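
For reference, a minimal sketch of the metadata-only check, assuming the AWS SDK for Java v2 and an already configured S3Client (class and method names of the helper are illustrative). It only issues HEAD requests, so nothing is downloaded; note that the ETag equals the content MD5 only for single-part, non-KMS-encrypted uploads, so a match is strong evidence rather than proof:

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.HeadObjectRequest;
import software.amazon.awssdk.services.s3.model.HeadObjectResponse;

public class S3MetadataCompare {

    // Compares two objects' server-side metadata without downloading them.
    // ETag == content MD5 only for non-multipart, non-SSE-KMS uploads,
    // so treat a match as "very likely identical", not a guarantee.
    public static boolean sameByMetadata(S3Client s3, String bucket, String keyA, String keyB) {
        HeadObjectResponse a = s3.headObject(
                HeadObjectRequest.builder().bucket(bucket).key(keyA).build());
        HeadObjectResponse b = s3.headObject(
                HeadObjectRequest.builder().bucket(bucket).key(keyB).build());

        // Different sizes -> definitely different content.
        if (!a.contentLength().equals(b.contentLength())) {
            return false;
        }
        // Same size and same ETag -> very likely identical.
        return a.eTag().equals(b.eTag());
    }
}
```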

  • Are you worried about an attacker creating 2 files with the same checksum, or are you only worried about random collisions? Commented Apr 17 at 10:34
  • I would suggest two options: 1) compare the ETag/MD5 checksums, 2) compare sizes first, then use a range-based comparison (a sketch of this follows below). Commented Apr 21 at 4:34
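
A minimal sketch of that second option, again assuming the AWS SDK for Java v2 (the class, helper names, and chunk size are illustrative): compare the sizes from HEAD requests first, then stream both objects with ranged GETs and stop at the first differing chunk, so nothing is written to local disk and non-identical files are detected early:

```java
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;
import software.amazon.awssdk.services.s3.model.HeadObjectRequest;

import java.io.IOException;
import java.util.Arrays;

public class S3RangeCompare {

    private static final int CHUNK = 8 * 1024 * 1024; // 8 MiB per ranged GET

    // Streams both objects chunk by chunk in memory and compares bytes,
    // returning false at the first mismatch. No temp files are created.
    public static boolean sameByContent(S3Client s3, String bucket, String keyA, String keyB)
            throws IOException {
        long sizeA = s3.headObject(
                HeadObjectRequest.builder().bucket(bucket).key(keyA).build()).contentLength();
        long sizeB = s3.headObject(
                HeadObjectRequest.builder().bucket(bucket).key(keyB).build()).contentLength();
        if (sizeA != sizeB) {
            return false; // different sizes, no bytes need to be read
        }

        for (long offset = 0; offset < sizeA; offset += CHUNK) {
            long end = Math.min(offset + CHUNK, sizeA) - 1;
            byte[] chunkA = readRange(s3, bucket, keyA, offset, end);
            byte[] chunkB = readRange(s3, bucket, keyB, offset, end);
            if (!Arrays.equals(chunkA, chunkB)) {
                return false;
            }
        }
        return true;
    }

    private static byte[] readRange(S3Client s3, String bucket, String key, long start, long end)
            throws IOException {
        GetObjectRequest req = GetObjectRequest.builder()
                .bucket(bucket)
                .key(key)
                .range("bytes=" + start + "-" + end) // HTTP Range header
                .build();
        try (ResponseInputStream<GetObjectResponse> in = s3.getObject(req)) {
            return in.readAllBytes();
        }
    }
}
```

In the worst case (the files really are identical) this still transfers both objects' full contents over the network, so the cheap checksum/size check is the sensible first step and the ranged comparison is the fallback when the ETags are inconclusive.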
