This illustrates quite well that, in general, it is important to read a dataset's documentation and understand its overall context. Here, it seems you're talking about the "ASL Alphabet" dataset available on Kaggle, which provides 87,000 images.
For one, according to its documentation, the dataset was collected from just one adult. So in the first place, any generalization from this data could be difficult, as hands and fingers can vary quite a bit between people, e.g. in size, mobility, color, wrinkles, scars, etc. I visually inspected some of the images, and many of them were so similar that in some instances I wondered whether I was looking at the same image cropped differently or artificially darkened or blurred. For instance, here are the first 10 images from the dataset for the letter "A":

You can see that they are extremely similar; for some of them I can't even tell the difference with the naked eye. So I don't find it very surprising that you got high accuracy when testing your model on data originating from the same dataset, but it's unlikely to generalize well.
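If you want to reproduce this visual check yourself, a minimal sketch along these lines should do it (the `asl_alphabet_train/A` folder path and the `A1.jpg`, `A2.jpg`, ... file naming are assumptions about how the extracted Kaggle archive is laid out; adjust them to your local copy):

```python
import matplotlib.pyplot as plt
from PIL import Image

# Assumed local path to the extracted letter "A" folder of the Kaggle dataset
DATA_DIR = "asl_alphabet_train/A"

fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.flat, start=1):
    # Assumed file naming: A1.jpg, A2.jpg, ...
    img = Image.open(f"{DATA_DIR}/A{i}.jpg")
    ax.imshow(img)
    ax.set_title(f"A{i}")
    ax.axis("off")
plt.tight_layout()
plt.show()
```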
To explore my first impression a bit more systematically, I randomly sampled 500 images for the letter "A" and compared them to each other automatically, using their structural similarity index (SSIM) as the measure of similarity. I then generated the following (quick and dirty) heatmap, where the maximum value "1" means that the two compared images are identical. The horizontal and vertical axes represent the identifiers of the images, sorted in order from "A4" to "A2995" (only some of the identifiers appear on the heatmap, but it is really a 500x500 table):

While there are no perfect matches among the randomly sampled images, at first glance it looks like there are perhaps 4 main clusters of quite similar images, with some variation inside each cluster. This hints at a systematic lack of diversity in the dataset. Depending on your use case, this is something you may want to investigate further, perhaps by testing other letters, using other similarity metrics, or using other methods for detecting clusters.
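In case you want to run a similar analysis yourself, here is a rough sketch of how the pairwise comparison could be done with scikit-image's `structural_similarity`. The folder path, file naming, and the total of 3,000 images per letter are assumptions based on the dataset as I downloaded it; I also convert to grayscale and downscale purely to keep the 500x500 comparison fast, which your own run may or may not need:

```python
import random
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from skimage.metrics import structural_similarity as ssim

DATA_DIR = "asl_alphabet_train/A"   # assumed path to the letter "A" folder
N_SAMPLES = 500                     # a smaller number is fine for a quick test

# Assumed file naming: A1.jpg ... A3000.jpg
ids = sorted(random.sample(range(1, 3001), N_SAMPLES))

# Load each image as a small grayscale array to keep the comparison cheap
images = [
    np.asarray(Image.open(f"{DATA_DIR}/A{i}.jpg").convert("L").resize((64, 64)))
    for i in ids
]

# Pairwise SSIM matrix: 1.0 on the diagonal (an image compared with itself)
sim = np.ones((N_SAMPLES, N_SAMPLES))
for a in range(N_SAMPLES):
    for b in range(a + 1, N_SAMPLES):
        s = ssim(images[a], images[b], data_range=255)
        sim[a, b] = sim[b, a] = s

plt.figure(figsize=(8, 7))
plt.imshow(sim, vmin=0, vmax=1, cmap="viridis")
plt.colorbar(label="SSIM")
plt.title('Pairwise SSIM for 500 random "A" images')
plt.show()
```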
Secondly, it seems that the dataset contains incorrect data, with some letters not coming from American Sign Language (ASL), as stated in this related Kaggle thread:
[...] it looks like several of the letters in the data set are not ASL. Some, like M and N, appear to be Italian Sign Language. Others, like T, I'm not sure what language they come from, but it isn't ASL. Overall, G, K, T, M, N, and P are all not ASL.
If we want to check this for ourselves and visually compare the Kaggle images to other sources of information, we see that there is indeed a problem. For instance, here is how "T" is fingerspelled in the Kaggle dataset:

Compare it with the version from the American Society for Deaf Children:

I'm not an ASL practitioner or expert, and I can't say with 100% certainty which one is correct, even though I'd bet a lot of money that it is not the American Society for Deaf Children that is wrong here. But in any case, there is some sort of disagreement between the two versions, which makes it harder to model the problem correctly with the data you have.
If you're interested in solving this fingerspelling recognition problem, discussing the issue with sign language experts and practitioners would almost certainly be fruitful. For instance, a quick online search will teach you that people with some physical limitations may fingerspell a bit differently from other people. So discussing the problem in depth with experts will give you a good idea of how to model it correctly and what kind of data you need for that. In particular, note that a person on the Kaggle forum says that still images are not suitable for this kind of task, so you might find that you have to change your approach altogether, depending on what you ultimately want to do with your model.