The drawbacks of classification accuracy as a measure for evaluating classification models have been discussed at length on Cross Validated. One good answer is here, for instance.
Under what conditions would classification accuracy be the correct performance measure?
Two interesting examples can be found here and here. Can the logic behind them be fleshed out?