I am doing time series forecasting with a neural network (feedforward for now, but I will also test RNNs). My problem is that, even though the network has learned the general patterns, it does not forecast sudden peaks in the data well. Example:

Blue is the real data and orange is the forecast. The data was standard scaled, and RMSE was used as the loss.
In my experiments, training with MAE gave worse results than training with RMSE, probably because squaring the errors makes RMSE penalize badly predicted high values much more heavily.
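To illustrate the squaring effect, here is a small sketch with made-up numbers (not my actual data). The arrays and the helper `peak_share` are hypothetical, chosen only to show how much of each loss comes from a single missed peak:

```python
import numpy as np

# Made-up standard-scaled series with one peak at index 2 (illustration only).
y_true = np.array([0.1, 0.2, 3.0, 0.1, 0.0])
# Forecast that follows the baseline roughly but misses the peak by 2.5.
y_pred = np.array([0.3, 0.0, 0.5, 0.3, 0.2])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))

def peak_share(per_sample_loss, peak_index):
    """Fraction of the summed loss contributed by one sample."""
    return per_sample_loss[peak_index] / per_sample_loss.sum()

share_abs = peak_share(np.abs(errors), 2)  # peak's share of the absolute error
share_sq = peak_share(errors ** 2, 2)      # peak's share of the squared error

print(f"MAE={mae:.3f}, RMSE={rmse:.3f}")
print(f"peak share: absolute={share_abs:.2f}, squared={share_sq:.2f}")
```

In this toy case the missed peak accounts for a clearly larger fraction of the squared error than of the absolute error, which would be consistent with the squared loss pushing harder on the peaks.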
My guess is that the network might be 'shy' about forecasting big peaks, because if it forecasts a high peak wrongly it gets a very high penalty.
Is there any technique that might help here, or a better metric (loss function) for this use case, to help forecast high peaks? Could giving higher weights to data points with high values help without hurting overall performance?
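To make the weighting idea concrete, this is roughly what I have in mind: a minimal NumPy sketch, not something I have validated. The function name `weighted_mse` and the parameter `alpha` are my own placeholders, and weighting by the absolute target value is just one possible choice:

```python
import numpy as np

def weighted_mse(y_true, y_pred, alpha=1.0):
    """Squared error where each sample's weight grows with |y_true|.

    alpha is a hypothetical tuning knob: alpha=0 recovers plain MSE,
    larger alpha puts more emphasis on samples far from zero.
    With standard-scaled data, "far from zero" means far from the mean,
    so deep troughs are upweighted as well as high peaks.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = 1.0 + alpha * np.abs(y_true)
    # Normalize by the weight sum so the loss stays on a comparable scale.
    return np.sum(weights * (y_true - y_pred) ** 2) / np.sum(weights)

# Toy comparison: the same error of 1.0, once on the peak and once on the baseline.
y = np.array([0.0, 0.0, 3.0, 0.0])
miss_peak = np.array([0.0, 0.0, 2.0, 0.0])
miss_base = np.array([1.0, 0.0, 3.0, 0.0])

print(weighted_mse(y, miss_peak), weighted_mse(y, miss_base))
```

With this weighting, an error of the same size costs more on the peak than on the baseline, whereas plain MSE treats the two cases identically. In a real training loop the same formula would have to be written with the framework's tensor operations so that gradients flow through it.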