Over the last two days we have experienced intermittent performance problems with our web-dashboard. At seemingly random intervals, CPU and memory consumption on our servers spiked. We have automated systems in place to assign more resources when such spikes occur, but even with the additional resources our servers were struggling. In the end, we had to restart the application process on our servers before they fully recovered.
This problem resurfaced a couple of times over the last two days. During these periods, customers experienced slower response times than usual, and for short periods the web-application was unresponsive. Some customers may have seen error messages when trying to load the web-dashboard.
We have identified the root cause of the problem and taken steps to make sure it will not resurface. We will also look into how we can improve the process for automatically adding resources to our servers, as in some cases this process introduced additional delays.