Performance degradation for the logged-in experience
Incident Report for RebelMouse
Postmortem

A code update for the Sections API was deployed to the stage cluster to begin testing. Automated regression tests then ran overnight, as is standard in our code release procedure. For this particular API, the stage cluster shares Memcached servers with the production cluster. As a result, the test run wrote cache records in a new format that the production application could not read or process.
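
One common way to keep two environments, or two record formats, from colliding on shared cache servers is to put the environment name and a format version into every cache key. The sketch below illustrates that idea only; it is not the Sections API code. The key layout, the `cache_key` helper, the version numbers, and the plain dict standing in for Memcached are all assumptions made for illustration.

```python
import json

# Hypothetical format versions: production still reads v1, stage writes v2.
PRODUCTION_FORMAT_VERSION = 1
STAGE_FORMAT_VERSION = 2


def cache_key(environment, format_version, section_id):
    """Build a key that is unique per environment and per record format."""
    return f"{environment}:sections:v{format_version}:{section_id}"


# A plain dict stands in for the shared Memcached servers.
shared_cache = {}

# Stage writes a record in the new (hypothetical) format.
stage_key = cache_key("stage", STAGE_FORMAT_VERSION, 42)
shared_cache[stage_key] = json.dumps({"id": 42, "title": {"text": "News"}})

# Production looks up its own key and gets a cache miss
# instead of a record it cannot parse.
production_key = cache_key("production", PRODUCTION_FORMAT_VERSION, 42)
production_hit = shared_cache.get(production_key)
```

With keys built this way, a stage test run can fill the shared servers without production ever reading those entries, at the cost of a lower hit rate right after each format change.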

A member of the DevOps team identified the issue and stopped the tests. We then deployed an application-level change that purged the existing cache records so that the production application could recreate the cache data in the expected format.
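
As an illustration of purging and recreating cache data at the application level, the sketch below treats any record that does not match the expected format as a miss, deletes it, and rebuilds it. This is an assumed approach, not a description of the change that was actually deployed. The `read_section` function, the `rebuild` callback, the "title must be a string" check, and the dict standing in for Memcached are all hypothetical.

```python
import json


def read_section(cache, key, rebuild):
    """Return a cached record, purging and rebuilding it if it is unreadable."""
    raw = cache.get(key)
    if raw is not None:
        try:
            record = json.loads(raw)
            # Hypothetical format check: the expected format stores the title as a string.
            if isinstance(record, dict) and isinstance(record.get("title"), str):
                return record
        except (ValueError, TypeError):
            pass
        # The record exists but is not in the expected format: purge it.
        cache.pop(key, None)
    # Cache miss or purged record: rebuild from the source of truth and store it.
    record = rebuild()
    cache[key] = json.dumps(record)
    return record


# A plain dict stands in for Memcached; it holds one record in an unexpected format.
cache = {"sections:42": json.dumps({"id": 42, "title": {"text": "News"}})}
result = read_section(cache, "sections:42", lambda: {"id": 42, "title": "News"})
```

After the call, the incompatible entry has been replaced by one in the expected format, so later reads are served from the cache again.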

This is the first incident of this kind, and we are taking steps to prevent a recurrence. We will set up a dedicated Memcached cluster for our stage environment so that stage and production no longer share cache servers.

Posted Aug 16, 2023 - 23:40 EDT

Resolved
This incident has been resolved.
Posted Aug 16, 2023 - 21:19 EDT
Monitoring
The issue has been resolved. We are monitoring the solution and preparing a postmortem.
Posted Aug 16, 2023 - 20:51 EDT
Identified
We have identified the source of the issue and are working on a fix.
Posted Aug 16, 2023 - 20:19 EDT
Investigating
We are experiencing degradation of the editorial experience. We are investigating the cause and will fix it as soon as possible.
Posted Aug 16, 2023 - 19:45 EDT
This incident affected: AWS ec2-us-east-1.