Feb 8, 2024, 4:20 PM EST – An increase in error rate was observed.
Feb 8, 2024, 4:25 PM EST – Monitoring systems detected anomalies, prompting the RebelMouse team to initiate an investigation.
Feb 8, 2024, 5:00 PM EST – Error rates experienced a significant surge.
Feb 8, 2024, 5:16 PM EST – The RebelMouse team officially categorized the incident as Major and communicated it through the Status Portal.
Feb 8, 2024, 5:30 PM EST – The root cause was pinpointed: an inability to launch new instances within the EKS cluster.
Feb 8, 2024, 6:00 PM EST – The RebelMouse team rectified the issue by updating the network configuration and manually launching required instances to restore system performance.
Feb 8, 2024, 8:51 PM EST – RebelMouse initiated a support request regarding AWS services outage.
Feb 8, 2024, 9:10 PM EST – Systems reconfiguration was completed, and the team entered monitoring mode.
Feb 8, 2024, 10:10 PM EST – The incident was officially resolved.
Feb 10, 2024, 2:30 AM EST – AWS confirmed an issue with the EKS service in the us-east-1 region during the incident window and that services had been restored.
Multiple key RebelMouse services hosted in the AWS us-east-1 region were impacted, leading to partial unavailability.
The underlying cause, if known
The root cause of this problem was identified as a networking issue within AWS, specifically affecting the EKS service in the us-east-1 region. AWS acknowledged the issue, and its team actively worked on resolving it.
RebelMouse engineering teams were engaged as soon as the problem was identified. They worked diligently to resolve the issue as quickly as possible while keeping customers updated on the situation.
We recognize the importance of strengthening our strategies for handling networking issues of this kind. Going forward, we will mitigate these challenges by implementing more extensive caching and expanding our redundant caching capacity.