Posts Loading Issue
Incident Report for RebelMouse
Postmortem

Incident During Application Deployment 

During a routine application deployment, our team encountered two critical issues that affected the functionality of our application.

  1. During the deployment, we introduced a new field to the post model. After the deployment, we observed that some posts were not loading because the cached posts were not compatible with the modified model, which required an immediate update of the post cache version. We updated the post cache version to align with the modified model, and this restored the affected posts. As a team, we are reviewing our process so that no deployment that adds new fields or otherwise changes data storage goes out until we understand how to roll it out without causing an incident.
  2. Recovery took longer than it should have because a network connection error occurred during the deployment process and prevented the code from being deployed to one of our clusters. As a result, we could not restart the Celery processes, even though all other clusters were ready. We had to restart the deployment and wait for it to complete, which caused an unexpected delay in recovery.
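The second issue comes down to a readiness gate: Celery processes are restarted only after every cluster is running the new code, so a single cluster that fails to receive the deployment blocks the restart everywhere. The sketch below illustrates that gate under stated assumptions; `ClusterStatus`, `code_deployed`, `can_restart_celery`, and `pending_clusters` are hypothetical names, not RebelMouse's actual deployment tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterStatus:
    name: str
    # Hypothetical flag: True once the new code has finished deploying to this cluster.
    code_deployed: bool


def can_restart_celery(clusters: List[ClusterStatus]) -> bool:
    # Restart Celery only when every cluster runs the new code;
    # one failed cluster blocks the restart for all of them.
    return all(c.code_deployed for c in clusters)


def pending_clusters(clusters: List[ClusterStatus]) -> List[str]:
    # Names of clusters still waiting on the deployment (e.g. after a network error).
    return [c.name for c in clusters if not c.code_deployed]
```

With one cluster stuck, `can_restart_celery` stays `False` and `pending_clusters` names the cluster that still needs the deployment, which matches the situation described above.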

Immediate Actions Taken: For the issue with the post model field addition, we acted swiftly by updating the post cache version to ensure compatibility with the modified model. 
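Updating a cache version works when the version is part of every cache key: after a bump, entries serialized under the old post model are no longer read, and posts are reloaded and re-cached in the new shape. The following is a minimal sketch assuming a simple key-value cache; `POST_CACHE_VERSION`, `post_cache_key`, and the in-memory `_cache` dictionary are hypothetical stand-ins for whatever the platform actually uses.

```python
import json
from typing import Dict, Optional

# Hypothetical constant: bump whenever the serialized shape of a post changes.
POST_CACHE_VERSION = 2

# In-memory stand-in for a real cache backend.
_cache: Dict[str, str] = {}


def post_cache_key(post_id: int, version: int = POST_CACHE_VERSION) -> str:
    # The version is embedded in the key, so entries written under an
    # older version are simply never looked up again after a bump.
    return f"post:v{version}:{post_id}"


def cache_post(post: dict, version: int = POST_CACHE_VERSION) -> None:
    _cache[post_cache_key(post["id"], version)] = json.dumps(post)


def get_cached_post(post_id: int, version: int = POST_CACHE_VERSION) -> Optional[dict]:
    # A miss (None) means the caller should load the post from storage and re-cache it.
    raw = _cache.get(post_cache_key(post_id, version))
    return json.loads(raw) if raw is not None else None
```

For example, a post cached under version 1 (the old model) is still present in the cache, but a lookup under version 2 misses and falls through to a fresh load, so the stale entry is never served against the new model.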

Mitigation and Preventive Measures: Based on the incident analysis, we have already added a new test that blocks deployment if the post model is changed without a corresponding post cache modification. The network issue occurred randomly and, regrettably, coincided with the ongoing deployment process.
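One way such a deployment check could work is to record, for each post cache version, the set of post model fields it was built for, and to fail whenever the current fields differ from the record for the current version. The sketch below assumes that approach; the report does not describe how the real test is implemented, and `field_signature`, `post_cache_version_ok`, `RECORDED`, and the field names are hypothetical.

```python
from typing import Dict, List, Tuple

Signature = Tuple[str, ...]


def field_signature(fields: List[str]) -> Signature:
    # Order-insensitive: reordering fields is not treated as a schema change.
    return tuple(sorted(fields))


def post_cache_version_ok(fields: List[str], cache_version: int,
                          recorded: Dict[int, Signature]) -> bool:
    # True only if the current fields match what was recorded for this version.
    # A version with no record also fails, forcing the new signature to be recorded.
    return recorded.get(cache_version) == field_signature(fields)


# Hypothetical record, committed alongside the post model.
RECORDED: Dict[int, Signature] = {
    1: ("body", "id", "title"),
    2: ("body", "id", "new_field", "title"),
}


def test_post_cache_version_matches_model() -> None:
    # Run in CI before deploying; a failing assertion blocks the deployment.
    current_fields = ["id", "title", "body", "new_field"]  # hypothetical current model
    assert post_cache_version_ok(current_fields, 2, RECORDED), (
        "post model fields changed: bump the post cache version and record the new fields"
    )
```

Adding a field while leaving the version at 1 makes the check fail, which is the situation this incident's new test is meant to catch.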

Posted Sep 21, 2023 - 09:16 EDT

Resolved
This incident has been resolved.
Posted Sep 21, 2023 - 07:54 EDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Sep 21, 2023 - 07:33 EDT
Identified
The issue has been identified and a fix is being implemented.
Posted Sep 21, 2023 - 07:12 EDT
Investigating
We encountered a problem during a deployment that prevented some posts from loading. We are already working on a fix.
Posted Sep 21, 2023 - 07:08 EDT
This incident affected: Full Platform.