Incident Summary & Timeline
On January 20, 2025, starting at 14:47 CET, our platform experienced a major disruption when a malfunction in an AWS infrastructure process blocked auto-scaling, hardware rotation, and deployment tasks. Our monitoring and alerting systems worked as intended, notifying our on-call team promptly. However, because our usual rapid-response mechanisms were unavailable, the team performed manual intervention and a partial rebuild of key infrastructure components in pa...