Email Delivery Issues

Minor incident · Regions: Global, Switzerland, GDPR safe countries
2024-10-04 10:46 CEST · 2 hours, 29 minutes

Updates

Post-mortem

Incident Title

Email Delivery Issues

Incident Description

On October 3rd, 2024, the deployment of Zitadel Cloud version 2.62.4 introduced an error when sending emails for customers whose SMTP configuration had been created before Zitadel version 2.50.0. As a result, affected users could not use OTP via email or verify their email address.

Incident Timeline

  • 2024-10-03 12:32 UTC: Version 2.62.4 rolled out to all regions.
  • 2024-10-03 15:29 UTC: First customer contacted support.
  • 2024-10-04 07:55 UTC: Team convened for an internal video chat to assess the situation.
  • 2024-10-04 08:06 UTC: Second customer contacted support.
  • 2024-10-04 08:58 UTC: Rollback to version 2.61.3 for all regions completed.
  • 2024-10-04 08:00 UTC: Monitoring showed the situation improving.
  • 2024-10-04 10:18 UTC: Version 2.62.6 rolled out to all regions.
  • 2024-10-04 10:23 UTC: Monitoring showed the situation resolved.
  • 2024-10-04 14:49 UTC: Customers confirmed a return to normal functionality.

Impact of the Incident

Customers who had changed an SMTP configuration created before v2.50.0 were unable to send any email. This prevented their users from verifying their email address or using OTP via email.

Root Cause

The root cause was a code change (https://github.com/zitadel/zitadel/pull/8545) that added the ability to create HTTP email providers. As part of this change, the list of SMTP providers had to be rebuilt from scratch.
During that rebuild, SMTP configurations created before v2.50.0 were not handled correctly, which led to the observed errors.
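
The general failure mode can be illustrated with a small sketch: when a provider list is rebuilt under a schema that distinguishes provider kinds, records written under the older schema lack the new discriminator and need an explicit default. Every name here (`ProviderConfig`, `rebuild_providers`, the `kind` field) is hypothetical and chosen for illustration only; none of it is taken from the Zitadel codebase or the linked pull requests.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ProviderConfig:
    """Hypothetical unified record for an email provider."""
    id: str
    kind: str  # "smtp" or "http"
    host: str


def rebuild_providers(stored: List[Dict[str, Any]]) -> List[ProviderConfig]:
    """Rebuild the provider list from stored records (illustrative only)."""
    providers = []
    for record in stored:
        # Assumption: legacy records predate the "kind" field. Before HTTP
        # providers existed, every config was SMTP, so default to "smtp"
        # instead of failing on the missing key.
        kind = record.get("kind", "smtp")
        providers.append(
            ProviderConfig(id=record["id"], kind=kind, host=record["host"])
        )
    return providers
```

A test fixture that mixes old-format and new-format records would exercise exactly the path that went wrong in this kind of migration.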

Resolution Steps

The issue was resolved by rolling back the problematic release and by a fix in https://github.com/zitadel/zitadel/pull/8724.

Key Learnings

Enhanced Observability: The incident highlighted the need for improved monitoring and alerting mechanisms to detect spikes in 4xx errors, enabling faster incident identification and response.
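
As a rough illustration of the kind of alert this learning points to, the sketch below flags a spike when the share of 4xx responses in a recent window is above an absolute floor and several times the baseline share. The function names, the thresholds (`factor`, `min_rate`), and the input shape are assumptions made for this example; they do not describe Zitadel's actual monitoring setup.

```python
from typing import Sequence


def client_error_rate(status_codes: Sequence[int]) -> float:
    """Return the fraction of responses with a 4xx status code."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 400 <= code < 500)
    return errors / len(status_codes)


def is_4xx_spike(
    baseline: Sequence[int],
    recent: Sequence[int],
    factor: float = 3.0,
    min_rate: float = 0.05,
) -> bool:
    """Flag a spike when the recent 4xx rate is at least `min_rate`
    and at least `factor` times the baseline 4xx rate."""
    base_rate = client_error_rate(baseline)
    recent_rate = client_error_rate(recent)
    if recent_rate < min_rate:
        # Too few client errors to alert on, whatever the baseline was.
        return False
    return recent_rate >= factor * base_rate
```

The absolute floor keeps the check quiet when the baseline is near zero and a handful of 4xx responses would otherwise count as a large relative jump.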

October 7, 2024 · 15:05 CEST
Resolved

We are closing this incident and will provide a post-mortem in due course.

October 4, 2024 · 13:12 CEST
Monitoring

We rolled out a fix. Emails are being sent correctly again. We are monitoring the situation.

October 4, 2024 · 12:32 CEST
Update

We have pinpointed the root cause of the issue and are actively working on a resolution for Zitadel version 2.62.x.

In the interim, as a remediation measure, Zitadel Cloud has been rolled back to version 2.61.3 until the fix becomes available.

October 4, 2024 · 11:00 CEST
Investigating

We are currently experiencing intermittent issues with sending emails. Our team is actively investigating the root cause of this problem and working diligently to restore full email functionality. We apologize for any inconvenience this may cause and appreciate your patience during this time.

October 4, 2024 · 10:46 CEST
