1) Confirm impact and scope
- What is broken: checkout, login, email, DNS, app deploy, device access?
- Who is affected: one user, a segment, or everyone?
- When did it start: “just now” or “since the last deploy”?
- Is it a total outage or a partial degradation?
2) Stop the bleeding (containment)
If the incident is actively worsening, your first goal is containment, not root-cause perfection.
- Pause new deploys/releases.
- Revert obvious breaking changes (DNS/SSL/payment config) when safe.
- Disable a failing integration temporarily (feature flag / toggle) if available.
- Preserve logs: don’t wipe evidence with repeated resets.
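
One way to keep evidence safe before anyone restarts a service is to copy the relevant log files into a timestamped folder first. The sketch below is only an illustration: the function name `snapshot_logs`, the `incident-<timestamp>` folder naming, and the paths in the usage note are assumptions, not part of any particular tool.

```python
import shutil
import time
from pathlib import Path


def snapshot_logs(log_paths, evidence_root):
    """Copy each existing log file into a new timestamped folder.

    Returns the folder that was created. Paths that do not exist are skipped.
    Files that share a base name would overwrite each other in this sketch.
    """
    # UTC timestamp so the folder name lines up with server-side log times.
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    dest = Path(evidence_root) / f"incident-{stamp}"
    dest.mkdir(parents=True, exist_ok=True)

    for src in map(Path, log_paths):
        if src.is_file():
            # copy2 also preserves the file's modification time.
            shutil.copy2(src, dest / src.name)
    return dest
```

A call such as `snapshot_logs(["/var/log/app/error.log"], "/tmp/evidence")` (both paths hypothetical) would run before a restart, so the pre-restart logs survive even if the service rotates or truncates them.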
3) Capture the signal (before it disappears)
- Exact error message(s) and timestamps.
- Screenshots of payment failures / browser console errors.
- Status pages and monitoring alerts (if any).
- Last known change: deploy, DNS update, SSL renewal, credential rotation.
4) Run quick “truth checks”
Use a few fast checks to avoid guessing. The goal is to decide whether this is a DNS/SSL problem, an app-logic bug, a third-party outage, or an issue limited to one device or network.
- Test from a second network/device (rules out local device issues).
- Check DNS/SSL visibility from more than one resolver.
- Confirm third-party status (payments, email, auth provider).
- Validate that recent changes or tokens were applied to the correct hostname.
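
As one example of a fast SSL truth check, the sketch below reads a site's certificate and reports how many days remain before it expires. It uses only Python's standard `ssl` and `socket` modules. The function names `fetch_not_after` and `days_until_expiry` are illustrative, and the hostname in the usage note is a placeholder. It covers the SSL side only; comparing DNS answers across resolvers needs a separate tool.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter time (negative if expired).

    not_after uses the format returned by getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def fetch_not_after(hostname, port=443, timeout=5.0):
    """Connect to hostname and return its certificate's notAfter string.

    The default context verifies the certificate, so an expired certificate
    or one issued for a different hostname raises
    ssl.SSLCertVerificationError. That error is itself a useful signal.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_until_expiry(fetch_not_after("shop.example.com"))` would give the remaining days for that placeholder hostname. Running it from a second network helps separate a certificate problem from a local one.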
5) Escalate with a clean packet
A ticket that opens with a structured summary gets resolved faster. Include: impact, start time, last change, and the evidence you captured.
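
A minimal sketch of such a packet, assuming plain text is acceptable to the support channel. The class name `IncidentPacket`, its field names, and the output layout are illustrative choices that mirror the four items listed above; they are not a required format.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentPacket:
    """The four items to include when escalating (hypothetical structure)."""
    impact: str
    start_time: str
    last_change: str
    evidence: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the packet as a short plain-text summary for a ticket."""
        lines = [
            f"Impact: {self.impact}",
            f"Start time: {self.start_time}",
            f"Last change: {self.last_change}",
            "Evidence:",
        ]
        # List each captured item, or state clearly that nothing was captured.
        lines += [f"- {item}" for item in self.evidence] or ["- (none captured)"]
        return "\n".join(lines)
```

Filling it in might look like `IncidentPacket("Checkout fails for all users", "10:42 UTC", "SSL renewal", ["console-error.png"]).render()`, where every value is made up for illustration. Pasting the result at the top of the ticket puts the key facts ahead of any narrative.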