Reference architecture

    Stripe webhooks end-to-end — the five guarantees a production handler has to give you

    Signature verification. Idempotency. Ordering. Replay. Observability. The five things that turn a fragile POST /webhooks/stripe endpoint into one you can sleep through Black Friday on. With the SQL schema we ship and the TypeScript code that runs it.

    May 16, 2026 · 20 min read · By Ritesh

    Webhook bugs are silent and expensive

    Of the 30+ Stripe integrations we have either built or audited, the most common production incident is not the webhook handler going down; it is the webhook handler succeeding for the wrong reason. The usual forms are returning 200 OK before the event has been durably stored, double-processing a retry, or missing a critical event because of an upstream timeout. Each of these can sit undetected for weeks. By the time someone notices, the reconciliation cost is real.

    This post is structured around the five guarantees a production handler has to provide. Each guarantee gets its own section with the code and the database shape we use. After the five come the dead-letter pattern and a short production checklist.

    Guarantee 1 — Signature verification

    The webhook must verify the Stripe-Signature header before doing anything else. Without this, anyone who guesses your endpoint URL can post forged events and trigger the same side effects (account upgrades, refund issuing) that your real handler does. This is the easiest part to get right and the most consequential to get wrong.

    Stripe's SDK gives you stripe.webhooks.constructEvent, which takes the raw request body, the signature header, and your endpoint signing secret. Two non-obvious implementation details: the body must be passed to the verifier before any JSON parser sees it, and the endpoint signing secret is per-endpoint, not per-account. Test mode and live mode have separate secrets; staging and production also have separate endpoints with separate secrets.

    server/webhooks/stripe.ts — express endpoint
    typescript
    The express.raw middleware is the part teams miss. If express.json() runs first, the signature check fails because the bytes have been re-serialised.
    import express, { Request, Response } from "express";
    import Stripe from "stripe";
    
    const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!, { apiVersion: "2024-12-18.acacia" });
    const endpointSecret = process.env.STRIPE_WEBHOOK_SECRET!;
    
    export const stripeWebhookRouter = express.Router();
    
    stripeWebhookRouter.post(
      "/webhooks/stripe",
      express.raw({ type: "application/json" }),
      async (req: Request, res: Response) => {
        const sig = req.header("stripe-signature");
        if (!sig) return res.status(400).send("missing signature header");
    
        let event: Stripe.Event;
        try {
          event = stripe.webhooks.constructEvent(req.body, sig, endpointSecret);
        } catch (err) {
          console.error("stripe signature verification failed", err);
          return res.status(400).send("invalid signature");
        }
    
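        // From here on, we trust the event identity. The next four
        // guarantees take over.
        try {
          // enqueue() is the application's own queue producer (not shown here).
          await enqueue(event);
        } catch (err) {
          // Express 4 does not catch rejected promises from async handlers.
          // A non-2xx response makes Stripe retry, so the event is not lost.
          console.error("failed to enqueue stripe event", event.id, err);
          return res.status(500).send("enqueue failed");
        }
        res.status(200).json({ received: true });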
      },
    );
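
    For intuition about what constructEvent checks, here is a minimal sketch of Stripe's documented v1 signature scheme: an HMAC-SHA256 over the timestamp, a dot, and the raw body, compared in constant time. verifyStripeSignature and its parameter names are hypothetical and exist only for illustration, and the 300-second tolerance is an assumed default; production code should keep calling constructEvent.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper for illustration; not part of the Stripe SDK.
export function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  toleranceSeconds = 300, // assumed default
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  // The header looks like "t=1700000000,v1=<hex>[,v1=<hex>]".
  let timestamp = "";
  const candidates: string[] = [];
  for (const part of header.split(",")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestamp = value;
    if (key === "v1") candidates.push(value);
  }
  if (!/^\d+$/.test(timestamp) || candidates.length === 0) return false;

  // Reject old timestamps so a captured request cannot be replayed later.
  if (nowSeconds - Number(timestamp) > toleranceSeconds) return false;

  // The signed payload is the timestamp, a dot, and the exact raw body.
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest();

  // Constant-time comparison against every v1 candidate in the header.
  return candidates.some((hex) => {
    const given = Buffer.from(hex, "hex");
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

    Changing a single byte of the body, which a parse-and-reserialise round trip can easily do, produces a different HMAC. That is why the route above mounts express.raw ahead of any JSON parser.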

    Acknowledge the receipt with a 200 OK before processing the event. Stripe's timeout is 10 seconds; if you process inline and your processing takes longer, Stripe retries and you double-process. Acknowledge fast, then process from a queue. The full handler shape is what we ship by default on every payments integration through our API & integration engagement.
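
    The enqueue call in the handler is not shown in this post. As a rough sketch of the fast-ack shape only, the class below accepts jobs in constant time on the request path and runs the handler later from a worker. InMemoryQueue, drain, and maxAttempts are hypothetical names, and an in-memory array loses events if the process crashes, so a real deployment would presumably put a durable queue or a database table behind the same interface.

```typescript
type Job<T> = { payload: T; attempts: number };

// Hypothetical in-memory queue: illustrates the shape, not durability.
export class InMemoryQueue<T> {
  private jobs: Job<T>[] = [];
  private readonly handler: (payload: T) => Promise<void>;
  private readonly maxAttempts: number;

  constructor(handler: (payload: T) => Promise<void>, maxAttempts = 3) {
    this.handler = handler;
    this.maxAttempts = maxAttempts;
  }

  // Request path: a constant-time push, no handler work happens here.
  enqueue(payload: T): void {
    this.jobs.push({ payload, attempts: 0 });
  }

  get size(): number {
    return this.jobs.length;
  }

  // Worker path: run every queued job, retrying failures up to maxAttempts.
  // Returns the payloads that exhausted their attempts.
  async drain(): Promise<T[]> {
    const exhausted: T[] = [];
    let job: Job<T> | undefined;
    while ((job = this.jobs.shift()) !== undefined) {
      try {
        await this.handler(job.payload);
      } catch {
        job.attempts += 1;
        if (job.attempts < this.maxAttempts) this.jobs.push(job);
        else exhausted.push(job.payload);
      }
    }
    return exhausted;
  }
}
```

    The HTTP handler only ever calls enqueue, so its response time does not depend on how slow the downstream work is.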

    Guarantee 2 — Idempotency

    Stripe retries webhooks: if it does not get a 2xx back, you get the same event again, and not just once. In live mode the retries continue, with exponential backoff, for up to three days. Your handler has to be able to receive the same event 12 times and produce the same end state every time.

    The mechanism is a unique constraint on the event id. Insert the id into a processed_webhooks table before you do the work; if the insert fails with a unique-constraint violation, an earlier delivery has already claimed the event, so skip it. The pattern is the same one used by Stripe itself for idempotency keys on the API side.

    processed_webhooks — the idempotency table
    sql
    CREATE TABLE processed_webhooks (
      event_id      text          PRIMARY KEY,
      type          text          NOT NULL,
      received_at   timestamptz   NOT NULL DEFAULT now(),
      processed_at  timestamptz,
      status        text          NOT NULL DEFAULT 'queued', -- queued | done | failed
      attempts      int           NOT NULL DEFAULT 0,
      last_error    text,
      payload       jsonb         NOT NULL
    );
    
    CREATE INDEX processed_webhooks_status_idx
      ON processed_webhooks (status) WHERE status <> 'done';
    
    -- Optional: cron prune events older than ~30 days that succeeded.
    -- Keep failed events forever; they are the audit trail.

    server/webhooks/process.ts — the worker
    typescript
    Two distinct phases: the insert that establishes idempotency, and the actual handler that does the work. The handler is wrapped in a DB transaction so partial updates can't leak.
    export async function processWebhook(event: Stripe.Event) {
      // Phase 1: idempotency
      try {
        await db.query(`
          INSERT INTO processed_webhooks (event_id, type, payload)
          VALUES ($1, $2, $3)
        `, [event.id, event.type, event]);
      } catch (err: any) {
        if (err.code === "23505") {
          // unique_violation — we have seen this event before
          return { skipped: true, reason: "already_processed" };
        }
        throw err;
      }
    
      // Phase 2: actual handler
      try {
        await db.transaction(async (tx) => {
          await dispatch(tx, event);
        });
        await db.query(`
          UPDATE processed_webhooks SET status = 'done', processed_at = now()
          WHERE event_id = $1
        `, [event.id]);
      } catch (err: any) {
        await db.query(`
          UPDATE processed_webhooks
          SET status = 'failed', attempts = attempts + 1, last_error = $2
          WHERE event_id = $1
        `, [event.id, String(err.message ?? err)]);
        throw err;
      }
    }
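
    To see the two phases without a database, the sketch below swaps the processed_webhooks table for a Map whose key plays the role of the primary key. IdempotentProcessor is a hypothetical stand-in, not the worker itself; it only demonstrates that claiming the id first turns repeated deliveries into no-ops.

```typescript
type Status = "queued" | "done" | "failed";

export class IdempotentProcessor {
  // Stand-in for processed_webhooks: the Map key acts as the primary key.
  readonly rows = new Map<string, Status>();

  process(eventId: string, work: () => void): "processed" | "skipped" {
    // Phase 1: claim the id. A second claim is the unique_violation case.
    if (this.rows.has(eventId)) return "skipped";
    this.rows.set(eventId, "queued");

    // Phase 2: run the side effect and record the outcome.
    try {
      work();
      this.rows.set(eventId, "done");
      return "processed";
    } catch (err) {
      this.rows.set(eventId, "failed");
      throw err;
    }
  }
}

// Twelve deliveries of the same event produce one side effect.
const processor = new IdempotentProcessor();
let upgrades = 0;
const outcomes = Array.from({ length: 12 }, () =>
  processor.process("evt_123", () => { upgrades += 1; }),
);
console.log(upgrades, outcomes.filter((o) => o === "skipped").length);
```

    The last line prints 1 and 11. Note that a row left in failed also blocks later deliveries of the same id, which matches the worker above: a failed event waits for a human or a replay rather than being retried automatically.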

    Guarantee 3 — Ordering

    Stripe does not guarantee event order. The invoice.payment_succeeded webhook can arrive before the customer.subscription.updated event that caused the invoice, or vice versa. Handlers must not depend on order; they must look up the current state from the API or the database, not trust the event payload as the truth.

    The pattern: every handler reads the related resource fresh, either from Stripe via the API or from your own cached projection that was updated by an earlier event. Treat the webhook as a trigger to reconcile, not as the source of truth itself.

    server/webhooks/handlers/subscription.ts — order-independent
    typescript
    async function handleSubscriptionUpdated(
      tx: Tx,
      event: Stripe.Event,
    ) {
      const sub = event.data.object as Stripe.Subscription;
    
      // Don't trust the payload to be current — refetch.
      const current = await stripe.subscriptions.retrieve(sub.id, {
        expand: ["customer", "items.data.price.product"],
      });
    
      // Upsert our local mirror. Conflict resolution: the higher
      // updated_at wins — handles out-of-order delivery safely.
      await tx.query(`
        INSERT INTO subscriptions (
          stripe_id, status, current_period_end, plan_id, updated_at
        ) VALUES ($1, $2, to_timestamp($3), $4, to_timestamp($5))
        ON CONFLICT (stripe_id) DO UPDATE
          SET status = EXCLUDED.status,
              current_period_end = EXCLUDED.current_period_end,
              plan_id = EXCLUDED.plan_id,
              updated_at = EXCLUDED.updated_at
          WHERE subscriptions.updated_at < EXCLUDED.updated_at
      `, [
        current.id,
        current.status,
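        // On the API version pinned above (2024-12-18.acacia) the period end is a
        // subscription-level field; from 2025-03-31.basil it lives on each item.
        current.current_period_end,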
        current.items.data[0]?.price.id,
        Math.floor(Date.now() / 1000),
      ]);
    }
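
    The WHERE clause in the upsert is what makes a late arrival harmless. As an illustration, the same rule can be written as a pure function over an in-memory mirror; SubscriptionRow and upsertIfNewer are hypothetical names, and the numeric updatedAt values are arbitrary.

```typescript
type SubscriptionRow = { stripeId: string; status: string; updatedAt: number };

// Mirrors "ON CONFLICT ... DO UPDATE ... WHERE existing.updated_at < incoming.updated_at".
// Returns true when the incoming row was written.
export function upsertIfNewer(
  mirror: Map<string, SubscriptionRow>,
  incoming: SubscriptionRow,
): boolean {
  const existing = mirror.get(incoming.stripeId);
  if (existing && existing.updatedAt >= incoming.updatedAt) return false;
  mirror.set(incoming.stripeId, incoming);
  return true;
}

const mirror = new Map<string, SubscriptionRow>();
// The newer snapshot happens to arrive first and is applied.
const first = upsertIfNewer(mirror, { stripeId: "sub_1", status: "active", updatedAt: 200 });
// The older snapshot arrives late and is ignored.
const second = upsertIfNewer(mirror, { stripeId: "sub_1", status: "trialing", updatedAt: 100 });
```

    After both calls the mirror still holds the active row, whichever order the two snapshots arrive in.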

    Guarantee 4 — Replay

    Sometimes a webhook handler had a bug, processed events wrongly, and now you need to re-process a window of events. Stripe lets you re-send any individual event from the dashboard, but for a real recovery you want an admin endpoint that re-runs your own handler against a window of events, in order.

    server/admin/replay.ts — bounded replay
    typescript
    Admin-only. Rate-limited. The replay re-dispatches via the same processWebhook path, so all the idempotency and observability you already have applies.
    export const replayRouter = express.Router();
    replayRouter.use(
      requireAdminAuth,
      rateLimit({ max: 1, windowMs: 60_000 }),
      express.json(), // req.body below is undefined unless a JSON parser runs on this router
    );
    
    replayRouter.post("/admin/stripe/replay", async (req, res) => {
      const { from, to, types } = req.body as {
        from: string;            // ISO date
        to: string;
        types?: string[];        // optional filter
      };
    
      // 1. Pull events from Stripe in the requested window
      let starting_after: string | undefined;
      const events: Stripe.Event[] = [];
      do {
        const page = await stripe.events.list({
          created: {
            gte: Math.floor(new Date(from).getTime() / 1000),
            lte: Math.floor(new Date(to).getTime() / 1000),
          },
          type: types?.length === 1 ? types[0] : undefined,
          limit: 100,
          starting_after,
        });
        events.push(...page.data);
        starting_after = page.has_more ? page.data[page.data.length - 1].id : undefined;
      } while (starting_after);
    
      // 2. Optional type filter
      const filtered = types && types.length > 1
        ? events.filter((e) => types.includes(e.type))
        : events;
    
      // 3. Process oldest first (events.list returns newest-first by default)
      filtered.reverse();
    
      // 4. Mark them for re-processing: bypass the idempotency table by
      //    deleting prior records first, then run the normal path.
      for (const event of filtered) {
        await db.query("DELETE FROM processed_webhooks WHERE event_id = $1", [event.id]);
        await enqueue(event);
      }
    
      res.json({ queued: filtered.length });
    });
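
    The endpoint above passes from and to straight into new Date, so a malformed date becomes NaN in the Stripe query. A small sketch of two helpers that could sit in front of it follows; toUnixSeconds and orderForReplay are hypothetical names, and sorting on created is one option among several, chosen here so the ordering does not depend on how the pages were concatenated.

```typescript
type EventLike = { id: string; type: string; created: number };

// Convert an ISO date string to Unix seconds, rejecting unparseable input.
export function toUnixSeconds(iso: string): number {
  const ms = Date.parse(iso);
  if (Number.isNaN(ms)) throw new RangeError(`invalid ISO date: ${iso}`);
  return Math.floor(ms / 1000);
}

// Apply the optional type filter and return a new array, oldest event first.
export function orderForReplay<E extends EventLike>(
  events: readonly E[],
  types?: readonly string[],
): E[] {
  const wanted = types && types.length > 0 ? new Set(types) : undefined;
  return events
    .filter((e) => !wanted || wanted.has(e.type)) // filter() copies, so the input is untouched
    .sort((a, b) => a.created - b.created);
}
```

    Because filter returns a fresh array, the caller's list is left in its original order, which makes the helper safe to call more than once on the same data.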

    Guarantee 5 — Observability

    You need three things visible at all times: (a) the lag between when an event was emitted and when your handler ran, (b) the failure rate per event type, and (c) the backlog of events sitting in status = 'queued' or status = 'failed'.

    We export these on a Prometheus / OpenTelemetry endpoint that the alerting stack consumes, together with a received counter so the failure count can be read as a rate. The names we use, exactly as we ship them:

    server/webhooks/metrics.ts
    typescript
    import { Counter, Histogram, Gauge } from "prom-client";
    
    export const webhookReceived = new Counter({
      name: "stripe_webhook_received_total",
      help: "Stripe webhooks received, by event type",
      labelNames: ["type"],
    });
    
    export const webhookLagSeconds = new Histogram({
      name: "stripe_webhook_lag_seconds",
      help: "Time from Stripe event.created to handler completion",
      labelNames: ["type"],
      buckets: [0.5, 1, 2, 5, 10, 30, 60, 300, 900],
    });
    
    export const webhookFailures = new Counter({
      name: "stripe_webhook_failures_total",
      help: "Webhook events that exited with status=failed",
      labelNames: ["type"],
    });
    
    export const webhookBacklog = new Gauge({
      name: "stripe_webhook_backlog",
      help: "Count of processed_webhooks rows not yet done",
      labelNames: ["status"],
      async collect() {
        const rows = await db.query(
          "SELECT status, count(*)::int FROM processed_webhooks WHERE status <> 'done' GROUP BY 1"
        );
        this.reset();
        for (const r of rows.rows) this.set({ status: r.status }, r.count);
      },
    });
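
    The post does not show where the lag histogram is fed. One plausible wiring, sketched under that assumption, computes the lag from the event's created timestamp (Unix seconds) when the handler finishes and hands it to an injected observe callback. lagSeconds, recordLag, and the Observe type are hypothetical names.

```typescript
type Observe = (labels: { type: string }, seconds: number) => void;

// Lag in seconds between Stripe's event.created and handler completion.
// Clamped at zero so clock skew never yields a negative observation.
export function lagSeconds(eventCreatedUnix: number, completedAtMs: number): number {
  return Math.max(0, completedAtMs / 1000 - eventCreatedUnix);
}

// Record the lag for one event through whatever sink the caller supplies.
export function recordLag(
  event: { type: string; created: number },
  observe: Observe,
  nowMs: number = Date.now(),
): number {
  const lag = lagSeconds(event.created, nowMs);
  observe({ type: event.type }, lag);
  return lag;
}
```

    With the metrics file above, the callback would be something like (labels, s) => webhookLagSeconds.observe(labels, s), called at the end of the worker's success path.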

    The alerts we ship by default:

    • stripe_webhook_lag_seconds p99 > 60s for 5 minutes — the handler is keeping up but slowly.
    • stripe_webhook_failures_total increases by > 5 over 5 minutes — something is broken, page someone.
    • stripe_webhook_backlog{status="failed"} > 10 — events have stalled in the dead-letter; needs manual review.

    The dead-letter pattern

    Some events will fail in ways no retry helps with. The customer's underlying record was deleted; the plan was renamed to a value your code does not handle; the event type is one you never coded for. Dead-letter those events into a manual-review queue rather than losing them.

    The schema in Guarantee 2 already gives us this: rows with status = 'failed' are the dead-letter. Add a small admin UI (or a Linear / Slack integration) that surfaces these events, lets a human decide what to do, and re-queues or marks them as ignored. The 30+ Stripe integrations we maintain typically generate 0-2 dead-letter events per month per app — rare, but losing them silently is the disaster. The Slack-routing of dead-letters and the monthly review cadence are part of our maintenance retainer on the apps we keep ownership of post-launch.
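
    The two manual-review outcomes can be sketched as a pure state transition on a dead-letter row. This is an assumption-heavy illustration: triage, Row, and Action are hypothetical names, and the ignored status does not exist in the schema from Guarantee 2, which only lists queued, done, and failed.

```typescript
// "ignored" is an assumed extra status, not part of the schema shown earlier.
type Status = "queued" | "done" | "failed" | "ignored";
type Row = { eventId: string; status: Status; attempts: number; lastError: string | null };
type Action = "requeue" | "ignore";

// Apply a reviewer's decision to a dead-letter row and return the new row.
export function triage(row: Row, action: Action): Row {
  if (row.status !== "failed") {
    throw new Error(`event ${row.eventId} is not in the dead-letter (status=${row.status})`);
  }
  return action === "requeue"
    ? { ...row, status: "queued", lastError: null } // attempts is kept as history
    : { ...row, status: "ignored" };                // lastError is kept for the audit trail
}
```

    If a status like ignored were added, the backlog gauge's status <> 'done' filter would still count those rows, so the query and the partial index would need the same adjustment.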

    The 30-second production checklist

    The list we ship at the bottom of every Stripe webhook PR:

    1. Endpoint uses express.raw (or framework equivalent) so the body is bytes when the verifier sees it.
    2. STRIPE_WEBHOOK_SECRET is set per environment and rotates without redeploying user code.
    3. 200 OK is returned before processing begins; processing happens off the request path.
    4. Idempotency table has a primary-key constraint on event_id.
    5. Every handler refetches the resource from Stripe rather than trusting the payload.
    6. Replay admin endpoint exists and is rate-limited.
    7. Four metrics exported: received, lag, failures, backlog.
    8. Dead-letter queue is monitored; failures generate an alert, not just a log line.

    The full reference architecture ships by default on every SaaS we build that has paid subscriptions — see our SaaS web-app development engagement for the rest of the stack the webhook handler slots into.

    Frequently asked questions

    How do I make a Stripe webhook handler idempotent?
    Insert the Stripe event id into a processed_webhooks table with a unique-constraint primary key before doing any work. If the insert fails with a unique-violation, the event has already been processed, so skip it. The pattern is the same one Stripe uses for idempotency keys on the API side.

    Why does Stripe sometimes deliver webhooks out of order?
    Because Stripe does not guarantee event ordering: invoice.payment_succeeded can arrive before customer.subscription.updated, or vice versa. Handlers must not depend on order; they must refetch the related resource fresh from the API or from your own projection, and treat the webhook as a trigger to reconcile, not as the source of truth.

    Should I process a Stripe webhook synchronously?
    No. Acknowledge with 200 OK first, then process from a queue. Stripe's timeout is 10 seconds; if your processing takes longer Stripe retries and you double-process. Fast ack + async work is the safe shape and it lets you keep the verification logic on the HTTP path while the heavy lifting runs elsewhere.

    About the author

    Ritesh, Founding Partner, Appycodes

    Ritesh leads engineering at Appycodes. The reference architecture above is what we ship by default on every new Stripe integration — the idempotency table, the replay admin endpoint, the three metrics, and the dead-letter surfaced into the team's Slack. Across our 30+ Stripe integrations the pattern has now run for several years without a duplicate-charge or missed-event incident.

    Last reviewed: May 16, 2026
