ShipEasy

Case studies

Real scenarios — when to reach for a gate vs config vs killswitch vs experiment.

Each case below picks one primitive, names a real-feeling scenario, and shows the code. Use them as templates when you're stuck choosing.

Roll out a redesigned checkout

You finished the checkout rewrite. It passes QA, the team is convinced. Now you have to ship it to real money-spending users without redeploying twelve times and without one rare browser quirk torching your conversion rate for a day before you notice.

The choice: ship the rewrite progressively without a redeploy each ramp step. The primitive: Gate. Why not experiment? You already know v2 is the future. You just don't want to break carts on the way. No statistical question — you're ramping for safety, not for an answer.

A gate gives you two knobs: a targeting rule (who's eligible at all) and a rollout percentage (of the eligible, how many actually get the new path). Target safe geographies first — countries where you control the payment integration end-to-end — then ramp the percentage as your dashboards stay green. Each step is a dashboard click, not a deploy.

import { gate } from "@shipeasy/sdk/server";
if (await gate("checkout-v2", { userId, country })) {
  return renderCheckoutV2();
}
return renderCheckoutV1();

# Day 1
shipeasy gate update checkout-v2 --rule "country IN ['US','CA']" --rollout 5
# Day 3
shipeasy gate update checkout-v2 --rollout 25
# Day 5
shipeasy gate update checkout-v2 --rollout 100
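
How the two knobs combine can be sketched in a few lines. This is a minimal illustration, not ShipEasy's actual evaluation logic: the `GateDef` shape, the `evaluateGate` function, and the FNV-1a bucketing are all assumptions made for the example. The rule decides eligibility, and a deterministic hash of the gate name plus user ID decides whether an eligible user falls under the rollout percentage.

```typescript
// Hypothetical shapes for illustration; not the ShipEasy SDK's real types.
interface GateDef {
  name: string;
  countries: string[]; // targeting rule: eligible countries
  rolloutPct: number;  // 0..100, share of eligible users who pass
}

interface Ctx {
  userId: string;
  country: string;
}

// FNV-1a 32-bit hash: deterministic, so a user lands in the same bucket every time.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Bucket in [0, 100), salted with the gate name so gates bucket independently.
function bucket(gateName: string, userId: string): number {
  return fnv1a(`${gateName}:${userId}`) % 100;
}

function evaluateGate(gate: GateDef, ctx: Ctx): boolean {
  // Knob 1: the targeting rule decides who is eligible at all.
  if (!gate.countries.includes(ctx.country)) return false;
  // Knob 2: the rollout percentage decides which eligible users pass.
  return bucket(gate.name, ctx.userId) < gate.rolloutPct;
}
```

Because the bucket is fixed per user, raising `rolloutPct` from 5 to 25 only adds users; nobody who already had the new checkout is flipped back to the old one.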

Change homepage hero copy

Marketing wants to swap the hero headline on Tuesday. Engineering does not want to ship a build on Tuesday. Solve once: read the string from a config, let marketing edit it in the dashboard, never touch the deploy pipeline again.

The primitive: Config. Why not gate? Gates are boolean. The thing you actually want to change is the string.

import { config } from "@shipeasy/sdk/server";

export default async function Home() {
  const title = await config<string>("home.hero.title", { default: "Ship faster." });
  const cta = await config<string>("home.hero.cta", { default: "Get started" });
  return (
    <section>
      <h1>{title}</h1>
      <Button>{cta}</Button>
    </section>
  );
}

Always pass a default — it's what renders if the SDK can't reach KV, and it's the value your local dev environment will read until you've created the config. Treat the defaults as the source-of-truth copy; the dashboard value as the override.

If the copy needs translation, don't store strings in a config — use the i18n module. Configs are for values that vary across environments or experiments; translations are for values that vary across locales. Different problem, different tool.

Pause outbound emails during incident

3am. The transactional-email provider is having an outage and your retry queue is now sending the same "your order has shipped" email seven times. You need one switch to stop the bleeding while you call the vendor.

The primitive: Killswitch. Why not gate? A 3am on-call shouldn't have to think about rollout percentages. Killswitches have one switch — on or off — and they're audited so you have a paper trail of who flipped what when.

import { gate } from "@shipeasy/sdk/server";

export async function sendOrderEmail(order: Order) {
  if (!(await gate("emails-enabled", undefined, { defaultValue: true }))) {
    logger.warn("emails paused via killswitch", { orderId: order.id });
    return;
  }
  await mailer.send(order);
}

The defaultValue: true is load-bearing. If KV is unreachable during the same incident, you do not want the killswitch to silently default to false and amplify the outage by also halting emails. Fail open: emails keep flowing by default and stop only when someone explicitly flips the switch.
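
The fallback behaviour can be sketched without the SDK. The `FlagLookup` type and `gateWithDefault` function below are hypothetical stand-ins, not ShipEasy's implementation; they only illustrate the difference between "the store says off" and "the store could not be reached".

```typescript
// Hypothetical stand-in for a flag lookup that can fail (e.g. the store is unreachable).
type FlagLookup = (name: string) => Promise<boolean>;

// Returns the stored value when the lookup succeeds, otherwise the caller's default.
async function gateWithDefault(
  lookup: FlagLookup,
  name: string,
  defaultValue: boolean,
): Promise<boolean> {
  try {
    return await lookup(name);
  } catch {
    // Lookup failed: fall back to the default rather than guessing "false".
    return defaultValue;
  }
}

// Simulated outage: every lookup rejects.
const unreachable: FlagLookup = async () => {
  throw new Error("store unreachable");
};

// Simulated healthy store where on-call has flipped the switch off.
const flippedOff: FlagLookup = async () => false;
```

With a default of `true`, `gateWithDefault(unreachable, "emails-enabled", true)` resolves to `true` and emails keep sending, while `gateWithDefault(flippedOff, "emails-enabled", true)` resolves to `false` because an explicit flip always wins over the default.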

Wire a webhook to the killswitch so when it flips, it pages the team and posts to your incident channel. The switch itself is one bit; the social signal that someone flipped it is what makes this an actual incident response tool instead of a forgotten config field.

Test a new paywall vs the old one

The old paywall converts. The new paywall might convert better — the copy is clearer, the price anchor is moved, your designer is sure of it. You can't ship the new one as-is because if it's worse you've thrown away real revenue while you waited to notice.

The primitive: Experiment. Why not gate? You don't know the answer yet. You need a confidence interval to decide.

import { experiment } from "@shipeasy/sdk/server";

const { variant, log } = await experiment("paywall-v2", { userId });

await log("paywall_viewed");
const paywall = variant === "treatment" ? <PaywallV2 /> : <PaywallV1 />;

// elsewhere, in the checkout success handler
await log("paywall_converted", { revenueCents: order.totalCents });

Set the primary metric to paywall_converted and the universe split to 50/50. Before launch, commit to a minimum detectable effect and the sample size it requires; let the experiment run until it reaches that sample size, then check whether the daily analysis page shows p<0.05. Don't peek; don't stop early on a "looking good" day-three reading, because stopping the first time p dips below 0.05 inflates the false-positive rate. The whole point of running an experiment instead of a gate is the discipline of waiting for the answer.
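
Committing to a minimum detectable effect before launch implies a sample size. Here is a rough sketch of that calculation for a conversion-rate metric, using the standard two-proportion normal approximation at a two-sided alpha of 0.05 and 80% power. The baseline rate and effect size in the example are made-up numbers, and ShipEasy's own analysis may use a different method.

```typescript
// z-scores for a two-sided alpha of 0.05 and 80% power.
const Z_ALPHA = 1.96;
const Z_BETA = 0.8416;

// Users needed per variant to detect an absolute change `mde` from `baseline`.
function sampleSizePerArm(baseline: number, mde: number): number {
  const p1 = baseline;
  const p2 = baseline + mde;
  // Sum of the two Bernoulli variances under the normal approximation.
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const n = ((Z_ALPHA + Z_BETA) ** 2 * variance) / mde ** 2;
  return Math.ceil(n);
}

// Illustrative numbers only: 10% baseline conversion, 2-point absolute lift.
const perArm = sampleSizePerArm(0.1, 0.02);
console.log(`users per variant: ${perArm}`);
```

With these inputs the requirement lands a little under 4,000 users per variant. Halving the effect you want to detect roughly quadruples it, which is why the effect size has to be fixed before launch rather than after a promising early reading.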

If the result comes back negative, ship variant: "control" to 100% (control is the existing paywall, so nothing new has to be built) and write a short doc about why the hypothesis was wrong. That doc is worth more than the experiment.

Open a beta to allow-listed accounts

A handful of design-partner customers want early access to a feature you're not ready to ramp to everyone. You don't want a deploy step every time you add or remove an account. You want a list you can edit.

The primitive: Gate. The trick: targeting rule on an account attribute, rollout fixed at 100% of the matched set.

import { gate } from "@shipeasy/sdk/server";

const ctx = {
  userId: session.user.id,
  account: { id: session.account.id, plan: session.account.plan, tier: session.account.tier },
};

if (await gate("new-analytics-dashboard", ctx)) {
  return <AnalyticsV2 />;
}
return <AnalyticsV1 />;

In the dashboard, write the targeting rule against an attribute you actually maintain (e.g. account.tier IN ['beta']) and set rollout to 100%. Adding a customer to the beta becomes "set their tier to beta in your own database" (which you probably already have a workflow for) rather than "redeploy with a new array literal." When you graduate the feature, drop the gate; don't leave it sitting at 100% forever as a dead conditional.
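
For intuition, a rule such as account.tier IN ['beta'] amounts to a dotted-path lookup into the context object followed by a membership check. The `InRule` shape and both functions below are hypothetical and written only for this illustration; the SDK's real rule engine is not shown in this page.

```typescript
// Hypothetical rule shape: "<dotted.path> IN [values]".
interface InRule {
  path: string;
  values: string[];
}

// Walk a dotted path ("account.tier") through a nested context object.
function resolvePath(ctx: Record<string, unknown>, path: string): unknown {
  let current: unknown = ctx;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// A missing or non-string attribute never matches.
function matchesInRule(rule: InRule, ctx: Record<string, unknown>): boolean {
  const value = resolvePath(ctx, rule.path);
  return typeof value === "string" && rule.values.includes(value);
}

const betaRule: InRule = { path: "account.tier", values: ["beta"] };
const partner = { userId: "u1", account: { id: "a1", plan: "pro", tier: "beta" } };
const regular = { userId: "u2", account: { id: "a2", plan: "pro", tier: "standard" } };

console.log(matchesInRule(betaRule, partner)); // true
console.log(matchesInRule(betaRule, regular)); // false
```

This is why the context passed to gate() has to carry the attribute the rule names: if account.tier is absent from the context, the rule cannot match and the customer silently stays on the old path.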

Tune search ranking weights

Your search has three signals — recency, relevance score, popularity — combined as a weighted sum. You want to find weights that maximise click-through rate. That's a parameter-search problem, not a boolean ship/no-ship.

The primitive: Experiment + Config (one config per variant). Why this combo? The experiment splits users between weight sets. The configs hold the weights — different configs per variant — so you can tune them without redeploying.

import { experiment, config } from "@shipeasy/sdk/server";

const { variant, log } = await experiment("search-weights-v3", { userId });
const weights = await config<{ recency: number; relevance: number; popularity: number }>(
  `search.weights.${variant}`,
);

const results = ranked(query, weights);
await log("search_results_shown", { hasResults: results.length > 0 });

Create three configs (search.weights.control, search.weights.a, search.weights.b) with three weight sets. Run the experiment with three variants named control, a, and b so the templated config key resolves for each one. Daily analysis tells you which variant has the highest click-through; the config tells you exactly what weights that variant used. Once you're done, copy the winning weights into search.weights.control.

The pattern generalises: any time you have a thing-with-knobs that you want to A/B, put the knobs in configs and the assignment in the experiment.
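
The ranked(query, weights) call in the snippet above is not defined in this page. Here is a minimal sketch of what a weighted-sum ranker might look like, assuming each document already carries the three signals normalised to the range 0 to 1. The `Doc` type and the `rankByWeights` name are hypothetical.

```typescript
interface Weights {
  recency: number;
  relevance: number;
  popularity: number;
}

// Hypothetical document shape with pre-computed, normalised signals.
interface Doc {
  id: string;
  recency: number;
  relevance: number;
  popularity: number;
}

// Weighted sum of the three signals.
function score(doc: Doc, w: Weights): number {
  return w.recency * doc.recency + w.relevance * doc.relevance + w.popularity * doc.popularity;
}

// Highest score first; copies the array so the caller's input is left untouched.
function rankByWeights(docs: Doc[], w: Weights): Doc[] {
  return [...docs].sort((a, b) => score(b, w) - score(a, w));
}

const docs: Doc[] = [
  { id: "fresh", recency: 0.9, relevance: 0.4, popularity: 0.2 },
  { id: "relevant", recency: 0.2, relevance: 0.9, popularity: 0.5 },
];
const favourRecency: Weights = { recency: 0.7, relevance: 0.2, popularity: 0.1 };
const favourRelevance: Weights = { recency: 0.1, relevance: 0.7, popularity: 0.2 };

console.log(rankByWeights(docs, favourRecency).map((d) => d.id));   // [ 'fresh', 'relevant' ]
console.log(rankByWeights(docs, favourRelevance).map((d) => d.id)); // [ 'relevant', 'fresh' ]
```

The same two documents swap places when only the weights change, which is exactly the behaviour each experiment variant is meant to exercise.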

More scenarios

The six cases below are deeper cuts of the same primitives — one per page in this section's sub-menu. Each shows a less obvious use of the tool, including the wiring that's easy to get wrong the first time.

Roll out by region progressively

You've got a new shipping integration that's been built region-by-region. You don't want a global ramp — you want to expand the country list as each region's integration is qualified, with the percentage fixed at 100% of the matched set.

The primitive: Gate. The mechanic: ramp by widening the targeting rule, not by raising the rollout percentage.

# Day 1 — domestic only
shipeasy flags update intl-shipping \
  --rule 'country IN ["US","CA"]' --rollout 100

# Week 2 — EU added after EU couriers qualified
shipeasy flags update intl-shipping \
  --rule 'country IN ["US","CA","GB","DE","FR","NL","IE"]' --rollout 100

# Week 4 — APAC added
shipeasy flags update intl-shipping \
  --rule 'country IN ["US","CA","GB","DE","FR","NL","IE","JP","AU","SG"]' --rollout 100

The code never changes:

import { gate } from "@shipeasy/sdk/server";

if (await gate("intl-shipping", { userId, country })) {
  return renderInternationalShipping();
}
return renderDomesticOnly();

Why not a percentage rollout? Because the failure mode is regional. A 25% global rollout would send a quarter of DE traffic through an integration that has only been qualified for the US; the percentage has no idea which regions are ready. Targeting by country gives you explicit, auditable control of which regions are live.

When the launch is complete, drop the rule entirely (--rule '') and leave the gate at 100% as a kill-switch lever for incidents.

Block stale mobile clients

A bug in v2.4.1 of your mobile app sends malformed payment payloads. v2.4.2 is fixed and forced via the app store, but old clients are stuck. You need server-side enforcement: refuse the bad path on the API for clients below v2.4.2.

The primitive: Gate with a semverGte rule on the client's reported version.

apps/api/checkout/route.ts
import { gate } from "@shipeasy/sdk/server";

export async function POST(req: Request) {
  const appVersion = req.headers.get("x-app-version") ?? "0.0.0";

  if (!(await gate("checkout-v2-eligible", { userId, appVersion }))) {
    return Response.json(
      { error: "upgrade_required", message: "Please update to the latest app." },
      { status: 426 },
    );
  }

  return processCheckout(req);
}

In the dashboard:

Rule: appVersion semverGte "2.4.2"
Rollout: 100%

A user on v2.4.1 fails the rule → falls out of eligibility → gets the upgrade message. A user on v2.5.3 passes → flows through. No code change is needed when v2.4.3 ships; the gate stays at semverGte 2.4.2 until you decide to raise the floor.

The trick that's easy to miss: semverGte is one of the operators listed in Targeting rules. Use it instead of gte on a string, because string comparison would mis-order 2.10.0 before 2.4.0.
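
The mis-ordering is easy to reproduce. The `semverGte` function below is a local sketch that borrows the rule's name for readability; it is not the SDK's implementation, and it ignores pre-release tags and build metadata.

```typescript
// Parse "major.minor.patch" into numbers; missing or non-numeric parts become 0.
function parseVersion(v: string): [number, number, number] {
  const parts = v.split(".").map((p) => Number.parseInt(p, 10));
  const at = (i: number): number => {
    const n = parts[i];
    return n !== undefined && Number.isFinite(n) ? n : 0;
  };
  return [at(0), at(1), at(2)];
}

// Compare component by component, numerically.
function semverGte(a: string, b: string): boolean {
  const [a1, a2, a3] = parseVersion(a);
  const [b1, b2, b3] = parseVersion(b);
  if (a1 !== b1) return a1 > b1;
  if (a2 !== b2) return a2 > b2;
  return a3 >= b3;
}

const newer: string = "2.10.0";
const floor: string = "2.4.2";

console.log(newer >= floor);                // false: as text, "1" sorts before "4"
console.log(semverGte(newer, floor));       // true: numerically, 10 > 4
console.log(semverGte("2.4.1", floor));     // false: below the floor
```

A plain string comparison would lock out every client from 2.10.0 onward, which is the opposite of what the gate is for.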

Change prices without a deploy

Marketing wants to A/B price points across geos, run a temporary promo, and roll back instantly if the new price tanks conversion. Engineering does not want to deploy four times this week.

The primitive: Config (typed number) with audit log.

lib/pricing.ts
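import { config } from "@shipeasy/sdk/server";

// Fallback prices in cents, keyed by country. The value here is illustrative; keep your public list prices in this map.
const defaults: Record<string, number> = { US: 1900 };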

export async function priceForCountry(country: string, currency: string) {
  const cents = await config<number>(`price.pro.${country}`, {
    default: defaults[country] ?? defaults.US,
  });
  return { cents, currency };
}

Now marketing edits the dashboard. Every change is logged with actor + timestamp + previous value, which matters for two reasons:

  1. Compliance. "Why was this customer charged $24 when our public price is $19?" → audit log shows the override window precisely.
  2. Rollback. A bad price change is one click — the audit log's revert button restores the previous value across all SDKs in under a second.
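
To make the audit-and-revert idea concrete, here is an in-memory sketch. ShipEasy's audit log is a hosted feature whose internals are not described here; the `AuditedStore` class and its method names are hypothetical and exist only to show why recording the previous value makes rollback a single operation.

```typescript
interface AuditEntry<T> {
  key: string;
  actor: string;
  at: Date;
  previous: T | undefined;
  next: T;
}

class AuditedStore<T> {
  private values = new Map<string, T>();
  readonly log: AuditEntry<T>[] = [];

  get(key: string): T | undefined {
    return this.values.get(key);
  }

  // Every write records who changed what, when, and what the value was before.
  set(key: string, next: T, actor: string): void {
    this.log.push({ key, actor, at: new Date(), previous: this.values.get(key), next });
    this.values.set(key, next);
  }

  // Restore the value the key held before its most recent change.
  // The revert is itself a logged write, so the trail stays complete.
  revertLast(key: string, actor: string): boolean {
    for (let i = this.log.length - 1; i >= 0; i--) {
      const entry = this.log[i];
      if (entry !== undefined && entry.key === key) {
        if (entry.previous === undefined) return false; // nothing to go back to
        this.set(key, entry.previous, actor);
        return true;
      }
    }
    return false;
  }
}

const prices = new AuditedStore<number>();
prices.set("price.pro.US", 1900, "alice");
prices.set("price.pro.US", 2400, "bob");
prices.revertLast("price.pro.US", "carol");
console.log(prices.get("price.pro.US")); // 1900
```

After the revert the log holds three entries, including the revert itself, so a later question about the $24 window can be answered from the log alone.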

What you do not do: store prices in your code as constants and ship deploys to change them. The audit story is invisible, the rollback story is "redeploy and pray," and the experiment story is "branch the code, ship, branch the code, ship."

For experiments on price (test $19 vs $24), wire the config into an experiment — see the search ranking case above for the experiment-config combo pattern.

Typed announcement banner

You want a site-wide announcement banner that ops can toggle on, with a structured payload (message, severity, link, expiry). You also want the SDK to refuse to render malformed payloads — so a typo in the dashboard JSON doesn't crash the page.

The primitive: Config with a Zod schema.

lib/banner.ts
import { config } from "@shipeasy/sdk/server";
import { z } from "zod";

const BannerSchema = z
  .object({
    enabled: z.boolean(),
    severity: z.enum(["info", "warn", "incident"]),
    message: z.string().max(180),
    linkHref: z.string().url().optional(),
    linkLabel: z.string().max(40).optional(),
    expiresAt: z.string().datetime().optional(),
  })
  .nullable();

export async function getBanner() {
  const raw = await config<unknown>("site.banner", { default: null });
  const parsed = BannerSchema.safeParse(raw);
  if (!parsed.success) {
    console.error("Banner config failed schema check", parsed.error.flatten());
    return null;
  }
  const banner = parsed.data;
  if (!banner?.enabled) return null;
  if (banner.expiresAt && Date.parse(banner.expiresAt) < Date.now()) return null;
  return banner;
}

The dashboard stores this as JSON:

{
  "enabled": true,
  "severity": "warn",
  "message": "Scheduled maintenance Saturday 02:00–04:00 UTC.",
  "linkHref": "https://status.example.com/maint-2026-05-18",
  "linkLabel": "Status page",
  "expiresAt": "2026-05-19T00:00:00Z"
}

The schema gives you two safety properties: a fat-fingered key in the dashboard doesn't render garbage (the schema check rejects it and the banner stays hidden), and expiresAt means ops can schedule the banner to disappear without needing to remember to turn it off.

Don't reach for a gate for this. A gate is a boolean — you'd then need a second config for the payload, and now two things must be in sync. One typed config is the smaller, more honest pattern.

Measure quarterly cumulative lift

You shipped 14 experiments last quarter. Each individually claimed a positive lift. You suspect they don't all add up — at least one of them is probably overcounting because users were already in another experiment's treatment. You need a number for the whole quarter's work.

The primitive: Experiment universe with a long-running holdout.

The mechanic: 10% of users in the universe never see any treatment for the entire quarter. Compare them to the other 90% to get the cumulative lift of everything you shipped.

shipeasy universes update product-decisions \
  --holdout-pct 10 \
  --holdout-window 90d

route handlers
import { experiment } from "@shipeasy/sdk/server";

// All your experiments live under this universe
const { variant, log } = await experiment(
  "paywall-v2",
  { userId },
  {
    universe: "product-decisions",
  },
);

await log("conversion");

At the end of the quarter, the universe page shows:

Holdout (10%):     baseline_metric = 0.182 conv
Treated (90%):     baseline_metric = 0.207 conv
Cumulative lift:   +13.7%  (95% CI [+9.1%, +18.4%])

That's the actual compound effect of everything you shipped, including all the experiments that "won" and the second-order interactions you can't measure per-experiment.

A few things to internalise:

  • Power drops. A 10% holdout removes 10% of every experiment's sample. Plan duration accordingly. See Edge cases — holdouts.
  • The holdout slice must be stable. Don't re-bucket between experiments — the universe-level holdout exists for exactly this reason.
  • Read the result honestly. If cumulative lift is flat while every individual experiment claimed a win, the wins were noise or the second-order effects cancelled.
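
Two pieces of the holdout mechanic are simple enough to sketch: the relative-lift arithmetic behind the figures above, and a holdout assignment that stays stable across experiments. The hashing scheme and the `inHoldout` name are assumptions for illustration, not ShipEasy's actual bucketing.

```typescript
// Relative lift of the treated group over the holdout.
function relativeLift(holdoutRate: number, treatedRate: number): number {
  return (treatedRate - holdoutRate) / holdoutRate;
}

// FNV-1a 32-bit hash: deterministic for a given input string.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Holdout membership is keyed on the universe, not on any single experiment,
// so a user's status is the same for every experiment in that universe.
function inHoldout(universe: string, userId: string, holdoutPct: number): boolean {
  return fnv1a(`${universe}:holdout:${userId}`) % 100 < holdoutPct;
}

const lift = relativeLift(0.182, 0.207);
console.log(`${(lift * 100).toFixed(1)}%`); // 13.7%
```

Because the experiment name never enters the hash, adding a fifteenth experiment to the universe cannot move anyone into or out of the holdout, which is the stability the second bullet asks for.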

CTR as a ratio metric

You want to measure click-through rate on a redesigned recommendation rail: clicks per impression shown. Naive instinct: compute clicks / impressions for each user, then average those per-user rates as a mean metric. Wrong.

The primitive: Ratio metric with the delta method for variance.

shipeasy metrics create rec_rail_ctr \
  --type ratio \
  --numerator-event rec_click \
  --denominator-event rec_impression

app/components/RecRail.tsx
"use client";
import { shipeasy } from "@shipeasy/sdk/client";
import { useEffect } from "react";

export function RecRail({ items, userId }) {
  useEffect(() => {
    items.forEach((item) =>
      shipeasy.track("rec_impression", { userId, itemId: item.id }),
    );
  }, [items, userId]);

  return items.map((item) => (
    <a
      href={item.href}
      key={item.id}
      onClick={() => shipeasy.track("rec_click", { userId, itemId: item.id })}
    >
      {item.title}
    </a>
  ));
}

Why not just compute clicks / impressions per user and use a mean metric? Because the math is wrong. A user with 10 impressions and 1 click contributes 0.1. A user with 1 impression and 1 click contributes 1.0. Averaged as equal data points they give 0.55, while the actual rate is 2 clicks over 11 impressions, about 0.18. Treating each user as an equal data point over-weights low-volume users and under-counts the high-volume ones.

The ratio metric uses the delta method — variance is computed jointly on numerator and denominator means, accounting for their covariance. The dashboard's p-values, CIs, and lift numbers are all honest under this. Mean-of-ratios is not.
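
The difference between the two estimators, and the delta-method variance itself, can be worked through on the two users from the example. This is a sketch of the textbook first-order delta method for a ratio of means; the function names are hypothetical, and it is not a claim about how the dashboard computes its numbers internally.

```typescript
interface UserCounts {
  clicks: number;
  impressions: number;
}

function mean(xs: number[]): number {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Sample covariance (n - 1 denominator); covariance(xs, xs) is the sample variance.
function covariance(xs: number[], ys: number[]): number {
  const mx = mean(xs);
  const my = mean(ys);
  let sum = 0;
  for (let i = 0; i < xs.length; i++) {
    sum += ((xs[i] ?? 0) - mx) * ((ys[i] ?? 0) - my);
  }
  return sum / (xs.length - 1);
}

// Pooled ratio: mean clicks over mean impressions, i.e. total clicks / total impressions.
function pooledRatio(users: UserCounts[]): number {
  return mean(users.map((u) => u.clicks)) / mean(users.map((u) => u.impressions));
}

// Mean of per-user ratios: the estimator the text warns against.
function meanOfRatios(users: UserCounts[]): number {
  return mean(users.map((u) => u.clicks / u.impressions));
}

// First-order delta-method variance of the pooled ratio:
// (Var(X)/my^2 - 2*mx*Cov(X,Y)/my^3 + mx^2*Var(Y)/my^4) / n
function deltaVariance(users: UserCounts[]): number {
  const x = users.map((u) => u.clicks);
  const y = users.map((u) => u.impressions);
  const mx = mean(x);
  const my = mean(y);
  const vx = covariance(x, x);
  const vy = covariance(y, y);
  const cxy = covariance(x, y);
  return (vx / my ** 2 - (2 * mx * cxy) / my ** 3 + (mx ** 2 * vy) / my ** 4) / users.length;
}

const twoUsers: UserCounts[] = [
  { clicks: 1, impressions: 10 },
  { clicks: 1, impressions: 1 },
];

console.log(pooledRatio(twoUsers).toFixed(2));  // 0.18
console.log(meanOfRatios(twoUsers).toFixed(2)); // 0.55
```

The covariance term is what a naive "variance of clicks divided by variance of impressions" calculation would miss: users who see more impressions usually click more, and ignoring that correlation misstates the interval.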

See Metrics — aggregation types for the formal treatment of why ratios need special handling.
