ShipEasy Docs

Run your first A/B test from scratch — define a metric, create the experiment, ship variants, read results.

Tutorial · 8 min read · Updated May 3, 2026 · Works with Server SDK

This tutorial walks through a complete experiment end to end: a button-label test on the checkout CTA. We'll set up a conversion metric, create the experiment with two variants split 50/50, instrument the SDK, and read the results.

Your first experiment

~5 min setup · 24h to first stats

01 · Metric: define the conversion

$ shipeasy metrics create purchase_conversion --type conversion --event purchase

02 · Experiment: two groups, equal weight

$ shipeasy experiments create checkout-cta

03 · Wire: assign + track

$ shipeasy experiments start checkout-cta

Prerequisites

The SDK installed and configured. See Install and How it works.
A metric in mind. Here: did the user purchase after seeing checkout?

$ npm install @shipeasy/sdk

Define the conversion metric

A metric wraps an event type and an aggregation function. For "did this user buy?" we use conversion over the purchase event.

shipeasy metrics create purchase_conversion \
  --type conversion \
  --event purchase

Or in the dashboard: Experiments → Metrics → New metric.

Tip: also add guardrails — secondary metrics that should not regress. For checkout, page load time and error rate are good guardrails.

Create the experiment

Two groups, equal weight, one parameter (label):

shipeasy experiments create checkout-cta \
  --allocation 100 \
  --groups '[
    {"name":"control","weight":50,"params":{"label":"Pay"}},
    {"name":"v1","weight":50,"params":{"label":"Buy now"}}
  ]' \
  --params '{"label":"string"}'

--allocation 100 puts 100% of eligible users in the experiment, split 50/50 between the two groups. Lower it to, say, 20 if only a 20% sample of traffic should see any variant; the remaining 80% are assigned to neither group, which is useful for a soft launch.
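To make the interaction between allocation and weights concrete, here is a minimal sketch of deterministic bucketing. It is not ShipEasy's actual assignment algorithm: the hash function, the salt strings, and the hash32, bucket, and assignGroup helpers are all assumptions made for illustration.

```typescript
// Illustrative only: NOT ShipEasy's actual assignment logic.
// The hash, the salts, and the helper names are assumptions for this sketch.

type Group = { name: string; weight: number };

// FNV-1a over the string, then a murmur3-style finalizer for better mixing.
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // unsigned 32-bit
}

// Stable bucket in [0, 10000) for a given salt and user.
function bucket(salt: string, userId: string): number {
  return hash32(`${salt}:${userId}`) % 10000;
}

// Returns the group name, or null when the user is outside the allocation.
function assignGroup(
  experiment: string,
  userId: string,
  allocation: number, // percent of eligible users, 0..100
  groups: Group[],
): string | null {
  // Step 1: is this user inside the allocated sample at all?
  if (bucket(`${experiment}:alloc`, userId) >= allocation * 100) return null;

  // Step 2: split allocated users by weight, using a separate salt.
  const totalWeight = groups.reduce((sum, g) => sum + g.weight, 0);
  const point = (bucket(`${experiment}:group`, userId) / 10000) * totalWeight;
  let cumulative = 0;
  for (const g of groups) {
    cumulative += g.weight;
    if (point < cumulative) return g.name;
  }
  return null; // only reached if weights are zero or negative
}

const groups: Group[] = [
  { name: "control", weight: 50 },
  { name: "v1", weight: 50 },
];

// "control", "v1", or null (outside the 20% allocation); stable per user.
const sample = assignGroup("checkout-cta", "user-42", 20, groups);
```

Because the same user always hashes to the same bucket, a user's group does not change between requests. Using one salt for the allocation check and another for the group split keeps the two decisions from lining up with each other.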

Attach purchase_conversion as the primary metric in the dashboard.

Wire the SDK

import { configureShipeasy, flags, experiments } from "@shipeasy/sdk/server";

configureShipeasy({ apiKey: process.env.SHIPEASY_SERVER_KEY! });
await flags.init();

Then on the request path:

const result = experiments.assign<{ label: string }>("checkout-cta", {
  user_id, plan, country,
});

const label = result.params.label; // "Pay" or "Buy now"

The first time you call assign() for a user, an exposure event is logged automatically. Subsequent calls in the same process don't re-log — exposures are deduplicated.
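In-process deduplication of this kind can be pictured with a small sketch. The ExposureLogger class and its fields are hypothetical stand-ins, not the SDK's internals.

```typescript
// Illustrative sketch; the real SDK's internals may differ.
// ExposureLogger and its members are hypothetical names.

type Exposure = { experiment: string; userId: string; group: string };

class ExposureLogger {
  private seen = new Set<string>();
  readonly logged: Exposure[] = [];

  // Records the exposure only the first time this process sees the
  // (experiment, user) pair. Returns true when a row was written.
  log(exposure: Exposure): boolean {
    const key = `${exposure.experiment}:${exposure.userId}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.logged.push(exposure);
    return true;
  }
}

const logger = new ExposureLogger();
logger.log({ experiment: "checkout-cta", userId: "u1", group: "v1" }); // written
logger.log({ experiment: "checkout-cta", userId: "u1", group: "v1" }); // skipped
logger.log({ experiment: "checkout-cta", userId: "u2", group: "control" }); // written
```

In this sketch the set lives in one process's memory, so a user served by two processes would still produce two rows. Counting distinct users rather than rows sidesteps that.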

Track the conversion

Wherever the purchase succeeds:

flags.track(user_id, "purchase", { value: orderTotal });

The event lands in the events store. The daily analysis job scans events and joins them against exposures by user_id.
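The join can be sketched in a few lines. This is a simplified illustration, not the platform's actual analysis job; the row shapes and the conversionByGroup helper are assumptions.

```typescript
// Illustrative sketch of joining events to exposures by user id.
// Row shapes and function names are assumptions, not the platform's schema.

type ExposureRow = { userId: string; group: string };
type EventRow = { userId: string; type: string };
type GroupStats = { users: number; converters: number; rate: number };

function conversionByGroup(
  exposures: ExposureRow[],
  events: EventRow[],
  eventType: string,
): Record<string, GroupStats> {
  // A user converts if they have at least one matching event (counted once).
  const converted = new Set<string>();
  for (const e of events) {
    if (e.type === eventType) converted.add(e.userId);
  }

  // Keep each user's first exposure so duplicate rows don't inflate N.
  const groupOf = new Map<string, string>();
  for (const x of exposures) {
    if (!groupOf.has(x.userId)) groupOf.set(x.userId, x.group);
  }

  const stats: Record<string, GroupStats> = {};
  groupOf.forEach((group, userId) => {
    if (!stats[group]) stats[group] = { users: 0, converters: 0, rate: 0 };
    const s = stats[group];
    s.users += 1;
    if (converted.has(userId)) s.converters += 1;
    s.rate = s.converters / s.users;
  });
  return stats;
}

const exposureRows: ExposureRow[] = [
  { userId: "u1", group: "control" },
  { userId: "u2", group: "control" },
  { userId: "u3", group: "v1" },
  { userId: "u4", group: "v1" },
  { userId: "u5", group: "v1" },
  { userId: "u3", group: "v1" }, // duplicate exposure, ignored
];
const eventRows: EventRow[] = [
  { userId: "u1", type: "purchase" },
  { userId: "u3", type: "purchase" },
  { userId: "u3", type: "purchase" }, // repeat purchase, still one converter
  { userId: "u4", type: "page_view" },
  { userId: "u5", type: "purchase" },
  { userId: "u9", type: "purchase" }, // never exposed, so not counted
];

const byGroup = conversionByGroup(exposureRows, eventRows, "purchase");
```

Note that purchases from users who were never exposed drop out of the join, which is why the user_id passed to track() needs to match the one passed to assign().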

Start the experiment

Until you start it, assign() returns inExperiment: false and the control params for everyone, so it is safe to deploy the instrumentation before the experiment goes live.

shipeasy experiments start checkout-cta

Or click Start in the dashboard.
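The practical upshot of the pre-start behavior is that rendering code needs no special case. The sketch below uses a hypothetical stubAssign stand-in so it runs without the SDK; only the inExperiment and params fields come from this page, and the stub's parity-based split is purely a placeholder.

```typescript
// Illustrative sketch. stubAssign is a hypothetical stand-in for the SDK's
// assign(); only the `inExperiment` and `params` fields come from this page.

type AssignResult<P> = { inExperiment: boolean; params: P };
type CtaParams = { label: string };

const CONTROL: CtaParams = { label: "Pay" };
const V1: CtaParams = { label: "Buy now" };

function stubAssign(started: boolean, userId: string): AssignResult<CtaParams> {
  // Before start: everyone gets the control params and inExperiment: false.
  if (!started) return { inExperiment: false, params: CONTROL };
  // After start: a naive placeholder split on the last character's code.
  const inV1 = userId.charCodeAt(userId.length - 1) % 2 === 1;
  return { inExperiment: true, params: inV1 ? V1 : CONTROL };
}

// The caller reads params the same way in both states.
function renderCta(result: AssignResult<CtaParams>): string {
  return `<button>${result.params.label}</button>`;
}

const beforeStart = renderCta(stubAssign(false, "u1"));
const afterStart = renderCta(stubAssign(true, "u1"));
```

Since renderCta only reads params.label, the same deployed code serves the control label before the start and the assigned label afterwards.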

Read the results

Daily, the analysis cron computes per-metric, per-group results — lift, p-value, 95% CI — and writes them back to your project. Read them in the dashboard, or:

shipeasy experiments status checkout-cta

A typical result row:

purchase_conversion
  control  N=12,418  rate=4.8%
  v1       N=12,503  rate=5.2%   lift +8.3%   p=0.018   CI [+1.4%, +15.2%]

Once the planned run time has elapsed, p < 0.05 together with a CI that excludes zero is your stop condition. Promote v1 by adding a gate that mirrors the variant, then delete the experiment.
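As a worked check of the result row above, the sketch below recomputes the relative lift from the two rates and applies the stop condition. The ResultRow shape and both helper names are assumptions for illustration; the p-value and CI are taken from the row as given, not recomputed.

```typescript
// Illustrative sketch; ResultRow and the helper names are assumptions.
// p-value and CI are inputs copied from the result row, not recomputed here.

type ResultRow = {
  controlRate: number;
  variantRate: number;
  pValue: number;
  ciLow: number;  // lower bound of the CI on relative lift
  ciHigh: number; // upper bound of the CI on relative lift
};

// Relative lift of the variant over control, e.g. 0.083 for +8.3%.
function relativeLift(controlRate: number, variantRate: number): number {
  return variantRate / controlRate - 1;
}

// True when p is below alpha AND the CI lies entirely on one side of zero.
function meetsStopCondition(row: ResultRow, alpha: number = 0.05): boolean {
  const ciExcludesZero = row.ciLow > 0 || row.ciHigh < 0;
  return row.pValue < alpha && ciExcludesZero;
}

const row: ResultRow = {
  controlRate: 0.048,
  variantRate: 0.052,
  pValue: 0.018,
  ciLow: 0.014,
  ciHigh: 0.152,
};

const lift = relativeLift(row.controlRate, row.variantRate); // about 0.083
const shouldStop = meetsStopCondition(row);
```

The lift is relative, not absolute: 5.2% against 4.8% is a 0.4 percentage point difference, which works out to roughly +8.3% of the control rate.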

Stop & clean up

shipeasy experiments stop checkout-cta

Stopping is safe at any time. Stopped experiments keep their config and results; they just stop assigning.

Anti-patterns to avoid

Don't peek and stop early

The p-values are valid for fixed-horizon tests. If you sneak a look every hour and stop the moment something turns significant, you'll get false wins. Pre-decide the experiment's duration based on traffic.
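One way to pre-decide the duration is a standard sample-size estimate for comparing two proportions. The sketch below uses the textbook normal-approximation formula with a two-sided 5% significance level and 80% power; it is not a ShipEasy feature, and the helper names and the traffic figure are assumptions.

```typescript
// Illustrative sketch using the textbook two-proportion sample-size formula.
// Not a ShipEasy API; helper names and example traffic are assumptions.

// Users needed in EACH group to detect the given relative lift.
function usersPerGroup(baselineRate: number, relativeLift: number): number {
  const zAlpha = 1.96;  // two-sided alpha = 0.05
  const zBeta = 0.8416; // power = 0.80
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const delta = p2 - p1;
  const z = zAlpha + zBeta;
  return Math.ceil((z * z * variance) / (delta * delta));
}

// Whole days needed to collect that many users across all groups.
function daysToRun(
  perGroup: number,
  groupCount: number,
  eligibleUsersPerDay: number,
): number {
  return Math.ceil((perGroup * groupCount) / eligibleUsersPerDay);
}

// Example: 4.8% baseline, hoping to detect a +8.3% relative lift,
// with a hypothetical 5,000 eligible users per day at 100% allocation.
const neededPerGroup = usersPerGroup(0.048, 0.083);
const plannedDays = daysToRun(neededPerGroup, 2, 5000);
```

With these inputs the estimate comes to roughly 47,000 users per group, or 19 days at the assumed traffic. Small lifts on low baseline rates need a lot of traffic, which is the reason to fix the duration up front rather than stopping at the first significant reading.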

Don't change variants mid-flight

Changing params on a running experiment invalidates the analysis. Instead, stop the experiment and create a new one with v2 in the name.

Don't run too many experiments on the same surface

Two experiments touching the same checkout flow can confound each other. That's what universes and mutual exclusion are for.

Where to next

Universes & holdouts

When you have more than one experiment in flight on the same surface.

mutual exclusion · global control

Metrics, deeper

Counts, means, ratios. Primary vs guardrail. Outlier handling.

primary · guardrail · outliers

How analysis works

What the platform actually does to those numbers.

statistical analysis · p · CI · lift

Events

What you should track and what you definitely shouldn't.

events store · fire-and-forget

Already gating?

Promote a gate to an experiment.

If a feature is already behind a gate at 50%, you're one CLI command away from collecting stats on it instead of guessing whether it works.

Create from existing gate

$ shipeasy experiments create checkout-v2 --from-gate checkout-v2

Then attach a metric

$ shipeasy experiments attach-metric checkout-v2 purchase_conversion --primary
