ShipEasy

Experiments — A/B tests with stats built in

A/B and multivariate experiments with managed stats: statistical analysis, holdouts, and segmentation, out of the box.


Was the new thing actually better?

Define a metric, ship a variant, get a defensible answer the next morning. Statistical analysis, lift, p-value, and 95% CI are written back to your project.

Production ready · 6 min read · Updated May 3, 2026 · Works with Server SDK

ShipEasy Experiments lets you ask "is the new thing actually better?" and get a defensible answer.

You define what you want to measure (a metric), set up variant payloads (the groups), let users land in one bucket or another, log conversions, and look at the dashboard the next morning. The platform handles assignment, exposure stitching, deduplication, daily aggregation, and the statistics.

The five building blocks

The 5-minute mental model

You ship feature X behind a flag. You believe it'll lift purchase_conversion. So instead of just gating it, you make it an experiment: half the eligible users see X, half don't. Both groups are tracked. The next day, you look at the dashboard and read either "X lifts conversion by 3.1% (p=0.002)" — or "no significant difference, scrap it."
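One way to picture "half the eligible users see X, half don't" is deterministic hash bucketing. The sketch below illustrates that idea only; it is not ShipEasy's actual assignment algorithm, and `fnv1a` and `bucketFor` are hypothetical helper names.

```typescript
// Hypothetical illustration of hash-based bucketing; not ShipEasy's internal algorithm.

// 32-bit FNV-1a hash of a string, returned as an unsigned integer.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, modulo 2^32
  }
  return hash >>> 0;
}

// Map a user to "control" or "treatment" with a 50/50 split.
// Salting with the experiment name keeps buckets independent across experiments.
function bucketFor(experiment: string, userId: string): "control" | "treatment" {
  const slot = fnv1a(`${experiment}:${userId}`) % 100; // 0..99
  return slot < 50 ? "control" : "treatment";
}

// The same user always lands in the same group.
const first = bucketFor("checkout-cta", "user-123");
const second = bucketFor("checkout-cta", "user-123");
console.log(first === second); // true
```

Because the bucket depends only on the experiment name and the user ID, no per-user state has to be stored for a user to see the same variant on every request.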

The honest version requires more care than that summary, and the docs walk through it.

When to gate vs. when to experiment

Situation | Gate | Experiment
Is the change reversible?
Does it have a known clear win (security fix, bug)?
Do you need an answer to "did it work"?
Cheap to ship, expensive to undo (data migration)?
Affects a metric you're paid to move?

A practical workflow is to gate first (so the rollout is safe), then promote the gate to an experiment once the feature is stable.

The 5-minute path

From metric to result

~5 minutes setup · 24h to first stats

01 · METRIC

Define what success looks like

One number. A primary, plus a couple of guardrails (latency, error rate).

$ shipeasy metrics create purchase_conversion --type conversion --event purchase

02 · EXPERIMENT

Two groups, equal weight

100% allocation, 50/50 split, two label variants. Targeting via a gate is optional.

$ shipeasy experiments create checkout-cta

03 · WIRE

assign + track

The first assign() call logs an exposure. Call flags.track() on conversion. Daily aggregation does the math.

experiments.assign('checkout-cta', user)

Wire the SDK

$ npm install @shipeasy/sdk

import { configureShipeasy, flags, experiments } from "@shipeasy/sdk/server";

configureShipeasy({ apiKey: process.env.SHIPEASY_SERVER_KEY! });
await flags.init();

const result = experiments.assign<{ label: string }>("checkout-cta", {
  user_id,
  plan,
  country,
});
const label = result.params.label; // "Pay" or "Buy now"

// Wherever the purchase succeeds:
flags.track(user_id, "purchase", { value: orderTotal });

The first time you call assign() for a user, an exposure is logged automatically. Subsequent calls in the same process don't re-log — exposures are deduplicated.
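To make the "same process" caveat concrete, here is a minimal sketch of per-process deduplication. It is an assumption about how such a mechanism could work, not the SDK's actual implementation; `ExposureLogger` and its fields are hypothetical names.

```typescript
// Hypothetical sketch of per-process exposure deduplication; names are illustrative.

type Exposure = { experiment: string; userId: string; group: string };

class ExposureLogger {
  // Keys of (experiment, user) pairs already logged by this process.
  private seen = new Set<string>();
  readonly logged: Exposure[] = [];

  // Record the exposure only the first time this (experiment, user) pair is seen.
  // Returns true if a new exposure was logged.
  log(exposure: Exposure): boolean {
    const key = JSON.stringify([exposure.experiment, exposure.userId]);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.logged.push(exposure);
    return true;
  }
}

const logger = new ExposureLogger();
logger.log({ experiment: "checkout-cta", userId: "u1", group: "control" });
logger.log({ experiment: "checkout-cta", userId: "u1", group: "control" }); // ignored
console.log(logger.logged.length); // 1
```

An in-memory set like this resets when the process restarts and is not shared between processes, so under this design a second server could log the same user again.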

API · experiments.assign

Field | Type | Description
name (required) | string | The experiment name. Stable identifier, used in URLs and result rows.
user (required) | EvalContext | An object with at least user_id. Add attributes used by the targeting gate (plan, country, …).
defaultParams | T? | Returned when the experiment is stopped, the user is in the holdout, or assignment fails. Defaults to the control group's params.

The return shape:

{
  inExperiment: boolean,   // false if stopped, holdout, or excluded
  group: string,           // "control" | "v1" | …
  params: T,               // typed payload from the assigned group
  reason: AssignmentReason // "assigned" | "holdout" | "stopped" | "excluded"
}
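A caller will usually branch on `inExperiment` and `reason`. The sketch below redeclares the documented shape locally so it compiles on its own; `Assignment` and `describeAssignment` are hypothetical names, and the `group` value shown for a stopped experiment is an assumption.

```typescript
// Local copies of the documented return shape, so the sketch compiles on its own.
type AssignmentReason = "assigned" | "holdout" | "stopped" | "excluded";

interface Assignment<T> {
  inExperiment: boolean;
  group: string;
  params: T;
  reason: AssignmentReason;
}

// Hypothetical helper: describe how a caller might react to each outcome.
function describeAssignment(result: Assignment<{ label: string }>): string {
  if (!result.inExperiment) {
    // params still carries the default payload, so rendering stays safe.
    return `default label "${result.params.label}" (${result.reason})`;
  }
  return `group ${result.group} sees "${result.params.label}"`;
}

// Assumed example value: what a result might look like once the experiment is stopped.
const stopped: Assignment<{ label: string }> = {
  inExperiment: false,
  group: "control", // assumption: the source does not say what group holds here
  params: { label: "Pay" },
  reason: "stopped",
};
console.log(describeAssignment(stopped)); // default label "Pay" (stopped)
```

Reading `params` unconditionally, as the wiring example does, keeps rendering code free of experiment-specific branches; `inExperiment` and `reason` matter mostly for logging and debugging.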

What the daily analysis does

Cron enqueues a job per project

A scheduled trigger on the ShipEasy platform fans out one queue message per project that has running experiments.

Consumer scans events store

For each project, the consumer pulls yesterday's exposures and events from Analytics Engine, joins by user_id, and aggregates per metric × group.

Statistical analysis

Lift, two-sided p-value, 95% CI. Per metric, per group, vs. the control group.

Persist results

Results are persisted and scoped to your project. The dashboard reads them; you can export them.
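The scan and statistics steps above can be sketched end to end. The row shapes are assumptions, and because the source does not name the test ShipEasy runs, the two-proportion z-test below is a stand-in chosen for illustration; `aggregate`, `normalCdf`, and `compare` are hypothetical names.

```typescript
// Hypothetical sketch of the aggregation and statistics steps. Row shapes are
// assumptions, and the two-proportion z-test is a stand-in for the unnamed test.

type ExposureRow = { userId: string; group: string };
type EventRow = { userId: string; event: string };
type Arm = { users: number; conversions: number };

// Join exposures to events by user and count converting users per group.
function aggregate(exposures: ExposureRow[], events: EventRow[], metricEvent: string): Map<string, Arm> {
  const groupByUser = new Map<string, string>();
  for (const e of exposures) {
    if (!groupByUser.has(e.userId)) groupByUser.set(e.userId, e.group); // first exposure wins
  }
  const converters = new Set<string>();
  for (const ev of events) {
    if (ev.event === metricEvent) converters.add(ev.userId);
  }
  const arms = new Map<string, Arm>();
  groupByUser.forEach((group, userId) => {
    const arm = arms.get(group) ?? { users: 0, conversions: 0 };
    arm.users += 1;
    if (converters.has(userId)) arm.conversions += 1; // unexposed converters are ignored
    arms.set(group, arm);
  });
  return arms;
}

// Standard normal CDF via the Abramowitz-Stegun erf approximation (formula 7.1.26).
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Compare a variant against control: relative lift, two-sided p-value, 95% CI.
function compare(control: Arm, variant: Arm) {
  const pc = control.conversions / control.users;
  const pv = variant.conversions / variant.users;
  const diff = pv - pc;

  // Pooled standard error for the hypothesis test.
  const pooled = (control.conversions + variant.conversions) / (control.users + variant.users);
  const sePooled = Math.sqrt(pooled * (1 - pooled) * (1 / control.users + 1 / variant.users));
  const pValue = 2 * (1 - normalCdf(Math.abs(diff / sePooled)));

  // Unpooled standard error for the 95% CI on the absolute difference.
  const se = Math.sqrt((pc * (1 - pc)) / control.users + (pv * (1 - pv)) / variant.users);
  const ci95: [number, number] = [diff - 1.96 * se, diff + 1.96 * se];

  return { relativeLift: diff / pc, diff, pValue, ci95 };
}

const stats = compare({ users: 10000, conversions: 1000 }, { users: 10000, conversions: 1100 });
console.log(stats.relativeLift.toFixed(3)); // 0.100
```

This sketch counts a conversion for any exposed user who fired the event at any time in the scanned window; a production pipeline would likely also check that the event happened after the exposure.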

Don't peek and stop early

The p-values are valid for fixed-horizon tests. If you sneak a look every hour and stop the moment something turns significant, you'll get false wins. Pre-decide the experiment's duration based on traffic.
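A rough way to pre-decide duration is to estimate the required sample size first. The sketch below uses the standard two-proportion sample-size approximation at 5% two-sided significance and 80% power; the rates and traffic figures are made-up example inputs, and `usersPerGroup` and `daysToRun` are hypothetical helpers, not part of the ShipEasy SDK or CLI.

```typescript
// Rough fixed-horizon planning sketch. The formula is the standard two-proportion
// sample-size approximation; all inputs below are made-up example values.

const Z_ALPHA = 1.96;  // two-sided 5% significance
const Z_BETA = 0.8416; // 80% power

// Users needed per group to detect a change from `baseline` to `target` conversion rate.
function usersPerGroup(baseline: number, target: number): number {
  const variance = baseline * (1 - baseline) + target * (1 - target);
  const delta = target - baseline;
  return Math.ceil(((Z_ALPHA + Z_BETA) ** 2 * variance) / (delta * delta));
}

// Days to run, given newly eligible users per day split evenly across two groups.
function daysToRun(baseline: number, target: number, eligiblePerDay: number): number {
  return Math.ceil((2 * usersPerGroup(baseline, target)) / eligiblePerDay);
}

// Example: detect a move from 10% to 11% conversion with 2,000 new eligible users a day.
console.log(usersPerGroup(0.1, 0.11));
console.log(daysToRun(0.1, 0.11, 2000));
```

The estimate assumes each user is counted once and that daily traffic is steady. Small expected lifts push the required sample size up quickly, so it is worth running this arithmetic before launch rather than after the first dashboard check.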

Don't change variants mid-flight

Changing params on a running experiment invalidates the analysis. Stop the experiment, create a new one with v2 in the name.

READY?

Run your first experiment.

A complete walk-through: define the metric, create the experiment, wire two SDK calls, read the result the next morning.

Create the metric

$ shipeasy metrics create purchase_conversion --type conversion --event purchase

Create the experiment

$ shipeasy experiments create checkout-cta