Quickstart
Create your first metric, log the events it depends on, attach it to an experiment — five minutes, end to end.
This walks you through the metric pipeline in order: pick a metric, create the metric definition, log the underlying events, attach the metric to an experiment, and read the result. By the end you'll have a primary metric that drives a real ship/no-ship decision.
The 5-minute path
define · log · attach · read
- Create a conversion metric
- Log the underlying event from your code
- Attach to an experiment as the primary metric
- Watch the lift + p-value on the dashboard
1. Pick what to measure
Before you create anything, answer one question: what number tells you the experiment worked?
For a checkout-flow rewrite: probably purchase_conversion — did exposed users buy? For a new
paywall: probably subscription_conversion — did they sign up? For a homepage redesign:
typically session_engagement — did they click past the fold?
Pick one. Two is fine if they're closely related. Five primary metrics means you haven't decided yet — go back and decide.
The metric needs to be:
- Computable from events you already log (or are willing to start logging).
- Specific to the change. Not "DAU," which moves for a hundred reasons, but "did this exposed user convert after exposure?"
- Reasonable to detect — a 1% conversion lift on 1,000 users a day will take a month. Check the power table before committing.
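To get a feel for how traffic and effect size trade off before you open the power table, here is a minimal sketch of the standard normal-approximation sample-size formula for a two-proportion z-test (two-sided α = 0.05, 80% power). It is not necessarily how Shipeasy's power table is computed. The function names, the 50/50 split, and the 10% baseline are assumptions chosen for illustration.

```typescript
// Two-sided alpha = 0.05 and power = 0.80, expressed as standard-normal quantiles.
const Z_ALPHA_HALF = 1.959964;
const Z_BETA = 0.841621;

// Users needed in EACH arm to detect a move from rate p1 (control) to p2 (variant),
// using the normal approximation for a two-proportion z-test.
function sampleSizePerArm(p1: number, p2: number): number {
  const pBar = (p1 + p2) / 2;
  const term =
    Z_ALPHA_HALF * Math.sqrt(2 * pBar * (1 - pBar)) +
    Z_BETA * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  const delta = p2 - p1;
  return Math.ceil((term * term) / (delta * delta));
}

// Days until both arms reach that size, assuming dailyUsers are split 50/50 (hypothetical).
function daysToPower(p1: number, p2: number, dailyUsers: number): number {
  return Math.ceil((2 * sampleSizePerArm(p1, p2)) / dailyUsers);
}

// Hypothetical: 10% baseline conversion, 1,000 exposed users a day.
console.log("10% -> 11%:", daysToPower(0.1, 0.11, 1000), "days");
console.log("10% -> 10.5%:", daysToPower(0.1, 0.105, 1000), "days");
```

Under these assumptions, a one-point lift (10% to 11%) needs roughly a month of traffic. Halving the effect to half a point takes roughly four times as long, because the required sample grows with 1/Δ².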
2. Define the metric
The simplest case — did event X happen for this user at least once?
```shell
shipeasy metrics create purchase_conversion \
  --type conversion \
  --event purchase
```

You now have a metric definition. It does nothing on its own; it tells the analysis pipeline how to aggregate the underlying events per user.
For a revenue metric (sum the revenueCents property across purchase events per user):
```shell
shipeasy metrics create revenue_per_user \
  --type sum \
  --event purchase \
  --property revenueCents
```

For more aggregation types, see Aggregations.
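As a mental model for what a definition tells the pipeline, here is a minimal sketch of the conversion and sum aggregation types. Each one reduces raw events to one number per user. The TrackedEvent shape and both function names are hypothetical. This is not Shipeasy's pipeline code, only an illustration of the two rules.

```typescript
// Hypothetical event record for this sketch: name, user, and arbitrary properties.
interface TrackedEvent {
  event: string;
  userId: string;
  properties: Record<string, number | string>;
}

// --type conversion: 1 if the user fired the event at least once.
function conversionPerUser(events: TrackedEvent[], eventName: string): Map<string, number> {
  const perUser = new Map<string, number>();
  for (const e of events) {
    if (e.event === eventName) perUser.set(e.userId, 1);
  }
  return perUser;
}

// --type sum: add up one numeric property across the user's matching events.
function sumPerUser(events: TrackedEvent[], eventName: string, property: string): Map<string, number> {
  const perUser = new Map<string, number>();
  for (const e of events) {
    const value = e.properties[property];
    if (e.event !== eventName || typeof value !== "number") continue;
    perUser.set(e.userId, (perUser.get(e.userId) ?? 0) + value);
  }
  return perUser;
}

// Users with no matching events are absent from the maps; this sketch assumes
// the analysis scores an exposed user without a purchase as 0.
const events: TrackedEvent[] = [
  { event: "purchase", userId: "u1", properties: { revenueCents: 1200, currency: "USD" } },
  { event: "purchase", userId: "u1", properties: { revenueCents: 800, currency: "USD" } },
  { event: "page_view", userId: "u2", properties: { loadTimeMs: 900 } },
];
console.log(conversionPerUser(events, "purchase").get("u1")); // 1: two purchases still count once
console.log(sumPerUser(events, "purchase", "revenueCents").get("u1")); // 2000
```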
3. Log the underlying events
The metric is a rule for aggregating events. The events themselves come from your code:
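```tsx
import { shipeasy } from "@shipeasy/sdk/server";

// Order and <ThankYou /> are your app's own type and component.
export default function CheckoutSuccess({ order }: { order: Order }) {
  // Fire-and-forget: no await, since track() returns immediately and flushes in the background.
  shipeasy.track("purchase", {
    userId: order.userId, // required: ties the event to the user's experiment exposure
    revenueCents: order.totalCents, // summed by revenue_per_user
    currency: order.currency,
    channel: order.acquisitionChannel, // properties become filterable later
  });
  return <ThankYou />;
}
```

A few rules that matter: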
- userId is required for attribution. Without it, the event can't be assigned to an experiment exposure.
- Properties become filterable. You can later add a metric like "paid purchases from organic channel" by filtering on channel.
- track() is fire-and-forget. It returns immediately; the event flushes asynchronously. Do not await it on the hot path expecting a delivery guarantee; it's analytics, not transactional state.
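To show why awaiting a tracking call says nothing about delivery, here is a minimal sketch of the usual fire-and-forget design. The call only appends to an in-memory buffer, and a separate flush ships batches. BufferedTracker, SendBatch, and maxBatch are hypothetical names for illustration. This is not the Shipeasy SDK's implementation.

```typescript
// Hypothetical transport that ships one batch of events (HTTP, queue, ...).
type SendBatch = (batch: Record<string, unknown>[]) => Promise<void>;

class BufferedTracker {
  private buffer: Record<string, unknown>[] = [];
  private readonly send: SendBatch;
  private readonly maxBatch: number;

  constructor(send: SendBatch, maxBatch = 50) {
    this.send = send;
    this.maxBatch = maxBatch;
  }

  // Returns immediately: the event is only queued in memory.
  track(event: string, props: Record<string, unknown>): void {
    this.buffer.push({ event, ...props, ts: Date.now() });
    if (this.buffer.length >= this.maxBatch) void this.flush();
  }

  // Ships whatever is queued. A real tracker would also flush on a timer and at shutdown.
  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    try {
      await this.send(batch);
    } catch {
      // Dropped on failure: analytics, not transactional state. A real SDK might retry.
    }
  }
}

// Usage: the caller never waits on delivery.
const sentBatchSizes: number[] = [];
const tracker = new BufferedTracker(async (batch) => {
  sentBatchSizes.push(batch.length);
}, 2);
tracker.track("purchase", { userId: "u1", revenueCents: 1200 }); // queued only
tracker.track("purchase", { userId: "u2", revenueCents: 500 }); // reaches maxBatch, flush starts
```

In a design like this, track() completes once the event is queued, not when it is delivered. A failed send is simply dropped.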
Deploy this. Events start flowing. The metric definition will pick them up on the next analysis window (daily by default).
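When the analysis window runs, each event has to be tied to an exposure before it can count toward a variant. Here is a simplified sketch of that join. It assumes exposures are stored as (userId, variant, exposedAt) records and that only events at or after a user's exposure count, in the spirit of "did this exposed user convert after exposure". The record shapes and the timing rule are assumptions, not Shipeasy's documented pipeline. The point is that the join key is userId, so an event without one is dropped.

```typescript
// Hypothetical record shapes for this sketch.
interface Exposure {
  userId: string;
  variant: string;
  exposedAt: number; // epoch ms
}
interface RawEvent {
  event: string;
  userId?: string; // missing userId means the event cannot be attributed
  ts: number; // epoch ms
}

// Conversion rate per variant: share of exposed users with a matching event at or after exposure.
function conversionByVariant(exposures: Exposure[], events: RawEvent[], eventName: string): Map<string, number> {
  const exposureByUser = new Map<string, Exposure>();
  for (const x of exposures) {
    // Keep each user's first exposure (this sketch assumes one variant per user).
    const prev = exposureByUser.get(x.userId);
    if (prev === undefined || x.exposedAt < prev.exposedAt) exposureByUser.set(x.userId, x);
  }

  const converted = new Set<string>();
  for (const e of events) {
    if (e.event !== eventName || e.userId === undefined) continue;
    const x = exposureByUser.get(e.userId);
    if (x !== undefined && e.ts >= x.exposedAt) converted.add(e.userId);
  }

  const exposedCount = new Map<string, number>();
  const convertedCount = new Map<string, number>();
  exposureByUser.forEach((x) => {
    exposedCount.set(x.variant, (exposedCount.get(x.variant) ?? 0) + 1);
    if (converted.has(x.userId)) convertedCount.set(x.variant, (convertedCount.get(x.variant) ?? 0) + 1);
  });

  const rates = new Map<string, number>();
  exposedCount.forEach((n, variant) => rates.set(variant, (convertedCount.get(variant) ?? 0) / n));
  return rates;
}

const exposures: Exposure[] = [
  { userId: "a", variant: "control", exposedAt: 100 },
  { userId: "b", variant: "control", exposedAt: 100 },
  { userId: "c", variant: "v1", exposedAt: 100 },
  { userId: "d", variant: "v1", exposedAt: 100 },
];
const rawEvents: RawEvent[] = [
  { event: "purchase", userId: "a", ts: 50 }, // before exposure: not counted
  { event: "purchase", userId: "c", ts: 150 }, // after exposure: counts for v1
  { event: "purchase", ts: 160 }, // no userId: dropped
];
const rates = conversionByVariant(exposures, rawEvents, "purchase");
console.log(rates.get("control"), rates.get("v1")); // 0 0.5
```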
4. Attach the metric to an experiment
Now wire the metric into the experiment whose lift you want to read:
```shell
shipeasy experiments metric add paywall-v2 \
  purchase_conversion --role primary

# Add a guardrail too: don't tank page load while you're at it
shipeasy metrics create p95_page_load_ms \
  --type mean --event page_view --property loadTimeMs --direction down
shipeasy experiments metric add paywall-v2 \
  p95_page_load_ms --role guardrail
```

--role primary is the metric the experiment is judged by. --role guardrail is a metric that must not regress, even if the primary moves. See Guardrails.
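The two roles combine into a ship rule. Here is a minimal sketch. It assumes "significant" means the 95% CI on lift excludes zero, and that direction marks which way is good: up by default, down for metrics created with --direction down. The MetricResult shape and the rule itself are illustrative, not Shipeasy's decision logic. Real guardrail checks often use a tolerance margin (a non-inferiority test) rather than plain significance.

```typescript
// Hypothetical summary of one dashboard row. CI bounds are relative lifts, e.g. 0.012 = +1.2%.
interface MetricResult {
  name: string;
  role: "primary" | "guardrail";
  direction: "up" | "down"; // which way is good ("down" as with --direction down)
  ciLow: number;
  ciHigh: number;
}

// Significant improvement: the whole 95% CI sits on the good side of zero.
function isWin(m: MetricResult): boolean {
  return m.direction === "up" ? m.ciLow > 0 : m.ciHigh < 0;
}

// Significant regression: the whole 95% CI sits on the bad side of zero.
function isRegression(m: MetricResult): boolean {
  return m.direction === "up" ? m.ciHigh < 0 : m.ciLow > 0;
}

type Decision = "ship" | "don't ship" | "inconclusive";

function decide(results: MetricResult[]): Decision {
  // Any significant regression, on a guardrail or a primary, blocks the launch.
  if (results.some(isRegression)) return "don't ship";
  const primaries = results.filter((m) => m.role === "primary");
  return primaries.length > 0 && primaries.every(isWin) ? "ship" : "inconclusive";
}

// The two rows from the example results table in step 5.
const exampleRows: MetricResult[] = [
  { name: "purchase_conversion", role: "primary", direction: "up", ciLow: 0.012, ciHigh: 0.156 },
  { name: "p95_page_load_ms", role: "guardrail", direction: "down", ciLow: -0.011, ciHigh: 0.015 },
];
console.log(decide(exampleRows)); // ship
```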
5. Read the result
The dashboard's experiment page shows lift, confidence interval, and p-value per metric, refreshed once per analysis window:
```
metric                      control   v1       lift    p       95% CI
purchase_conversion         4.8%      5.2%     +8.3%   0.018   [1.2%, 15.6%]
p95_page_load_ms (guard)    860 ms    862 ms   +0.2%   0.71    [-1.1%, 1.5%]
```

You read this as: the v1 variant lifted conversion by 8.3% relative to control (4.8% → 5.2%; p = 0.018, and the 95% CI excludes zero) and did not regress page-load time. Ship it.
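Each row of that table comes from standard two-sample statistics. Here is a minimal sketch for a conversion metric, assuming you have raw user and conversion counts per arm. It uses a pooled two-proportion z-test for the p-value and a log risk-ratio interval for the CI on relative lift. These are common textbook choices, not necessarily what Shipeasy computes. The counts below are hypothetical, because the table does not state sample sizes.

```typescript
// Standard normal CDF via the Abramowitz & Stegun 7.1.26 erf approximation (|error| < 1.5e-7).
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

interface Arm {
  users: number;
  conversions: number;
}

interface Comparison {
  relativeLift: number; // p2 / p1 - 1
  pValue: number; // two-sided, pooled two-proportion z-test
  ci95: [number, number]; // 95% CI on relative lift (log risk-ratio interval)
}

function compareConversion(control: Arm, variant: Arm): Comparison {
  const p1 = control.conversions / control.users;
  const p2 = variant.conversions / variant.users;

  // Under H0 both arms share one rate, so the z-test uses the pooled rate for its standard error.
  const pooled = (control.conversions + variant.conversions) / (control.users + variant.users);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / control.users + 1 / variant.users));
  const z = (p2 - p1) / se;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));

  // CI on the ratio p2/p1, built on the log scale, then shifted to a lift (ratio - 1).
  const logRatio = Math.log(p2 / p1);
  const seLog = Math.sqrt((1 - p1) / control.conversions + (1 - p2) / variant.conversions);
  const ci95: [number, number] = [
    Math.exp(logRatio - 1.96 * seLog) - 1,
    Math.exp(logRatio + 1.96 * seLog) - 1,
  ];
  return { relativeLift: p2 / p1 - 1, pValue, ci95 };
}

// Hypothetical counts: 33,250 users per arm at 4.8% and 5.2% conversion.
const result = compareConversion(
  { users: 33250, conversions: 1596 },
  { users: 33250, conversions: 1729 }
);
console.log(result);
```

With these made-up counts, the sketch gives a lift of about +8.3%, a p-value near 0.018, and an interval of roughly [1.4%, 15.8%]. That is close to the table, but the dashboard's exact method and counts are not stated, so an exact match is not expected.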
What if the result is flat or noisy? Two checks:
- Are events actually landing? Check the Events tab on the experiment: exposed users should have non-trivial event counts. If the count is zero, your track() call isn't running for the variant.
- Is the experiment powered? The dashboard shows the minimum detectable effect (MDE) alongside the lift. If the MDE is ±10% and the real lift is +2%, you can't tell; run longer or accept the null.
See Power & sample size.
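To see why a small real lift can hide inside the MDE, here is a minimal sketch of the approximate relative MDE for a conversion metric. It uses the common normal approximation with two-sided α = 0.05 and 80% power. The dashboard's own MDE calculation may differ. The 5% baseline and 500 users per arm per day are hypothetical.

```typescript
// Standard-normal quantiles for two-sided alpha = 0.05 plus 80% power.
const Z_TOTAL = 1.959964 + 0.841621;

// Approximate relative MDE with usersPerArm in each of two arms,
// assuming both arms have the baseline variance p(1 - p).
function relativeMde(baselineRate: number, usersPerArm: number): number {
  const se = Math.sqrt((2 * baselineRate * (1 - baselineRate)) / usersPerArm);
  return (Z_TOTAL * se) / baselineRate;
}

// Hypothetical traffic: 5% baseline, 500 exposed users per arm per day.
for (const days of [7, 14, 28]) {
  const mde = relativeMde(0.05, 500 * days);
  console.log(`${days} days: MDE about ±${(mde * 100).toFixed(1)}%`);
}
```

Under these assumptions, four weeks of traffic still leaves an MDE around ±15%, so a real +2% lift would stay inside the noise. Because the MDE shrinks with the square root of the sample size, quadrupling the traffic only halves it.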