December 9, 2025 / admin

Most teams discover latency spikes in staging, 48 hours before launch. Real shift-left performance means catching a 300 ms endpoint before it leaves the PR. In this article we show how our Micro-GCC squads run k6 load scripts and Chaos Mesh injections on every pull request in under three minutes, gate merges on a JSON "performance budget," and keep production p95 under 200 ms, even on Black Friday traffic.

Why Waiting for Staging Is a Losing Game 

A modern microservice ships to production in 4–10 deploys every day. If you test performance only in staging:

  • Latency bugs compound across services.
  • Fixes collide with feature freeze, creating hot-fix Fridays.
  • Developers never “feel” latency locally—so they keep writing slow code.

Shift-Left performance puts load, chaos, and budget checks right next to unit tests and linting.

The "Performance Budget" JSON File

Create perf-budget.json at repo root:

```json
{
  "globals": {
    "avg_ms": 150,
    "p95_ms": 200,
    "error_rate_pct": 0.5
  },
  "endpoints": {
    "/api/v1/login": { "p95_ms": 180 },
    "/api/v1/cart":  { "p95_ms": 220 }
  }
}
```

  • Globals apply to all requests.
  • Endpoint overrides handle heavier paths (e.g., cart).
  • Keep three KPIs: average, p95, error-rate.

Store this file in Git so changes show up in the PR diff, the same review workflow as package.json.
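The budget-gate logic itself is small. A minimal sketch in Node, assuming a measured-KPI object per endpoint (the summary shape here is illustrative, not the exact k6 output format):

```js
// Compare measured KPIs against perf-budget.json.
// Endpoint overrides win over globals; anything over budget is a breach.
function checkBudget(budget, measured) {
  const breaches = [];
  const limitFor = (endpoint, kpi) => {
    const override = budget.endpoints && budget.endpoints[endpoint];
    if (override && override[kpi] !== undefined) return override[kpi];
    return budget.globals[kpi];
  };
  for (const [endpoint, kpis] of Object.entries(measured)) {
    for (const [kpi, value] of Object.entries(kpis)) {
      const limit = limitFor(endpoint, kpi);
      if (limit !== undefined && value > limit) {
        breaches.push(`${endpoint} ${kpi}: ${value} > ${limit}`);
      }
    }
  }
  return breaches;
}

const budget = {
  globals: { avg_ms: 150, p95_ms: 200, error_rate_pct: 0.5 },
  endpoints: { "/api/v1/cart": { p95_ms: 220 } },
};

// cart's 210 ms is within its 220 ms override; login's 230 ms breaches the global 200 ms
const breaches = checkBudget(budget, {
  "/api/v1/login": { p95_ms: 230 },
  "/api/v1/cart": { p95_ms: 210 },
});
console.log(breaches); // → ["/api/v1/login p95_ms: 230 > 200"]
```

In CI, a non-empty breach list would exit non-zero and fail the merge gate.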

Adding k6 to Every Pull Request 

3.1 Minimal k6 script (smoke.js)

```js
import http from 'k6/http';
import { check } from 'k6';
import { Trend } from 'k6/metrics';

export let options = {
  vus: 5,
  duration: '30s',
  thresholds: {
    http_req_duration: ['p(95)<200'],   // global p95 budget
    http_req_failed:   ['rate<0.005'],  // 0.5% error-rate budget
  },
};

const loginTrend = new Trend('login_p95');
const cartTrend  = new Trend('cart_p95');

export default function () {
  const resLogin = http.post(`${__ENV.BASE_URL}/api/v1/login`, { u: 'demo', p: 'pw' });
  loginTrend.add(resLogin.timings.duration);
  check(resLogin, { 'login p95 OK': (r) => r.timings.duration < 180 });

  const resCart = http.get(`${__ENV.BASE_URL}/api/v1/cart`);
  cartTrend.add(resCart.timings.duration);
  check(resCart, { 'cart p95 OK': (r) => r.timings.duration < 220 });
}
```

The BASE_URL environment variable points to the docker compose service spun up in CI.
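A minimal docker-compose.yml sketch that would satisfy that BASE_URL, assuming a Node app on port 3000 and a Postgres service named db (service names and images are illustrative, chosen to match the ports and hostnames used elsewhere in this article):

```yaml
services:
  app:
    build: .
    ports:
      - "3000:3000"   # matches BASE_URL=http://localhost:3000 in CI
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
```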

3.2 GitHub Action (k6-perf.yml)

```yaml
jobs:
  perf:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build & run containers
        run: docker compose up -d --build
      - name: Run k6 smoke
        uses: grafana/k6-action@v0.2.0
        with:
          filename: ./perf/smoke.js
        env:
          BASE_URL: http://localhost:3000
      - name: Upload k6 summary
        uses: actions/upload-artifact@v4
        with:
          name: k6-summary
          path: perf/summaries/
```

Median runtime: 95 seconds for a five-VU, 30-second smoke test.

Injecting Chaos Before Merge 

Latency isn’t the only killer—upstream timeouts can cascade. Enter Chaos Mesh (Kubernetes) or Toxiproxy (Docker).

Toxiproxy CI Step

```yaml
- name: Inject 300ms latency on PostgreSQL
  run: |
    # note: the toxiproxy container must be able to reach the "db" host,
    # i.e. it should join the same network as the compose services
    docker run -d --name toxiproxy -p 8474:8474 shopify/toxiproxy
    curl -XPOST -d '{"name":"pg","listen":"0.0.0.0:5433","upstream":"db:5432"}' \
      http://localhost:8474/proxies
    curl -XPOST -d '{"type":"latency","attributes":{"latency":300,"jitter":50}}' \
      http://localhost:8474/proxies/pg/toxics
```

Re-run the k6 job against the DB-latency chaos. The performance budget stays the same; the PR fails if p95 breaches 200 ms.

Why developers don't hate it: the chaos step runs only on PRs that modify Dockerfile, docker-compose.yml, or /db/**. Use a paths: filter in GitHub Actions.
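That paths: filter is a few lines of workflow trigger config, for example:

```yaml
# Run the chaos workflow only when infrastructure-related files change
on:
  pull_request:
    paths:
      - 'Dockerfile'
      - 'docker-compose.yml'
      - 'db/**'
```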

Surfacing Results Where Devs Live 

Use k6-summary-commenter Action to drop a Markdown table into the PR:

| Metric | Budget | Result | Status |
|--------|--------|--------|--------|
| Avg (ms) | 150 | 132 | ✅ |
| p95 (ms) | 200 | 178 | ✅ |
| Error rate | 0.5 % | 0.2 % | ✅ |

Developer sees fail/pass inline—no need to dig in CI logs. Link the artifact for full Grafana run.
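If the commenter Action doesn't fit your setup, a hand-rolled step with actions/github-script does the same job. A sketch, with the table body hard-coded for illustration (a real step would build it from the k6 summary artifact):

```yaml
- name: Comment budget results on PR
  uses: actions/github-script@v7
  with:
    script: |
      const body = [
        '| Metric | Budget | Result |',
        '|--------|--------|--------|',
        '| p95 ms | 200    | 178    |',
      ].join('\n');
      await github.rest.issues.createComment({
        owner: context.repo.owner,
        repo: context.repo.repo,
        issue_number: context.issue.number,
        body,
      });
```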

Cost & Time Benchmarks 

| CI Level | Time Added | Compute Cost (GitHub-Hosted) |
|----------|------------|------------------------------|
| k6 smoke | 95 s | $0.01 |
| Toxiproxy chaos + k6 | +70 s | $0.007 |
| Total | 165 s | $0.017 per PR |

At 200 PRs/month that’s $3.40—cheaper than one post-mortem.

Real-World Impact (FinTech Scale-Up) 

Baseline: latency p95 oscillated 210–260 ms; two hot-fixes during release freeze.
After shift-left performance:

  • p95 held at < 190 ms for six months.
  • Hot-fix count: 0.
  • Release freeze shrank from 3 days → ½ day.

PM said: “We spend freeze week on marketing now, not firefighting.”

Pitfalls & Pro Tips 

| Pitfall | Fix |
|---------|-----|
| "CI job flaky on Mondays" | Warm cache containers; pre-pull the k6 image. |
| "Chaos proxy breaks DB auth" | Exclude 127.0.0.1 or use a TLS passthrough config. |
| "Developers ignore perf budget" | Fail the PR when any KPI exceeds budget; no bypass. |
| "Smoke test too small to matter" | Keep the PR smoke quick (≤ 30 s); schedule a nightly 10-minute soak test. |

Sprint-by-Sprint Adoption Plan 

| Sprint | Action |
|--------|--------|
| 1 | Add perf-budget.json & k6 smoke (read-only) |
| 2 | Gate PR on p95, upload summary comment |
| 3 | Add chaos injection for DB timeouts |
| 4 | Nightly soak test + Grafana dashboard |

Four sprints later, latency becomes a leading indicator, not a launch-day surprise.

Takeaway Checklist 

  1. Define performance budget JSON.
  2. Run k6 smoke in PR; fail merge on breach.
  3. Inject chaos on risky components.
  4. Post table comment for instant dev feedback.

  5. Add nightly soak to catch GC leaks.
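A minimal sketch of that nightly soak profile as a k6 script; the stage lengths and VU counts here are illustrative, not the article's actual settings:

```js
import http from 'k6/http';

export let options = {
  stages: [
    { duration: '1m', target: 20 }, // ramp up
    { duration: '8m', target: 20 }, // hold long enough to surface GC leaks
    { duration: '1m', target: 0 },  // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<200'],
    http_req_failed: ['rate<0.005'], // same 0.5% error budget as the PR gate
  },
};

export default function () {
  http.get(`${__ENV.BASE_URL}/api/v1/cart`);
}
```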