{"id":35,"date":"2025-12-09T10:55:25","date_gmt":"2025-12-09T10:55:25","guid":{"rendered":"https:\/\/steadyrabbit.in\/blogs\/?p=35"},"modified":"2025-12-09T11:01:27","modified_gmt":"2025-12-09T11:01:27","slug":"performance-starts-on-day-0-k6-chaos-tests-inside-your-pull-request","status":"publish","type":"post","link":"https:\/\/steadyrabbit.in\/blogs\/performance-starts-on-day-0-k6-chaos-tests-inside-your-pull-request\/","title":{"rendered":"Performance Starts on Day 0: k6 &amp; Chaos Tests Inside Your Pull Request"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Most teams discover latency spikes in staging, 48 hours before launch. Real shift-left performance means catching a 300 ms endpoint before it leaves the PR. In this article we show how our Micro-GCC squads run <strong>k6 load scripts and Chaos Mesh injections on every pull request<\/strong> in under three minutes, gate merges on a JSON \u201cperformance budget,\u201d and keep production p95 under 200 ms, even under Black Friday traffic.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why Waiting for Staging Is a Losing Game<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">A modern microservice ships to production 4\u201310 times a day. 
If you test performance only in staging:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Latency bugs compound across services.<br><\/li>\n\n\n\n<li>Fixes collide with the feature freeze, creating hot-fix Fridays.<br><\/li>\n\n\n\n<li>Developers never \u201cfeel\u201d latency locally, so they keep writing slow code.<br><\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Shift-left performance<\/strong> puts <strong>load, chaos, and budget checks<\/strong> right next to unit tests and linting.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The \u201cPerformance Budget\u201d JSON File<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Create perf-budget.json at the repo root:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">{<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&quot;globals&quot;: {<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&quot;avg_ms&quot;: 150,<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&quot;p95_ms&quot;: 200,<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&quot;error_rate_pct&quot;: 0.5<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;},<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&quot;endpoints&quot;: {<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&quot;\/api\/v1\/login&quot;: { &quot;p95_ms&quot;: 180 },<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&quot;\/api\/v1\/cart&quot;:&nbsp; { &quot;p95_ms&quot;: 220 }<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;}<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">}<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Globals<\/strong> apply to all requests.<br><\/li>\n\n\n\n<li><strong>Endpoint overrides<\/strong> set a tighter or looser p95 for specific paths (e.g., 220 ms for cart).<br><\/li>\n\n\n\n<li>Track just three KPIs: average latency, p95 latency, and error rate.<br><\/li>\n<\/ul>\n\n\n\n<p 
class=\"wp-block-paragraph\">Store this file in Git so that every budget change shows up in a PR diff, the same workflow as package.json.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Adding k6 to Every Pull Request<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3.1 Minimal k6 script (smoke.js)<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">import http from 'k6\/http';<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">import { check } from 'k6';<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">import { Trend } from 'k6\/metrics';<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">export let options = {<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;vus: 5,<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;duration: '30s',<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;thresholds: {<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;http_req_duration: ['p(95)&lt;200'],<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;http_req_failed: &nbsp; ['rate&lt;0.005'] \/\/ fraction of failed requests: 0.5 % = 0.005<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;}<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">};<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">const loginTrend = new Trend('login_p95');<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">const cartTrend&nbsp; = new Trend('cart_p95');<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">export default function () {<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;const resLogin = http.post(`${__ENV.BASE_URL}\/api\/v1\/login`, {u:'demo', p:'pw'});<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;loginTrend.add(resLogin.timings.duration);<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;check(resLogin, { 'login p95 OK': (r) =&gt; r.timings.duration &lt; 180 
});<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;const resCart = http.get(`${__ENV.BASE_URL}\/api\/v1\/cart`);<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;cartTrend.add(resCart.timings.duration);<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;check(resCart, { 'cart p95 OK': (r) =&gt; r.timings.duration &lt; 220 });<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">}<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The BASE_URL environment variable points to the Docker Compose service spun up in CI.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3.2 GitHub Action (k6-perf.yml)<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">jobs:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;perf:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;runs-on: ubuntu-latest<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;steps:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- uses: actions\/checkout@v4<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- name: Build &amp; run containers<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;run: docker compose up -d --build<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- name: Run k6 smoke<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;uses: grafana\/k6-action@v0.2.0<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;with:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;filename: .\/perf\/smoke.js<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;env:<\/p>\n\n\n\n<p 
class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;BASE_URL: http:\/\/localhost:3000<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- name: Upload k6 summary<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;uses: actions\/upload-artifact@v4<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;with:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;name: k6-summary<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;path: perf\/summaries\/<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Median runtime:<\/strong> 95 seconds for a five-VU, 30-second smoke test.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Injecting Chaos Before Merge<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Latency isn\u2019t the only killer: upstream timeouts can cascade. Enter <strong>Chaos Mesh<\/strong> (Kubernetes) or <strong>Toxiproxy<\/strong> (Docker).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Toxiproxy CI Step<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">- name: Inject 300ms latency on PostgreSQL<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;run: |<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;docker run -d --name toxiproxy -p 8474:8474 shopify\/toxiproxy<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;curl -XPOST -d '{&quot;name&quot;:&quot;pg&quot;,&quot;listen&quot;:&quot;0.0.0.0:5433&quot;,&quot;upstream&quot;:&quot;db:5432&quot;}' \\<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;http:\/\/localhost:8474\/proxies<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;curl -XPOST -d 
'{&quot;type&quot;:&quot;latency&quot;,&quot;attributes&quot;:{&quot;latency&quot;:300,&quot;jitter&quot;:50}}' \\<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;http:\/\/localhost:8474\/proxies\/pg\/toxics<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Re-run the k6 job with the service\u2019s database connection pointed at the proxy (port 5433), so queries pick up the injected latency. The performance budget stays the same; the PR fails if p95 exceeds 200 ms.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Why developers don\u2019t hate it:<\/strong> The chaos step runs <strong>only<\/strong> on PRs that modify Dockerfile, docker-compose.yml, or \/db\/**. Use a paths: filter in GitHub Actions to scope it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Surfacing Results Where Devs Live<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Use the <strong>k6-summary-commenter<\/strong> Action to drop a Markdown table into the PR:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Metric<\/strong><\/td><td><strong>Budget<\/strong><\/td><td><strong>Result<\/strong><\/td><td><strong>Status<\/strong><\/td><\/tr><tr><td>Avg ms<\/td><td>150<\/td><td>132<\/td><td>\u2705<\/td><\/tr><tr><td>p95 ms<\/td><td>200<\/td><td>178<\/td><td>\u2705<\/td><\/tr><tr><td>Errors<\/td><td>0.5 %<\/td><td>0.2 %<\/td><td>\u2705<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Developers see pass\/fail inline, with no need to dig through CI logs. 
Link the artifact for the full Grafana run.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Cost &amp; Time Benchmarks<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>CI Step<\/strong><\/td><td><strong>Time Added<\/strong><\/td><td><strong>Compute Cost (GitHub-Hosted)<\/strong><\/td><\/tr><tr><td>k6 smoke<\/td><td>95 s<\/td><td>$0.01<\/td><\/tr><tr><td>Toxiproxy chaos + k6<\/td><td>+70 s<\/td><td>$0.007<\/td><\/tr><tr><td><strong>Total<\/strong><\/td><td><strong>165 s<\/strong><\/td><td><strong>$0.017<\/strong> per PR<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">At 200 PRs\/month that\u2019s $3.40, cheaper than one post-mortem.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Real-World Impact (FinTech Scale-Up)<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Baseline<\/em>: p95 latency oscillated between 210 and 260 ms, with two hot-fixes during the release freeze.<br><em>After shift-left performance<\/em>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>p95 held at <strong>&lt; 190 ms<\/strong> for six months.<br><\/li>\n\n\n\n<li>Hot-fix count: <strong>0<\/strong>.<br><\/li>\n\n\n\n<li>Release freeze shrank from 3 days to \u00bd day.<br><\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The PM said: <em>\u201cWe spend freeze week on marketing now, not firefighting.\u201d<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Pitfalls &amp; Pro Tips<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Pitfall<\/strong><\/td><td><strong>Fix<\/strong><\/td><\/tr><tr><td>\u201cCI job flaky on Mondays\u201d<\/td><td>Warm the container cache; pre-pull the k6 image.<\/td><\/tr><tr><td>\u201cChaos proxy breaks DB auth\u201d<\/td><td>Exclude 127.0.0.1 or use TLS passthrough config.<\/td><\/tr><tr><td>\u201cDevelopers ignore perf budget\u201d<\/td><td>Fail the PR when any KPI exceeds its budget, with no bypass.<\/td><\/tr><tr><td>\u201cSmoke test too small to 
matter\u201d<\/td><td>Keep the per-PR smoke test quick (\u2264 30 s); schedule a nightly 10-minute soak test.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Sprint-by-Sprint Adoption Plan<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Sprint<\/strong><\/td><td><strong>Action<\/strong><\/td><\/tr><tr><td>1<\/td><td>Add perf-budget.json &amp; k6 smoke (report-only)<\/td><\/tr><tr><td>2<\/td><td>Gate PR on p95, post summary comment<\/td><\/tr><tr><td>3<\/td><td>Add chaos injection for DB timeouts<\/td><\/tr><tr><td>4<\/td><td>Nightly soak test + Grafana dashboard<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Four sprints later, latency becomes a <em>leading<\/em> indicator, not a launch-day surprise.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Takeaway Checklist<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Define a performance budget JSON.<br><\/li>\n\n\n\n<li>Run the k6 smoke test in every PR; fail the merge on a breach.<br><\/li>\n\n\n\n<li>Inject chaos on risky components.<br><\/li>\n\n\n\n<li>Post a table comment for instant dev feedback.<br><\/li>\n\n\n\n<li>Add a nightly soak test to catch memory leaks.<br><\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Most teams discover latency spikes in staging\u201448 hours before launch. Real shift-left performance means catching a 300 ms endpoint before it leaves the PR. 
In this article we show how our Micro-GCC squads run k6 load scripts and Chaos mesh injections on every pull-request in under three minutes, gate merges on a JSON \u201cperformance budget,\u201d [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":20,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-35","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-shift-left-engineering"],"_links":{"self":[{"href":"https:\/\/steadyrabbit.in\/blogs\/wp-json\/wp\/v2\/posts\/35","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/steadyrabbit.in\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/steadyrabbit.in\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/steadyrabbit.in\/blogs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/steadyrabbit.in\/blogs\/wp-json\/wp\/v2\/comments?post=35"}],"version-history":[{"count":3,"href":"https:\/\/steadyrabbit.in\/blogs\/wp-json\/wp\/v2\/posts\/35\/revisions"}],"predecessor-version":[{"id":39,"href":"https:\/\/steadyrabbit.in\/blogs\/wp-json\/wp\/v2\/posts\/35\/revisions\/39"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/steadyrabbit.in\/blogs\/wp-json\/wp\/v2\/media\/20"}],"wp:attachment":[{"href":"https:\/\/steadyrabbit.in\/blogs\/wp-json\/wp\/v2\/media?parent=35"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/steadyrabbit.in\/blogs\/wp-json\/wp\/v2\/categories?post=35"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/steadyrabbit.in\/blogs\/wp-json\/wp\/v2\/tags?post=35"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}