
Lessons from a Security Incident (GrafanaCON 2026)

A walkthrough of the April 2025 CI/CD incident at Grafana Labs - one small misconfiguration, fast detection thanks to canary tokens, and why preparation beats reaction.

David Andersson

I had the opportunity to speak at GrafanaCON 2026 in Barcelona together with Nick Moore, Principal Security Engineer at Grafana Labs, on “Lessons from that security incident when everything went wrong (but ended up right).”

Nick and David on stage at GrafanaCON 2026, with a slide reading "One small change: a small change for Grafana, a giant opportunity for hackers"

Abstract

On a Saturday morning in April 2025, two members of the Grafana Labs security team got an alert they had never seen fire before. By the time the dust settled, the team had walked through a full CI/CD compromise, traced every action the attacker took, and confirmed - with evidence, not hope - that no customer or user data had been touched.

The talk tells that story end to end. The vulnerability was a single workflow change that swapped pull_request for pull_request_target, the kind of edit that looks safe and isn’t. The attacker found it within hours, exfiltrated GitHub Secrets with Gato-X, and started validating tokens. The detection that bought us the time we needed came not from a flashy commercial tool but from a canary token that fired the moment the attacker tried to authenticate. The rest of the response leaned heavily on open source: Loki for forensic queries that GitHub’s native tooling cannot run, Trufflehog and Zizmor for sweeping the rest of the estate, and Gato-X, turned around to give us the attacker’s view of our own repositories.
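The talk does not publish the exact workflow that was changed, so the following is only a minimal, hypothetical Python sketch of the kind of check that tools like Zizmor perform far more thoroughly. It assumes the workflow has already been parsed into a dict (for example with PyYAML’s yaml.safe_load) and flags steps that combine the pull_request_target trigger with either a checkout of the pull request’s own code or PR-controlled text interpolated into a run: script. The function names and the lists of risky expressions are illustrative assumptions, not exhaustive.

```python
from typing import Any

# Expressions whose value the pull request author controls (illustrative, not exhaustive).
UNTRUSTED_REFS = (
    "github.event.pull_request.head.sha",
    "github.event.pull_request.head.ref",
    "github.head_ref",
)
UNTRUSTED_TEXT = (
    "github.event.pull_request.title",
    "github.event.pull_request.body",
    "github.head_ref",
)


def triggers(workflow: dict[Any, Any]) -> set[str]:
    """Return the event names a parsed workflow is triggered by."""
    # PyYAML follows YAML 1.1, where a bare `on:` key loads as the boolean True.
    on = workflow["on"] if "on" in workflow else workflow.get(True)
    if isinstance(on, str):
        return {on}
    if isinstance(on, (list, dict)):
        return set(on)
    return set()


def risky_steps(workflow: dict[Any, Any]) -> list[str]:
    """Flag steps that mix pull_request_target with PR-controlled code or text."""
    if "pull_request_target" not in triggers(workflow):
        return []  # this sketch only audits pull_request_target workflows
    findings = []
    for job_name, job in (workflow.get("jobs") or {}).items():
        for index, step in enumerate(job.get("steps") or []):
            where = f"{job_name}.steps[{index}]"
            uses = str(step.get("uses", ""))
            ref = str((step.get("with") or {}).get("ref", ""))
            # Checking out the PR head, then building or testing it, runs
            # attacker code in a job that can read the base repo's secrets.
            if uses.startswith("actions/checkout") and any(r in ref for r in UNTRUSTED_REFS):
                findings.append(f"{where}: checks out PR code ({ref})")
            # Interpolating PR text into a shell script allows command injection.
            script = str(step.get("run", ""))
            findings.extend(f"{where}: run interpolates {e}" for e in UNTRUSTED_TEXT if e in script)
    return findings


# Usage: a workflow as yaml.safe_load would return it (note the True key for `on:`).
example = {
    True: {"pull_request_target": {"types": ["opened", "synchronize"]}},
    "jobs": {
        "test": {
            "runs-on": "ubuntu-latest",
            "steps": [
                {"uses": "actions/checkout@v4",
                 "with": {"ref": "${{ github.event.pull_request.head.sha }}"}},
                {"run": 'echo "${{ github.event.pull_request.title }}"'},
            ],
        }
    },
}
for finding in risky_steps(example):
    print(finding)
```

Running the example reports two findings: one for the checkout of the PR head and one for the interpolated title. The same jobs under a plain pull_request trigger produce none, because this sketch only targets pull_request_target, which runs pull requests from forks in the context of the base repository, with access to its secrets.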

If there is one thing to take away, it is that preparation beats reaction. Detection engineering, observability, and secret hygiene applied to CI/CD are not optional infrastructure; they are what earns you the right to say “no customer impact” and actually mean it.

Key takeaways

  • The gap between “looks safe” and “is safe” is one keyword wide. pull_request_target exists for legitimate reasons, but it runs in the context of the base repository, with access to its secrets, even for pull requests from forks. Combined with user-controllable inputs, it is a foothold waiting to happen. Treat workflow changes the way you would treat changes to an IAM policy.
  • Canary tokens earn their keep when nothing else fires. Even though they are an unfashionable, decades-old idea, they were the reason this incident was caught in hours rather than weeks.
  • Observability belongs in CI/CD, not only in production. GitHub’s native logs can be deleted by an attacker with the right token; logs already shipped to Loki are out of that token’s reach. That difference matters when you are trying to reconstruct what happened and work out what you need to tell whom.
  • Open source is a defensive advantage. Trufflehog, Zizmor, and Gato-X were available the moment we needed them. The same tools the attacker reached for were the ones we used to push them back out.
  • Preparation beats reaction. The migration from GitHub Secrets to HashiCorp Vault, which was already underway when the incident happened, is a large part of the reason the impact was contained. Subsequent supply-chain compromises in the ecosystem (Aqua, Axios) landed in a much harder Grafana than they would have a year earlier.
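The talk does not describe how the canary token itself was built, so here is a hedged, minimal Python sketch of the underlying idea only. A credential that no legitimate system ever uses is planted where an attacker would look (such as a CI secret store), the detection side keeps only its SHA-256 digest, and any authentication attempt that presents it raises an alert. The AuthAttempt event shape, the CanaryRegistry class, and the token values are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthAttempt:
    """One authentication event, e.g. parsed from an API access log (hypothetical shape)."""
    source_ip: str
    presented_token: str


def fingerprint(token: str) -> str:
    """SHA-256 digest, so the detection side never stores the canary in clear text."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class CanaryRegistry:
    """Hypothetical registry of planted canary tokens, keyed by digest."""

    def __init__(self) -> None:
        self._canaries: dict[str, str] = {}  # digest -> where the canary was planted

    def plant(self, token: str, location: str) -> None:
        self._canaries[fingerprint(token)] = location

    def check(self, attempt: AuthAttempt) -> Optional[str]:
        """Return an alert message if the attempt presented a planted canary, else None."""
        location = self._canaries.get(fingerprint(attempt.presented_token))
        if location is None:
            return None
        return f"CANARY TRIPPED: token planted in {location} used from {attempt.source_ip}"


# Usage: plant one fake token, then replay two authentication attempts.
registry = CanaryRegistry()
registry.plant("ghp_CANARY_not_a_real_token", "CI secrets of a test repository")
attempts = [
    AuthAttempt("198.51.100.7", "ghp_some_other_value"),
    AuthAttempt("203.0.113.42", "ghp_CANARY_not_a_real_token"),
]
for attempt in attempts:
    alert = registry.check(attempt)
    if alert:
        print(alert)
```

Because legitimate traffic never presents the canary, a match needs no thresholds or tuning, which is what makes this kind of alert trustworthy enough to act on immediately, even on a Saturday morning.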

Slides

The slides are available as a PDF download.

Recording

A recording of the talk is published on the GrafanaCON agenda page, with the full transcript available under the transcript tab.


Presented at GrafanaCON 2026, Barcelona, Spain - May 15, 2026


David Andersson — Security engineering leader, CISSP, with nearly 20 years building and scaling security programs for software companies, government agencies, and global enterprises.