How GitHub Ensures Deployment Safety with eBPF
GitHub uses eBPF to prevent circular dependencies during deployment, ensuring site reliability even during outages.
GitHub faces a unique challenge: it runs on its own platform, creating a circular dependency where an outage could prevent engineers from accessing the code needed to fix it. To address this, GitHub has turned to eBPF (extended Berkeley Packet Filter) to monitor and block dangerous network calls during deployments. By intercepting specific system calls, eBPF allows GitHub to enforce strict boundaries, ensuring deployment scripts don't accidentally create new dependencies that worsen an outage. Below, we explore this approach through key questions about the problem and the solution.
What circular dependency does GitHub face when deploying its own platform?
GitHub hosts its source code on github.com, which makes the company its own biggest customer. However, this creates a simple circular dependency: to deploy fixes to GitHub, engineers need access to GitHub. If an outage strikes and the site is down, they cannot pull updated code or assets to restore service. While GitHub maintains a mirror for emergency fixes and pre-built assets for rollbacks, deeper circular dependencies remain. For instance, deployment scripts might inadvertently rely on internal services or download binaries from GitHub itself, introducing new failure points. These hidden risks can turn a minor outage into a major incident, as the very tools meant to resolve problems become the bottleneck.

What are the three types of circular dependencies described?
In a hypothetical MySQL outage scenario, GitHub identifies three types:
- Direct dependency: A deploy script tries to fetch a release from GitHub to complete its action. Since GitHub is down, the script fails.
- Hidden dependency: A tool already on the server checks for an update every time it runs. If it cannot reach GitHub, it may hang or error out.
- Transient dependency: A script calls an internal API (e.g., a migration service) that in turn attempts to download a binary from GitHub, cascading the failure back.
Each type represents a distinct way deployment logic can accidentally depend on the same system it’s trying to fix.
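To make the three types concrete, the sketch below models a deployment's dependencies as a small graph and lists every path that leads back to github.com. The service names and edges are hypothetical, chosen only to mirror the three bullets above; they are not GitHub's real topology.

```python
from collections import deque

# Hypothetical dependency graph: each key depends on the services it lists.
# All names are illustrative, not GitHub's real service names.
DEPENDS_ON = {
    "deploy-script": ["github.com", "servicing-tool", "migration-service"],
    "servicing-tool": ["github.com"],      # hidden: update check on every run
    "migration-service": ["github.com"],   # transient: downloads a binary
    "github.com": ["mysql"],               # assumed for the MySQL outage scenario
    "mysql": [],
}


def paths_to(graph, start, target):
    """Return every simple path from start to target, breadth-first."""
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for dep in graph.get(path[-1], []):
            if dep in path:
                continue  # never revisit a node already on this path
            if dep == target:
                found.append(path + [dep])
            else:
                queue.append(path + [dep])
    return found


if __name__ == "__main__":
    for path in paths_to(DEPENDS_ON, "deploy-script", "github.com"):
        print(" -> ".join(path))
```

A static graph like this only finds dependencies someone has already written down. The hidden and transient cases are precisely the ones that tend to be missing from such a map.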
How did GitHub traditionally handle these dependencies?
Previously, the responsibility fell on every team managing stateful hosts to manually review their deployment scripts and identify potential circular dependencies. This was a tedious and error-prone process. Teams had to test scripts in offline environments or manually audit every network call to ensure nothing relied on GitHub or internal services that might be unavailable. In practice, many dependencies went unnoticed until an outage occurred, causing deployments to fail exactly when they most needed to work. The manual approach also slowed down innovation, as teams spent significant time verifying safety rather than focusing on features.
Why did GitHub choose eBPF to solve this problem?
While designing a new host-based deployment system, GitHub evaluated several approaches to prevent circular dependencies. eBPF stood out because it can selectively monitor and block system calls at the kernel level, without modifying application code. This allowed GitHub to enforce rules like “this deploy script must not make outbound HTTP requests to github.com” without requiring each script to opt in. An eBPF program attaches to kernel hooks (for example, those reached by connect() and sendto()), decides in real time whether to allow or deny the call, and logs violations. This non-intrusive, programmable approach gave GitHub precise control over deployment behavior without adding work for developers.

How does eBPF specifically prevent circular dependencies during deployments?
GitHub writes small eBPF programs that run in the kernel, watching for specific network connections initiated by deployment processes. When a deploy script attempts to contact a blocklisted endpoint (like GitHub’s API or an internal service that itself depends on GitHub), eBPF can block the connection and log the attempt. The program uses context such as process ID, domain, and port to enforce policies. For example, any DNS resolution for github.com from a deployment script can be refused immediately. Because the policy is enforced during normal operations and in test environments, a script that contains a hidden download fails fast there, and teams can fix it before an outage. Catching these issues early breaks the cycle before an outage exposes it.
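The allow-or-deny decision described above can be modeled in user space. The following is a minimal sketch of the policy logic only, not an eBPF program: real enforcement would run in the kernel, and the rule format, cgroup label, and addresses (taken from the reserved documentation ranges) are all assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectAttempt:
    pid: int
    cgroup: str     # hypothetical label for the process's cgroup
    dest_ip: str
    dest_port: int


# Hypothetical rules: (network, port); a port of None matches any port.
# These are documentation address ranges, not real GitHub addresses.
RULES = [
    (ipaddress.ip_network("192.0.2.0/24"), 443),
    (ipaddress.ip_network("198.51.100.0/24"), None),
]
DEPLOY_CGROUPS = {"deploy"}


def decide(attempt):
    """Return "deny" if a deploy process targets a blocked endpoint, else "allow"."""
    if attempt.cgroup not in DEPLOY_CGROUPS:
        return "allow"  # the policy only applies to deployment processes
    dest = ipaddress.ip_address(attempt.dest_ip)
    for network, port in RULES:
        if dest in network and port in (None, attempt.dest_port):
            return "deny"
    return "allow"


if __name__ == "__main__":
    print(decide(ConnectAttempt(100, "deploy", "192.0.2.10", 443)))
    print(decide(ConnectAttempt(200, "web", "192.0.2.10", 443)))
```

Scoping the check to a cgroup rather than a single process ID means child processes spawned by the deploy script fall under the same policy, which is how a hidden or transient call would be caught.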
Can you outline a practical example of eBPF in action for deployment safety?
Consider a MySQL node during an outage. A deploy script runs to apply a config change. The script calls an internal servicing tool that, unbeknownst to its authors, checks for an update on GitHub. Normally this would cause a hang if GitHub is unreachable. With eBPF, a program is attached to the connect syscall for processes belonging to deployment tools. When the tool tries to reach api.github.com, eBPF evaluates the destination IP and process credentials, then denies the connection immediately. The tool receives a connection error and can fall back safely. The eBPF program also emits a perf event to user space, where it is logged, alerting operators to the hidden dependency. This real-time enforcement turns a potential outage extension into a minor log entry.
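The fallback only works if the tool treats a failed connection as non-fatal. Here is a hedged sketch of what that might look like on the tool's side; the function names and messages are hypothetical, and the denial is simulated with an injected connector rather than a real eBPF program.

```python
import socket


def check_for_update(host, port, connect=socket.create_connection, timeout=2.0):
    """Return True if the update server was reachable, False on any connection error."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Refused, denied, timed-out, and failed-DNS connections all raise OSError subclasses.
        return False
    conn.close()
    return True


def run_tool(host="api.github.com", port=443, connect=socket.create_connection):
    """Hypothetical servicing tool: the update check is optional, the real work is not."""
    if check_for_update(host, port, connect=connect):
        return "update check ok"
    return "update check skipped, continuing with installed version"


def simulated_denial(address, timeout=None):
    """Stand-in for a connect() call that the kernel rejects."""
    raise PermissionError("connection blocked by policy")


if __name__ == "__main__":
    print(run_tool(connect=simulated_denial))
```

The short timeout matters as much as the exception handling: without enforcement, an unreachable host is more likely to produce a long hang than a prompt error.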
What advice does GitHub offer for teams wanting to implement eBPF for similar challenges?
GitHub recommends starting small: identify the most critical circular dependencies in your deployment pipeline and write eBPF programs to block them. Use tools like bpftrace or the libbpf library for development. Focus on system calls that matter—connect, sendto, and open for file access—and attach to specific cgroups or process trees. Test the eBPF program in a staging environment first; mistakes can block legitimate traffic. Log all denials and review them regularly to refine rules. Finally, involve both security and platform engineering teams, as eBPF requires kernel knowledge and careful permission management. With this approach, teams can systematically eliminate deployment-time dependencies and improve overall site reliability.
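As one way to act on the advice to log and review denials, the sketch below counts denied connections per process and destination so the most frequent offenders surface first. The JSON-lines log format and its field names are assumptions for illustration, not a format GitHub has published.

```python
import json
from collections import Counter

# Hypothetical log format: one JSON object per denied connection.
SAMPLE_LOG = """\
{"comm": "servicing-tool", "dest": "192.0.2.10:443"}
{"comm": "deploy.sh", "dest": "192.0.2.10:443"}
{"comm": "servicing-tool", "dest": "192.0.2.10:443"}
"""


def summarize_denials(log_text):
    """Count denials per (process name, destination), most frequent first."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        counts[(event["comm"], event["dest"])] += 1
    return counts.most_common()


if __name__ == "__main__":
    for (comm, dest), count in summarize_denials(SAMPLE_LOG):
        print(f"{count:3d}  {comm} -> {dest}")
```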