How to Capture and Analyze Go Execution Traces with the Flight Recorder

Posted by u/Zheng01 · 2026-05-02 19:29:04

Introduction

Go 1.25 adds a flight recorder to the runtime/trace package, giving you a new way to diagnose latency issues in long-running services. A traditional execution trace has to be recording before a problem occurs, which is impractical when you cannot predict the problem. The flight recorder instead keeps the most recent few seconds of trace data buffered in memory. When your program detects an anomaly, such as a timeout or a failed health check, it can snapshot that window and hand you the trace data leading up to the event. This guide walks you through setting up and using the flight recorder step by step.

Source: blog.golang.org

What You Need

  • Go 1.25 or later to build your application. The flight recorder is part of the standard library, so nothing extra needs to be installed where the binary runs.
  • A Go application (web service, background worker, etc.) that you want to monitor.
  • Basic familiarity with Go execution traces and the runtime/trace package.
  • A text editor or IDE for code editing.
  • Access to the go tool trace command for analyzing captured traces.

Step-by-Step Guide

Step 1: Import the Necessary Package

Start by importing the runtime/trace package in your Go source file. This package contains both the traditional trace API and the new flight recorder. The snippets in the following steps also use the standard fmt, log, net/http, os, and time packages.

import "runtime/trace"

Step 2: Create a Flight Recorder Instance

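Create a flight recorder by passing a trace.FlightRecorderConfig to trace.NewFlightRecorder. The MinAge field sets how far back the recorder tries to retain trace data, and the optional MaxBytes field sets a target limit on the buffer size that takes precedence over MinAge. For most services, a window of 10–30 seconds provides a good balance between memory usage and diagnostic coverage.

// Keep at least the last 15 seconds of trace data; MaxBytes is left at its default.
fr := trace.NewFlightRecorder(trace.FlightRecorderConfig{MinAge: 15 * time.Second})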

Step 3: Start the Flight Recorder

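Call the Start method to begin continuous tracing. Unlike trace.Start, which streams the entire trace to a writer until trace.Stop is called, the flight recorder keeps only the most recent window in memory and discards older data as new events arrive. Start returns an error if the recorder cannot be started or is already running.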

if err := fr.Start(); err != nil {
    log.Fatalf("failed to start flight recorder: %v", err)
}
defer fr.Stop()

Step 4: Define Your Anomaly Detection Logic

Identify the conditions that indicate a problem in your application. Common triggers include:

  • Long request latencies exceeding a threshold
  • Failed health checks
  • Errors from external services
  • Sudden goroutine spikes

For example, in an HTTP server, you might trigger a snapshot when a handler takes longer than 500ms:

http.HandleFunc("/api", func(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    // ... handle request ...
    if elapsed := time.Since(start); elapsed > 500*time.Millisecond {
        // Trigger snapshot here (see step 5)
    }
})

Step 5: Capture a Snapshot When a Problem Occurs

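When your detection logic fires, call the flight recorder’s WriteTo method, which writes the buffered trace data to any io.Writer in the standard trace format. Writing into a bytes.Buffer (from the bytes package) gives you the snapshot as a []byte slice. Only one WriteTo call may run at a time; a concurrent call, or a call on a recorder that is not running, returns an error.

var buf bytes.Buffer
if _, err := fr.WriteTo(&buf); err != nil {
    log.Printf("snapshot failed: %v", err)
    return
}
data := buf.Bytes() // snapshot in the standard trace format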

Step 6: Write the Snapshot to a File

Save the captured trace to a file so you can analyze it later. Use a descriptive filename that includes a timestamp or request identifier.

filename := fmt.Sprintf("trace_snapshot_%d.out", time.Now().Unix())
if err := os.WriteFile(filename, data, 0644); err != nil {
    log.Printf("failed to write trace file: %v", err)
    return
}

Step 7: Analyze the Trace

Use the go tool trace command to open and examine the captured file. This tool provides a web-based viewer that shows goroutine lifetimes, network activity, GC events, and more.

go tool trace trace_snapshot_1234567890.out

A snapshot ends at the moment it was taken, so the slowdown or failure that triggered it usually shows up near the end of the timeline. Start there and work backward to identify the cause.

Step 8: Integrate with Your Logging and Alerting

For production use, combine snapshot capture with your existing logging and alerting systems. For example:

  • Write the trace data to a central storage (e.g., S3, GCS) for later analysis.
  • Send a notification to your team when a snapshot is taken.
  • Attach the trace file to a bug report or incident.

Tips for Effective Use

  • Choose the right buffer size. A window of 10–15 seconds is usually enough to capture the events leading to a problem; as a rule of thumb, aim for roughly twice the duration of the problem you want to debug. Larger windows consume more memory but provide longer context.
  • Test the flight recorder in staging first. Ensure it does not interfere with your application’s performance (overhead is minimal, but verify in your environment).
  • Combine with structured logging. Log the reason for each snapshot so you can correlate trace data with application events.
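  • Monitor memory usage. The buffer is not strictly fixed in size: it grows with the length of the retained window and with how much trace data your service produces. The MaxBytes field of trace.FlightRecorderConfig sets a target limit, which the runtime treats as a hint rather than a hard cap, so check actual usage against your service’s memory budget.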
  • Automate cleanup. Snapshots can accumulate quickly; implement a retention policy to delete old trace files.
  • Use the trace viewer effectively. Filter by goroutine, look for long-running network calls, and inspect garbage collection pauses.
  • Consider sampling for high-traffic services. If snapshots are triggered too often, randomly sample only a fraction of them to reduce storage and noise.

By following these steps, you can leverage the Go flight recorder to quickly diagnose latency and reliability issues in your services, turning a vague “something went wrong” into actionable trace data.