Running the profiler in Go
There are several ways to run the profiler. All of them produce a pprof file, which can be rendered with the go tool pprof tool in interactive (terminal) or web mode (other output formats are also available). Other tools can render profile files too, for example the GoLand IDE.
Useful example (opens file.pprof in the web UI on port 8080):
go tool pprof -http :8080 file.pprof
In Code
The standard library provides runtime/pprof, and the third-party package github.com/pkg/profile wraps it in an easier-to-use API.
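package main

import "github.com/pkg/profile"

func main() {
	// pkg/profile allows only one active profile at a time (a second Start call
	// is a fatal error), so enable one mode per run:
	defer profile.Start(profile.CPUProfile, profile.ProfilePath(".")).Stop() // writes ./cpu.pprof
	// defer profile.Start(profile.MemProfile, profile.ProfilePath(".")).Stop() // writes ./mem.pprof

	// ... code to profile ...
}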
More examples are available in the pkg/profile documentation: https://pkg.go.dev/github.com/pkg/profile
If you wonder why Start and Stop appear in the same defer statement: in a deferred call chain, only the last call (Stop) runs when main returns; the earlier calls in the chain (Start) are evaluated immediately, when the defer statement itself executes.
Benchmarks
If you already have benchmarks, you can run them with profilers enabled:
go test -cpuprofile cpu.pprof -memprofile mem.pprof -bench .
Other profiling flags, such as -blockprofile and -mutexprofile, are described in the go test documentation (go help testflag).
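A minimal sketch of a benchmark that the go test command above could profile. joinWords is a hypothetical function under test that allocates, so both the CPU and the memory profile have something to show. Normally BenchmarkJoinWords lives in a file ending in _test.go, where go test -bench . discovers it; here main calls testing.Benchmark, which runs a benchmark function without go test, so the sketch also works as a standalone program.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// joinWords is a hypothetical function under test; it allocates while building a string.
func joinWords(n int) string {
	var b strings.Builder
	for i := 0; i < n; i++ {
		b.WriteString("word ")
	}
	return b.String()
}

// BenchmarkJoinWords uses the standard benchmark signature.
// In a real project it would be placed in a *_test.go file.
func BenchmarkJoinWords(b *testing.B) {
	b.ReportAllocs() // include allocation stats in the result
	for i := 0; i < b.N; i++ {
		_ = joinWords(100)
	}
}

func main() {
	// testing.Benchmark runs the benchmark outside of go test.
	res := testing.Benchmark(BenchmarkJoinWords)
	fmt.Println(res.String(), res.MemString())
}
```

In a _test.go file, drop main and run go test -cpuprofile cpu.pprof -memprofile mem.pprof -bench . to get both profiles for this benchmark.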
Via HTTP
This is the most common way to capture profiles from long-running programs and services, including in production.
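// the blank import registers the /debug/pprof/ handlers on http.DefaultServeMux
import _ "net/http/pprof"
// ...then serve the default mux, for example (the address is only an example):
go func() { log.Println(http.ListenAndServe("localhost:6060", nil)) }()
Or, to register the handlers on your own mux (import "net/http/pprof" without the blank identifier):
m := http.NewServeMux()
m.HandleFunc("/debug/pprof/", pprof.Index)          // index page and named profiles (heap, goroutine, ...)
m.HandleFunc("/debug/pprof/profile", pprof.Profile) // CPU profile
srv := &http.Server{Addr: "localhost:6060", Handler: m} // example address
log.Fatal(srv.ListenAndServe())
Other handlers (for example pprof.Cmdline, pprof.Symbol and pprof.Trace) or custom profile implementations can be registered the same way.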
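These endpoints only serve profile data in pprof format (binary by default; named profiles such as heap and goroutine also accept ?debug=1 for a human-readable text version), which should be viewed with go tool pprof.
You can do this directly by passing the URL: go tool pprof <url>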
Rendering tips
Open a profile served over HTTP directly in the pprof web UI:
go tool pprof -http :8080 http://<address>/debug/pprof/heap
Available profilers
Heap
Shows only memory blocks allocated on the heap, not memory allocated on the stack or through custom mmap calls. By default, Go records on average one sample per 512 KB of allocated heap memory; the rate can be changed via runtime.MemProfileRate. The heap (and allocs) profile contains 4 sample value types:
- alloc_space - total number of bytes allocated at each location since the program started, including memory already freed by the garbage collector.
- alloc_objects - total number of objects (memory blocks) allocated at each location since the program started, regardless of their size.
- inuse_space - number of bytes currently allocated on the heap (not yet freed).
- inuse_objects - number of objects (memory blocks) currently allocated on the heap.
CPU
The CPU profiler does not return a profile immediately: it must be explicitly started and stopped, and only then can the profile be used for diagnosis. The sampling rate is currently fixed at 100 Hz. Sample values:
- samples - number of samples observed at the location.
- cpu - CPU time spent at the location (at 100 Hz, each sample represents 10 ms).
Goroutines
The goroutine profile view contains runtime functions that show what goroutines are waiting for:
- runtime.gopark - parks (suspends) a goroutine while it waits for one of the events listed below (e.g. I/O, channel communication).
- runtime.chanrecv - the goroutine is waiting to receive a value from a channel.
- runtime.chansend - the goroutine is waiting to send a value to a channel.
- runtime.selectgo - the goroutine is waiting in, or evaluating the cases of, a select statement.
- runtime.netpollblock - the goroutine is waiting for network I/O.
Off-CPU Time
The built-in profiler does not provide this profile, but external profilers such as fgprof do.
Off-CPU time is the time when a goroutine is not running on the CPU, e.g. waiting for disk, network or other device I/O, or blocked in system calls.
Additional info
- The heap profile does not show which variables hold a specific piece of memory, so external tools such as viewcore (CockroachDB) may be needed.
- When compiling a binary, the symbol table and DWARF debug information can be stripped to make the binary noticeably smaller (see the -s and -w linker flags passed via -ldflags); keep in mind that this removes information debugging and analysis tools rely on.
- Profiles are sampling-based estimates; actual allocations may be larger or smaller than reported.
- Profiles can be combined: go tool pprof merges several profiles passed together, and its -base and -diff_base options subtract or diff one profile against another.
- pprof can aggregate data at source-line granularity (the -lines option) instead of the default per-function granularity, to get more detailed data.
Continuous profiling
To take profiling a step further, look at continuous profiling. Several tools provide it; one of them is Pyroscope, which supports pull and push modes as well as eBPF. Examples: https://github.com/grafana/pyroscope/tree/main/examples
Solutions
- https://github.com/pkg/profile - library that simplifies Go profiling
- https://github.com/grafana/pyroscope - continuous profiling, integrated with Grafana
- https://www.parca.dev/docs/concepts - continuous profiling
- https://github.com/felixge/fgprof - sampling Go profiler that captures both on-CPU and off-CPU time