PC Gaming Performance Benchmarking: How to Measure FPS and System Output
PC gaming performance benchmarking is the structured process of measuring, recording, and interpreting quantitative output from a gaming system — including frame rate, frame time, GPU utilization, CPU load, thermal readings, and memory bandwidth — to assess how well hardware and software configurations perform under defined test conditions. This reference covers the technical definitions, measurement mechanics, causal relationships between hardware and output metrics, classification distinctions between benchmark types, and the contested tradeoffs that practitioners and researchers regularly encounter. The subject is directly relevant to hardware analysts, system builders, competitive players, and anyone operating within the PC gaming ecosystem where performance parity and hardware investment decisions depend on reproducible data.
- Definition and scope
- Core mechanics or structure
- Causal relationships or drivers
- Classification boundaries
- Tradeoffs and tensions
- Common misconceptions
- Benchmark execution sequence
- Reference table: key metrics and tools
Definition and scope
Performance benchmarking in PC gaming is a measurement discipline, not an optimization activity. Its output is quantified data describing what a given hardware-software configuration produces under reproducible test conditions — not a prescription for settings changes. The primary metrics captured fall into two categories: throughput metrics (frames per second, render resolution, data throughput) and latency metrics (frame time in milliseconds, input-to-display latency, GPU render latency).
Frames per second (FPS) is the most recognized throughput metric: the number of fully rendered frames a GPU delivers to the display pipeline per second. At 60 FPS, one frame is produced every 16.67 milliseconds. At 144 FPS, that interval compresses to 6.94 milliseconds. These intervals matter because frame time variance — not average FPS alone — determines the smoothness a player perceives. A benchmark session averaging 144 FPS with individual frame spikes to 40 milliseconds will produce visible stutter regardless of the reported average.
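To make the arithmetic concrete, here is a minimal Python sketch of the FPS-to-frame-budget conversion and the averaging effect described above; the frame time list is a hypothetical capture, not real data:

```python
def frame_budget_ms(target_fps: float) -> float:
    """Per-frame time budget in milliseconds for a target frame rate."""
    return 1000.0 / target_fps

# 60 FPS -> 16.67 ms per frame; 144 FPS -> 6.94 ms per frame
for fps in (60, 144):
    print(f"{fps} FPS = {frame_budget_ms(fps):.2f} ms per frame")

# A session can average close to 144 FPS yet still stutter: one 40 ms
# frame is nearly six times the 6.94 ms budget and is visible as a hitch.
frame_times_ms = [6.9] * 143 + [40.0]  # hypothetical captured frame times
avg_fps = 1000.0 / (sum(frame_times_ms) / len(frame_times_ms))
print(f"average FPS: {avg_fps:.1f}, worst frame: {max(frame_times_ms):.1f} ms")
```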
Benchmark scope extends across the full system stack: GPU render performance, CPU simulation throughput, RAM latency and bandwidth, storage load times, and display output characteristics. The scope of a given benchmark session is defined by which variables are controlled and which are allowed to vary. Failure to define scope — for example, running a game benchmark at inconsistent background application loads — produces data that cannot be replicated or compared.
Core mechanics or structure
A benchmark session consists of three structural components: the test workload, the measurement instrumentation, and the recording interval.
Test workload refers to the rendered scene or game sequence used to stress the hardware. This is either a synthetic workload (a purpose-built stress test such as 3DMark's Time Spy or Port Royal, published by UL Benchmarks) or an in-game workload using a repeatable gameplay segment or built-in benchmark mode. Synthetic benchmarks produce standardized, cross-system comparable scores. In-game benchmarks reflect real-world rendering conditions but vary across game engines, driver versions, and scene compositions.
Measurement instrumentation captures output data during the workload. The primary instrumentation layer on Windows systems is GPU driver telemetry, accessible through vendor tools such as NVIDIA's FrameView and the performance overlay in AMD's Radeon Software. Hardware-agnostic tools include CapFrameX and PresentMon, an open-source capture tool maintained by Intel Corporation that hooks into the Windows DXGI presentation pipeline to log per-frame timing at the OS level.
The recording interval defines the duration and conditions of capture. A standard benchmark interval for reproducibility is a minimum of 60 continuous seconds of stable workload. Shorter intervals are susceptible to load-ramp artifacts. A warm-up period of 10–15 minutes before the recording window begins allows GPU and CPU thermal states to stabilize, preventing throttle events from distorting early-interval data.
Frame time data is typically reported in three statistical forms: average FPS (the arithmetic mean of frame delivery rate), 1% low FPS (the frame rate equivalent of the 99th-percentile frame time, i.e. the slowest 1% of frames), and 0.1% low FPS (the same construction at the 99.9th percentile). These low-percentile metrics capture the worst-case frame delivery events — stutters, hitches, and throttle-induced frame drops — that average FPS conceals entirely.
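A minimal sketch of how these metrics are derived from a frame time log. Note that tools disagree on the exact construction: this version converts the nearest-rank 99th-percentile frame time to FPS, while some tools instead average the slowest 1% of frames, so cross-tool numbers are not guaranteed to match. The example log is synthetic.

```python
import math
import statistics

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of samples are at or below it (no interpolation)."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[min(rank, len(ordered)) - 1]

def summarize(frame_times_ms):
    """Derive the standard reporting metrics from a per-frame log (ms)."""
    return {
        "avg_fps": 1000.0 / statistics.mean(frame_times_ms),
        "low_1pct_fps": 1000.0 / percentile(frame_times_ms, 99),
        "low_01pct_fps": 1000.0 / percentile(frame_times_ms, 99.9),
        "frametime_stdev_ms": statistics.stdev(frame_times_ms),
    }

# Synthetic log: mostly 7 ms frames plus fifteen 40 ms hitches.
# Average lands near 133 FPS while the 1% low collapses to 25 FPS,
# which is exactly the gap that average FPS alone conceals.
log = [7.0] * 990 + [40.0] * 15
print(summarize(log))
```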
Causal relationships or drivers
The relationship between hardware specifications and benchmark output is not linear. GPU compute throughput — measured in TFLOPS (trillions of floating-point operations per second) — is the primary driver of rasterized rendering performance, but it interacts with VRAM capacity, memory bandwidth, and CPU frame preparation speed to determine actual delivered FPS.
At high resolutions (3840×2160, commonly called 4K), GPU VRAM capacity becomes a hard constraint. A GPU with 8 GB of VRAM running a scene with 10 GB of active texture data must stream assets from system RAM through the PCIe bus, introducing latency and frame time spikes. NVIDIA's RTX 4070, for example, carries 12 GB of GDDR6X, while the RTX 4090 carries 24 GB — a difference that becomes operationally significant at 4K with maximum texture quality settings enabled.
CPU performance is the dominant driver at low resolutions (1080p and below), where GPU render time is short enough that CPU frame preparation — physics simulation, AI computation, draw call submission — becomes the bottleneck. This is called a CPU bottleneck and is identifiable when GPU utilization falls below 90% while CPU core utilization on the game thread is at or near 100%.
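The identification rule above can be written as a simple heuristic. A minimal sketch, assuming single-sample utilization readings and treating the 90% figure named in this section as a rule of thumb rather than a vendor-defined limit:

```python
def classify_bottleneck(gpu_util_pct: float, max_core_util_pct: float) -> str:
    """Rough bottleneck classification from utilization telemetry.

    Thresholds are the rule-of-thumb values used in this article;
    real analysis looks at sustained averages, not single samples.
    """
    if gpu_util_pct >= 90.0:
        return "GPU-limited (desired state in a GPU benchmark)"
    if max_core_util_pct >= 95.0:
        return "CPU-limited (game thread saturated)"
    return "indeterminate (check frame caps, V-Sync, or background load)"

print(classify_bottleneck(gpu_util_pct=72.0, max_core_util_pct=99.0))
# -> CPU-limited (game thread saturated)
```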
RAM speed and latency affect CPU-side frame preparation time. DDR5-6000 operating at CL30 timings delivers measurably lower memory access latency than DDR5-4800 at CL40, with published benchmark data from Tom's Hardware and AnandTech documenting FPS deltas of 5–12% in CPU-bottlenecked scenarios at 1080p.
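The gap between those two kits can be estimated with the standard first-word latency calculation: CAS latency cycles divided by the memory clock, where the clock runs at half the DDR transfer rate because DDR memory transfers twice per clock. A worked sketch:

```python
def first_word_latency_ns(cas_latency: int, transfer_rate_mts: int) -> float:
    """Approximate first-word latency: CL cycles at the I/O clock,
    which runs at half the DDR transfer rate (MT/s)."""
    clock_mhz = transfer_rate_mts / 2      # DDR: two transfers per clock
    cycle_time_ns = 1000.0 / clock_mhz     # nanoseconds per clock cycle
    return cas_latency * cycle_time_ns

print(f"DDR5-6000 CL30: {first_word_latency_ns(30, 6000):.1f} ns")  # 10.0 ns
print(f"DDR5-4800 CL40: {first_word_latency_ns(40, 4800):.1f} ns")  # 16.7 ns
```

First-word latency is only one component of effective memory latency, but it shows why a faster kit with loose timings can trail a slower kit with tight timings in CPU-bound scenarios.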
Storage subsystem speed determines load times and asset streaming performance in open-world titles. An NVMe SSD with sequential read speeds of 7,000 MB/s (a common specification for PCIe Gen 4 drives) reduces texture pop-in and load screen durations compared to a SATA SSD operating at 550 MB/s — a difference a benchmark session must account for when open-world asset streaming is part of the test workload.
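As a rough sanity check on those figures, the sketch below computes the lower-bound transfer time for a hypothetical 12 GB asset set (the asset size is an illustrative assumption); real load times add decompression and CPU-side work on top of raw sequential reads:

```python
def transfer_seconds(asset_gb: float, read_mb_s: float) -> float:
    """Lower-bound transfer time; real loads add decompression overhead."""
    return asset_gb * 1000.0 / read_mb_s

for label, speed_mb_s in (("PCIe Gen 4 NVMe", 7000), ("SATA SSD", 550)):
    print(f"{label}: {transfer_seconds(12.0, speed_mb_s):.1f} s for 12 GB")
# PCIe Gen 4 NVMe: ~1.7 s; SATA SSD: ~21.8 s (raw sequential read only)
```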
Classification boundaries
Benchmarks are classified along two primary axes: workload type (synthetic vs. in-game) and measurement focus (throughput vs. latency).
Synthetic benchmarks are reproducible by design. UL Benchmarks' 3DMark suite — which includes Time Spy (DirectX 12 rasterization), Speed Way (DirectX 12 Ultimate ray tracing), and Port Royal (dedicated ray tracing stress test) — produces integer scores normalized to a reference system. These scores enable cross-hardware comparison but do not directly predict in-game FPS in any specific title.
In-game benchmarks use real game engine workloads. Built-in benchmark modes exist in titles including Shadow of the Tomb Raider, Total War: Warhammer III, Horizon Zero Dawn, and Cyberpunk 2077. These are repeatable by replaying a fixed camera path, but results vary with driver versions, background processes, and OS state.
Latency-focused benchmarks measure the interval between a player input (mouse click, keypress) and the corresponding pixel change on the display. NVIDIA's LDAT (Latency Display Analysis Tool) and Blur Busters' methods capture end-to-end system latency, which combines GPU render time, display pipeline delay, and monitor response time. This classification is distinct from FPS benchmarking and requires specialized hardware capture equipment.
A separate classification boundary exists between controlled benchmarks (fixed settings, clean system state, single application running) and real-world performance captures (background applications present, variable system load). These two categories are not interchangeable in published results.
Tradeoffs and tensions
The central tension in PC gaming benchmarking is between reproducibility and representativeness. Synthetic benchmarks produce reproducible, comparable results but do not reflect the variable rendering demands of actual gameplay. In-game benchmarks are more representative but introduce variability from driver updates, OS patches, and background process states that make direct cross-session comparison unreliable.
A secondary tension exists between average FPS as a marketing metric and frame time percentiles as the operationally meaningful measure. GPU vendors historically publish average FPS figures in marketing materials. Hardware reviewers including Digital Foundry (Eurogamer Network) and Hardware Unboxed have documented cases where two GPUs producing identical average FPS deliver measurably different 1% low percentiles — creating a performance gap invisible in headline numbers but perceptible in gameplay.
Ray tracing introduces a specific tradeoff in benchmarking scope. Enabling ray tracing on a scene that uses it selectively (reflections only, for example) produces a different GPU load profile than a scene with full path tracing enabled. Benchmark reports must therefore specify exactly which ray tracing features are active and at what quality level to produce comparable data across different GPU architectures.
Upscaling technologies — NVIDIA DLSS, AMD FSR, and Intel XeSS — complicate FPS benchmarking by separating the render resolution from the output resolution. A benchmark reporting 120 FPS at 4K using DLSS Quality mode is rendering at approximately 1440p internally, then upscaling. Reporting that figure alongside a native 4K benchmark without disclosure conflates two structurally different workloads.
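The internal resolution behind an upscaled result can be recovered from the mode's render scale. The per-axis factors below (2/3 for Quality, 0.5 for Performance) are the commonly documented defaults for DLSS and FSR, but individual titles can override them, so treat the table as an assumption:

```python
# Per-axis render scale factors; commonly documented DLSS/FSR defaults,
# though individual titles may override them.
SCALE = {"quality": 2 / 3, "balanced": 0.58, "performance": 0.5}

def internal_resolution(out_w: int, out_h: int, mode: str) -> tuple[int, int]:
    """Recover the internal render resolution from output resolution and mode."""
    s = SCALE[mode]
    return round(out_w * s), round(out_h * s)

print(internal_resolution(3840, 2160, "quality"))      # (2560, 1440)
print(internal_resolution(3840, 2160, "performance"))  # (1920, 1080)
```

A disclosed result should therefore report both resolutions, for example "120 FPS at 3840×2160 output, 2560×1440 internal (DLSS Quality)".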
Common misconceptions
Misconception: Higher average FPS always means a smoother experience.
Correction: Average FPS masks frame time variance. A result of 90 FPS average with a 1% low of 22 FPS will produce visible stutter. Frame time consistency — measured by the standard deviation of frame intervals — is a far better predictor of perceived smoothness than the average alone.
Misconception: Benchmark scores are directly comparable across driver versions.
Correction: GPU driver updates frequently change rasterization performance by 2–8% through shader compiler changes and pipeline optimizations. A Time Spy score recorded on NVIDIA driver 546.01 is not directly comparable to one recorded on driver 551.23 without disclosure of driver version, because the underlying execution path may have changed.
Misconception: A GPU running at 99% utilization is bottlenecked.
Correction: 99% GPU utilization is the target state. It indicates the GPU is fully fed with work and operating at capacity — the desired condition in a GPU-limited workload. A CPU bottleneck, by contrast, is identified when GPU utilization drops well below 90% with the frame rate uncapped, indicating the CPU cannot prepare frames fast enough.
Misconception: 4K benchmarks always reflect GPU capability more accurately than 1080p.
Correction: At 4K, VRAM capacity and memory bandwidth become dominant constraints that may not reflect raw shader throughput. A GPU with a bandwidth-limited memory bus will perform disproportionately poorly at 4K relative to its TFLOPS rating. Benchmark results at multiple resolutions are required to separate memory bandwidth constraints from compute constraints.
Benchmark execution sequence
The following sequence describes the procedural structure of a controlled in-game benchmark session:
- Record baseline system state — note GPU driver version, Windows build number, and background application inventory before beginning.
- Close non-essential background applications — terminate browser sessions, game launchers not under test, and system tray utilities that consume CPU or RAM resources.
- Set display and in-game resolution — confirm the rendering resolution matches the intended test configuration and is not modified by upscaling features unless upscaling is the test subject.
- Define and lock graphics preset — apply a fixed graphics preset (Ultra, High, Medium, or custom) and document each setting individually, particularly shadow quality, texture resolution, and ambient occlusion level.
- Allow a thermal stabilization period — run the game workload for 10–15 minutes before the recording window to allow GPU and CPU temperatures to reach steady-state and eliminate cold-start thermal artifacts.
- Execute the benchmark scene — use a built-in benchmark tool or a repeatable manual gameplay segment of no fewer than 60 seconds.
- Capture frame time data — record using PresentMon, CapFrameX, or a vendor overlay tool throughout the entire test interval.
- Extract percentile metrics — from the captured frame time log, derive average FPS, 1% low FPS, 0.1% low FPS, and standard deviation of frame intervals.
- Run a minimum of 3 consecutive sessions — average results across sessions to reduce single-run variance; a minimal aggregation sketch follows this list.
- Document and store configuration — save the full settings profile, driver version, and hardware configuration alongside the recorded data file for future reproducibility.
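A minimal sketch of the multi-run aggregation step, assuming each session has already been reduced to per-session metrics as described above (the dictionary keys are illustrative, not any tool's schema):

```python
import statistics

def aggregate_runs(runs: list[dict]) -> dict:
    """Average each derived metric across benchmark sessions and report
    the run-to-run spread, so single-run variance stays visible."""
    return {
        key: {
            "mean": statistics.mean(r[key] for r in runs),
            "spread": max(r[key] for r in runs) - min(r[key] for r in runs),
        }
        for key in runs[0]
    }

# Hypothetical per-session metrics from three captures of the same scene:
runs = [
    {"avg_fps": 141.2, "low_1pct_fps": 96.4},
    {"avg_fps": 139.8, "low_1pct_fps": 94.1},
    {"avg_fps": 142.5, "low_1pct_fps": 97.0},
]
print(aggregate_runs(runs))
```

Reporting the spread alongside the mean makes it obvious when a session was contaminated by background load and should be discarded and re-run.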
Reference table: key metrics and tools
| Metric | Unit | Primary Tool | Significance |
|---|---|---|---|
| Average FPS | Frames per second | CapFrameX, PresentMon, NVIDIA FrameView | Overall throughput; insufficient alone |
| 1% Low FPS | Frames per second | CapFrameX, PresentMon | Identifies worst-case stutter events |
| 0.1% Low FPS | Frames per second | CapFrameX | Identifies extreme single-frame drops |
| Frame time std. dev. | Milliseconds | CapFrameX | Measures consistency of frame delivery |
| GPU utilization | Percentage | MSI Afterburner / RivaTuner, vendor tools | Identifies GPU vs. CPU bottleneck |
| GPU temperature | Degrees Celsius | HWiNFO64, MSI Afterburner | Detects thermal throttling events |
| VRAM usage | Gigabytes | GPU-Z, HWiNFO64 | Identifies VRAM capacity constraints |
| CPU core utilization | Percentage (per-core) | HWiNFO64, Task Manager | Identifies game thread CPU bottleneck |
| RAM bandwidth | GB/s | AIDA64, HWiNFO64 | Measures memory subsystem throughput |
| Render resolution | Pixels (W×H) | In-game HUD / RenderDoc | Confirms actual vs. output resolution |
| Input-to-display latency | Milliseconds | NVIDIA LDAT, Blur Busters methods | Measures end-to-end system responsiveness |
| Synthetic score | Integer (vendor scale) | 3DMark Time Spy, Speed Way | Cross-system hardware comparison |
References
- UL Benchmarks — 3DMark — publisher of Time Spy, Port Royal, and Speed Way synthetic benchmark suites
- Intel Corporation — PresentMon — open-source Windows DXGI frame capture tool maintained by Intel's Game Developer Relations team
- NVIDIA Corporation — FrameView — GPU-agnostic frame rate and frame time capture utility
- TechPowerUp — GPU-Z — GPU specification and real-time sensor monitoring utility
- REALiX — HWiNFO — system-wide hardware monitoring including per-core CPU load and VRAM tracking
- FinalWire — AIDA64 — memory bandwidth and system stability benchmark suite