Evaluate

Use eval to score tracking runs on MOT-style datasets with BoxMOT's in-repo MOT metrics.

Examples

CLI:

boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --verbose

Python:

from boxmot import BoxMOT

boxmot = BoxMOT(detector="yolov8n", reid="lmbn_n_duke", tracker="boosttrack")
metrics = boxmot.val(benchmark="mot17", split="ablation")
print(metrics)

Typical workflow

CLI, for repeated experiments:

boxmot generate --benchmark mot17 --split ablation
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack

This lets eval reuse precomputed detections and embeddings.

Python:

from boxmot import BoxMOT

boxmot = BoxMOT(detector="yolov8n", reid="lmbn_n_duke", tracker="boosttrack")
metrics = boxmot.val(benchmark="mot17", split="ablation")
print(metrics)
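The cached workflow above can be scripted when comparing several trackers. The sketch below uses a hypothetical helper, build_commands, which is not part of BoxMOT. It runs boxmot generate once and then boxmot eval for each tracker, using only the flags shown in this section, and assumes the boxmot CLI is on PATH.

```python
import subprocess
from typing import List, Sequence


def build_commands(benchmark: str, split: str, trackers: Sequence[str]) -> List[List[str]]:
    """Hypothetical helper: one cache-generation run, then one eval run per tracker."""
    base = ["--benchmark", benchmark, "--split", split]
    commands = [["boxmot", "generate", *base]]
    for tracker in trackers:
        # Each eval run reuses the detections/embeddings cached by `generate`.
        commands.append(["boxmot", "eval", *base, "--tracker", tracker])
    return commands


if __name__ == "__main__":
    for cmd in build_commands("mot17", "ablation", ["boosttrack", "bytetrack", "ocsort"]):
        subprocess.run(cmd, check=True)  # stop at the first failing command
```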

Public detections

Use --detection-source to run with public MOTChallenge detections instead of the benchmark's configured detector:

boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --detection-source frcnn
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --detection-source sdp
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --detection-source dpm

--detection-source public uses the default public detector defined in the benchmark YAML. When omitted (or --detection-source private), eval runs the configured detector model.

See Benchmark Workflows for details on how public detections are resolved.
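According to the option table below, the public source reads det/det.txt from each sequence. In MOTChallenge det.txt files, each comma-separated row starts with frame, id (-1 for detections), bb_left, bb_top, bb_width, bb_height, and conf, optionally followed by unused fields. The following is a minimal sketch of grouping such a file by frame, assuming that layout. read_mot_dets is a hypothetical helper, not BoxMOT's loader.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple

# (x1, y1, x2, y2, conf) in pixels
Detection = Tuple[float, float, float, float, float]


def read_mot_dets(f: TextIO, min_conf: float = float("-inf")) -> Dict[int, List[Detection]]:
    """Group MOTChallenge-style det.txt rows by frame, converting tlwh to xyxy."""
    per_frame: Dict[int, List[Detection]] = defaultdict(list)
    for row in csv.reader(f):
        if not row:
            continue  # skip blank lines
        frame = int(float(row[0]))
        left, top, width, height, conf = map(float, row[2:7])
        # No threshold by default: raw detector scores need not lie in [0, 1].
        if conf < min_conf:
            continue
        per_frame[frame].append((left, top, left + width, top + height, conf))
    return dict(per_frame)


if __name__ == "__main__":
    sample = "1,-1,10,20,30,40,0.9,-1,-1,-1\n2,-1,1,1,2,2,0.5,-1,-1,-1\n"
    print(read_mot_dets(io.StringIO(sample)))
```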

Kalman filter noise tuning

Use --tune-kf to estimate per-sequence Kalman filter process and measurement noise (Q/R matrices) from the cached detections and ground truth before tracking:

boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --tune-kf

This is most useful for trackers with Kalman-filter-based motion models. It requires cached detections and ground truth to be available.
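The sketch below illustrates what estimating measurement noise from detections and ground truth involves. It computes a diagonal R from the residuals between detections and their matched ground-truth boxes. It assumes the pairs are already matched (for example by IoU) and uses a (cx, cy, w, h) measurement. estimate_measurement_noise is hypothetical; --tune-kf chooses its own parameterization per tracker and also estimates Q.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def estimate_measurement_noise(
    dets: Sequence[Box], gts: Sequence[Box], floor: float = 1e-6
) -> List[float]:
    """Diagonal of R: per-coordinate variance of detection-minus-GT residuals.

    Assumes dets[i] is already matched to gts[i]; the floor keeps R positive definite.
    """
    if len(dets) != len(gts) or len(dets) < 2:
        raise ValueError("need at least two matched (det, gt) pairs")
    residuals = [[d[i] - g[i] for d, g in zip(dets, gts)] for i in range(4)]
    return [max(pvariance(r), floor) for r in residuals]
```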

For runtime adaptation without ground truth, use --adaptive-kf instead, which estimates noise online via the Mehra (1970) method.
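Online adaptation works from the filter's innovations rather than from ground truth. Below is a minimal 1-D sketch of innovation-based covariance matching. It is a simplified relative of the Mehra approach cited above, not BoxMOT's implementation. Over a sliding window, R is re-estimated as the mean squared innovation minus the predicted state variance, with H = 1. The random-walk motion model, window length, and floor are assumptions.

```python
import random
from collections import deque
from typing import Iterable


def adaptive_r_kalman(
    measurements: Iterable[float],
    q: float,
    r_init: float,
    window: int = 200,
    r_floor: float = 1e-6,
) -> float:
    """Scalar random-walk Kalman filter whose measurement noise R adapts online.

    Returns the final estimate of R.
    """
    x, p, r = 0.0, 1.0, r_init
    innovations: deque = deque(maxlen=window)
    for z in measurements:
        # Predict (random walk: x_k = x_{k-1} + w, w ~ N(0, q)).
        p_pred = p + q
        nu = z - x  # innovation
        innovations.append(nu)
        # Covariance matching: E[nu^2] ~ P_pred + R, so R ~ mean(nu^2) - P_pred.
        if len(innovations) == window:
            c_nu = sum(v * v for v in innovations) / window
            r = max(c_nu - p_pred, r_floor)
        # Update.
        k = p_pred / (p_pred + r)
        x = x + k * nu
        p = (1.0 - k) * p_pred
    return r


if __name__ == "__main__":
    rng = random.Random(0)
    true_q, true_r, state, zs = 0.01, 4.0, 0.0, []
    for _ in range(2000):
        state += rng.gauss(0.0, true_q ** 0.5)
        zs.append(state + rng.gauss(0.0, true_r ** 0.5))
    print(adaptive_r_kalman(zs, q=true_q, r_init=1.0))
```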

Postprocessing

CLI:

eval can apply optional postprocessing before scoring. Multiple steps can be chained with commas; they are applied sequentially to the same result files:

# Single step
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --postprocessing gsi

# Chained: GSI runs first, then GTA reads GSI's output
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --postprocessing gsi,gta

Available steps:

Step	Description
`gsi`	Gaussian-smoothed interpolation — fills gaps and smooths trajectories
`gbrc`	Gradient-boosting reconnection — ML-based interpolation and smoothing
`gta`	Global tracklet association — offline split-and-connect across the full sequence
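The sketch below makes the gsi step concrete: it fills short gaps in one trajectory and then smooths it. It approximates that behaviour with linear interpolation followed by a Gaussian-kernel weighted average. This is a deliberately simplified stand-in, not BoxMOT's gsi code; GSI as introduced with StrongSORT smooths each track with Gaussian process regression rather than a fixed kernel. The max_gap and sigma defaults are assumptions.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)
Track = List[Tuple[int, Box]]             # (frame, box) for one identity


def interpolate_gaps(track: Track, max_gap: int = 20) -> Track:
    """Linearly fill missing frames between observations at most max_gap apart."""
    if not track:
        return []
    track = sorted(track)
    filled: Track = [track[0]]
    for (f0, b0), (f1, b1) in zip(track, track[1:]):
        gap = f1 - f0
        if 1 < gap <= max_gap:
            for f in range(f0 + 1, f1):
                t = (f - f0) / gap
                filled.append((f, tuple(a + t * (b - a) for a, b in zip(b0, b1))))
        filled.append((f1, b1))
    return filled


def gaussian_smooth(track: Track, sigma: float = 2.0) -> Track:
    """Replace each box by a Gaussian-weighted average over the track's frames."""
    smoothed: Track = []
    for f, _ in track:
        weights = [math.exp(-((f - g) ** 2) / (2 * sigma**2)) for g, _ in track]
        total = sum(weights)
        box = tuple(
            sum(w * b[i] for w, (_, b) in zip(weights, track)) / total for i in range(4)
        )
        smoothed.append((f, box))
    return smoothed
```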

Python: BoxMOT.val(...) is the Python-facing validation entry point. Postprocessing details and metric interpretation are the same as in the CLI evaluation pipeline.

Chained steps overwrite in place

When chaining multiple postprocessing steps, each step reads the MOT result files, transforms them, and writes back to the same directory. The second step operates on the output of the first.
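Because steps overwrite in place, their order matters. The sketch below shows that contract, assuming each step is a function from the lines of a MOT result file to new lines. run_chain and the toy steps are hypothetical and are not BoxMOT's postprocessing API.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Step = Callable[[List[str]], List[str]]


def run_chain(result_file: Path, spec: str, steps: Dict[str, Step]) -> None:
    """Apply comma-separated steps in order, rewriting result_file after each one."""
    names = [name.strip() for name in spec.split(",") if name.strip()]
    for name in names:
        if name == "none":
            continue
        if name not in steps:
            raise ValueError(f"unknown postprocessing step: {name!r}")
        lines = result_file.read_text().splitlines()
        # Overwrite in place: the next step reads this step's output.
        result_file.write_text("\n".join(steps[name](lines)) + "\n")


if __name__ == "__main__":
    # Toy steps whose combined result depends on their order.
    toy: Dict[str, Step] = {
        "tag": lambda lines: lines + ["tagged"],
        "count": lambda lines: [str(len(lines))],
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "seq.txt"
        path.write_text("1,1,0,0,10,10,1,-1,-1,-1\n")
        run_chain(path, "tag,count", toy)
        print(path.read_text().strip())  # prints 2: count saw the line added by tag
```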

See Evaluation and Postprocessing.

See Benchmark Workflows for cache reuse, MMOT benchmark ids, and replay image-loading behavior.

Native C++ replay

Use --tracker-backend cpp to run the cached replay stage through a native tracker implementation:

boxmot eval --benchmark mot17 --split ablation --tracker bytetrack --tracker-backend cpp
boxmot eval --benchmark mot17 --split ablation --tracker ocsort:cpp

Native replay is currently available for botsort, bytetrack, ocsort, occluboost, and sfsort. --tracking-backend cpp is still accepted as a compatibility alias, but --tracker-backend cpp is the canonical selector.

Main outputs

combined benchmark metrics such as HOTA, MOTA, and IDF1
per-sequence summaries
optional runtime timing summary with --show-timing
MOT-style tracker outputs
reused cache paths and evaluation artifacts in the run directory
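For orientation, the headline metrics have standard closed forms:

- MOTA = 1 - (FN + FP + IDSW) / GT.
- IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN).
- HOTA is the geometric mean of detection accuracy (DetA) and association accuracy (AssA) at each localization threshold, averaged over thresholds.

The functions below only evaluate these formulas from counts you already have. They are illustrative and are not BoxMOT's metric code, which also performs the matching that produces those counts.

```python
import math
from typing import Sequence, Tuple


def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy; negative when errors exceed GT boxes."""
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """ID F1: harmonic mean of ID precision and ID recall."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


def hota(det_ass_per_alpha: Sequence[Tuple[float, float]]) -> float:
    """HOTA: mean over localization thresholds of sqrt(DetA * AssA)."""
    scores = [math.sqrt(det_a * ass_a) for det_a, ass_a in det_ass_per_alpha]
    return sum(scores) / len(scores)
```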

See Evaluation and Postprocessing.

CLI Arguments

boxmot eval

Evaluate tracking performance

Usage:

boxmot eval [OPTIONS]

Options:

Name	Type	Description	Default
`--benchmark`	text	benchmark config name or YAML file, e.g. mot17 or boxmot/configs/datasets/mot17.yaml	None
`--split`	text	Dataset split to use (e.g. train, val, test, ablation). Overrides auto-detection from source path.	None
`--detection-source`	choice (`public` \| `private`)	Detection source: "public" reads det/det.txt from sequences, "private" (default) runs the configured detector model.	None
`--tracking-backend`	choice (`process` \| `thread` \| `cpp`)	Cached replay executor for eval/tune/research. Use 'cpp' as a compatibility alias for '--tracker-backend cpp'.	`process`
`--tracker-backend`	choice (`python` \| `cpp`)	Tracker implementation backend. Native 'cpp' is available for botsort, bytetrack, ocsort, and sfsort.	`python`
`--imgsz`	text	Image size for model input as H,W (e.g. 800,1440) or single int for square. Default: read from the selected detector config, otherwise use detector-specific defaults.	None
`--fps`	integer	video frame-rate	None
`--conf`	float	Min confidence threshold. Default: read from the selected detector config, fallback 0.01.	None
`--iou`	float	IoU threshold for NMS	`0.7`
`--device`	text	CUDA device(s), e.g. 0 or 0,1,2,3 or cpu	`cpu`
`--batch-size`	integer	micro-batch size for batched detection/embedding	`16`
`--auto-batch` / `--no-auto-batch`	boolean	probe GPU memory with a dummy pass to pick a safe batch size	`True`
`--resume` / `--no-resume`	boolean	resume detection/embedding generation from progress checkpoints	`True`
`--n-threads`	integer	CPU threads for image decoding; defaults to min(8, cpu_count)	`4`
`--project`	Path	save results to project/name	`runs`
`--name`	text	save results to project/name	`exp`
`--exist-ok`	boolean	existing project/name ok, do not increment	`False`
`--half`	boolean	use FP16 half-precision inference	`False`
`--vid-stride`	integer	video frame-rate stride	`1`
`--ci`	boolean	reuse existing runs in CI (no UI)	`False`
`--tracker`	text	one of: strongsort, ocsort, bytetrack, sfsort, botsort, deepocsort, hybridsort, boosttrack, occluboost, sam2mot	`bytetrack`
`--verbose`	boolean	print detailed logs	`False`
`--show-timing` / `--hide-timing`	boolean	print runtime timing summary after evaluation	`False`
`--agnostic-nms`	boolean	class-agnostic NMS	`False`
`--postprocessing`	text	Postprocess tracker output (comma-separated, applied in order): none, gsi, gbrc, gta	`none`
`--show`	boolean	display tracking in a window	`False`
`--show-labels` / `--hide-labels`	boolean	show or hide detection labels	`True`
`--show-conf` / `--hide-conf`	boolean	show or hide detection confidences	`True`
`--show-trajectories`	boolean	overlay past trajectories	`False`
`--show-kf-preds`	boolean	show Kalman-filter predictions	`False`
`--save-txt`	boolean	save results to a .txt file	`False`
`--save-crop`	boolean	save cropped detections	`False`
`--save`	boolean	save annotated video	`False`
`--line-width`	integer	bounding box line width	None
`--per-class`	boolean	track each class separately	`False`
`--target-id`	integer	ID to highlight in green	None
`--masks-dir`	text	Override directory for cached segmentation masks (.npz files)	None
`--masks-model`	choice (`maskrcnn`)	Mask model to use for generation (stored under cache tree automatically)	None
`--detector`	Path	one or more YOLO weights for detection	`models/yolov8n.pt`
`--reid`	Path	one or more ReID model weights	`models/osnet_x0_25_msmt17.pt`
`--classes`	text	filter by class indices, e.g. 0 or "0,1"	None
`--tune-kf` / `--no-tune-kf`	boolean	Run KF noise tuning (Q/R estimation) before tracking. Automatically selects parameterization based on the tracker. Requires cached dets and GT.	`False`
`--help`	boolean	Show this message and exit.	`False`
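Several options take comma-separated strings. For example, --imgsz accepts H,W or a single int, and --classes accepts values such as "0,1". The sketch below shows one way such values can be interpreted. The function names are hypothetical, and BoxMOT's own argument handling may differ in validation and error messages.

```python
from typing import List, Optional, Tuple


def parse_imgsz(value: str) -> Tuple[int, int]:
    """'800,1440' -> (800, 1440); '640' -> (640, 640) for a square input."""
    parts = [int(p) for p in value.split(",") if p.strip()]
    if len(parts) == 1:
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"expected H,W or a single int, got {value!r}")


def parse_classes(value: Optional[str]) -> Optional[List[int]]:
    """'0,1' -> [0, 1]; None means no class filter."""
    if value is None:
        return None
    return [int(c) for c in value.split(",") if c.strip()]
```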