Evaluate

Use eval to score tracking runs on MOT-style datasets with TrackEval-backed metrics.

Examples

boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --verbose

from boxmot import Boxmot

boxmot = Boxmot(detector="yolov8n", reid="lmbn_n_duke", tracker="boosttrack")
metrics = boxmot.val(benchmark="mot17", split="ablation")
print(metrics)

Typical workflow

For repeated experiments:

boxmot generate --benchmark mot17 --split ablation
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack

Running generate first caches detections and embeddings, so later eval runs reuse them instead of recomputing.

from boxmot import Boxmot

boxmot = Boxmot(detector="yolov8n", reid="lmbn_n_duke", tracker="boosttrack")
metrics = boxmot.val(benchmark="mot17", split="ablation")
print(metrics)
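
For a sweep over several trackers, the generate-once, eval-many pattern can be scripted. The sketch below only builds the command lines, using flags shown on this page; the helper name build_sweep is hypothetical, and nothing is executed unless each list is passed to subprocess.run.

```python
import shlex


def build_sweep(benchmark, split, trackers):
    """Return one `generate` command followed by one `eval` command per tracker."""
    common = ["--benchmark", benchmark, "--split", split]
    # Detections and embeddings are generated once for the benchmark/split.
    commands = [["boxmot", "generate", *common]]
    # Each tracker then gets its own eval run against the same benchmark/split.
    for tracker in trackers:
        commands.append(["boxmot", "eval", *common, "--tracker", tracker])
    return commands


if __name__ == "__main__":
    for cmd in build_sweep("mot17", "ablation", ["boosttrack", "bytetrack"]):
        # Printed rather than run; use subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell-quoting problems if a benchmark is given as a YAML path containing spaces.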

Public detections

Use --detection-source to run with public MOTChallenge detections instead of the benchmark's configured detector:

boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --detection-source frcnn
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --detection-source sdp
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --detection-source dpm

--detection-source public uses the default public detector defined in the benchmark YAML. When the flag is omitted or set to private, eval runs the configured detector model.
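
The selection rules above can be summarized in a few lines. This is a minimal sketch of the described behavior, not boxmot's resolver: resolve_detection_source and its benchmark_default argument (standing in for the default named in the benchmark YAML) are hypothetical.

```python
# Public detector names taken from the commands shown above.
PUBLIC_DETECTORS = {"frcnn", "sdp", "dpm"}


def resolve_detection_source(value, benchmark_default):
    """Map a --detection-source value to (mode, public_detector_name).

    benchmark_default is a hypothetical stand-in for the default public
    detector that the benchmark YAML would provide.
    """
    if value is None or value == "private":
        # Flag omitted or private: run the configured detector model.
        return ("private", None)
    if value == "public":
        # Generic public: fall back to the benchmark's default public detector.
        return ("public", benchmark_default)
    if value in PUBLIC_DETECTORS:
        # A specific public detector was requested by name.
        return ("public", value)
    raise ValueError(f"unknown detection source: {value!r}")
```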

See Benchmark Workflows for details on how public detections are resolved.

Kalman filter noise tuning

Use --tune-kf to estimate per-sequence Kalman filter process and measurement noise (Q/R matrices) from the cached detections and ground truth before tracking:

boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --tune-kf

This is most useful for trackers with Kalman-filter-based motion models. It requires cached detections and ground truth to be available.

For runtime adaptation without ground truth, use --adaptive-kf instead, which estimates noise online via the Mehra (1970) method.
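
To make the idea of estimating noise from detections and ground truth concrete, here is a deliberately simplified scalar sketch. It is an assumption-laden illustration, not the procedure --tune-kf runs: it treats one coordinate at a time, assumes a unit time step and a constant-velocity model, and the function names are hypothetical.

```python
from statistics import fmean


def estimate_measurement_noise(detections, ground_truth):
    """Estimate R as the mean squared detection-minus-ground-truth residual."""
    residuals = [z - x for z, x in zip(detections, ground_truth)]
    return fmean([r * r for r in residuals])


def estimate_process_noise(ground_truth):
    """Estimate q as the mean squared second difference of the true positions.

    Under a constant-velocity model with unit time step, the second
    difference is the part of the motion the model cannot explain.
    """
    accel = [
        ground_truth[k + 2] - 2 * ground_truth[k + 1] + ground_truth[k]
        for k in range(len(ground_truth) - 2)
    ]
    return fmean([a * a for a in accel]) if accel else 0.0
```

A track moving at perfectly constant velocity yields q = 0, while noisy detections around it still yield a positive R.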

Postprocessing

eval can apply optional postprocessing before scoring. Multiple steps can be chained with commas and are applied sequentially to the same result files:

# Single step
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --postprocessing gsi

# Chained: GSI runs first, then GTA reads GSI's output
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --postprocessing gsi,gta

Available steps:

Step | Description
gsi | Gaussian-smoothed interpolation: fills gaps and smooths trajectories
gbrc | Gradient-boosting reconnection: ML-based interpolation and smoothing
gta | Global tracklet association: offline split-and-connect across the full sequence
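
As an illustration of what "fills gaps" means for a single track, the sketch below covers the gap-filling half only, using plain linear interpolation between observed frames. It is not the gsi implementation: the smoothing stage is omitted, and fill_gaps is a hypothetical helper operating on one coordinate.

```python
def fill_gaps(track):
    """Linearly interpolate missing frames between observed neighbours.

    track maps frame number -> value (for example a box x-coordinate).
    Returns a new dict; the input is left unchanged.
    """
    frames = sorted(track)
    filled = dict(track)
    for f0, f1 in zip(frames, frames[1:]):
        a, b = track[f0], track[f1]
        # Only frames strictly between two observations are filled.
        for f in range(f0 + 1, f1):
            filled[f] = a + (b - a) * (f - f0) / (f1 - f0)
    return filled
```

For example, a track observed at frames 1 and 4 gains evenly spaced values at frames 2 and 3.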

Boxmot.val(...) is the Python-facing validation entry point. Postprocessing details and metric interpretation are the same as in the CLI evaluation pipeline.

Chained steps overwrite in place

When chaining multiple postprocessing steps, each step reads the MOT result files, transforms them, and writes back to the same directory. The second step operates on the output of the first.
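
The sequential, overwrite-in-place behavior can be modeled as a simple fold over the step list. This is a sketch under stated assumptions: parse_postprocessing, run_chain, and the registry of step functions are hypothetical, and treating "none" as an empty chain is a guess based on the option help text.

```python
KNOWN_STEPS = ("gsi", "gbrc", "gta")


def parse_postprocessing(value):
    """Split a comma-separated --postprocessing value into an ordered step list."""
    if value is None:
        return []
    steps = [s.strip() for s in value.split(",") if s.strip()]
    if steps == ["none"]:
        return []
    unknown = [s for s in steps if s not in KNOWN_STEPS]
    if unknown:
        raise ValueError(f"unknown postprocessing step(s): {unknown}")
    return steps


def run_chain(steps, results, registry):
    """Apply steps in order; each output replaces the results seen by the next step."""
    for name in steps:
        results = registry[name](results)
    return results
```

Because every step consumes the previous step's output, gsi,gta and gta,gsi are different pipelines and can produce different result files.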

See Evaluation and Postprocessing.

See Benchmark Workflows for cache reuse, MMOT benchmark ids, and replay image-loading behavior.

Native C++ replay

Use --tracker-backend cpp to run the cached replay stage through a native tracker implementation:

boxmot eval --benchmark mot17 --split ablation --tracker bytetrack --tracker-backend cpp
boxmot eval --benchmark mot17 --split ablation --tracker ocsort:cpp

Native replay is currently available for botsort, bytetrack, ocsort, occluboost, and sfsort. --tracking-backend cpp is still accepted as a compatibility alias, but --tracker-backend cpp is the canonical selector.
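
Both selector forms, the --tracker-backend flag and the name:backend suffix, reduce to a (tracker, backend) pair. The sketch below shows one way to normalize them; split_tracker_spec is hypothetical, the tracker set is copied from the sentence above, and letting the suffix take precedence over the flag is an assumption.

```python
# Trackers listed above as having a native implementation.
NATIVE_TRACKERS = {"botsort", "bytetrack", "ocsort", "occluboost", "sfsort"}


def split_tracker_spec(spec, backend=None):
    """Return (tracker_name, backend) for specs such as 'ocsort:cpp'.

    backend mirrors a --tracker-backend value; a suffix on the spec wins
    (an assumption made for this sketch).
    """
    name, sep, suffix = spec.partition(":")
    chosen = suffix if sep else (backend or "python")
    if chosen not in ("python", "cpp"):
        raise ValueError(f"unknown backend: {chosen!r}")
    if chosen == "cpp" and name not in NATIVE_TRACKERS:
        raise ValueError(f"no native implementation listed for {name!r}")
    return name, chosen
```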

Main outputs

  • combined benchmark metrics such as HOTA, MOTA, and IDF1
  • per-sequence summaries
  • optional runtime timing summary with --show-timing
  • MOT-style tracker outputs
  • reused cache paths and evaluation artifacts in the run directory
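
For reading the combined metrics, it helps to recall how two of them are defined. The functions below implement the standard MOTA and IDF1 formulas from raw counts; they are a reference sketch with hypothetical names, not the TrackEval code that eval calls, and HOTA is left out because it does not reduce to a single count-based expression.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, where GT is the number of ground-truth boxes."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    if denom <= 0:
        raise ValueError("at least one count must be positive")
    return 2 * idtp / denom
```

MOTA can go negative when errors outnumber ground-truth boxes, whereas IDF1 always lies between 0 and 1.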

See Evaluation and Postprocessing.

CLI Arguments

boxmot eval

Evaluate tracking performance

Usage:

boxmot eval [OPTIONS]

Options:

Name Type Description Default
--benchmark text benchmark config name or YAML file, e.g. mot17 or boxmot/configs/datasets/mot17.yaml None
--split text Dataset split to use (e.g. train, val, test, ablation). Overrides auto-detection from source path. None
--detection-source choice (public | private) Detection source: "public" reads det/det.txt from sequences, "private" (default) runs the configured detector model. None
--tracking-backend choice (process | thread | cpp) Cached replay executor for eval/tune/research. Use 'cpp' as a compatibility alias for '--tracker-backend cpp'. process
--tracker-backend choice (python | cpp) Tracker implementation backend. Native 'cpp' is available for botsort, bytetrack, ocsort, and sfsort. python
--imgsz text Image size for model input as H,W (e.g. 800,1440) or single int for square. Default: read from the selected detector config, otherwise use detector-specific defaults. None
--fps integer video frame-rate None
--conf float Min confidence threshold. Default: read from the selected detector config, fallback 0.01. None
--iou float IoU threshold for NMS 0.7
--device text cuda device(s), e.g. 0 or 0,1,2,3 or cpu cpu
--batch-size integer micro-batch size for batched detection/embedding 16
--auto-batch / --no-auto-batch boolean probe GPU memory with a dummy pass to pick a safe batch size True
--resume / --no-resume boolean resume detection/embedding generation from progress checkpoints True
--n-threads integer CPU threads for image decoding; defaults to min(8, cpu_count) 4
--project Path save results to project/name runs
--name text save results to project/name exp
--exist-ok boolean existing project/name ok, do not increment False
--half boolean use FP16 half-precision inference False
--vid-stride integer video frame-rate stride 1
--ci boolean reuse existing runs in CI (no UI) False
--tracker text deepocsort, botsort, strongsort, ... bytetrack
--verbose boolean print detailed logs False
--show-timing / --hide-timing boolean print runtime timing summary after evaluation False
--agnostic-nms boolean class-agnostic NMS False
--postprocessing text Postprocess tracker output (comma-separated, applied in order): none gsi
--show boolean display tracking in a window False
--show-labels / --hide-labels boolean show or hide detection labels True
--show-conf / --hide-conf boolean show or hide detection confidences True
--show-trajectories boolean overlay past trajectories False
--show-kf-preds boolean show Kalman-filter predictions False
--save-txt boolean save results to a .txt file False
--save-crop boolean save cropped detections False
--save boolean save annotated video False
--line-width integer bounding box line width None
--per-class boolean track each class separately False
--target-id integer ID to highlight in green None
--masks-dir text Override directory for cached segmentation masks (.npz files) None
--masks-model choice (maskrcnn) Mask model to use for generation (stored under cache tree automatically) None
--detector Path one or more YOLO weights for detection [PosixPath('/home/runner/work/boxmot/boxmot/models/yolov8n.pt')]
--reid Path one or more ReID model weights [PosixPath('/home/runner/work/boxmot/boxmot/models/osnet_x0_25_msmt17.pt')]
--classes text filter by class indices, e.g. 0 or "0,1" None
--tune-kf / --no-tune-kf boolean Run KF noise tuning (Q/R estimation) before tracking. Automatically selects parameterization based on the tracker. Requires cached dets and GT. False
--help boolean Show this message and exit. False
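
Two of the option descriptions above imply small parsing rules: --imgsz accepts either H,W or a single integer, and --n-threads is described as defaulting to min(8, cpu_count). The sketch below restates those rules; parse_imgsz and default_n_threads are hypothetical helpers, not functions exported by boxmot.

```python
import os


def parse_imgsz(value):
    """Parse '800,1440' into (800, 1440) and '640' into (640, 640)."""
    parts = [int(p) for p in str(value).split(",")]
    if len(parts) == 1:
        # A single integer means a square input.
        return (parts[0], parts[0])
    if len(parts) == 2:
        # Two integers are interpreted as height, width.
        return (parts[0], parts[1])
    raise ValueError(f"expected H,W or a single int, got {value!r}")


def default_n_threads():
    """min(8, cpu_count), falling back to 1 if the CPU count is unknown."""
    return min(8, os.cpu_count() or 1)
```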