Evaluate
Use eval to score tracking runs on MOT-style datasets with TrackEval-backed metrics.
Examples
Typical workflow
Public detections
Use --detection-source to run with public MOTChallenge detections instead of the benchmark's configured detector:
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --detection-source frcnn
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --detection-source sdp
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --detection-source dpm
--detection-source public uses the default public detector defined in the benchmark YAML.
When omitted (or --detection-source private), eval runs the configured detector model.
See Benchmark Workflows for details on how public detections are resolved.
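With --detection-source public, detections are read from each sequence's det/det.txt. The sketch below parses that file, assuming the standard MOTChallenge layout (frame, id, bb_left, bb_top, bb_width, bb_height, conf, then three unused columns), and groups boxes by frame. load_mot_detections is a hypothetical helper for illustration, not BoxMOT's loader. The optional confidence filter is also an assumption, and it is off by default because some public detectors (e.g. DPM) emit scores outside [0, 1].

```python
import csv
import io
from collections import defaultdict


def load_mot_detections(text, min_conf=None):
    """Group MOTChallenge det.txt rows by frame (hypothetical helper).

    Assumed row layout: frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z
    where id is -1 for raw detections and x, y, z are unused.
    Returns {frame: [(x1, y1, x2, y2, conf), ...]} in pixel coordinates.
    """
    by_frame = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        frame = int(float(row[0]))
        left, top, width, height, conf = map(float, row[2:7])
        if min_conf is not None and conf < min_conf:
            continue  # optional confidence filter
        # Convert (left, top, width, height) to corner format.
        by_frame[frame].append((left, top, left + width, top + height, conf))
    return dict(by_frame)
```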
Kalman filter noise tuning
Use --tune-kf to estimate per-sequence Kalman filter process and measurement noise (Q/R matrices) from the cached detections and ground truth before tracking.
This is most useful for trackers with Kalman-filter-based motion models. It requires cached detections and ground truth to be available.
For runtime adaptation without ground truth, use --adaptive-kf instead, which estimates noise online via the Mehra (1970) method.
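Both modes estimate noise covariances. The sketch below illustrates the measurement-noise (R) half under simple assumptions. Offline, R is approximated by the covariance of residuals between detections and their matched ground-truth boxes. Online, it uses covariance matching on filter innovations, based on E[v v^T] = H P^- H^T + R. Both function names are hypothetical. The detection-to-GT matching is assumed to be done already. The online variant is a simplified relative of Mehra's approach, which works with innovation autocorrelations. Neither function is necessarily what --tune-kf or --adaptive-kf computes.

```python
import numpy as np


def measurement_noise_from_gt(detections, ground_truth):
    """Offline R estimate from detections already matched to GT (hypothetical).

    detections, ground_truth: (N, d) arrays holding the same measurement
    vector (e.g. box centre x, y, aspect, height) for N matched pairs.
    """
    residuals = np.asarray(detections, dtype=float) - np.asarray(ground_truth, dtype=float)
    # Sample covariance of the residuals (unbiased, N - 1 denominator).
    return np.cov(residuals, rowvar=False)


def measurement_noise_from_innovations(innovations, hph_t):
    """Online covariance-matching R estimate, no GT needed (hypothetical).

    innovations: (K, d) recent innovations v_k = z_k - H x_k^-.
    hph_t: (d, d) average of H P_k^- H^T over the same window.
    """
    v = np.asarray(innovations, dtype=float)
    # Innovations of a consistent filter are zero-mean, so use the raw second moment.
    sample_cov = v.T @ v / len(v)
    r_hat = sample_cov - np.asarray(hph_t, dtype=float)
    # Symmetrize and clip negative eigenvalues so R stays positive semi-definite.
    r_hat = 0.5 * (r_hat + r_hat.T)
    eigvals, eigvecs = np.linalg.eigh(r_hat)
    return eigvecs @ np.diag(np.clip(eigvals, 0.0, None)) @ eigvecs.T
```

In practice the online estimate would be computed over a sliding window and blended with the previous R to avoid jitter.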
Postprocessing
eval can apply optional postprocessing before scoring.
Multiple steps can be chained with commas and are applied sequentially to the same result files:
# Single step
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --postprocessing gsi
# Chained: GSI runs first, then GTA reads GSI's output
boxmot eval --benchmark mot17 --split ablation --tracker boosttrack --postprocessing gsi,gta
Available steps:
| Step | Description |
|---|---|
| gsi | Gaussian-smoothed interpolation: fills gaps and smooths trajectories |
| gbrc | Gradient-boosting reconnection: ML-based interpolation and smoothing |
| gta | Global tracklet association: offline split-and-connect across the full sequence |
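To make the gsi row concrete, here is a simplified stand-in that operates on a single track. It applies linear interpolation across short gaps, then Gaussian-kernel smoothing over neighbouring frames. This is a sketch under assumptions, not BoxMOT's GSI, and the max_gap and sigma values are illustrative. The original GSI from StrongSORT smooths with Gaussian process regression rather than a fixed kernel.

```python
import numpy as np


def interpolate_and_smooth(frames, boxes, max_gap=20, sigma=2.0):
    """Fill short gaps in one track, then smooth it with a Gaussian kernel.

    frames: sorted frame ids where the track was observed.
    boxes: (N, 4) boxes for those frames.
    Gaps longer than max_gap frames are left unfilled.
    Returns (all_frames, smoothed_boxes).
    """
    frames = np.asarray(frames, dtype=int)
    boxes = np.asarray(boxes, dtype=float)
    out_frames, out_boxes = [frames[0]], [boxes[0]]
    for i in range(1, len(frames)):
        gap = frames[i] - frames[i - 1]
        if 1 < gap <= max_gap:
            # Linearly interpolate every missing frame inside the gap.
            for f in range(frames[i - 1] + 1, frames[i]):
                t = (f - frames[i - 1]) / gap
                out_frames.append(f)
                out_boxes.append((1 - t) * boxes[i - 1] + t * boxes[i])
        out_frames.append(frames[i])
        out_boxes.append(boxes[i])
    out_frames = np.array(out_frames)
    out_boxes = np.array(out_boxes)
    # Each output box is a Gaussian-weighted average of all boxes, by frame distance.
    diffs = out_frames[:, None] - out_frames[None, :]
    weights = np.exp(-0.5 * (diffs / sigma) ** 2)
    weights /= weights.sum(axis=1, keepdims=True)
    return out_frames, weights @ out_boxes
```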
Boxmot.val(...) is the Python-facing validation entry point. Postprocessing details and metric interpretation are the same as in the CLI evaluation pipeline.
Chained steps overwrite in place
When chaining multiple postprocessing steps, each step reads the MOT result files, transforms them, and writes back to the same directory. The second step operates on the output of the first.
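The in-place chaining can be sketched as follows, assuming one MOT-format .txt file per sequence in the results directory. apply_postprocessing and the toy steps are hypothetical. Real steps such as gsi or gta do far more than append a line. The point is the order of reads and writes: each step rewrites the same files, so the second step sees the first step's output.

```python
import tempfile
from pathlib import Path


def apply_postprocessing(results_dir, spec, steps):
    """Apply comma-separated postprocessing steps in order, in place (hypothetical).

    results_dir: directory with one MOT-format <sequence>.txt per sequence.
    spec: value such as "gsi,gta"; "none" or "" means no postprocessing.
    steps: mapping from step name to a function list[str] -> list[str].
    """
    names = [s.strip() for s in spec.split(",") if s.strip() not in ("", "none")]
    unknown = [n for n in names if n not in steps]
    if unknown:
        raise ValueError(f"unknown postprocessing step(s): {unknown}")
    for name in names:
        # Every step rewrites every result file before the next step starts.
        for path in sorted(Path(results_dir).glob("*.txt")):
            lines = path.read_text().splitlines()
            path.write_text("\n".join(steps[name](lines)) + "\n")
    return names


# Toy demo: two steps that append a marker line, applied as "gsi,gta".
toy_steps = {"gsi": lambda lines: lines + ["# after gsi"],
             "gta": lambda lines: lines + ["# after gta"]}
with tempfile.TemporaryDirectory() as tmp:
    result_file = Path(tmp) / "MOT17-02.txt"
    result_file.write_text("1,1,100,200,50,120,1,-1,-1,-1\n")
    applied = apply_postprocessing(tmp, "gsi,gta", toy_steps)
    final_lines = result_file.read_text().splitlines()
```

Because the files are overwritten, rerunning with a different chain generally means regenerating the raw tracker output first.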
See Evaluation and Postprocessing.
See Benchmark Workflows for cache reuse, MMOT benchmark ids, and replay image-loading behavior.
Native C++ replay
Use --tracker-backend cpp to run the cached replay stage through a native tracker implementation:
boxmot eval --benchmark mot17 --split ablation --tracker bytetrack --tracker-backend cpp
boxmot eval --benchmark mot17 --split ablation --tracker ocsort:cpp
Native replay is currently available for botsort, bytetrack, ocsort, occluboost, and sfsort. --tracking-backend cpp is still accepted as a compatibility alias, but --tracker-backend cpp is the canonical selector.
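The two spellings above select the same backend. A hypothetical resolver could look like the following. It assumes that a :cpp suffix on --tracker takes precedence over --tracker-backend. It also takes the set of native trackers from the paragraph above. It is an illustration, not BoxMOT's own argument handling.

```python
# Native replay trackers as listed in the paragraph above (assumption: kept in sync manually).
NATIVE_TRACKERS = {"botsort", "bytetrack", "ocsort", "occluboost", "sfsort"}


def resolve_tracker(tracker, tracker_backend="python"):
    """Split an optional ':<backend>' suffix and check native availability (hypothetical)."""
    name, sep, suffix = tracker.partition(":")
    # Assumed precedence: an explicit suffix overrides --tracker-backend.
    backend = suffix if sep else tracker_backend
    if backend not in {"python", "cpp"}:
        raise ValueError(f"unknown tracker backend: {backend!r}")
    if backend == "cpp" and name not in NATIVE_TRACKERS:
        raise ValueError(f"no native replay for {name!r}")
    return name, backend
```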
Main outputs
- combined benchmark metrics such as HOTA, MOTA, and IDF1
- per-sequence summaries
- optional runtime timing summary with --show-timing
- MOT-style tracker outputs
- reused cache paths and evaluation artifacts in the run directory
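When reading the combined metrics, it helps to keep the standard definitions in mind. They are written below as small functions. This is an illustrative sketch that assumes the error counts have already been accumulated over all frames. TrackEval's implementation, including matching and per-threshold averaging, remains the authoritative source. HOTA is shown at a single localization threshold, with AssA passed in, because computing AssA requires the full per-match association counts.

```python
import math


def mota(fn, fp, idsw, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 IDTP / (2 IDTP + IDFP + IDFN), from identity-level matching."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


def hota_at_alpha(tp, fn, fp, ass_a):
    """HOTA at one localization threshold alpha: sqrt(DetA * AssA).

    DetA = TP / (TP + FN + FP). AssA (association accuracy averaged over
    true positives) is supplied as an input here. The reported HOTA is the
    mean of this value over alpha = 0.05, 0.10, ..., 0.95.
    """
    det_a = tp / (tp + fn + fp)
    return math.sqrt(det_a * ass_a)
```

MOTA is dominated by detection errors, whereas IDF1 and AssA weigh identity consistency more heavily. This is why a change such as gta can move IDF1 and HOTA more than MOTA.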
See Evaluation and Postprocessing.
CLI Arguments
boxmot eval
Evaluate tracking performance
Usage: boxmot eval [OPTIONS]
Options:
| Name | Type | Description | Default |
|---|---|---|---|
| --benchmark | text | Benchmark config name or YAML file, e.g. mot17 or boxmot/configs/datasets/mot17.yaml | None |
| --split | text | Dataset split to use (e.g. train, val, test, ablation). Overrides auto-detection from the source path. | None |
| --detection-source | choice (public \| private) | Detection source: "public" reads det/det.txt from sequences, "private" (default) runs the configured detector model. | None |
| --tracking-backend | choice (process \| thread \| cpp) | Cached replay executor for eval/tune/research. 'cpp' is a compatibility alias for '--tracker-backend cpp'. | process |
| --tracker-backend | choice (python \| cpp) | Tracker implementation backend. Native 'cpp' is available for botsort, bytetrack, ocsort, and sfsort. | python |
| --imgsz | text | Image size for model input as H,W (e.g. 800,1440) or a single int for a square size. Default: read from the selected detector config, otherwise use detector-specific defaults. | None |
| --fps | integer | Video frame rate | None |
| --conf | float | Minimum confidence threshold. Default: read from the selected detector config, fallback 0.01. | None |
| --iou | float | IoU threshold for NMS | 0.7 |
| --device | text | CUDA device(s), e.g. 0 or 0,1,2,3, or cpu | cpu |
| --batch-size | integer | Micro-batch size for batched detection/embedding | 16 |
| --auto-batch / --no-auto-batch | boolean | Probe GPU memory with a dummy pass to pick a safe batch size | True |
| --resume / --no-resume | boolean | Resume detection/embedding generation from progress checkpoints | True |
| --n-threads | integer | CPU threads for image decoding; defaults to min(8, cpu_count) | 4 |
| --project | Path | Save results to project/name | runs |
| --name | text | Save results to project/name | exp |
| --exist-ok | boolean | Allow an existing project/name; do not increment | False |
| --half | boolean | Use FP16 half-precision inference | False |
| --vid-stride | integer | Video frame-rate stride | 1 |
| --ci | boolean | Reuse existing runs in CI (no UI) | False |
| --tracker | text | Tracker name, e.g. deepocsort, botsort, strongsort, ... | bytetrack |
| --verbose | boolean | Print detailed logs | False |
| --show-timing / --hide-timing | boolean | Print a runtime timing summary after evaluation | False |
| --agnostic-nms | boolean | Class-agnostic NMS | False |
| --postprocessing | text | Postprocess tracker output (comma-separated, applied in order): none | gsi |
| --show | boolean | Display tracking in a window | False |
| --show-labels / --hide-labels | boolean | Show or hide detection labels | True |
| --show-conf / --hide-conf | boolean | Show or hide detection confidences | True |
| --show-trajectories | boolean | Overlay past trajectories | False |
| --show-kf-preds | boolean | Show Kalman-filter predictions | False |
| --save-txt | boolean | Save results to a .txt file | False |
| --save-crop | boolean | Save cropped detections | False |
| --save | boolean | Save annotated video | False |
| --line-width | integer | Bounding-box line width | None |
| --per-class | boolean | Track each class separately | False |
| --target-id | integer | ID to highlight in green | None |
| --masks-dir | text | Override directory for cached segmentation masks (.npz files) | None |
| --masks-model | choice (maskrcnn) | Mask model to use for generation (stored under the cache tree automatically) | None |
| --detector | Path | One or more YOLO weights for detection | yolov8n.pt in the BoxMOT models/ directory |
| --reid | Path | One or more ReID model weights | osnet_x0_25_msmt17.pt in the BoxMOT models/ directory |
| --classes | text | Filter by class indices, e.g. 0 or "0,1" | None |
| --tune-kf / --no-tune-kf | boolean | Run KF noise tuning (Q/R estimation) before tracking. Automatically selects the parameterization based on the tracker. Requires cached detections and GT. | False |
| --help | boolean | Show this message and exit. | False |
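Several options take comma-separated strings. The hypothetical helpers below show how values in the formats described for --imgsz and --classes could be interpreted. They are sketches of those formats only and are not BoxMOT's own parsers.

```python
def parse_imgsz(value):
    """Parse an --imgsz value: "H,W" (e.g. "800,1440") or a single int for a square size."""
    parts = [int(p) for p in str(value).split(",")]
    if len(parts) == 1:
        return parts[0], parts[0]  # square input
    if len(parts) == 2:
        return parts[0], parts[1]  # (height, width)
    raise ValueError(f"expected H,W or a single int, got {value!r}")


def parse_classes(value):
    """Parse a --classes value such as "0" or "0,1" into class indices; None means no filter."""
    if value is None:
        return None
    return [int(p) for p in str(value).split(",") if p.strip()]
```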