
Research

Use research when you want GEPA to propose code changes to tracker source files and score them on a benchmark.

Example

CLI:

boxmot research \
  --benchmark mot17-ablation \
  --tracker bytetrack \
  --proposal-model openai/gpt-5.4 \
  --max-metric-calls 24

Python:

from boxmot import Boxmot

result = Boxmot(tracker="bytetrack").research(
    benchmark="mot17-ablation",
    proposal_model="openai/gpt-5.4",
    max_metric_calls=24,
)
print(result.delta_summary)

Prerequisites

See Mode-specific extras.

research needs the research extra for GEPA, plus whatever detector backend the selected benchmark uses.

Proposal models

BoxMOT expects provider-prefixed model identifiers such as:

  • openai/gpt-5.4
  • anthropic/claude-sonnet-4-20250514
  • openrouter/openai/gpt-5.4

Bare OpenAI model names such as gpt-5.4 are normalized to openai/gpt-5.4, but explicit prefixes are still preferred.
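The normalization rule above can be sketched as follows; `normalize_model_id` is an illustrative name for this example, not part of the BoxMOT API:

```python
# Sketch of the normalization rule described above: bare model names
# gain an "openai/" prefix, while identifiers that already carry a
# provider prefix pass through unchanged.
def normalize_model_id(model_id: str) -> str:
    if "/" in model_id:          # already provider-prefixed, e.g. "anthropic/..."
        return model_id
    return f"openai/{model_id}"  # bare name: assume the OpenAI provider

print(normalize_model_id("gpt-5.4"))                    # openai/gpt-5.4
print(normalize_model_id("openrouter/openai/gpt-5.4"))  # unchanged
```

Explicit prefixes avoid relying on this fallback and make the intended provider unambiguous.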

Credential setup

Set the provider API key in the matching environment variable, for example:

export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...

Evaluation budget and timeout

  • max_metric_calls caps how many benchmark evaluations GEPA may spend during the run.
  • eval_timeout applies to each evaluation subprocess individually; it does not bound the total wall-clock runtime of the research job.

Outputs

research writes:

  • GEPA state and logs
  • accepted and rejected candidate artifacts
  • best-candidate code snapshots
  • benchmark summaries before and after optimization

CLI Arguments

boxmot research

Research tracker code changes with GEPA

Usage:

boxmot research [OPTIONS]

Options:

Name Type Description Default
--benchmark text benchmark config name or YAML file, e.g. mot17-ablation or boxmot/configs/benchmarks/mot17-ablation.yaml None
--imgsz text Image size for model input as H,W (e.g. 800,1440) or single int for square. Default: read from the selected detector config, otherwise use detector-specific defaults. None
--fps integer video frame-rate None
--conf float Min confidence threshold. Default: read from the selected detector config, fallback 0.01. None
--iou float IoU threshold for NMS 0.7
--device text cuda device(s), e.g. 0 or 0,1,2,3 or cpu cpu
--batch-size integer micro-batch size for batched detection/embedding 16
--auto-batch / --no-auto-batch boolean probe GPU memory with a dummy pass to pick a safe batch size True
--resume / --no-resume boolean resume detection/embedding generation from progress checkpoints True
--n-threads integer CPU threads for image decoding; defaults to min(8, cpu_count) 4
--project Path save results to project/name runs
--name text save results to project/name exp
--exist-ok boolean existing project/name ok, do not increment False
--half boolean use FP16 half-precision inference False
--vid-stride integer video frame-rate stride 1
--ci boolean reuse existing runs in CI (no UI) False
--tracker text deepocsort, botsort, strongsort, ... bytetrack
--verbose boolean print detailed logs False
--show-timing / --hide-timing boolean print runtime timing summary after evaluation False
--agnostic-nms boolean class-agnostic NMS False
--postprocessing choice (none | gsi | gbrc) Postprocess tracker output with gsi (Gaussian smoothed interpolation) or gbrc none
--show boolean display tracking in a window False
--show-labels / --hide-labels boolean show or hide detection labels True
--show-conf / --hide-conf boolean show or hide detection confidences True
--show-trajectories boolean overlay past trajectories False
--show-kf-preds boolean show Kalman-filter predictions False
--save-txt boolean save results to a .txt file False
--save-crop boolean save cropped detections False
--save boolean save annotated video False
--line-width integer bounding box line width None
--per-class boolean track each class separately False
--target-id integer ID to highlight in green None
--proposal-model text proposal model identifier used by GEPA reflections, e.g. openai/gpt-5.4, anthropic/claude-sonnet-4-20250514, openrouter/openai/gpt-5.4 openai/gpt-5.4
--proposal-api-key text proposal model API key; prefer shell env vars in CI but this can inject the key at runtime None
--proposal-api-key-env text environment variable name for --proposal-api-key when the provider is not inferred, e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY None
--max-metric-calls integer maximum number of benchmark evaluations during research 24
--eval-timeout float hard timeout in seconds for each benchmark evaluation 900.0
--keep-workspace / --no-keep-workspace boolean preserve the temporary research workspace after the run False
--idf1-penalty float penalty multiplier for combined IDF1 regression versus baseline 1.0
--mota-penalty float penalty multiplier for combined MOTA regression versus baseline 1.0
--idf1-tolerance float allowed combined IDF1 drop before penalties apply 0.0
--mota-tolerance float allowed combined MOTA drop before penalties apply 0.0
--detector Path one or more YOLO weights for detection models/yolov8n.pt
--reid Path one or more ReID model weights models/osnet_x0_25_msmt17.pt
--classes text filter by class indices, e.g. 0 or "0,1" None
--help boolean Show this message and exit. False
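One plausible reading of the regression penalty and tolerance flags above is sketched below; `penalized_score` and the linear form are assumptions for illustration, not the exact objective BoxMOT optimizes:

```python
def penalized_score(candidate: float, baseline: float,
                    penalty: float, tolerance: float) -> float:
    # A metric drop versus the baseline is forgiven up to the tolerance;
    # any regression beyond it is scaled by the penalty multiplier and
    # subtracted from the candidate's raw value.
    regression = max(0.0, (baseline - candidate) - tolerance)
    return candidate - penalty * regression

# Candidate IDF1 of 60.0 versus baseline 62.0 with 0.5 tolerance:
# regression = (62.0 - 60.0) - 0.5 = 1.5, so the score becomes 58.5.
print(penalized_score(candidate=60.0, baseline=62.0,
                      penalty=1.0, tolerance=0.5))  # 58.5
```

Under this reading, raising a tolerance lets GEPA trade a small drop in one metric for gains elsewhere, while a larger penalty multiplier makes regressions costlier.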