
Research

Use research when you want GEPA to propose code changes to tracker source files and score them on a benchmark.

Example

CLI:

boxmot research \
  --benchmark mot17-ablation \
  --tracker bytetrack \
  --proposal-model openai/gpt-5.4 \
  --max-metric-calls 24

Python:

from boxmot import Boxmot

result = Boxmot(tracker="bytetrack").research(
    benchmark="mot17-ablation",
    proposal_model="openai/gpt-5.4",
    max_metric_calls=24,
)
print(result.delta_summary)

Prerequisites

See Mode-specific extras.

research needs the research extra for GEPA, plus whatever detector backend the selected benchmark uses.

Proposal models

BoxMOT expects provider-prefixed model identifiers such as:

  • openai/gpt-5.4
  • anthropic/claude-sonnet-4-20250514
  • openrouter/openai/gpt-5.4

Bare OpenAI model names such as gpt-5.4 are normalized to openai/gpt-5.4, but explicit prefixes are still preferred.
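The normalization rule above can be sketched as follows; `normalize_model_id` is an illustrative name for this example, not part of the BoxMOT API:

```python
# Sketch of the normalization rule described above: bare model names
# gain an "openai/" prefix, while identifiers that already carry a
# provider prefix pass through unchanged.
def normalize_model_id(model_id: str) -> str:
    if "/" in model_id:          # already provider-prefixed, e.g. "anthropic/..."
        return model_id
    return f"openai/{model_id}"  # bare name: assume the OpenAI provider

print(normalize_model_id("gpt-5.4"))                    # openai/gpt-5.4
print(normalize_model_id("openrouter/openai/gpt-5.4"))  # unchanged
```

Explicit prefixes avoid relying on this fallback and make the intended provider unambiguous.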

Credential setup

Set the provider API key in the matching environment variable, for example:

export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...

Evaluation budget and timeout

  • max_metric_calls caps how many benchmark evaluations GEPA may spend during the run.
  • eval_timeout applies to each evaluation subprocess individually; it does not bound the total wall-clock runtime of the research job.

Outputs

research writes:

  • GEPA state and logs
  • accepted and rejected candidate artifacts
  • best-candidate code snapshots
  • benchmark summaries before and after optimization

CLI Arguments

boxmot research

Research tracker code changes with GEPA

Usage:

boxmot research [OPTIONS]

Options:

Name Type Description Default
--benchmark text benchmark config name or YAML file, e.g. mot17-ablation or boxmot/configs/benchmarks/mot17-ablation.yaml None
--imgsz text Image size for model input as H,W (e.g. 800,1440) or single int for square. Default: read from the selected detector config, otherwise use detector-specific defaults. None
--fps integer video frame-rate None
--conf float Min confidence threshold. Default: read from the selected detector config, fallback 0.01. None
--iou float IoU threshold for NMS 0.7
--device text cuda device(s), e.g. 0 or 0,1,2,3 or cpu cpu
--batch-size integer micro-batch size for batched detection/embedding 16
--auto-batch / --no-auto-batch boolean probe GPU memory with a dummy pass to pick a safe batch size True
--resume / --no-resume boolean resume detection/embedding generation from progress checkpoints True
--n-threads integer CPU threads for image decoding; defaults to min(8, cpu_count) 4
--project Path save results to project/name runs
--name text save results to project/name exp
--exist-ok boolean existing project/name ok, do not increment False
--half boolean use FP16 half-precision inference False
--vid-stride integer video frame-rate stride 1
--ci boolean reuse existing runs in CI (no UI) False
--tracker text deepocsort, botsort, strongsort, ... bytetrack
--verbose boolean print detailed logs False
--show-timing / --hide-timing boolean print runtime timing summary after evaluation False
--agnostic-nms boolean class-agnostic NMS False
--postprocessing choice (none | gsi | gbrc) Postprocess tracker output with gsi (Gaussian smoothed interpolation) or gbrc none
--show boolean display tracking in a window False
--show-labels / --hide-labels boolean show or hide detection labels True
--show-conf / --hide-conf boolean show or hide detection confidences True
--show-trajectories boolean overlay past trajectories False
--show-kf-preds boolean show Kalman-filter predictions False
--save-txt boolean save results to a .txt file False
--save-crop boolean save cropped detections False
--save boolean save annotated video False
--line-width integer bounding box line width None
--per-class boolean track each class separately False
--target-id integer ID to highlight in green None
--proposal-model text proposal model identifier used by GEPA reflections, e.g. openai/gpt-5.4, anthropic/claude-sonnet-4-20250514, openrouter/openai/gpt-5.4 openai/gpt-5.4
--proposal-api-key text proposal model API key; prefer shell env vars in CI but this can inject the key at runtime None
--proposal-api-key-env text environment variable name for --proposal-api-key when the provider is not inferred, e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY None
--max-metric-calls integer maximum number of benchmark evaluations during research 24
--eval-timeout float hard timeout in seconds for each benchmark evaluation 900.0
--keep-workspace / --no-keep-workspace boolean preserve the temporary research workspace after the run False
--idf1-penalty float penalty multiplier for combined IDF1 regression versus baseline 1.0
--mota-penalty float penalty multiplier for combined MOTA regression versus baseline 1.0
--idf1-tolerance float allowed combined IDF1 drop before penalties apply 0.0
--mota-tolerance float allowed combined MOTA drop before penalties apply 0.0
--detector Path one or more YOLO weights for detection models/yolov8n.pt
--reid Path one or more ReID model weights models/osnet_x0_25_msmt17.pt
--classes text filter by class indices, e.g. 0 or "0,1" None
--help boolean Show this message and exit. False
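One plausible reading of the regression penalty and tolerance flags above is sketched below; `penalized_score` and the linear form are assumptions for illustration, not the exact objective BoxMOT optimizes:

```python
def penalized_score(candidate: float, baseline: float,
                    penalty: float, tolerance: float) -> float:
    # A metric drop versus the baseline is forgiven up to the tolerance;
    # any regression beyond it is scaled by the penalty multiplier and
    # subtracted from the candidate's raw value.
    regression = max(0.0, (baseline - candidate) - tolerance)
    return candidate - penalty * regression

# Candidate IDF1 of 60.0 versus baseline 62.0 with 0.5 tolerance:
# regression = (62.0 - 60.0) - 0.5 = 1.5, so the score becomes 58.5.
print(penalized_score(candidate=60.0, baseline=62.0,
                      penalty=1.0, tolerance=0.5))  # 58.5
```

Under this reading, raising a tolerance lets GEPA trade a small drop in one metric for gains elsewhere, while a larger penalty multiplier makes regressions costlier.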