Research
Use research when you want GEPA to propose code changes to tracker source files and score them on a benchmark.
Reference material:
Examples
Example
Prerequisites
See Mode-specific extras.
research needs the research extra for GEPA, plus whatever detector backend the selected benchmark uses.
Proposal models
BoxMOT expects provider-prefixed model identifiers such as:
- openai/gpt-5.4
- anthropic/claude-sonnet-4-20250514
- openrouter/openai/gpt-5.4
Bare OpenAI model names such as gpt-5.4 are normalized to openai/gpt-5.4, but explicit prefixes are still preferred.
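The normalization rule described above can be sketched in a few lines. This is an illustration only: `normalize_model_id` is a hypothetical helper, not BoxMOT's actual function, and it assumes that any identifier without a `/` is a bare OpenAI model name.

```python
def normalize_model_id(model: str, default_provider: str = "openai") -> str:
    """Hypothetical helper: add a provider prefix to a bare model name.

    Assumption: an identifier that already contains "/" is treated as
    provider-prefixed and returned unchanged.
    """
    model = model.strip()
    if "/" in model:
        return model
    return f"{default_provider}/{model}"


for name in ("gpt-5.4", "openai/gpt-5.4", "openrouter/openai/gpt-5.4"):
    print(name, "->", normalize_model_id(name))
```

Passing the explicit prefix yourself avoids relying on this kind of inference, which is why prefixed identifiers are preferred.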
Credential setup
Set the provider API key in the matching environment variable, for example OPENAI_API_KEY or ANTHROPIC_API_KEY.
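A pre-flight check along the following lines can catch a missing key before a long run starts. It is a sketch under stated assumptions: `PROVIDER_ENV`, `required_env_var`, and `key_is_set` are hypothetical names, `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` come from the option descriptions on this page, and `OPENROUTER_API_KEY` is assumed rather than confirmed here.

```python
import os

# Provider prefix -> environment variable name.
# OPENROUTER_API_KEY is an assumption; the other two names appear in the
# --proposal-api-key-env description.
PROVIDER_ENV = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",  # assumed
}


def required_env_var(model_id: str) -> str:
    """Return the env var name for a provider-prefixed model identifier."""
    provider = model_id.split("/", 1)[0]
    if provider not in PROVIDER_ENV:
        raise ValueError(f"no known API key variable for provider {provider!r}")
    return PROVIDER_ENV[provider]


def key_is_set(model_id: str, environ=os.environ) -> bool:
    """True if the matching variable exists and is non-empty."""
    return bool(environ.get(required_env_var(model_id)))


if __name__ == "__main__":
    model = "anthropic/claude-sonnet-4-20250514"
    print(required_env_var(model), "set:", key_is_set(model))
```

When the provider cannot be inferred from the prefix, the `--proposal-api-key-env` option names the variable explicitly.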
Evaluation budget and timeout
- max_metric_calls limits how many benchmark evaluations GEPA can spend.
- eval_timeout is per evaluation subprocess, not the total wall-clock runtime of the full research job.
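Because the timeout applies per evaluation, the two settings multiply when estimating how long evaluations alone could take. The sketch below assumes evaluations run one after another and ignores time spent in the proposal model; `worst_case_eval_seconds` is a hypothetical helper, not part of BoxMOT.

```python
def worst_case_eval_seconds(max_metric_calls: int, eval_timeout: float) -> float:
    """Upper bound on evaluation time if every call runs until its timeout.

    Assumes sequential evaluations; proposal-model latency is not included.
    """
    return max_metric_calls * eval_timeout


# Defaults from the options table: 24 calls, 900 s per evaluation.
seconds = worst_case_eval_seconds(24, 900.0)
print(f"{seconds:.0f} s = {seconds / 3600:.1f} h")  # 21600 s = 6.0 h
```

With the defaults, a run in which every evaluation hits its timeout would spend up to six hours in evaluations, so lower one of the two values when a tighter bound is needed.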
Outputs
research writes:
- GEPA state and logs
- accepted and rejected candidate artifacts
- best-candidate code snapshots
- benchmark summaries before and after optimization
CLI Arguments
boxmot research
Research tracker code changes with GEPA
Usage:
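As an illustration, an invocation can be assembled from Python using only flags listed in the options table. `build_research_command` is a hypothetical helper, and the benchmark name is the example value from the `--benchmark` description; treat this as a sketch rather than a recommended entry point.

```python
import shlex


def build_research_command(
    benchmark: str,
    tracker: str = "bytetrack",
    proposal_model: str = "openai/gpt-5.4",
    max_metric_calls: int = 24,
    eval_timeout: float = 900.0,
) -> list[str]:
    """Build an argv list for `boxmot research` (defaults mirror the table)."""
    return [
        "boxmot", "research",
        "--benchmark", benchmark,
        "--tracker", tracker,
        "--proposal-model", proposal_model,
        "--max-metric-calls", str(max_metric_calls),
        "--eval-timeout", str(eval_timeout),
    ]


cmd = build_research_command("mot17-ablation")
print(shlex.join(cmd))
# To launch the job: subprocess.run(cmd, check=True)
```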
Options:
| Name | Type | Description | Default |
|---|---|---|---|
| --benchmark | text | benchmark config name or YAML file, e.g. mot17-ablation or boxmot/configs/benchmarks/mot17-ablation.yaml | None |
| --imgsz | text | Image size for model input as H,W (e.g. 800,1440) or single int for square. Default: read from the selected detector config, otherwise use detector-specific defaults. | None |
| --fps | integer | video frame-rate | None |
| --conf | float | Min confidence threshold. Default: read from the selected detector config, fallback 0.01. | None |
| --iou | float | IoU threshold for NMS | 0.7 |
| --device | text | cuda device(s), e.g. 0 or 0,1,2,3 or cpu | cpu |
| --batch-size | integer | micro-batch size for batched detection/embedding | 16 |
| --auto-batch / --no-auto-batch | boolean | probe GPU memory with a dummy pass to pick a safe batch size | True |
| --resume / --no-resume | boolean | resume detection/embedding generation from progress checkpoints | True |
| --n-threads | integer | CPU threads for image decoding; defaults to min(8, cpu_count) | 4 |
| --project | Path | save results to project/name | runs |
| --name | text | save results to project/name | exp |
| --exist-ok | boolean | existing project/name ok, do not increment | False |
| --half | boolean | use FP16 half-precision inference | False |
| --vid-stride | integer | video frame-rate stride | 1 |
| --ci | boolean | reuse existing runs in CI (no UI) | False |
| --tracker | text | deepocsort, botsort, strongsort, ... | bytetrack |
| --verbose | boolean | print detailed logs | False |
| --show-timing / --hide-timing | boolean | print runtime timing summary after evaluation | False |
| --agnostic-nms | boolean | class-agnostic NMS | False |
| --postprocessing | choice (none \| gsi \| gbrc) | Postprocess tracker output: none \| gsi (Gaussian smoothed interpolation) | |
| --show | boolean | display tracking in a window | False |
| --show-labels / --hide-labels | boolean | show or hide detection labels | True |
| --show-conf / --hide-conf | boolean | show or hide detection confidences | True |
| --show-trajectories | boolean | overlay past trajectories | False |
| --show-kf-preds | boolean | show Kalman-filter predictions | False |
| --save-txt | boolean | save results to a .txt file | False |
| --save-crop | boolean | save cropped detections | False |
| --save | boolean | save annotated video | False |
| --line-width | integer | bounding box line width | None |
| --per-class | boolean | track each class separately | False |
| --target-id | integer | ID to highlight in green | None |
| --proposal-model | text | proposal model identifier used by GEPA reflections, e.g. openai/gpt-5.4, anthropic/claude-sonnet-4-20250514, openrouter/openai/gpt-5.4 | openai/gpt-5.4 |
| --proposal-api-key | text | proposal model API key; prefer shell env vars in CI but this can inject the key at runtime | None |
| --proposal-api-key-env | text | environment variable name for --proposal-api-key when the provider is not inferred, e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY | None |
| --max-metric-calls | integer | maximum number of benchmark evaluations during research | 24 |
| --eval-timeout | float | hard timeout in seconds for each benchmark evaluation | 900.0 |
| --keep-workspace / --no-keep-workspace | boolean | preserve the temporary research workspace after the run | False |
| --idf1-penalty | float | penalty multiplier for combined IDF1 regression versus baseline | 1.0 |
| --mota-penalty | float | penalty multiplier for combined MOTA regression versus baseline | 1.0 |
| --idf1-tolerance | float | allowed combined IDF1 drop before penalties apply | 0.0 |
| --mota-tolerance | float | allowed combined MOTA drop before penalties apply | 0.0 |
| --detector | Path | one or more YOLO weights for detection | [PosixPath('/home/runner/work/boxmot/boxmot/models/yolov8n.pt')] |
| --reid | Path | one or more ReID model weights | [PosixPath('/home/runner/work/boxmot/boxmot/models/osnet_x0_25_msmt17.pt')] |
| --classes | text | filter by class indices, e.g. 0 or "0,1" | None |
| --help | boolean | Show this message and exit. | False |
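The penalty and tolerance options can be read as a thresholded regression penalty. The exact scoring formula is not stated on this page, so the sketch below is only one plausible interpretation: `regression_penalty` is a hypothetical function, and both the metric units and the way the penalty enters the final score are assumptions.

```python
def regression_penalty(
    baseline: float,
    candidate: float,
    tolerance: float = 0.0,
    multiplier: float = 1.0,
) -> float:
    """Hypothetical penalty for a metric drop versus the baseline.

    Assumed reading: drops up to `tolerance` are free; any excess is
    scaled by `multiplier`. Improvements are never penalized.
    """
    drop = baseline - candidate
    return multiplier * max(0.0, drop - tolerance)


# IDF1 falls from 70.0 to 68.5 with a tolerance of 0.5 and a multiplier of 2.0:
# the drop is 1.5, the excess over tolerance is 1.0, so the penalty is 2.0.
print(regression_penalty(70.0, 68.5, tolerance=0.5, multiplier=2.0))  # 2.0
```

Under this reading, the defaults (tolerance 0.0, multiplier 1.0) would penalize any regression one-for-one, while a larger tolerance lets a candidate trade a small loss in one metric for gains elsewhere.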