StrongSort

Paper: StrongSORT: Make DeepSORT Great Again

StrongSORT revisits DeepSORT and shows that a stronger baseline matters. The paper upgrades the detector and the appearance encoder, improves motion handling (an NSA Kalman filter), adds ECC camera-motion compensation, and then layers on two lightweight postprocessing steps, AFLink and GSI, to recover missed associations and missed detections. The core message is that a carefully engineered DeepSORT-style tracker can remain competitive without changing the online MOT formulation.

What BoxMOT Needs For StrongSort

  • A detector plus a ReID model. Appearance cues are central to this tracker.
  • Axis-aligned bounding-box (AABB) detections only in BoxMOT.
  • A good fit when appearance matching matters more than raw speed, especially on pedestrian-style MOT benchmarks.
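
As a rough illustration of the detector side of these requirements, the sketch below builds a small array of axis-aligned detections and drops the low-confidence rows, similar in spirit to the tracker's min_conf threshold. The column order (x1, y1, x2, y2, conf, cls), the sample values, and the helper name filter_by_confidence are assumptions made for this example; they are not taken from BoxMOT's code.

```python
import numpy as np

# Assumed layout for illustration only: one row per detection,
# columns = x1, y1, x2, y2, confidence, class id.
dets = np.array([
    [10.0, 20.0, 110.0, 220.0, 0.90, 0.0],
    [300.0, 40.0, 360.0, 200.0, 0.05, 0.0],
    [150.0, 60.0, 210.0, 240.0, 0.45, 0.0],
])


def filter_by_confidence(dets, min_conf=0.1):
    """Hypothetical helper: keep rows whose confidence is at least min_conf."""
    keep = dets[:, 4] >= min_conf
    return dets[keep], keep


kept, keep_mask = filter_by_confidence(dets)
```

The boolean mask is returned alongside the filtered rows because precomputed embeddings, if any, need to be filtered with the same mask so that they stay aligned with the detections.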

Bases: BaseTracker

Initialize the StrongSort tracker.

Parameters:

  • reid_model (Any | None): Pre-built ReID backend model (e.g. ReID(...).model). Default: None.
  • min_conf (float): Minimum confidence threshold for detections. Default: 0.1.
  • max_cos_dist (float): Maximum cosine distance accepted by the nearest-neighbor metric. Default: 0.2.
  • max_iou_dist (float): Maximum IoU distance used during association. Default: 0.7.
  • n_init (int): Number of consecutive hits required to confirm a track. Default: 3.
  • nn_budget (int): Maximum number of appearance features stored per track. Default: 100.
  • mc_lambda (float): Motion-consistency weight used by StrongSORT. Default: 0.98.
  • ema_alpha (float): Exponential moving average coefficient for appearance features. Default: 0.9.
  • **kwargs (Any): Base tracker settings forwarded to BaseTracker. Default: {}.
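
To make ema_alpha, max_cos_dist, and mc_lambda more concrete, here is a minimal NumPy sketch that follows the formulas in the StrongSORT paper: the appearance state is smoothed as e_t = α·e_(t-1) + (1 − α)·f_t, and the matching cost is blended as C = λ·A_a + (1 − λ)·A_m, where A_a is the appearance cost and A_m the motion cost. The helper names and sample numbers are hypothetical, and BoxMOT's internal implementation may differ in detail.

```python
import numpy as np


def ema_update(track_feat, det_feat, ema_alpha=0.9):
    """Blend a new detection embedding into a track's running embedding."""
    det_feat = det_feat / np.linalg.norm(det_feat)
    smoothed = ema_alpha * track_feat + (1.0 - ema_alpha) * det_feat
    # Re-normalize so cosine distances stay comparable over time.
    return smoothed / np.linalg.norm(smoothed)


def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return 1.0 - float(a @ b)


def blended_cost(appearance_cost, motion_cost, mc_lambda=0.98):
    """Weighted sum of appearance and motion costs (paper-style blend)."""
    return mc_lambda * appearance_cost + (1.0 - mc_lambda) * motion_cost


track_feat = np.array([1.0, 0.0])
new_feat = ema_update(track_feat, np.array([0.0, 2.0]))

dist = cosine_distance(track_feat, np.array([1.0, 0.1]))
is_match_candidate = dist <= 0.2  # compare against max_cos_dist

cost = blended_cost(appearance_cost=0.1, motion_cost=5.0)
```

With ema_alpha close to 1 the stored embedding changes slowly, which dampens the effect of a single noisy crop; with mc_lambda close to 1 the appearance term dominates the blended cost.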

Attributes:

  • model: ReID model used for appearance extraction.
  • tracker (Tracker): Internal StrongSORT tracker instance.
  • cmc: Camera-motion compensation method.
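
The cmc attribute estimates how the camera moved between frames so that existing tracks can be shifted before matching. The sketch below shows only the geometric idea, assuming the estimated warp is a 2x3 affine matrix; warp_box is a hypothetical helper, and the real track update in BoxMOT also has to adjust the Kalman filter state, which is not shown here.

```python
import numpy as np


def warp_box(xyxy, warp_matrix):
    """Apply a 2x3 affine warp to the two corners of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = xyxy
    # Homogeneous coordinates let one matrix product apply rotation/scale and shift.
    corners = np.array([[x1, y1, 1.0], [x2, y2, 1.0]])
    warped = corners @ warp_matrix.T  # shape (2, 2): one warped corner per row
    return warped.reshape(-1)


# Pure translation: camera content shifted 5 px right and 3 px up.
warp = np.array([
    [1.0, 0.0, 5.0],
    [0.0, 1.0, -3.0],
])
moved = warp_box((10.0, 20.0, 50.0, 80.0), warp)
```

Warping only two corners keeps the box axis-aligned under translation and uniform scaling; a warp with noticeable rotation would need all four corners and a re-fit of the enclosing box.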

Source code in boxmot/trackers/bbox/strongsort/strongsort.py
class StrongSort(BaseTracker):
    """Initialize the StrongSort tracker.

    Args:
        reid_model: Pre-built ReID backend model (e.g. ``ReID(...).model``).
        min_conf (float): Minimum confidence threshold for detections.
        max_cos_dist (float): Maximum cosine distance accepted by the
            nearest-neighbor metric.
        max_iou_dist (float): Maximum IoU distance used during association.
        n_init (int): Number of consecutive hits required to confirm a track.
        nn_budget (int): Maximum number of appearance features stored per
            track.
        mc_lambda (float): Motion-consistency weight used by StrongSORT.
        ema_alpha (float): Exponential moving average coefficient for
            appearance features.
        **kwargs: Base tracker settings forwarded to :class:`BaseTracker`.

    Attributes:
        model: ReID model used for appearance extraction.
        tracker (Tracker): Internal StrongSORT tracker instance.
        cmc: Camera-motion compensation method.
    """

    def __init__(
        self,
        reid_model: Any | None = None,
        min_conf: float = 0.1,
        max_cos_dist: float = 0.2,
        max_iou_dist: float = 0.7,
        n_init: int = 3,
        nn_budget: int = 100,
        mc_lambda: float = 0.98,
        ema_alpha: float = 0.9,
        **kwargs: Any,
    ):
        init_args = {k: v for k, v in locals().items() if k not in ('self', 'kwargs')}
        super().__init__(**init_args, _tracker_name='StrongSort', **kwargs)

        self.min_conf = min_conf
        self.model = reid_model

        self.tracker = Tracker(
            metric=NearestNeighborDistanceMetric("cosine", max_cos_dist, nn_budget),
            max_iou_dist=max_iou_dist,
            max_age=self.max_age,
            n_init=n_init,
            mc_lambda=mc_lambda,
            ema_alpha=ema_alpha,
        )

        self.cmc = get_cmc_method("ecc")()

    def _update_impl(
        self,
        dets: np.ndarray,
        img: np.ndarray,
        embs: np.ndarray | None = None,
        masks: np.ndarray | None = None,
    ) -> np.ndarray:
        self.check_inputs(dets, img, embs)
        dets = self.detection_layout.with_detection_indices(dets)
        remain_inds = self.detection_layout.confidences(dets) >= self.min_conf
        dets = dets[remain_inds]

        xyxy = self.detection_layout.boxes(dets)
        confs = self.detection_layout.confidences(dets)
        clss = self.detection_layout.classes(dets)
        det_ind = dets[:, self.detection_layout.det_cols]

        if len(self.tracker.tracks) >= 1:
            warp_matrix = self.cmc.apply(img, xyxy)
            for track in self.tracker.tracks:
                track.camera_update(warp_matrix)

        if embs is not None:
            features = embs[remain_inds]
        else:
            features = self.model.get_features(xyxy, img)

        tlwh = xyxy2tlwh(xyxy)
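        # Build one Detection per row. The loop variable is named `ind` so it
        # does not shadow the outer `det_ind` array.
        detections = [
            Detection(box, conf, cls, ind, feat)
            for box, conf, cls, ind, feat in zip(
                tlwh, confs, clss, det_ind, features
            )
        ]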

        self.tracker.predict()
        self.tracker.update(detections)

        outputs = []
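        for track in self.tracker.tracks:
            # Emit only confirmed tracks that were matched in this frame.
            if not track.is_confirmed() or track.time_since_update >= 1:
                continue

            x1, y1, x2, y2 = track.to_tlbr()

            track_id = track.id  # avoid shadowing the built-in id()
            conf = track.conf
            cls = track.cls
            det_ind = track.det_ind

            # Output row layout: x1, y1, x2, y2, id, conf, cls, det_ind
            outputs.append(
                np.concatenate(
                    ([x1, y1, x2, y2], [track_id], [conf], [cls], [det_ind])
                ).reshape(1, -1)
            )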
        if len(outputs) > 0:
            return np.concatenate(outputs)
        return self.empty_output()

    def reset(self):
        pass
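
Going by the np.concatenate call in _update_impl, each returned row holds eight values in the order x1, y1, x2, y2, id, conf, cls, det_ind. The sketch below unpacks an array of that shape into plain Python records and converts the corners to top-left/width/height form. The sample values and the helper name unpack_tracks are made up for illustration.

```python
import numpy as np

# Made-up tracker output: x1, y1, x2, y2, id, conf, cls, det_ind
tracks = np.array([
    [10.0, 20.0, 110.0, 220.0, 1.0, 0.90, 0.0, 0.0],
    [150.0, 60.0, 210.0, 240.0, 2.0, 0.45, 0.0, 2.0],
])


def unpack_tracks(tracks):
    """Hypothetical helper: turn output rows into dictionaries."""
    results = []
    for x1, y1, x2, y2, track_id, conf, cls, det_ind in tracks:
        results.append({
            "id": int(track_id),
            # Corner format (x1, y1, x2, y2) -> (left, top, width, height).
            "tlwh": (float(x1), float(y1), float(x2 - x1), float(y2 - y1)),
            "conf": float(conf),
            "cls": int(cls),
            "det_ind": int(det_ind),
        })
    return results


parsed = unpack_tracks(tracks)
```

Because everything is packed into one float array, the id, class, and detection index come back as floats and need an explicit cast before being used as keys or indices.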