OcSort

Paper: Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking

OC-SORT focuses on a specific failure mode in Kalman-filter trackers: error accumulation during occlusion and non-linear motion. The paper replaces a purely prediction-centric view with an observation-centric one, using detector observations to reconstruct a more reliable virtual trajectory across missed frames. That makes the tracker much more robust than vanilla SORT in crowded scenes while keeping the same simple online structure.
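The virtual-trajectory idea can be illustrated with a toy sketch: when a track is re-found after being lost for a few frames, the gap is back-filled by linearly interpolating between the last observation before the occlusion and the re-found one, and the filter is re-updated over those virtual boxes instead of trusting its accumulated prediction error. This is a simplified, self-contained NumPy sketch of that idea, not BoxMOT's actual implementation:

```python
import numpy as np

def virtual_trajectory(last_box, new_box, gap):
    """Linearly interpolate `gap` virtual boxes between the last
    observation before an occlusion and the re-found observation.
    Boxes are [x1, y1, x2, y2]; both endpoints are excluded."""
    last_box = np.asarray(last_box, dtype=float)
    new_box = np.asarray(new_box, dtype=float)
    step = (new_box - last_box) / (gap + 1)
    return [last_box + step * (i + 1) for i in range(gap)]

# A target lost for 2 frames while moving 30 px to the right:
virtual = virtual_trajectory([0, 0, 10, 10], [30, 0, 40, 10], gap=2)
# The Kalman filter would then be re-updated with each virtual box
# in turn before processing the real re-found observation.
```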

What BoxMOT Needs For OcSort

  • Detector only. ReID is not required.
  • Supports both AABB and OBB detections in BoxMOT.
  • A strong choice when you want a fast motion-only tracker but expect more non-linear motion or occlusion than ByteTrack handles comfortably.
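The `min_conf` and `det_thresh` settings define the two confidence bands that `update` works with: boxes above `det_thresh` enter the first association round, while (with `use_byte=True`) boxes between `min_conf` and `det_thresh` get a ByteTrack-style second pass. A minimal NumPy sketch of that split, mirroring the gating logic inside `update` (the threshold values below are illustrative):

```python
import numpy as np

min_conf, det_thresh = 0.1, 0.5  # illustrative values

# Detections in BoxMOT's [x1, y1, x2, y2, conf, cls] layout
dets = np.array([
    [10, 10, 50, 50, 0.90, 0],   # high confidence -> first round
    [60, 10, 90, 50, 0.30, 0],   # mid confidence  -> BYTE second round
    [95, 10, 99, 50, 0.05, 0],   # below min_conf  -> discarded
])
confs = dets[:, 4]

dets_second = dets[(confs > min_conf) & (confs < det_thresh)]  # BYTE pass
dets_first = dets[confs > det_thresh]                          # main pass
```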

Bases: BaseTracker

Initialize the OcSort tracker.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `min_conf` | `float` | Minimum confidence threshold used in the second-stage association pass. | `0.1` |
| `delta_t` | `int` | Time window used for motion estimation. | `3` |
| `inertia` | `float` | Weight applied to the velocity-direction term during matching. | `0.2` |
| `use_byte` | `bool` | Whether to enable ByteTrack-style second association. | `False` |
| `Q_xy_scaling` | `float` | Process-noise scaling for position coordinates. | `0.01` |
| `Q_s_scaling` | `float` | Process-noise scaling for scale coordinates. | `0.0001` |
| `**kwargs` | `Any` | Base tracker settings forwarded to `BaseTracker`, including `det_thresh`, `max_age`, `max_obs`, `min_hits`, `iou_threshold`, `per_class`, `nr_classes`, `asso_func`, and `is_obb`. | `{}` |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `frame_count` | `int` | Number of processed frames. |
| `active_tracks` | `list` | Currently active tracks. |
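The `inertia` weight scales the observation-centric momentum term that `associate` folds into the IoU cost: the direction from a track's previous observation to each candidate detection is compared against the track's estimated velocity direction, and directionally consistent pairs are rewarded. The sketch below is a simplified cosine-similarity version of that term, not the exact cost used in BoxMOT:

```python
import numpy as np

def direction_consistency(prev_center, det_centers, velocity, inertia=0.2):
    """Reward detections whose direction from the track's previous
    observation agrees with the track's velocity direction.
    Returns a per-detection bonus added to the association score."""
    diff = np.asarray(det_centers, dtype=float) - np.asarray(prev_center, dtype=float)
    norms = np.linalg.norm(diff, axis=1)
    norms[norms == 0] = 1e-6          # avoid division by zero
    directions = diff / norms[:, None]
    v = np.asarray(velocity, dtype=float)
    v = v / max(np.linalg.norm(v), 1e-6)
    cos_sim = directions @ v          # 1 = same direction, -1 = opposite
    return inertia * cos_sim

# Track moving right; candidate detections to its right and left:
bonus = direction_consistency(
    prev_center=[100, 100],
    det_centers=[[120, 100], [80, 100]],
    velocity=[1, 0],
)
```

A larger `inertia` makes matching favor motion-direction continuity more strongly relative to pure overlap.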

Source code in boxmot/trackers/ocsort/ocsort.py
class OcSort(BaseTracker):
    """Initialize the OcSort tracker.

    Args:
        min_conf (float): Minimum confidence threshold used in the second-stage
            association pass.
        delta_t (int): Time window used for motion estimation.
        inertia (float): Weight applied to the velocity-direction term during
            matching.
        use_byte (bool): Whether to enable ByteTrack-style second association.
        Q_xy_scaling (float): Process-noise scaling for position coordinates.
        Q_s_scaling (float): Process-noise scaling for scale coordinates.
        **kwargs: Base tracker settings forwarded to :class:`BaseTracker`,
            including ``det_thresh``, ``max_age``, ``max_obs``, ``min_hits``,
            ``iou_threshold``, ``per_class``, ``nr_classes``, ``asso_func``,
            and ``is_obb``.

    Attributes:
        frame_count (int): Number of processed frames.
        active_tracks (list): Currently active tracks.
    """

    supports_obb = True

    def __init__(
        self,
        # OcSort-specific parameters
        min_conf: float = 0.1,
        delta_t: int = 3,
        inertia: float = 0.2,
        use_byte: bool = False,
        Q_xy_scaling: float = 0.01,
        Q_s_scaling: float = 0.0001,
        **kwargs: Any,  # BaseTracker parameters
    ):
        # Capture all init params for logging
        init_args = {k: v for k, v in locals().items() if k not in ('self', 'kwargs')}
        super().__init__(**init_args, _tracker_name='OcSort', **kwargs)

        # Store OcSort-specific parameters
        self.min_conf: float = min_conf
        self.asso_threshold: float = self.iou_threshold  # Use from BaseTracker
        self.delta_t: int = delta_t
        self.inertia: float = inertia
        self.use_byte: bool = use_byte
        self.Q_xy_scaling: float = Q_xy_scaling
        self.Q_s_scaling: float = Q_s_scaling
        self.frame_count: int = 0
        KalmanBoxTracker.count = 0

        # Initialize tracker collections
        self.active_tracks: list = []

    @BaseTracker.setup_decorator
    @BaseTracker.per_class_decorator
    def update(
        self, dets: np.ndarray, img: np.ndarray, embs: np.ndarray = None
    ) -> np.ndarray:
        """Update tracks for one frame.

        Args:
            dets: Detection array for the current frame in the active BoxMOT
                layout.
            img: Current image frame.
            embs: Optional appearance embeddings aligned with ``dets``.

        Returns:
            Array of active tracks with the object ID in the last column.

        Notes:
            Call this once per frame, including frames with no detections.
            Pass an empty detection array with the matching layout when a frame
            has no detections. The number of returned tracks may differ from the
            number of detections provided.
        """

        self.check_inputs(dets, img)

        self.frame_count += 1
        h, w = img.shape[0:2]

        dets = self.detection_layout.with_detection_indices(dets)
        confs = self.detection_layout.confidences(dets)

        inds_low = confs > self.min_conf
        inds_high = confs < self.det_thresh
        inds_second = np.logical_and(
            inds_low, inds_high
        )  # self.det_thresh > score > 0.1, for second matching
        dets_second = dets[inds_second]  # detections for second matching
        remain_inds = confs > self.det_thresh
        dets = dets[remain_inds]

        # get predicted locations from existing trackers.
        trks = np.zeros((len(self.active_tracks), self.detection_layout.box_with_conf_cols))
        to_del = []
        ret = []
        for t, trk in enumerate(trks):
            pos = self.active_tracks[t].predict()[0]
            trk[:] = [pos[i] for i in range(self.detection_layout.box_cols)] + [0]
            if np.any(np.isnan(pos)):
                to_del.append(t)
        trks = np.ma.compress_rows(np.ma.masked_invalid(trks))
        for t in reversed(to_del):
            self.active_tracks.pop(t)

        velocities = np.array(
            [
                trk.velocity if trk.velocity is not None else np.array((0, 0))
                for trk in self.active_tracks
            ]
        )
        last_boxes = np.array([trk.last_observation for trk in self.active_tracks])

        k_observations = np.array(
            [
                k_previous_obs(
                    trk.observations, trk.age, self.delta_t, is_obb=self.is_obb
                )
                for trk in self.active_tracks
            ]
        )

        """
            First round of association
        """
        matched, unmatched_dets, unmatched_trks = associate(
            dets[:, 0 : self.detection_layout.box_with_conf_cols],
            trks,
            self.asso_func,
            self.asso_threshold,
            velocities,
            k_observations,
            self.inertia,
            w,
            h,
        )
        for m in matched:
            self.active_tracks[m[1]].update(
                dets[m[0], :-2], dets[m[0], -2], dets[m[0], -1]
            )

        """
            Second round of association by OCR
        """
        # BYTE association
        if self.use_byte and len(dets_second) > 0 and unmatched_trks.shape[0] > 0:
            u_trks = trks[unmatched_trks]
            iou_left = self.asso_func(
                dets_second, u_trks
            )  # iou between low score detections and unmatched tracks
            iou_left = np.array(iou_left)
            if iou_left.max() > self.asso_threshold:
                """
                NOTE: by using a lower threshold, e.g., self.asso_threshold - 0.1, you may
                get a higher performance especially on MOT17/MOT20 datasets. But we keep it
                uniform here for simplicity
                """
                matched_indices = linear_assignment(-iou_left)
                to_remove_trk_indices = []
                for m in matched_indices:
                    det_ind, trk_ind = m[0], unmatched_trks[m[1]]
                    if iou_left[m[0], m[1]] < self.asso_threshold:
                        continue
                    self.active_tracks[trk_ind].update(
                        dets_second[det_ind, :-2],
                        dets_second[det_ind, -2],
                        dets_second[det_ind, -1],
                    )
                    to_remove_trk_indices.append(trk_ind)
                unmatched_trks = np.setdiff1d(
                    unmatched_trks, np.array(to_remove_trk_indices)
                )

        if unmatched_dets.shape[0] > 0 and unmatched_trks.shape[0] > 0:
            left_dets = dets[unmatched_dets]
            left_trks = last_boxes[unmatched_trks]
            iou_left = self.asso_func(left_dets, left_trks)
            iou_left = np.array(iou_left)
            if iou_left.max() > self.asso_threshold:
                """
                NOTE: by using a lower threshold, e.g., self.asso_threshold - 0.1, you may
                get a higher performance especially on MOT17/MOT20 datasets. But we keep it
                uniform here for simplicity
                """
                rematched_indices = linear_assignment(-iou_left)
                to_remove_det_indices = []
                to_remove_trk_indices = []
                for m in rematched_indices:
                    det_ind, trk_ind = unmatched_dets[m[0]], unmatched_trks[m[1]]
                    if iou_left[m[0], m[1]] < self.asso_threshold:
                        continue
                    self.active_tracks[trk_ind].update(
                        dets[det_ind, :-2], dets[det_ind, -2], dets[det_ind, -1]
                    )
                    to_remove_det_indices.append(det_ind)
                    to_remove_trk_indices.append(trk_ind)
                unmatched_dets = np.setdiff1d(
                    unmatched_dets, np.array(to_remove_det_indices)
                )
                unmatched_trks = np.setdiff1d(
                    unmatched_trks, np.array(to_remove_trk_indices)
                )

        for m in unmatched_trks:
            self.active_tracks[m].update(None, None, None)

        # create and initialise new trackers for unmatched detections
        for i in unmatched_dets:
            trk = KalmanBoxTracker(
                dets[i, : self.detection_layout.box_with_conf_cols],
                dets[i, self.detection_layout.cls_idx],
                dets[i, self.detection_layout.det_cols],
                delta_t=self.delta_t,
                Q_xy_scaling=self.Q_xy_scaling,
                Q_s_scaling=self.Q_s_scaling,
                Q_a_scaling=self.Q_s_scaling,
                max_obs=self.max_obs,
                is_obb=self.is_obb,
            )
            self.active_tracks.append(trk)
        i = len(self.active_tracks)
        for trk in reversed(self.active_tracks):
            if trk.last_observation.sum() < 0:
                d = trk.get_state()[0]
            else:
                """
                this is optional to use the recent observation or the kalman filter prediction,
                we didn't notice significant difference here
                """
                d = trk.last_observation[: self.detection_layout.box_cols]
            if (trk.time_since_update < 1) and (
                trk.hit_streak >= self.min_hits or self.frame_count <= self.min_hits
            ):
                # +1 as MOT benchmark requires positive
                ret.append(
                    np.concatenate(
                        (d, [trk.id + 1], [trk.conf], [trk.cls], [trk.det_ind])
                    ).reshape(1, -1)
                )
            i -= 1
            # remove dead tracklet
            if trk.time_since_update > self.max_age:
                self.active_tracks.pop(i)
        if len(ret) > 0:
            return np.concatenate(ret)
        return np.array([])