YOLO-StereoDepth promises metric depth from a calibrated camera pair, a camera-native alternative to lidar for robotics. You can read metric depth per detection on edge hardware today with stereo cameras, RF-DETR, and Roboflow.
YOLO-StereoDepth is a roadmap item for September 2026. Its headline capability, real-time detection plus metric depth from cameras, is something you can deploy now with a stereo depth camera and Roboflow.
Cameras like the Luxonis OAK-D, Stereolabs ZED, and Intel RealSense compute metric depth onboard and output an aligned depth frame alongside the RGB image. No calibration workaround, and no separate depth model competing for compute.
Run a detector on the RGB stream with Inference on your edge device. We recommend RF-DETR, trained on your own classes, and Roboflow supports a range of edge hardware for deployment.
For each detection, sample the camera's aligned depth frame at the bounding box center, or the median over the box for robustness, to get a real-world distance per object. The camera supplies the depth; the detector supplies the what.
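The sampling step above can be sketched in a few lines of NumPy. This is a minimal illustration, not a Roboflow or camera-vendor API: the function name `depth_at_box`, the millimeter units, the convention that `0` means "no reading", and the `shrink` parameter are all assumptions to check against your camera's documentation.

```python
import numpy as np

def depth_at_box(depth_mm, box, shrink=1.0):
    """Return the median valid depth in meters inside a bounding box, or None.

    depth_mm: HxW array aligned to the RGB frame, in millimeters,
              with 0 meaning "no depth reading" (assumed convention).
    box:      (x_min, y_min, x_max, y_max) in pixel coordinates.
    shrink:   fraction of the box, centered, to sample. 1.0 uses the whole
              box; smaller values reduce background pixels (hypothetical knob).
    """
    h, w = depth_mm.shape
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) * shrink / 2.0
    half_h = (y1 - y0) * shrink / 2.0

    # Clamp the sampling window to the frame.
    xa, xb = max(int(cx - half_w), 0), min(int(cx + half_w) + 1, w)
    ya, yb = max(int(cy - half_h), 0), min(int(cy + half_h) + 1, h)

    patch = depth_mm[ya:yb, xa:xb]
    valid = patch[patch > 0]  # drop pixels with no depth reading
    if valid.size == 0:
        return None
    return float(np.median(valid)) / 1000.0
```

The median is less sensitive than a single center pixel to holes in the depth map and to background showing through the edges of the box, which is why it is the more robust of the two options.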
In Workflows, attach depth to every detection and trigger actions: a stop signal when an obstacle is inside stopping distance, a grasp target for an arm, or a dimension estimate for a passing package.
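As an illustration of the stop-signal case, the check reduces to comparing each detection's depth against a stopping distance. The sketch below is plain Python, not the Workflows API; the detection shape (a dict with a `depth_m` key), the constant-deceleration braking model, and every default value are assumptions for the example.

```python
def stopping_distance_m(speed_mps, reaction_s, decel_mps2):
    """Distance covered during the reaction delay plus braking
    at constant deceleration (a simplified model, assumed here)."""
    return speed_mps * reaction_s + speed_mps ** 2 / (2.0 * decel_mps2)

def should_stop(detections, speed_mps, reaction_s=0.2, decel_mps2=1.0, margin_m=0.3):
    """Return True if any detection with a valid depth lies inside the
    stopping distance plus a safety margin.

    detections: iterable of dicts with a "depth_m" key (hypothetical shape);
                None means the camera gave no depth for that object.
    """
    limit = stopping_distance_m(speed_mps, reaction_s, decel_mps2) + margin_m
    return any(
        d["depth_m"] is not None and d["depth_m"] <= limit
        for d in detections
    )
```

Detections without a depth reading are ignored here; a safety-critical system would more likely treat missing depth as a reason to slow down, which is a design decision outside this sketch.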
A single RGB camera reads relative depth (which pixels are closer and which are farther), and a calibration step converts it to real units. Best for retrofits on installed cameras, proximity ranking, and monitoring, with no new hardware. This runs today with Depth Anything 3, the available alternative to YOLO-Depth.
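One common way to do the calibration step is to measure a few reference points in the scene and fit a scale and shift that map the model's relative values to meters. The sketch below assumes that relationship is affine, which is not stated in the source; some monocular models output inverse depth instead, in which case the same fit would be applied to the reciprocal. The function names are hypothetical.

```python
import numpy as np

def fit_scale_shift(relative, metric_m):
    """Least-squares fit of metric_m ~= scale * relative + shift.

    relative: model outputs sampled at reference points.
    metric_m: measured distances in meters at the same points.
    """
    relative = np.asarray(relative, dtype=float)
    metric_m = np.asarray(metric_m, dtype=float)
    # Design matrix with a column of ones for the shift term.
    A = np.stack([relative, np.ones_like(relative)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, metric_m, rcond=None)
    return float(scale), float(shift)

def to_metric(relative_map, scale, shift):
    """Apply the fitted scale and shift to a full relative depth map."""
    return scale * np.asarray(relative_map, dtype=float) + shift
```

At least two reference points at different distances are needed for the fit to be determined, and more points spread across the working range make it less sensitive to a single bad measurement.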
A calibrated pair computes absolute, metric depth from binocular disparity: "1.42 meters," not "closer than the shelf behind it." Best for robotics, grasping, dimensioning, and anything that acts on real units. A camera-native alternative to lidar at a fraction of the cost.
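For a rectified stereo pair, the standard relation between disparity and depth is Z = f × B / d, where f is the focal length in pixels, B is the baseline in meters, and d is the disparity in pixels. A minimal sketch follows; the numbers in the usage example are illustrative assumptions, not the specifications of any camera named here.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters for a rectified stereo pair: Z = f * B / d.

    Returns None when the disparity is zero or negative, which means
    the point had no valid match or is effectively at infinity.
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px

# Illustrative values only: 800 px focal length, 7.5 cm baseline.
z = depth_from_disparity(disparity_px=40.0, focal_px=800.0, baseline_m=0.075)
```

Stereo depth cameras run this computation onboard for every pixel, so in practice you read the result from the depth frame rather than computing it yourself.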
| | Monocular (one camera) | Stereo (two cameras) |
|---|---|---|
| Depth type | Relative by default; metric requires calibration | Metric out of the box, from the known baseline |
| Hardware | Any existing RGB camera, no new capex | Stereo camera or calibrated pair, typically a few hundred dollars per unit |
| Accuracy | Consistent ordering of near and far; absolute error grows without calibration | Strong at short-to-mid range; error grows with distance as disparity shrinks |
| Best for | Retrofits on installed cameras, proximity ranking, monitoring and alerts | Robotics, grasping, dimensioning, anything that acts on real units |
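
The accuracy row for stereo follows from the geometry. Since depth is Z = f × B / d (focal length in pixels, baseline in meters, disparity in pixels), a small disparity error δd produces a depth error of roughly Z² / (f × B) × δd, which grows with the square of distance. The sketch below uses that first-order approximation; the focal length, baseline, and quarter-pixel matching error are illustrative assumptions, not measured values.

```python
def depth_error_m(depth_m, focal_px, baseline_m, disparity_error_px=0.25):
    """First-order depth uncertainty for a rectified stereo pair:
    dZ ~= Z**2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px

# Illustrative values only: 800 px focal length, 7.5 cm baseline.
for z in (1.0, 2.0, 4.0, 8.0):
    err = depth_error_m(z, focal_px=800.0, baseline_m=0.075)
    print(f"{z:.0f} m -> about {err * 100:.1f} cm of depth uncertainty")
```

Doubling the distance quadruples the expected error under this model, while a wider baseline or a longer focal length reduces it proportionally.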
The short version: if the cameras are already on the wall, monocular depth gets you there without new hardware. If a machine has to move, grasp, or measure based on the number, stereo earns its hardware cost. Many operations run both, monocular on the installed fleet and stereo on the robots.
Metric depth, commodity hardware, and a commercial-safe stack you can deploy today.
Stereo produces absolute distances from a known baseline, so a robot knows it has 2.8 meters to stop, not just that an obstacle is near. That is the difference that lets a machine move, grasp, and measure on the number rather than merely ranking near against far.
A calibrated pair of commodity cameras costs a fraction of a lidar unit and captures color and texture lidar cannot. For cost-sensitive robots working at room-to-warehouse distances, stereo closes the gap between a webcam and a lidar.
Pair a Luxonis OAK-D, Stereolabs ZED, or Intel RealSense with RF-DETR and Roboflow Inference to get metric depth per detection now, on hardware you can buy today, instead of waiting for the September 2026 release.
RF-DETR, the recommended detector, ships under the permissive Apache 2.0 license. YOLO-StereoDepth licensing is unannounced and previous YOLO releases shipped under AGPL-3.0. Since robotics deployments are almost always commercial, build on a license you can trust.
Half the Fortune 100 build computer vision with Roboflow, with detection deployed on AMRs, robot arms, and automation at human scale.
Trusted by teams at BNSF, Rivian, GE Vernova, Cummins, USG, Pella, and Peer Robotics.
YOLO-StereoDepth is an announced stereo depth estimation model in the YOLO family, part of the YOLO27 generation and planned for September 2026. It computes metric depth from two cameras using binocular disparity, the same principle as human vision: two cameras a known distance apart capture the same scene, and depth is computed from the difference between the views. Because the baseline is known, stereo depth produces absolute, metric distances ("1.42 meters," not "closer than the shelf behind it"), which is why it is positioned as a camera-native alternative to lidar for robotics.
If the cameras are already on the wall and you need to know what is closer to what, monocular depth (like Depth Anything 3) gets you there without new hardware, giving relative depth that requires calibration for real units. If a machine has to move, grasp, or measure based on the number, stereo earns its hardware cost, producing metric distances out of the box from a calibrated pair that typically costs a few hundred dollars per unit. Monocular is best for retrofits on installed cameras, proximity ranking, and monitoring; stereo is best for robotics, grasping, dimensioning, and anything that acts on real units. Many operations run both.
Use a stereo depth camera such as the Luxonis OAK-D, Stereolabs ZED, or Intel RealSense, which compute metric depth onboard and output an aligned depth frame alongside the RGB image. Run a detector on the RGB stream with Roboflow Inference on your edge device (RF-DETR is recommended), then for each detection sample the aligned depth frame at the bounding box center, or the median over the box, to get metric distance per object. Build the logic in Roboflow Workflows to attach depth to every detection and trigger actions like a stop signal, a grasp target, or a dimension estimate.
RF-DETR, the recommended detector, is released under the Apache 2.0 license, free to use commercially with no copyleft obligations. YOLO-StereoDepth licensing has not been announced, and previous similar YOLO releases shipped under AGPL-3.0, which requires open-sourcing derivative works unless you buy a commercial license. Robotics deployments are almost always commercial, so this is worth confirming before you build on it.
Pair a stereo depth camera with RF-DETR and Roboflow to read metric distance per detection on the edge, no waiting required.
Explore depth-driven navigation, grasping, and dimensioning on edge robots.
See the monocular and stereo depth landscape and how the options stack up.
The monocular sibling: per-pixel depth from a single camera, also for September 2026.