Unveiling Sapiens: The Future of Human-Centric Vision Tasks

Published On Fri Aug 23 2024


Meta Reality Labs is proud to introduce Sapiens, a family of models for four fundamental human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction.

Our Sapiens models support inference at 1K resolution and are easy to adapt to an individual task: you fine-tune a model that has already been pretrained on more than 300 million in-the-wild human images.
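To give a feel for what 1K inference implies, here is a minimal sketch that counts the input tokens a patch-based encoder would process. It assumes a vision-transformer-style backbone with 16-pixel patches and square inputs; neither is stated above, and `patch_token_count` is a hypothetical helper rather than part of any Sapiens release.

```python
def patch_token_count(height: int, width: int, patch_size: int = 16) -> int:
    """Number of non-overlapping patches a ViT-style encoder would see.

    Assumption: the image is split into patch_size x patch_size tiles,
    each of which becomes one token.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    return (height // patch_size) * (width // patch_size)


# Compare a common low-resolution input with a 1K input.
low_res_tokens = patch_token_count(224, 224)
high_res_tokens = patch_token_count(1024, 1024)
print(low_res_tokens, high_res_tokens)
print(f"token ratio: {high_res_tokens / low_res_tokens:.1f}x")
```

Under these assumptions a 1K image yields roughly twenty times as many tokens as a 224-pixel image. Because self-attention cost grows with the square of the token count, high-resolution inference is a meaningful design constraint.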

Multi-stage cascaded deconvolution for depth map and surface normal prediction

The resulting models generalize remarkably well to in-the-wild data, even when labeled data is scarce or entirely synthetic. The simple model design also scales: as the parameter count grows from 0.3 billion to 2 billion, performance improves across all four tasks. Sapiens consistently outperforms existing baselines across a range of human-centric benchmarks.
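As a rough illustration of how a parameter range of 0.3 to 2 billion can arise, the sketch below estimates the weight count of a stack of transformer blocks from its depth and width. The two depth/width pairs are illustrative assumptions chosen to land near each end of that range. They are not taken from the text above and should not be read as the published Sapiens configurations.

```python
def approx_transformer_params(depth: int, width: int, mlp_ratio: int = 4) -> int:
    """Rough weight count for `depth` transformer blocks of hidden size `width`.

    Counts only the large matrices in each block:
      - attention: query, key, value, and output projections (4 * width^2)
      - MLP: two linear layers with an expansion factor (2 * mlp_ratio * width^2)
    Biases, normalization layers, and embeddings are ignored.
    """
    attention = 4 * width * width
    mlp = 2 * mlp_ratio * width * width
    return depth * (attention + mlp)


# Hypothetical configurations (depth, width), for illustration only.
configs = {"small end": (24, 1024), "large end": (48, 1920)}
for name, (depth, width) in configs.items():
    params = approx_transformer_params(depth, width)
    print(f"{name}: ~{params / 1e9:.2f}B parameters")
```

The estimate grows linearly with depth and quadratically with width, so widening the model is what drives most of the growth from one end of the range to the other.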