TapTechNews, August 24: Meta Reality Labs has released Sapiens, a family of AI vision models for four fundamental human-centric vision tasks: 2D pose estimation, body part segmentation, depth estimation, and surface normal prediction.
The models range from 300 million to 2 billion parameters. They use a vision transformer architecture in which all tasks share the same encoder, while each task has its own decoder head.
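To make the shared-encoder design concrete, here is a minimal Python sketch. It is not Meta's implementation: the encoder and the four heads are toy stand-ins, and every name in it (`shared_encoder`, `HEADS`, `run_all_tasks`) is hypothetical. It only illustrates the structure, in which features are computed once and each task applies its own head to them.

```python
from typing import Callable, Dict, List

Features = List[float]


def shared_encoder(image: List[float]) -> Features:
    """Toy stand-in for the shared backbone: mean-centers the input values."""
    mean = sum(image) / len(image)
    return [value - mean for value in image]


# Hypothetical task-specific heads; each one consumes the same features.
HEADS: Dict[str, Callable[[Features], object]] = {
    "pose": lambda f: max(range(len(f)), key=f.__getitem__),
    "segmentation": lambda f: [1 if value > 0 else 0 for value in f],
    "depth": lambda f: [abs(value) for value in f],
    "normal": lambda f: [(0.0, 0.0, 1.0) for _ in f],
}


def run_all_tasks(image: List[float]) -> Dict[str, object]:
    """Run the encoder once, then apply every task head to its output."""
    features = shared_encoder(image)
    return {name: head(features) for name, head in HEADS.items()}


results = run_all_tasks([1.0, 2.0, 6.0])
```

In a real system the encoder is the expensive part, so sharing it means that supporting another task only requires another comparatively small head.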
2D pose estimation: This task detects and localizes the key points of the human body in a 2D image. These key points usually correspond to joints such as elbows, knees and shoulders, and they make it possible to infer a person's pose and movement.
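One common way to obtain key points is to have the model output a score map (heatmap) per joint and take the highest-scoring pixel. The article does not say how Sapiens decodes its output, so the sketch below is an assumption; the function name and input layout are hypothetical.

```python
def keypoints_from_heatmaps(heatmaps):
    """Map each joint name to (x, y, score) at the peak of its heatmap.

    heatmaps: dict of joint name -> 2D list of scores, indexed [row][column].
    """
    keypoints = {}
    for name, heatmap in heatmaps.items():
        score, x, y = max(
            (
                (score, x, y)
                for y, row in enumerate(heatmap)
                for x, score in enumerate(row)
            ),
            key=lambda candidate: candidate[0],
        )
        keypoints[name] = (x, y, score)
    return keypoints


example = keypoints_from_heatmaps({"left_elbow": [[0.1, 0.2], [0.9, 0.3]]})
```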
Body part segmentation: This task segments the image into different body parts such as the head, torso, arms and legs. Each pixel in the image is classified as belonging to a specific body part, which is very useful for applications such as virtual try-on and medical imaging.
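Per-pixel classification can be sketched as picking the highest-scoring class for every pixel. The label set below is illustrative only (it reuses the four parts named above plus a background class) and is not the actual Sapiens class list.

```python
# Illustrative label set; the real model's classes are not given in the article.
BODY_PARTS = ["background", "head", "torso", "arms", "legs"]


def segment(scores):
    """Turn per-pixel class scores into a label map.

    scores[y][x] is a list with one score per entry of BODY_PARTS.
    """
    return [
        [BODY_PARTS[max(range(len(pixel)), key=pixel.__getitem__)] for pixel in row]
        for row in scores
    ]


label_map = segment([[[0.1, 0.7, 0.1, 0.05, 0.05], [0.6, 0.1, 0.1, 0.1, 0.1]]])
```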
Depth estimation: This task estimates the distance from the camera to each pixel in the image, effectively recovering 3D structure from a 2D image. This is crucial for applications such as augmented reality and autonomous driving, where understanding the spatial layout is essential.
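To show how a depth map yields 3D structure, the sketch below back-projects each pixel with a pinhole camera model. It assumes the depth value is measured along the optical axis and that the camera intrinsics (`fx`, `fy`, `cx`, `cy`) are known; both are assumptions not stated in the article.

```python
def backproject(depth, fx, fy, cx, cy):
    """Convert a depth map into a list of (x, y, z) points in camera coordinates.

    depth[v][u] is the depth at pixel column u and row v.
    fx, fy are focal lengths in pixels; cx, cy is the principal point.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


cloud = backproject([[2.0, 2.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```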
Surface normal prediction: This task predicts the orientation of the surfaces in the image. Each pixel is assigned a normal vector indicating the direction the surface is facing. This information is valuable for 3D reconstruction and for understanding the geometry of objects in the scene.
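A normal vector is usually stored as a unit-length 3D direction, and normal maps are often visualized by mapping each component from [-1, 1] to an 8-bit color channel. The sketch below shows that convention; it is a general illustration and not taken from the Sapiens code.

```python
import math


def normalize(normal):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(component * component for component in normal))
    if length == 0:
        raise ValueError("zero-length vector has no direction")
    return tuple(component / length for component in normal)


def normal_to_rgb(normal):
    """Map a normal's components from [-1, 1] to integer color values in [0, 255]."""
    return tuple(round((component + 1.0) * 0.5 * 255) for component in normalize(normal))


facing_camera = normal_to_rgb((0.0, 0.0, 2.0))
```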
Meta said the models natively support 1K high-resolution inference and are easy to adapt to individual tasks: one simply fine-tunes a model that has been pretrained on more than 300 million in-the-wild human images.
Even when labeled data is scarce or entirely synthetic, the resulting models generalize remarkably well to in-the-wild data.