Abstract – We propose a new method to quickly and accurately predict
3D positions of body joints from a single depth image,
using no temporal information. We take an object recognition
approach, designing an intermediate body parts representation
that maps the difficult pose estimation problem
into a simpler per-pixel classification problem. Our large
and highly varied training dataset allows the classifier to
estimate body parts invariant to pose, body shape, clothing,
etc. Finally we generate confidence-scored 3D proposals of
several body joints by reprojecting the classification result
and finding local modes.
The system runs at 200 frames per second on consumer
hardware. Our evaluation shows high accuracy on both
synthetic and real test sets, and investigates the effect of several
training parameters. We achieve state-of-the-art accuracy
in our comparison with related work and demonstrate
improved generalization over exact whole-skeleton nearest
neighbor matching.
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/BodyPartRecognition.pdf
Excerpt – Randomized decision trees and forests [35, 30, 2, 8] have
proven fast and effective multi-class classifiers for many
tasks [20, 23, 36], and can be implemented efficiently on the
GPU [34]. As illustrated in Fig. 4, a forest is an ensemble
of T decision trees, each consisting of split and leaf nodes.
Each split node consists of a feature f_θ and a threshold τ.
To classify pixel x in image I, one starts at the root and repeatedly
evaluates Eq. 1, branching left or right according
to the comparison to threshold τ. At the leaf node reached
in tree t, a learned distribution P_t(c | I, x) over body part labels
c is stored. The distributions are averaged together for
all trees in the forest to give the final classification:
P(c | I, x) = (1/T) Σ_{t=1}^{T} P_t(c | I, x).
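
The traversal and averaging described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names Leaf, Split, classify_tree and classify_forest are hypothetical, the feature is an arbitrary callable standing in for Eq. 1 (which is not reproduced in this excerpt), and branching left when the feature response is below the threshold is an assumed convention.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple, Union

Image = Sequence[Sequence[float]]   # depth image as rows of depth values
Pixel = Tuple[int, int]             # (row, col)


@dataclass
class Leaf:
    # Learned distribution over body part labels, P_t(c | I, x).
    dist: List[float]


@dataclass
class Split:
    # Stand-in for the feature of Eq. 1; any callable (image, pixel) -> float.
    feature: Callable[[Image, Pixel], float]
    tau: float                      # threshold
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def classify_tree(root: Node, image: Image, x: Pixel) -> List[float]:
    """Descend from the root to a leaf and return its stored distribution."""
    node = root
    while isinstance(node, Split):
        # Assumed convention: go left when the response is below tau.
        node = node.left if node.feature(image, x) < node.tau else node.right
    return node.dist


def classify_forest(trees: Sequence[Node], image: Image, x: Pixel) -> List[float]:
    """Average the per-tree leaf distributions over all T trees."""
    dists = [classify_tree(tree, image, x) for tree in trees]
    num_trees = len(dists)
    num_labels = len(dists[0])
    return [sum(d[c] for d in dists) / num_trees for c in range(num_labels)]


def depth_at(image: Image, x: Pixel) -> float:
    """Toy feature for the demo only: the raw depth at the pixel."""
    row, col = x
    return image[row][col]


# Two hand-built one-split trees over two labels (illustrative values).
tree_a = Split(depth_at, 1.5, Leaf([0.9, 0.1]), Leaf([0.2, 0.8]))
tree_b = Split(depth_at, 2.5, Leaf([0.7, 0.3]), Leaf([0.1, 0.9]))
image = [[1.0, 2.0], [3.0, 4.0]]

probs = classify_forest([tree_a, tree_b], image, (0, 1))
label = max(range(len(probs)), key=probs.__getitem__)
```

For the pixel at (0, 1), whose depth is 2.0, tree_a branches right and tree_b branches left, so the forest averages [0.2, 0.8] and [0.7, 0.3] to roughly [0.45, 0.55] and picks label 1. Because each pixel is classified independently, the same loop parallelizes across pixels, which is consistent with the excerpt's remark about efficient GPU implementations.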