Human pose estimation for care robots using deep learning
Left: Experiment scene (this image is not used for estimation). Center: Depth data corresponding to the extracted person region. Right: Estimation result (the colors correspond to each part of the body). Credit: (c) Toyohashi University of Technology
Source: Toyohashi University of Technology
Expectations for care robots are growing against the backdrop of declining birthrates, an aging population, and a shortage of care staff. In nursing homes and similar facilities, for example, robots are expected to check the condition of residents while patrolling the facility. When evaluating a person's condition, an initial estimate of the pose (standing, sitting, fallen, etc.) is useful, but most methods to date have relied on images. These methods raise privacy concerns and are difficult to apply in dimly lit spaces. The research group (Kaichiro Nishi, a 2016 master's program graduate, and Professor Miura) has therefore developed a method of pose recognition that uses depth data alone (Fig. 1).
For poses such as standing and sitting, where body parts can be recognized relatively easily, methods and instruments that estimate poses with high precision are already available. In care settings, however, it is necessary to recognize a wider variety of poses, such as a recumbent position (lying down) and a crouching position, and this has remained a challenge until now. With the recent progress of deep learning (a technique using a multistage neural network), methods for estimating complex poses from images are advancing. Deep learning requires a large amount of training data. For image data, it is relatively easy for a person to look at an image and identify each body part, and some datasets have been made publicly available. For depth data, however, the boundaries between parts are hard to see, which makes training data difficult to generate.
To address this, the research established a method to generate a large amount of training data by combining computer graphics (CG) technology and motion capture technology (Fig. 2). The method first creates CG data of various body shapes. Next, it annotates the data with information on each body part (11 parts, including the head, the torso, and the right upper arm) and with skeleton information, including the position of each joint. This makes it possible to put the CG models into arbitrary poses simply by supplying joint angles obtained with a motion capture system. Fig. 3 shows an example of data generated for various sitting poses.
Procedure for generating training data. Credit: (c) Toyohashi University of Technology
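To illustrate how joint angles alone can determine a pose, the following minimal Python sketch poses a toy two-bone arm in 2D using forward kinematics. It is not the research group's implementation: the `BONES` table, the bone names and lengths, and the `pose_skeleton` function are hypothetical, and a real CG model would apply 3D rotations to a full-body skeleton.

```python
import math

# Hypothetical toy skeleton: each bone has a parent bone and a length in meters.
# Names and lengths are illustrative only, not taken from the released models.
BONES = {
    "upper_arm": {"parent": None, "length": 0.30},
    "forearm": {"parent": "upper_arm", "length": 0.25},
}


def pose_skeleton(bones, joint_angles, root=(0.0, 0.0)):
    """Return the 2D end point of every bone.

    joint_angles maps a bone name to its rotation in radians, measured
    relative to its parent bone (or to the x axis for the root bone).
    Bones missing from joint_angles are treated as unrotated.
    """
    end_points = {}
    world_angles = {}
    # Relies on dict insertion order: parents are listed before children.
    for name, bone in bones.items():
        parent = bone["parent"]
        start = root if parent is None else end_points[parent]
        base_angle = 0.0 if parent is None else world_angles[parent]
        # Rotations accumulate down the chain from the root.
        angle = base_angle + joint_angles.get(name, 0.0)
        end_points[name] = (
            start[0] + bone["length"] * math.cos(angle),
            start[1] + bone["length"] * math.sin(angle),
        )
        world_angles[name] = angle
    return end_points


# Raise the upper arm by 90 degrees and bend the elbow back by 90 degrees.
pose = pose_skeleton(BONES, {"upper_arm": math.pi / 2, "forearm": -math.pi / 2})
```

Passing a sequence of angle sets, such as the frames of a motion capture recording, to the same function would yield one pose per frame.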
With this method, training data can be generated for any combination of body shape and pose. So far, we have created and released a total of about 100,000 pieces of data, covering sitting positions (with and without occlusions) and several poses in recumbent positions. This data is freely available for research purposes (http://www.aisl.cs.tut.ac.jp/database_HDIBPL.html). In the future, we will release the human models and detailed data generation procedures so that anyone can easily generate data with them. We hope that this will contribute to progress in related fields.
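As a rough illustration of how a posed, part-labeled model can yield paired training images, the sketch below projects labeled 3D points into a depth image and a matching body part label image using a simple z-buffer. The function name, the orthographic projection, the point format, and the integer part IDs are assumptions made for this example, not details of the released dataset or of the group's rendering pipeline.

```python
import math


def render_depth_and_labels(points, width, height, background=0.0):
    """Project labeled 3D points into a depth image and a label image.

    points: iterable of (x, y, z, part_id), with x and y in pixel units,
            z the distance from the camera, and part_id >= 1.
    Returns (depth, labels) as nested lists indexed [row][col].
    A label of 0 marks background pixels.
    """
    depth = [[background] * width for _ in range(height)]
    labels = [[0] * width for _ in range(height)]
    for x, y, z, part_id in points:
        col, row = math.floor(x), math.floor(y)
        if not (0 <= col < width and 0 <= row < height):
            continue  # The point falls outside the image.
        # Z-buffer test: keep only the point closest to the camera,
        # so nearer body parts occlude farther ones.
        if labels[row][col] == 0 or z < depth[row][col]:
            depth[row][col] = z
            labels[row][col] = part_id
    return depth, labels


# Hypothetical part IDs: 1 = torso, 2 = right upper arm.
sample_points = [
    (1.2, 0.5, 2.0, 1),  # torso point
    (1.7, 0.9, 1.5, 2),  # arm point in front of the torso, same pixel
    (5.0, 5.0, 1.0, 1),  # outside the 3 x 2 image, ignored
]
depth_image, label_image = render_depth_and_labels(sample_points, width=3, height=2)
```

Because both images come from the same projection, every depth pixel receives its part label automatically, which avoids the manual annotation that is hard to do on depth data.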
The results of this research were published in Pattern Recognition on Saturday, June 3, 2017.
First row: Body part label images. Second row: Depth data. Credit: (c) Toyohashi University of Technology