Smart Mobile Phone Based Gait Assessment of Patients with Low Back Pain
Herman Chan1, Huiru Zheng1, Haiying Wang1, Roy Sterritt1, Dave Newell2
1 Faculty of Computing and Engineering, University of Ulster, Belfast, UK
2 Anglo-European College of Chiropractic, Bournemouth, UK
Abstract— Smart mobile phones have become ubiquitous devices that many people have access to and use in daily living. This paper explores the iPhone as a potentially feasible alternative to conventional methods of data collection for gait analysis and assessment. The internal accelerometer of the iPhone is a micro-electro-mechanical system (MEMS) based sensor. This study compared the phone sensor against a commercial standalone accelerometer, the Minimod developed by McRoberts. Non-specific chronic low back pain (nscLBP) is one of the most common and costly conditions that people may encounter at some point in their lives. Due to the nature of the condition, it is difficult to determine the aetiology and assess the status of recovery. Machine learning (ML) algorithms were implemented to determine whether patients with LBP and healthy controls could be classified based on gait characteristics recorded by the smart phone. The results show that 85.7% accuracy can be achieved from the features extracted from the iPhone accelerometer, improving to 88.8% when feature selection methods are applied. To investigate the feasibility of using the MEMS sensors embedded in the iPhone, intraclass correlation coefficients (ICC) were calculated to determine agreement between the features extracted from the two portable devices. A Mann-Whitney U-test was employed to determine whether features were significantly different between the LBP and healthy subjects. It can be concluded from this study that features can be extracted from the iPhone accelerometer with high agreement, and that patient and control groups can be classified with high accuracy. The experiments demonstrate that the iPhone and equivalent smart phones can provide accurate and precise measurements that can be used for gait assessment and analysis.
Keywords- Smart mobile phone, portable devices, machine learning, reliability, feasibility, gait analysis.
There are many instruments that can be used to measure the body’s movements, including gait parameters, and a number of devices available for gait analysis are used in the clinical environment. These instruments, such as footswitches, treadmills and camera-based systems, can be expensive or restricted to use in a clinical or laboratory environment. This study aimed to examine smart mobile phones as an alternative technology for gait measurement and analysis. Quantification of gait characteristics has been demonstrated by LeMoyne et al., who used an iPod accelerometer placed on the lateral malleolus of the leg, near the ankle. That study investigated a single device to determine consistency and accuracy, whereas Nishiguchi et al. used an Android device to address the reliability and validity of such portable devices. Their method consisted of taping together two kinds of acceleration measurement terminals, a smart mobile phone and a tri-axial accelerometer, and placing them over the L3 region, which is close to the body’s center of mass during standing.
In the literature on feature extraction, gait authentication research has examined in detail how to use features of gait for security and identification of an individual. For example, accelerometer-based biometric gait recognition was introduced by Ailisto et al., who used methods to find individual steps and measure their similarity. Nickel et al. suggest that the methods in their study are efficient enough to be applied in practice; i.e., extracting statistical features and applying the k-NN algorithm could be useful for classification. Gafurov et al. examined the effects of different shoes, sideways motion and other factors while utilizing accelerometers. It has been suggested that gait authentication can be performed with the accelerometer within smart mobile phones.
Classification can be performed on the feature set extracted from gait data to differentiate gait patterns. Aminian et al. successfully predicted the incline and speed of overground walking from data collected during treadmill walking using a neural network approach. In activity recognition, Sekine et al. classified walking upstairs, walking downstairs and walking on a flat surface by measuring fractal dimensions using wavelet-based fractal analysis. In recent studies, accelerometer data were used to classify whether an individual was walking up or down stairs with 95.7% accuracy.
Chronic low back pain (cLBP) is common and costly. LBP in general affects 8 out of 10 people at some point in their lives, with 80-90% of pain episodes recovering within six weeks.
However, a small proportion of these go on to develop more chronic conditions, and these individuals account for a significant share of health care costs. Although lower back pain has a variety of causes (mechanical, neuropathic and non-mechanical), the vast majority of cases are of a non-specific nature. Of the 80-90% considered to be mechanical, most (65-70%) have unknown causes that are often characterized as muscle strain or ligamentous injury. A preliminary feasibility study carried out by Yang et al. showed that smart phones could be used in gait analysis. This paper extends that feasibility study and develops a smart mobile phone based gait assessment approach for the analysis of gait patterns in LBP.
Forty subjects participated in this study: 20 patients and 20 control subjects. The age and gender distribution of the participants is described in Table 1. Twenty subjects with non-specific LBP attending a chiropractic teaching clinic between October and December 2011 were asked to wear accelerometer devices on the lower back so that their gait pattern could be measured while they walked a short distance. Twenty control subjects, matched for age and gender and with no existing LBP or walking difficulties, were recruited to perform the same task. A smart mobile phone (iPhone) running custom software for collecting accelerometer data was attached to a belt worn by each subject during the walk. Figure 1 illustrates the device placement. Twenty-eight features of gait were extracted using a gait analysis toolkit developed by the University of Ulster. Table 2 summarises the extracted features, which include temporal-spatial features, the root mean square (RMS) of the three axes, frequency domain features such as the power spectral density (PSD) of the signal, and regularity and symmetry based features. These features are used for classification and reliability assessment in Section III.
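As an illustration of two of the feature families named above, the following is a minimal sketch, not the University of Ulster toolkit itself. The RMS definition is standard. Treating regularity as a normalised autocorrelation coefficient at a step or stride lag is an assumption made here for illustration, since the paper does not state how regularity was computed. The function names and the lag parameter are hypothetical.

```python
import math


def rms(samples):
    """Root mean square of the acceleration samples from one axis."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def autocorr_coefficient(samples, lag):
    """Unbiased autocorrelation of the mean-removed signal at `lag`,
    normalised by the zero-lag value.

    ASSUMPTION: regularity is taken to be this coefficient evaluated at
    the lag of one step (or one stride); the source does not define it.
    """
    n = len(samples)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(samples)")
    mean = sum(samples) / n
    centred = [x - mean for x in samples]
    zero_lag = sum(c * c for c in centred) / n
    if zero_lag == 0:
        raise ValueError("signal is constant; autocorrelation undefined")
    at_lag = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n - lag)
    return at_lag / zero_lag


def rms_features(ap, ml, ver):
    """RMS of the anterior-posterior, medio-lateral and vertical axes."""
    return {"rms_ap": rms(ap), "rms_ml": rms(ml), "rms_ver": rms(ver)}
```

For a perfectly periodic signal the coefficient is 1 at a lag of one full period, so values close to 1 at the step or stride lag would indicate a regular gait under this assumed definition.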
Frequency at 50% energy (Hz) ML
Frequency at 75% energy (Hz) ML
Frequency at 90% energy (Hz) ML
Frequency at 100% energy (Hz) ML
Integral PSD AP
Frequency at 50% energy (Hz) VER
Frequency at 75% energy (Hz) VER
Frequency at 90% energy (Hz) VER
Frequency at 100% energy (Hz) VER
Symmetry in VER
Symmetry in AP
Stride Regularity in VER
Stride Regularity in ML
Stride Regularity in AP
Step Regularity in VER
Step Regularity in AP
Overall, this work consisted of a feasibility and reliability study of using a smart mobile phone for gait data collection. In addition, several experiments were performed to determine whether machine learning methods can be used to differentiate between categories of subjects, and statistical analysis was performed to determine significant differences in features between the two groups.

Figure 1. Device placement

TABLE 1. DESCRIPTION OF SUBJECTS
Age     Patient (Male/Female)   Control (Male/Female)
20-29   6 (2M/4F)               8 (4M/4F)
30-39   4 (2M/2F)               4 (2M/2F)
40-49   3 (2M/1F)               3 (2M/1F)
50-59   4 (3M/1F)               4 (3M/1F)
60-65   3 (3M)                  1 (1M)
A. Machine Learning Methods
To examine whether features extracted from the iPhone can be used to differentiate between patients with LBP and healthy controls, the following ML methods were investigated:
KStar is an instance-based learner that classifies an instance by comparing it to a dataset of pre-classified examples. KStar follows the k-nearest neighbours (KNN) framework but uses an entropy-based distance to measure the dissimilarity between input samples. Instance-based learners assume that similar instances, as judged by this distance, will have similar classifications.
Support vector machine (SVM) is based on statistical learning theory introduced by Vapnik in the early 1990s. The main idea behind the SVM is to construct a maximum-margin separating hyperplane as the decision-making surface. Using this decision boundary, the algorithm decides which of the two classes a new instance falls into.
Multilayer perceptron (MLP) represents the most prominent and well-studied class of artificial neural networks in classification. It uses a backpropagation learning algorithm to build a model from given data. A multi-layer neural network consists of a large number of
TABLE 2. DESCRIPTION OF FEATURES
Feature no.
Mean step length (m)
Integral PSD VER
Frequency at 50% energy (Hz) VER
Frequency at 75% energy (Hz) VER
Frequency at 90% energy (Hz) VER
Frequency at 100% energy (Hz) VER
Integral PSD ML
neurons joined together in a pattern of connections. MLP is commonly used in speech recognition, image recognition and machine translation software.
Decision Tree C4.5 (J48) is an algorithm developed by Ross Quinlan for generating a decision tree. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification and are built using the concept of information entropy. One base case of the algorithm arises when all the samples in the list belong to the same class: it then simply creates a leaf node for the decision tree saying to choose that class.
B. Feature Selection Methods
A simple sequential selection algorithm is performed. Each feature is first tested individually for its classification accuracy, and the feature set is sorted by that accuracy. The lowest-ranked feature is then removed and the remaining subset is evaluated with the classification methods. This is repeated until only one feature remains, which is the highest-ranked feature from the single-feature evaluation.
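The sequential selection procedure described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the scoring function is a leave-one-out 1-nearest-neighbour accuracy with squared Euclidean distance, used here as a simple stand-in for the WEKA classifiers (it is not KStar), and all function names are hypothetical.

```python
def loo_accuracy(rows, labels, feature_idx):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    only looks at the columns listed in `feature_idx`.

    ASSUMPTION: stand-in scorer for illustration; the study used WEKA.
    """
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # hold out the sample being classified
            dist = sum((row[k] - other[k]) ** 2 for k in feature_idx)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def sequential_selection(rows, labels, score=loo_accuracy):
    """Rank features by single-feature accuracy, then repeatedly drop the
    lowest-ranked feature, scoring each remaining subset.

    Returns a list of (feature_subset, accuracy) pairs, from the full
    set down to the single highest-ranked feature.
    """
    n_features = len(rows[0])
    # Step 1: rank every feature on its own, best first.
    ranked = sorted(
        range(n_features),
        key=lambda k: score(rows, labels, [k]),
        reverse=True,
    )
    # Step 2: evaluate the subset, remove the lowest-ranked feature, repeat.
    history = []
    subset = list(ranked)
    while subset:
        history.append((list(subset), score(rows, labels, subset)))
        subset = subset[:-1]
    return history
```

The subset with the highest recorded accuracy can then be picked from the returned history, for example with max(history, key=lambda item: item[1]).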
The proposed correlation-based feature ranking method is described in Algorithm 1.
Algorithm 1. Single-feature predictive power ranking combined with correlation-based feature selection
1: Rank all the features in terms of their predictive performance.
2: Repeat
3: Calculate pairwise correlation among all feature pairs.
4: Find the pair with the highest correlation value.
5: Remove the feature with lower predictive power
and last model makes no assumptions about the methods. A further consideration is whether the single or the average form of the coefficient applies: if each value is obtained from an individual measurement, the single form is used; if each value is the mean of multiple measurements, i.e. the mean of repeated measurements of a sample, the average form is used.
D. Mann-Whitney U-test
The Mann-Whitney U-test is a non-parametric statistical hypothesis test that assesses whether one of two samples of observations tends to have larger values than the other. The method assumes that all observations from both groups are independent of each other and that the responses are ordinal, so that for any two observations it can be determined which is the greater. Under the null hypothesis, the populations are symmetric with respect to the probability of randomly drawing a larger observation. The U-test is used here to determine whether features differed significantly between the patient group and the control group. An ICC value of less than 0.40 indicates weak reliability, an ICC value greater than 0.75 indicates excellent reliability, and values in between indicate good agreement.
IV.
A. Classification
Machine learning methods were evaluated using WEKA to determine how well each algorithm performs. Table 3 shows that KStar performs considerably better than SVM, MLP and Decision Tree, with an accuracy of 87% and an ROC area of 0.92, which is considered excellent. Based on these observations, KStar was investigated further to determine whether the feature set can be reduced using feature selection methods such as the single selection method and correlation-based feature selection.
TABLE 3. CLASSIFICATION USING MACHINE LEARNING METHODS ON THE IPHONE DATASET.
from the pair.
6: Estimate prediction performance of the classifier using the remaining feature subset.
7: Until there is only one feature left or prediction performance deteriorates significantly.
C. Intraclass Correlation Coefficient (ICC)
ICC is a measure of agreement; the coefficient represents agreement between two or more evaluation methods on the same set of subjects. In this study, the evaluation methods, or raters, are the different devices. The ICC provides a scalar measure of agreement between the device measurements. Three models of ICC exist. Model 1 assumes that each method of measurement is different, being a subset of a larger set of methods, chosen randomly. Model 2 assumes that the same methods perform the evaluation in all cases, although these methods may be a subset of a larger set of methods. The third
Method          Accuracy (%)   ROC Area
Decision Tree   70.83          0.74
Table 4 shows that higher accuracy can be achieved using the single selection method. Using a subset of 27 features, a classification accuracy of 88.9% was achieved. With 9 features, an accuracy of 87.5% could be achieved, which is comparable to using the entire feature set, thus reducing the number of redundant features within the feature set. Correlation-based feature selection, however, was unable to reduce the number of features as initially expected. To determine whether inconsistencies in the features may have arisen from the data collection, a feature reproducibility study was performed using ICC to determine whether the features extracted from the iPhone agree with those extracted from the benchmark device, the Minimod.
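The agreement check between the two devices can be sketched as follows. The paper does not state which ICC model was used, so this sketch assumes the two-way random-effects, absolute-agreement, single-measure form, commonly written ICC(2,1). The reliability thresholds (0.40 and 0.75) are those stated in this paper. The function names and the example values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject and one column per
    device (rater), computed from two-way ANOVA mean squares.

    ASSUMPTION: ICC(2,1) is chosen for illustration; the source does not
    say which ICC model or form was applied.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of devices
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two devices")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                # between-subjects mean square
    jms = ss_cols / (k - 1)                # between-devices mean square
    ems = ss_err / ((n - 1) * (k - 1))     # residual mean square
    denom = bms + (k - 1) * ems + k * (jms - ems) / n
    if denom == 0:
        raise ValueError("ICC undefined: no variance in the ratings")
    return (bms - ems) / denom


def reliability_label(icc):
    """Map an ICC value to the reliability bands stated in the paper."""
    if icc < 0.40:
        return "weak"
    if icc > 0.75:
        return "excellent"
    return "good"


# Hypothetical values of one feature for four subjects: (iPhone, Minimod).
example = [[0.52, 0.50], [0.61, 0.63], [0.47, 0.46], [0.58, 0.57]]
example_label = reliability_label(icc_2_1(example))
```

Applying icc_2_1 to each of the extracted features in turn, with one column per device, would give a per-feature reliability label of the kind reported in this section.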
The ICC results show that 7 features were deemed unreliable (ICC