Exp Neurobiol 2024; 33(3): 119-128
Published online June 30, 2024
https://doi.org/10.5607/en24008
© The Korean Society for Brain and Neural Sciences
Jea Kwon1†, Moonsun Sa1†, Hyewon Kim1,2†, Yejin Seong1 and C. Justin Lee1*
1Center for Cognition and Sociality, Institute for Basic Science (IBS), Daejeon 34126, Korea
2Department of Pre-Medicine, Eulji University School of Medicine, Daejeon 34824, Korea
Correspondence to: *To whom correspondence should be addressed.
TEL: 82-42-878-9150, FAX: 82-42-878-9151
e-mail: cjl@ibs.re.kr
†These authors contributed equally to this article.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Obesity is a growing health concern, mainly caused by poor dietary habits. Yet, accurately tracking the diet and food intake of individuals with obesity is challenging. Although 3D motion capture technology is becoming increasingly important in healthcare, its potential for detecting early signs of obesity has not been fully explored. In this research, we used a deep LSTM network trained with individual identity (identity-trained deep LSTM network) to analyze 3D time-series skeleton data from mouse models with diet-induced obesity. First, we analyzed the data from two different viewpoints: allocentric and egocentric. Second, we trained various deep recurrent networks (e.g., RNN, GRU, LSTM) to predict the identity. Lastly, we tested whether these models effectively encode obese-like motion representations by training a support vector classifier with the latent features from the last layer. Our experimental results indicate that the optimal performance is achieved when utilizing an identity-trained deep LSTM network in conjunction with an egocentric viewpoint. This approach suggests a new way to use deep learning to spot health risks in mouse models of obesity and should be useful for detecting early signs of obesity in humans.
Keywords: Obesity, Egocentric, 3D skeleton, Identity, LSTM, Behavior
Recent advancements in deep learning for motion capture systems show promise in medical healthcare by accurately capturing 3D skeleton data over time [1-4]. These data are valuable for early diagnosis and remote monitoring, particularly for conditions related to motor movements, such as Parkinson’s disease [5-7]. However, their potential for understanding health conditions not directly related to motor movements, such as obesity, is yet to be evaluated.
Obesity, a prevalent lifestyle disease, is primarily linked to poor dietary habits [8, 9]. While the transition from an unhealthy diet to an obesity diagnosis can take considerable time, the traditional Body Mass Index (BMI) system, which bases obesity diagnosis on weight and height, falls short in accuracy by failing to differentiate between muscle mass and body fat [10]. Predicting the risk of diet-induced obesity (DIO) through early behavioral pattern observation presents a promising avenue for healthcare systems. While 3D skeleton data provide detailed spatiotemporal information on dynamic movements, the complexity of data structure makes it challenging to extract meaningful insights, such as disease traits [11, 12].
This study aims to detect early signs of obesity-like motion representations using time-series 3D skeleton data from DIO mouse models. Since daily diet tracking in clinical settings is infeasible, we explore the potential of deep recurrent networks (DRNs) to capture obese-like motion representations without dietary information. Inspired by previous works on deep convolutional networks (DCNs) [13, 14], we leverage DRNs to predict the identity of DIO models based on their motion data. Moreover, we apply the concepts of an object-centered view (i.e., allocentric) and a self-centered view (i.e., egocentric) [15-17] to this dataset. We show that combining the egocentric viewpoint with deep LSTM networks trained to classify individual identities (identity-trained networks; note that identity training differs from the concept of identity-aware learning [18, 19]) effectively differentiates movement patterns between mice on a standard chow diet and those on a high-fat diet (HFD). This approach offers a promising way to identify obesity-related motion characteristics without invasive or continuous dietary monitoring.
All experiments involving animals in this study were conducted following the guidelines approved by the Institutional Animal Care and Use Committee of the Institute for Basic Science (IBS) in Daejeon, South Korea. The mice used in this research were housed in a specific pathogen-free facility. The housing conditions included a consistent 12-hour light/dark cycle, with lights turning on at 8:00 AM. The environment was maintained at a temperature of 21°C and a humidity level between 40% and 60%. Mice had unrestricted access to both water and food throughout the study.
To establish the DIO mouse model, male C57BL/6J mice, starting at six weeks of age, were fed either a high-fat diet (HFD; 60% of calories from fat; D12492, Research Diets Inc.) or a standard chow diet (Teklad, 2018S, Envigo) for a period of 6 to 18 weeks, following previously established protocols (Fig. 1, 2a) [20-23]. All experimental procedures were carried out using age-matched control groups to ensure accuracy and reliability. These mice underwent weekly measurements, including body weight, at the same time each week, and AVATAR experiments were conducted.
For the acquisition of 3D obesity skeleton dataset, we used AVATAR system [24], a YOLO-based 3D pose estimation with multi-view images that extracts
For generation of allocentric dataset (object view), the skeletons were adjusted on spatiotemporal centroid offset. For generation of egocentric dataset (subject view), the skeletons were adjusted on anus node offset (Fig. 2b). Each trace was randomly split with a chunk size of T. Then the dataset was randomly grouped into
To address time-series 3D skeleton data, we have explored the potential of DRNs (Fig. 2c): we compared a feed-forward network (FFN), recurrent neural network (RNN) [25], gated recurrent unit (GRU) [26] and long short-term memory (LSTM) [27]. With a batch size of N, the inputs for the FFN are given by a 2D tensor (
After training models, the last layer hidden activations (or latent vectors) were extracted to detect obese-like motion representations, similar to previous approach using support vector classifier (SVC) (Fig. 2d) [14]. The objective of SVC is to find a linear hyperplane that discriminates the diet group (chow vs HFD). During this phase,
One interesting property of deep neural networks is their ability to capture features that are seemingly unrelated to the original task while learning from large datasets. For example, when trained to recognize the identity of individuals from their photos, the DCN could also develop representations related to facial expressions [14]. Similarly, when trained with natural sounds, the DCN could also detect rudimentary music [13]. In clinical settings, tracking the precise composition of subjects' diets is considerably more challenging than tracking their individual identity. Consequently, using individual identity as the training target offers a practical advantage for deep neural networks. Therefore, we used an SVC to examine whether DRNs trained on identity with 3D skeleton datasets can effectively capture seemingly unrelated dietary information.
We found that training deep neural networks on the task of identity classification significantly enhances their ability to differentiate between the two dietary groups, chow vs HFD (Fig. 3). The positive correlations (Pearson's correlation analysis, Top-1 identity classification accuracy vs SVC diet classification accuracy; allocentric,
Next, we sought to explore the impact of viewpoint difference on the effectiveness of DRNs in time series data analysis. Through linear regression analysis, we demonstrated that allocentric data consistently exhibited strong linear relationships regardless of the model architecture used, with high goodness of fit values across different architectures (FFN,
While we found that an egocentric viewpoint plays a significant role in capturing shared features of both identity and dietary information with a combination of DRNs, it remains unclear how architectural difference affects the dietary classification. To investigate this, we have compared the dietary classification accuracy across models with different numbers of hidden layers and sequence lengths (Fig. 4).
The most significant impact on identity accuracy was observed to be due to differences in the viewpoint (Fig. 4). Regarding sequence length, we noted a general trend where increasing lengths tended to decrease accuracy for both identity and dietary classifications. However, deep LSTM networks demonstrated a remarkable resilience to this performance degradation compared to other networks (Fig. 4). Furthermore, this resilience was most pronounced with three hidden layers (Fig. 4a). Surprisingly, this identity-trained LSTM was more effective compared to end-to-end diet-trained LSTM (Fig. 5). These results collectively suggest that the memory cells in deep LSTM networks trained with identity may play a key role in capturing the underlying data structure which is beneficial for accurately predicting both identity and dietary habits.
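To make the notion of a last-layer latent vector concrete, the following is a minimal NumPy sketch of a three-layer stacked LSTM forward pass. It is not the authors' implementation: the weights are random and untrained, the identity classification head is omitted, and the helper names (`init_layer`, `lstm_layer`, `latent_vectors`), the tensor sizes, and the choice to flatten each frame into a vector of node coordinates are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_layer(rng, in_dim, hidden):
    """Random weights for one LSTM layer; gates are stacked as [input, forget, cell, output]."""
    scale = 1.0 / np.sqrt(hidden)
    return {
        "W": rng.uniform(-scale, scale, size=(in_dim, 4 * hidden)),
        "U": rng.uniform(-scale, scale, size=(hidden, 4 * hidden)),
        "b": np.zeros(4 * hidden),
    }


def lstm_layer(x, p):
    """Run one LSTM layer over x of shape (N, T, D); return all hidden states (N, T, H)."""
    N, T, _ = x.shape
    H = p["U"].shape[0]
    h = np.zeros((N, H))
    c = np.zeros((N, H))
    outputs = np.empty((N, T, H))
    for t in range(T):
        gates = x[:, t] @ p["W"] + h @ p["U"] + p["b"]
        i, f, g, o = np.split(gates, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # memory cell update
        h = sigmoid(o) * np.tanh(c)                   # hidden state exposed to the next layer
        outputs[:, t] = h
    return outputs


def latent_vectors(x, layers):
    """Stack the layers and keep the last layer's hidden state at the final frame."""
    for p in layers:
        x = lstm_layer(x, p)
    return x[:, -1]


rng = np.random.default_rng(0)
N, T, J, H = 4, 30, 9, 16  # batch, frames, nodes, hidden units (all illustrative)
skeletons = rng.normal(size=(N, T, J * 3))  # each frame flattened to J*3 coordinates (assumption)
dims = [J * 3, H, H, H]  # three hidden layers
layers = [init_layer(rng, dims[k], dims[k + 1]) for k in range(3)]
latents = latent_vectors(skeletons, layers)  # one latent vector per sequence, shape (N, H)
```

In a trained network, `latents` would be the features passed to the downstream diet classifier; the memory cell `c` is the component that carries information across frames.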
Another intriguing observation is that the performance of the identity-trained LSTM, utilizing animals' 3D behavioral data, was more accurate than when using the animals' weight and period information (Fig. 1, 6). This implies that in the progression of diet-induced obesity, behavioral changes may precede weight changes.
In our proposed two-step method, an identity-trained LSTM followed by a diet-trained linear SVC demonstrates superior performance when the dataset is represented in an egocentric skeleton format. To elucidate how linear SVC effectively captures the diet feature representation extracted from the latent vectors of deep models, we have visualized the linear separability in the latent feature embedding space (Fig. 7).
For this visualization, we first applied linear discriminant analysis (LDA) to identify an axis (LD1) that maximizes the separation between two classes. LDA achieves this by maximizing the ratio of the between-class variance to the within-class variance. After determining the LD1 axis, we then utilized a modified principal component analysis (PCA) technique to identify principal components within the subspace orthogonal to LD1. This modification involved projecting the data onto a plane orthogonal to LD1 before applying PCA, thereby ensuring that the variance maximized by the first principal component is orthogonal to LD1.
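The two-stage projection described above can be sketched in NumPy as follows. This is an illustrative reconstruction rather than the code used for Fig. 7: it uses the two-class Fisher solution for LD1, adds a small ridge term for numerical stability, and runs on synthetic stand-in data; the function names `lda_axis` and `pca_orthogonal_to` are hypothetical.

```python
import numpy as np


def lda_axis(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Unit vector LD1 for two classes: w is proportional to Sw^{-1} (mu1 - mu0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)  # within-class scatter
    Sw += 1e-6 * np.eye(X.shape[1])  # small ridge for stability (an assumption)
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)


def pca_orthogonal_to(X: np.ndarray, axis: np.ndarray, n_components: int = 1) -> np.ndarray:
    """Principal directions of X after removing its component along `axis`."""
    Xc = X - X.mean(axis=0)
    residual = Xc - np.outer(Xc @ axis, axis)  # project onto the hyperplane orthogonal to axis
    _, _, Vt = np.linalg.svd(residual, full_matrices=False)
    return Vt[:n_components]


# Synthetic stand-in for latent vectors: two classes that differ along the first dimension.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 6)) * np.array([1, 3, 1, 1, 1, 1])
X[:, 0] += 2.0 * y

ld1 = lda_axis(X, y)
pc = pca_orthogonal_to(X, ld1, n_components=2)
Xc = X - X.mean(axis=0)
embedding = np.column_stack([Xc @ ld1, Xc @ pc[0]])  # 2D coordinates: (LD1, PC1 orthogonal to LD1)
```

Because the data are projected onto the hyperplane orthogonal to LD1 before the decomposition, every returned principal direction is orthogonal to LD1 by construction, so the two plot axes do not share variance along the discriminant direction.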
While there are notable differences in how LDA and SVC approach the task of finding a linear hyperplane between classes (LDA assumes a normal distribution of data within each class, whereas SVC does not), combining these linear transformations significantly enhances our understanding of the feature distribution and relationships within the latent space. This method not only facilitates clearer visualizations but also provides deeper insights into how the model distinguishes between different classes, thus improving our understanding of the model’s effectiveness and behavior.
As demonstrated in Fig. 7, the egocentric representation notably improves linear separability between classes across various model architectures. This enhancement is primarily due to the shift from allocentric to egocentric skeletons, characterized by the absence of global geometric information. We believe this key factor significantly boosts linear separability in the latent feature embedding space by allowing the model to focus more on motion differences.
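The difference between the two skeleton formats can be illustrated with a short NumPy sketch, under the assumption that a trace is stored as an array of shape (T, J, 3) for T frames and J nodes. The node index `ANUS_NODE`, the array sizes, and the synthetic random-walk path are hypothetical; the real AVATAR node ordering may differ.

```python
import numpy as np

ANUS_NODE = 0  # hypothetical index of the anus keypoint; the real node order may differ


def to_allocentric(trace: np.ndarray) -> np.ndarray:
    """Subtract a single spatiotemporal centroid (mean over all frames and nodes)."""
    centroid = trace.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, 3)
    return trace - centroid


def to_egocentric(trace: np.ndarray, root: int = ANUS_NODE) -> np.ndarray:
    """Subtract the root node position frame by frame, removing global translation."""
    root_xyz = trace[:, root:root + 1, :]  # shape (T, 1, 3)
    return trace - root_xyz


rng = np.random.default_rng(0)
T, J = 20, 9  # frames and nodes per skeleton; both values are illustrative
pose = rng.normal(size=(T, J, 3))                      # body-relative posture
path = np.cumsum(rng.normal(size=(T, 1, 3)), axis=0)   # where the animal moves in the arena
trace = pose + path

allo = to_allocentric(trace)  # still contains the trajectory through the arena
ego = to_egocentric(trace)    # identical to to_egocentric(pose): the trajectory is gone
```

In this sketch the egocentric output is unchanged when an arbitrary trajectory is added to the posture, whereas the allocentric output is not, which is one way to read the "absence of global geometric information" noted above.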
Furthermore, we evaluated the identity-trained LSTM to determine its effectiveness in classifying motion changes induced by diet. Although the results are intriguing, the mechanism by which identity-specific motion training enhances the predictive capabilities of a linear SVC remains unclear. Linear SVC aims to find a linear hyperplane that maximizes the separation between two different diet groups.
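As a rough illustration of what the linear SVC step does with latent vectors, the sketch below fits a linear classifier by subgradient descent on an L2-regularized hinge loss. It is a simplified stand-in, not the solver used in the study: the latent vectors are synthetic, the hyperparameters (`lam`, `lr`, `epochs`) and the 150/50 split are arbitrary assumptions, and the function names are hypothetical.

```python
import numpy as np


def fit_linear_svc(Z, y, lam=1e-2, lr=0.1, epochs=200):
    """Full-batch subgradient descent on L2-regularized hinge loss; y must be in {-1, +1}."""
    n, d = Z.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (Z @ w + b)
        active = margins < 1.0  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active, None] * Z[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(Z, w, b):
    """Assign each latent vector to one side of the hyperplane."""
    return np.where(Z @ w + b >= 0.0, 1, -1)


# Synthetic stand-in for last-layer latent vectors of the two diet groups.
rng = np.random.default_rng(0)
n_per_group, latent_dim = 100, 8
y = np.repeat([-1, 1], n_per_group)  # -1 = chow, +1 = HFD (label coding is illustrative)
shift = np.zeros(latent_dim)
shift[:2] = 1.5
Z = rng.normal(size=(2 * n_per_group, latent_dim)) + y[:, None] * shift

idx = rng.permutation(2 * n_per_group)
train, test = idx[:150], idx[150:]  # simple held-out split (an assumption)
w, b = fit_linear_svc(Z[train], y[train])
accuracy = float((predict(Z[test], w, b) == y[test]).mean())
```

The classifier can only succeed when the two groups are linearly separable in the latent space, which is why the diet classification accuracy serves as a probe of what the identity-trained network has encoded.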
During the progression of diet-induced obesity, both weight and size can vary over time between the chow and high-fat diet (HFD) groups. To ensure that our identity-trained deep LSTM accurately learns motion differences rather than size or weight differences, we divided the original datasets into two sub-datasets: Similar Weight (SW) and Different Weight (DW), as shown in Fig. 9a. In the SW dataset, both forebody and hindbody skeleton lengths were comparable, while in the DW dataset, these measurements were significantly different, as depicted in Fig. 9b. We then cross-validated our method under four different conditions of ‘LSTM-trained’ & ‘SVC-predicted’ pairs: SW & SW, DW & DW, DW & SW, and SW & DW (Fig. 9c). In allocentric conditions, the SVC prediction performance significantly declined in cross-condition pairs (i.e., SW & DW, DW & SW), whereas in egocentric conditions, the performance was much more stable, indicating robustness to variations in body size and weight. This result suggests that the latent features encoded by the identity-trained deep LSTM are derived from motion differences, not from differences in size or weight.
In this study, our focus centered on the detection and prediction of obesity, a growing concern in modern society. Employing novel methods, we aimed to uncover motion features indicative of dietary information through the process of identity classification. By developing a 3D skeleton identity classification network for both chow and HFD models, we extracted latent vectors and utilized an SVC to evaluate the representation of dietary features within these vectors. Our findings underscore the potential of identity-trained deep LSTM networks in identifying obese-like features from time-series 3D skeleton data, shedding light on the previously elusive association between dietary habits and skeletal movement. Our research also marks the first demonstration that DRNs can capture seemingly unrelated features from original tasks, extending previous DCN studies. This indicates a broader capability of deep learning models. We foresee these models transforming clinical practice, fostering deep-learning solutions for obesity prevention and enhancing healthcare outcomes, thus promoting societal well-being.
Despite the remarkable performance demonstrated by deep LSTM networks in both identity and dietary classification tasks, our study has some limitations. First, we did not perform cross-subject validation. While our results showcase the LSTM's proficiency, questions persist regarding its reliance on memorization versus genuine generalization capabilities. Additionally, the identity-trained diet prediction model unexpectedly outperforms the end-to-end model, a surprising deviation from typical expectations that highlights the need for further research. Lastly, our dataset and task formulation have not yet been compared across various structures such as graph-based, convolutional, or transformer-based networks. Future research endeavors should prioritize investigating the LSTM's ability to generalize across subjects and the potential of different model architectures, thus providing further validation of its effectiveness in discerning dietary-related motion patterns beyond the confines of our study's experimental scope.
In this study, we explore the combination of an identity-trained LSTM with a Linear SVC to effectively distinguish differences in diet-induced egocentric motion. One key observation is that the linear separability of the latent feature embeddings, extracted from the last layer of the DRNs, is significantly influenced by the model architecture and the type of skeleton representation used (Fig. 7). The LSTM's distinctive recurrent connectivity and memory cells contribute to a consistent performance improvement across varying sequence lengths (Fig. 8). These results underscore the critical role of data representation and model architecture selection in achieving robust performance.
Controlling genetic diversity, living environments, and dietary habits in clinical human studies is challenging, and simultaneously tracking 3D motion and dietary data to study the effects of diet-induced obesity adds a significant layer of complexity. In contrast, animal models enable the acquisition of tightly controlled datasets for studying long-term obesity progression. In human clinical settings, obtaining personal identity information is far simpler than accurately documenting dietary composition. An identity-trained deep LSTM model is a potential method for tracing back the dietary patterns of patients who suffer from nutritional issues but are not necessarily obese, such as those with diabetes or fatty liver disease. This could inform the redesign of future dietary and behavioral strategies. However, this approach requires a significant amount of human data to set criteria in humans based on results obtained from mice. Furthermore, if wearable technology capable of monitoring human joint movements advances, the approach could be applied more broadly to monitor daily dietary patterns and health. These aspects underscore the capacity of the identity-trained deep LSTM model to identify dietary patterns without training on explicit diet group labels, laying groundwork for translating these findings into human clinical practice.
This work was supported by the Institute for Basic Science (IBS), Center for Cognition and Sociality (IBS-R001-D2). Graphical abstract and
3 T: Number of frames (consecutive number of skeletons). Here,