DeepMood: Forecasting Depressed Mood Based on Self-Reported Histories via Recurrent Neural Networks

Depression is a prevalent and growing problem in many people's lives. Without observable diagnostic criteria, the signs of depression may go unnoticed, creating high demand for detecting depression in advance automatically. This paper tackles the challenging problem of forecasting severely depressed moods based on self-reported histories. Despite the large body of research on understanding individual moods, including depression, anxiety, and stress, based on behavioral logs collected by pervasive computing devices such as smartphones, forecasting depressed moods remains an open question. This paper develops a recurrent neural network algorithm that incorporates categorical embedding layers for forecasting depression. We collected large-scale records from 2,382 self-declared depressed people to conduct the experiment. Experimental results show that our method forecasts the severely depressed mood of a user based on self-reported histories with higher accuracy than SVM. The results also show that long-term historical information about a user improves the accuracy of forecasting depressed mood.


INTRODUCTION
Depression is a prevalent mental health care problem whose importance keeps growing as the number of patients, including undiagnosed cases, increases. The WHO estimates that 676 million people in the world (nearly one in ten people) suffer from depression 1 . Current predictions by the WHO indicate that by 2030 depression will be the leading cause of disease burden globally. In the United States, mental disorder problems are among the top five conditions for direct medical expenditure, with associated annual health care costs exceeding $30 billion [29]. Due to the lack of physical symptoms, diagnosing depression is a challenge. People cough or may have a fever when they are physically ill; these symptoms can lead them to go to hospitals for appropriate treatment. Without such physical symptoms, the signs of depression may go unnoticed. Existing studies [24] [35] show effective vital signs for depression detection. A standard method is to measure biomarkers such as serotonin to provide direct evidence of depression. However, obtaining such biomarkers requires special apparatus and often invasive sensing, so few people can adopt this approach to assess their mental health in daily life. Because we lack a system that reveals mental illness through physical signs, we need an external system that helps us detect depression in a noninvasive manner. This need has led to a large body of work on depressed mood detection by pervasive computing devices [3][5] [10] [30] [36][48] [50].

* These authors contributed equally to this work.
† The author is also affiliated with MIT Media Lab.
Early detection of depressed mood is essential to provide appropriate interventions for preventing critical situations. Despite a large body of research on predicting current depressive mood [3][5] [10] [30][36] [48] [50], forecasting depressed moods has not been well studied. Therefore, we define a novel task of forecasting depressed mood and develop a predictive model for this task. In this paper, we distinguish forecasting from prediction, emphasizing the prediction of future mood instead of existing mood. In particular, we focus on forecasting severe depression among the several types of depressive moods. Severely depressed moods might cause irreversible damage or suicide attempts among individuals; thus, we consider the early detection of severe depression essential to health care. Figure 1 shows the methodological relationship between this paper and previous studies. Previous studies have addressed the problem of predicting mood based on individual behavioral information collected by pervasive computing devices such as smartphones. We aim to forecast severely depressed mood based on individual histories including existing mood, behavioral information, and sleeping hours. To advance previous studies, this paper assumes that the information used to forecast depressive mood can be estimated accurately. Thus, we use self-reported histories instead of estimated information. Our focus is to understand the relationship between individual histories, including mood information up to the present, and a person's mood in the future.
Our first research question is "Can we forecast severe depression based on individual histories?" (RQ1). The question includes what kinds of histories contribute to improving forecasting performance. The second research question is "How many days do we need to look back to forecast severe depression?" (RQ2). This question is derived from two motivations. The first motivation is to confirm whether incorporating distant histories improves forecasting accuracy. If individual histories over a shorter period perform well, we do not have to use long-past histories in the forecast. The second motivation is to reveal the relationship between an individual's mood at some point and that person's moods in the past; in other words, individual mental status is not always expressed by simple Markov state models. Instead, mental status should depend on complicated patterns in an individual's history. People often experience cases in which a "subtle" event in the past strongly affected their mood at some later time. To the best of our knowledge, no empirical study has answered RQ2.
In this paper, we developed a smartphone application for self-declared depressed people to collect moods, behavioral types, and medication records in a self-reported manner. We published the smartphone application in Google Play 2 and the App Store 3 in November 2012. By September 2014, the application had been used by 24,211 users. For our depression forecasting study, we filtered out users who had not used the application for more than 28 successive days to ensure adequate history duration for each user. After this preprocessing step, the total number of users in the dataset 4 is 2,382 and the total number of recorded days is 345,158. To the best of our knowledge, this is the largest dataset that researchers have used for a mental health study. This paper describes a deep learning algorithm we developed based on long short-term memory recurrent neural networks (LSTM-RNNs) [14] that uses individual histories as time series features. LSTM-RNN is known for its capability to capture long-distance dependencies and is considered a standard algorithm for prediction tasks based on time series features. Our method separately prepares embedding layers for each categorical variable, including a day-of-the-week variable, in order to introduce the day-of-the-week effect into the method.
Our contributions in the paper are as follows:
• We tackle the severe depression forecasting problem with a large-scale longitudinal dataset collected from 2,382 users over 22 months to take an essential step toward automatic and early depression detection.
• We develop an RNN-based algorithm for the depressed mood forecasting task. The algorithm introduces a day-of-the-week variable directly in order to take the day-of-the-week effect into account.
• The experimental results show that individual histories from the previous two weeks are indicative for forecasting severe depression.

RELATED WORK
Automatic mood detection has been well studied. A major stream of automatic mood detection is the pervasive computing approach. Previous studies have used individual physical activity information including behavioral, mobility, and sleeping patterns [38], voice acoustics [25] [34], and social patterns [28] [33]. These studies use mental health assessment questionnaires such as the PHQ-9 [10] or EMA [18] to prepare ground truth information for building predictive models based on machine learning techniques.
Mehrotra et al. [30] used phone usage patterns such as calls, SMSs, and overall application usage logs to calculate the correlation between social interaction factors and depressive states. Farthen et al. [9] used the StudentLife dataset to build a depression classifier that predicts users' PHQ-9 scores based on features extracted from behavioral logs collected through smartphones. They categorized the users in the dataset into three groups and formalized the prediction task as a multiclass classification problem. Canzian and Musolesi [6] studied depression prediction based only on individual mobility traces. They analyzed the correlation between past mobility metrics and the PHQ-8 [41] scores of participants. They used SVM [8] to build a predictive model for predicting depression. MoodScope [22] collected a user's moods, communication, and phone usage logs to predict his or her mood based on a regression model. It used a dataset collected from 32 participants over two months to build a mood prediction model based on the collected information. Purple Robot [36] was developed to collect a user's location, movement, and phone usage in addition to responses to a self-reported depression survey. The authors conducted an experiment to collect a dataset from 40 participants over four months. They applied a linear regression model to identify correlated features and understand the informative indicators of participants' depressive moods.
It is commonly known that individual depressive moods vary according to the day of the week. However, few studies have empirically shown evidence of this. According to visitors to DepressedTest.com 5 , which provides an online depression test, Monday is the most depressing day and Saturday is the least depressing day of the week. MacKerron and Mourato [27] conducted an experiment using a smartphone application called Mappiness 6 to collect 100,000 data points of self-reported happiness. The results show that Tuesday is the least happy day of the week, followed by Monday. However, these studies do not take into account different times of the day. Golder and Macy [13] used millions of Tweets to estimate the trend of individual hourly moods over a week. They confirmed that Tuesday is the least happy day and Saturday morning is the least negative time period. Because no study has empirically shown the trend of individual moods based on a large-scale dataset, we analyzed the trend of individual moods at three times of the day (morning, afternoon, and evening) to confirm the trend empirically. We also directly incorporated day-of-the-week information as a feature of our predictive model.
5 http://www.depressedtest.com/
6 http://www.mappiness.org.uk/
Several longitudinal experiments have been conducted for projects that aim to understand the relationship between human behaviors and mental health. The Friends and Family Study [1] is the first work on understanding how social dynamics affect many aspects of life, including health and moods. The study used an Android application called Funf 7 to collect call logs and SMS for communication, GPS trajectories for mobility, Bluetooth proximity for face-to-face communication, and smartphone usage. Weekly self-reported surveys were conducted to collect individual health statuses and moods. The 130 participants in the study were either couples or families who were members of a young-family residential living community adjacent to a major research university in North America. As part of the study, Moturu et al. [32] showed that participants' social interactions revealed their moods. The StudentLife Study [49] used passive and automatic data collected from a class of 48 Dartmouth students via their smartphones over 10 weeks to assess their mental health. The study collected stress levels and positive affect information from participants to understand fine-grained mood transitions throughout the semester. The study revealed how academic activity affected students' affective moods [49]. The SNAPSHOT Study 8 collected sleep, network, affect, performance, stress, and health information from 200 socially connected undergraduate students over 30 days. It showed that predicting multiple factors such as depressed mood, happiness, health, and anxiety simultaneously via multitask learning improved prediction accuracy [17], indicating that these factors share unified indicators. It also showed that sleep irregularity is strongly correlated with individual moods [37].
These existing studies tried to detect existing moods, including depression, based on social interaction, behavioral, and sleeping patterns. To the best of our knowledge, no study has tried to apply machine learning to build a predictive model to forecast depressive moods based on individual histories.
Our paper differs from existing studies in four ways: (1) This is the first work to build a predictive model to forecast individuals' depressed moods based on individual histories.
(2) We use a large-scale dataset collected from more than 2,000 depressed users without limiting the participants' diversity. The dataset should cover a wide variety of depressed people in order to conduct a general depression study. (3) This is the first mental health study to apply a state-of-the-art deep learning technique to a prediction problem. It enables us to take a step forward in understanding how the features used in predictive models contribute to forecasting depressive moods. (4) We directly incorporate the day-of-the-week effect to improve the performance of the predictive model.

DATA COLLECTION
We developed a smartphone application called Utsureko for collecting data from users. Figure 2 presents screenshots of the application. The application provides an intuitive interface for users to input their moods in different time slots (morning, afternoon, and evening) each day. It also allows users to record their action type (i.e., go to office, go to work, work at home, do nothing at home, and sick in bed) and sleeping time, including bedtime and wake-up time. Users can input and update their records voluntarily at any time. The recorded information can be visualized in an aggregated manner so that users can look back at their records to capture general trends.
The application design followed existing psychiatric studies. Many mental assessment tools are described in the clinical psychology literature [20] [43][41] [40]. A common method is to ask a person to recognize his or her feelings via predefined questionnaires such as the PHQ-9 [40] and GAD-7 [40]. One example question from the PHQ-9 is as follows: "Over the last two weeks, how often have you been bothered by any of the following problems?-Little interest or pleasure in doing things (0, 1, 2, 3)." People answer the questions with multigraded scores. This approach ensures that people answer questions in a consistent manner. However, the major disadvantage of this approach is that recalled information is influenced by reconstructive processes that reduce the information's accuracy [31]. A recent trend, therefore, is to adopt an ecological momentary assessment (EMA) approach [43] to conduct mental assessments. The concept of EMA is to use open-ended questions to capture an individual's mental status close to the time that symptoms are detected. The advantage of EMA is that it tracks users' mental statuses with more fine-grained resolution than conventional mental health assessment tools such as the PHQ-9. However, it also has disadvantages. We investigated the trade-off and concluded that a hybrid of conventional assessment and EMA was a good solution for developing our smartphone application to collect individual self-reported histories. The application has three aims: (1) to ask users predefined closed questions with single/multigraded answering options; (2) to collect individual moods at three periods of a day; and (3) to let users record their histories whenever they want. We adopted aim (3) to collect longitudinal logs from a large number of depressed people without burdening them. In this paper, we also analyze the distribution of submission times to confirm that users periodically record their moods at different periods of a day.
Table 1 describes the collected self-reported information. The first category reflects the user's moods. We designed four categories of moods (fine, fair, depressed, and irritated). In addition to the depressed option, we prepared an irritated option, because uncontrollable irritability and anger are caused by certain types of depression [2]. Moods can be separately reported for three different parts of a day: morning, afternoon, and evening. Collecting moods at different periods of the day aims to capture the fine-grained transition of users' moods.

Preliminaries
Definition 1. A user experiences a severe depression day if the user has negative feelings all day and exhibits inactive behavior in which he or she avoids leaving home.

Definition 2. An (n, k)-day severe depression forecast is a forecasting task that classifies whether a user will experience at least one severe depression day in the coming n days based on the user's history during the last k days.

Problem Formulation. Given individual histories, the task is to forecast the existence of a severe depression day in the coming n days. For instance, the (1, 14)-day severe depression forecast predicts a user's severe depression on the coming day based on the histories of the last two weeks. User histories are treated as a time series of user logs. A feature vector extracted from the input of a user u on day t is denoted as x^u_t ∈ X^u, where X^u denotes the sequence of feature vectors of user u. The severe depression label y^u_t is derived from the user logs of day t if n = 1, or otherwise from the period t to t + n − 1.
Severe depression forecasting aims to forecast the severe depression of an individual user, y^u_t, based on the user's histories in the previous k days (x^u_{t−k−1}, x^u_{t−k}, . . . , x^u_{t−1}). For instance, k = 14 is set to use information from the last two weeks for depression forecasting. Commonly used mental health questionnaires such as the PHQ-9 and GAD-7 ask patients how they have felt over the last two weeks. Thus, in this paper, we set k = 14 as the default and evaluate the forecasting performance with different k values to answer RQ2.
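The windowing behind the (n, k)-day formulation can be sketched as follows. This is a minimal illustration, assuming each user's log is already encoded as one feature vector and one severe-depression flag per day; the function and array names are hypothetical:

```python
import numpy as np

def make_instances(features, severe_flags, k=14, n=1):
    """Build (n, k)-day forecast instances from one user's daily logs.

    features: array of shape (T, d) -- one feature vector per day
    severe_flags: array of shape (T,) -- 1 if that day was a severe depression day
    Returns X of shape (num_instances, k, d) and y of shape (num_instances,).
    """
    X, y = [], []
    T = len(features)
    for t in range(k, T - n + 1):
        X.append(features[t - k:t])                 # history of the last k days
        y.append(int(severe_flags[t:t + n].max()))  # >= 1 severe day in the next n days
    return np.array(X), np.array(y)
```

The label is the maximum of the flags over the target window, which encodes "at least one severe depression day in the coming n days" as a binary classification target.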

Our method
With our large dataset, we used a supervised machine learning technique to build a predictive model for forecasting severe depression. To use individual histories as time series data, we needed a technique capable of incorporating dependencies on previous states. Among the techniques that can handle sequential features, RNNs with hidden LSTM units [14] are known to be powerful models for learning from sequential data. They effectively model variable-length sequences and capture long-range dependencies. Following the application of RNNs to clinical diagnosis classification [23], we present the first study to evaluate the ability of LSTM to forecast severe depression. The method also recognizes patterns in time series of self-reported user histories by visualizing representative patterns of nodes in a hidden layer.
The network architecture of our method is described in Figure 3. The architecture passes over all inputs in chronological order from t − k − 1 to t − 1 and generates an output y_t at the final sequence step. The LSTM layer propagates historical information along with the features of the next step until it reaches the fully connected layer. The architecture can have more than one LSTM layer to strengthen its capability for long-distance dependencies. The fully connected layer receives the propagated information. We apply the dropout technique [42] to avoid overfitting. Because we have multiple categorical variables with distinct semantics, we introduce an embedding layer for each categorical variable to convert it into a dense vectorial representation. This not only improves the model's performance but also provides a semantic interpretation of the categorical variables listed in Table 1. We convert bedtime and wake-up time into sleeping hour and sleeping irregularity features. Sleeping hour is divided into 24 bins. We create a sleeping irregularity feature to capture the irregularity in sleeping hours, following previous studies that showed the relationship between sleeping irregularity and depressive moods. The sleeping irregularity feature takes 1 if the sleeping hours during day t differ by more than 3 hours from the sleeping hours during day t − 1, and 0 otherwise.
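The sleeping irregularity feature can be sketched as a simple rule. This is an illustrative reading of the definition above, assuming daily total sleeping hours are available as plain numbers; the function name is hypothetical:

```python
def sleeping_irregularity(sleep_hours):
    """1 if today's sleeping hours differ from yesterday's by more than
    3 hours, else 0. The first day has no previous day, so it is 0.

    sleep_hours: list of daily total sleeping hours (assumed representation).
    """
    flags = [0]
    for prev, cur in zip(sleep_hours, sleep_hours[1:]):
        flags.append(1 if abs(cur - prev) > 3 else 0)
    return flags
```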
In addition to the categorical variables appearing in individual histories, we introduce a day-of-the-week variable and add a corresponding embedding layer to the model. Conventional studies have shown that people's mental health depends on a day of the week [13] [44]. Thus, incorporating day-of-the-week information improves the model performance. Furthermore, it enables us to distinguish the same mood when it is experienced during the same period of time on different days of the week. For instance, being depressed on Friday evening should have a different meaning compared to being depressed on Sunday evening.

EVALUATION
In this section, we analyze the collected dataset to confirm that it is reliable for conducting our study. Then we describe the experimental results and discuss our two research questions. Figure 4 presents the distribution of user input hours in a 7-day × 24-hour matrix. The matrix shows that users input data into their logs in the morning (6 a.m.-12 p.m.) and at night (9 p.m.-12 a.m.), indicating that users record their logs regularly. This suggests that our application design succeeds in keeping users motivated to log their histories voluntarily. Figure 5 shows the distribution of moods at each part of the day. Higher values denote more positive feelings during the periods. We confirm the general trends of "afternoon > evening > morning" and "weekend > weekday," except for Sunday evening. The drop in the positive feeling ratio on Sunday evening reflects the so-called Sunday night blues. This coincides with the results estimated from millions of Tweets in [13]. The figure also supports the necessity of collecting user records for each part of the day separately, because individual mood strongly depends on the part of the day.

Labeled Dataset Creation
We defined severe depression day and assigned labels based on individual histories. A severe depression day is a day on which the user experienced negative feelings (i.e., depressed or irritated) all day AND was physically inactive (i.e., do nothing at home or sick in bed). The distribution of the severe depression flags of the users is shown in Figure 6.
We used the dataset collected by the smartphone application described in Section 3. Successive 28-day histories for each user were extracted as instances. The extracted histories do not overlap, which avoids unexpected leakage of the test dataset into the training dataset. We set the size to 28 to conduct the different experiments (i.e., n = 1, 3, 7 and k = 14, . . . , 21) in a consistent manner.
For label extraction, we assigned severe depression labels to each block. Note that the label extraction method differs for each severe depression task. For the (1, 14)-day severe depression forecasting task, the 8th to the 21st slots were used to extract features and the 22nd slot was used for label extraction. For (n, k) = (3, 14), we used the 22nd, 23rd, and 24th slots to calculate a severe depression label for the instances. This procedure ensured that we used the same histories for different settings of n and k in the comparative study. The dataset description is shown in Table 2.
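Under the slot-numbering scheme above, the block splitting can be sketched as follows. This is a minimal illustration; the function name and the 1-indexed slot convention are assumptions:

```python
def split_block(block, k=14, n=1, feature_end=21):
    """Split a 28-day block into feature days and label days.

    Feature days are the k slots ending at slot 21 (slots 8-21 for k=14),
    so every (n, k) setting shares the same target period starting at slot 22.
    block: list of 28 per-day records, block[0] being slot 1.
    """
    features = block[feature_end - k:feature_end]    # slots (22-k)..21
    label_days = block[feature_end:feature_end + n]  # slots 22..(21+n)
    return features, label_days
```

Anchoring the feature window at slot 21 for every setting is what allows the comparative study to use identical histories across different n and k.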

RQ 1: Can we forecast severe depression?
We conducted a comparative experiment to evaluate the forecasting performance of LSTM-RNN to answer our RQ1. We compared two different feature sets. The first feature set, severe only, employs only the user's severe depression history to forecast severe depression. This feature set can be regarded as containing only the minimal information of individual histories, assuming that a user inputs his or her mood only once a day in an aggregated manner. The other feature set, all, uses all the information of individual histories shown in Table 1 in addition to the day-of-the-week variable.
Our LSTM-RNN used three LSTM layers with 128 memory cells each, a dropout of 0.1, and a single fully connected layer with 64 nodes. It was trained for 5 epochs using RMSProp [45].
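The paper does not include code, but the described architecture can be sketched in PyTorch as follows. The embedding dimension, variable cardinalities, and number of dense features are assumptions; the layer sizes follow the text (three LSTM layers with 128 cells, dropout 0.1, and a 64-node fully connected layer):

```python
import torch
import torch.nn as nn

class DeepMoodSketch(nn.Module):
    """Sketch of the described architecture: per-variable embeddings,
    three LSTM layers (128 cells, dropout 0.1), and a 64-node fully
    connected layer. Cardinalities and embedding sizes are assumed."""

    def __init__(self, cat_cardinalities, emb_dim=8, num_dense=2):
        super().__init__()
        # one embedding table per categorical variable
        # (mood slots, action type, day of week, ...)
        self.embeddings = nn.ModuleList(
            nn.Embedding(card, emb_dim) for card in cat_cardinalities
        )
        input_dim = emb_dim * len(cat_cardinalities) + num_dense
        self.lstm = nn.LSTM(input_dim, 128, num_layers=3,
                            dropout=0.1, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.1), nn.Linear(64, 1)
        )

    def forward(self, cats, dense):
        # cats: (batch, k, num_cat_vars) integer codes
        # dense: (batch, k, num_dense) numeric features such as sleeping hours
        embedded = [emb(cats[:, :, i]) for i, emb in enumerate(self.embeddings)]
        x = torch.cat(embedded + [dense], dim=-1)
        out, _ = self.lstm(x)              # propagate history across the k steps
        logit = self.fc(out[:, -1, :])     # read out at the final sequence step
        return torch.sigmoid(logit).squeeze(-1)
```

A forward pass over a batch of (k = 14)-day histories yields one severe-depression probability per instance.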
We used SVM as a baseline learning algorithm because it has been used in previous studies for mood prediction [6][16] [38]. Because SVM cannot directly handle time series data, we concatenated the time series feature values into a single feature vector, allowing SVM to learn a predictive model based on the same information. We used the all feature set to train the SVM. The C parameter was selected from {10^-3, 10^-2, . . . , 10^1} by grid search.
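The baseline setup can be sketched with scikit-learn. This is an illustration under the assumption that instances arrive as (instances, k, d) arrays, not the authors' exact pipeline:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_svm_baseline(X, y):
    """Flatten (num_instances, k, d) histories into single vectors and fit
    an SVM, selecting C from {1e-3, ..., 1e1} by grid search as described."""
    X_flat = X.reshape(len(X), -1)  # concatenate the k daily feature vectors
    grid = GridSearchCV(SVC(probability=True),
                        {"C": [1e-3, 1e-2, 1e-1, 1e0, 1e1]})
    grid.fit(X_flat, y)
    return grid.best_estimator_
```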
For the experiment, we evaluated the methods for different n values (n = 1, 3, 7) and a fixed k value (k = 14) of the (n, k)-day severe depression forecast task. We used AUC-ROC as the evaluation metric because it is insensitive to imbalanced class distributions [11]. A random guessing classifier achieves 0.5 AUC-ROC. We conducted five-fold cross validation to calculate the evaluation measure. In this experiment, the cross validation split the dataset into training and test datasets on a user basis; this strategy ensures that forecasting performance is evaluated on unseen users whose histories were not used to train the predictive model in each fold.
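The user-basis split can be sketched with scikit-learn's GroupKFold, which guarantees that no user's instances appear in both the training and test folds (an illustration, not the authors' exact code):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

def user_based_folds(user_ids, n_splits=5):
    """Yield train/test index pairs so that no user appears on both sides,
    matching the paper's user-basis cross validation.

    user_ids: array giving the user each instance belongs to.
    """
    gkf = GroupKFold(n_splits=n_splits)
    dummy_X = np.zeros((len(user_ids), 1))  # features are irrelevant to the split
    yield from gkf.split(dummy_X, groups=user_ids)
```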
The results are shown in Table 3. Remarkably, every classifier achieves an AUC-ROC significantly higher than random guessing (0.5). The results confirm the feasibility of forecasting severe depression based on individual histories. LSTM-RNN (all) outperforms SVM (all) regardless of the n value. From these results, we confirm that LSTM-RNN appropriately processes time series information to forecast severe depression. Comparing the feature sets, all shows a higher AUC-ROC than severe only. Therefore, we confirm that fine-grained information such as moods at different parts of the day and action type informs the forecasting of severe depression.
As n increases from 1 to 7, AUC-ROC values of all the methods become lower. This indicates that it is difficult to forecast n-day severe depression with a large n value compared with a small n value. This follows our intuition that forecasting a longer future is more difficult than forecasting a shorter future. The general trend of the results is consistent over different n in Table 3.
We calculated the feature importance of LSTM-RNN (all) based on Mean Decrease AUC-ROC, which is the decrease in AUC-ROC on the test dataset when the values of a selected feature are randomly permuted among all instances. This measure is commonly used for calculating feature importance, especially for ensemble classifiers [46], because it is not straightforward to evaluate feature importance for nonlinear classifiers including LSTM-RNN. Figure 8 contains three figures representing feature importance distributions for different n values (n = 1, 3, 7). Figure 8(a) and Figure 8(b) show that evening mood contributes strongly to forecasting performance. For one-day forecasting, the most recent mood is the most informative signal for forecasting severe depression on the coming day. On the other hand, the feature importance of seven-day forecasting shows that morning mood contributes most to the predictive model. This indicates that morning moods in individual histories are more indicative of severe depression in the distant future. The results also indicate that the day-of-the-week variable contributes well to the forecasting performance for one-day forecasting but not for the others. We believe the reason for this rests on the current definition of n-day severe depression, which does not point to a single day in the future if n > 1. Thus, the predictive model cannot utilize the day-of-the-week information.
Remarkably, the sleeping features and medical features generally do not show a positive contribution to the forecasting performance. One-day forecasting is the only case in which both sleeping hours and sleeping irregularity show a positive contribution. The medical features and nocturnal awakening do not show a positive contribution in any case. These results are inconsistent with previous studies that have shown a correlation between sleeping irregularity and negative moods [37]. We consider the absence of contribution by these features to be caused by building a single predictive model across all users in the dataset. Some users might suffer from severe depression without attending hospitals or taking medicines. This result thus implies that stratifying users to create multiple predictive models for each group might improve forecasting performance. Figure 8(d) presents the learning curves of LSTM-RNN (all) for different n settings. The three curves consistently improve their performance as the training ratio increases. The result verifies the appropriate learning of severe depression forecasting models by LSTM-RNN. The improvements saturate at around 30% of the training data in all three models. This indicates that a large-scale dataset of about 700 users is needed to utilize LSTM-RNN for depression forecasting.
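The Mean Decrease AUC-ROC computation described above can be sketched as follows. Here `predict_proba` stands for any fitted model's scoring function and is a placeholder, not the authors' API:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_decrease_auc(predict_proba, X_test, y_test, feature_idx, rng=None):
    """Permutation importance: drop in test AUC-ROC after shuffling one
    feature column (across all k days) among the test instances.

    X_test: array of shape (num_instances, k, d); y_test: binary labels.
    """
    rng = rng or np.random.default_rng(0)
    base = roc_auc_score(y_test, predict_proba(X_test))
    perm = rng.permutation(len(X_test))
    X_perm = X_test.copy()
    X_perm[:, :, feature_idx] = X_test[perm][:, :, feature_idx]
    return base - roc_auc_score(y_test, predict_proba(X_perm))
```

A feature the model relies on yields a large positive decrease; a feature the model ignores yields a decrease of zero.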

RQ 2: How many days should we look back?
To answer our RQ2, we evaluated our model with different k (k = 1, 2, . . . , 21) to compare the forecasting performance of the predictive models. Here, k = 1 means that a predictive model uses only the day before a target day for feature extraction, and the k = 21 setting uses data from the previous three weeks. If information from the last several weeks contributes to forecasting performance, the performance should keep improving as k increases. We used LSTM-RNN (all) and fixed n at 1 for the experiment. Figure 9 summarizes the results. The performance curve shows a rapid increase in the first week and a slower increase in the second and third weeks. The improvement apparently saturates as k increases. After k exceeds 14, the performance curve drops until k reaches 21. The results indicate that the history of the previous two weeks is sufficient to forecast future severe depression. This finding coincides with the design of commonly used mental health assessments such as the PHQ-9 and GAD-7, whose first questions are, "Over the last two weeks, how often have you been bothered by any of the following problems?" This consistency empirically supports the assessments' questions as reasonable for capturing individual mental health status. To precisely analyze the contribution of every single day in the histories of the previous two weeks, we also calculated the importance of each day of the last k days based on Mean Decrease AUC-ROC, in the same manner as the feature importance analysis in Section 5.3. Instead of selecting a single feature, this analysis selected a single day from the last k days and randomly permuted the feature values corresponding to the selected day. This analysis shows the contribution of each day in the last k days to depression forecasting performance.
The results are shown in Figure 10. This figure indicates that user logs from the most recent day are the most informative features for depression forecasting. This follows the intuition that present mental status strongly depends on the previous state. However, user logs from other days also have reasonably high values. This finding is consistent with the results in Figure 9. The implications here are twofold: (1) there exist long-range dependencies in depression forecasting with a user's historical logs, and (2) the long-range dependencies can be expressed by a finite number of patterns among all the users, so that LSTM-RNN performs well in forecasting severe depression in the test dataset. Implication (1) is also supported by the results in Figure 9. Interestingly, present mental status depends not only on the most recent day but also on other days. The long-range dependency is especially apparent when a day falls on the same day of the week as the target day.
To confirm the learned representation of the LSTM-RNN, we looked up the representative input patterns for each node in the fully connected layer. We restored the input representations that maximally activate a node in order to interpret the obtained high-level representations. The extracted patterns are shown in Figure 11. The representative patterns of the severe depression class include negative feelings (depressed or irritated) and inactive behaviors (do nothing at home, sick in bed), whereas the nonsevere depression class is associated with more positive feelings and more active behaviors in general. However, by focusing on moods in the afternoon (EDay in Figure 11), we confirm that the representative patterns of the severe depression class still include positive feelings (i.e., fine or fair). Additionally, the representative patterns of the nonsevere depression class include some negative feelings. These remarkable results do not match our intuition and indicate that negative feelings in the afternoon do not always indicate future severe depression.
As another example of the interpretability of our method, we show the trained embedding vectors of the day-of-the-week variable. Figure 12 presents the embedding vectors in two-dimensional space after converting the original embedding vectors into two-dimensional vectors by singular value decomposition (SVD) [19]. Two groups appear in the figure: Friday, Saturday, and Sunday are happier days than the other days, namely the weekdays except Friday. This result is consistent with the findings of this paper and previous studies. Notably, the model uses only the severe depression labels as target values to infer the day-of-the-week embedding representation.
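The SVD projection used for Figure 12 can be sketched as follows. This is a minimal illustration; mean-centering before the SVD is an assumption:

```python
import numpy as np

def project_2d(embeddings):
    """Project day-of-the-week embedding vectors (7 x emb_dim) to 2-D
    using a rank-2 SVD of the mean-centered matrix."""
    centered = embeddings - embeddings.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return U[:, :2] * S[:2]  # coordinates along the top-2 singular directions
```

When the centered embeddings are (close to) rank 2, this projection preserves the pairwise distances between days, so clusters such as {Friday, Saturday, Sunday} remain visible in the plane.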

CONCLUSION
This study tackled a novel type of depression prediction task, namely, depression forecasting based on individual histories. Our study used 345,158 total days of logs collected from 2,382 users over 22 months. With the large-scale data, we utilized the power of deep learning to build a depression forecasting model. Our model separately embedded categorical variables, including our proposed day-of-the-week variable.
Experimental results confirmed that our framework was able to forecast severe depression based on individual histories with high accuracy. The results showed that fine-grained information such as reporting moods in different parts of the day improved forecasting performance. The capability of LSTM-RNN to incorporate long-range dependencies of time series helped us determine the contribution of distant past histories up to the previous two weeks. The representative patterns of the model further showed that having a depressed mood only in the afternoon is not always a sign of future severe depression.
This study relied on self-reported histories. However, previous studies have used many methods to estimate individual behavioral patterns, including sleeping and moods, based on behavioral and mobility information collected by smartphones and/or wearable devices. As we have shown in Figure 1, our aim is to bridge the gap between previous studies and depression forecasting. We believe that this paper is a good step toward automatic depression detection, which