PYCON MALAYSIA, August 25, 2019
Mark Koester | www.markwk.com
PYCON MALAYSIA, August 24, 2019
Slides and Code: github.com/markwk/ts4health
A time series is a sequence of observations taken sequentially in time.
George Box and Gwilym Jenkins, Time Series Analysis (2015, orig 1970)
The objective of time series analysis is to decompose a time series into its constituent characteristics and develop a mathematical model for each.
Pal, A., & Prakash, P. K. S. (2017). Practical Time Series Analysis
Source: Box, 2015.
measuring or documenting something about your self to gain meaning or make improvements
Related: Self-tracking, Biohacking, Data-driven life…
How to understand human health across time
or an individual self over a lifetime?
to transform science and data into better self-understanding and empowered self-improvement
Intersection of data technologies AND human health and optimization
Time Series Data Analysis
REFERENCES and APPENDIX
Focus will be on univariate, linear, discrete time series
(instead of multivariate, nonlinear, or continuous)
and assume our data/process follows a stochastic model.
What, then, is time? If no one asks me, I know what it is. If I wish to explain it to him who asks, I do not know.
Saint Augustine (AD 354-430, The Confessions)
Fortunately, we don’t need to deal with these general time problems as such, because we only need to deal with the time challenges in our data!
Part of what happens in your data is because of the effects of time, time’s order, cycles, patterns, etc.
Examples of Stationary vs. Non-Stationary
(aka effects of the time index)
(or stationary time series or stationary process)
the process of transforming time series data (by decomposing and detrending it) so that a non-stationary series becomes stationary.
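One common way to make a series stationary is differencing. The sketch below is illustrative only: the synthetic series, its trend slope, and its weekly cycle are assumptions, not data from the talk.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (illustrative only): linear trend plus a weekly cycle.
index = pd.date_range("2019-01-01", periods=57, freq="D")
t = np.arange(len(index))
weekly = 10.0 * np.sin(2 * np.pi * t / 7)
series = pd.Series(0.5 * t + weekly, index=index)

# First-order differencing, y'_t = y_t - y_{t-1}, removes the linear trend
# (the weekly cycle is still present in the differenced values).
detrended = series.diff().dropna()

# Differencing at lag 7 removes the weekly cycle and turns the
# linear trend into a constant (7 days * 0.5 per day).
deseasonalized = series.diff(7).dropna()

print(detrended.mean())
print(deseasonalized.describe())
```

On real data the differenced series would then be re-checked for stationarity, for example with a unit-root test.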
to Time Series Data
Source: Pal, 2017.
Our Focus: Within-Individual Variability
FUTURE: Does sleep correlate with or affect activity level? Or vice versa?
SEE: Previous talk, Python for Self-Trackers.
Source: Fitbit User 1
Source: Fitbit User 2
Exploratory Data Analysis
TS Data Processing
Why do we use data visualization for initial time series analysis?
for Detecting Temporal Effects
NOTE: The prefix “auto” refers to “self” (rather than automatic)
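Autocorrelation can be sketched in a few lines with pandas: it is the correlation of a series with a lagged copy of itself. The step counts below are synthetic, and the weekly pattern and noise level are assumptions chosen only to make the effect visible.

```python
import numpy as np
import pandas as pd

# Synthetic daily step counts (illustrative only): weekly pattern plus noise.
rng = np.random.default_rng(0)
days = np.arange(140)
weekly_pattern = 2000.0 * np.sin(2 * np.pi * days / 7)
steps = pd.Series(8000.0 + weekly_pattern + rng.normal(0, 300, size=days.size))

# Series.autocorr(lag) is the Pearson correlation between the series
# and a copy of itself shifted by `lag` observations.
autocorr_by_lag = {lag: steps.autocorr(lag=lag) for lag in range(1, 15)}

for lag, value in autocorr_by_lag.items():
    print(f"lag {lag:2d}: {value:+.2f}")
```

With a weekly cycle in the data, the values at lags 7 and 14 should be strongly positive, and the values roughly half a week apart should be negative.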
For background, see Time_Series_Data_Visualization_with_Python.ipynb
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Perform Dickey-Fuller test (`timeseries` is the series under test):
print('Results of Dickey-Fuller Test:')
dftest = adfuller(timeseries, autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
# dftest[4] is a dict of critical values keyed by significance level
for key, value in dftest[4].items():
    dfoutput['Critical Value (%s)' % key] = value
print(dfoutput)
Test Statistic                   0.815369
p-value                          0.991880
#Lags Used                      13.000000
Number of Observations Used    130.000000
Critical Value (1%)             -3.481682
Critical Value (5%)             -2.884042
Critical Value (10%)            -2.578770
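What does this mean? The test statistic (0.82) is above every critical value and the p-value (0.99) is far above 0.05, so we cannot reject the null hypothesis of a unit root. This data is not stationary!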
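The reading of the test output can be written down as a small helper. This is a minimal sketch: `interpret_adf` is a hypothetical function (not part of statsmodels), and the 0.05 threshold is a conventional assumption.

```python
def interpret_adf(test_statistic, p_value, critical_values, alpha=0.05):
    """Summarize an ADF result. Null hypothesis: unit root (non-stationary)."""
    # The null is rejected at a given level when the test statistic is
    # more negative than the critical value for that level.
    rejected_at = [
        level for level, crit in critical_values.items() if test_statistic < crit
    ]
    return {"stationary": p_value < alpha, "rejected_at": rejected_at}


# Values copied from the Dickey-Fuller output above.
result = interpret_adf(
    test_statistic=0.815369,
    p_value=0.991880,
    critical_values={"1%": -3.481682, "5%": -2.884042, "10%": -2.578770},
)
print(result)
```

For this series the null hypothesis is not rejected at any level, which is why the data is treated as non-stationary.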
Apple Watch 01 Sleep
Test Statistic                  -4.625906
p-value                          0.000116
#Lags Used                       6.000000
Number of Observations Used    357.000000
Critical Value (1%)             -3.448801
Critical Value (5%)             -2.869670
Critical Value (10%)            -2.571101
fitbit_02 steps without outlier tweaking
Test Statistic                 -7.57
p-value                         2.70
#Lags Used                      2.00
Number of Observations Used     3.61
Critical Value (1%)            -3.44
Critical Value (5%)            -2.86
Critical Value (10%)           -2.57
fitbit_02 steps with outlier tweaking
apple_watch_01:
  Without Averaging Out Outliers: -1.23
  With Averaging Out Outliers: -7.22
fitbit_01:
  Without Averaging Out Outliers: -3.329097
  With Averaging Out Outliers: -5.436158
fitbit_02:
  Without Averaging Out Outliers: -4.62
  With Averaging Out Outliers: -7.57

apple_watch_01:
  Without Averaging Out Outliers: -1.557651e+01
  With Averaging Out Outliers: -19.654423
fitbit_01:
  Without Averaging Out Outliers: -5.407617
  With Averaging Out Outliers: -4.959970
fitbit_02:
  Without Averaging Out Outliers: -1.097
  With Averaging Out Outliers: -4.925933
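The slides do not say how outliers were averaged out. As one possible approach, the sketch below replaces points that sit far from a centered rolling median with that median. The helper name, the 7-day window, and the 3-sigma threshold are all assumptions, and the data is synthetic.

```python
import numpy as np
import pandas as pd


def average_out_outliers(series, window=7, n_sigmas=3.0):
    """Replace points far from the centered rolling median with that median.

    Hypothetical helper: window and threshold are illustrative assumptions.
    """
    rolling_median = series.rolling(window, center=True, min_periods=1).median()
    deviation = (series - rolling_median).abs()
    # Robust spread estimate: median absolute deviation, scaled so that it
    # is comparable to a standard deviation for normally distributed data.
    mad = deviation.median() * 1.4826
    is_outlier = deviation > n_sigmas * mad
    # Keep the original value where it is not an outlier, else use the median.
    return series.where(~is_outlier, rolling_median), is_outlier


rng = np.random.default_rng(1)
steps = pd.Series(8000.0 + rng.normal(0, 500, size=60))
steps.iloc[20] = 45000.0  # injected spike, e.g. an unusual day or a sync glitch

cleaned, is_outlier = average_out_outliers(steps)
print(int(is_outlier.sum()), round(cleaned.iloc[20]))
```

Running the stationarity test on both `steps` and `cleaned` would give a with/without comparison of the kind listed above.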
Our health data is generally stationary.
for Dealing with Time Series Data
for Time Series Data
ARIMA = Auto-Regressive Integrated Moving Average
Standard notation: ARIMA(p, d, q)
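In this notation, p is the order of the auto-regressive part, d is the number of times the series is differenced, and q is the order of the moving-average part. One common textbook way to write the model uses the backshift operator B, where B y_t = y_{t-1}; the symbol names below follow that convention and are not taken from the slides.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right) (1 - B)^{d} \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right) \varepsilon_t
```

Here the \phi_i are the auto-regressive coefficients, the \theta_j are the moving-average coefficients, c is a constant, and \varepsilon_t is white noise. With d = 0 this reduces to an ARMA(p, q) model.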
import pmdarima as pm

# `train` is the training portion of the time series
model = pm.auto_arima(
    train,
    start_p=1, start_q=1,
    test='adf',             # use adftest to find optimal 'd'
    max_p=3, max_q=3,       # maximum p and q
    m=1,                    # frequency of series
    d=None,                 # let model determine 'd'
    seasonal=False,         # No Seasonality
    start_P=0, D=0,
    trace=True,
    error_action='ignore',
    suppress_warnings=True,
    stepwise=True,
)
print(model.summary())
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
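The additive model behind this description is usually written as a sum of components. The formula below follows the notation of the Prophet paper rather than anything shown on the slides.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here g(t) is the trend, s(t) collects the periodic (yearly, weekly, daily) seasonality, h(t) captures holiday effects, and \varepsilon_t is the error term not explained by the model.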
Applied to Health and Self
Can we model the data with ARIMA?
Can we model health data better with Prophet?
Why TS Matters, Next Steps and Future Research
Slides and Code: github.com/markwk/ts4health
“In God we trust, all others bring data.” (W. Edwards Deming)
Find me online at www.markwk.com!