Mark Koester
PYCON MALAYSIA, August 25, 2019
Mark Koester | www.markwk.com
PYCON MALAYSIA, August 24, 2019
Slides and Code: github.com/markwk/ts4health
A time series is a sequence of observations taken sequentially in time.
George Box and Gwilym Jenkins, Time Series Analysis (2015, orig 1970)
The objective of time series analysis is to decompose a time series into its constituent characteristics and develop a mathematical model for each.
Pal, A., & Prakash, P. K. S. (2017). Practical Time Series Analysis
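A minimal sketch of what "decomposing into constituent characteristics" can look like, assuming an additive model (observed = trend + seasonal + residual) and an odd period such as 7 days. The function name `decompose_additive` and the synthetic `sleep_hours` data are illustrative assumptions, not taken from the talk's notebooks.

```python
import numpy as np
import pandas as pd

def decompose_additive(series: pd.Series, period: int) -> pd.DataFrame:
    """Split a series into trend + seasonal + residual (additive model)."""
    # Trend: centered moving average over one full period (odd period assumed).
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Seasonal: average detrended value at each position within the period.
    position = np.arange(len(series)) % period
    seasonal = detrended.groupby(position).transform("mean")
    # Center the seasonal component so it averages to zero over one period.
    seasonal = seasonal - seasonal.iloc[:period].mean()
    residual = series - trend - seasonal
    return pd.DataFrame({"observed": series, "trend": trend,
                         "seasonal": seasonal, "residual": residual})

# Hypothetical data: slow upward drift plus a weekend bump plus noise.
rng = np.random.default_rng(0)
days = pd.date_range("2019-01-01", periods=140, freq="D")
weekly = np.tile([0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0], 20)
sleep_hours = pd.Series(7.0 + 0.005 * np.arange(140) + weekly
                        + rng.normal(0, 0.1, 140), index=days)

parts = decompose_additive(sleep_hours, period=7)
print(parts.dropna().head())
```

The first and last few rows have no trend value because the centered window is incomplete there.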
Source: Box, 2015.
measuring or documenting something about your self to gain meaning or make improvements
Related: Self-tracking, Biohacking, Data-driven life…
Source: https://github.com/markwk/qs_mind_map
How to understand human health across time
or an individual self over a lifetime?
to transform science and data into better self-understanding and empowered self-improvement
Intersection of data technologies AND human health and optimization
Time Series Data Analysis
with Python
REFERENCES and APPENDIX
Focus will be on univariate, linear, discrete time series
(instead of multivariate, nonlinear, or continuous)
and assume our data/process follows a stochastic model.
What, then, is time? If no one asks me, I know what it is. If I wish to explain it to him who asks, I do not know.
Saint Augustine (AD 354-430, The Confessions)
Fortunately, we don’t need to deal with these general time problems as such, because we only need to deal with the time challenges in our data!
Part of what happens in your data is because of the effects of time, time’s order, cycles, patterns, etc.
Examples of Stationary vs. Non-Stationary
(aka effects of the time index)
(or stationary time series or stationary process)
the process of transforming time series data (by decomposing and detrending it) so that a non-stationary series becomes stationary.
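One common way to do this is first-order differencing. The sketch below assumes a synthetic random walk with drift (not data from the talk) and uses a crude check, comparing the means of the two halves of the series, to show the effect. The helper `half_means` is hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 400
# Hypothetical non-stationary series: a random walk with upward drift.
increments = rng.normal(loc=0.5, scale=1.0, size=n)
walk = pd.Series(np.cumsum(increments))

# First-order differencing: y'_t = y_t - y_{t-1}
diffed = walk.diff().dropna()

def half_means(s: pd.Series) -> tuple:
    """Mean of the first and second half; a large gap hints at non-stationarity."""
    mid = len(s) // 2
    return s.iloc[:mid].mean(), s.iloc[mid:].mean()

print("raw halves:   ", half_means(walk))
print("diffed halves:", half_means(diffed))
```

The raw series has very different means in its two halves, while the differenced series does not. A formal test such as Dickey-Fuller is still needed before calling a series stationary.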
to Time Series Data
Source: Pal, 2017.
Our Focus: Within-Individual Variability
FUTURE: Does sleep correlate with or affect activity level? Or vice versa?
SEE: Previous Speech Python For Self-Trackers.
Source: Fitbit User 1
Source: Fitbit User 2
Exploratory Data Analysis
TS Data Processing
Why do we use data visualization for initial time series analysis?
for Detecting Temporal Effects
NOTE: The prefix “auto” refers to “self” (rather than automatic)
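To make the "self" in autocorrelation concrete, here is a small sketch that correlates a series with a lagged copy of itself. The helper `autocorrelation` and the synthetic step counts are assumptions for illustration; pandas offers the same calculation as `Series.autocorr`.

```python
import numpy as np
import pandas as pd

def autocorrelation(series: pd.Series, lag: int) -> float:
    """Pearson correlation between the series and itself shifted by `lag`."""
    return series.corr(series.shift(lag))

rng = np.random.default_rng(2)
# Hypothetical daily step counts with a weekly rhythm (quieter weekends).
weekly = np.tile([9000, 9000, 9000, 9000, 9000, 5000, 5000], 30)
steps = pd.Series(weekly + rng.normal(0, 500, weekly.size))

for lag in (1, 3, 7):
    print(lag, round(autocorrelation(steps, lag), 2))
```

A spike at lag 7 is the kind of temporal effect a weekly routine would leave in daily data.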
Data_Visualization_Health_and_Self_Time_Series.ipynb
For background, see Time_Series_Data_Visualization_with_Python.ipynb
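import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Perform the Dickey-Fuller test; `timeseries` is the pandas Series under test
print('Results of Dickey-Fuller Test:')
dftest = adfuller(timeseries, autolag='AIC')
# First four return values: statistic, p-value, lags used, observations used
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
# dftest[4] maps significance levels ('1%', '5%', '10%') to critical values
for key, value in dftest[4].items():
    dfoutput['Critical Value (%s)' % key] = value
print(dfoutput)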
Test Statistic 0.815369
p-value 0.991880
#Lags Used 13.000000
Number of Observations Used 130.000000
Critical Value (1%) -3.481682
Critical Value (5%) -2.884042
Critical Value (10%) -2.578770
What Does this Mean? This data is not stationary!
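How to read these numbers: the null hypothesis of the Dickey-Fuller test is that the series has a unit root (is non-stationary), and it is rejected only when the test statistic is more negative than the critical value. A small sketch of that rule, using values copied from two result tables in this deck; the helper `rejects_unit_root` is hypothetical.

```python
def rejects_unit_root(test_statistic: float, critical_values: dict,
                      level: str = "5%") -> bool:
    """True when the ADF statistic is below (more negative than) the critical value."""
    return test_statistic < critical_values[level]

# Values copied from the result tables in this deck.
first_result = {"stat": 0.815369,
                "crit": {"1%": -3.481682, "5%": -2.884042, "10%": -2.578770}}
fitbit_02 = {"stat": -4.625906,
             "crit": {"1%": -3.448801, "5%": -2.869670, "10%": -2.571101}}

print(rejects_unit_root(first_result["stat"], first_result["crit"]))        # False
print(rejects_unit_root(fitbit_02["stat"], fitbit_02["crit"], level="1%"))  # True
```

The first series fails to reject the unit root at any level, which matches its p-value of 0.99.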
Apple Watch 01 Sleep
fitbit_02 Sleep
Test Statistic -4.625906
p-value 0.000116
#Lags Used 6.000000
Number of Observations Used 357.000000
Critical Value (1%) -3.448801
Critical Value (5%) -2.869670
Critical Value (10%) -2.571101
fitbit_02 steps without outlier tweaking
Test Statistic -7.57
p-value 2.70e-… (exponent lost; below 0.01, since the statistic is under the 1% critical value)
#Lags Used 2.00
Number of Observations Used 3.61e+02
Critical Value (1%) -3.44
Critical Value (5%) -2.86
Critical Value (10%) -2.57
fitbit_02 steps with outlier tweaking
apple_watch_01:
Without Averaging Out Outliers: -1.23
With Averaging Out Outliers: -7.22
fitbit_01:
Without Averaging Out Outliers: -3.329097
With Averaging Out Outliers: -5.436158
fitbit_02:
Without Averaging Out Outliers: -4.62
With Averaging Out Outliers: -7.57
apple_watch_01:
Without Averaging Out Outliers: -1.557651e+01
With Averaging Out Outliers: -19.654423
fitbit_01:
Without Averaging Out Outliers: -5.407617
With Averaging Out Outliers: -4.959970
fitbit_02:
Without Averaging Out Outliers: -1.097
With Averaging Out Outliers: -4.925933
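The slides do not say how outliers were averaged out. One plausible approach, shown here purely as an assumption, is to flag values outside the Tukey fences (1.5 × IQR) and replace them with the mean of the remaining values. The function name and the synthetic step counts are illustrative.

```python
import numpy as np
import pandas as pd

def average_out_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Replace values outside the Tukey fences with the mean of the remaining values."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    is_outlier = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series.mask(is_outlier, series[~is_outlier].mean())

rng = np.random.default_rng(3)
steps = pd.Series(rng.normal(8000, 1000, 200))
# Hypothetical extremes: a marathon day and a day the tracker stayed at home.
steps.iloc[[20, 90]] = [45000, 0]

cleaned = average_out_outliers(steps)
print(steps.max(), cleaned.max())
print(steps.min(), cleaned.min())
```

Re-running the stationarity test on the cleaned series is what produces a "with averaging" versus "without averaging" comparison like the one above.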
Our health data is generally stationary.
for Dealing with Time Series Data
Tests_and_Techniques_Health_and_Self_Time_Series.ipynb
for Time Series Data
ARIMA = Auto-Regressive Integrated Moving Average
Standard notation: ARIMA(p, d, q)
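In this notation, p is the number of autoregressive lags, d is the number of times the series is differenced, and q is the number of moving-average terms. The NumPy-only sketch below illustrates p and d on simulated data: it builds an ARIMA(1, 1, 0) series, differences it once, and estimates the AR coefficient by least squares. The sample size and coefficient are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, phi = 5000, 0.6  # hypothetical sample size and AR(1) coefficient

# AR(1): x_t = phi * x_{t-1} + e_t   (p = 1, q = 0)
noise = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = phi * ar1[t - 1] + noise[t]

# Integrating once (cumulative sum) gives an ARIMA(1, 1, 0) series.
integrated = np.cumsum(ar1)

# d = 1: difference once to undo the integration and recover the AR(1) process.
recovered = np.diff(integrated, n=1)

# p = 1: estimate phi by least squares of x_t on x_{t-1}.
phi_hat = np.dot(recovered[1:], recovered[:-1]) / np.dot(recovered[:-1], recovered[:-1])
print(round(phi_hat, 2))
```

A library fit does the same job with maximum likelihood and also handles the q term, which this sketch leaves out.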
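import matplotlib.pyplot as plt
# Legacy import path (statsmodels < 0.13); newer releases use statsmodels.tsa.arima.model.ARIMA
from statsmodels.tsa.arima_model import ARIMA
# ts_log is presumably the log-transformed series and ts_log_diff its first difference
model = ARIMA(ts_log, order=(2, 1, 0))  # order=(p, d, q); set parameters here
results_AR = model.fit(disp=-1)  # disp=-1 silences optimizer output
plt.plot(ts_log_diff)
plt.plot(results_AR.fittedvalues, color='red')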
TS_Statistical_Modeling_Health_and_Self_Time_Series.ipynb
Ref: https://pypi.org/project/pmdarima/
import pmdarima as pm

# `train` is the training portion of the series
model = pm.auto_arima(train, start_p=1, start_q=1,
                      test='adf',        # use the ADF test to find the optimal 'd'
                      max_p=3, max_q=3,  # maximum p and q
                      m=1,               # frequency of series
                      d=None,            # let the model determine 'd'
                      seasonal=False,    # no seasonality
                      start_P=0,
                      D=0,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True)
print(model.summary())
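The `train` variable above implies a holdout split. A minimal sketch of one way to make it, assuming the last observations are held out in time order; the helper `chronological_split` and the placeholder `sleep` values are hypothetical.

```python
import numpy as np
import pandas as pd

def chronological_split(series: pd.Series, test_size: int):
    """Hold out the last `test_size` observations; a time series is never shuffled."""
    return series.iloc[:-test_size], series.iloc[-test_size:]

days = pd.date_range("2019-01-01", periods=120, freq="D")
sleep = pd.Series(np.linspace(6.5, 7.5, 120), index=days)  # placeholder values

train, test = chronological_split(sleep, test_size=30)
print(train.index.max(), test.index.min())
```

Keeping the split chronological means the model is only ever evaluated on dates later than anything it was fitted on.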
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
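The additive model behind this description can be written compactly. The symbols below follow the Prophet paper (Taylor & Letham, "Forecasting at Scale") rather than anything on the slide: g(t) is the trend, s(t) the periodic seasonality, h(t) the holiday effects, and ε_t the error term.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Because the components are simply summed, each one can be plotted and inspected on its own, which is what a components breakdown shows.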
Plotting Predictions
Components Breakdown
Applied to Health and Self
CODE: Tests_and_Techniques_Health_and_Self_Time_Series.ipynb
Can we model the data with ARIMA?
CODE: TS_Statistical_Modeling_Health_and_Self_Time_Series.ipynb
Can we model health data better with Prophet?
CODE: Health_TS_with_Prophet.ipynb
Why TS Matters, Next Steps and Future Research
www.markwk.com
datadrivenyou.com
“In God we trust, all others bring data.” (W. Edwards Deming)
Find me online at www.markwk.com!