Time Series Forecasting with Python and statsmodels

Learn how to build accurate time series forecasting models using Python's statsmodels library. Master ARIMA, SARIMA, and seasonal decomposition techniques.

Time series forecasting helps businesses predict future trends based on historical data. Sales projections, demand planning, and resource allocation all depend on accurate forecasts. Python’s statsmodels library provides tools for building these models, from basic ARIMA to seasonal SARIMA models.

This tutorial walks through the complete process of time series forecasting. You’ll learn how to test for stationarity, choose model parameters, and validate your predictions. Each step includes working code examples you can adapt for your own datasets.

Prerequisites

You need Python 3.8 or higher with these libraries installed:

pip install statsmodels pandas matplotlib numpy

Basic familiarity with pandas DataFrames helps. You should know how to load CSV files and access columns. Understanding standard deviation and mean makes interpreting results easier.

Download a sample dataset or prepare your own. The examples use monthly sales data, but any time-indexed dataset works. Your data needs a date column and at least one numeric variable to forecast.
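If you don't have a dataset handy, you can generate a synthetic monthly sales series with trend, seasonality, and noise. This is purely illustrative; the column names date and sales match the examples used throughout this tutorial:

```python
import numpy as np
import pandas as pd

# Synthetic monthly sales: trend + yearly seasonality + noise (illustrative only)
rng = np.random.default_rng(42)
dates = pd.date_range('2018-01-01', periods=72, freq='MS')  # 6 years, month-start

trend = np.linspace(100, 200, len(dates))
seasonal = 20 * np.sin(2 * np.pi * dates.month / 12)
noise = rng.normal(0, 5, len(dates))

df = pd.DataFrame({'sales': trend + seasonal + noise}, index=dates)
df.index.name = 'date'
df.to_csv('sales_data.csv')  # matches the filename used in the examples below
```
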

Step 1: Loading and Exploring Time Series Data

Start by loading your data into a pandas DataFrame. Set the date column as the index to enable time series operations:

import pandas as pd
import matplotlib.pyplot as plt

# Load data
df = pd.read_csv('sales_data.csv', parse_dates=['date'], index_col='date')

# Display basic information
print(df.head())
print(f"Dataset shape: {df.shape}")
print(f"Date range: {df.index.min()} to {df.index.max()}")

Visualize the time series to spot patterns:

plt.figure(figsize=(12, 6))
plt.plot(df.index, df['sales'], linewidth=1.5)
plt.title('Sales Over Time')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.grid(True, alpha=0.3)
plt.show()

Look for three key patterns in your plot. Trend shows overall direction (up, down, or flat). Seasonality reveals repeating patterns at regular intervals. Random fluctuations appear as noise around the main pattern.

Check for missing values and irregular spacing:

# Check for missing values
print(f"Missing values: {df['sales'].isna().sum()}")

# Verify regular frequency (index.freq is usually None after read_csv, so infer it)
print(f"Inferred frequency: {pd.infer_freq(df.index)}")

# Resample if needed (example for monthly data; pandas 2.2+ prefers 'ME' over 'M')
df_monthly = df.resample('M').sum()

Handle missing data before modeling. Forward fill works for short gaps. Linear interpolation suits longer gaps with smooth trends. Remove rows only if data is missing at random.
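As a quick sketch of those three options (the series and gap positions below are made up for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([100.0, 102.0, np.nan, 105.0, np.nan, np.nan, 111.0],
              index=pd.date_range('2023-01-01', periods=7, freq='MS'))

filled_ffill = s.ffill()         # carry the last value forward: good for short gaps
filled_interp = s.interpolate()  # linear interpolation: good for smooth trends
dropped = s.dropna()             # drop rows: only if values are missing at random

print(filled_interp.tolist())
```
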

Step 2: Stationarity Testing and Transformation

ARIMA models require stationary data. A stationary series has constant mean and variance over time. Most real-world data starts non-stationary.

Test for stationarity using the Augmented Dickey-Fuller test:

from statsmodels.tsa.stattools import adfuller

def check_stationarity(series, name='Series'):
    result = adfuller(series.dropna())
    
    print(f'\n{name} Stationarity Test:')
    print(f'ADF Statistic: {result[0]:.6f}')
    print(f'p-value: {result[1]:.6f}')
    
    if result[1] <= 0.05:
        print("Result: Stationary (reject null hypothesis)")
    else:
        print("Result: Non-stationary (fail to reject null hypothesis)")
    
    return result[1] <= 0.05

is_stationary = check_stationarity(df['sales'], 'Original Sales')

A p-value below 0.05 indicates stationarity. Values above 0.05 mean you need transformation. The null hypothesis states the series has a unit root (non-stationary).

Apply differencing to remove trends:

# First difference
df['sales_diff1'] = df['sales'].diff()

# Check again
check_stationarity(df['sales_diff1'], 'First Difference')

# Second difference if needed
df['sales_diff2'] = df['sales_diff1'].diff()
check_stationarity(df['sales_diff2'], 'Second Difference')

Plot the differenced series to confirm stationarity:

fig, axes = plt.subplots(3, 1, figsize=(12, 10))

df['sales'].plot(ax=axes[0], title='Original Series')
df['sales_diff1'].plot(ax=axes[1], title='First Difference')
df['sales_diff2'].plot(ax=axes[2], title='Second Difference')

plt.tight_layout()
plt.show()

Most series become stationary after one or two differences. Excessive differencing (more than 2) introduces noise and hurts forecast accuracy. Stop when the ADF test shows stationarity.

Log transformation helps stabilize variance before differencing (the series must be strictly positive):

import numpy as np

# Log transform for series with growing variance
df['sales_log'] = np.log(df['sales'])
df['sales_log_diff'] = df['sales_log'].diff()

check_stationarity(df['sales_log_diff'], 'Log + First Difference')

Step 3: Building an ARIMA Model

ARIMA stands for AutoRegressive Integrated Moving Average. Three parameters define the model: p (autoregressive order), d (differencing order), and q (moving average order).

Start by plotting ACF and PACF to identify parameters:

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# ACF plot helps determine q
plot_acf(df['sales_diff1'].dropna(), lags=20, ax=axes[0])
axes[0].set_title('Autocorrelation Function (ACF)')

# PACF plot helps determine p
plot_pacf(df['sales_diff1'].dropna(), lags=20, ax=axes[1])
axes[1].set_title('Partial Autocorrelation Function (PACF)')

plt.tight_layout()
plt.show()

In the ACF plot, count the significant lags (those extending beyond the shaded confidence bands) to suggest the MA order (q). In the PACF plot, count significant lags the same way to suggest the AR order (p). The d parameter equals the number of differences needed for stationarity.

Build the ARIMA model:

from statsmodels.tsa.arima.model import ARIMA

# Define model (example: ARIMA(1,1,1))
model = ARIMA(df['sales'], order=(1, 1, 1))

# Fit model
fitted_model = model.fit()

# Display results
print(fitted_model.summary())

The summary shows coefficient estimates, standard errors, and p-values. Coefficients with p-values below 0.05 are statistically significant. Check the AIC (Akaike Information Criterion) to compare models.

Generate forecasts:

# In-sample predictions (the first value is unreliable for differenced models)
predictions = fitted_model.predict(start=0, end=len(df)-1)

# Out-of-sample forecast
forecast_steps = 12  # Forecast 12 periods ahead
forecast = fitted_model.forecast(steps=forecast_steps)

print(f"\nForecast for next {forecast_steps} periods:")
print(forecast)

Visualize predictions against actual values:

plt.figure(figsize=(12, 6))
plt.plot(df.index, df['sales'], label='Actual', linewidth=2)
plt.plot(df.index, predictions, label='Fitted', linewidth=1.5, alpha=0.7)
plt.legend()
plt.title('ARIMA Model Fit')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.grid(True, alpha=0.3)
plt.show()

Try multiple parameter combinations to find the best fit:

# Grid search for best parameters
best_aic = float('inf')
best_order = None

for p in range(0, 3):
    for d in range(0, 2):
        for q in range(0, 3):
            try:
                model = ARIMA(df['sales'], order=(p, d, q))
                fitted = model.fit()
                
                if fitted.aic < best_aic:
                    best_aic = fitted.aic
                    best_order = (p, d, q)
                    
            except Exception:
                # Some (p, d, q) combinations fail to converge; skip them
                continue

print(f"Best ARIMA order: {best_order}")
print(f"Best AIC: {best_aic:.2f}")

Lower AIC indicates better model fit. Balance model complexity with performance. More parameters improve fit but increase overfitting risk.

Step 4: Seasonal Decomposition and SARIMA

Many time series have seasonal patterns that repeat at fixed intervals. Monthly data might show yearly cycles. Daily data could have weekly patterns. SARIMA extends ARIMA to handle seasonality.

Decompose the series to visualize components:

from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose series
decomposition = seasonal_decompose(df['sales'], model='additive', period=12)

# Plot components
fig, axes = plt.subplots(4, 1, figsize=(12, 10))

decomposition.observed.plot(ax=axes[0], title='Observed')
decomposition.trend.plot(ax=axes[1], title='Trend')
decomposition.seasonal.plot(ax=axes[2], title='Seasonal')
decomposition.resid.plot(ax=axes[3], title='Residual')

plt.tight_layout()
plt.show()

Choose 'additive' when seasonal variation stays roughly constant. Use 'multiplicative' when seasonal variation grows with the trend level. The period parameter matches your seasonality cycle (12 for monthly data with yearly patterns).

SARIMA adds four seasonal parameters: P, D, Q, and s. The s parameter defines seasonal period length:

from statsmodels.tsa.statespace.sarimax import SARIMAX

# SARIMA(p,d,q)(P,D,Q,s)
# Example: SARIMA(1,1,1)(1,1,1,12) for monthly data with yearly seasonality
model = SARIMAX(df['sales'], 
                order=(1, 1, 1),
                seasonal_order=(1, 1, 1, 12))

fitted_model = model.fit(disp=False)
print(fitted_model.summary())

Seasonal differencing removes periodic patterns:

# Seasonal difference (lag = seasonal period)
df['sales_seasonal_diff'] = df['sales'].diff(12)

# Check stationarity
check_stationarity(df['sales_seasonal_diff'], 'Seasonal Difference')

Generate forecasts with confidence intervals:

# Forecast with prediction intervals
forecast_object = fitted_model.get_forecast(steps=12)
forecast_mean = forecast_object.predicted_mean
forecast_ci = forecast_object.conf_int()

# Plot forecast
plt.figure(figsize=(12, 6))
plt.plot(df.index, df['sales'], label='Historical', linewidth=2)

# Create forecast index
last_date = df.index[-1]
forecast_index = pd.date_range(start=last_date + pd.DateOffset(months=1), 
                               periods=12, freq='M')

plt.plot(forecast_index, forecast_mean, label='Forecast', 
         linewidth=2, color='red')
plt.fill_between(forecast_index, 
                 forecast_ci.iloc[:, 0], 
                 forecast_ci.iloc[:, 1], 
                 alpha=0.2, color='red', label='95% Confidence Interval')

plt.legend()
plt.title('SARIMA Forecast')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.grid(True, alpha=0.3)
plt.show()

Auto ARIMA finds optimal parameters automatically:

# Requires a separate installation: pip install pmdarima
from pmdarima import auto_arima

auto_model = auto_arima(df['sales'], 
                        seasonal=True, 
                        m=12,
                        stepwise=True,
                        suppress_warnings=True,
                        trace=True)

print(auto_model.summary())

The algorithm tests multiple parameter combinations and selects the model with lowest AIC. This saves time but sometimes misses domain-specific insights.

Step 5: Evaluating and Improving Your Forecasts

Split your data into training and testing sets:

# Use 80% for training, 20% for testing
train_size = int(len(df) * 0.8)
train_data = df['sales'][:train_size]
test_data = df['sales'][train_size:]

print(f"Training samples: {len(train_data)}")
print(f"Testing samples: {len(test_data)}")

Train on historical data and forecast the test period:

# Fit model on training data
model = SARIMAX(train_data, 
                order=(1, 1, 1),
                seasonal_order=(1, 1, 1, 12))
fitted_model = model.fit(disp=False)

# Forecast test period
forecast = fitted_model.forecast(steps=len(test_data))

Calculate error metrics:

from sklearn.metrics import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(test_data, forecast)
rmse = np.sqrt(mean_squared_error(test_data, forecast))  # squared=False was removed in scikit-learn 1.6
mape = (abs((test_data - forecast) / test_data).mean()) * 100

print(f"Mean Absolute Error: {mae:.2f}")
print(f"Root Mean Squared Error: {rmse:.2f}")
print(f"Mean Absolute Percentage Error: {mape:.2f}%")

MAE shows average forecast error in original units. RMSE penalizes large errors more heavily. MAPE expresses error as a percentage, making it easier to interpret across different scales.

Check residual diagnostics:

residuals = fitted_model.resid

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Residuals plot
residuals.plot(ax=axes[0, 0], title='Residuals Over Time')
axes[0, 0].axhline(0, color='red', linestyle='--', linewidth=1)

# Histogram
residuals.hist(ax=axes[0, 1], bins=30)
axes[0, 1].set_title('Residuals Distribution')

# ACF of residuals
plot_acf(residuals, lags=20, ax=axes[1, 0])
axes[1, 0].set_title('Residuals ACF')

# Q-Q plot
from scipy import stats
stats.probplot(residuals, dist="norm", plot=axes[1, 1])
axes[1, 1].set_title('Q-Q Plot')

plt.tight_layout()
plt.show()

Good residuals show no pattern (white noise). They should cluster around zero with constant variance. The Q-Q plot should follow a straight line, indicating normal distribution.

Perform Ljung-Box test for residual autocorrelation:

from statsmodels.stats.diagnostic import acorr_ljungbox

lb_test = acorr_ljungbox(residuals, lags=10, return_df=True)
print(lb_test)

High p-values (above 0.05) indicate no significant autocorrelation. Low p-values suggest the model missed some patterns in the data.

Use walk-forward validation for reliable evaluation:

predictions = []
actuals = []

# Start with minimum training window
min_train_size = 50

for i in range(min_train_size, len(df)):
    train = df['sales'].iloc[:i]
    test = df['sales'].iloc[i]
    
    model = ARIMA(train, order=(1, 1, 1))
    fitted = model.fit()
    forecast = fitted.forecast(steps=1).iloc[0]  # .iloc avoids deprecated positional [0]
    
    predictions.append(forecast)
    actuals.append(test)

# Calculate metrics on walk-forward predictions
wf_mae = mean_absolute_error(actuals, predictions)
print(f"Walk-forward MAE: {wf_mae:.2f}")

This approach simulates real forecasting conditions where you update the model with each new observation.

Common Pitfalls

Overfitting with too many parameters hurts forecast performance. High-order models fit training data well but fail on new data. Start simple and add complexity only when needed. Compare training and testing errors to detect overfitting.

Ignoring domain knowledge leads to unrealistic forecasts. A retail model predicting negative sales makes no mathematical sense. Apply logical constraints based on your business context. Transform predictions back to original scale and validate ranges.
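For example, a non-negativity constraint can be applied as a simple post-processing step. A sketch with a made-up forecast series; the clip floor depends on your domain:

```python
import pandas as pd

# Hypothetical forecast containing an impossible negative value
forecast = pd.Series([120.5, 80.2, -15.3, 95.0])

# Sales cannot be negative, so clip at zero
constrained = forecast.clip(lower=0)
print(constrained.tolist())
```
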

Insufficient data prevents accurate seasonal modeling. You need at least two full cycles to model seasonality. Monthly data with yearly patterns requires 24+ months. Less data means simpler models or external data sources.

Changing patterns over time break model assumptions. Economic shifts, market disruptions, or policy changes alter historical patterns. Monitor forecast errors and retrain regularly. Consider models that adapt to regime changes.

Forgetting to integrate back after modeling manually differenced data yields forecasts on the wrong scale. When you pass d to ARIMA, statsmodels returns level forecasts automatically, but a model fit on a pre-differenced series predicts differences, not levels. Integrate (cumulative sum) those predictions to recover actual values:

# If you modeled differenced data manually ('fitted_diff' is a hypothetical
# results object from a model fit on df['sales'].diff())
differenced_forecast = fitted_diff.forecast(steps=5)
level_forecast = df['sales'].iloc[-1] + differenced_forecast.cumsum()

Missing confidence intervals understates forecast uncertainty. Point predictions alone mislead decision-makers. Always report prediction intervals. Wider intervals indicate higher uncertainty.

Not validating assumptions produces unreliable results. Check that residuals behave like white noise. Test for heteroscedasticity (changing variance) and autocorrelation. Failed diagnostics suggest model improvements needed.

Summary

Time series forecasting with statsmodels follows a structured workflow. Load and visualize your data to understand patterns. Test for stationarity and apply transformations as needed. Choose model parameters using ACF/PACF plots or automated search.

ARIMA handles non-seasonal data through autoregressive and moving average components. SARIMA extends this to capture seasonal patterns. Both models require stationary input and provide forecast confidence intervals.

Validate your model using train-test splits and residual diagnostics. Calculate error metrics to quantify performance. Good residuals show no patterns and follow normal distribution. Walk-forward validation tests real-world forecasting conditions.

Watch for common mistakes like overfitting and insufficient data. Apply domain knowledge to sanity-check predictions. Retrain models periodically as new data arrives. Time series forecasting improves with iteration and careful evaluation.

The statsmodels library provides production-ready tools for business forecasting. Master these techniques to generate reliable predictions that support better decisions. Your next step: apply these methods to your own data and compare results across different model specifications.
