stats_model module

Time Series Statistical Modeling Module.

This module implements various time series models for analyzing and forecasting financial and economic data, with a focus on ARIMA for conditional mean modeling and GARCH for volatility modeling. It supports both univariate and multivariate approaches.

Key Components:
  • ModelARIMA: ARIMA model for conditional mean forecasting
  • ModelGARCH: GARCH model for volatility forecasting
  • ModelMultivariateGARCH: Multivariate GARCH for correlation/covariance modeling
  • ModelFactory: Factory pattern for creating appropriate model instances

Key Functions:
  • run_arima: Convenience function for ARIMA modeling
  • run_garch: Convenience function for GARCH modeling
  • run_multivariate_garch: Function for multivariate GARCH analysis
  • calculate_correlation_matrix: Compute correlation matrices
  • calculate_portfolio_risk: Assess risk based on volatility and correlations

Supported Models:
  • ARIMA(p,d,q): For modeling conditional means
  • GARCH(p,q): For modeling conditional volatility
  • CCC-GARCH: Constant Conditional Correlation
  • DCC-GARCH: Dynamic Conditional Correlation with EWMA

Typical Usage Flow:
  1. Start with prepared data from data_processor.py
  2. Fit ARIMA models to capture conditional mean
  3. Extract residuals and fit GARCH models for volatility
  4. For multiple series, analyze correlations with multivariate GARCH
  5. Generate forecasts and risk metrics

The models in this module follow standard econometric practice and use the statsmodels and arch packages for the underlying implementations.

class timeseries_compute.stats_model.ModelARIMA(data: DataFrame, order: Tuple[int, int, int] = (1, 1, 1), steps: int = 5)[source]

Bases: object

Applies the ARIMA (AutoRegressive Integrated Moving Average) model on all columns of a DataFrame.

data

The input data on which ARIMA models will be applied.

Type:

pd.DataFrame

order

The (p, d, q) order of the ARIMA model.

Type:

Tuple[int, int, int]

steps

The number of steps to forecast.

Type:

int

models

A dictionary to store ARIMA models for each column.

Type:

Dict[str, ARIMA]

fits

A dictionary to store fitted ARIMA models for each column.

Type:

Dict[str, ARIMA]

fit() Dict[str, Any][source]

Fits an ARIMA model to each column in the dataset.

Returns:

A dictionary where the keys are column names and the values are the fitted ARIMA models for each column.

Return type:

Dict[str, Any]

forecast() Dict[str, float | list][source]

Generates forecasts for each fitted model.

Returns:

A dictionary where the keys are the column names and the values are the forecasted values. If steps=1, returns a float. If steps>1, returns a list.

Return type:

Dict[str, Union[float, list]]

summary() Dict[str, str][source]

Returns the model summaries for all columns.

Returns:

A dictionary containing the model summaries for each column.

Return type:

Dict[str, str]

class timeseries_compute.stats_model.ModelFactory[source]

Bases: object

Factory class for creating instances of different statistical models.

create_model(model_type: str, **kwargs) -> Any: Static method that creates and returns an instance of a model based on the provided model_type.

static create_model(model_type: str, data: DataFrame, order: Tuple[int, int, int] = (1, 1, 1), steps: int = 5, p: int = 1, q: int = 1, dist: str = 'normal', mv_model_type: str = 'cc') ModelARIMA | ModelGARCH | ModelMultivariateGARCH[source]

Creates and returns an instance of a statistical model based on the specified type.

Parameters:
  • model_type (str) – Type of model to create (“ARIMA”, “GARCH”, or “MVGARCH”)

  • data (pd.DataFrame) – Input data for the model

  • order (Tuple[int, int, int]) – (p,d,q) order for ARIMA models

  • steps (int) – Forecast horizon for ARIMA models

  • p (int) – GARCH order parameter

  • q (int) – ARCH order parameter

  • dist (str) – Error distribution for GARCH models

  • mv_model_type (str) – Type of multivariate GARCH model (“cc” or “dcc”)

Returns:

The created model instance

Return type:

Union[ModelARIMA, ModelGARCH, ModelMultivariateGARCH]

Raises:

ValueError – If an unsupported model type is provided
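
The dispatch behaviour described above (a type string selects a model class, and an unknown string raises ValueError) can be illustrated with a minimal sketch. This is not the module's implementation: `StubARIMA`, `StubGARCH`, and `create_model_sketch` are hypothetical stand-ins so that the example runs without the package installed, and exact-match lookup of the type string is an assumption.

```python
class StubARIMA:
    """Hypothetical stand-in for ModelARIMA (stores its arguments only)."""
    def __init__(self, data, order=(1, 1, 1), steps=5):
        self.data, self.order, self.steps = data, order, steps


class StubGARCH:
    """Hypothetical stand-in for ModelGARCH (stores its arguments only)."""
    def __init__(self, data, p=1, q=1, dist="normal"):
        self.data, self.p, self.q, self.dist = data, p, q, dist


def create_model_sketch(model_type, data, **kwargs):
    """Return a model instance chosen by its type string."""
    registry = {"ARIMA": StubARIMA, "GARCH": StubGARCH}
    if model_type not in registry:
        # Mirrors the documented behaviour for unsupported types.
        raise ValueError(f"Unsupported model type: {model_type}")
    return registry[model_type](data, **kwargs)


# Keyword arguments are forwarded to the selected class.
model = create_model_sketch("GARCH", data=[0.01, -0.02], p=1, q=1)
```

A lookup table keeps the mapping from type string to class in one place, so supporting another model type means adding one entry.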

class timeseries_compute.stats_model.ModelGARCH(data: DataFrame, p: int = 1, q: int = 1, dist: str = 'normal')[source]

Bases: object

Represents a GARCH model for time series data.

data

The input time series data.

Type:

pd.DataFrame

p

The order of the GARCH model for the lag of the squared residuals.

Type:

int

q

The order of the GARCH model for the lag of the conditional variance.

Type:

int

dist

The distribution to use for the GARCH model (e.g., ‘normal’, ‘t’).

Type:

str

models

A dictionary to store models for each column of the data.

Type:

Dict[str, arch_model]

fits

A dictionary to store fitted models for each column of the data.

Type:

Dict[str, arch_model]

fit() Dict[str, Any][source]

Fits a GARCH model to each column of the data.

Returns:

A dictionary where the keys are column names and the values are the fitted GARCH models.

Return type:

Dict[str, Any]

forecast(steps: int) Dict[str, float][source]

Generates forecasted variance for each fitted model.

Parameters:

steps (int) – The number of steps ahead to forecast.

Returns:

A dictionary where keys are column names and values are the forecasted variances for the specified horizon.

Return type:

Dict[str, float]
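
For intuition about what a multi-step variance forecast involves, here is a minimal sketch of the textbook GARCH(1,1) forecast recursion. It is not this module's code, which delegates fitting and forecasting to the arch package; the function name `garch11_variance_path` and all parameter values are hypothetical and chosen only for illustration.

```python
def garch11_variance_path(omega, alpha, beta, last_resid, last_var, steps):
    """Textbook GARCH(1,1) variance forecasts for horizons 1..steps."""
    # The one-step forecast uses the last observed residual and variance.
    forecasts = [omega + alpha * last_resid ** 2 + beta * last_var]
    # Further ahead, the expected squared residual equals the forecast
    # variance, so the recursion collapses to omega + (alpha + beta) * previous.
    for _ in range(steps - 1):
        forecasts.append(omega + (alpha + beta) * forecasts[-1])
    return forecasts


# Hypothetical parameters, not estimates from any data set.
path = garch11_variance_path(
    omega=0.1, alpha=0.1, beta=0.8, last_resid=2.0, last_var=1.0, steps=3
)
```

With alpha + beta below 1, the forecasts decay toward the long-run variance omega / (1 - alpha - beta), which is 1.0 for these values.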

summary() Dict[str, str][source]

Returns the model summaries for all columns.

Returns:

A dictionary containing the model summaries for each column.

Return type:

Dict[str, str]

class timeseries_compute.stats_model.ModelMultivariateGARCH(data: DataFrame, p: int = 1, q: int = 1, model_type: str = 'cc')[source]

Bases: object

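Implements multivariate GARCH models including CCC-GARCH (Constant Conditional Correlation, model_type 'cc') and DCC-GARCH (Dynamic Conditional Correlation, model_type 'dcc').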

fit_cc_garch() Dict[str, Any][source]

Fit Constant Conditional Correlation GARCH model.

fit_dcc_garch(lambda_val: float = 0.95)[source]

Fit Dynamic Conditional Correlation GARCH model using EWMA for correlation.

Parameters:

lambda_val – EWMA decay factor

Returns:

Dictionary with DCC-GARCH results
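
The role of the EWMA decay factor can be sketched with a RiskMetrics-style recursion, in which each new estimate is lambda times the previous estimate plus (1 - lambda) times the newest squared observation or cross product. This is an assumed form, not necessarily the one used inside fit_dcc_garch; `ewma_moments`, the seeding with the first observation, and the simulated inputs are all hypothetical.

```python
import numpy as np
import pandas as pd


def ewma_moments(x, y, lambda_val=0.95):
    """EWMA variances and covariance of two (approximately) zero-mean series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    var_x, var_y, cov = np.empty(n), np.empty(n), np.empty(n)
    # Seed the recursion with the first observation (an assumption).
    var_x[0], var_y[0], cov[0] = x[0] ** 2, y[0] ** 2, x[0] * y[0]
    for t in range(1, n):
        var_x[t] = lambda_val * var_x[t - 1] + (1 - lambda_val) * x[t] ** 2
        var_y[t] = lambda_val * var_y[t - 1] + (1 - lambda_val) * y[t] ** 2
        cov[t] = lambda_val * cov[t - 1] + (1 - lambda_val) * x[t] * y[t]
    return pd.DataFrame({"var_x": var_x, "var_y": var_y, "cov": cov})


# Simulated standardized residuals, for illustration only.
rng = np.random.default_rng(0)
moments = ewma_moments(rng.standard_normal(250), rng.standard_normal(250))
```

A lambda_val closer to 1 gives smoother estimates that react more slowly to new observations.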

timeseries_compute.stats_model.calculate_correlation_matrix(standardized_residuals: DataFrame) DataFrame[source]

Calculate constant conditional correlation matrix from standardized residuals.

Parameters:

standardized_residuals (pd.DataFrame) – DataFrame of standardized residuals from GARCH models

Returns:

Correlation matrix

Return type:

pd.DataFrame
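
As a rough illustration, and assuming the constant correlation is the sample (Pearson) correlation of the standardized residuals, the computation can be sketched with pandas alone. The residuals and conditional volatilities below are simulated placeholders, not output of this module.

```python
import numpy as np
import pandas as pd

# Hypothetical residuals and conditional volatilities for two assets.
rng = np.random.default_rng(0)
residuals = pd.DataFrame(
    rng.normal(0.0, 0.02, size=(500, 2)), columns=["Asset1", "Asset2"]
)
cond_vol = pd.DataFrame(0.02, index=residuals.index, columns=residuals.columns)

# Standardize each residual by its conditional volatility, then correlate.
std_resid = residuals / cond_vol
corr_matrix = std_resid.corr()
```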

timeseries_compute.stats_model.calculate_dynamic_correlation(ewma_cov: Series, ewma_vol1: Series, ewma_vol2: Series) Series[source]

Calculate dynamic conditional correlation from EWMA covariance and volatilities.

Parameters:
  • ewma_cov (pd.Series) – EWMA covariance between two series

  • ewma_vol1 (pd.Series) – EWMA volatility of first series

  • ewma_vol2 (pd.Series) – EWMA volatility of second series

Returns:

Dynamic conditional correlation

Return type:

pd.Series
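
A correlation series of this kind is conventionally the covariance divided by the product of the two volatilities. The sketch below assumes that definition; `dynamic_correlation_sketch` is a hypothetical name, and the clipping step is an added safeguard rather than documented behaviour.

```python
import pandas as pd


def dynamic_correlation_sketch(ewma_cov, ewma_vol1, ewma_vol2):
    """Correlation series: covariance over the product of volatilities."""
    rho = ewma_cov / (ewma_vol1 * ewma_vol2)
    # Clipping guards against tiny floating-point overshoots beyond [-1, 1].
    return rho.clip(lower=-1.0, upper=1.0)


# Hypothetical inputs for two time steps.
cov = pd.Series([0.0002, 0.0003])
vol1 = pd.Series([0.02, 0.02])
vol2 = pd.Series([0.02, 0.03])
rho = dynamic_correlation_sketch(cov, vol1, vol2)
```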

timeseries_compute.stats_model.calculate_portfolio_risk(weights: ndarray, cov_matrix: ndarray) tuple[source]

Calculate portfolio variance and volatility for given weights and covariance matrix.

Parameters:
  • weights (np.ndarray) – Array of portfolio weights

  • cov_matrix (np.ndarray) – Covariance matrix

Returns:

(portfolio_variance, portfolio_volatility)

Return type:

tuple
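
The standard calculation behind these two numbers is the quadratic form w' Σ w for the variance and its square root for the volatility. The sketch below assumes that formula; `portfolio_risk_sketch` and the input values are hypothetical.

```python
import numpy as np


def portfolio_risk_sketch(weights, cov_matrix):
    """Return (variance, volatility) of a portfolio with the given weights."""
    weights = np.asarray(weights, dtype=float)
    cov_matrix = np.asarray(cov_matrix, dtype=float)
    variance = float(weights @ cov_matrix @ weights)  # w' Σ w
    return variance, float(np.sqrt(variance))


# Hypothetical two-asset portfolio.
weights = np.array([0.6, 0.4])
cov_matrix = np.array([[0.04, 0.006],
                       [0.006, 0.09]])
variance, volatility = portfolio_risk_sketch(weights, cov_matrix)
```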

timeseries_compute.stats_model.calculate_stats(series: Series, annualization_factor: int = 250) Dict[str, float][source]

Calculate comprehensive descriptive statistics for a time series.

This function computes statistical measures commonly used in financial time series analysis: central tendency, dispersion, distribution shape, and annualized return and volatility metrics.

Parameters:
  • series (pd.Series) – Time series data to analyze. Should contain numeric values.

  • annualization_factor (int, optional) – Factor used to annualize volatility. Defaults to 250. Common values:
      - 250: For daily financial data (trading days per year)
      - 252: Alternative daily factor accounting for holidays
      - 52: For weekly data
      - 12: For monthly data
      - 4: For quarterly data
      - 1: For annual data or no annualization

Returns:

Dictionary containing comprehensive statistics:
  • 'n': Number of observations in the series

  • 'mean': Arithmetic mean of the series

  • 'median': Median value (50th percentile)

  • 'min': Minimum value in the series

  • 'max': Maximum value in the series

  • 'std': Standard deviation (sample standard deviation)

  • 'var': Variance (sample variance)

  • 'skew': Skewness - measure of asymmetry (0 = symmetric)

  • 'kurt': Excess kurtosis - measure of tail heaviness (0 = normal)

  • 'annualized_vol': Annualized volatility (std * sqrt(annualization_factor))

  • 'annualized_return': Annualized return (mean * annualization_factor)

  • 'sharpe_approx': Approximate Sharpe ratio (annualized_return / annualized_vol)

Return type:

Dict[str, float]

Raises:
  • ValueError – If the series is empty or contains no numeric data

  • TypeError – If the series contains non-numeric data that cannot be converted

Example

>>> import pandas as pd
>>> import numpy as np
>>> from timeseries_compute.stats_model import calculate_stats
>>>
>>> # Daily stock returns (simulated)
>>> returns = pd.Series(np.random.normal(0.001, 0.02, 252))
>>> stats = calculate_stats(returns, annualization_factor=252)
>>> print(f"Annualized Return: {stats['annualized_return']:.2%}")
>>> print(f"Annualized Volatility: {stats['annualized_vol']:.2%}")
>>> print(f"Sharpe Ratio: {stats['sharpe_approx']:.2f}")
>>>
>>> # Monthly data
>>> monthly_data = pd.Series([0.02, -0.01, 0.03, 0.01, -0.02])
>>> monthly_stats = calculate_stats(monthly_data, annualization_factor=12)

Note

  • Skewness interpretation: >0 (right tail), <0 (left tail), =0 (symmetric)

  • Kurtosis interpretation: >0 (heavy tails), <0 (light tails), =0 (normal)

  • Sharpe ratio calculation assumes zero risk-free rate for simplicity

  • For non-return data, annualized metrics may not be meaningful
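
The three annualized metrics follow directly from the formulas listed under Returns. The sketch below reproduces only those formulas with pandas and is not the function itself; `annualized_metrics_sketch` is a hypothetical name, and the use of the pandas default sample standard deviation (ddof=1) is an assumption consistent with the 'std' entry above.

```python
import numpy as np
import pandas as pd


def annualized_metrics_sketch(series, annualization_factor=250):
    """Annualized volatility, return, and zero-risk-free-rate Sharpe ratio."""
    annualized_vol = series.std() * np.sqrt(annualization_factor)  # ddof=1
    annualized_return = series.mean() * annualization_factor
    return {
        "annualized_vol": annualized_vol,
        "annualized_return": annualized_return,
        "sharpe_approx": annualized_return / annualized_vol,
    }


# Same monthly figures as in the example above.
monthly_data = pd.Series([0.02, -0.01, 0.03, 0.01, -0.02])
metrics = annualized_metrics_sketch(monthly_data, annualization_factor=12)
```

Volatility scales with the square root of the factor while the mean scales linearly, which is why the Sharpe approximation changes with the sampling frequency.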

timeseries_compute.stats_model.construct_covariance_matrix(volatilities: list, correlation: float) ndarray[source]

Construct a 2x2 covariance matrix using volatilities and correlation.

Parameters:
  • volatilities (list) – List of volatilities [vol1, vol2]

  • correlation (float) – Correlation coefficient

Returns:

2x2 covariance matrix

Return type:

np.ndarray
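
For two assets, the covariance matrix has the variances on the diagonal and correlation * vol1 * vol2 off the diagonal. The following sketch assumes that standard construction; `covariance_2x2_sketch` and the input values are hypothetical.

```python
import numpy as np


def covariance_2x2_sketch(volatilities, correlation):
    """Build a 2x2 covariance matrix from two volatilities and a correlation."""
    vol1, vol2 = volatilities
    off_diag = correlation * vol1 * vol2
    return np.array([[vol1 ** 2, off_diag],
                     [off_diag, vol2 ** 2]])


# Hypothetical volatilities of 20% and 30% with correlation 0.5.
cov = covariance_2x2_sketch([0.2, 0.3], 0.5)
```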

timeseries_compute.stats_model.fit_arima_model(returns_series: Series, order: Tuple[int, int, int] = (1, 0, 0)) Any[source]

Fit ARIMA model to a single returns series.

Parameters:
  • returns_series – Time series of returns for a single asset

  • order – ARIMA order (p, d, q)

Returns:

Fitted ARIMA model

timeseries_compute.stats_model.fit_dcc_garch_model(returns_df: DataFrame, garch_order: Tuple[int, int] = (1, 1)) Any[source]

Fit a DCC-GARCH model to multivariate returns data.

Parameters:
  • returns_df – DataFrame of returns for multiple assets

  • garch_order – GARCH order (p, q)

Returns:

Dictionary containing standardized residuals and correlation matrix

timeseries_compute.stats_model.fit_garch_model(returns_series: Series, p: int = 1, q: int = 1) Any[source]

Fit GARCH model to a single returns series.

Parameters:
  • returns_series – Time series of returns for a single asset

  • p – Order of the GARCH terms

  • q – Order of the ARCH terms

Returns:

Fitted GARCH model

timeseries_compute.stats_model.run_arima(df_stationary: DataFrame, p: int = 1, d: int = 1, q: int = 1, forecast_steps: int = 5) Tuple[Dict[str, object], Dict[str, float | List[float]]][source]

Runs an ARIMA model on stationary time series data.

This function fits ARIMA(p,d,q) models to each column in the provided DataFrame and generates forecasts for the specified number of steps ahead. It performs minimal logging to display only core information about the model and forecasts.

Parameters:
  • df_stationary (pd.DataFrame) – The DataFrame with stationary time series data

  • p (int) – Autoregressive lag order, default=1

  • d (int) – Degree of differencing, default=1

  • q (int) – Moving average lag order, default=1

  • forecast_steps (int) – Number of steps to forecast, default=5

Returns:

  • First element: Dictionary of fitted ARIMA models for each column

  • Second element: Dictionary of forecasted values for each column

Return type:

Tuple[Dict[str, object], Dict[str, Union[float, List[float]]]]

timeseries_compute.stats_model.run_garch(df_stationary: DataFrame, p: int = 1, q: int = 1, dist: str = 'normal', forecast_steps: int = 5) Tuple[Dict[str, Any], Dict[str, float]][source]

Runs the GARCH model on the provided stationary DataFrame.

This function fits GARCH(p,q) models to each column in the provided DataFrame and generates volatility forecasts. It performs minimal logging to display only core information about the model and forecasts.

Parameters:
  • df_stationary (pd.DataFrame) – The stationary time series data for GARCH modeling

  • p (int) – The GARCH lag order, default=1

  • q (int) – The ARCH lag order, default=1

  • dist (str) – The error distribution (‘normal’, ‘t’, etc.), default=‘normal’

  • forecast_steps (int) – The number of steps to forecast, default=5

Returns:

  • First element: Dictionary of fitted GARCH models for each column

  • Second element: Dictionary of forecasted volatility values for each column

Return type:

Tuple[Dict[str, Any], Dict[str, float]]

timeseries_compute.stats_model.run_multivariate_garch(df_stationary: DataFrame, arima_fits: Dict[str, Any] | None = None, garch_fits: Dict[str, Any] | None = None, lambda_val: float = 0.95) Dict[str, Any][source]

Runs multivariate GARCH analysis on the provided stationary DataFrame.

This function implements both Constant Conditional Correlation (CCC) and Dynamic Conditional Correlation (DCC) GARCH models. It uses the supplied ARIMA and GARCH fits when they are provided and fits new models otherwise.

Parameters:
  • df_stationary (pd.DataFrame) – The stationary time series data for GARCH modeling

  • arima_fits (dict, optional) – Dictionary of fitted ARIMA models for each column

  • garch_fits (dict, optional) – Dictionary of fitted GARCH models for each column

  • lambda_val (float, optional) – EWMA decay factor for DCC model. Defaults to 0.95.

Returns:

Dictionary containing multivariate GARCH results
  • 'arima_residuals': DataFrame of ARIMA residuals

  • 'conditional_volatilities': DataFrame of conditional volatilities

  • 'standardized_residuals': DataFrame of standardized residuals

  • 'cc_correlation': Constant conditional correlation matrix

  • 'cc_covariance_matrix': Covariance matrix using CCC

  • 'dcc_correlation': Series of dynamic conditional correlations

  • 'dcc_covariance': Series of dynamic conditional covariances

Return type:

dict

Example

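>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from timeseries_compute.stats_model import run_multivariate_garch
>>>
>>> # Create stationary returns for two assets
>>> # (five rows only show the input shape; real fits need far more observations)
>>> returns = pd.DataFrame({
...     'Asset1': [0.01, -0.02, 0.015, -0.01, 0.02],
...     'Asset2': [0.015, -0.01, 0.02, -0.015, 0.01]
... })
>>> # Run multivariate GARCH analysis
>>> results = run_multivariate_garch(returns)
>>> # Access the correlation matrix
>>> print(results['cc_correlation'])
>>> # Plot dynamic correlation over time
>>> plt.plot(results['dcc_correlation'])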