data_generator module

Time Series Data Generation Module.

This module provides functionality for generating synthetic price series data with controlled statistical properties. It’s designed as the first step in a typical time series analysis pipeline, creating test data with known characteristics.

Key Components:
  • PriceSeriesGenerator: Class for generating correlated price series
  • generate_price_series: Convenience function with simplified interface
  • set_random_seed: Function to ensure reproducible results

Typical Usage Flow:
  1. Create a PriceSeriesGenerator instance with desired date range
  2. Generate price series with specific initial values and correlations
  3. Proceed with the generated data to data_processor.py for preparation

The generated price series follow a random walk with drift, with options to control cross-series correlations.
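As a rough illustration of what a random walk with drift and a controlled cross-series correlation looks like, the sketch below simulates two series using only the standard library. It is not the module's implementation: the function name `correlated_random_walks`, the additive step, and the `drift` and `volatility` parameters are assumptions made for this example.

```python
import math
import random


def correlated_random_walks(start_a, start_b, n_steps, rho,
                            drift=0.05, volatility=1.0, seed=None):
    """Simulate two random walks with drift whose steps have correlation rho.

    Hypothetical helper for illustration only; not part of timeseries_compute.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")

    rng = random.Random(seed)  # local generator, leaves global state untouched
    series_a, series_b = [start_a], [start_b]

    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mix two independent standard normal shocks so that
        # corr(shock_a, shock_b) equals rho.
        shock_a = z1
        shock_b = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        # Random walk with drift: previous value + drift + random shock.
        series_a.append(series_a[-1] + drift + volatility * shock_a)
        series_b.append(series_b[-1] + drift + volatility * shock_b)

    return series_a, series_b
```

With `rho=1.0` both series move in lockstep, and with `rho=0.0` their steps are independent. Because the step is additive, nothing in this sketch prevents a series from going negative.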

class timeseries_compute.data_generator.PriceSeriesGenerator(start_date: str, end_date: str)

Bases: object

Generates price series for the given tickers over a specified date range.

start_date

The start date of the price series in YYYY-MM-DD format.

Type:

str

end_date

The end date of the price series in YYYY-MM-DD format.

Type:

str

dates

A range of dates from start_date to end_date, including only weekdays.

Type:

pd.DatetimeIndex
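To make the "weekdays only" rule concrete, here is a minimal standard-library sketch of the filtering. The class itself stores a pd.DatetimeIndex; the helper name `weekdays_between` and the use of plain `datetime.date` objects are assumptions for illustration, and the sketch does not account for market holidays.

```python
from datetime import date, timedelta
from typing import List


def weekdays_between(start_date: str, end_date: str) -> List[date]:
    """Return every Monday-to-Friday date from start_date to end_date, inclusive.

    Hypothetical helper; both arguments use the YYYY-MM-DD format.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)

    days = []
    current = start
    while current <= end:
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            days.append(current)
        current += timedelta(days=1)
    return days


january = weekdays_between("2023-01-01", "2023-01-31")
```

For January 2023 this yields 22 dates, starting on Monday 2023-01-02 because the 1st falls on a Sunday.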

Methods:
  • __init__(start_date: str, end_date: str) – Initializes the PriceSeriesGenerator with the given date range.

  • generate_correlated_prices(anchor_prices: dict, correlation_matrix: Optional[Dict[Tuple[str, str], float]] = None) -> Dict[str, list] – Generates a series of correlated prices for the given tickers with initial prices.

generate_correlated_prices(anchor_prices: Dict[str, float], correlation_matrix: Dict[Tuple[str, str], float] | None = None) -> Dict[str, list]

Create price series for given tickers with initial prices and correlations.

Parameters:
  • anchor_prices (Dict[str, float]) – Dictionary where keys are ticker symbols (e.g., 'AAPL', 'MSFT') and values are their respective initial prices.

  • correlation_matrix (Dict[Tuple[str, str], float], optional) – Dictionary specifying correlations between ticker pairs. Each key should be a tuple of two ticker symbols (e.g., ('AAPL', 'MSFT')), and each value should be the desired correlation coefficient between -1.0 and 1.0. For example: {('AAPL', 'MSFT'): 0.7, ('AAPL', 'GOOG'): 0.5, ('MSFT', 'GOOG'): 0.6}. If None, a default correlation of 0.6 is used for all pairs.

Returns:

Dictionary where keys are ticker symbols and values are lists containing the generated price series for each ticker.

Return type:

Dict[str, list]

Example

>>> generator = PriceSeriesGenerator(start_date="2023-01-01", end_date="2023-01-31")
>>> anchor_prices = {"AAA": 150.0, "BBB": 250.0}
>>> correlations = {("AAA", "BBB"): 0.7}
>>> prices = generator.generate_correlated_prices(anchor_prices, correlations)
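The pair-keyed correlation dictionary and its 0.6 default can be pictured with the sketch below, which expands a possibly incomplete dictionary into one entry per ticker pair. This is an assumption-laden illustration, not the library's code: the helper name `resolve_pair_correlations`, the acceptance of reversed key order, and the fallback to 0.6 for individual missing pairs (rather than only when the argument is None) are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

DEFAULT_CORRELATION = 0.6  # default stated in the parameter description


def resolve_pair_correlations(
    tickers: List[str],
    correlation_matrix: Optional[Dict[Tuple[str, str], float]] = None,
) -> Dict[Tuple[str, str], float]:
    """Return one correlation per unordered ticker pair (hypothetical helper)."""
    supplied = correlation_matrix or {}
    resolved = {}
    for first, second in combinations(tickers, 2):
        if (first, second) in supplied:
            value = supplied[(first, second)]
        elif (second, first) in supplied:  # assumed: key order does not matter
            value = supplied[(second, first)]
        else:
            value = DEFAULT_CORRELATION  # assumed fallback for missing pairs
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"correlation for {(first, second)} must be in [-1, 1]")
        resolved[(first, second)] = value
    return resolved


pairs = resolve_pair_correlations(["AAA", "BBB", "CCC"], {("AAA", "BBB"): 0.7})
```

Here `pairs` keeps the supplied 0.7 for ("AAA", "BBB") and fills the two remaining pairs with the default.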

timeseries_compute.data_generator.generate_price_series(start_date: str = '2023-01-01', end_date: str = '2023-12-31', anchor_prices: Dict[str, float] | None = None, random_seed: int | None = None, correlations: Dict[Tuple[str, str], float] | None = None) -> Tuple[Dict[str, list], DataFrame]

Generates a series of price data based on the provided parameters.

Returns the prices both as a dictionary and as a DataFrame, so callers can use whichever form suits them.

Parameters:
  • start_date (str, optional) – The start date for the price series. Defaults to "2023-01-01".

  • end_date (str, optional) – The end date for the price series. Defaults to "2023-12-31".

  • anchor_prices (Dict[str, float], optional) – A dictionary of tickers and their initial prices. Defaults to {"GME": 100.0, "BYND": 200.0} if None.

  • random_seed (int, optional) – Seed for random number generation. If provided, overrides the module-level seed.

  • correlations (Dict[Tuple[str, str], float], optional) – Dictionary specifying correlations between ticker pairs.

Returns:

A tuple of the generated prices as a dictionary and as a DataFrame.

Return type:

Tuple[Dict[str, list], pd.DataFrame]

timeseries_compute.data_generator.set_random_seed(seed: int = 2025) -> None

Sets the random seed for the random module.

Parameters:

seed (int) – Seed value for random number generator.
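The effect of seeding can be seen in the short sketch below, which assumes that "the random module" means Python's standard-library `random`. The helper `draw_shocks` is hypothetical and only stands in for whatever random draws the generator makes internally.

```python
import random


def draw_shocks(n: int, seed: int) -> list:
    """Seed the global generator, then draw n standard normal values."""
    random.seed(seed)  # same seed, same sequence of draws
    return [random.gauss(0.0, 1.0) for _ in range(n)]


first = draw_shocks(5, seed=2025)
second = draw_shocks(5, seed=2025)
other = draw_shocks(5, seed=7)

# Identical seeds reproduce the draws exactly; a different seed does not.
print(first == second)
print(first == other)
```

Seeding once before generating data is what makes two runs of the same script produce the same synthetic series.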