Skip to content

Replay

Module for replaying historical data.

DataSink

Bases: Protocol[T]

Interface for saving the results of a replay to a file or database.

append(timestamp, data)

Append data for the current cycle.

Parameters

timestamp: End of the time interval for which data was replayed (inclusive) data: The generated data

close()

Flush the data and clean up resources.

DataSinkProvider

Bases: Protocol[T]

Interface for the provision of DataSink.

__call__(replay_context) abstractmethod

Create a DataSink for the given replay_context.

Parameters

replay_context: Information about the replay that's about to run

Returns

DataSink[T]: Sink for the replay

DataSource

Bases: Protocol[T]

Interface for replaying historical data from a file or database.

get_next()

Return the next timestamp for which there is data.

If no data is available this should return UTC_MAX

Returns

timestamp: pd.Timestamp Timestamp of the next available data point (or UTC_MAX if no more data is available)

read_to(timestamp)

Read from the data source, all the way to the provided timestamp (inclusive).

This function is stateful and must remember the previous timestamp for which data was read.

Parameters

timestamp End of the time interval for which data is required (inclusive)

Returns

data The data for the interval (or empty if no data is found)

DataSourceProvider

Bases: Protocol[T]

Interface for the provision of DataSource.

__call__(replay_context)

Create a DataSource for the given replay_context.

Parameters

replay_context: Information about the replay that's about to run

Returns

DataSource[T]: Source for the replay

IteratorDataSourceAdapter

Bases: DataSource[T]

Adapter between an iterator of DataSource and a DataSource.

This can be used to stitch together various DataSource for incremental date range

ReplayContext dataclass

Stores the information about a replay.

Attributes

start: pd.Timestamp Start of the replay end: pd.Timestamp End of the replay. This is exclusive, the replay will stop 1ns before frequency: How often should the replay run

ReplayCycleMetrics dataclass

Metrics for each replay cycle.

ReplayDriver dataclass

Orchestrate the replay of data for dag.

This will:

  • create the relevant DataSources
  • create the relevant DataSinks
  • stream the data from the sources
  • inject the input data in the dag source nodes
  • execute the dag
  • collect the output data and pass it to the sink
  • close the sink at the end of the run

Notes

Do not call the constructor directly, use create instead