Replay
Module for replaying historical data.
DataSink
Bases: Protocol[T]
Interface for saving the results of a replay to a file or database.
append(timestamp, data)
Append data for the current cycle.
Parameters
timestamp: End of the time interval for which data was replayed (inclusive) data: The generated data
close()
Flush the data and clean up resources.
DataSinkProvider
Bases: Protocol[T]
Interface for the provision of DataSink
.
__call__(replay_context)
abstractmethod
Create a DataSink
for the given replay_context.
Parameters
replay_context: Information about the replay that's about to run
Returns
DataSink[T]: Sink for the replay
DataSource
Bases: Protocol[T]
Interface for replaying historical data from a file or database.
get_next()
Return the next timestamp for which there is data.
If no data is available this should return UTC_MAX
Returns
timestamp: pd.Timestamp
Timestamp of the next available data point (or UTC_MAX
if no more data
is available)
read_to(timestamp)
Read from the data source, all the way to the provided timestamp (inclusive).
This function is stateful and must remember the previous timestamp for which data was read.
Parameters
timestamp End of the time interval for which data is required (inclusive)
Returns
data The data for the interval (or empty if no data is found)
DataSourceProvider
Bases: Protocol[T]
Interface for the provision of DataSource
.
__call__(replay_context)
Create a DataSource
for the given replay_context.
Parameters
replay_context: Information about the replay that's about to run
Returns
DataSource[T]: Source for the replay
IteratorDataSourceAdapter
Bases: DataSource[T]
Adapter between an iterator of DataSource
and a DataSource.
This can be used to stitch together various DataSource
for incremental date range
ReplayContext
dataclass
Stores the information about a replay.
Attributes
start: pd.Timestamp Start of the replay end: pd.Timestamp End of the replay. This is exclusive, the replay will stop 1ns before frequency: How often should the replay run
ReplayCycleMetrics
dataclass
Metrics for each replay cycle.
ReplayDriver
dataclass
Orchestrate the replay of data for dag.
This will:
- create the relevant
DataSource
s - create the relevant
DataSink
s - stream the data from the sources
- inject the input data in the dag source nodes
- execute the dag
- collect the output data and pass it to the sink
- close the sink at the end of the run
Notes
Do not call the constructor directly, use create
instead