Data streams¶

class fuel.streams.AbstractDataStream(iteration_scheme=None, axis_labels=None)[source]¶

Bases: object

A stream of data separated into epochs.

A data stream is an iterable stream of examples/minibatches. It shares similarities with Python file handles return by the open method. Data streams can be closed using the close() method and reset using reset() (similar to f.seek(0)).

Parameters:

iteration_scheme (IterationScheme, optional) – The iteration scheme to use when retrieving data. Note that not all datasets support the same iteration schemes, some datasets require one, and others don’t support any. In case when the data stream wraps another data stream, the choice of supported iteration schemes is typically even more limited. Be sure to read the documentation of the dataset or data stream in question.
axis_labels (dict, optional) – Maps source names to tuples of strings describing axis semantics, one per axis. Defaults to None, i.e. no information is available.

iteration_scheme¶

The iteration scheme used to retrieve data. Can be None when not used.

Type:	`IterationScheme`

sources¶

The names of the data sources returned by this data stream, as given by the dataset.

Type:	tuple of strings

produces_examples¶

Whether this data stream produces examples (as opposed to batches of examples).

Type:	bool

close()[source]¶: Gracefully close the data stream, e.g. releasing file handles.

get_data(request=None)[source]¶

Request data from the dataset or the wrapped stream.

Parameters:	request (object) – A request fetched from the request_iterator.

Notes

It is possible to build a usable stream in terms of underlying streams for the purposes of training by only implementing get_epoch_iterator, thus this method is optional.

get_epoch_iterator(as_dict=False)[source]¶

iterate_epochs(as_dict=False)[source]¶

Allow iteration through all epochs.

Notes

This method uses the get_epoch_iterator() method to retrieve the DataIterator for each epoch. The default implementation of this method resets the state of the data stream so that the new epoch can read the data from the beginning. However, this behavior only works as long as the epochs property is iterated over using e.g. for epoch in stream.epochs. If you create the data iterators in advance (e.g. using for i, epoch in zip(range(10), stream.epochs in legacy Python) you must call the reset() method yourself.

next_epoch()[source]¶: Switch the data stream to the next epoch.

produces_examples

reset()[source]¶: Reset the data stream.

class fuel.streams.DataStream(dataset, **kwargs)[source]¶

Bases: fuel.streams.AbstractDataStream

A stream of data from a dataset.

Parameters:	dataset (instance of `Dataset`) – The dataset from which the data is fetched.

close()[source]¶: Gracefully close the data stream, e.g. releasing file handles.

classmethod default_stream(dataset, **kwargs)[source]¶

get_data(request=None)[source]¶: Get data from the dataset.

get_epoch_iterator(**kwargs)[source]¶: Get an epoch iterator for the data stream.

next_epoch()[source]¶: Switch the data stream to the next epoch.

reset()[source]¶: Reset the data stream.

sources¶

class fuel.streams.ServerDataStream(sources, produces_examples, host='localhost', port=5557, hwm=10, axis_labels=None)[source]¶

Bases: fuel.streams.AbstractDataStream

A data stream that receives batches from a Fuel server.

Parameters:

sources (tuple of strings) – The names of the data sources returned by this data stream.
produces_examples (bool) – Whether this data stream produces examples (as opposed to batches of examples).
host (str, optional) – The host to connect to. Defaults to localhost.
port (int, optional) – The port to connect on. Defaults to 5557.
hwm (int, optional) – The ZeroMQ high-water mark (HWM) on the receiving socket. Increasing this increases the buffer, which can be useful if your data preprocessing times are very random. However, it will increase memory usage. There is no easy way to tell how many batches will actually be queued with a particular HWM. Defaults to 10. Be sure to set the corresponding HWM on the server’s end as well.
axis_labels (dict, optional) – Maps source names to tuples of strings describing axis semantics, one per axis. Defaults to None, i.e. no information is available.

close()[source]¶: Gracefully close the data stream, e.g. releasing file handles.

connect()[source]¶

get_data(request=None)[source]¶

Request data from the dataset or the wrapped stream.

Parameters:	request (object) – A request fetched from the request_iterator.

Notes

It is possible to build a usable stream in terms of underlying streams for the purposes of training by only implementing get_epoch_iterator, thus this method is optional.

get_epoch_iterator(**kwargs)[source]¶

next_epoch()[source]¶: Switch the data stream to the next epoch.

reset()[source]¶: Reset the data stream.