Data streams
- class fuel.streams.AbstractDataStream(iteration_scheme=None, axis_labels=None)
Bases: object
A stream of data separated into epochs.
A data stream is an iterable stream of examples/minibatches. It shares similarities with Python file handles returned by the open function. Data streams can be closed using the close() method and reset using reset() (similar to f.seek(0)).
Parameters: - iteration_scheme (IterationScheme, optional) – The iteration scheme to use when retrieving data. Note that not all datasets support the same iteration schemes: some datasets require one, and others don’t support any. When the data stream wraps another data stream, the choice of supported iteration schemes is typically even more limited. Be sure to read the documentation of the dataset or data stream in question.
- axis_labels (dict, optional) – Maps source names to tuples of strings describing axis semantics, one per axis. Defaults to None, i.e. no information is available.
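The file-handle analogy can be pictured with a small sketch. ToyStream below is a hypothetical stand-in written only for illustration; it is not Fuel's implementation and only mimics the close()/reset() behavior described above, next to the equivalent f.seek(0) on an in-memory file.

```python
import io


class ToyStream:
    """Hypothetical stream mimicking the close()/reset() interface."""

    def __init__(self, examples):
        self.examples = list(examples)
        self.position = 0
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.closed:
            raise ValueError("stream is closed")
        if self.position >= len(self.examples):
            raise StopIteration
        item = self.examples[self.position]
        self.position += 1
        return item

    def reset(self):
        # Rewind to the first example, like f.seek(0) on a file.
        self.position = 0

    def close(self):
        self.closed = True


# A file handle can be re-read after seeking back to the start.
f = io.StringIO("a\nb\n")
first_pass = f.read()
f.seek(0)
second_pass = f.read()

# The toy stream can be re-read after reset() in the same way.
stream = ToyStream([1, 2, 3])
first_epoch = list(stream)
stream.reset()
second_epoch = list(stream)
stream.close()
```

Without the reset() call, the second list(stream) would come back empty, just as a second f.read() would without seek(0).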
- iteration_scheme
The iteration scheme used to retrieve data. Can be None when not used.
Type: IterationScheme
- sources
The names of the data sources returned by this data stream, as given by the dataset.
Type: tuple of strings
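One way to picture how the source names relate to the data a stream yields is to pair each name with the matching element of a data tuple. The helper name as_dict and the source names "features" and "targets" below are assumptions chosen for illustration; this is a sketch, not Fuel's own code.

```python
def as_dict(sources, data):
    """Pair each source name with the corresponding piece of data."""
    if len(sources) != len(data):
        raise ValueError("expected one data element per source")
    return dict(zip(sources, data))


# Hypothetical source names and a single example with two sources.
sources = ("features", "targets")
example = ([0.5, 0.1], 1)
named = as_dict(sources, example)
```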
- produces_examples
Whether this data stream produces examples (as opposed to batches of examples).
Type: bool
- get_data(request=None)
Request data from the dataset or the wrapped stream.
Parameters: request (object) – A request fetched from the request_iterator.
Notes
A stream built on top of underlying streams can be made usable for training by implementing only get_epoch_iterator, so this method is optional.
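The note above can be illustrated with a minimal sketch in which a wrapper defines only get_epoch_iterator and never implements get_data. ListStream and ScaledStream are hypothetical classes invented for this example; they do not subclass anything from Fuel.

```python
class ListStream:
    """Hypothetical minimal stream over an in-memory list."""

    def __init__(self, examples):
        self.examples = list(examples)

    def get_epoch_iterator(self):
        return iter(self.examples)


class ScaledStream:
    """Hypothetical wrapper that defines only get_epoch_iterator."""

    def __init__(self, wrapped, factor):
        self.wrapped = wrapped
        self.factor = factor

    def get_epoch_iterator(self):
        # Pull from the wrapped stream's iterator; no get_data is needed.
        return (x * self.factor for x in self.wrapped.get_epoch_iterator())


scaled = ScaledStream(ListStream([1, 2, 3]), factor=10)
epoch = list(scaled.get_epoch_iterator())
```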
- iterate_epochs(as_dict=False)
Allow iteration through all epochs.
Notes
This method uses the get_epoch_iterator() method to retrieve the DataIterator for each epoch. The default implementation of this method resets the state of the data stream so that the new epoch can read the data from the beginning. However, this behavior only works as long as the epochs property is iterated over using e.g. for epoch in stream.epochs. If you create the data iterators in advance (e.g. using for i, epoch in zip(range(10), stream.epochs) in legacy Python) you must call the reset() method yourself.
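The difference between lazy and eager epoch iteration can be sketched with a toy stream whose epoch iterators share one read position. ToyStream and its internals are hypothetical and only model the behavior described in the note; list(itertools.islice(...)) stands in for the eager zip of legacy Python.

```python
import itertools


class ToyStream:
    """Hypothetical stream whose epoch iterators share one read position."""

    def __init__(self, data):
        self.data = list(data)
        self.position = 0

    def reset(self):
        self.position = 0

    def _read_epoch(self):
        # Reads from the shared position until the data runs out.
        while self.position < len(self.data):
            item = self.data[self.position]
            self.position += 1
            yield item

    def iterate_epochs(self):
        while True:
            # The reset happens only when the next epoch is requested.
            self.reset()
            yield self._read_epoch()


stream = ToyStream([1, 2, 3])

# Lazy: each epoch is consumed before the next one is requested.
lazy = [list(epoch) for epoch in itertools.islice(stream.iterate_epochs(), 2)]

# Eager: both epoch iterators are created before any data is read,
# so the second one finds the shared position already at the end.
first, second = list(itertools.islice(stream.iterate_epochs(), 2))
eager_first = list(first)
eager_second = list(second)

# Eager with a manual reset between epochs.
first, second = list(itertools.islice(stream.iterate_epochs(), 2))
list(first)
stream.reset()
manual_second = list(second)
```

In the eager case the second epoch comes back empty unless reset() is called by hand, which is the situation the note warns about.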
- class fuel.streams.DataStream(dataset, **kwargs)
Bases: fuel.streams.AbstractDataStream
A stream of data from a dataset.
Parameters: dataset (instance of Dataset) – The dataset from which the data is fetched.
- sources
- class fuel.streams.ServerDataStream(sources, produces_examples, host='localhost', port=5557, hwm=10, axis_labels=None)
Bases: fuel.streams.AbstractDataStream
A data stream that receives batches from a Fuel server.
Parameters: - sources (tuple of strings) – The names of the data sources returned by this data stream.
- produces_examples (bool) – Whether this data stream produces examples (as opposed to batches of examples).
- host (str, optional) – The host to connect to. Defaults to localhost.
- port (int, optional) – The port to connect on. Defaults to 5557.
- hwm (int, optional) – The ZeroMQ high-water mark (HWM) on the receiving socket. Increasing this increases the buffer, which can be useful if your data preprocessing times are highly variable. However, it will increase memory usage. There is no easy way to tell how many batches will actually be queued with a particular HWM. Defaults to 10. Be sure to set the corresponding HWM on the server’s end as well.
- axis_labels (dict, optional) – Maps source names to tuples of strings describing axis semantics, one per axis. Defaults to None, i.e. no information is available.
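The trade-off behind hwm can be pictured, very loosely, as a bounded buffer: a larger bound absorbs more batches at the cost of memory. The sketch below uses Python's queue.Queue purely as an analogy; it does not use ZeroMQ, the function name fill_buffer is hypothetical, and, as the hwm description says, the number of batches actually queued by a real socket is not this predictable.

```python
import queue


def fill_buffer(hwm, batches):
    """Queue batches until the bounded buffer is full; return how many fit."""
    buffer = queue.Queue(maxsize=hwm)
    accepted = 0
    for batch in batches:
        try:
            buffer.put_nowait(batch)
        except queue.Full:
            # The buffer has reached its bound; stop accepting batches.
            break
        accepted += 1
    return accepted


small = fill_buffer(3, range(25))
default = fill_buffer(10, range(25))
```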
- get_data(request=None)
Request data from the dataset or the wrapped stream.
Parameters: request (object) – A request fetched from the request_iterator.
Notes
A stream built on top of underlying streams can be made usable for training by implementing only get_epoch_iterator, so this method is optional.