Caching datasets locally

In some use cases, it may be desirable to point Fuel’s data_path at a shared network drive, for example when configuring multiple machines in a cluster to work on the same data in parallel. However, this can easily saturate the network bandwidth.

To avoid this problem, Fuel provides a second configuration variable named local_data_path, which can be set in ~/.fuelrc. This variable points to a filesystem directory that serves as a local cache for datasets.

The same setting can also be supplied through an environment variable, as follows:

$ export FUEL_LOCAL_DATA_PATH="/LOCAL_PATH/my_local_cache"
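
To check from Python which cache directory a process would see, the environment variable can be read directly. The following is a minimal sketch that uses only the standard library; the helper name local_cache_dir is hypothetical and is not part of Fuel.

```python
import os


def local_cache_dir(environ=None):
    """Return the value of FUEL_LOCAL_DATA_PATH, or None if unset or empty.

    Hypothetical helper for illustration only; not part of Fuel's API.
    """
    # Allow a mapping to be passed in so the lookup is easy to test.
    environ = os.environ if environ is None else environ
    value = environ.get("FUEL_LOCAL_DATA_PATH")
    return value or None


if __name__ == "__main__":
    # Prints the configured cache directory, or None when it is not set.
    print(local_cache_dir())
```

Running this in the shell where the export above was issued should print the exported path; in a shell without the variable it prints None.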

Note that caching is currently implemented only in H5PyDataset. To add caching to other types of datasets, use the fuel.utils.cache.cache_file() function.
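
The idea behind this kind of local caching can be sketched as a copy-on-first-use helper. The code below is an illustration under stated assumptions, not Fuel's implementation: the function name cache_locally and its explicit data_path and local_data_path arguments are hypothetical, and the sketch does not handle several processes caching the same file at once.

```python
import os
import shutil


def cache_locally(filename, data_path, local_data_path):
    """Copy `filename` from `data_path` into `local_data_path` on first use.

    Hypothetical stand-in for illustration; not Fuel's cache_file().
    Returns the path of the local copy, or the original path when the
    file is not stored under `data_path`.
    """
    filename = os.path.abspath(filename)
    data_path = os.path.abspath(data_path)
    local_data_path = os.path.abspath(local_data_path)

    # Only files that live under the shared data directory are cached.
    if os.path.commonpath([filename, data_path]) != data_path:
        return filename

    # Mirror the file's relative location inside the local cache.
    relative = os.path.relpath(filename, data_path)
    local_copy = os.path.join(local_data_path, relative)

    # Copy once; later calls reuse the existing local copy.
    if not os.path.exists(local_copy):
        os.makedirs(os.path.dirname(local_copy), exist_ok=True)
        shutil.copy2(filename, local_copy)
    return local_copy
```

A dataset class would call such a helper once before opening its file and then read from the returned path, so that only the first access goes over the network. For real use with Fuel, the library's own fuel.utils.cache.cache_file() is the function to rely on.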