Converters

Base classes

fuel.converters.base.check_exists(required_files)[source]

Decorator that checks if required files exist before running.

Parameters:required_files (list of str) – A list of strings indicating the filenames of regular files (not directories) that should be found in the input directory (which is the first argument to the wrapped function).
Returns:wrapper – A function that takes a function and returns a wrapped function. The function returned by wrapper will include input file existence verification.
Return type:function

Notes

Assumes that the directory in which to find the input files is provided as the first argument, with the argument name directory.
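
The decorator pattern described above can be sketched as follows. This is a minimal illustration, not Fuel's implementation: the names check_exists_sketch and convert are hypothetical, and raising IOError for missing files is an assumption.

```python
import functools
import os
import tempfile


def check_exists_sketch(required_files):
    """Hypothetical stand-in for a file-existence decorator."""
    def wrapper(func):
        @functools.wraps(func)
        def wrapped(directory, *args, **kwargs):
            # Collect every required name that is not a regular file.
            missing = [name for name in required_files
                       if not os.path.isfile(os.path.join(directory, name))]
            if missing:
                # Assumption: the error type is chosen for illustration.
                raise IOError("missing required files: " + ", ".join(missing))
            return func(directory, *args, **kwargs)
        return wrapped
    return wrapper


@check_exists_sketch(required_files=["iris.data"])
def convert(directory):
    # Placeholder conversion: just report the input path.
    return os.path.join(directory, "iris.data")


with tempfile.TemporaryDirectory() as tmp:
    try:
        convert(tmp)          # the file is absent, so the check fails
        raised = False
    except IOError:
        raised = True
    open(os.path.join(tmp, "iris.data"), "w").close()
    found_name = os.path.basename(convert(tmp))  # now the check passes
```

Because the wrapped function names its first parameter directory, the check works whether the directory is passed positionally or by keyword.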

fuel.converters.base.fill_hdf5_file(h5file, data)[source]

Fills an HDF5 file in an H5PYDataset-compatible manner.

Parameters:
  • h5file (h5py.File) – File handle for an HDF5 file.
  • data (tuple of tuple) –

    One element per split/source pair. Each element consists of a tuple of (split_name, source_name, data_array, comment), where

    • split_name is a string identifier for the split name
    • source_name is a string identifier for the source name
    • data_array is a numpy.ndarray containing the data for this split/source pair
    • comment is a comment string for the split/source pair

    The comment element can optionally be omitted.
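
To make the expected layout of data concrete, here is a small sketch that builds such a tuple and unpacks it, treating a missing comment as None. The helper unpack_entries is hypothetical, and nested lists stand in for numpy.ndarray objects so the example has no dependencies.

```python
def unpack_entries(data):
    """Yield (split, source, array, comment), with comment defaulting to None."""
    for entry in data:
        if len(entry) == 4:
            split_name, source_name, data_array, comment = entry
        elif len(entry) == 3:
            split_name, source_name, data_array = entry
            comment = None  # the comment element was omitted
        else:
            raise ValueError("expected 3 or 4 elements, got %d" % len(entry))
        yield split_name, source_name, data_array, comment


# One element per split/source pair; nested lists stand in for arrays.
data = (
    ("train", "features", [[0.1, 0.2], [0.3, 0.4]], "training inputs"),
    ("train", "targets", [[0], [1]]),
    ("test", "features", [[0.5, 0.6]]),
    ("test", "targets", [[1]]),
)
entries = list(unpack_entries(data))
splits = sorted({split for split, _, _, _ in entries})
```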

fuel.converters.base.progress_bar(*args, **kwds)[source]

Manages a progress bar for a conversion.

Parameters:
  • name (str) – Name of the file being converted.
  • maxval (int) – Total number of steps for the conversion.
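
The (*args, **kwds) signature is typical of a function wrapped by a context-manager decorator, although the text above does not confirm this. Under that assumption, a progress reporter taking name and maxval could be sketched as below. The name progress_bar_sketch, the stream parameter, and the update callback are all hypothetical.

```python
import contextlib
import io


@contextlib.contextmanager
def progress_bar_sketch(name, maxval, stream):
    """Hypothetical stand-in: reports progress for `name` over `maxval` steps."""
    state = {"done": 0}

    def update(step):
        # Record the latest step and write one progress line.
        state["done"] = step
        stream.write("%s: %d/%d\n" % (name, step, maxval))

    stream.write("converting %s\n" % name)
    try:
        yield update
    finally:
        # Runs even if the conversion raises, so the log is always closed out.
        stream.write("finished %s at step %d\n" % (name, state["done"]))


log = io.StringIO()
with progress_bar_sketch("train.amat", maxval=3, stream=log) as update:
    for step in range(1, 4):
        update(step)
lines = log.getvalue().splitlines()
```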

Adult

fuel.converters.adult.convert_adult(directory, output_directory, output_filename='adult.hdf5')[source]

Convert the Adult dataset to HDF5.

Converts the Adult dataset to an HDF5 dataset compatible with fuel.datasets.Adult. The converted dataset is saved as ‘adult.hdf5’. This method assumes the existence of the files adult.data and adult.test.

Parameters:
  • directory (str) – Directory in which input files reside.
  • output_directory (str) – Directory in which to save the converted dataset.
  • output_filename (str, optional) – Name of the saved dataset. Defaults to adult.hdf5.
Returns:

output_paths – Single-element tuple containing the path to the converted dataset.

Return type:

tuple of str
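
The converters in this section all document the same return value: a single-element tuple holding the output path. A sketch of that convention follows, assuming the path is simply output_directory joined with output_filename. The function output_paths_sketch is hypothetical and performs no conversion.

```python
import os


def output_paths_sketch(output_directory, output_filename="adult.hdf5"):
    """Build the documented return value without converting anything."""
    output_path = os.path.join(output_directory, output_filename)
    # The trailing comma makes this a one-element tuple, not a string.
    return (output_path,)


paths = output_paths_sketch("converted")
(only_path,) = paths  # unpacking fails loudly if the tuple grows
```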

fuel.converters.adult.convert_to_one_hot(y)[source]

Converts y into a one-hot representation.

Parameters:y (list) – A list containing continuous integer values.
Returns:one_hot – A numpy.ndarray object, which is the one-hot representation of y.
Return type:numpy.ndarray
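
A one-hot encoding of this kind can be sketched with NumPy as follows. This is an illustration rather than the library's code: one_hot_sketch is a hypothetical name, and it assumes the labels are the integers 0 through k-1.

```python
import numpy as np


def one_hot_sketch(y):
    """Return a (len(y), k) array with a single 1 per row."""
    labels = np.asarray(y, dtype=int)
    # Assumption: labels run from 0 to k-1, so k is max + 1.
    one_hot = np.zeros((labels.size, labels.max() + 1), dtype=int)
    # Row i gets a 1 in the column given by labels[i].
    one_hot[np.arange(labels.size), labels] = 1
    return one_hot


encoded = one_hot_sketch([0, 2, 1, 2])
```
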
fuel.converters.adult.fill_subparser(subparser)[source]

CalTech 101 Silhouettes

fuel.converters.caltech101_silhouettes.convert_silhouettes(size, directory, output_directory, output_filename=None)[source]

Convert the CalTech 101 Silhouettes Datasets.

Parameters:
  • size ({16, 28}) – Convert either the 16x16 or 28x28 sized version of the dataset.
  • directory (str) – Directory in which the required input files reside.
  • output_directory (str) – Directory in which to save the converted dataset.
  • output_filename (str, optional) – Where to save the converted dataset.
fuel.converters.caltech101_silhouettes.fill_subparser(subparser)[source]

Sets up a subparser to convert CalTech101 Silhouettes Database files.

Parameters:subparser (argparse.ArgumentParser) – Subparser handling the caltech101_silhouettes command.
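
A fill_subparser function receives an argparse subparser and registers the arguments for its command. The sketch below shows the general shape using the standard library only. The size argument and its help text are assumptions modeled on the documented {16, 28} choices, not the arguments Fuel actually registers.

```python
import argparse


def fill_subparser_sketch(subparser):
    """Register hypothetical arguments for a conversion command."""
    # Assumption: a positional size restricted to the documented values.
    subparser.add_argument(
        "size", type=int, choices=(16, 28),
        help="height/width of the silhouettes to convert")


parser = argparse.ArgumentParser(prog="convert-sketch")
subparsers = parser.add_subparsers(dest="dataset")
fill_subparser_sketch(subparsers.add_parser("caltech101_silhouettes"))
args = parser.parse_args(["caltech101_silhouettes", "28"])
```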

Binarized MNIST

fuel.converters.binarized_mnist.convert_binarized_mnist(directory, *args, **kwargs)[source]

Converts the binarized MNIST dataset to HDF5.

Converts the binarized MNIST dataset used in R. Salakhutdinov’s DBN paper [DBN] to an HDF5 dataset compatible with fuel.datasets.BinarizedMNIST. The converted dataset is saved as ‘binarized_mnist.hdf5’.

This method assumes the existence of the files binarized_mnist_{train,valid,test}.amat, which are accessible through Hugo Larochelle’s website [HUGO].

[DBN]Ruslan Salakhutdinov and Iain Murray, On the Quantitative Analysis of Deep Belief Networks, Proceedings of the 25th international conference on Machine learning, 2008, pp. 872-879.
Parameters:
  • directory (str) – Directory in which input files reside.
  • output_directory (str) – Directory in which to save the converted dataset.
  • output_filename (str, optional) – Name of the saved dataset. Defaults to ‘binarized_mnist.hdf5’.
Returns:

output_paths – Single-element tuple containing the path to the converted dataset.

Return type:

tuple of str
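
The shorthand binarized_mnist_{train,valid,test}.amat used above stands for three separate files. Spelled out in code, for instance to check for them before converting:

```python
# Expand the brace shorthand into the three concrete file names.
required_files = ["binarized_mnist_{}.amat".format(split)
                  for split in ("train", "valid", "test")]
```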

fuel.converters.binarized_mnist.fill_subparser(subparser)[source]

Sets up a subparser to convert the binarized MNIST dataset files.

Parameters:subparser (argparse.ArgumentParser) – Subparser handling the binarized_mnist command.

CIFAR100

fuel.converters.cifar100.convert_cifar100(directory, *args, **kwargs)[source]

Converts the CIFAR-100 dataset to HDF5.

Converts the CIFAR-100 dataset to an HDF5 dataset compatible with fuel.datasets.CIFAR100. The converted dataset is saved as ‘cifar100.hdf5’.

This method assumes the existence of the following file: cifar-100-python.tar.gz

Parameters:
  • directory (str) – Directory in which the required input files reside.
  • output_directory (str) – Directory in which to save the converted dataset.
  • output_filename (str, optional) – Name of the saved dataset. Defaults to ‘cifar100.hdf5’.
Returns:

output_paths – Single-element tuple containing the path to the converted dataset.

Return type:

tuple of str
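
Since the converter expects a gzip-compressed tar archive as input, it can be useful to confirm that a downloaded file is readable before converting. The sketch below uses only the standard tarfile module. The helper list_archive_members is hypothetical, and the archive it inspects is a tiny stand-in whose member name is made up and does not reflect the real archive's contents.

```python
import io
import os
import tarfile
import tempfile


def list_archive_members(path):
    """Return the member names of a gzip-compressed tar archive."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cifar-100-python.tar.gz")
    # Build a tiny stand-in archive; the member name is made up.
    with tarfile.open(path, "w:gz") as archive:
        payload = b"placeholder"
        info = tarfile.TarInfo(name="example/member.bin")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    is_tar = tarfile.is_tarfile(path)
    members = list_archive_members(path)
```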

fuel.converters.cifar100.fill_subparser(subparser)[source]

Sets up a subparser to convert the CIFAR100 dataset files.

Parameters:subparser (argparse.ArgumentParser) – Subparser handling the cifar100 command.

CIFAR10

fuel.converters.cifar10.convert_cifar10(directory, *args, **kwargs)[source]

Converts the CIFAR-10 dataset to HDF5.

Converts the CIFAR-10 dataset to an HDF5 dataset compatible with fuel.datasets.CIFAR10. The converted dataset is saved as ‘cifar10.hdf5’.

It assumes the existence of the following file:

  • cifar-10-python.tar.gz
Parameters:
  • directory (str) – Directory in which input files reside.
  • output_directory (str) – Directory in which to save the converted dataset.
  • output_filename (str, optional) – Name of the saved dataset. Defaults to ‘cifar10.hdf5’.
Returns:

output_paths – Single-element tuple containing the path to the converted dataset.

Return type:

tuple of str

fuel.converters.cifar10.fill_subparser(subparser)[source]

Sets up a subparser to convert the CIFAR10 dataset files.

Parameters:subparser (argparse.ArgumentParser) – Subparser handling the cifar10 command.

IRIS

fuel.converters.iris.convert_iris(directory, output_directory, output_filename='iris.hdf5')[source]

Convert the Iris dataset to HDF5.

Converts the Iris dataset to an HDF5 dataset compatible with fuel.datasets.Iris. The converted dataset is saved as ‘iris.hdf5’. This method assumes the existence of the file iris.data.

Parameters:
  • directory (str) – Directory in which input files reside.
  • output_directory (str) – Directory in which to save the converted dataset.
  • output_filename (str, optional) – Name of the saved dataset. Defaults to ‘iris.hdf5’.
Returns:

output_paths – Single-element tuple containing the path to the converted dataset.

Return type:

tuple of str

fuel.converters.iris.fill_subparser(subparser)[source]

Sets up a subparser to convert the Iris dataset file.

Parameters:subparser (argparse.ArgumentParser) – Subparser handling the iris command.

MNIST

fuel.converters.mnist.convert_mnist(directory, *args, **kwargs)[source]

Converts the MNIST dataset to HDF5.

Converts the MNIST dataset to an HDF5 dataset compatible with fuel.datasets.MNIST. The converted dataset is saved as ‘mnist.hdf5’.

This method assumes the existence of the following files:

  • train-images-idx3-ubyte.gz
  • train-labels-idx1-ubyte.gz
  • t10k-images-idx3-ubyte.gz
  • t10k-labels-idx1-ubyte.gz
Parameters:
  • directory (str) – Directory in which input files reside.
  • output_directory (str) – Directory in which to save the converted dataset.
  • output_filename (str, optional) – Name of the saved dataset. Defaults to None, in which case a name based on dtype will be used.
  • dtype (str, optional) – Either ‘float32’, ‘float64’, or ‘bool’. Defaults to None, in which case images will be returned in their original unsigned byte format.
Returns:

output_paths – Single-element tuple containing the path to the converted dataset.

Return type:

tuple of str

fuel.converters.mnist.fill_subparser(subparser)[source]

Sets up a subparser to convert the MNIST dataset files.

Parameters:subparser (argparse.ArgumentParser) – Subparser handling the mnist command.
fuel.converters.mnist.read_mnist_images(filename, dtype=None)[source]

Read MNIST images from the original ubyte file format.

Parameters:
  • filename (str) – Filename/path from which to read images.
  • dtype ('float32', 'float64', or 'bool') – If unspecified, images will be returned in their original unsigned byte format.
Returns:

images – An image array, with individual examples indexed along the first axis and the image dimensions along the second and third axis.

Return type:

ndarray, shape (n_images, 1, n_rows, n_cols)

Notes

If the dtype provided was Boolean, the resulting array will be Boolean with True if the corresponding pixel had a value greater than or equal to 128, False otherwise.

If the dtype provided was a float dtype, the values will be mapped to the unit interval [0, 1], with pixel values that were 255 in the original unsigned byte representation equal to 1.0.
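
The two conversions described in these notes can be sketched with NumPy as follows. The function cast_pixels_sketch is a hypothetical illustration of the documented behavior, not the library's code.

```python
import numpy as np


def cast_pixels_sketch(raw, dtype=None):
    """Apply the documented dtype handling to unsigned byte pixels."""
    if dtype is None:
        return raw  # keep the original unsigned byte format
    if dtype == "bool":
        return raw >= 128  # True for pixel values of 128 or more
    if dtype in ("float32", "float64"):
        return raw.astype(dtype) / 255.0  # 255 maps to 1.0
    raise ValueError("unsupported dtype: %r" % (dtype,))


raw = np.array([0, 127, 128, 255], dtype=np.uint8)
as_bool = cast_pixels_sketch(raw, "bool")
as_float = cast_pixels_sketch(raw, "float32")
```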

fuel.converters.mnist.read_mnist_labels(filename)[source]

Read MNIST labels from the original ubyte file format.

Parameters:filename (str) – Filename/path from which to read labels.
Returns:labels – A one-dimensional unsigned byte array containing the labels as integers.
Return type:ndarray, shape (nlabels, 1)
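
For orientation, the IDX label format stores a big-endian header (magic number 2049, then the item count) followed by one unsigned byte per label. A reader returning the documented (nlabels, 1) shape might look like the sketch below. The function read_labels_sketch is hypothetical, and the file it reads is a three-label stand-in written on the spot.

```python
import gzip
import os
import struct
import tempfile

import numpy as np


def read_labels_sketch(filename):
    """Read a gzipped IDX label file into a (nlabels, 1) uint8 array."""
    with gzip.open(filename, "rb") as f:
        # Header: two big-endian unsigned 32-bit integers.
        magic, count = struct.unpack(">II", f.read(8))
        if magic != 2049:
            raise ValueError("not an IDX label file (magic %d)" % magic)
        labels = np.frombuffer(f.read(count), dtype=np.uint8)
    return labels.reshape(-1, 1)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labels-idx1-ubyte.gz")
    # Write a stand-in file holding the labels 7, 2, 1.
    with gzip.open(path, "wb") as f:
        f.write(struct.pack(">II", 2049, 3) + bytes([7, 2, 1]))
    labels = read_labels_sketch(path)
```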

SVHN

fuel.converters.svhn.convert_svhn(which_format, directory, output_directory, output_filename=None)[source]

Converts the SVHN dataset to HDF5.

Converts the SVHN dataset [SVHN] to an HDF5 dataset compatible with fuel.datasets.SVHN. The converted dataset is saved as ‘svhn_format_1.hdf5’ or ‘svhn_format_2.hdf5’, depending on the which_format argument.

[SVHN]Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng. Reading Digits in Natural Images with Unsupervised Feature Learning, NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
Parameters:
  • which_format (int) – Either 1 or 2. Determines which format (format 1: full numbers or format 2: cropped digits) to convert.
  • directory (str) – Directory in which input files reside.
  • output_directory (str) – Directory in which to save the converted dataset.
  • output_filename (str, optional) – Name of the saved dataset. Defaults to ‘svhn_format_1.hdf5’ or ‘svhn_format_2.hdf5’, depending on which_format.
Returns:

output_paths – Single-element tuple containing the path to the converted dataset.

Return type:

tuple of str
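
The way the default file name depends on which_format can be sketched as below. The helper svhn_output_filename is hypothetical, and rejecting other values with ValueError is an assumption.

```python
def svhn_output_filename(which_format, output_filename=None):
    """Pick the documented default name for the given format."""
    if which_format not in (1, 2):
        # Assumption: invalid formats are rejected with ValueError.
        raise ValueError("which_format must be 1 or 2, got %r" % (which_format,))
    if output_filename is None:
        output_filename = "svhn_format_{}.hdf5".format(which_format)
    return output_filename


name_1 = svhn_output_filename(1)
name_2 = svhn_output_filename(2)
name_custom = svhn_output_filename(2, "custom.hdf5")
try:
    svhn_output_filename(3)
    rejected = False
except ValueError:
    rejected = True
```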

fuel.converters.svhn.convert_svhn_format_1(directory, *args, **kwargs)[source]

Converts the SVHN dataset (format 1) to HDF5.

This method assumes the existence of the files {train,test,extra}.tar.gz, which are accessible through the official website [SVHNSITE].

[SVHNSITE]http://ufldl.stanford.edu/housenumbers/
Parameters:
  • directory (str) – Directory in which input files reside.
  • output_directory (str) – Directory in which to save the converted dataset.
  • output_filename (str, optional) – Name of the saved dataset. Defaults to ‘svhn_format_1.hdf5’.
Returns:

output_paths – Single-element tuple containing the path to the converted dataset.

Return type:

tuple of str

fuel.converters.svhn.convert_svhn_format_2(directory, *args, **kwargs)[source]

Converts the SVHN dataset (format 2) to HDF5.

This method assumes the existence of the files {train,test,extra}_32x32.mat, which are accessible through the official website [SVHNSITE].

Parameters:
  • directory (str) – Directory in which input files reside.
  • output_directory (str) – Directory in which to save the converted dataset.
  • output_filename (str, optional) – Name of the saved dataset. Defaults to ‘svhn_format_2.hdf5’.
Returns:

output_paths – Single-element tuple containing the path to the converted dataset.

Return type:

tuple of str

fuel.converters.svhn.fill_subparser(subparser)[source]

Sets up a subparser to convert the SVHN dataset files.

Parameters:subparser (argparse.ArgumentParser) – Subparser handling the svhn command.