drytorch.lib.loading

Module containing classes and utilities for batching a dataset.

Functions

get_n_batches(dataset_len, batch_size, ...)

Calculate the number of batches in a dataset.

take_from_dataset(dataset[, n_samples, ...])

Sample a batch of elements from a dataset and transfer them to a device.

validate_dataset_length(dataset)

Check whether a dataset has a valid length.
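
The batch count described for get_n_batches can be illustrated with a minimal sketch. It does not call drytorch: the function name n_batches_sketch and the drop_last flag are hypothetical, since the library's remaining parameters are elided in the summary above.

```python
import math


def n_batches_sketch(dataset_len: int, batch_size: int,
                     drop_last: bool = False) -> int:
    """Return how many batches cover dataset_len samples.

    Hypothetical helper: drop_last is an assumed flag, not a documented
    drytorch parameter.
    """
    if batch_size <= 0:
        raise ValueError('batch_size must be positive.')
    if drop_last:
        # Discard the final, incomplete batch.
        return dataset_len // batch_size
    # Keep the final, possibly smaller batch.
    return math.ceil(dataset_len / batch_size)
```

With 10 samples and a batch size of 3, the sketch counts four batches when the last partial batch is kept and three when it is dropped.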

Classes

DataLoader(dataset, batch_size[, ...])

A data-loader class with runtime settings.

Permutation(size, seed)

Sliceable pseudo-random permutation.

Sliced(seq, slice_)

Slice a sequence keeping the reference to it.
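
To make the ideas behind Permutation and Sliced concrete, here is a rough standard-library sketch. It is not the drytorch implementation: the class names PermutationSketch and SlicedSketch are hypothetical, and the choice of random.Random for shuffling is an assumption.

```python
import random
from collections.abc import Sequence


class PermutationSketch(Sequence):
    """Seeded permutation of range(size) that supports slicing."""

    def __init__(self, size: int, seed: int) -> None:
        self._indices = list(range(size))
        # Assumption: a seeded generator makes the order reproducible.
        random.Random(seed).shuffle(self._indices)

    def __len__(self) -> int:
        return len(self._indices)

    def __getitem__(self, index):
        # Integers return one index, slices return a list of indices.
        return self._indices[index]


class SlicedSketch(Sequence):
    """View on a slice of a sequence that keeps a reference to it."""

    def __init__(self, seq: Sequence, slice_: slice) -> None:
        self.seq = seq  # the original sequence is not copied
        self._range = range(len(seq))[slice_]

    def __len__(self) -> int:
        return len(self._range)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self.seq[i] for i in self._range[index]]
        return self.seq[self._range[index]]
```

A view such as SlicedSketch(perm, slice(0, 8)) exposes the first eight shuffled indices while perm itself stays reachable through the seq attribute.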

class DataLoader(dataset: Dataset[Data], batch_size: int, pin_memory: bool | None = None, sampler: Sampler | Iterable[Any] | None = None, n_workers: int = 0)[source]

Bases: LoaderProtocol[Data]

A data-loader class with runtime settings.

This class wraps PyTorch’s DataLoader with additional functionality.

batch_size

number of samples per batch.

Type:

int | None

dataset

the dataset to load data from.

Type:

torch.utils.data.dataset.Dataset[drytorch.lib.loading.Data]

dataset_len

length of the dataset.

Type:

int

sampler

the sampling strategy for the dataset.

Type:

torch.utils.data.sampler.Sampler | collections.abc.Iterable

Initialize.

Parameters:
  • dataset (Dataset[Data]) – the dataset to load data from.

  • batch_size (int | None) – number of samples per batch.

  • pin_memory (bool | None) – whether to pin memory for faster GPU training. Defaults to True when hardware acceleration is available.

  • sampler (Sampler | Iterable) – defines the strategy to draw samples from the dataset.

  • n_workers (int) – number of subprocesses for data loading.

get_loader(inference: bool) DataLoader[Data][source]

Create a DataLoader instance with runtime settings.

Parameters:

inference (bool) – whether to use inference settings. Default checks torch global state.

Returns:

A configured PyTorch DataLoader instance.

Return type:

DataLoader[Data]

split(split: float = 0.2, shuffle: bool = True, seed: int = 42) tuple[DataLoader[Data], DataLoader[Data]][source]

Split the loader into two.

Parameters:
  • split (float) – fraction of the dataset assigned to the second output loader.

  • shuffle (bool) – whether to shuffle the data before splitting.

  • seed (int) – seed for shuffling.

Returns:

A tuple of two DataLoader instances. The second one receives the fraction of the dataset given by split.

Raises:

ValueError – if split is not between 0 and 1.

Return type:

tuple[DataLoader[Data], DataLoader[Data]]
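
The index arithmetic behind such a split can be sketched without drytorch or PyTorch. The function split_indices_sketch is hypothetical, and two details are assumptions: the bounds 0 and 1 are treated as valid, and the size of the second part is rounded down.

```python
import random


def split_indices_sketch(dataset_len: int, split: float = 0.2,
                         shuffle: bool = True,
                         seed: int = 42) -> tuple[list[int], list[int]]:
    """Return index lists for the first and second part of a split."""
    # Assumption: inclusive bounds; the library may be stricter.
    if not 0 <= split <= 1:
        raise ValueError('split must be between 0 and 1.')
    indices = list(range(dataset_len))
    if shuffle:
        # Seeded shuffle so that the split is reproducible.
        random.Random(seed).shuffle(indices)
    # Assumption: the second part's size is rounded down.
    n_second = int(dataset_len * split)
    cut = dataset_len - n_second
    return indices[:cut], indices[cut:]
```

For a dataset of 10 samples and the default split of 0.2, the first list holds 8 indices and the second holds 2, with no index shared between them.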

take_from_dataset(dataset: Dataset[Data], n_samples: int = 1, preserve_order: bool = True, device: device = device(type='cpu')) Data[source]

Sample a batch of elements from a dataset and transfer them to a device.

Parameters:
  • dataset (Dataset[Data]) – the dataset to sample from.

  • n_samples (int) – the number of samples to take.

  • preserve_order (bool) – whether to take samples in order. If False, samples are drawn randomly.

  • device (device) – device on which to store the sample.

Returns:

The desired number of samples in a batch.

Return type:

Data
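
The effect of preserve_order can be illustrated with a small standard-library sketch. It covers index selection only and omits batching into tensors and the transfer to a device. The function take_sketch and its seed parameter are hypothetical, and drawing without replacement is an assumption about the random case.

```python
import random
from collections.abc import Sequence
from typing import Optional, TypeVar

T = TypeVar('T')


def take_sketch(dataset: Sequence[T], n_samples: int = 1,
                preserve_order: bool = True,
                seed: Optional[int] = None) -> list[T]:
    """Return n_samples items, in order or randomly chosen."""
    if n_samples > len(dataset):
        raise ValueError('n_samples exceeds the dataset length.')
    if preserve_order:
        # Take the first n_samples items as they appear.
        indices = list(range(n_samples))
    else:
        # Assumption: random sampling without replacement.
        indices = random.Random(seed).sample(range(len(dataset)), n_samples)
    return [dataset[i] for i in indices]
```

With preserve_order=True the sketch returns the leading items of the dataset, so repeated calls give the same result.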