drytorch.lib.loading

Module containing classes and utilities for batching a dataset.

Functions

`get_n_batches`(dataset_len, batch_size, ...)	Calculate the number of batches in a dataset.
`take_from_dataset`(dataset[, n_samples, ...])	Sample a batch of elements from a dataset and transfer them to a device.
`validate_dataset_length`(dataset)	Check whether a dataset has a valid length.
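The batch count follows from ceiling division, and validating the length means checking that the dataset is sized and non-empty. Below is a minimal re-implementation sketch, not drytorch's code: the `drop_last` parameter stands in for the elided arguments of `get_n_batches` and is an assumption, as is the rule that an empty dataset is invalid.

```python
from collections.abc import Sized
from typing import Any


def get_n_batches(dataset_len: int, batch_size: int, drop_last: bool = False) -> int:
    """Return how many batches a dataset of dataset_len samples yields."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer.")
    if drop_last:
        # Assumption: only full batches are counted.
        return dataset_len // batch_size
    # Integer ceiling division: the last, possibly partial, batch counts too.
    return -(-dataset_len // batch_size)


def validate_dataset_length(dataset: Any) -> int:
    """Return len(dataset), raising if the dataset has no usable length."""
    if not isinstance(dataset, Sized):
        raise TypeError("dataset must implement __len__.")
    length = len(dataset)
    if length <= 0:
        # Assumption: an empty dataset is treated as invalid.
        raise ValueError("dataset must contain at least one sample.")
    return length
```

For example, 10 samples with a batch size of 3 give 4 batches, or 3 if the incomplete final batch is dropped.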

Classes

`DataLoader`(dataset, batch_size[, ...])	A data-loader class with runtime settings.
`Permutation`(size, seed)	Sliceable pseudo-random permutation.
`Sliced`(seq, slice_)	Slice a sequence keeping the reference to it.
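To illustrate how a sliceable permutation and a slice that keeps a reference to its source might fit together, here is a hedged sketch using only the standard library. It mirrors the documented constructor arguments, `Permutation(size, seed)` and `Sliced(seq, slice_)`, but it is not drytorch's implementation; in particular, this version materializes the whole permutation, and drytorch may compute it differently.

```python
import random
from collections.abc import Sequence
from typing import Generic, TypeVar

T = TypeVar("T")


class Sliced(Sequence, Generic[T]):
    """Lazy view of seq[slice_] that keeps a reference to seq instead of copying it."""

    def __init__(self, seq: Sequence[T], slice_: slice) -> None:
        self.seq = seq
        self.slice_ = slice_
        # Slicing a range resolves None, negative bounds and steps for us.
        self._indices = range(len(seq))[slice_]

    def __len__(self) -> int:
        return len(self._indices)

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            # A slice of a view is another view that references this one.
            return Sliced(self, idx)
        # Integer lookups map back to the original sequence; out-of-range
        # indices raise IndexError, which also ends iteration.
        return self.seq[self._indices[idx]]


class Permutation(Sequence):
    """Seeded pseudo-random permutation of range(size) that can be sliced."""

    def __init__(self, size: int, seed: int) -> None:
        self.size = size
        self.seed = seed
        order = list(range(size))
        # A private generator makes the order depend only on the seed.
        random.Random(seed).shuffle(order)
        self._order = order

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            return Sliced(self, idx)
        return self._order[idx]
```

Keeping a reference instead of copying means that, for instance, `Permutation(10, seed=0)[:3]` still knows which permutation it came from through its `seq` attribute.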

class DataLoader(dataset: Dataset[Data], batch_size: int, pin_memory: bool | None = None, sampler: Sampler | Iterable[Any] | None = None, n_workers: int = 0)

Bases: LoaderProtocol[Data]

A data-loader class with runtime settings.

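This class wraps PyTorch’s DataLoader and adds runtime settings: get_loader selects training or inference settings, pin_memory defaults according to the available hardware, and split divides the data into two loaders.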

batch_size

number of samples per batch.

Type:: int | None

dataset

the dataset to load data from.

Type:: torch.utils.data.dataset.Dataset[drytorch.lib.loading.Data]

dataset_len

length of the dataset.

Type:: int

sampler

the sampling strategy for the dataset.

Type:: torch.utils.data.sampler.Sampler | collections.abc.Iterable

Initialize.

Parameters:

dataset (Dataset[Data]) – the dataset to load data from.
batch_size (int | None) – number of samples per batch.
pin_memory (bool | None) – whether to pin host memory for faster transfers to the GPU. Defaults to True when hardware acceleration is available.
sampler (Sampler | Iterable) – defines the strategy to draw samples from the dataset.
n_workers (int) – number of subprocesses for data loading.

get_loader(inference: bool) → DataLoader[Data]

Create a DataLoader instance with runtime settings.

Parameters:: inference (bool) – whether to use inference settings. Default checks torch global state.
Returns:: A configured PyTorch DataLoader instance.
Return type:: DataLoader[Data]

split(split: float = 0.2, shuffle: bool = True, seed: int = 42) → tuple[DataLoader[Data], DataLoader[Data]]

Split the loader into two.

Parameters:

split (float) – fraction of the dataset assigned to the second loader.
shuffle (bool) – whether to shuffle the data before splitting.
seed (int) – seed for shuffling.

Returns:

A tuple of two DataLoader instances: the first holds the remaining samples, the second holds the split fraction.

Raises:

ValueError – if split is not between 0 and 1.

Return type:

tuple[DataLoader[Data], DataLoader[Data]]
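The core of such a split is partitioning the dataset indices. Below is a minimal standard-library sketch of that step. Several details are assumptions: the helper name `split_indices`, the exclusion of the bounds 0 and 1, and rounding the second part down. Each index list could then be wrapped in `torch.utils.data.Subset` to build the two loaders.

```python
import random


def split_indices(
    dataset_len: int, split: float = 0.2, shuffle: bool = True, seed: int = 42
) -> tuple[list[int], list[int]]:
    """Hypothetical helper: partition range(dataset_len) into two index lists."""
    # Assumption: 0 and 1 themselves are rejected.
    if not 0 < split < 1:
        raise ValueError("split must be between 0 and 1.")
    indices = list(range(dataset_len))
    if shuffle:
        # A private seeded generator makes the split reproducible.
        random.Random(seed).shuffle(indices)
    n_second = int(dataset_len * split)  # rounded down (assumption)
    n_first = dataset_len - n_second
    return indices[:n_first], indices[n_first:]
```

With the defaults, a dataset of 10 samples is split into 8 and 2 indices, and the same seed always produces the same partition.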

take_from_dataset(dataset: Dataset[Data], n_samples: int = 1, preserve_order: bool = True, device: device = device(type='cpu')) → Data

Sample a batch of elements from a dataset and transfer them to a device.

Parameters:

dataset (Dataset[Data]) – the dataset to sample from.
n_samples (int) – the number of samples to take.
preserve_order (bool) – whether to take samples in order (True) or at random (False).
device (device) – device on which to store the samples.

Returns:

A single batch containing n_samples samples.

Return type:

Data
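A sketch of the same idea with standard PyTorch utilities: pick indices, collate the samples into one batch with `torch.utils.data.default_collate`, then move every tensor to the target device. It assumes that "in order" means the first `n_samples` elements and that random sampling is without replacement from Python's global RNG. The helper `move_to` is hypothetical, and this is not drytorch's implementation.

```python
import random
from typing import Any

import torch
from torch.utils.data import Dataset, default_collate


def move_to(obj: Any, device: torch.device) -> Any:
    """Hypothetical helper: recursively move tensors in nested containers."""
    if isinstance(obj, torch.Tensor):
        return obj.to(device)
    if isinstance(obj, tuple) and hasattr(obj, "_fields"):
        # Named tuples need their fields passed positionally.
        return type(obj)(*(move_to(x, device) for x in obj))
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_to(x, device) for x in obj)
    if isinstance(obj, dict):
        return {key: move_to(value, device) for key, value in obj.items()}
    return obj  # non-tensor leaves are returned unchanged


def take_from_dataset(
    dataset: Dataset,
    n_samples: int = 1,
    preserve_order: bool = True,
    device: torch.device = torch.device("cpu"),
) -> Any:
    if preserve_order:
        indices = range(n_samples)  # assumption: the first n_samples elements
    else:
        # Random indices without replacement, drawn from Python's global RNG.
        indices = random.sample(range(len(dataset)), n_samples)
    # Stack the individual samples into one batch, as a DataLoader would.
    batch = default_collate([dataset[i] for i in indices])
    return move_to(batch, device)
```

This is handy for quick checks, for instance to pass a small batch through a model before training starts.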