drytorch.lib.models

Module containing classes for wrapping a torch module and its optimizer.

Functions

count_params(params)

Count the number of parameters.

Classes

`Model`(module[, name, device, checkpoint, ...])	Wrapper for a torch.nn.Module class with extra information.
`ModelAverage`(torch_module, ...)	Bundle a torch.nn.Module and a torch.optim.swa_utils.AveragedModel.
`ModelOptimizer`(model, learning_schema)	Bundle the module and its optimizer.

class Model(module: ModuleProtocol[Input, Output], name: str = '', device: device | None = None, checkpoint: CheckpointProtocol | None = None, mixed_precision: bool = False, should_compile: bool = True, should_distribute: bool = True)[source]

Bases: CreatedAtMixin, ModelProtocol[Input, Output]

Wrapper for a torch.nn.Module class with extra information.

module

PyTorch module to optimize.

Type:: torch.nn.modules.module.Module

epoch

the number of epochs the model has been trained so far.

Type:: int

mixed_precision

whether to use mixed-precision computation.

Type:: bool

checkpoint

checkpoint manager.

Type:: drytorch.core.protocols.CheckpointProtocol

Initialize.

The should_distribute option assumes one accelerator per process and that each process's device has already been set.

Parameters:

module (Module) – PyTorch module with type annotations.
name (str) – the name of the model. Defaults to the class name.
device (torch.device | None) – the device on which to store the module's weights. Defaults to the accelerator if available, otherwise the CPU.
checkpoint (CheckpointProtocol | None) – object that saves the model state and, optionally, the optimizer state.
mixed_precision (bool) – whether to use mixed-precision computation.
should_compile (bool) – whether to compile the module at instantiation (only on Python < 3.14).
should_distribute (bool) – whether to wrap the module for data-parallel distributed settings.
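The defaults described above can be summarized as a small resolution step. This is a sketch under stated assumptions, not drytorch's code. `resolve_model_options` and `ResolvedOptions` are hypothetical names. The "class name" default is assumed to refer to the wrapped module's class. Distribution is modeled as a plain `is_distributed` flag, which stands in for whatever runtime check the library actually performs.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedOptions:
    """Hypothetical summary of the decisions taken at instantiation."""

    name: str
    compile_module: bool
    distribute_module: bool


def resolve_model_options(
    module: object,
    name: str = '',
    should_compile: bool = True,
    should_distribute: bool = True,
    is_distributed: bool = False,  # hypothetical: is a distributed run active?
    python_version: tuple[int, int] = sys.version_info[:2],
) -> ResolvedOptions:
    # Assumption: an empty name falls back to the wrapped module's class name.
    resolved_name = name or type(module).__name__
    # Compilation is requested only on Python versions below 3.14.
    compile_module = should_compile and python_version < (3, 14)
    # Assumption: wrapping is skipped when no distributed run is active.
    distribute_module = should_distribute and is_distributed
    return ResolvedOptions(resolved_name, compile_module, distribute_module)


class Encoder:  # hypothetical user module
    pass


opts = resolve_model_options(Encoder(), python_version=(3, 12))
print(opts)  # ResolvedOptions(name='Encoder', compile_module=True, distribute_module=False)
```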

__call__(inputs: Input) → Output[source]

Execute forward pass.

Parameters:: inputs (Input)
Return type:: Output

__del__()[source]: Unregister from the registry when deleted/garbage-collected.

property device: device: The device where the weights are stored.

property name: str: The name of the model.

prepare_module(module: Module) → Module[source]

Compile and distribute the module.

Parameters:: module (Module)
Return type:: Module

increment_epoch() → None[source]

Increment the epoch by 1.

Return type:: None

load_state(epoch=-1) → None[source]

Load the weights and epoch of the model.

Return type:: None

register() → None[source]

Register to the registry.

Return type:: None

save_state() → None[source]

Save the weights and epoch of the model.

Return type:: None
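The interplay between `epoch`, `increment_epoch`, `save_state`, and `load_state` can be sketched with an in-memory checkpoint. Everything here is hypothetical. `InMemoryCheckpoint` and `TinyModel` are illustrative stand-ins for `CheckpointProtocol` and `Model`, and the weights are a plain dict. Treating `epoch=-1` as "the latest saved epoch" is an assumption about the default; this page does not state it.

```python
import copy


class InMemoryCheckpoint:
    """Hypothetical checkpoint that keeps a copy of the state for each epoch."""

    def __init__(self) -> None:
        self._states: dict[int, dict[str, float]] = {}

    def save(self, epoch: int, weights: dict[str, float]) -> None:
        # Store a copy so later training does not alter the saved state.
        self._states[epoch] = copy.deepcopy(weights)

    def load(self, epoch: int = -1) -> tuple[int, dict[str, float]]:
        if not self._states:
            raise FileNotFoundError('no checkpoint has been saved')
        # Assumption: -1 selects the most recent saved epoch.
        if epoch == -1:
            epoch = max(self._states)
        return epoch, copy.deepcopy(self._states[epoch])


class TinyModel:
    """Hypothetical model tracking weights and the number of epochs trained."""

    def __init__(self, checkpoint: InMemoryCheckpoint) -> None:
        self.weights = {'w': 0.0}
        self.epoch = 0
        self.checkpoint = checkpoint

    def increment_epoch(self) -> None:
        self.epoch += 1

    def save_state(self) -> None:
        self.checkpoint.save(self.epoch, self.weights)

    def load_state(self, epoch: int = -1) -> None:
        # Restores both the weights and the epoch counter.
        self.epoch, self.weights = self.checkpoint.load(epoch)


model = TinyModel(InMemoryCheckpoint())
for _ in range(3):
    model.weights['w'] += 0.5  # stand-in for one epoch of training
    model.increment_epoch()
    model.save_state()

model.load_state(epoch=2)
print(model.epoch, model.weights)  # 2 {'w': 1.0}
model.load_state()
print(model.epoch, model.weights)  # 3 {'w': 1.5}
```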

unregister() → None[source]

Unregister from the registry.

Return type:: None

update_parameters() → None[source]

Update the parameters of the model.

Return type:: None

class ModelAverage(torch_module: ModuleProtocol[Input, Output], /, name: str = '', device: device | None = None, checkpoint: CheckpointProtocol = <drytorch.lib.checkpoints.LocalCheckpoint object>, mixed_precision: bool = False, avg_fn: Callable[[Tensor, Tensor, Tensor | int], Tensor] | None = None, multi_avg_fn: Callable[[tuple[Tensor, ...] | list[Tensor], tuple[Tensor, ...] | list[Tensor], Tensor | int], None] | None = None, use_buffers: bool = False)[source]

Bases: Model[Input, Output]

Bundle a torch.nn.Module and a torch.optim.swa_utils.AveragedModel.

The averaged module is used for the forward pass when in inference mode.

averaged_module

the averaged module.

Type:: torch.optim.swa_utils.AveragedModel

Initialize.

Parameters:

torch_module (p.ModuleProtocol[Input, Output]) – PyTorch module with type annotations.
name (str) – the name of the model. Defaults to the class name.
device (torch.device | None) – the device on which to store the module's weights. Defaults to CUDA when available, otherwise the CPU.
checkpoint (CheckpointProtocol) – object that saves the model state and, optionally, the optimizer state.
mixed_precision (bool) – whether to use mixed-precision computation. Defaults to False.
avg_fn (Callable[[Tensor, Tensor, Tensor | int], Tensor] | None) – see torch.optim.swa_utils.AveragedModel.
multi_avg_fn (Callable[[ParamList, ParamList, Tensor | int], None] | None) – see torch.optim.swa_utils.AveragedModel.
use_buffers (bool) – see torch.optim.swa_utils.AveragedModel.
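The `avg_fn` callable takes three arguments: the current averaged parameter, the current model parameter, and the number of models averaged so far. It returns the new averaged value. Below is a plain-float sketch of two common choices. The equal-weight running mean corresponds to the default update of `torch.optim.swa_utils.AveragedModel`. The exponential moving average corresponds to the function produced by `torch.optim.swa_utils.get_ema_avg_fn`. The function names here are hypothetical, and they work on floats rather than tensors so the example runs without PyTorch.

```python
from collections.abc import Callable

# Same shape as avg_fn: (averaged, current, num_averaged) -> new averaged value.
AvgFn = Callable[[float, float, int], float]


def swa_avg(averaged: float, current: float, num_averaged: int) -> float:
    """Equal-weight running mean over all snapshots seen so far."""
    return averaged + (current - averaged) / (num_averaged + 1)


def make_ema_avg(decay: float = 0.999) -> AvgFn:
    """Exponential moving average: recent snapshots weigh more."""

    def ema_avg(averaged: float, current: float, num_averaged: int) -> float:
        return decay * averaged + (1 - decay) * current

    return ema_avg


def average_snapshots(snapshots: list[float], avg_fn: AvgFn) -> float:
    # The first snapshot is copied as-is, as AveragedModel does on its first update.
    averaged = snapshots[0]
    for num_averaged, current in enumerate(snapshots[1:], start=1):
        averaged = avg_fn(averaged, current, num_averaged)
    return averaged


weights = [1.0, 2.0, 3.0, 6.0]
print(average_snapshots(weights, swa_avg))  # 3.0
print(average_snapshots(weights, make_ema_avg(decay=0.5)))  # 4.125
```

Assume your PyTorch version provides `get_ema_avg_fn`. Then passing `avg_fn=torch.optim.swa_utils.get_ema_avg_fn(0.999)` to `ModelAverage` should give an exponential moving average of the weights instead of the equal-weight mean, because `avg_fn` is documented above as the `AveragedModel` argument.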

__call__(inputs: Input) → Output[source]

Execute the forward pass.

Parameters:: inputs (Input)
Return type:: Output

update_parameters() → None[source]

Update the parameters of the model.

Return type:: None
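The two modules might cooperate roughly as follows; this can be sketched without PyTorch. In this hypothetical sketch, `AveragingWrapper` stands in for `ModelAverage` and a single scalar weight stands in for a module. Inference mode is modeled as a boolean `training` flag, though the real class may detect inference differently. `update_parameters` folds the live weight into the running average. `__call__` uses the averaged weight only when not training.

```python
class AveragingWrapper:
    """Hypothetical sketch: a live scalar weight plus its running average."""

    def __init__(self, weight: float) -> None:
        self.weight = weight           # updated by the optimizer
        self.averaged_weight = weight  # the first snapshot is copied as-is
        self.num_averaged = 1
        self.training = True

    def update_parameters(self) -> None:
        # Equal-weight running mean, like AveragedModel's default update.
        self.num_averaged += 1
        self.averaged_weight += (self.weight - self.averaged_weight) / self.num_averaged

    def __call__(self, x: float) -> float:
        # Training uses the live weight; inference uses the averaged one.
        w = self.weight if self.training else self.averaged_weight
        return w * x


model = AveragingWrapper(weight=1.0)
for new_weight in (2.0, 3.0):
    model.weight = new_weight  # stand-in for an optimizer step
    model.update_parameters()

print(model(10.0))  # 30.0 (live weight 3.0)
model.training = False
print(model(10.0))  # 20.0 (average of 1.0, 2.0 and 3.0)
```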

class ModelOptimizer(model: ModelProtocol[Input, Output], learning_schema: LearningProtocol)[source]

Bases: object

Bundle the module and its optimizer.

It supports separate learning rates for different parameter groups.

Initialize.

Parameters:

model (ModelProtocol[Input, Output]) – the model to be optimized.
learning_schema (LearningProtocol) – the learning schema for the optimizer.

property base_lr: float | dict[str, float]

Learning rate(s) for the module parameters.

Raises:: MissingParamError – if parameters are missing from the dictionary.
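Here is a sketch of how a `float | dict[str, float]` learning rate could be expanded into one optimizer group per named submodule, raising an error when the dictionary misses one. The group names, the `build_param_groups` helper, and the locally defined `MissingParamError` are hypothetical stand-ins, and strings stand in for parameter tensors. Each group is a dict with `params` and `lr` keys, which follows the per-group format that `torch.optim` optimizers accept.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any


class MissingParamError(Exception):
    """Local stand-in for drytorch's MissingParamError."""


def build_param_groups(
    named_params: Mapping[str, Sequence[Any]],
    base_lr: float | Mapping[str, float],
) -> list[dict[str, Any]]:
    if not isinstance(base_lr, Mapping):
        # A single float applies to every group.
        return [{'params': list(p), 'lr': float(base_lr)} for p in named_params.values()]
    # Every group needs its own entry when a dictionary is given.
    missing = set(named_params) - set(base_lr)
    if missing:
        raise MissingParamError(f'no learning rate for: {sorted(missing)}')
    return [
        {'params': list(params), 'lr': base_lr[name]}
        for name, params in named_params.items()
    ]


named = {'encoder': ['enc.w', 'enc.b'], 'head': ['head.w']}
print(build_param_groups(named, 1e-3))
print(build_param_groups(named, {'encoder': 1e-4, 'head': 1e-2}))
try:
    build_param_groups(named, {'encoder': 1e-4})
except MissingParamError as err:
    print('error:', err)
```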

get_opt_params() → list[_OptParams][source]

Return the optimizer parameter groups, with each group's learning rate updated according to the scheduler.

Return type:: list[_OptParams]

get_scheduled_lr(lr: float) → float[source]

Return the base learning rate as adjusted by the scheduler.

Parameters:: lr (float) – base learning rate.
Return type:: float
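`get_scheduled_lr` maps a base learning rate to its scheduled value. This page does not show the interface of `SchedulerProtocol`. The sketch below therefore assumes a hypothetical callable that takes `(base_lr, epoch)`, with the epoch passed in explicitly. `ExponentialSchedule`, `CosineSchedule`, and the free function `get_scheduled_lr` are illustrative, not drytorch classes.

```python
import math
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class ExponentialSchedule:
    """Multiply the learning rate by `gamma` every epoch."""

    gamma: float = 0.9

    def __call__(self, base_lr: float, epoch: int) -> float:
        return base_lr * self.gamma**epoch


@dataclass(frozen=True)
class CosineSchedule:
    """Anneal from base_lr to min_lr over decay_steps epochs, then stay at min_lr."""

    decay_steps: int
    min_lr: float = 0.0

    def __call__(self, base_lr: float, epoch: int) -> float:
        progress = min(epoch, self.decay_steps) / self.decay_steps
        cosine = 0.5 * (1 + math.cos(math.pi * progress))
        return self.min_lr + (base_lr - self.min_lr) * cosine


def get_scheduled_lr(lr: float, epoch: int, scheduler: Callable[[float, int], float]) -> float:
    # Hypothetical free-function version: the epoch is an explicit argument here.
    return scheduler(lr, epoch)


cosine = CosineSchedule(decay_steps=10)
print(get_scheduled_lr(0.1, 0, cosine))   # 0.1
print(get_scheduled_lr(0.1, 5, cosine))   # approximately 0.05
print(get_scheduled_lr(0.1, 20, cosine))  # 0.0
```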

load(epoch: int = -1) → None[source]

Load model and optimizer state from a checkpoint.

Parameters:: epoch (int)
Return type:: None

update_learning_rate(base_lr: float | dict[str, float] | None = None, scheduler: SchedulerProtocol | None = None) → None[source]

Recalculate the learning rates for the current epoch.

It updates the learning rate of each parameter group in the optimizer based on the given learning rate(s) and scheduler.

Parameters:

base_lr (float | dict[str, float] | None) – initial learning rates, either per named parameter or as a single global value. Defaults to keeping the original learning rates.
scheduler (SchedulerProtocol | None) – scheduler for the learning rates. Defaults to keeping the original scheduler.

Return type:: None
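Here is a sketch of the recalculation step. Each optimizer group's learning rate is reset to its scheduled value, and passing `None` keeps the stored base rate or scheduler. It mimics the list-of-dicts layout of `param_groups` in `torch.optim` optimizers, with a `name` key added for lookup. `LRManager`, its attributes, and the `(base_lr, epoch)` scheduler signature are all hypothetical.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any

Scheduler = Callable[[float, int], float]  # hypothetical: (base_lr, epoch) -> lr


def constant(base_lr: float, epoch: int) -> float:
    return base_lr


def halve_every_epoch(base_lr: float, epoch: int) -> float:
    return base_lr * 0.5**epoch


class LRManager:
    """Hypothetical holder of base rates, a scheduler, and optimizer groups."""

    def __init__(self, group_names: list[str], base_lr: float | dict[str, float],
                 scheduler: Scheduler = constant) -> None:
        self.base_lr = base_lr
        self.scheduler = scheduler
        self.epoch = 0
        # One dict per group, like torch.optim.Optimizer.param_groups.
        self.param_groups: list[dict[str, Any]] = [
            {'name': name, 'lr': 0.0} for name in group_names
        ]
        self.update_learning_rate()

    def _base_for(self, name: str) -> float:
        if isinstance(self.base_lr, dict):
            return self.base_lr[name]
        return self.base_lr

    def update_learning_rate(self, base_lr: float | dict[str, float] | None = None,
                             scheduler: Scheduler | None = None) -> None:
        # None means "keep what was configured before".
        if base_lr is not None:
            self.base_lr = base_lr
        if scheduler is not None:
            self.scheduler = scheduler
        for group in self.param_groups:
            group['lr'] = self.scheduler(self._base_for(group['name']), self.epoch)


lrs = LRManager(['encoder', 'head'], {'encoder': 1e-3, 'head': 1e-2})
lrs.epoch = 2  # stand-in for the model's epoch counter advancing
lrs.update_learning_rate(scheduler=halve_every_epoch)
print([g['lr'] for g in lrs.param_groups])  # [0.00025, 0.0025]
lrs.update_learning_rate(base_lr=0.1)  # same rate for every group, scheduler kept
print([g['lr'] for g in lrs.param_groups])  # [0.025, 0.025]
```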

optimize(loss_value: Tensor)[source]

Optimize the model by backpropagating the loss value.

Parameters:: loss_value (Tensor) – the output tensor for the loss.

save() → None[source]

Save model and optimizer state in a checkpoint.

Return type:: None