---
jupytext:
  formats: ipynb,md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.19.1
kernelspec:
  display_name: .venv
  language: python
  name: python3
---

# Experiments and Runs

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nverchev/drytorch/blob/main/docs/tutorials/experiments_and_runs.ipynb)

## Defining an Experiment

In the DRYTorch framework, an experiment is a fully reproducible execution of code defined entirely by a configuration file. For example, this design implies that:

- a result obtained by modifying the configuration file (e.g., changing the optimizer) constitutes a new experiment instance.

- a parameter sweep (or grid search), when fully described within the configuration file, is considered a single experiment.
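
The two points above can be sketched with the standard library alone. The `SweepConfig` class and its fields are hypothetical and are not part of DRYTorch. Frozen dataclasses compare by value, so two configurations that differ in any field are distinct objects, while a sweep stored inside one configuration remains a single object.

```python
import dataclasses
import itertools


@dataclasses.dataclass(frozen=True)
class SweepConfig:
    """Hypothetical configuration that embeds a small grid search."""

    optimizer: str
    learning_rates: tuple[float, ...]
    batch_sizes: tuple[int, ...]

    def grid(self) -> list[tuple[float, int]]:
        """Enumerate every (learning rate, batch size) pair of the sweep."""
        return list(itertools.product(self.learning_rates, self.batch_sizes))


sgd_sweep = SweepConfig('sgd', (0.1, 0.01), (32, 64))
adam_sweep = dataclasses.replace(sgd_sweep, optimizer='adam')

# Changing the optimizer gives a different configuration: a new experiment.
configs_differ = sgd_sweep != adam_sweep

# The whole grid lives in one configuration: a single experiment.
sweep_size = len(sgd_sweep.grid())
```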

To define an experiment, subclass DRYTorch's `Experiment` class, parametrized by the type of its configuration.
The `Experiment` class requires a name that is unique for each instance and accepts tags and a logging directory as optional arguments, which are communicated to the trackers.

```{code-cell} ipython3
:tags: [skip-execution]

! uv pip install drytorch
```

```{code-cell} ipython3
import dataclasses

from drytorch import Experiment as GenericExperiment


@dataclasses.dataclass(frozen=True)
class SimpleConfig:
    """A simple configuration."""

    batch_size: int


class MyExperiment(GenericExperiment[SimpleConfig]):
    """Class for Simple Experiments."""


my_config = SimpleConfig(32)
first_experiment = MyExperiment(
    my_config,
    name='FirstExp',
    par_dir='experiments/',
    tags=[],
)
```

## Starting a Run
In the DRYTorch framework, a run is a single execution instance of an experiment's code.
Multiple runs of the same experiment are used to replicate and validate results, often using different seeds for the pseudorandom number generator.
Only one run can be active at a time.

You initiate a run instance using the `Experiment.create_run` method. This instance serves as a context manager for the experiment's execution code.

The run's ID is a timestamp by default, but you can specify a unique, descriptive name.
You can resume a run by specifying its name in `create_run`. With `resume=True` and no name, as in the example below, DRYTorch attempts to resume the last recorded run.

Note: DRYTorch maintains a run registry on the local disk to track and manage all run IDs and states. It also attempts to record the last commit hash when git is available.

```{code-cell} ipython3
def implement_experiment() -> None:
    """Here should be the code for the experiment."""


with first_experiment.create_run() as run:
    first_id = run.id
    implement_experiment()


with first_experiment.create_run(resume=True) as run:
    second_id = run.id
    implement_experiment()

if first_id != second_id:
    raise AssertionError('The resumed run should keep the id.')
```
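
To picture the rules described in this section (timestamp IDs, a single active run, resuming the last run), here is a minimal standard-library sketch. `ToyExperiment` is a hypothetical stand-in and does not reflect how DRYTorch implements its run registry, which is kept on disk rather than in memory.

```python
import contextlib
import datetime
from collections.abc import Iterator
from typing import Optional


class ToyExperiment:
    """Hypothetical stand-in that allows one active run at a time."""

    def __init__(self) -> None:
        self.run_ids: list[str] = []  # in-memory stand-in for a run registry
        self.active_id: Optional[str] = None

    @contextlib.contextmanager
    def create_run(self, resume: bool = False) -> Iterator[str]:
        """Yield a run id, reusing the last recorded one when resuming."""
        if self.active_id is not None:
            raise RuntimeError('A run is already active.')
        if resume and self.run_ids:
            run_id = self.run_ids[-1]
        else:
            # The counter suffix keeps ids unique within the same instant.
            stamp = datetime.datetime.now().isoformat()
            run_id = f'{stamp}-{len(self.run_ids)}'
            self.run_ids.append(run_id)
        self.active_id = run_id
        try:
            yield run_id
        finally:
            self.active_id = None  # always release, even after an error


toy = ToyExperiment()
with toy.create_run() as first:
    try:
        with toy.create_run():
            nested_allowed = True
    except RuntimeError:
        nested_allowed = False

with toy.create_run(resume=True) as second:
    resumed_same_id = first == second
```

The `try`/`finally` inside the generator is what guarantees that the active slot is released when the body of the `with` block raises.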

### Notebooks

For convenience, especially in interactive environments like notebooks, you
can start and stop a run directly, avoiding the context manager.

To do this, call the `Run.start` method and make sure to call `Run.stop` when finished.

Stopping the run explicitly is recommended; otherwise, DRYTorch will attempt to
clean up the metadata and exit gracefully when the Python session terminates.

```{code-cell} ipython3
run = first_experiment.create_run()
run.start()
run.stop()
```
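
When the start and stop calls sit in the same cell or script, a `try`/`finally` block is one way to make sure the run is stopped even if the experiment code fails. The sketch below uses a hypothetical `ToyRun` class that only mimics the start/stop pair, so that the snippet is self-contained; it is not DRYTorch's `Run`.

```python
class ToyRun:
    """Hypothetical stand-in exposing a start/stop pair."""

    def __init__(self) -> None:
        self.active = False

    def start(self) -> None:
        """Mark the run as active."""
        self.active = True

    def stop(self) -> None:
        """Mark the run as stopped."""
        self.active = False


toy_run = ToyRun()
toy_run.start()
try:
    was_active = toy_run.active
    # Experiment code goes here; a failure is simulated for illustration.
    raise ValueError('simulated failure in the experiment code')
except ValueError:
    failure_seen = True
finally:
    toy_run.stop()  # executed whether or not the code above raised
```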

## Global configuration

You can access the configuration directly from the `Experiment` class while a run is active. If no run is active, the operation raises an exception.

```{code-cell} ipython3
from drytorch.core import exceptions


def get_batch() -> int:
    """Retrieve the batch size setting."""
    return MyExperiment.get_config().batch_size


with first_experiment.create_run():
    get_batch()

try:
    get_batch()
except (exceptions.AccessOutsideScopeError, exceptions.NoActiveExperimentError):
    pass
else:
    raise AssertionError('Configuration accessed when no run is on.')
```
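
One way to picture class-level access that is only valid during a run is a class attribute set on entering the run and cleared on leaving it. The following standard-library sketch makes that assumption; `ScopedExperiment` and `NoActiveRunError` are hypothetical names, and DRYTorch's actual mechanism may differ.

```python
import contextlib
from collections.abc import Iterator
from typing import ClassVar, Optional


class NoActiveRunError(RuntimeError):
    """Hypothetical error raised when no run is active."""


class ScopedExperiment:
    """Hypothetical stand-in exposing its configuration at class level."""

    _active_config: ClassVar[Optional[dict[str, int]]] = None

    def __init__(self, config: dict[str, int]) -> None:
        self.config = config

    @contextlib.contextmanager
    def create_run(self) -> Iterator[None]:
        """Publish the configuration for the duration of the run."""
        type(self)._active_config = self.config
        try:
            yield
        finally:
            type(self)._active_config = None

    @classmethod
    def get_config(cls) -> dict[str, int]:
        """Return the active configuration or fail loudly."""
        if cls._active_config is None:
            raise NoActiveRunError('No run is active.')
        return cls._active_config


scoped = ScopedExperiment({'batch_size': 32})
with scoped.create_run():
    batch_inside = ScopedExperiment.get_config()['batch_size']

try:
    ScopedExperiment.get_config()
except NoActiveRunError:
    failed_outside = True
else:
    failed_outside = False
```

The benefit of this pattern is that helper functions can read settings without receiving the configuration as an argument, at the cost of depending on global state.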

## Registration

### Register model

DRYTorch discourages information leakage between runs to ensure reproducibility.

The framework explicitly prevents the construction of a `Model` instance based on a module registered in a previous run.
This isolation ensures that each run starts from a clean state defined solely by its configuration.

Registration to the current run happens when the `Model` is instantiated. If no experiment is running, instantiation fails with a `NoActiveExperimentError`.

To reuse the same module in another run, you must first unregister it with `registering.unregister_model`.

```{code-cell} ipython3
from torch import nn

from drytorch import Model
from drytorch.core import exceptions, registering


second_experiment = MyExperiment(
    my_config,
    name='SecondExp',
    par_dir='experiments/',
    tags=[],
)
module = nn.Linear(1, 1)

with first_experiment.create_run():
    first_model = Model(module)

try:
    second_model = Model(module)
except exceptions.NoActiveExperimentError:
    pass
else:
    raise AssertionError('Model instantiated when no experiment is running.')


with second_experiment.create_run():
    try:
        second_model = Model(module)
    except exceptions.ModuleAlreadyRegisteredError:
        pass
    else:
        raise AssertionError('Module registered through two Model instances.')
```

```{code-cell} ipython3
with second_experiment.create_run():
    registering.unregister_model(first_model)
    second_model = Model(first_model.module)
```
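
The register/unregister cycle shown above can be pictured as a table that maps each module to the run that owns it. This is a minimal sketch under that assumption; `ToyRegistry` and `AlreadyRegisteredError` are hypothetical and are not DRYTorch's implementation.

```python
class AlreadyRegisteredError(RuntimeError):
    """Hypothetical error for a module claimed twice."""


class ToyRegistry:
    """Hypothetical registry mapping each module to the run that owns it."""

    def __init__(self) -> None:
        self._owners: dict[int, str] = {}

    def register(self, module: object, run_id: str) -> None:
        """Claim a module for a run, refusing modules already claimed."""
        key = id(module)  # identity, not equality: the same object in memory
        if key in self._owners:
            raise AlreadyRegisteredError(f'Owned by {self._owners[key]}.')
        self._owners[key] = run_id

    def unregister(self, module: object) -> None:
        """Release a module so that another run can claim it."""
        self._owners.pop(id(module), None)


registry = ToyRegistry()
shared_module = object()  # stands in for an nn.Module
registry.register(shared_module, 'run-1')
try:
    registry.register(shared_module, 'run-2')
except AlreadyRegisteredError:
    second_claim_refused = True
else:
    second_claim_refused = False

registry.unregister(shared_module)
registry.register(shared_module, 'run-2')  # succeeds after the release
```

Keying on `id` keeps the sketch short, but Python may reuse an identifier after an object is garbage collected, so a production registry would need to account for object lifetimes.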

### Register actor

An **actor** is an object, like a trainer or a test class, that acts upon a model or produces logging from it.

Registration checks that the model and the actor belong to the same experiment. Actors from the library implementation register themselves when called.

```{code-cell} ipython3
import torch

from torch.utils.data import Dataset
from typing_extensions import override

from drytorch.lib.loading import DataLoader
from drytorch.lib.runners import ModelRunner


class MyDataset(Dataset[tuple[torch.Tensor, torch.Tensor]]):
    """Example dataset containing tensor with value one."""

    def __init__(self) -> None:
        """Initialize some dummy attributes."""
        super().__init__()
        self.empty_container = []
        self.none = None

    def __len__(self) -> int:
        """Size of the dataset."""
        return 1

    @override
    def __getitem__(self, index) -> tuple[torch.Tensor, torch.Tensor]:
        return torch.ones(1), torch.ones(1)


one_dataset: Dataset[tuple[torch.Tensor, torch.Tensor]] = MyDataset()

with second_experiment.create_run(resume=True):  # correctly resuming run
    loader = DataLoader(one_dataset, batch_size=1)
    model_caller = ModelRunner(second_model, loader=loader)
    model_caller()
```

```{code-cell} ipython3
with second_experiment.create_run():  # new run
    loader = DataLoader(one_dataset, batch_size=1)
    model_caller = ModelRunner(second_model, loader=loader)
    try:
        model_caller()
    except exceptions.ModuleNotRegisteredError:
        pass
    else:
        raise AssertionError('Actor ran on a model from a previous run.')
```

## Metadata Extraction

DRYTorch automatically documents the models and actors during registration by extracting a readable representation at runtime.
The metadata is then handled by the tracker. By default, when PyYAML 6.0 or later is installed, metadata is dumped in YAML format.

To better visualize it, we create an ad hoc tracker for this tutorial.

```{code-cell} ipython3
import functools
import pprint

from drytorch.core import log_events
from drytorch.core.tracking import Tracker


class MetadataVisualizer(Tracker):
    """Tracker that prints the metadata on the console."""

    @functools.singledispatchmethod
    @override
    def notify(self, event: log_events.Event) -> None:
        return super().notify(event)

    @notify.register
    def _(self, event: log_events.ModelRegistrationEvent) -> None:
        pprint.pp(event.architecture_repr)
        return super().notify(event)

    @notify.register
    def _(self, event: log_events.ActorRegistrationEvent) -> None:
        pprint.pp(event.metadata)
        return super().notify(event)


third_experiment = MyExperiment(
    my_config,
    name='ThirdExp',
    par_dir='experiments/',
    tags=[],
)

third_experiment.trackers.subscribe(MetadataVisualizer())
```
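
The tracker above relies on `functools.singledispatchmethod`, which selects a handler from the type annotation of the first argument after `self`. The pattern can be tried in isolation with the sketch below; the `Event`, `ModelEvent`, and `Recorder` classes are hypothetical and unrelated to DRYTorch's event classes.

```python
import dataclasses
import functools


@dataclasses.dataclass
class Event:
    """Hypothetical base event."""


@dataclasses.dataclass
class ModelEvent(Event):
    """Hypothetical event carrying a model description."""

    description: str


class Recorder:
    """Collects a note for every event it is notified about."""

    def __init__(self) -> None:
        self.seen: list[str] = []

    @functools.singledispatchmethod
    def notify(self, event: Event) -> None:
        """Fallback for events without a dedicated handler."""
        self.seen.append('ignored')

    @notify.register
    def _(self, event: ModelEvent) -> None:
        # Chosen because the annotation matches the event's type.
        self.seen.append(event.description)


recorder = Recorder()
recorder.notify(Event())
recorder.notify(ModelEvent('Linear(1, 1)'))
```

After these two calls, `recorder.seen` holds one entry from the fallback and one from the dedicated handler.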

### Model metadata
The readable representation of a Model is simply the native
representation of the wrapped `nn.Module`.

```{code-cell} ipython3
with third_experiment.create_run():  # new run
    third_model = Model(nn.Linear(1, 1))
```

### Actor Metadata
The readable representation of an actor documents not only the actor object itself but also all of its public attributes, recursively.

Private attributes (those whose names start with an underscore) are ignored, and so are attributes that have not been initialized (those evaluating to `None` or to an empty container).

```{code-cell} ipython3
with third_experiment.create_run(resume=True):  # correctly resuming run
    loader = DataLoader(one_dataset, batch_size=1)
    model_caller = ModelRunner(third_model, loader=loader)
    model_caller()
```
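
The filtering rules described above can be approximated in a few lines. The function below is only a sketch of the idea, not DRYTorch's extraction code: `extract_metadata`, its `max_depth` parameter, and the toy classes are all hypothetical.

```python
from collections.abc import Sized
from typing import Any


def extract_metadata(obj: Any, max_depth: int = 3) -> Any:
    """Recursively describe the public, initialized attributes of an object."""
    if isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, (list, tuple, set)):
        return [extract_metadata(item, max_depth) for item in obj]
    if isinstance(obj, dict):
        return {str(k): extract_metadata(v, max_depth) for k, v in obj.items()}
    if max_depth == 0 or not hasattr(obj, '__dict__'):
        return repr(obj)
    metadata: dict[str, Any] = {'class': type(obj).__name__}
    for name, value in vars(obj).items():
        if name.startswith('_'):
            continue  # private attribute
        if value is None or (isinstance(value, Sized) and len(value) == 0):
            continue  # not initialized: None or an empty container
        metadata[name] = extract_metadata(value, max_depth - 1)
    return metadata


class ToyLoader:
    """Hypothetical loader with public, private, and empty attributes."""

    def __init__(self) -> None:
        self.batch_size = 1
        self.empty_container = []
        self.none = None
        self._cache = 'hidden'


class ToyActor:
    """Hypothetical actor holding a loader."""

    def __init__(self) -> None:
        self.name = 'runner'
        self.loader = ToyLoader()


metadata = extract_metadata(ToyActor())
```

Here `metadata` keeps `name` and the loader's `batch_size`, while `empty_container`, `none`, and `_cache` are dropped, mirroring the dummy attributes of `MyDataset` in this tutorial.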