drytorch.lib.gradient_ops
Module containing gradient operations.
Functions
max_clipping | Standard clipping to the threshold value. |
mean_clipping | Clip to the mean value (effectively setting gradient to running mean). |
reciprocal_clipping | Reciprocal clipping as recommended in https://arxiv.org/pdf/2504.02507. |
Classes
ClipOperation | Abstract base class for gradient operations. |
ClippingCriterion | Criterion that detects when to clip and determines the clipping value. |
EMACriterion | Clipping criterion based on Exponential Moving Average. |
GradNormClipper | Gradient norm clipping strategy. |
GradParamNormalizer | Strategy that normalizes each parameter's gradient to unit norm. |
GradValueClipper | Gradient value clipping strategy. |
GradZScoreNormalizer | Gradient normalizing strategy using Z-score normalization. |
HistClipper | Global gradient clipping strategy that uses previous gradient statistics. |
NoOp | Placeholder performing no gradient action. |
ParamHistClipper | Gradient clipping strategy that keeps per-parameter statistics. |
StatsCollector | Initialize. |
ZStatCriterion | Clipping criterion based on the Z-statistic. |
- class ClippingCriterion
Bases: ABC
Criterion that detects when to clip and determines the clipping value.
- abstractmethod should_clip(value: float) → bool
Determine whether to clip gradients based on the current value.
- abstractmethod get_clip_value(value: float) → float
Calculate the clipping threshold based on current statistics.
- update(value: float) → None
Update internal statistics with a new observed value.
- Parameters:
value (float) – new gradient norm or value to incorporate.
- Return type:
None
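The three methods above can be pictured with a standalone sketch. It does not import drytorch: `CriterionSketch` only mirrors the documented method names and signatures, and `FixedThresholdCriterion` is a hypothetical subclass added for illustration. Treating `update` as a no-op by default is an assumption.

```python
from abc import ABC, abstractmethod


class CriterionSketch(ABC):
    """Stand-in mirroring the documented interface (not drytorch code)."""

    @abstractmethod
    def should_clip(self, value: float) -> bool:
        """Return True when the observed value calls for clipping."""

    @abstractmethod
    def get_clip_value(self, value: float) -> float:
        """Return the value that the gradient norm should be clipped to."""

    def update(self, value: float) -> None:
        """Incorporate a new observation; a no-op here (assumption)."""


class FixedThresholdCriterion(CriterionSketch):
    """Hypothetical criterion: clip whenever the value exceeds a constant."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def should_clip(self, value: float) -> bool:
        return value > self.threshold

    def get_clip_value(self, value: float) -> float:
        return self.threshold


criterion = FixedThresholdCriterion(threshold=1.0)
```

A stateful criterion would override `update` to track statistics of the observed values.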
- class EMACriterion(alpha: float = 0.98, r_thresh: float = 1.05, clipping_function: Callable[[float, float], float] = max_clipping)
Bases: ClippingCriterion
Clipping criterion based on Exponential Moving Average.
It uses only the running mean of gradient norms to detect outliers. It clips when the current norm exceeds the mean by a factor of r_thresh.
- clipping_function
function to determine clipping behavior.
- Type:
Callable[[float, float], float]
Initialize.
- Parameters:
- should_clip(value: float) → bool
Determine whether to clip gradients based on the current value.
- get_clip_value(value: float) → float
Calculate the clipping threshold based on current statistics.
- update(value: float) → None
Update internal statistics with a new observed value.
- Parameters:
value (float) – new gradient norm or value to incorporate.
- Return type:
None
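As a rough illustration of the detection rule described for EMACriterion, the sketch below keeps a running mean and flags a value that exceeds it by the factor `r_thresh`. It is not the library's code: `EMASketch` is a hypothetical name, and both the update formula (with `alpha` weighting the old mean) and seeding the mean with the first observation are assumptions.

```python
class EMASketch:
    """Hypothetical stand-in for an EMA-based clipping criterion."""

    def __init__(self, alpha: float = 0.98, r_thresh: float = 1.05) -> None:
        self.alpha = alpha
        self.r_thresh = r_thresh
        self.mean: float | None = None  # no observation yet

    def should_clip(self, value: float) -> bool:
        # Clip when the value exceeds the running mean by the factor r_thresh.
        return self.mean is not None and value > self.r_thresh * self.mean

    def update(self, value: float) -> None:
        if self.mean is None:
            self.mean = value  # assumption: the first value seeds the mean
        else:
            # Assumption: alpha is the weight given to the previous mean.
            self.mean = self.alpha * self.mean + (1 - self.alpha) * value


ema = EMASketch()
ema.update(1.0)
flags = [ema.should_clip(1.04), ema.should_clip(1.06)]
```

With the defaults, a norm 4% above the running mean passes and one 6% above it is flagged.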
- class GradNormClipper(threshold: float = 1)
Bases: ClipOperation
Gradient norm clipping strategy.
Initialize.
- Parameters:
threshold (float) – Maximum norm value of the clipped gradients.
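Norm clipping in general rescales the whole gradient when its norm exceeds the threshold. The sketch below shows that arithmetic on a plain list of floats using the L2 norm; `clip_by_norm` is a hypothetical helper and not part of drytorch, which operates on model parameters.

```python
import math


def clip_by_norm(grads: list[float], threshold: float = 1.0) -> list[float]:
    """Rescale grads so that their L2 norm does not exceed threshold."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= threshold:
        return list(grads)  # already within the limit: unchanged
    scale = threshold / norm
    return [g * scale for g in grads]


clipped = clip_by_norm([3.0, 4.0], threshold=1.0)
```

The direction of the gradient is preserved; only its length changes.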
- class GradParamNormalizer(*args, **kwargs)
Bases: GradientOpProtocol
Strategy that normalizes each parameter’s gradient to unit norm.
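Normalizing each parameter's gradient to unit norm can be sketched as follows. This is a plain-Python illustration under stated assumptions: `normalize_to_unit_norm` and the `eps` guard against division by zero are hypothetical, and the dictionary stands in for a model's named parameters.

```python
import math


def normalize_to_unit_norm(grad: list[float], eps: float = 1e-8) -> list[float]:
    """Divide a gradient by its L2 norm (eps is an assumed safeguard)."""
    norm = math.sqrt(sum(g * g for g in grad))
    return [g / (norm + eps) for g in grad]


# Each parameter is normalized on its own, not jointly with the others.
params = {"weight": [3.0, 4.0], "bias": [0.0, 2.0]}
normalized = {name: normalize_to_unit_norm(g) for name, g in params.items()}
```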
- class GradValueClipper(threshold: float = 1)
Bases: ClipOperation
Gradient value clipping strategy.
Initialize.
- Parameters:
threshold (float) – Maximum absolute value of the clipped gradients.
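Value clipping, by contrast with norm clipping, bounds each gradient entry independently. A minimal sketch on a plain list, where `clip_by_value` is a hypothetical helper and not a drytorch function:

```python
def clip_by_value(grads: list[float], threshold: float = 1.0) -> list[float]:
    """Clamp every entry to the interval [-threshold, threshold]."""
    return [max(-threshold, min(threshold, g)) for g in grads]


clipped_values = clip_by_value([-2.5, 0.3, 1.7])
```

Because entries are clamped one by one, the direction of the gradient can change.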
- class GradZScoreNormalizer(*args, **kwargs)
Bases: GradientOpProtocol
Gradient normalizing strategy using Z-score normalization.
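Z-score normalization conventionally subtracts the mean and divides by the standard deviation. The sketch below applies that to one gradient given as a list of floats. Whether the library standardizes per parameter, which standard deviation estimator it uses, and the `eps` term are all assumptions; `z_score_normalize` is a hypothetical name.

```python
import statistics


def z_score_normalize(grad: list[float], eps: float = 1e-8) -> list[float]:
    """Shift to zero mean and scale to (approximately) unit variance."""
    mean = statistics.fmean(grad)
    std = statistics.pstdev(grad)  # population standard deviation (assumption)
    return [(g - mean) / (std + eps) for g in grad]


standardized = z_score_normalize([1.0, 2.0, 3.0, 4.0])
```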
- class HistClipper(criterion: ClippingCriterion = <drytorch.lib.gradient_ops.ZStatCriterion object>, warmup_clip_strategy: GradientOpProtocol = <drytorch.lib.gradient_ops.GradNormClipper object>, n_warmup_steps: int = 20)
Bases: ClipOperation
Global gradient clipping strategy that uses previous gradient statistics.
The gradients’ norm is renormalized according to a clipping criterion.
- criterion
the clipping criterion to determine when and how to clip.
- warmup_clip_strategy
the clipping strategy used during warmup.
Initialize.
- Parameters:
criterion (ClippingCriterion) – the clipping criterion to determine when and how to clip.
warmup_clip_strategy (GradientOpProtocol) – the clipping strategy used during warmup.
n_warmup_steps (int) – the number of warmup steps to collect initial stats.
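The interplay between the warmup strategy and the criterion might look like the sketch below. It is a guess at the control flow and not the library's implementation: `RatioCriterion` and `hist_clip_step` are hypothetical, the warmup strategy is reduced to plain norm clipping, and updating the statistics with the clipped norm rather than the raw one is an assumption.

```python
import math


class RatioCriterion:
    """Hypothetical criterion: clip to the running mean when far above it."""

    def __init__(self, ratio: float = 2.0) -> None:
        self.ratio = ratio
        self.mean: float | None = None

    def update(self, value: float) -> None:
        self.mean = value if self.mean is None else 0.9 * self.mean + 0.1 * value

    def should_clip(self, value: float) -> bool:
        return self.mean is not None and value > self.ratio * self.mean

    def get_clip_value(self, value: float) -> float:
        return self.mean if self.mean is not None else value


def hist_clip_step(grads, criterion, step, n_warmup_steps=20, warmup_threshold=1.0):
    """Rescale grads to a target norm chosen by warmup rule or criterion."""
    norm = math.sqrt(sum(g * g for g in grads))
    if step < n_warmup_steps:
        target = min(norm, warmup_threshold)  # warmup: fixed-threshold clipping
    elif criterion.should_clip(norm):
        target = criterion.get_clip_value(norm)
    else:
        target = norm
    criterion.update(target)  # assumption: statistics see the clipped norm
    scale = target / norm if norm > 0 else 1.0
    return [g * scale for g in grads]


crit = RatioCriterion()
for step in range(3):
    hist_clip_step([0.3, 0.4], crit, step, n_warmup_steps=3)
spike = hist_clip_step([3.0, 4.0], crit, step=3, n_warmup_steps=3)
```

After three warmup steps with norm 0.5, the spike of norm 5 is scaled back to roughly the running mean.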
- class NoOp
Bases: GradientOpProtocol
Placeholder performing no gradient action.
- class ParamHistClipper(criterion: ClippingCriterion = <drytorch.lib.gradient_ops.ZStatCriterion object>, warmup_clip_strategy: GradientOpProtocol = <drytorch.lib.gradient_ops.GradNormClipper object>, n_warmup_steps: int = 20)
Bases: ClipOperation
Gradient clipping strategy that keeps per-parameter statistics.
The gradients’ norm is renormalized according to a clipping criterion.
- criterion
the clipping criterion to determine when and how to clip.
- warmup_clip_strategy
the clipping strategy used during warmup.
Initialize.
- Parameters:
criterion (ClippingCriterion) – the clipping criterion to determine when and how to clip.
warmup_clip_strategy (GradientOpProtocol) – the clipping strategy used during warmup.
n_warmup_steps (int) – the number of warmup steps to collect initial stats.
- class StatsCollector(max_samples: int)
Bases: object
Initialize.
Initialize warmup handler.
- Parameters:
max_samples (int) – the number of collected samples for completion.
- class ZStatCriterion(alpha: float = 0.97, z_thresh: float = 2.5, clipping_function: Callable[[float, float], float] = reciprocal_clipping)
Bases: ClippingCriterion
Clipping criterion based on the Z-statistic.
Tracks both mean and variance using exponential moving averages. The clipping threshold is on the Z-score (standardized deviation). See also https://arxiv.org/pdf/2504.02507.
- clipping_function
function to determine clipping behavior.
- Type:
Callable[[float, float], float]
Initialize.
- Parameters:
- get_clip_value(value: float) → float
Calculate the clipping threshold based on current statistics.
- update(value: float) → None
Update internal statistics with a new observed value.
- Parameters:
value (float) – new gradient norm or value to incorporate.
- Return type:
None
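To make the Z-statistic concrete, the sketch below tracks mean and variance with exponential moving averages and compares the standardized deviation of a new value to `z_thresh`. It is one plausible formulation, not the library's code: `ZStatSketch`, its `seed` helper, and the exact update formulas are assumptions.

```python
import math


class ZStatSketch:
    """Hypothetical stand-in for a Z-score based clipping criterion."""

    def __init__(self, alpha: float = 0.97, z_thresh: float = 2.5) -> None:
        self.alpha = alpha
        self.z_thresh = z_thresh
        self.mean = 0.0
        self.var = 0.0

    def seed(self, samples: list[float]) -> None:
        """Initialize statistics from warmup samples (hypothetical helper)."""
        self.mean = sum(samples) / len(samples)
        self.var = sum((s - self.mean) ** 2 for s in samples) / len(samples)

    def z_score(self, value: float) -> float:
        std = math.sqrt(self.var)
        return (value - self.mean) / std if std > 0 else 0.0

    def should_clip(self, value: float) -> bool:
        return self.z_score(value) > self.z_thresh

    def update(self, value: float) -> None:
        # Assumed EMA updates; the variance uses the mean before the update.
        self.var = self.alpha * self.var + (1 - self.alpha) * (value - self.mean) ** 2
        self.mean = self.alpha * self.mean + (1 - self.alpha) * value


zstat = ZStatSketch()
zstat.seed([1.0, 1.2, 0.8, 1.0])
decisions = [zstat.should_clip(1.1), zstat.should_clip(2.0)]
```

With these samples the standard deviation is about 0.14, so a norm of 1.1 is well within the threshold while 2.0 lies about seven standard deviations above the mean.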
- mean_clipping(zt: float, z_thresh: float) → float
Clip to the mean value (effectively setting gradient to running mean).
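The three clipping functions share the signature `(zt, z_thresh) -> float`. One way to read them, following the linked paper, is that each returns the Z-score to clip to, which is then mapped back to a norm as `mean + z * std`. The sketch below encodes that reading. The function bodies are assumptions, and the `_sketch` suffix marks them as illustrations and not the library's definitions.

```python
def max_clipping_sketch(zt: float, z_thresh: float) -> float:
    """Clip to the threshold itself."""
    return z_thresh


def mean_clipping_sketch(zt: float, z_thresh: float) -> float:
    """Clip to the running mean, i.e. a Z-score of zero."""
    return 0.0


def reciprocal_clipping_sketch(zt: float, z_thresh: float) -> float:
    """Larger spikes are clipped harder: the target shrinks as zt grows."""
    return z_thresh**2 / zt


mean, std = 1.0, 0.1  # illustrative running statistics
zt, z_thresh = 5.0, 2.5  # an observed spike and the threshold
targets = {
    fn.__name__: mean + fn(zt, z_thresh) * std
    for fn in (max_clipping_sketch, mean_clipping_sketch, reciprocal_clipping_sketch)
}
```

Under this reading, the reciprocal rule gives a target between the mean and the threshold, and it equals the threshold exactly when `zt == z_thresh`.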