zamba.models.slowfast_models¶
Classes¶
SlowFast (ZambaVideoClassificationLightningModule)
¶
Pretrained SlowFast model for fine-tuning with the following architecture:
Input -> SlowFast Base (including trainable Backbone) -> Res Basic Head -> Output
Attributes:
Name | Type | Description |
---|---|---|
backbone |
torch.nn.Module |
When scheduling the backbone to train with the
|
base |
torch.nn.Module |
The entire model prior to the head. |
head |
torch.nn.Module |
The trainable head. |
_backbone_output_dim |
int |
Dimensionality of the backbone output (and head input). |
Attributes¶
CHECKPOINT_HYPER_PARAMS_KEY
inherited
¶
CHECKPOINT_HYPER_PARAMS_NAME
inherited
¶
CHECKPOINT_HYPER_PARAMS_TYPE
inherited
¶
T_destination
inherited
¶
automatic_optimization: bool
inherited
property
writable
¶
If set to False, you are responsible for calling .backward(), .step(), and .zero_grad().
current_epoch: int
inherited
property
readonly
¶
The current epoch in the Trainer. If no Trainer is attached, this property is 0.
datamodule: Any
inherited
property
writable
¶
device: Union[str, torch.device]
inherited
property
readonly
¶
dtype: Union[str, torch.dtype]
inherited
property
writable
¶
dump_patches: bool
inherited
¶
This allows better backward-compatibility (BC) support for load_state_dict. In state_dict, the version number will be saved in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of state dict. See _load_from_state_dict on how to use this information in loading.
If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module's _load_from_state_dict method can compare the version number and do appropriate changes if the state dict is from before the change.
example_input_array: Any
inherited
property
writable
¶
The example input array is a specification of what the module can consume in the forward method.
The return type is interpreted as follows:
- Single tensor: It is assumed the model takes a single argument, i.e.,
model.forward(model.example_input_array)
- Tuple: The input array should be interpreted as a sequence of positional arguments, i.e.,
model.forward(*model.example_input_array)
- Dict: The input array represents named keyword arguments, i.e.,
model.forward(**model.example_input_array)
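The three interpretations above can be sketched as a small dispatch function. This is an illustration only, not Lightning's implementation: `call_with_example` and `toy_forward` are hypothetical names, and plain numbers stand in for tensors.

```python
from typing import Any, Callable


def call_with_example(forward: Callable[..., Any], example: Any) -> Any:
    """Pass an example input to ``forward`` following the three rules above."""
    if isinstance(example, tuple):
        # Tuple: a sequence of positional arguments.
        return forward(*example)
    if isinstance(example, dict):
        # Dict: named keyword arguments.
        return forward(**example)
    # Anything else (e.g. a single tensor) is passed as the only argument.
    return forward(example)


def toy_forward(x, scale=1):
    # Hypothetical stand-in for model.forward.
    return x * scale


single = call_with_example(toy_forward, 3)
positional = call_with_example(toy_forward, (3, 2))
keyword = call_with_example(toy_forward, {"x": 3, "scale": 4})
```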
global_rank: int
inherited
property
readonly
¶
The index of the current process across all nodes and devices.
global_step: int
inherited
property
readonly
¶
Total training batches seen across all epochs. If no Trainer is attached, this property is 0.
hparams: Union[pytorch_lightning.utilities.parsing.AttributeDict, dict, argparse.Namespace]
inherited
property
readonly
¶
hparams_initial: AttributeDict
inherited
property
readonly
¶
loaded_optimizer_states_dict: dict
inherited
property
writable
¶
local_rank: int
inherited
property
readonly
¶
The index of the current process within a single node.
logger
inherited
property
readonly
¶
Reference to the logger object in the Trainer.
model_size: float
inherited
property
readonly
¶
The model's size in megabytes. The computation includes everything in the state_dict, i.e., by default the parameters and buffers.
on_gpu
inherited
property
readonly
¶
Returns True
if this model is currently located on a GPU.
Useful to set flags around the LightningModule for different CPU vs GPU behavior.
truncated_bptt_steps: int
inherited
property
writable
¶
Enables Truncated Backpropagation Through Time in the Trainer when set to a positive integer. It represents the number of times training_step gets called before backpropagation. If this is > 0, training_step receives an additional argument hiddens and is expected to return a hidden state.
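To make the idea concrete, here is a minimal sketch of how a time-ordered sequence might be cut into chunks while a hidden state is carried from one chunk to the next. It is not Lightning's code: `split_for_tbptt` is a hypothetical helper, and an integer running sum stands in for the hidden state.

```python
from typing import List, Sequence


def split_for_tbptt(sequence: Sequence[int], steps: int) -> List[List[int]]:
    """Split a time-ordered sequence into chunks of at most ``steps`` items."""
    if steps <= 0:
        # A non-positive value leaves the sequence as a single chunk.
        return [list(sequence)]
    return [list(sequence[i:i + steps]) for i in range(0, len(sequence), steps)]


chunks = split_for_tbptt(list(range(7)), steps=3)

# Carry a running "hidden state" across chunks, as training_step would
# receive and return ``hiddens``.
hiddens = 0
for chunk in chunks:
    hiddens = hiddens + sum(chunk)
```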
Methods¶
__init__(self, backbone_mode: str = 'train', post_backbone_dropout: Optional[float] = None, output_with_global_average: bool = True, head_dropout_rate: Optional[float] = None, head_hidden_layer_sizes: Optional[Tuple[int]] = None, finetune_from: Union[str, os.PathLike] = None, **kwargs)
special
¶
Initializes the SlowFast model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
backbone_mode |
str |
If "eval", treat the backbone as a feature extractor and set to evaluation mode in all forward passes. |
'train' |
post_backbone_dropout |
float |
Dropout that operates on the output of the backbone + pool (before the fully-connected layer in the head). |
None |
output_with_global_average |
bool |
If True, apply an adaptive average pooling operation after the fully-connected layer in the head. |
True |
head_dropout_rate |
float |
Optional dropout rate applied after backbone and between projection layers in the head. |
None |
head_hidden_layer_sizes |
tuple of int |
If not None, the size of hidden layers in the head multilayer perceptron. |
None |
finetune_from |
pathlike or str |
If not None, load an existing model from this path and resume training from it. |
None |
Source code in zamba/models/slowfast_models.py
def __init__(
    self,
    backbone_mode: str = "train",
    post_backbone_dropout: Optional[float] = None,
    output_with_global_average: bool = True,
    head_dropout_rate: Optional[float] = None,
    head_hidden_layer_sizes: Optional[Tuple[int]] = None,
    finetune_from: Optional[Union[os.PathLike, str]] = None,
    **kwargs,
):
    """Initializes the SlowFast model.

    Args:
        backbone_mode (str): If "eval", treat the backbone as a feature extractor
            and set to evaluation mode in all forward passes.
        post_backbone_dropout (float, optional): Dropout that operates on the output of the
            backbone + pool (before the fully-connected layer in the head).
        output_with_global_average (bool): If True, apply an adaptive average pooling
            operation after the fully-connected layer in the head.
        head_dropout_rate (float, optional): Optional dropout rate applied after backbone and
            between projection layers in the head.
        head_hidden_layer_sizes (tuple of int): If not None, the size of hidden layers in the
            head multilayer perceptron.
        finetune_from (pathlike or str, optional): If not None, load an existing model from
            the path and resume training from an existing model.
    """
    super().__init__(**kwargs)

    if finetune_from is None:
        self.initialize_from_torchub()
    else:
        model = self.load_from_checkpoint(finetune_from)
        self._backbone_output_dim = model.head.proj.in_features
        self.backbone = model.backbone
        self.base = model.base

    for param in self.base.parameters():
        param.requires_grad = False

    head = ResNetBasicHead(
        proj=build_multilayer_perceptron(
            self._backbone_output_dim,
            head_hidden_layer_sizes,
            self.num_classes,
            activation=torch.nn.ReLU,
            dropout=head_dropout_rate,
            output_activation=None,
        ),
        activation=None,
        pool=None,
        dropout=None
        if post_backbone_dropout is None
        else torch.nn.Dropout(post_backbone_dropout),
        output_pool=torch.nn.AdaptiveAvgPool3d(1),
    )

    self.backbone_mode = backbone_mode
    self.head = head

    self.save_hyperparameters(
        "backbone_mode",
        "head_dropout_rate",
        "head_hidden_layer_sizes",
        "output_with_global_average",
        "post_backbone_dropout",
    )
add_module(self, name: str, module: Optional[Module]) -> None
inherited
¶
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name |
string |
name of the child module. The child module can be accessed from this module using the given name |
required |
module |
Module |
child module to be added to the module. |
required |
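The name checks that add_module performs (visible in the listing that follows) can be sketched without PyTorch. The helper below is a simplified illustration under stated assumptions: `check_module_name` is a hypothetical function, `modules` stands in for `self._modules`, and `attributes` stands in for the names that `hasattr(self, name)` would find.

```python
from typing import Dict, Optional, Set


def check_module_name(
    name: object, modules: Dict[str, Optional[object]], attributes: Set[str]
) -> str:
    """Return ``name`` if it is acceptable as a child module name, else raise."""
    if not isinstance(name, str):
        raise TypeError("module name should be a string. Got " + type(name).__name__)
    if name in attributes and name not in modules:
        # The name is already used by something that is not a registered module.
        raise KeyError("attribute '" + name + "' already exists")
    if "." in name:
        raise KeyError("module name cannot contain a dot, got: " + name)
    if name == "":
        raise KeyError("module name cannot be an empty string")
    return name


# "head" is free, so it is accepted; re-registering an existing module is also fine.
ok = check_module_name("head", modules={}, attributes={"forward"})
again = check_module_name("head", modules={"head": None}, attributes={"head"})
```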
Source code in zamba/models/slowfast_models.py
def add_module(self, name: str, module: Optional['Module']) -> None:
r"""Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
Args:
name (string): name of the child module. The child module can be
accessed from this module using the given name
module (Module): child module to be added to the module.
"""
if not isinstance(module, Module) and module is not None:
raise TypeError("{} is not a Module subclass".format(
torch.typename(module)))
elif not isinstance(name, torch._six.string_classes):
raise TypeError("module name should be a string. Got {}".format(
torch.typename(name)))
elif hasattr(self, name) and name not in self._modules:
raise KeyError("attribute '{}' already exists".format(name))
elif '.' in name:
raise KeyError("module name can't contain \".\", got: {}".format(name))
elif name == '':
raise KeyError("module name can't be empty string \"\"")
self._modules[name] = module
add_to_queue(self, queue: torch.multiprocessing.SimpleQueue) -> None
inherited
¶
Appends the trainer.callback_metrics dictionary to the given queue.
To avoid issues with memory sharing, we cast the data to numpy.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
queue |
torch.multiprocessing.SimpleQueue |
the instance of the queue to append the data. |
required |
Source code in zamba/models/slowfast_models.py
def add_to_queue(self, queue: torch.multiprocessing.SimpleQueue) -> None:
"""
Appends the :attr:`trainer.callback_metrics` dictionary to the given queue.
To avoid issues with memory sharing, we cast the data to numpy.
Args:
queue: the instance of the queue to append the data.
"""
callback_metrics: dict = apply_to_collection(
self.trainer.callback_metrics, torch.Tensor, lambda x: x.cpu().numpy()
) # send as numpy to avoid issues with memory sharing
queue.put(callback_metrics)
aggregate_step_outputs(outputs: Dict[str, numpy.ndarray]) -> Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
inherited
¶
Source code in zamba/models/slowfast_models.py
@staticmethod
def aggregate_step_outputs(
    outputs: Dict[str, np.ndarray]
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    y_true = np.vstack([output["y_true"] for output in outputs])
    y_pred = np.vstack([output["y_pred"] for output in outputs])
    y_proba = np.vstack([output["y_proba"] for output in outputs])
    return y_true, y_pred, y_proba
all_gather(self, data: Union[torch.Tensor, Dict, List, Tuple], group: Optional[Any] = None, sync_grads: bool = False)
inherited
¶
Allows users to call self.all_gather() from the LightningModule, thus making the all_gather operation accelerator agnostic. all_gather is a function provided by accelerators to gather a tensor from several distributed processes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data |
Union[torch.Tensor, Dict, List, Tuple] |
int, float, tensor of shape (batch, ...), or a (possibly nested) collection thereof. |
required |
group |
Optional[Any] |
the process group to gather results from. Defaults to all processes (world) |
None |
sync_grads |
bool |
flag that allows users to synchronize gradients for the all_gather operation |
False |
Returns:
Type | Description |
---|---|
A tensor of shape (world_size, batch, ...), or if the input was a collection the output will also be a collection with tensors of this shape. |
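The shape contract described above can be illustrated with nested lists instead of tensors. This is a single-process simulation, not a distributed call: `simulate_all_gather` is a hypothetical helper that takes one value per process and shows how a leading world_size axis appears, including for collections.

```python
from typing import Any, List


def simulate_all_gather(per_process: List[Any]) -> Any:
    """Combine one value per process into a result with a leading world_size axis."""
    first = per_process[0]
    if isinstance(first, dict):
        # Collections are gathered leaf by leaf, keeping the same structure.
        return {key: simulate_all_gather([p[key] for p in per_process]) for key in first}
    # Leaves are stacked along a new first axis, one entry per process.
    return list(per_process)


# Two processes (world_size = 2), each holding a batch of three values.
gathered = simulate_all_gather([[1, 2, 3], [4, 5, 6]])
gathered_dict = simulate_all_gather([{"loss": [0.5]}, {"loss": [0.7]}])
```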
Source code in zamba/models/slowfast_models.py
def all_gather(
self, data: Union[torch.Tensor, Dict, List, Tuple], group: Optional[Any] = None, sync_grads: bool = False
):
r"""
Allows users to call ``self.all_gather()`` from the LightningModule, thus making the ``all_gather`` operation
accelerator agnostic. ``all_gather`` is a function provided by accelerators to gather a tensor from several
distributed processes.
Args:
data: int, float, tensor of shape (batch, ...), or a (possibly nested) collection thereof.
group: the process group to gather results from. Defaults to all processes (world)
sync_grads: flag that allows users to synchronize gradients for the all_gather operation
Return:
A tensor of shape (world_size, batch, ...), or if the input was a collection
the output will also be a collection with tensors of this shape.
"""
group = group if group is not None else torch.distributed.group.WORLD
all_gather = self.trainer.accelerator.all_gather
data = convert_to_tensors(data, device=self.device)
return apply_to_collection(data, torch.Tensor, all_gather, group=group, sync_grads=sync_grads)
apply(self: ~T, fn: Callable[[Module], NoneType]) -> ~T
inherited
¶
Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also nn-init-doc).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fn |
Module -> None |
function to be applied to each submodule |
required |
Returns:
Type | Description |
---|---|
Module |
self |
Example::
>>> @torch.no_grad()
>>> def init_weights(m):
>>> print(m)
>>> if type(m) == nn.Linear:
>>> m.weight.fill_(1.0)
>>> print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
Source code in zamba/models/slowfast_models.py
def apply(self: T, fn: Callable[['Module'], None]) -> T:
r"""Applies ``fn`` recursively to every submodule (as returned by ``.children()``)
as well as self. Typical use includes initializing the parameters of a model
(see also :ref:`nn-init-doc`).
Args:
fn (:class:`Module` -> None): function to be applied to each submodule
Returns:
Module: self
Example::
>>> @torch.no_grad()
>>> def init_weights(m):
>>> print(m)
>>> if type(m) == nn.Linear:
>>> m.weight.fill_(1.0)
>>> print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
"""
for module in self.children():
module.apply(fn)
fn(self)
return self
backward(self, loss: Tensor, optimizer: Optional[torch.optim.optimizer.Optimizer], optimizer_idx: Optional[int], *args, **kwargs) -> None
inherited
¶
Called to perform backward on the loss returned in training_step.
Override this hook with your own implementation if you need to.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
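loss |
Tensor |
The loss tensor returned by training_step. If gradient accumulation is used, the loss here holds the normalized value (scaled by 1 / accumulation steps). |
required |
optimizer |
Optional[torch.optim.optimizer.Optimizer] |
Current optimizer being used. None if using manual optimization. |
required |
optimizer_idx |
Optional[int] |
Index of the current optimizer being used. None if using manual optimization. |
required |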
Example::
def backward(self, loss, optimizer, optimizer_idx):
loss.backward()
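When gradient accumulation is used, the loss passed to this hook is scaled by 1 / accumulation steps. The arithmetic below is a sketch of why that scaling makes the accumulated gradient equal to the mean of the per-batch gradients. Plain floats stand in for gradients, and `accumulated_gradient` is a hypothetical helper, not part of Lightning.

```python
from typing import List


def accumulated_gradient(per_batch_grads: List[float]) -> float:
    """Sum gradients of losses that were each scaled by 1 / accumulation steps."""
    steps = len(per_batch_grads)
    total = 0.0
    for grad in per_batch_grads:
        # Scaling a loss by 1 / steps scales its gradient by the same factor.
        total += grad / steps
    return total


grads = [2.0, 4.0, 6.0]
result = accumulated_gradient(grads)
mean_grad = sum(grads) / len(grads)
```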
Source code in zamba/models/slowfast_models.py
def backward(
self, loss: Tensor, optimizer: Optional[Optimizer], optimizer_idx: Optional[int], *args, **kwargs
) -> None:
"""
Called to perform backward on the loss returned in :meth:`training_step`.
Override this hook with your own implementation if you need to.
Args:
loss: The loss tensor returned by :meth:`training_step`. If gradient accumulation is used, the loss here
holds the normalized value (scaled by 1 / accumulation steps).
optimizer: Current optimizer being used. ``None`` if using manual optimization.
optimizer_idx: Index of the current optimizer being used. ``None`` if using manual optimization.
Example::
def backward(self, loss, optimizer, optimizer_idx):
loss.backward()
"""
loss.backward(*args, **kwargs)
bfloat16(self: ~T) -> ~T
inherited
¶
Casts all floating point parameters and buffers to bfloat16 datatype.
Note: This method modifies the module in-place.
Returns:
Type | Description |
---|---|
Module |
self |
Source code in zamba/models/slowfast_models.py
def bfloat16(self: T) -> T:
r"""Casts all floating point parameters and buffers to ``bfloat16`` datatype.
.. note::
This method modifies the module in-place.
Returns:
Module: self
"""
return self._apply(lambda t: t.bfloat16() if t.is_floating_point() else t)
buffers(self, recurse: bool = True) -> Iterator[torch.Tensor]
inherited
¶
Returns an iterator over module buffers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
recurse |
bool |
if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. |
True |
Yields:
Type | Description |
---|---|
torch.Tensor |
module buffer |
Example::
>>> for buf in model.buffers():
>>> print(type(buf), buf.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
Source code in zamba/models/slowfast_models.py
def buffers(self, recurse: bool = True) -> Iterator[Tensor]:
r"""Returns an iterator over module buffers.
Args:
recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that
are direct members of this module.
Yields:
torch.Tensor: module buffer
Example::
>>> for buf in model.buffers():
>>> print(type(buf), buf.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
"""
for _, buf in self.named_buffers(recurse=recurse):
yield buf
children(self) -> Iterator[Module]
inherited
¶
Returns an iterator over immediate children modules.
Yields:
Type | Description |
---|---|
Module |
a child module |
Source code in zamba/models/slowfast_models.py
def children(self) -> Iterator['Module']:
r"""Returns an iterator over immediate children modules.
Yields:
Module: a child module
"""
for name, module in self.named_children():
yield module
compute_and_log_metrics(self, y_true: ndarray, y_pred: ndarray, y_proba: ndarray, subset: str)
inherited
¶
Source code in zamba/models/slowfast_models.py
def compute_and_log_metrics(
    self, y_true: np.ndarray, y_pred: np.ndarray, y_proba: np.ndarray, subset: str
):
    self.log(f"{subset}_macro_f1", f1_score(y_true, y_pred, average="macro", zero_division=0))

    # if only two classes, skip top_k accuracy since not enough classes
    if self.num_classes > 2:
        for k in DEFAULT_TOP_K:
            if k < self.num_classes:
                self.log(
                    f"{subset}_top_{k}_accuracy",
                    top_k_accuracy_score(
                        y_true.argmax(
                            axis=1
                        ),  # top k accuracy only supports single label case
                        y_proba,
                        labels=np.arange(y_proba.shape[1]),
                        k=k,
                    ),
                )
    else:
        self.log(f"{subset}_accuracy", accuracy_score(y_true, y_pred))

    for metric_name, label, metric in compute_species_specific_metrics(
        y_true, y_pred, self.species
    ):
        self.log(f"species/{subset}_{metric_name}/{label}", metric)
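As a rough guide to which metric names this method logs, the sketch below mirrors only the naming and branching logic, reading the `else` as paired with the `num_classes > 2` check, as the code comment suggests. It computes no scores and omits the species-specific metrics. `logged_metric_names` is a hypothetical helper, and the tuple `(1, 3, 5, 10)` is an assumed stand-in for DEFAULT_TOP_K, whose value is not shown on this page.

```python
from typing import List, Sequence


def logged_metric_names(subset: str, num_classes: int, top_k: Sequence[int]) -> List[str]:
    """List the top-level metric names logged for a given subset."""
    names = [f"{subset}_macro_f1"]
    if num_classes > 2:
        # Only values of k strictly below the number of classes are logged.
        names += [f"{subset}_top_{k}_accuracy" for k in top_k if k < num_classes]
    else:
        # With two classes or fewer, plain accuracy is logged instead.
        names.append(f"{subset}_accuracy")
    return names


# ASSUMPTION: (1, 3, 5, 10) is an illustrative stand-in for DEFAULT_TOP_K.
multi = logged_metric_names("val", num_classes=4, top_k=(1, 3, 5, 10))
binary = logged_metric_names("val", num_classes=2, top_k=(1, 3, 5, 10))
```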
configure_callbacks(self)
inherited
¶
Configure model-specific callbacks.
When the model gets attached, e.g., when .fit() or .test() gets called, the list returned here will be merged with the list of callbacks passed to the Trainer's callbacks argument.
If a callback returned here has the same type as one or several callbacks already present in
the Trainer's callbacks list, it will take priority and replace them.
In addition, Lightning will make sure pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint callbacks run last.
Returns:
Type | Description |
---|---|
A list of callbacks which will extend the list of callbacks in the Trainer. |
Example::
def configure_callbacks(self):
early_stop = EarlyStopping(monitor="val_acc", mode="max")
checkpoint = ModelCheckpoint(monitor="val_loss")
return [early_stop, checkpoint]
Note:
Certain callback methods like pytorch_lightning.callbacks.base.Callback.on_init_start will never be invoked on the new callbacks returned here.
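The merge rules described for this hook (same-type replacement, checkpoint callbacks last) can be sketched with stub classes. This is an illustration of the described behavior, not Lightning's implementation: every class and function below is a hypothetical stand-in.

```python
from typing import List


class StubCallback:
    """Hypothetical stand-in for a callback; ``source`` records who supplied it."""

    def __init__(self, source: str) -> None:
        self.source = source


class StubEarlyStopping(StubCallback):
    pass


class StubModelCheckpoint(StubCallback):
    pass


class StubProgressBar(StubCallback):
    pass


def merge_callbacks(
    trainer_cbs: List[StubCallback], model_cbs: List[StubCallback]
) -> List[StubCallback]:
    model_types = {type(cb) for cb in model_cbs}
    # Model callbacks replace Trainer callbacks of the same type.
    kept = [cb for cb in trainer_cbs if type(cb) not in model_types]
    merged = kept + list(model_cbs)
    # Checkpoint callbacks are moved to the end so they run last.
    checkpoints = [cb for cb in merged if isinstance(cb, StubModelCheckpoint)]
    others = [cb for cb in merged if not isinstance(cb, StubModelCheckpoint)]
    return others + checkpoints


merged = merge_callbacks(
    [StubModelCheckpoint("trainer"), StubProgressBar("trainer"), StubEarlyStopping("trainer")],
    [StubEarlyStopping("model")],
)
summary = [(type(cb).__name__, cb.source) for cb in merged]
```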
Source code in zamba/models/slowfast_models.py
def configure_callbacks(self):
"""
Configure model-specific callbacks.
When the model gets attached, e.g., when ``.fit()`` or ``.test()`` gets called,
the list returned here will be merged with the list of callbacks passed to the Trainer's ``callbacks`` argument.
If a callback returned here has the same type as one or several callbacks already present in
the Trainer's callbacks list, it will take priority and replace them.
In addition, Lightning will make sure :class:`~pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint`
callbacks run last.
Return:
A list of callbacks which will extend the list of callbacks in the Trainer.
Example::
def configure_callbacks(self):
early_stop = EarlyStopping(monitor="val_acc", mode="max")
checkpoint = ModelCheckpoint(monitor="val_loss")
return [early_stop, checkpoint]
Note:
Certain callback methods like :meth:`~pytorch_lightning.callbacks.base.Callback.on_init_start`
will never be invoked on the new callbacks returned here.
"""
return []
configure_optimizers(self)
inherited
¶
Set up the Adam optimizer. Note that this function can also return an lr scheduler, which is usually useful for training video models.
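The two possible return shapes (a bare optimizer, or a dictionary that also carries a scheduler) can be sketched without PyTorch. In the example below, `build_optimizer_config` and `toy_scheduler` are hypothetical stand-ins, and a string stands in for the optimizer object.

```python
from typing import Any, Callable, Dict, Optional, Union


def build_optimizer_config(
    optim: Any,
    scheduler: Optional[Callable[..., Any]] = None,
    scheduler_params: Optional[Dict[str, Any]] = None,
) -> Union[Any, Dict[str, Any]]:
    """Return the optimizer alone, or a dict pairing it with a scheduler."""
    if scheduler is None:
        return optim
    params = {} if scheduler_params is None else scheduler_params
    return {"optimizer": optim, "lr_scheduler": scheduler(optim, **params)}


def toy_scheduler(optim, step_size=1):
    # Hypothetical stand-in for a scheduler class that wraps an optimizer.
    return ("scheduler", optim, step_size)


plain = build_optimizer_config("adam")
with_scheduler = build_optimizer_config("adam", toy_scheduler, {"step_size": 10})
```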
Source code in zamba/models/slowfast_models.py
def configure_optimizers(self):
    """
    Set up the Adam optimizer. Note that this function can also return an lr scheduler, which is
    usually useful for training video models.
    """
    optim = self._get_optimizer()
    if self.scheduler is None:
        return optim
    else:
        return {
            "optimizer": optim,
            "lr_scheduler": self.scheduler(
                optim, **({} if self.scheduler_params is None else self.scheduler_params)
            ),
        }
configure_sharded_model(self) -> None
inherited
¶
Hook to create modules in a distributed aware context. This is useful for when using sharded plugins, where we'd like to shard the model instantly, which is useful for extremely large models which can save memory and initialization time.
The accelerator manages whether to call this hook at every given stage. For sharded plugins where model parallelism is required, the hook is usually only called once to initialize the sharded parameters, and not called again in the same process.
By default for accelerators/plugins that do not use model sharding techniques, this hook is called during each fit/val/test/predict stages.
Source code in zamba/models/slowfast_models.py
def configure_sharded_model(self) -> None:
"""
Hook to create modules in a distributed aware context. This is useful for when using sharded plugins,
where we'd like to shard the model instantly, which is useful for extremely large models
which can save memory and initialization time.
The accelerator manages whether to call this hook at every given stage.
For sharded plugins where model parallelism is required, the hook is usually only called once
to initialize the sharded parameters, and not called again in the same process.
By default for accelerators/plugins that do not use model sharding techniques,
this hook is called during each fit/val/test/predict stages.
"""
cpu(self) -> DeviceDtypeModuleMixin
inherited
¶
Moves all model parameters and buffers to the CPU.
Returns:
Type | Description |
---|---|
Module |
self |
Source code in zamba/models/slowfast_models.py
def cpu(self) -> "DeviceDtypeModuleMixin":
"""Moves all model parameters and buffers to the CPU.
Returns:
Module: self
"""
self.__update_properties(device=torch.device("cpu"))
return super().cpu()
cuda(self, device: Union[torch.device, int] = None) -> DeviceDtypeModuleMixin
inherited
¶
Moves all model parameters and buffers to the GPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device |
Union[torch.device, int] |
if specified, all parameters will be copied to that device |
None |
Returns:
Type | Description |
---|---|
Module |
self |
Source code in zamba/models/slowfast_models.py
def cuda(self, device: Optional[Union[torch.device, int]] = None) -> "DeviceDtypeModuleMixin":
"""Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So
it should be called before constructing optimizer if the module will
live on GPU while being optimized.
Arguments:
device: if specified, all parameters will be
copied to that device
Returns:
Module: self
"""
if device is None or isinstance(device, int):
device = torch.device("cuda", index=device)
self.__update_properties(device=device)
return super().cuda(device=device)
double(self) -> DeviceDtypeModuleMixin
inherited
¶
Casts all floating point parameters and buffers to double datatype.
Returns:
Type | Description |
---|---|
Module |
self |
Source code in zamba/models/slowfast_models.py
def double(self) -> "DeviceDtypeModuleMixin":
"""Casts all floating point parameters and buffers to ``double`` datatype.
Returns:
Module: self
"""
self.__update_properties(dtype=torch.double)
return super().double()
eval(self: ~T) -> ~T
inherited
¶
Sets the module in evaluation mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
This is equivalent to self.train(False).
See locally-disable-grad-doc for a comparison between .eval() and several similar mechanisms that may be confused with it.
Returns:
Type | Description |
---|---|
Module |
self |
Source code in zamba/models/slowfast_models.py
def eval(self: T) -> T:
r"""Sets the module in evaluation mode.
This has any effect only on certain modules. See documentations of
particular modules for details of their behaviors in training/evaluation
mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
etc.
This is equivalent with :meth:`self.train(False) <torch.nn.Module.train>`.
See :ref:`locally-disable-grad-doc` for a comparison between
`.eval()` and several similar mechanisms that may be confused with it.
Returns:
Module: self
"""
return self.train(False)
extra_repr(self) -> str
inherited
¶
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
Source code in zamba/models/slowfast_models.py
def extra_repr(self) -> str:
r"""Set the extra representation of the module
To print customized extra information, you should re-implement
this method in your own modules. Both single-line and multi-line
strings are acceptable.
"""
return ''
float(self) -> DeviceDtypeModuleMixin
inherited
¶
Casts all floating point parameters and buffers to float datatype.
Returns:
Type | Description |
---|---|
Module |
self |
Source code in zamba/models/slowfast_models.py
def float(self) -> "DeviceDtypeModuleMixin":
"""Casts all floating point parameters and buffers to ``float`` datatype.
Returns:
Module: self
"""
self.__update_properties(dtype=torch.float)
return super().float()
forward(self, x, *args, **kwargs)
¶
Same as torch.nn.Module.forward().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*args |
Whatever you decide to pass into the forward method. |
() |
**kwargs |
Keyword arguments are also possible. |
{} |
Returns:
Type | Description |
---|---|
Your model's output |
Source code in zamba/models/slowfast_models.py
def forward(self, x, *args, **kwargs):
    if self.backbone_mode == "eval":
        self.base.eval()
    x = self.base(x)
    return self.head(x)
freeze(self) -> None
inherited
¶
Freeze all params for inference.
Example::
model = MyLightningModule(...)
model.freeze()
Source code in zamba/models/slowfast_models.py
def freeze(self) -> None:
r"""
Freeze all params for inference.
Example::
model = MyLightningModule(...)
model.freeze()
"""
for param in self.parameters():
param.requires_grad = False
self.eval()
get_buffer(self, target: str) -> Tensor
inherited
¶
Returns the buffer given by target if it exists, otherwise throws an error.
See the docstring for get_submodule for a more detailed explanation of this method's functionality as well as how to correctly specify target.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target |
str |
The fully-qualified string name of the buffer to look for. (See get_submodule for how to specify a fully-qualified string.) |
required |
Returns:
Type | Description |
---|---|
torch.Tensor |
The buffer referenced by target |
Exceptions:
Type | Description |
---|---|
AttributeError |
If the target string references an invalid path or resolves to something that is not a buffer |
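The lookup starts by splitting the fully-qualified name at its last dot, as the listing that follows does with `str.rpartition`. The sketch below shows only that splitting step. `split_target` is a hypothetical helper, and the name `head.proj.weight` is purely illustrative.

```python
from typing import Tuple


def split_target(target: str) -> Tuple[str, str]:
    """Split a fully-qualified name into (module path, leaf name)."""
    module_path, _, leaf_name = target.rpartition(".")
    return module_path, leaf_name


# A nested name: the module path is everything before the last dot.
nested = split_target("head.proj.weight")

# A name without dots: the module path is empty, i.e. the module itself.
top_level = split_target("weight")
```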
Source code in zamba/models/slowfast_models.py
def get_buffer(self, target: str) -> "Tensor":
"""
Returns the buffer given by ``target`` if it exists,
otherwise throws an error.
See the docstring for ``get_submodule`` for a more detailed
explanation of this method's functionality as well as how to
correctly specify ``target``.
Args:
target: The fully-qualified string name of the buffer
to look for. (See ``get_submodule`` for how to specify a
fully-qualified string.)
Returns:
torch.Tensor: The buffer referenced by ``target``
Raises:
AttributeError: If the target string references an invalid
path or resolves to something that is not a
buffer
"""
module_path, _, buffer_name = target.rpartition(".")
mod: torch.nn.Module = self.get_submodule(module_path)
if not hasattr(mod, buffer_name):
raise AttributeError(mod._get_name() + " has no attribute `"
+ buffer_name + "`")
buffer: torch.Tensor = getattr(mod, buffer_name)
if buffer_name not in mod._buffers:
raise AttributeError("`" + buffer_name + "` is not a buffer")
return buffer
get_extra_state(self) -> Any
inherited
¶
Returns any extra state to include in the module's state_dict.
Implement this and a corresponding set_extra_state for your module if you need to store extra state. This function is called when building the module's state_dict().
Note that extra state should be pickleable to ensure working serialization of the state_dict. We only provide backwards compatibility guarantees for serializing Tensors; other objects may break backwards compatibility if their serialized pickled form changes.
Returns:
Type | Description |
---|---|
object |
Any extra state to store in the module's state_dict |
Source code in zamba/models/slowfast_models.py
def get_extra_state(self) -> Any:
"""
Returns any extra state to include in the module's state_dict.
Implement this and a corresponding :func:`set_extra_state` for your module
if you need to store extra state. This function is called when building the
module's `state_dict()`.
Note that extra state should be pickleable to ensure working serialization
of the state_dict. We only provide backwards compatibility guarantees
for serializing Tensors; other objects may break backwards compatibility if
their serialized pickled form changes.
Returns:
object: Any extra state to store in the module's state_dict
"""
raise RuntimeError(
"Reached a code path in Module.get_extra_state() that should never be called. "
"Please file an issue at https://github.com/pytorch/pytorch/issues/new?template=bug-report.md "
"to report this bug.")
get_from_queue(self, queue: torch.multiprocessing.SimpleQueue) -> None
inherited
¶
Retrieve the trainer.callback_metrics dictionary from the given queue.
To preserve consistency, we cast back the data to torch.Tensor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
queue |
torch.multiprocessing.SimpleQueue |
the instance of the queue from where to get the data. |
required |
Source code in zamba/models/slowfast_models.py
def get_from_queue(self, queue: torch.multiprocessing.SimpleQueue) -> None:
"""
Retrieve the :attr:`trainer.callback_metrics` dictionary from the given queue.
To preserve consistency, we cast back the data to ``torch.Tensor``.
Args:
queue: the instance of the queue from where to get the data.
"""
# NOTE: `add_to_queue` needs to be called before
callback_metrics: dict = queue.get()
self.trainer.callback_metrics.update(
apply_to_collection(callback_metrics, np.ndarray, lambda x: torch.tensor(x))
)
get_parameter(self, target: str) -> Parameter
inherited
¶
Returns the parameter given by target
if it exists,
otherwise throws an error.
See the docstring for get_submodule
for a more detailed
explanation of this method's functionality as well as how to
correctly specify target
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target |
str |
The fully-qualified string name of the Parameter to look for. (See get_submodule for how to specify a fully-qualified string.) |
required |
Returns:
Type | Description |
---|---|
torch.nn.Parameter |
The Parameter referenced by target |
Exceptions:
Type | Description |
---|---|
AttributeError |
If the target string references an invalid path or resolves to something that is not an nn.Parameter |
Source code in zamba/models/slowfast_models.py
def get_parameter(self, target: str) -> "Parameter":
"""
Returns the parameter given by ``target`` if it exists,
otherwise throws an error.
See the docstring for ``get_submodule`` for a more detailed
explanation of this method's functionality as well as how to
correctly specify ``target``.
Args:
target: The fully-qualified string name of the Parameter
to look for. (See ``get_submodule`` for how to specify a
fully-qualified string.)
Returns:
torch.nn.Parameter: The Parameter referenced by ``target``
Raises:
AttributeError: If the target string references an invalid
path or resolves to something that is not an
``nn.Parameter``
"""
module_path, _, param_name = target.rpartition(".")
mod: torch.nn.Module = self.get_submodule(module_path)
if not hasattr(mod, param_name):
raise AttributeError(mod._get_name() + " has no attribute `"
+ param_name + "`")
param: torch.nn.Parameter = getattr(mod, param_name)
if not isinstance(param, torch.nn.Parameter):
raise AttributeError("`" + param_name + "` is not an "
"nn.Parameter")
return param
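The path handling in the listing above relies on `str.rpartition`, which splits at the last dot only. The sketch below shows that step in isolation; `split_target` is a hypothetical helper name and is not part of PyTorch.

```python
def split_target(target):
    """Split a dotted target into (module path, parameter name)."""
    # rpartition splits at the LAST ".", so everything before it is the
    # submodule path and everything after it is the parameter name.
    module_path, _, param_name = target.rpartition(".")
    return module_path, param_name


nested = split_target("net_b.linear.weight")
top_level = split_target("weight")
print(nested)
print(top_level)
```

When the target contains no dot, the module path is the empty string, which `get_submodule` resolves to the module itself.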
get_progress_bar_dict(self) -> Dict[str, Union[int, str]]
inherited
¶
Implement this to override the default items displayed in the progress bar. By default it includes the average loss value, split index of BPTT (if used) and the version of the experiment when using a logger.
.. code-block::
Epoch 1: 4%|▎ | 40/1095 [00:03<01:37, 10.84it/s, loss=4.501, v_num=10]
Here is an example of how to override the defaults:
.. code-block:: python
def get_progress_bar_dict(self):
# don't show the version number
items = super().get_progress_bar_dict()
items.pop("v_num", None)
return items
Returns:
Type | Description |
---|---|
Dict[str, Union[int, str]] |
Dictionary with the items to be displayed in the progress bar. |
Source code in zamba/models/slowfast_models.py
def get_progress_bar_dict(self) -> Dict[str, Union[int, str]]:
r"""
Implement this to override the default items displayed in the progress bar.
By default it includes the average loss value, split index of BPTT (if used)
and the version of the experiment when using a logger.
.. code-block::
Epoch 1: 4%|▎ | 40/1095 [00:03<01:37, 10.84it/s, loss=4.501, v_num=10]
Here is an example of how to override the defaults:
.. code-block:: python
def get_progress_bar_dict(self):
# don't show the version number
items = super().get_progress_bar_dict()
items.pop("v_num", None)
return items
Return:
Dictionary with the items to be displayed in the progress bar.
"""
# call .item() only once but store elements without graphs
running_train_loss = self.trainer.fit_loop.running_loss.mean()
avg_training_loss = None
if running_train_loss is not None:
avg_training_loss = running_train_loss.cpu().item()
elif self.automatic_optimization:
avg_training_loss = float("NaN")
tqdm_dict = {}
if avg_training_loss is not None:
tqdm_dict["loss"] = f"{avg_training_loss:.3g}"
module_tbptt_enabled = self.truncated_bptt_steps > 0
trainer_tbptt_enabled = self.trainer.truncated_bptt_steps is not None and self.trainer.truncated_bptt_steps > 0
if module_tbptt_enabled or trainer_tbptt_enabled:
tqdm_dict["split_idx"] = self.trainer.fit_loop.split_idx
if self.trainer.logger is not None and self.trainer.logger.version is not None:
version = self.trainer.logger.version
# show last 4 places of long version strings
version = version[-4:] if isinstance(version, str) else version
tqdm_dict["v_num"] = version
return tqdm_dict
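Two formatting rules in the listing above are easy to miss: the loss is rendered with three significant digits (`.3g`), and string versions are cut to their last four characters. The following is a minimal, dependency-free sketch of just those two rules; `progress_items` is a hypothetical name, and the trainer lookups are replaced by plain arguments.

```python
def progress_items(avg_loss, version):
    """Build the progress-bar entries from a loss value and a logger version."""
    items = {}
    if avg_loss is not None:
        # Three significant digits, trailing zeros dropped.
        items["loss"] = f"{avg_loss:.3g}"
    if version is not None:
        # Long version strings are truncated to their last four characters;
        # integer versions pass through unchanged.
        items["v_num"] = version[-4:] if isinstance(version, str) else version
    return items


int_version = progress_items(4.5012, 10)
str_version = progress_items(0.123456, "run-2024-abcd")
print(int_version)
print(str_version)
```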
get_submodule(self, target: str) -> Module
inherited
¶
Returns the submodule given by target
if it exists,
otherwise throws an error.
For example, let's say you have an nn.Module
A
that
looks like this:
.. code-block::text
A(
(net_b): Module(
(net_c): Module(
(conv): Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
)
(linear): Linear(in_features=100, out_features=200, bias=True)
)
)
(The diagram shows an nn.Module
A
. A
has a nested
submodule net_b
, which itself has two submodules net_c
and linear
. net_c
then has a submodule conv
.)
To check whether or not we have the linear
submodule, we
would call get_submodule("net_b.linear")
. To check whether
we have the conv
submodule, we would call
get_submodule("net_b.net_c.conv")
.
The runtime of get_submodule
is bounded by the degree
of module nesting in target
. A query against
named_modules
achieves the same result, but it is O(N) in
the number of transitive modules. So, for a simple check to see
if some submodule exists, get_submodule
should always be
used.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target |
str |
The fully-qualified string name of the submodule to look for. (See above example for how to specify a fully-qualified string.) |
required |
Returns:
Type | Description |
---|---|
torch.nn.Module |
The submodule referenced by target |
Exceptions:
Type | Description |
---|---|
AttributeError |
If the target string references an invalid path or resolves to something that is not an nn.Module |
Source code in zamba/models/slowfast_models.py
def get_submodule(self, target: str) -> "Module":
"""
Returns the submodule given by ``target`` if it exists,
otherwise throws an error.
For example, let's say you have an ``nn.Module`` ``A`` that
looks like this:
.. code-block::text
A(
(net_b): Module(
(net_c): Module(
(conv): Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
)
(linear): Linear(in_features=100, out_features=200, bias=True)
)
)
(The diagram shows an ``nn.Module`` ``A``. ``A`` has a nested
submodule ``net_b``, which itself has two submodules ``net_c``
and ``linear``. ``net_c`` then has a submodule ``conv``.)
To check whether or not we have the ``linear`` submodule, we
would call ``get_submodule("net_b.linear")``. To check whether
we have the ``conv`` submodule, we would call
``get_submodule("net_b.net_c.conv")``.
The runtime of ``get_submodule`` is bounded by the degree
of module nesting in ``target``. A query against
``named_modules`` achieves the same result, but it is O(N) in
the number of transitive modules. So, for a simple check to see
if some submodule exists, ``get_submodule`` should always be
used.
Args:
target: The fully-qualified string name of the submodule
to look for. (See above example for how to specify a
fully-qualified string.)
Returns:
torch.nn.Module: The submodule referenced by ``target``
Raises:
AttributeError: If the target string references an invalid
path or resolves to something that is not an
``nn.Module``
"""
if target == "":
return self
atoms: List[str] = target.split(".")
mod: torch.nn.Module = self
for item in atoms:
if not hasattr(mod, item):
raise AttributeError(mod._get_name() + " has no "
"attribute `" + item + "`")
mod = getattr(mod, item)
if not isinstance(mod, torch.nn.Module):
raise AttributeError("`" + item + "` is not "
"an nn.Module")
return mod
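The lookup above walks the dotted path one attribute at a time, which is why its cost depends on nesting depth rather than on the total number of modules. Here is a torch-free sketch of the same walk; `Node` is a hypothetical stand-in for `nn.Module`, and the tree mirrors the `A` / `net_b` / `net_c` example from the docstring.

```python
class Node:
    """Minimal stand-in for nn.Module: children are plain attributes."""

    def __init__(self, **children):
        for name, child in children.items():
            setattr(self, name, child)


def get_submodule(root, target):
    """Resolve a dotted path such as "net_b.net_c.conv" starting at root."""
    if target == "":
        return root
    mod = root
    for item in target.split("."):
        if not hasattr(mod, item):
            raise AttributeError(f"{type(mod).__name__} has no attribute `{item}`")
        mod = getattr(mod, item)
        if not isinstance(mod, Node):
            raise AttributeError(f"`{item}` is not a Node")
    return mod


conv = Node()
linear = Node()
a = Node(net_b=Node(net_c=Node(conv=conv), linear=linear))

print(get_submodule(a, "net_b.net_c.conv") is conv)
print(get_submodule(a, "net_b.linear") is linear)
```

A missing segment raises `AttributeError` at the first unresolved name, so a `try`/`except AttributeError` around the call works as an existence check.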
grad_norm(self, norm_type: Union[float, int, str]) -> Dict[str, float]
inherited
¶
Compute each parameter's gradient's norm and their overall norm.
.. deprecated:: v1.3
Will be removed in v1.5.0. Use :func:pytorch_lightning.utilities.grads.grad_norm
instead.
Source code in zamba/models/slowfast_models.py
def grad_norm(self, norm_type: Union[float, int, str]) -> Dict[str, float]:
"""Compute each parameter's gradient's norm and their overall norm.
.. deprecated:: v1.3
Will be removed in v1.5.0. Use :func:`pytorch_lightning.utilities.grads.grad_norm` instead.
"""
rank_zero_deprecation(
"LightningModule.grad_norm is deprecated in v1.3 and will be removed in v1.5."
" Use grad_norm from pytorch_lightning.utilities.grads instead."
)
return new_grad_norm(self, norm_type)
half(self) -> DeviceDtypeModuleMixin
inherited
¶
Casts all floating point parameters and buffers to half
datatype.
Returns:
Type | Description |
---|---|
Module |
self |
Source code in zamba/models/slowfast_models.py
def half(self) -> "DeviceDtypeModuleMixin":
"""Casts all floating point parameters and buffers to ``half`` datatype.
Returns:
Module: self
"""
self.__update_properties(dtype=torch.half)
return super().half()
initialize_from_torchub(self)
¶
Loads SlowFast model from torchhub and prepares ZambaVideoClassificationLightningModule by removing the head and setting the backbone and base.
Source code in zamba/models/slowfast_models.py
def initialize_from_torchub(self):
"""Loads SlowFast model from torchhub and prepares ZambaVideoClassificationLightningModule
by removing the head and setting the backbone and base."""
# workaround for pytorch bug
torch.hub._validate_not_a_forked_repo = lambda a, b, c: True
base = torch.hub.load(
"facebookresearch/pytorchvideo:0.1.3", model="slowfast_r50", pretrained=True
)
self._backbone_output_dim = base.blocks[-1].proj.in_features
base.blocks = base.blocks[:-1] # Remove the pre-trained head
# self.backbone attribute lets `BackboneFinetune` freeze and unfreeze that module
self.backbone = base.blocks[-2:]
self.base = base
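The two slices in the listing above are the core of the setup: the last block (the pretrained head) is dropped, and the final two remaining blocks become the backbone that can be frozen and unfrozen. The sketch below replays the slicing on a plain list. The block names and the count of seven are assumptions made for illustration only; the real `base.blocks` holds modules, not strings.

```python
# Hypothetical block layout; names and length are illustrative assumptions.
blocks = ["stem", "stage_1", "stage_2", "stage_3", "stage_4", "pool", "head"]

# Remove the pretrained head (the last block).
blocks = blocks[:-1]

# The backbone is the last two blocks that remain after the head is removed.
backbone = blocks[-2:]

print(blocks)
print(backbone)
```

Because the head is removed first, `[-2:]` selects the two blocks that directly feed the new head, not the old head itself.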
load_state_dict(self, state_dict: OrderedDict[str, Tensor], strict: bool = True)
inherited
¶
Copies parameters and buffers from :attr:state_dict
into
this module and its descendants. If :attr:strict
is True
, then
the keys of :attr:state_dict
must exactly match the keys returned
by this module's :meth:~torch.nn.Module.state_dict
function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state_dict |
dict |
a dict containing parameters and persistent buffers. |
required |
strict |
bool |
whether to strictly enforce that the keys in :attr:state_dict match the keys returned by this module's :meth:~torch.nn.Module.state_dict function. |
True |
Returns:
Type | Description |
---|---|
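``NamedTuple`` with ``missing_keys`` and ``unexpected_keys`` fields |
missing_keys is a list of str containing the missing keys; unexpected_keys is a list of str containing the unexpected keys |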
Note:
If a parameter or buffer is registered as None
and its corresponding key
exists in :attr:state_dict
, :meth:load_state_dict
will raise a
RuntimeError
.
Source code in zamba/models/slowfast_models.py
def load_state_dict(self, state_dict: 'OrderedDict[str, Tensor]',
strict: bool = True):
r"""Copies parameters and buffers from :attr:`state_dict` into
this module and its descendants. If :attr:`strict` is ``True``, then
the keys of :attr:`state_dict` must exactly match the keys returned
by this module's :meth:`~torch.nn.Module.state_dict` function.
Args:
state_dict (dict): a dict containing parameters and
persistent buffers.
strict (bool, optional): whether to strictly enforce that the keys
in :attr:`state_dict` match the keys returned by this module's
:meth:`~torch.nn.Module.state_dict` function. Default: ``True``
Returns:
``NamedTuple`` with ``missing_keys`` and ``unexpected_keys`` fields:
* **missing_keys** is a list of str containing the missing keys
* **unexpected_keys** is a list of str containing the unexpected keys
Note:
If a parameter or buffer is registered as ``None`` and its corresponding key
exists in :attr:`state_dict`, :meth:`load_state_dict` will raise a
``RuntimeError``.
"""
missing_keys: List[str] = []
unexpected_keys: List[str] = []
error_msgs: List[str] = []
# copy state_dict so _load_from_state_dict can modify it
metadata = getattr(state_dict, '_metadata', None)
state_dict = state_dict.copy()
if metadata is not None:
# mypy isn't aware that "_metadata" exists in state_dict
state_dict._metadata = metadata # type: ignore[attr-defined]
def load(module, prefix=''):
local_metadata = {} if metadata is None else metadata.get(prefix[:-1], {})
module._load_from_state_dict(
state_dict, prefix, local_metadata, True, missing_keys, unexpected_keys, error_msgs)
for name, child in module._modules.items():
if child is not None:
load(child, prefix + name + '.')
load(self)
del load
if strict:
if len(unexpected_keys) > 0:
error_msgs.insert(
0, 'Unexpected key(s) in state_dict: {}. '.format(
', '.join('"{}"'.format(k) for k in unexpected_keys)))
if len(missing_keys) > 0:
error_msgs.insert(
0, 'Missing key(s) in state_dict: {}. '.format(
', '.join('"{}"'.format(k) for k in missing_keys)))
if len(error_msgs) > 0:
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
self.__class__.__name__, "\n\t".join(error_msgs)))
return _IncompatibleKeys(missing_keys, unexpected_keys)
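The effect of `strict` can be illustrated without tensors. The sketch below covers only the key comparison, not the shape checks or per-module loading that the real method also performs. `check_keys` and the key names are hypothetical.

```python
def check_keys(expected_keys, state_dict, strict=True):
    """Compare the keys a model expects with the keys a checkpoint provides."""
    missing = [k for k in expected_keys if k not in state_dict]
    unexpected = [k for k in state_dict if k not in expected_keys]
    if strict and (missing or unexpected):
        parts = []
        if missing:
            parts.append("Missing key(s): " + ", ".join(missing))
        if unexpected:
            parts.append("Unexpected key(s): " + ", ".join(unexpected))
        raise RuntimeError("; ".join(parts))
    return missing, unexpected


expected = ["backbone.weight", "head.weight", "head.bias"]
checkpoint = {"backbone.weight": 1.0, "head.weight": 2.0, "old_head.bias": 3.0}

# With strict=False the mismatches are reported instead of raised.
missing, unexpected = check_keys(expected, checkpoint, strict=False)
print(missing)
print(unexpected)
```

With `strict=True` the same inputs raise a `RuntimeError` that lists both groups of keys.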
log(self, name: str, value: Union[torchmetrics.metric.Metric, torch.Tensor, numbers.Number, Mapping[str, Union[torchmetrics.metric.Metric, torch.Tensor, numbers.Number]]], prog_bar: bool = False, logger: bool = True, on_step: Optional[bool] = None, on_epoch: Optional[bool] = None, reduce_fx: Union[str, Callable] = 'default', tbptt_reduce_fx: Optional = None, tbptt_pad_token: Optional = None, enable_graph: bool = False, sync_dist: bool = False, sync_dist_op: Optional = None, sync_dist_group: Optional[Any] = None, add_dataloader_idx: bool = True, batch_size: Optional[int] = None, metric_attribute: Optional[str] = None, rank_zero_only: Optional[bool] = None) -> None
inherited
¶
Log a key, value pair.
Example::
self.log('train_loss', loss)
The default behavior per hook is as follows:
(* also applies to the test loop.)
LightningModule Hook | on_step | on_epoch | prog_bar | logger |
---|---|---|---|---|
training_step | T | F | F | T |
training_step_end | T | F | F | T |
training_epoch_end | F | T | F | T |
validation_step* | F | T | F | T |
validation_step_end* | F | T | F | T |
validation_epoch_end* | F | T | F | T |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name |
str |
key to log |
required |
value |
Union[torchmetrics.metric.Metric, torch.Tensor, numbers.Number, Mapping[str, Union[torchmetrics.metric.Metric, torch.Tensor, numbers.Number]]] |
value to log. Can be a float, Tensor, Metric, or a dictionary of the former. |
required |
prog_bar |
bool |
if True logs to the progress bar |
False |
logger |
bool |
if True logs to the logger |
True |
on_step |
Optional[bool] |
if True logs at this step. None auto-logs at the training_step but not validation/test_step |
None |
on_epoch |
Optional[bool] |
if True logs epoch accumulated metrics. None auto-logs at the val/test step but not training_step |
None |
reduce_fx |
Union[str, Callable] |
reduction function over step values for end of epoch. :meth:torch.mean by default. |
'default' |
enable_graph |
bool |
if True, will not auto detach the graph |
False |
sync_dist |
bool |
if True, reduces the metric across GPUs/TPUs |
False |
sync_dist_group |
Optional[Any] |
the ddp group to sync across |
None |
add_dataloader_idx |
bool |
if True, appends the index of the current dataloader to the name (when using multiple). If False, user needs to give unique names for each dataloader to not mix values |
True |
batch_size |
Optional[int] |
Current batch_size. This will be directly inferred from the loaded batch, but some data structures might need to explicitly provide it. |
None |
metric_attribute |
Optional[str] |
To restore the metric state, Lightning requires the reference of the :class:torchmetrics.Metric in your model. This is found automatically if it is a model attribute. |
None |
rank_zero_only |
Optional[bool] |
Whether the value will be logged only on rank 0. This will prevent synchronization which would produce a deadlock as not all processes would perform this log call. |
None |
Source code in zamba/models/slowfast_models.py
def log(
self,
name: str,
value: _METRIC_COLLECTION,
prog_bar: bool = False,
logger: bool = True,
on_step: Optional[bool] = None,
on_epoch: Optional[bool] = None,
reduce_fx: Union[str, Callable] = "default", # TODO: change to 'mean' when `sync_dist_op` is removed in 1.6
tbptt_reduce_fx: Optional = None, # noqa: Remove in 1.6
tbptt_pad_token: Optional = None, # noqa: Remove in 1.6
enable_graph: bool = False,
sync_dist: bool = False,
sync_dist_op: Optional = None, # noqa: Remove in 1.6
sync_dist_group: Optional[Any] = None,
add_dataloader_idx: bool = True,
batch_size: Optional[int] = None,
metric_attribute: Optional[str] = None,
rank_zero_only: Optional[bool] = None,
) -> None:
"""
Log a key, value pair.
Example::
self.log('train_loss', loss)
The default behavior per hook is as follows:
.. csv-table:: ``*`` also applies to the test loop
:header: "LightningModule Hook", "on_step", "on_epoch", "prog_bar", "logger"
:widths: 20, 10, 10, 10, 10
"training_step", "T", "F", "F", "T"
"training_step_end", "T", "F", "F", "T"
"training_epoch_end", "F", "T", "F", "T"
"validation_step*", "F", "T", "F", "T"
"validation_step_end*", "F", "T", "F", "T"
"validation_epoch_end*", "F", "T", "F", "T"
Args:
name: key to log
value: value to log. Can be a ``float``, ``Tensor``, ``Metric``, or a dictionary of the former.
prog_bar: if True logs to the progress bar
logger: if True logs to the logger
on_step: if True logs at this step. None auto-logs at the training_step but not validation/test_step
on_epoch: if True logs epoch accumulated metrics. None auto-logs at the val/test step but not training_step
reduce_fx: reduction function over step values for end of epoch. :meth:`torch.mean` by default.
enable_graph: if True, will not auto detach the graph
sync_dist: if True, reduces the metric across GPUs/TPUs
sync_dist_group: the ddp group to sync across
add_dataloader_idx: if True, appends the index of the current dataloader to
the name (when using multiple). If False, user needs to give unique names for
each dataloader to not mix values
batch_size: Current batch_size. This will be directly inferred from the loaded batch,
but some data structures might need to explicitly provide it.
metric_attribute: To restore the metric state, Lightning requires the reference of the
:class:`torchmetrics.Metric` in your model. This is found automatically if it is a model attribute.
rank_zero_only: Whether the value will be logged only on rank 0. This will prevent synchronization which
would produce a deadlock as not all processes would perform this log call.
"""
if tbptt_reduce_fx is not None:
rank_zero_deprecation(
"`self.log(tbptt_reduce_fx=...)` is no longer supported. The flag will be removed in v1.6."
" Please, open a discussion explaining your use-case in"
" `https://github.com/PyTorchLightning/pytorch-lightning/discussions`"
)
if tbptt_pad_token is not None:
rank_zero_deprecation(
"`self.log(tbptt_pad_token=...)` is no longer supported. The flag will be removed in v1.6."
" Please, open a discussion explaining your use-case in"
" `https://github.com/PyTorchLightning/pytorch-lightning/discussions`"
)
if sync_dist_op is not None:
rank_zero_deprecation(
f"`self.log(sync_dist_op='{sync_dist_op}')` is deprecated and will be removed in v.1.6."
f" Use `self.log(reduce_fx={sync_dist_op})` instead."
)
if reduce_fx == "default":
reduce_fx = sync_dist_op
elif reduce_fx == "default":
reduce_fx = "mean"
# check for invalid values
apply_to_collection(value, dict, self.__check_not_nested, name)
apply_to_collection(
value, object, self.__check_allowed, name, value, wrong_dtype=(numbers.Number, Metric, Tensor, dict)
)
# set the default depending on the fx_name
on_step = self.__auto_choose_log_on_step(on_step)
on_epoch = self.__auto_choose_log_on_epoch(on_epoch)
results = self.trainer._results
assert results is not None
assert self._current_fx_name is not None
FxValidator.check_logging(self._current_fx_name, on_step=on_step, on_epoch=on_epoch)
# make sure user doesn't introduce logic for multi-dataloaders
if "/dataloader_idx_" in name:
raise MisconfigurationException(
f"You called `self.log` with the key `{name}`"
" but it should not contain information about `dataloader_idx`"
)
value = apply_to_collection(value, numbers.Number, self.__to_tensor)
if self.trainer.logger_connector.should_reset_tensors(self._current_fx_name):
# if we started a new epoch (running its first batch) the hook name has changed
# reset any tensors for the new hook name
results.reset(metrics=False, fx=self._current_fx_name)
if metric_attribute is None and isinstance(value, Metric):
if self._metric_attributes is None:
# compute once
self._metric_attributes = {
id(module): name for name, module in self.named_modules() if isinstance(module, Metric)
}
if not self._metric_attributes:
raise MisconfigurationException(
"Could not find the `LightningModule` attribute for the `torchmetrics.Metric` logged."
" You can fix this by setting an attribute for the metric in your `LightningModule`."
)
# try to find the passed metric in the LightningModule
metric_attribute = self._metric_attributes.get(id(value), None)
if metric_attribute is None:
raise MisconfigurationException(
"Could not find the `LightningModule` attribute for the `torchmetrics.Metric` logged."
f" You can fix this by calling `self.log({name}, ..., metric_attribute=name)` where `name` is one"
f" of {list(self._metric_attributes.values())}"
)
results.log(
self._current_fx_name,
name,
value,
prog_bar=prog_bar,
logger=logger,
on_step=on_step,
on_epoch=on_epoch,
reduce_fx=reduce_fx,
enable_graph=enable_graph,
dataloader_idx=(self._current_dataloader_idx if add_dataloader_idx else None),
batch_size=batch_size,
sync_dist=sync_dist and distributed_available(),
sync_dist_fn=self.trainer.training_type_plugin.reduce or sync_ddp,
sync_dist_group=sync_dist_group,
metric_attribute=metric_attribute,
rank_zero_only=rank_zero_only,
)
self.trainer.logger_connector._current_fx = self._current_fx_name
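The way `reduce_fx` and the deprecated `sync_dist_op` interact in the listing above can be isolated as a small pure function. This is a sketch of that branch only; `resolve_reduce_fx` is a hypothetical name and the deprecation warning is omitted.

```python
def resolve_reduce_fx(reduce_fx="default", sync_dist_op=None):
    """Pick the effective reduction, mirroring the branch order in `log`."""
    if sync_dist_op is not None:
        # The deprecated argument only wins when reduce_fx was left untouched.
        if reduce_fx == "default":
            reduce_fx = sync_dist_op
    elif reduce_fx == "default":
        reduce_fx = "mean"
    return reduce_fx


print(resolve_reduce_fx())
print(resolve_reduce_fx(sync_dist_op="sum"))
print(resolve_reduce_fx("max", sync_dist_op="sum"))
```

An explicit `reduce_fx` always takes precedence, and `"mean"` is used only when neither argument is supplied.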
log_dict(self, dictionary: Mapping[str, Union[torchmetrics.metric.Metric, torch.Tensor, numbers.Number, Mapping[str, Union[torchmetrics.metric.Metric, torch.Tensor, numbers.Number]]]], prog_bar: bool = False, logger: bool = True, on_step: Optional[bool] = None, on_epoch: Optional[bool] = None, reduce_fx: Union[str, Callable] = 'default', tbptt_reduce_fx: Optional[Any] = None, tbptt_pad_token: Optional[Any] = None, enable_graph: bool = False, sync_dist: bool = False, sync_dist_op: Optional[Any] = None, sync_dist_group: Optional[Any] = None, add_dataloader_idx: bool = True) -> None
inherited
¶
Log a dictionary of values at once.
Example::
values = {'loss': loss, 'acc': acc, ..., 'metric_n': metric_n}
self.log_dict(values)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dictionary |
Mapping[str, Union[torchmetrics.metric.Metric, torch.Tensor, numbers.Number, Mapping[str, Union[torchmetrics.metric.Metric, torch.Tensor, numbers.Number]]]] |
key value pairs. The values can be a float, Tensor, Metric, or a dictionary of the former. |
required |
prog_bar |
bool |
if True logs to the progress bar |
False |
logger |
bool |
if True logs to the logger |
True |
on_step |
Optional[bool] |
if True logs at this step. None auto-logs for training_step but not validation/test_step |
None |
on_epoch |
Optional[bool] |
if True logs epoch accumulated metrics. None auto-logs for val/test step but not training_step |
None |
reduce_fx |
Union[str, Callable] |
reduction function over step values for end of epoch. :meth:torch.mean by default. |
'default' |
enable_graph |
bool |
if True, will not auto detach the graph |
False |
sync_dist |
bool |
if True, reduces the metric across GPUs/TPUs |
False |
sync_dist_group |
Optional[Any] |
the ddp group to sync across |
None |
add_dataloader_idx |
bool |
if True, appends the index of the current dataloader to the name (when using multiple). If False, user needs to give unique names for each dataloader to not mix values |
True |
Source code in zamba/models/slowfast_models.py
def log_dict(
self,
dictionary: Mapping[str, _METRIC_COLLECTION],
prog_bar: bool = False,
logger: bool = True,
on_step: Optional[bool] = None,
on_epoch: Optional[bool] = None,
reduce_fx: Union[str, Callable] = "default", # TODO: change to 'mean' when `sync_dist_op` is removed in 1.6
tbptt_reduce_fx: Optional[Any] = None, # noqa: Remove in 1.6
tbptt_pad_token: Optional[Any] = None, # noqa: Remove in 1.6
enable_graph: bool = False,
sync_dist: bool = False,
sync_dist_op: Optional[Any] = None, # noqa: Remove in 1.6
sync_dist_group: Optional[Any] = None,
add_dataloader_idx: bool = True,
) -> None:
"""
Log a dictionary of values at once.
Example::
values = {'loss': loss, 'acc': acc, ..., 'metric_n': metric_n}
self.log_dict(values)
Args:
dictionary: key value pairs.
The values can be a ``float``, ``Tensor``, ``Metric``, or a dictionary of the former.
prog_bar: if True logs to the progress bar
logger: if True logs to the logger
on_step: if True logs at this step. None auto-logs for training_step but not validation/test_step
on_epoch: if True logs epoch accumulated metrics. None auto-logs for val/test step but not training_step
reduce_fx: reduction function over step values for end of epoch. :meth:`torch.mean` by default.
enable_graph: if True, will not auto detach the graph
sync_dist: if True, reduces the metric across GPUs/TPUs
sync_dist_group: the ddp group to sync across
add_dataloader_idx: if True, appends the index of the current dataloader to
the name (when using multiple). If False, user needs to give unique names for
each dataloader to not mix values
"""
for k, v in dictionary.items():
self.log(
name=k,
value=v,
prog_bar=prog_bar,
logger=logger,
on_step=on_step,
on_epoch=on_epoch,
reduce_fx=reduce_fx,
enable_graph=enable_graph,
sync_dist=sync_dist,
sync_dist_group=sync_dist_group,
sync_dist_op=sync_dist_op,
tbptt_pad_token=tbptt_pad_token,
tbptt_reduce_fx=tbptt_reduce_fx,
add_dataloader_idx=add_dataloader_idx,
)
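As the listing shows, `log_dict` is a thin loop over `log` that forwards the same options for every key. The sketch below reproduces that fan-out with a hypothetical `Recorder` class that stores calls instead of talking to a trainer.

```python
class Recorder:
    """Stand-in for a LightningModule that only records what gets logged."""

    def __init__(self):
        self.logged = []

    def log(self, name, value, **options):
        self.logged.append((name, value, options))

    def log_dict(self, dictionary, **options):
        # One `log` call per entry, with identical options for each.
        for key, value in dictionary.items():
            self.log(name=key, value=value, **options)


recorder = Recorder()
recorder.log_dict({"loss": 0.25, "acc": 0.9}, prog_bar=True, on_epoch=True)
for entry in recorder.logged:
    print(entry)
```

Since the options are shared, metrics that need different `on_step` or `prog_bar` settings have to be logged with separate `log` calls.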
log_grad_norm(self, grad_norm_dict: Dict[str, torch.Tensor]) -> None
inherited
¶
Override this method to change the default behaviour of log_grad_norm
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
grad_norm_dict |
Dict[str, torch.Tensor] |
Dictionary containing current grad norm metrics |
required |
Example::
# DEFAULT
def log_grad_norm(self, grad_norm_dict):
self.log_dict(grad_norm_dict, on_step=True, on_epoch=True, prog_bar=True, logger=True)
Source code in zamba/models/slowfast_models.py
def log_grad_norm(self, grad_norm_dict: Dict[str, torch.Tensor]) -> None:
"""Override this method to change the default behaviour of ``log_grad_norm``.
Args:
grad_norm_dict: Dictionary containing current grad norm metrics
Example::
# DEFAULT
def log_grad_norm(self, grad_norm_dict):
self.log_dict(grad_norm_dict, on_step=True, on_epoch=True, prog_bar=True, logger=True)
"""
self.log_dict(grad_norm_dict, on_step=True, on_epoch=True, prog_bar=True, logger=True)
lr_schedulers(self) -> Union[Any, List[Any]]
inherited
¶
Returns the learning rate scheduler(s) that are being used during training. Useful for manual optimization.
Returns:
Type | Description |
---|---|
Optional[Union[Any, List[Any]]] |
A single scheduler, or a list of schedulers in case multiple ones are present, or None if no schedulers were returned in :meth:configure_optimizers. |
Source code in zamba/models/slowfast_models.py
def lr_schedulers(self) -> Optional[Union[Any, List[Any]]]:
"""
Returns the learning rate scheduler(s) that are being used during training. Useful for manual optimization.
Returns:
A single scheduler, or a list of schedulers in case multiple ones are present, or ``None`` if no
schedulers were returned in :meth:`configure_optimizers`.
"""
if not self.trainer.lr_schedulers:
return None
# ignore other keys "interval", "frequency", etc.
lr_schedulers = [s["scheduler"] for s in self.trainer.lr_schedulers]
# single scheduler
if len(lr_schedulers) == 1:
return lr_schedulers[0]
# multiple schedulers
return lr_schedulers
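The return value has three shapes (nothing, one scheduler, a list), which matters when calling it from manual-optimization code. The following sketch reproduces the unwrapping on plain dictionaries; `unwrap_schedulers` is a hypothetical name and the scheduler objects are replaced by strings.

```python
def unwrap_schedulers(configs):
    """Return None, a single scheduler, or a list, depending on the count."""
    if not configs:
        return None
    # Keep only the scheduler objects; "interval", "frequency", etc. are ignored.
    schedulers = [config["scheduler"] for config in configs]
    if len(schedulers) == 1:
        return schedulers[0]
    return schedulers


none_case = unwrap_schedulers([])
single = unwrap_schedulers([{"scheduler": "step_lr", "interval": "epoch"}])
several = unwrap_schedulers(
    [{"scheduler": "step_lr"}, {"scheduler": "cosine", "interval": "step"}]
)
print(none_case, single, several)
```

Callers that may configure more than one scheduler should check whether the result is a list before calling `.step()` on it.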
manual_backward(self, loss: Tensor, *args, **kwargs) -> None
inherited
¶
Call this directly from your :meth:training_step
when doing optimizations manually.
By using this, Lightning can ensure that all the proper scaling gets applied when using mixed precision.
See :ref:manual optimization<common/optimizers:Manual optimization>
for more examples.
Example::
def training_step(...):
opt = self.optimizers()
loss = ...
opt.zero_grad()
# automatically applies scaling, etc...
self.manual_backward(loss)
opt.step()
Parameters:
Name | Type | Description | Default |
---|---|---|---|
loss |
Tensor |
The tensor on which to compute gradients. Must have a graph attached. |
required |
*args | | Additional positional arguments to be forwarded to :meth:~torch.Tensor.backward | () |
**kwargs | | Additional keyword arguments to be forwarded to :meth:~torch.Tensor.backward | {} |
Source code in zamba/models/slowfast_models.py
def manual_backward(self, loss: Tensor, *args, **kwargs) -> None:
"""
Call this directly from your :meth:`training_step` when doing optimizations manually.
By using this, Lightning can ensure that all the proper scaling gets applied when using mixed precision.
See :ref:`manual optimization<common/optimizers:Manual optimization>` for more examples.
Example::
def training_step(...):
opt = self.optimizers()
loss = ...
opt.zero_grad()
# automatically applies scaling, etc...
self.manual_backward(loss)
opt.step()
Args:
loss: The tensor on which to compute gradients. Must have a graph attached.
*args: Additional positional arguments to be forwarded to :meth:`~torch.Tensor.backward`
**kwargs: Additional keyword arguments to be forwarded to :meth:`~torch.Tensor.backward`
"""
# make sure we're using manual opt
self._verify_is_manual_optimization("manual_backward")
# backward
self.trainer.fit_loop.epoch_loop.batch_loop.backward(loss, None, None, *args, **kwargs)
modules(self) -> Iterator[Module]
inherited
¶
Returns an iterator over all modules in the network.
Yields:
Module: a module in the network
Note:
Duplicate modules are returned only once. In the following
example, l
will be returned only once.
Example::
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
print(idx, '->', m)
0 -> Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
Source code in zamba/models/slowfast_models.py
def modules(self) -> Iterator['Module']:
r"""Returns an iterator over all modules in the network.
Yields:
Module: a module in the network
Note:
Duplicate modules are returned only once. In the following
example, ``l`` will be returned only once.
Example::
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
print(idx, '->', m)
0 -> Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
"""
for _, module in self.named_modules():
yield module
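The "returned only once" behaviour comes from a memo set that is shared across the recursive walk. Below is a torch-free sketch of that idea; `Node` and `iter_modules` are hypothetical names, and the real implementation delegates to `named_modules`.

```python
class Node:
    """Stand-in for nn.Module with an ordered tuple of children."""

    def __init__(self, name, *children):
        self.name = name
        self.children = children


def iter_modules(root, memo=None):
    """Yield root and all descendants depth-first, each object at most once."""
    if memo is None:
        memo = set()
    if root in memo:
        return
    memo.add(root)
    yield root
    for child in root.children:
        # The same memo is passed down, so repeated instances are skipped.
        yield from iter_modules(child, memo)


l = Node("l")
net = Node("net", l, l)  # the same child registered twice
names = [module.name for module in iter_modules(net)]
print(names)
```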
named_buffers(self, prefix: str = '', recurse: bool = True) -> Iterator[Tuple[str, torch.Tensor]]
inherited
¶
Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prefix |
str |
prefix to prepend to all buffer names. |
'' |
recurse |
bool |
if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. |
True |
Yields:
(string, torch.Tensor): Tuple containing the name and buffer
Example::
>>> for name, buf in self.named_buffers():
>>> if name in ['running_var']:
>>> print(buf.size())
Source code in zamba/models/slowfast_models.py
def named_buffers(self, prefix: str = '', recurse: bool = True) -> Iterator[Tuple[str, Tensor]]:
r"""Returns an iterator over module buffers, yielding both the
name of the buffer as well as the buffer itself.
Args:
prefix (str): prefix to prepend to all buffer names.
recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that
are direct members of this module.
Yields:
(string, torch.Tensor): Tuple containing the name and buffer
Example::
>>> for name, buf in self.named_buffers():
>>> if name in ['running_var']:
>>> print(buf.size())
"""
gen = self._named_members(
lambda module: module._buffers.items(),
prefix=prefix, recurse=recurse)
for elem in gen:
yield elem
named_children(self) -> Iterator[Tuple[str, Module]]
inherited
¶
Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
Yields:
Type | Description |
---|---|
(string, Module) |
Tuple containing a name and child module |
Example::
>>> for name, module in model.named_children():
>>>    if name in ['conv4', 'conv5']:
>>>        print(module)
Source code in zamba/models/slowfast_models.py
def named_children(self) -> Iterator[Tuple[str, 'Module']]:
r"""Returns an iterator over immediate children modules, yielding both
the name of the module as well as the module itself.
Yields:
(string, Module): Tuple containing a name and child module
Example::
>>> for name, module in model.named_children():
>>> if name in ['conv4', 'conv5']:
>>> print(module)
"""
memo = set()
for name, module in self._modules.items():
if module is not None and module not in memo:
memo.add(module)
yield name, module
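The listing above skips `None` entries and yields each child instance only once, even when it is registered under several names. A minimal sketch of that behavior, using a plain dictionary in place of `self._modules` (the names `encoder`, `decoder`, `unused`, and `head` are hypothetical):

```python
from typing import Dict, Iterator, Optional, Tuple


def named_children(modules: Dict[str, Optional[object]]) -> Iterator[Tuple[str, object]]:
    """Yield (name, child) pairs, skipping None entries and repeated instances."""
    memo = set()
    for name, module in modules.items():
        if module is not None and module not in memo:
            memo.add(module)
            yield name, module


shared = object()  # one child instance registered under two names
registered = {"encoder": shared, "decoder": shared, "unused": None, "head": object()}
names = [name for name, _ in named_children(registered)]
```

Only the first name under which a shared child was registered is reported, so code that looks children up by name should not assume every registered name appears.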
named_modules(self, memo: Optional[Set[Module]] = None, prefix: str = '', remove_duplicate: bool = True)
inherited
¶
Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
memo |
Optional[Set[Module]] |
a memo to store the set of modules already added to the result |
None |
prefix |
str |
a prefix that will be added to the name of the module |
'' |
remove_duplicate |
bool |
whether to remove the duplicated module instances in the result |
True |
Yields:
Type | Description |
---|---|
(string, Module) |
Tuple of name and module |
Note:
Duplicate modules are returned only once. In the following example, l will be returned only once.
Example::
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
Source code in zamba/models/slowfast_models.py
def named_modules(self, memo: Optional[Set['Module']] = None, prefix: str = '', remove_duplicate: bool = True):
r"""Returns an iterator over all modules in the network, yielding
both the name of the module as well as the module itself.
Args:
memo: a memo to store the set of modules already added to the result
prefix: a prefix that will be added to the name of the module
remove_duplicate: whether to remove the duplicated module instances in the result
or not
Yields:
(string, Module): Tuple of name and module
Note:
Duplicate modules are returned only once. In the following
example, ``l`` will be returned only once.
Example::
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
print(idx, '->', m)
0 -> ('', Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
"""
if memo is None:
memo = set()
if self not in memo:
if remove_duplicate:
memo.add(self)
yield prefix, self
for name, module in self._modules.items():
if module is None:
continue
submodule_prefix = prefix + ('.' if prefix else '') + name
for m in module.named_modules(memo, submodule_prefix, remove_duplicate):
yield m
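To see what `remove_duplicate` changes, the traversal in the listing can be replayed on a toy tree. This is a sketch under stated assumptions: `Node` is a hypothetical pure-Python stand-in for a module, and only the `_modules` dictionary and the memo logic are modeled.

```python
class Node:
    """Hypothetical stand-in for a module; only child registration is modeled."""

    def __init__(self, **children):
        self._modules = dict(children)

    def named_modules(self, memo=None, prefix="", remove_duplicate=True):
        if memo is None:
            memo = set()
        if self not in memo:
            if remove_duplicate:
                # Remember this instance so later occurrences are skipped.
                memo.add(self)
            yield prefix, self
            for name, module in self._modules.items():
                if module is None:
                    continue
                submodule_prefix = prefix + ("." if prefix else "") + name
                yield from module.named_modules(memo, submodule_prefix, remove_duplicate)


leaf = Node()
net = Node(first=leaf, second=leaf)  # the same instance registered twice

deduped = [name for name, _ in net.named_modules()]
with_duplicates = [name for name, _ in net.named_modules(remove_duplicate=False)]
```

With `remove_duplicate=False` the memo is never filled, so the shared instance is reported once per name it is registered under.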
named_parameters(self, prefix: str = '', recurse: bool = True) -> Iterator[Tuple[str, torch.nn.parameter.Parameter]]
inherited
¶
Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prefix |
str |
prefix to prepend to all parameter names. |
'' |
recurse |
bool |
if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. |
True |
Yields:
Type | Description |
---|---|
(string, Parameter) |
Tuple containing the name and parameter |
Example::
>>> for name, param in self.named_parameters():
>>>    if name in ['bias']:
>>>        print(param.size())
Source code in zamba/models/slowfast_models.py
def named_parameters(self, prefix: str = '', recurse: bool = True) -> Iterator[Tuple[str, Parameter]]:
r"""Returns an iterator over module parameters, yielding both the
name of the parameter as well as the parameter itself.
Args:
prefix (str): prefix to prepend to all parameter names.
recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that
are direct members of this module.
Yields:
(string, Parameter): Tuple containing the name and parameter
Example::
>>> for name, param in self.named_parameters():
>>> if name in ['bias']:
>>> print(param.size())
"""
gen = self._named_members(
lambda module: module._parameters.items(),
prefix=prefix, recurse=recurse)
for elem in gen:
yield elem
on_after_backward(self) -> None
inherited
¶
Called after loss.backward() and before optimizers are stepped.
Note:
If using native AMP, the gradients will not be unscaled at this point. Use the on_before_optimizer_step hook if you need the unscaled gradients.
Source code in zamba/models/slowfast_models.py
def on_after_backward(self) -> None:
"""
Called after ``loss.backward()`` and before optimizers are stepped.
Note:
If using native AMP, the gradients will not be unscaled at this point.
Use the ``on_before_optimizer_step`` if you need the unscaled gradients.
"""
on_after_batch_transfer(self, batch: Any, dataloader_idx: int) -> Any
inherited
¶
Override to alter or apply batch augmentations to your batch after it is transferred to the device.
Note:
To check the current state of execution of this hook, use self.trainer.training/testing/validating/predicting so that you can add different logic for each stage.
Note:
This hook only runs on single-GPU training and DDP (no data-parallel). Data-parallel support will come in the near future.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch |
Any |
A batch of data that needs to be altered or augmented. |
required |
dataloader_idx |
int |
The index of the dataloader to which the batch belongs. |
required |
Returns:
Type | Description |
---|---|
Any |
A batch of data |
Example::
def on_after_batch_transfer(self, batch, dataloader_idx):
    batch['x'] = gpu_transforms(batch['x'])
    return batch
See Also:
- on_before_batch_transfer
- transfer_batch_to_device
Source code in zamba/models/slowfast_models.py
def on_after_batch_transfer(self, batch: Any, dataloader_idx: int) -> Any:
"""
Override to alter or apply batch augmentations to your batch after it is transferred to the device.
Note:
To check the current state of execution of this hook you can use
``self.trainer.training/testing/validating/predicting`` so that you can
add different logic as per your requirement.
Note:
This hook only runs on single GPU training and DDP (no data-parallel).
Data-Parallel support will come in near future.
Args:
batch: A batch of data that needs to be altered or augmented.
dataloader_idx: The index of the dataloader to which the batch belongs.
Returns:
A batch of data
Example::
def on_after_batch_transfer(self, batch, dataloader_idx):
batch['x'] = gpu_transforms(batch['x'])
return batch
Raises:
MisconfigurationException:
If using data-parallel, ``Trainer(accelerator='dp')``.
See Also:
- :meth:`on_before_batch_transfer`
- :meth:`transfer_batch_to_device`
"""
return batch
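The note about `self.trainer.training/testing/validating/predicting` can be illustrated with a hook that augments only during training. This is a minimal sketch, not the zamba or Lightning implementation: `FlipInTraining` is hypothetical, the trainer is replaced by a `SimpleNamespace` exposing just the `training` flag, and the "flip" is a toy list reversal.

```python
from types import SimpleNamespace


class FlipInTraining:
    """Hypothetical module-like object; only the hook and a stand-in trainer are modeled."""

    def __init__(self, training: bool):
        # Stand-in for the attached Trainer, exposing just the `training` flag.
        self.trainer = SimpleNamespace(training=training)

    def on_after_batch_transfer(self, batch, dataloader_idx):
        if self.trainer.training:
            # Augment only while training: reverse each sample (a toy "flip").
            batch["x"] = [sample[::-1] for sample in batch["x"]]
        return batch


train_out = FlipInTraining(training=True).on_after_batch_transfer({"x": [[1, 2, 3]]}, 0)
val_out = FlipInTraining(training=False).on_after_batch_transfer({"x": [[1, 2, 3]]}, 0)
```

Branching on the trainer state keeps validation and test batches unaugmented while using a single hook.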
on_before_backward(self, loss: Tensor) -> None
inherited
¶
Called before loss.backward().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
loss |
Tensor |
Loss divided by number of batches for gradient accumulation and scaled if using native AMP. |
required |
Source code in zamba/models/slowfast_models.py
def on_before_backward(self, loss: torch.Tensor) -> None:
"""
Called before ``loss.backward()``.
Args:
loss: Loss divided by number of batches for gradient accumulation and scaled if using native AMP.
"""
pass
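As a worked example of the `loss` value described above: the sketch below assumes the division by the number of accumulated batches is applied first and the AMP scale second. `loss_seen_by_hook` and the scale factor of 1024 are hypothetical and chosen only to make the arithmetic concrete.

```python
def loss_seen_by_hook(raw_loss: float, accumulate_grad_batches: int = 1,
                      amp_scale: float = 1.0) -> float:
    """Return the loss after accumulation averaging and (optional) AMP scaling."""
    return raw_loss / accumulate_grad_batches * amp_scale


plain = loss_seen_by_hook(2.0)
accumulated = loss_seen_by_hook(2.0, accumulate_grad_batches=4)
accumulated_and_scaled = loss_seen_by_hook(2.0, accumulate_grad_batches=4, amp_scale=1024.0)
```

A value logged from this hook is therefore not directly comparable to the raw loss returned by the training step when accumulation or AMP is active.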
on_before_batch_transfer(self, batch: Any, dataloader_idx: int) -> Any
inherited
¶
Override to alter or apply batch augmentations to your batch before it is transferred to the device.
Note:
To check the current state of execution of this hook, use self.trainer.training/testing/validating/predicting so that you can add different logic for each stage.
Note:
This hook only runs on single-GPU training and DDP (no data-parallel). Data-parallel support will come in the near future.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch |
Any |
A batch of data that needs to be altered or augmented. |
required |
dataloader_idx |
int |
The index of the dataloader to which the batch belongs. |
required |
Returns:
Type | Description |
---|---|
Any |
A batch of data |
Example::
def on_before_batch_transfer(self, batch, dataloader_idx):
    batch['x'] = transforms(batch['x'])
    return batch
See Also:
- on_after_batch_transfer
- transfer_batch_to_device
Source code in zamba/models/slowfast_models.py
def on_before_batch_transfer(self, batch: Any, dataloader_idx: int) -> Any:
"""
Override to alter or apply batch augmentations to your batch before it is transferred to the device.
Note:
To check the current state of execution of this hook you can use
``self.trainer.training/testing/validating/predicting`` so that you can
add different logic as per your requirement.
Note:
This hook only runs on single GPU training and DDP (no data-parallel).
Data-Parallel support will come in near future.
Args:
batch: A batch of data that needs to be altered or augmented.
dataloader_idx: The index of the dataloader to which the batch belongs.
Returns:
A batch of data
Example::
def on_before_batch_transfer(self, batch, dataloader_idx):
batch['x'] = transforms(batch['x'])
return batch
Raises:
MisconfigurationException:
If using data-parallel, ``Trainer(accelerator='dp')``.
See Also:
- :meth:`on_after_batch_transfer`
- :meth:`transfer_batch_to_device`
"""
return batch
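The three batch hooks run in sequence: `on_before_batch_transfer`, then `transfer_batch_to_device`, then `on_after_batch_transfer`. The sketch below replays that order with plain functions. The transforms and the way the device is recorded are assumptions for illustration; no real device transfer happens.

```python
def on_before_batch_transfer(batch, dataloader_idx):
    # Host-side preprocessing, before the batch is moved: scale pixel values to [0, 1].
    batch["x"] = [value / 255 for value in batch["x"]]
    return batch


def transfer_batch_to_device(batch, device):
    # Stand-in for the real transfer: just record the target device on the batch.
    return {**batch, "device": device}


def on_after_batch_transfer(batch, dataloader_idx):
    # Device-side augmentation, after the batch has been moved (toy example: double values).
    batch["x"] = [value * 2 for value in batch["x"]]
    return batch


batch = {"x": [0, 255]}
batch = on_before_batch_transfer(batch, 0)
batch = transfer_batch_to_device(batch, "cuda:0")
batch = on_after_batch_transfer(batch, 0)
```

Work that is cheap on the host belongs in the first hook; work that benefits from the accelerator belongs in the last one.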
on_before_optimizer_step(self, optimizer: Optimizer, optimizer_idx: int) -> None
inherited
¶
Called before optimizer.step().
The hook is only called if gradients do not need to be accumulated. See: pytorch_lightning.trainer.Trainer.accumulate_grad_batches.
If using native AMP, the loss will be unscaled before calling this hook. See these docs (https://pytorch.org/docs/stable/notes/amp_examples.html#working-with-unscaled-gradients) for more information on the scaling of gradients.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
optimizer |
Optimizer |
Current optimizer being used. |
required |
optimizer_idx |
int |
Index of the current optimizer being used. |
required |
Example::
def on_before_optimizer_step(self, optimizer, optimizer_idx):
    # example to inspect gradient information in tensorboard
    if self.trainer.global_step % 25 == 0:  # don't make the tf file huge
        for k, v in self.named_parameters():
            self.logger.experiment.add_histogram(
                tag=k, values=v.grad, global_step=self.trainer.global_step
            )
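Since the hook is skipped while gradients are still being accumulated, it fires only on the batch that completes an accumulation window. A simplified sketch of that schedule, assuming a fixed `accumulate_grad_batches` and ignoring any end-of-epoch handling (`hook_fires` is a hypothetical helper, not a Lightning function):

```python
def hook_fires(batch_idx: int, accumulate_grad_batches: int) -> bool:
    """True when this batch completes an accumulation window."""
    return (batch_idx + 1) % accumulate_grad_batches == 0


firing_batches = [idx for idx in range(8) if hook_fires(idx, accumulate_grad_batches=4)]
every_batch = [idx for idx in range(3) if hook_fires(idx, accumulate_grad_batches=1)]
```

With an accumulation factor of 4, a histogram logged from this hook reflects gradients summed over four batches rather than one.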
Source code in zamba/models/slowfast_models.py
def on_before_optimizer_step(self, optimizer: Optimizer, optimizer_idx: int) -> None:
"""
Called before ``optimizer.step()``.
The hook is only called if gradients do not need to be accumulated.
See: :paramref:`~pytorch_lightning.trainer.Trainer.accumulate_grad_batches`.
If using native AMP, the loss will be unscaled before calling this hook.
See these `docs <https://pytorch.org/docs/stable/notes/amp_examples.html#working-with-unscaled-gradients>`__
for more information on the scaling of gradients.
Args:
optimizer: Current optimizer being used.
optimizer_idx: Index of the current optimizer being used.
Example::
def on_before_optimizer_step(self, optimizer, optimizer_idx):
# example to inspect gradient information in tensorboard
if self.trainer.global_step % 25 == 0: # don't make the tf file huge
for k, v in self.named_parameters():
self.logger.experiment.add_histogram(
tag=k, values=v.grad, global_step=self.trainer.global_step
)
"""
on_before_zero_grad(self, optimizer: Optimizer) -> None
inherited
¶
Called after training_step() and before optimizer.zero_grad().
Called in the training loop after taking an optimizer step and before zeroing grads. Good place to inspect weight information with weights updated.
This is where it is called::
for optimizer in optimizers:
    out = training_step(...)
    model.on_before_zero_grad(optimizer)  # < ---- called here
    optimizer.zero_grad()
    backward()
Parameters:
Name | Type | Description | Default |
---|---|---|---|
optimizer |
Optimizer |
The optimizer for which grads should be zeroed. |
required |
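Putting the hook descriptions on this page together gives the following order for a single update. This is a simplified sketch that only records event names; it ignores gradient accumulation, AMP, and multiple optimizers, and `run_training_step` is a hypothetical helper rather than Lightning code.

```python
from typing import List


def run_training_step(log: List[str]) -> None:
    """Append the events of one optimizer update, in order, to `log`."""
    log.append("training_step")
    log.append("on_before_zero_grad")       # hook: runs before grads are cleared
    log.append("optimizer.zero_grad")
    log.append("on_before_backward")        # hook: runs before loss.backward()
    log.append("loss.backward")
    log.append("on_after_backward")         # hook: grads exist, optimizer not yet stepped
    log.append("on_before_optimizer_step")  # hook: last stop before optimizer.step()
    log.append("optimizer.step")


events: List[str] = []
run_training_step(events)
```

From the second update onward, `on_before_zero_grad` therefore sees the weights produced by the previous `optimizer.step`, which is why it is a convenient place to inspect updated weights.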
Source code in zamba/models/slowfast_models.py
def on_before_zero_grad(self, optimizer: Optimizer) -> None:
"""
Called after ``training_step()`` and before ``optimizer.zero_grad()``.
Called in the training loop after taking an optimizer step and before zeroing grads.
Good place to inspect weight information with weights updated.
This is where it is called::
for optimizer in optimizers:
out = training_step(...)
model.on_before_zero_grad(optimizer) # < ---- called here
optimizer.zero_grad()
backward()
Args:
optimizer: The optimizer for which grads should be zeroed.
"""
on_epoch_end(self) -> None
inherited
¶
Called when either of train/val/test epoch ends.
Source code in zamba/models/slowfast_models.py
def on_epoch_end(self) -> None:
"""
Called when either of train/val/test epoch ends.
"""
on_epoch_start(self) -> None
inherited
¶
Called when either of train/val/test epoch begins.
Source code in zamba/models/slowfast_models.py
def on_epoch_start(self) -> None:
"""
Called when either of train/val/test epoch begins.
"""
on_fit_end(self) -> None
inherited
¶
Called at the very end of fit. If on DDP, it is called on every process.
Source code in zamba/models/slowfast_models.py
def on_fit_end(self) -> None:
"""
Called at the very end of fit.
If on DDP it is called on every process
"""
on_fit_start(self) -> None
inherited
¶
Called at the very beginning of fit. If on DDP, it is called on every process.
Source code in zamba/models/slowfast_models.py
def on_fit_start(self) -> None:
"""
Called at the very beginning of fit.
If on DDP it is called on every process
"""
on_hpc_load(self, checkpoint: Dict[str, Any]) -> None
inherited
¶
Hook to do whatever you need right before Slurm manager loads the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
checkpoint |
Dict[str, Any] |
A dictionary with variables from the checkpoint. |
required |
Source code in zamba/models/slowfast_models.py
def on_hpc_load(self, checkpoint: Dict[str, Any]) -> None:
"""
Hook to do whatever you need right before Slurm manager loads the model.
Args:
checkpoint: A dictionary with variables from the checkpoint.
"""
on_hpc_save(self, checkpoint: Dict[str, Any]) -> None
inherited
¶
Hook to do whatever you need right before Slurm manager saves the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
checkpoint |
Dict[str, Any] |
A dictionary in which you can save variables to save in a checkpoint. Contents need to be pickleable. |
required |
Source code in zamba/models/slowfast_models.py
def on_hpc_save(self, checkpoint: Dict[str, Any]) -> None:
"""
Hook to do whatever you need right before Slurm manager saves the model.
Args:
checkpoint: A dictionary in which you can save variables to save in a checkpoint.
Contents need to be pickleable.
"""
on_load_checkpoint(self, checkpoint: Dict[str, Any]) -> None
inherited
¶
Do something with the checkpoint.
Gives the model a chance to load something before state_dict is restored.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
checkpoint |
Dict[str, Any] |
A dictionary with variables from the checkpoint. |
required |
Source code in zamba/models/slowfast_models.py
def on_load_checkpoint(self, checkpoint: Dict[str, Any]) -> None:
"""
Do something with the checkpoint.
Gives model a chance to load something before ``state_dict`` is restored.
Args:
checkpoint: A dictionary with variables from the checkpoint.
"""
on_post_move_to_device(self) -> None
inherited
¶
Called in the parameter_validation decorator after pytorch_lightning.core.LightningModule.to is called. This is a good place to tie weights between modules after moving them to a device. Can be used when training models with weight-sharing properties on TPU.
Addresses the handling of shared weights on TPU: https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#xla-tensor-quirks
Example::
def on_post_move_to_device(self):
    self.decoder.weight = self.encoder.weight
Source code in zamba/models/slowfast_models.py
def on_post_move_to_device(self) -> None:
"""
Called in the ``parameter_validation`` decorator after :meth:`~pytorch_lightning.core.LightningModule.to`
is called. This is a good place to tie weights between modules after moving them to a device. Can be
used when training models with weight sharing properties on TPU.
Addresses the handling of shared weights on TPU:
https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#xla-tensor-quirks
Example::
def on_post_move_to_device(self):
self.decoder.weight = self.encoder.weight
"""
on_predict_batch_end(self, outputs: Optional[Any], batch: Any, batch_idx: int, dataloader_idx: int) -> None
inherited
¶
Called in the predict loop after the batch.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs |
Optional[Any] |
The outputs of predict_step(x) |
required |
batch |
Any |
The batched data as it is returned by the predict DataLoader. |
required |
batch_idx |
int |
the index of the batch |
required |
dataloader_idx |
int |
the index of the dataloader |
required |
Source code in zamba/models/slowfast_models.py
def on_predict_batch_end(self, outputs: Optional[Any], batch: Any, batch_idx: int, dataloader_idx: int) -> None:
"""
Called in the predict loop after the batch.
Args:
outputs: The outputs of predict_step_end(test_step(x))
batch: The batched data as it is returned by the test DataLoader.
batch_idx: the index of the batch
dataloader_idx: the index of the dataloader
"""
on_predict_batch_start(self, batch: Any, batch_idx: int, dataloader_idx: int) -> None
inherited
¶
Called in the predict loop before anything happens for that batch.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch |
Any |
The batched data as it is returned by the predict DataLoader. |
required |
batch_idx |
int |
the index of the batch |
required |
dataloader_idx |
int |
the index of the dataloader |
required |
Source code in zamba/models/slowfast_models.py
def on_predict_batch_start(self, batch: Any, batch_idx: int, dataloader_idx: int) -> None:
"""
Called in the predict loop before anything happens for that batch.
Args:
batch: The batched data as it is returned by the test DataLoader.
batch_idx: the index of the batch
dataloader_idx: the index of the dataloader
"""
on_predict_dataloader(self) -> None
inherited
¶
Called before requesting the predict dataloader.
Deprecated since v1.5: on_predict_dataloader is deprecated and will be removed in v1.7.0. Please use predict_dataloader() directly.
Source code in zamba/models/slowfast_models.py
def on_predict_dataloader(self) -> None:
"""Called before requesting the predict dataloader.
.. deprecated:: v1.5
:meth:`on_predict_dataloader` is deprecated and will be removed in v1.7.0.
Please use :meth:`predict_dataloader()` directly.
"""
on_predict_end(self) -> None
inherited
¶
Called at the end of predicting.
Source code in zamba/models/slowfast_models.py
def on_predict_end(self) -> None:
"""
Called at the end of predicting.
"""
on_predict_epoch_end(self, results: List[Any]) -> None
inherited
¶
Called at the end of predicting.
Source code in zamba/models/slowfast_models.py
def on_predict_epoch_end(self, results: List[Any]) -> None:
"""
Called at the end of predicting.
"""
on_predict_epoch_start(self) -> None
inherited
¶
Called at the beginning of predicting.
Source code in zamba/models/slowfast_models.py
def on_predict_epoch_start(self) -> None:
"""
Called at the beginning of predicting.
"""
on_predict_model_eval(self) -> None
inherited
¶
Sets the model to eval during the predict loop
Source code in zamba/models/slowfast_models.py
def on_predict_model_eval(self) -> None:
"""
Sets the model to eval during the predict loop
"""
self.trainer.model.eval()
on_predict_start(self) -> None
inherited
¶
Called at the beginning of predicting.
Source code in zamba/models/slowfast_models.py
def on_predict_start(self) -> None:
"""
Called at the beginning of predicting.
"""
on_pretrain_routine_end(self) -> None
inherited
¶
Called at the end of the pretrain routine (between fit and train start).
- fit
- pretrain_routine start
- pretrain_routine end
- training_start
Source code in zamba/models/slowfast_models.py
def on_pretrain_routine_end(self) -> None:
"""
Called at the end of the pretrain routine (between fit and train start).
- fit
- pretrain_routine start
- pretrain_routine end
- training_start
"""
on_pretrain_routine_start(self) -> None
inherited
¶
Called at the beginning of the pretrain routine (between fit and train start).
- fit
- pretrain_routine start
- pretrain_routine end
- training_start
Source code in zamba/models/slowfast_models.py
def on_pretrain_routine_start(self) -> None:
"""
Called at the beginning of the pretrain routine (between fit and train start).
- fit
- pretrain_routine start
- pretrain_routine end
- training_start
"""
on_save_checkpoint(self, checkpoint: Dict[str, Any]) -> None
inherited
¶
Give the model a chance to add something to the checkpoint. state_dict is already there.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
checkpoint |
Dict[str, Any] |
A dictionary in which you can save variables to save in a checkpoint. Contents need to be pickleable. |
required |
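A round trip through `on_save_checkpoint` and `on_load_checkpoint` might look like the sketch below. `CounterModel`, the `videos_seen` key, and the bare `{"state_dict": {}}` dictionary are hypothetical stand-ins; `pickle` is used only to show that whatever is stored must survive serialization.

```python
import pickle


class CounterModel:
    """Hypothetical model that tracks one extra value alongside its weights."""

    def __init__(self):
        self.videos_seen = 0

    def on_save_checkpoint(self, checkpoint):
        # Add a custom, pickleable entry next to the existing state_dict.
        checkpoint["videos_seen"] = self.videos_seen

    def on_load_checkpoint(self, checkpoint):
        # Restore the custom entry; fall back to 0 for older checkpoints without it.
        self.videos_seen = checkpoint.get("videos_seen", 0)


model = CounterModel()
model.videos_seen = 128
checkpoint = {"state_dict": {}}  # stand-in for the checkpoint being written
model.on_save_checkpoint(checkpoint)

# Simulate writing to and reading from disk.
reloaded = pickle.loads(pickle.dumps(checkpoint))
restored = CounterModel()
restored.on_load_checkpoint(reloaded)
```

Reading the key with a default keeps checkpoints written before the hook was added loadable.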
Source code in zamba/models/slowfast_models.py
def on_save_checkpoint(self, checkpoint: Dict[str, Any]) -> None:
"""
Give the model a chance to add something to the checkpoint.
``state_dict`` is already there.
Args:
checkpoint: A dictionary in which you can save variables to save in a checkpoint.
Contents need to be pickleable.
"""
on_test_batch_end(self, outputs: Union[torch.Tensor, Dict[str, Any]], batch: Any, batch_idx: int, dataloader_idx: int) -> None
inherited
¶
Called in the test loop after the batch.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs |
Union[torch.Tensor, Dict[str, Any]] |
The outputs of test_step_end(test_step(x)) |
required |
batch |
Any |
The batched data as it is returned by the test DataLoader. |
required |
batch_idx |
int |
the index of the batch |
required |
dataloader_idx |
int |
the index of the dataloader |
required |
Source code in zamba/models/slowfast_models.py
def on_test_batch_end(
self, outputs: Optional[STEP_OUTPUT], batch: Any, batch_idx: int, dataloader_idx: int
) -> None:
"""
Called in the test loop after the batch.
Args:
outputs: The outputs of test_step_end(test_step(x))
batch: The batched data as it is returned by the test DataLoader.
batch_idx: the index of the batch
dataloader_idx: the index of the dataloader
"""
on_test_batch_start(self, batch: Any, batch_idx: int, dataloader_idx: int) -> None
inherited
¶
Called in the test loop before anything happens for that batch.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch |
Any |
The batched data as it is returned by the test DataLoader. |
required |
batch_idx |
int |
the index of the batch |
required |
dataloader_idx |
int |
the index of the dataloader |
required |
Source code in zamba/models/slowfast_models.py
def on_test_batch_start(self, batch: Any, batch_idx: int, dataloader_idx: int) -> None:
"""
Called in the test loop before anything happens for that batch.
Args:
batch: The batched data as it is returned by the test DataLoader.
batch_idx: the index of the batch
dataloader_idx: the index of the dataloader
"""
on_test_dataloader(self) -> None
inherited
¶
Called before requesting the test dataloader.
Deprecated since v1.5: on_test_dataloader is deprecated and will be removed in v1.7.0. Please use test_dataloader() directly.
Source code in zamba/models/slowfast_models.py
def on_test_dataloader(self) -> None:
"""Called before requesting the test dataloader.
.. deprecated:: v1.5
:meth:`on_test_dataloader` is deprecated and will be removed in v1.7.0.
Please use :meth:`test_dataloader()` directly.
"""
on_test_end(self) -> None
inherited
¶
Called at the end of testing.
Source code in zamba/models/slowfast_models.py
def on_test_end(self) -> None:
"""
Called at the end of testing.
"""
on_test_epoch_end(self) -> None
inherited
¶
Called in the test loop at the very end of the epoch.
Source code in zamba/models/slowfast_models.py
def on_test_epoch_end(self) -> None:
"""
Called in the test loop at the very end of the epoch.
"""
on_test_epoch_start(self) -> None
inherited
¶
Called in the test loop at the very beginning of the epoch.
Source code in zamba/models/slowfast_models.py
def on_test_epoch_start(self) -> None:
"""
Called in the test loop at the very beginning of the epoch.
"""
on_test_model_eval(self) -> None
inherited
¶
Sets the model to eval during the test loop
Source code in zamba/models/slowfast_models.py
def on_test_model_eval(self) -> None:
"""
Sets the model to eval during the test loop
"""
self.trainer.model.eval()
on_test_model_train(self) -> None
inherited
¶
Sets the model to train during the test loop
Source code in zamba/models/slowfast_models.py
def on_test_model_train(self) -> None:
"""
Sets the model to train during the test loop
"""
self.trainer.model.train()
on_test_start(self) -> None
inherited
¶
Called at the beginning of testing.
Source code in zamba/models/slowfast_models.py
def on_test_start(self) -> None:
"""
Called at the beginning of testing.
"""
on_train_batch_end(self, outputs: Union[torch.Tensor, Dict[str, Any]], batch: Any, batch_idx: int, dataloader_idx: int) -> None
inherited
¶
Called in the training loop after the batch.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs |
Union[torch.Tensor, Dict[str, Any]] |
The outputs of training_step_end(training_step(x)) |
required |
batch |
Any |
The batched data as it is returned by the training DataLoader. |
required |
batch_idx |
int |
the index of the batch |
required |
dataloader_idx |
int |
the index of the dataloader |
required |
Source code in zamba/models/slowfast_models.py
def on_train_batch_end(self, outputs: STEP_OUTPUT, batch: Any, batch_idx: int, dataloader_idx: int) -> None:
"""
Called in the training loop after the batch.
Args:
outputs: The outputs of training_step_end(training_step(x))
batch: The batched data as it is returned by the training DataLoader.
batch_idx: the index of the batch
dataloader_idx: the index of the dataloader
"""
on_train_batch_start(self, batch: Any, batch_idx: int, dataloader_idx: int) -> None
inherited
¶
Called in the training loop before anything happens for that batch.
If you return -1 here, you will skip training for the rest of the current epoch.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch |
Any |
The batched data as it is returned by the training DataLoader. |
required |
batch_idx |
int |
the index of the batch |
required |
dataloader_idx |
int |
the index of the dataloader |
required |
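The effect of returning -1 can be sketched with a toy epoch loop. This is an illustration under stated assumptions, not the Lightning loop: `run_epoch` and `skip_after_three` are hypothetical, and "training" is reduced to recording the batch index.

```python
def run_epoch(batches, on_train_batch_start):
    """Train on batches until the hook returns -1; return the indices actually trained."""
    trained = []
    for batch_idx, batch in enumerate(batches):
        if on_train_batch_start(batch, batch_idx, 0) == -1:
            break  # skip the rest of the current epoch
        trained.append(batch_idx)
    return trained


def skip_after_three(batch, batch_idx, dataloader_idx):
    # Returning None lets training continue; -1 ends the epoch early.
    return -1 if batch_idx >= 3 else None


trained_indices = run_epoch(range(10), skip_after_three)
```

The batch on which the hook returns -1 is itself not trained on, so the cutoff condition should account for that.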
Source code in zamba/models/slowfast_models.py
def on_train_batch_start(self, batch: Any, batch_idx: int, dataloader_idx: int) -> None:
"""
Called in the training loop before anything happens for that batch.
If you return -1 here, you will skip training for the rest of the current epoch.
Args:
batch: The batched data as it is returned by the training DataLoader.
batch_idx: the index of the batch
dataloader_idx: the index of the dataloader
"""
on_train_dataloader(self) -> None
inherited
¶
Called before requesting the train dataloader.
Deprecated since v1.5: on_train_dataloader is deprecated and will be removed in v1.7.0. Please use train_dataloader() directly.
Source code in zamba/models/slowfast_models.py
def on_train_dataloader(self) -> None:
"""Called before requesting the train dataloader.
.. deprecated:: v1.5
:meth:`on_train_dataloader` is deprecated and will be removed in v1.7.0.
Please use :meth:`train_dataloader()` directly.
"""
on_train_end(self) -> None
inherited
¶
Called at the end of training before logger experiment is closed.
Source code in zamba/models/slowfast_models.py
def on_train_end(self) -> None:
"""
Called at the end of training before logger experiment is closed.
"""
on_train_epoch_end(self, unused: Optional = None) -> None
inherited
¶
Called in the training loop at the very end of the epoch.
To access all batch outputs at the end of the epoch, either:
- Implement training_epoch_end in the LightningModule, OR
- Cache data across steps on the attribute(s) of the LightningModule and access them in this hook
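The second option, caching on attributes, can be sketched as follows. `LossTracker` is a hypothetical stand-in with a toy loss (the mean of the batch values); a real module would compute the loss from model outputs.

```python
class LossTracker:
    """Hypothetical module-like object that caches step losses for the epoch-end hook."""

    def __init__(self):
        self.step_losses = []
        self.epoch_means = []

    def training_step(self, batch, batch_idx):
        loss = sum(batch) / len(batch)  # toy loss: mean of the batch values
        self.step_losses.append(loss)   # cache across steps on an attribute
        return loss

    def on_train_epoch_end(self):
        # Consume the cache, then clear it so the next epoch starts fresh.
        self.epoch_means.append(sum(self.step_losses) / len(self.step_losses))
        self.step_losses.clear()


tracker = LossTracker()
for idx, batch in enumerate([[1.0, 3.0], [2.0, 6.0]]):
    tracker.training_step(batch, idx)
tracker.on_train_epoch_end()
```

Clearing the cache inside the hook matters: without it, values from earlier epochs leak into later averages and memory grows with every epoch.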
Source code in zamba/models/slowfast_models.py
def on_train_epoch_end(self, unused: Optional = None) -> None:
"""
Called in the training loop at the very end of the epoch.
To access all batch outputs at the end of the epoch, either:
1. Implement `training_epoch_end` in the LightningModule OR
2. Cache data across steps on the attribute(s) of the `LightningModule` and access them in this hook
"""
on_train_epoch_start(self) -> None
inherited
¶
Called in the training loop at the very beginning of the epoch.
Source code in zamba/models/slowfast_models.py
def on_train_epoch_start(self) -> None:
"""
Called in the training loop at the very beginning of the epoch.
"""
on_train_start(self)
inherited
¶
Called at the beginning of training after sanity check.
Source code in zamba/models/slowfast_models.py
def on_train_start(self):
metrics = {"val_macro_f1": {}}
if self.num_classes > 2:
metrics.update(
{f"val_top_{k}_accuracy": {} for k in DEFAULT_TOP_K if k < self.num_classes}
)
else:
metrics.update({"val_accuracy": {}})
# write hparams to hparams.yaml file, log metrics to tb hparams tab
self.logger.log_hyperparams(self.hparams, metrics)
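The metric names registered by `on_train_start` depend on the number of classes. The sketch below reproduces just the dictionary construction from the listing. The value of `DEFAULT_TOP_K` is an assumption made for illustration (the real constant is defined in zamba and may differ), and `initial_metrics` is a hypothetical helper.

```python
DEFAULT_TOP_K = (1, 3, 5, 10)  # assumed value for illustration only


def initial_metrics(num_classes, top_k=DEFAULT_TOP_K):
    """Build the placeholder metrics dict that is logged with the hyperparameters."""
    metrics = {"val_macro_f1": {}}
    if num_classes > 2:
        # Top-k accuracy is only registered when k is smaller than the class count.
        metrics.update({f"val_top_{k}_accuracy": {} for k in top_k if k < num_classes})
    else:
        metrics.update({"val_accuracy": {}})
    return metrics


binary_keys = list(initial_metrics(2))
four_class_keys = list(initial_metrics(4))
```

For a binary problem only plain accuracy is tracked next to macro F1; with more classes, each top-k metric appears only when `k` is below the class count.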
on_val_dataloader(self) -> None
inherited
¶
Called before requesting the val dataloader.
Deprecated since v1.5: on_val_dataloader is deprecated and will be removed in v1.7.0. Please use val_dataloader() directly.
Source code in zamba/models/slowfast_models.py
def on_val_dataloader(self) -> None:
"""Called before requesting the val dataloader.
.. deprecated:: v1.5
:meth:`on_val_dataloader` is deprecated and will be removed in v1.7.0.
Please use :meth:`val_dataloader()` directly.
"""
on_validation_batch_end(self, outputs: Union[torch.Tensor, Dict[str, Any]], batch: Any, batch_idx: int, dataloader_idx: int) -> None
inherited
¶
Called in the validation loop after the batch.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs |
Union[torch.Tensor, Dict[str, Any]] |
The outputs of validation_step_end(validation_step(x)) |
required |
batch |
Any |
The batched data as it is returned by the validation DataLoader. |
required |
batch_idx |
int |
the index of the batch |
required |
dataloader_idx |
int |
the index of the dataloader |
required |
Source code in zamba/models/slowfast_models.py
def on_validation_batch_end(
self, outputs: Optional[STEP_OUTPUT], batch: Any, batch_idx: int, dataloader_idx: int
) -> None:
"""
Called in the validation loop after the batch.
Args:
outputs: The outputs of validation_step_end(validation_step(x))
batch: The batched data as it is returned by the validation DataLoader.
batch_idx: the index of the batch
dataloader_idx: the index of the dataloader
"""
on_validation_batch_start(self, batch: Any, batch_idx: int, dataloader_idx: int) -> None
inherited
¶
Called in the validation loop before anything happens for that batch.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch |
Any |
The batched data as it is returned by the validation DataLoader. |
required |
batch_idx |
int |
the index of the batch |
required |
dataloader_idx |
int |
the index of the dataloader |
required |
Source code in zamba/models/slowfast_models.py
def on_validation_batch_start(self, batch: Any, batch_idx: int, dataloader_idx: int) -> None:
"""
Called in the validation loop before anything happens for that batch.
Args:
batch: The batched data as it is returned by the validation DataLoader.
batch_idx: the index of the batch
dataloader_idx: the index of the dataloader
"""
on_validation_end(self) -> None
inherited
¶
Called at the end of validation.
Source code in zamba/models/slowfast_models.py
def on_validation_end(self) -> None:
"""
Called at the end of validation.
"""
on_validation_epoch_end(self) -> None
inherited
¶
Called in the validation loop at the very end of the epoch.
Source code in zamba/models/slowfast_models.py
def on_validation_epoch_end(self) -> None:
"""
Called in the validation loop at the very end of the epoch.
"""
on_validation_epoch_start(self) -> None
inherited
¶
Called in the validation loop at the very beginning of the epoch.
Source code in zamba/models/slowfast_models.py
def on_validation_epoch_start(self) -> None:
"""
Called in the validation loop at the very beginning of the epoch.
"""
on_validation_model_eval(self) -> None
inherited
¶
Sets the model to eval during the val loop
Source code in zamba/models/slowfast_models.py
def on_validation_model_eval(self) -> None:
"""
Sets the model to eval during the val loop
"""
self.trainer.model.eval()
on_validation_model_train(self) -> None
inherited
¶
Sets the model to train during the val loop
Source code in zamba/models/slowfast_models.py
def on_validation_model_train(self) -> None:
"""
Sets the model to train during the val loop
"""
self.trainer.model.train()
on_validation_start(self) -> None
inherited
¶
Called at the beginning of validation.
Source code in zamba/models/slowfast_models.py
def on_validation_start(self) -> None:
"""
Called at the beginning of validation.
"""
optimizer_step(self, epoch: int = None, batch_idx: int = None, optimizer: Optimizer = None, optimizer_idx: int = None, optimizer_closure: Optional[Callable] = None, on_tpu: bool = None, using_native_amp: bool = None, using_lbfgs: bool = None) -> None
inherited
¶
Override this method to adjust the default way the
:class:~pytorch_lightning.trainer.trainer.Trainer
calls each optimizer.
By default, Lightning calls step()
and zero_grad()
as shown in the example
once per optimizer. This method (and zero_grad()
) won't be called during the
accumulation phase when Trainer(accumulate_grad_batches != 1)
.
!!! warning
If you are overriding this method, make sure that you pass the optimizer_closure
parameter
to optimizer.step()
function as shown in the examples. This ensures that
training_step()
, optimizer.zero_grad()
, backward()
are called within the training loop.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
epoch |
int |
Current epoch |
None |
batch_idx |
int |
Index of current batch |
None |
optimizer |
Optimizer |
A PyTorch optimizer |
None |
optimizer_idx |
int |
If you used multiple optimizers, this indexes into that list. |
None |
optimizer_closure |
Optional[Callable] |
Closure for all optimizers |
None |
on_tpu |
bool |
True if TPU backward is required |
None |
using_native_amp |
bool |
True if using native amp |
None |
using_lbfgs |
bool |
True if the matching optimizer is :class:torch.optim.LBFGS |
None |
Examples::
# DEFAULT
def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
optimizer_closure, on_tpu, using_native_amp, using_lbfgs):
optimizer.step(closure=optimizer_closure)
# Alternating schedule for optimizer steps (e.g., GANs)
def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
optimizer_closure, on_tpu, using_native_amp, using_lbfgs):
# update generator opt every step
if optimizer_idx == 0:
optimizer.step(closure=optimizer_closure)
# update discriminator opt every 2 steps
if optimizer_idx == 1:
if (batch_idx + 1) % 2 == 0 :
optimizer.step(closure=optimizer_closure)
# ...
# add as many optimizers as you want
Here's another example showing how to use this for more advanced things such as learning rate warm-up:
# learning rate warm-up
def optimizer_step(
self,
epoch,
batch_idx,
optimizer,
optimizer_idx,
optimizer_closure,
on_tpu,
using_native_amp,
using_lbfgs,
):
# warm up lr
if self.trainer.global_step < 500:
lr_scale = min(1.0, float(self.trainer.global_step + 1) / 500.0)
for pg in optimizer.param_groups:
pg["lr"] = lr_scale * self.learning_rate
# update params
optimizer.step(closure=optimizer_closure)
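The warm-up rule in the example above can be checked in isolation. The following is a minimal, framework-free sketch of the same arithmetic. The helper name `warmup_lr` and its `base_lr` and `warmup_steps` parameters are illustrative assumptions rather than part of Lightning, and the sketch assumes no other scheduler changes the learning rate once warm-up ends.

```python
def warmup_lr(global_step: int, base_lr: float, warmup_steps: int = 500) -> float:
    """Return the learning rate for a given step under linear warm-up.

    Hypothetical helper mirroring the docstring example: the rate grows
    linearly until `warmup_steps` is reached and then stays at `base_lr`.
    """
    if global_step < warmup_steps:
        # Same scale factor as in the example: (step + 1) / warmup_steps, capped at 1.
        lr_scale = min(1.0, float(global_step + 1) / float(warmup_steps))
        return lr_scale * base_lr
    return base_lr


# Learning rate at a few steps for a base rate of 0.1.
schedule = {step: warmup_lr(step, base_lr=0.1) for step in (0, 249, 499, 1000)}
```

Inside an `optimizer_step` override, the returned value would be written to each `pg["lr"]` before calling `optimizer.step(closure=optimizer_closure)`.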
Source code in zamba/models/slowfast_models.py
def optimizer_step(
self,
epoch: int = None,
batch_idx: int = None,
optimizer: Optimizer = None,
optimizer_idx: int = None,
optimizer_closure: Optional[Callable] = None,
on_tpu: bool = None,
using_native_amp: bool = None,
using_lbfgs: bool = None,
) -> None:
r"""
Override this method to adjust the default way the
:class:`~pytorch_lightning.trainer.trainer.Trainer` calls each optimizer.
By default, Lightning calls ``step()`` and ``zero_grad()`` as shown in the example
once per optimizer. This method (and ``zero_grad()``) won't be called during the
accumulation phase when ``Trainer(accumulate_grad_batches != 1)``.
Warning:
If you are overriding this method, make sure that you pass the ``optimizer_closure`` parameter
to ``optimizer.step()`` function as shown in the examples. This ensures that
``training_step()``, ``optimizer.zero_grad()``, ``backward()`` are called within the training loop.
Args:
epoch: Current epoch
batch_idx: Index of current batch
optimizer: A PyTorch optimizer
optimizer_idx: If you used multiple optimizers, this indexes into that list.
optimizer_closure: Closure for all optimizers
on_tpu: ``True`` if TPU backward is required
using_native_amp: ``True`` if using native amp
using_lbfgs: True if the matching optimizer is :class:`torch.optim.LBFGS`
Examples::
# DEFAULT
def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
optimizer_closure, on_tpu, using_native_amp, using_lbfgs):
optimizer.step(closure=optimizer_closure)
# Alternating schedule for optimizer steps (e.g., GANs)
def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
optimizer_closure, on_tpu, using_native_amp, using_lbfgs):
# update generator opt every step
if optimizer_idx == 0:
optimizer.step(closure=optimizer_closure)
# update discriminator opt every 2 steps
if optimizer_idx == 1:
if (batch_idx + 1) % 2 == 0 :
optimizer.step(closure=optimizer_closure)
# ...
# add as many optimizers as you want
Here's another example showing how to use this for more advanced things such as
learning rate warm-up:
.. code-block:: python
# learning rate warm-up
def optimizer_step(
self,
epoch,
batch_idx,
optimizer,
optimizer_idx,
optimizer_closure,
on_tpu,
using_native_amp,
using_lbfgs,
):
# warm up lr
if self.trainer.global_step < 500:
lr_scale = min(1.0, float(self.trainer.global_step + 1) / 500.0)
for pg in optimizer.param_groups:
pg["lr"] = lr_scale * self.learning_rate
# update params
optimizer.step(closure=optimizer_closure)
"""
optimizer.step(closure=optimizer_closure)
optimizer_zero_grad(self, epoch: int, batch_idx: int, optimizer: Optimizer, optimizer_idx: int)
inherited
¶
Override this method to change the default behaviour of optimizer.zero_grad()
.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
epoch |
int |
Current epoch |
required |
batch_idx |
int |
Index of current batch |
required |
optimizer |
Optimizer |
A PyTorch optimizer |
required |
optimizer_idx |
int |
If you used multiple optimizers this indexes into that list. |
required |
Examples::
# DEFAULT
def optimizer_zero_grad(self, epoch, batch_idx, optimizer, optimizer_idx):
optimizer.zero_grad()
# Set gradients to `None` instead of zero to improve performance.
def optimizer_zero_grad(self, epoch, batch_idx, optimizer, optimizer_idx):
optimizer.zero_grad(set_to_none=True)
See :meth:torch.optim.Optimizer.zero_grad
for the explanation of the above example.
Source code in zamba/models/slowfast_models.py
def optimizer_zero_grad(self, epoch: int, batch_idx: int, optimizer: Optimizer, optimizer_idx: int):
"""Override this method to change the default behaviour of ``optimizer.zero_grad()``.
Args:
epoch: Current epoch
batch_idx: Index of current batch
optimizer: A PyTorch optimizer
optimizer_idx: If you used multiple optimizers this indexes into that list.
Examples::
# DEFAULT
def optimizer_zero_grad(self, epoch, batch_idx, optimizer, optimizer_idx):
optimizer.zero_grad()
# Set gradients to `None` instead of zero to improve performance.
def optimizer_zero_grad(self, epoch, batch_idx, optimizer, optimizer_idx):
optimizer.zero_grad(set_to_none=True)
See :meth:`torch.optim.Optimizer.zero_grad` for the explanation of the above example.
"""
optimizer.zero_grad()
optimizers(self, use_pl_optimizer: bool = True) -> Union[torch.optim.optimizer.Optimizer, pytorch_lightning.core.optimizer.LightningOptimizer, List[torch.optim.optimizer.Optimizer], List[pytorch_lightning.core.optimizer.LightningOptimizer]]
inherited
¶
Returns the optimizer(s) that are being used during training. Useful for manual optimization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
use_pl_optimizer |
bool |
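If True, will wrap the optimizer(s) in a :class:~pytorch_lightning.core.optimizer.LightningOptimizer for automatic handling of precision and profiling. |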
True |
Returns:
Type | Description |
---|---|
Union[torch.optim.optimizer.Optimizer, pytorch_lightning.core.optimizer.LightningOptimizer, List[torch.optim.optimizer.Optimizer], List[pytorch_lightning.core.optimizer.LightningOptimizer]] |
A single optimizer, or a list of optimizers in case multiple ones are present. |
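The return shape described above (one object when a single optimizer is configured, a list otherwise) can be sketched without Lightning. The helper `unwrap_single` is hypothetical, and plain strings stand in for optimizer objects. The real method additionally checks that the single element is an optimizer instance.

```python
from typing import List, TypeVar, Union

T = TypeVar("T")


def unwrap_single(opts: List[T]) -> Union[T, List[T]]:
    """Return the only element of a one-item list, otherwise the list itself."""
    if isinstance(opts, list) and len(opts) == 1:
        return opts[0]
    return opts


single = unwrap_single(["adam"])               # one optimizer configured
multiple = unwrap_single(["gen_opt", "disc_opt"])  # e.g. a GAN with two optimizers
```

Callers that may be configured either way therefore need to handle both a bare optimizer and a list.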
Source code in zamba/models/slowfast_models.py
def optimizers(
self, use_pl_optimizer: bool = True
) -> Union[Optimizer, LightningOptimizer, List[Optimizer], List[LightningOptimizer]]:
"""
Returns the optimizer(s) that are being used during training. Useful for manual optimization.
Args:
use_pl_optimizer: If ``True``, will wrap the optimizer(s) in a
:class:`~pytorch_lightning.core.optimizer.LightningOptimizer` for automatic handling of precision and
profiling.
Returns:
A single optimizer, or a list of optimizers in case multiple ones are present.
"""
if use_pl_optimizer:
opts = list(self.trainer.lightning_optimizers.values())
else:
opts = self.trainer.optimizers
# single optimizer
if isinstance(opts, list) and len(opts) == 1 and isinstance(opts[0], (Optimizer, LightningOptimizer)):
return opts[0]
# multiple opts
return opts
parameters(self, recurse: bool = True) -> Iterator[torch.nn.parameter.Parameter]
inherited
¶
Returns an iterator over module parameters.
This is typically passed to an optimizer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
recurse |
bool |
if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. |
True |
Yields:
Type | Description |
---|---|
Parameter |
module parameter |
Example::
>>> for param in model.parameters():
>>> print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
Source code in zamba/models/slowfast_models.py
def parameters(self, recurse: bool = True) -> Iterator[Parameter]:
r"""Returns an iterator over module parameters.
This is typically passed to an optimizer.
Args:
recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that
are direct members of this module.
Yields:
Parameter: module parameter
Example::
>>> for param in model.parameters():
>>> print(type(param), param.size())
<class 'torch.Tensor'> (20L,)
<class 'torch.Tensor'> (20L, 1L, 5L, 5L)
"""
for name, param in self.named_parameters(recurse=recurse):
yield param
predict_dataloader(self) -> Union[torch.utils.data.dataloader.DataLoader, Sequence[torch.utils.data.dataloader.DataLoader]]
inherited
¶
Implement one or multiple PyTorch DataLoaders for prediction.
It's recommended that all data downloads and preparation happen in :meth:prepare_data
.
- :meth:~pytorch_lightning.trainer.Trainer.fit
- ...
- :meth:prepare_data
- :meth:train_dataloader
- :meth:val_dataloader
- :meth:test_dataloader
!!! note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Returns:
Type | Description |
---|---|
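Union[torch.utils.data.dataloader.DataLoader, Sequence[torch.utils.data.dataloader.DataLoader]] |
A :class:torch.utils.data.DataLoader or a sequence of them specifying prediction samples. |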
!!! note
In the case where you return multiple prediction dataloaders, the :meth:predict
will have an argument dataloader_idx
which matches the order here.
Source code in zamba/models/slowfast_models.py
def predict_dataloader(self) -> EVAL_DATALOADERS:
r"""
Implement one or multiple PyTorch DataLoaders for prediction.
It's recommended that all data downloads and preparation happen in :meth:`prepare_data`.
- :meth:`~pytorch_lightning.trainer.Trainer.fit`
- ...
- :meth:`prepare_data`
- :meth:`train_dataloader`
- :meth:`val_dataloader`
- :meth:`test_dataloader`
Note:
Lightning adds the correct sampler for distributed and arbitrary hardware
There is no need to set it yourself.
Return:
A :class:`torch.utils.data.DataLoader` or a sequence of them specifying prediction samples.
Note:
In the case where you return multiple prediction dataloaders, the :meth:`predict`
will have an argument ``dataloader_idx`` which matches the order here.
"""
predict_step(self, batch, batch_idx, dataloader_idx: Optional[int] = None)
inherited
¶
Step function called during :meth:~pytorch_lightning.trainer.trainer.Trainer.predict
.
By default, it calls :meth:~pytorch_lightning.core.lightning.LightningModule.forward
.
Override to add any processing logic.
The :meth:~pytorch_lightning.core.lightning.LightningModule.predict_step
is used
to scale inference on multi-devices.
To prevent an OOM error, it is possible to use :class:~pytorch_lightning.callbacks.BasePredictionWriter
callback to write the predictions to disk or database after each batch or on epoch end.
The :class:~pytorch_lightning.callbacks.BasePredictionWriter
should be used while using a spawn
based accelerator. This happens for Trainer(accelerator="ddp_spawn")
or training on 8 TPU cores with Trainer(tpu_cores=8)
as predictions won't be returned.
Example ::
class MyModel(LightningModule):
def predict_step(self, batch, batch_idx, dataloader_idx):
return self(batch)
dm = ...
model = MyModel()
trainer = Trainer(gpus=2)
predictions = trainer.predict(model, dm)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch |
|
Current batch |
required |
batch_idx |
|
Index of current batch |
required |
dataloader_idx |
Optional[int] |
Index of the current dataloader |
None |
Returns:
Type | Description |
---|---|
Predicted output |
Source code in zamba/models/slowfast_models.py
def predict_step(self, batch, batch_idx, dataloader_idx: Optional[int] = None):
x, y = batch
y_hat = self(x)
pred = torch.sigmoid(y_hat).cpu().numpy()
return pred
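The override above passes the raw model outputs through a sigmoid before returning them. As a plain-Python illustration of that transformation (no PyTorch required), the sketch below maps each logit to a value in [0, 1] independently, so the values in a row need not sum to 1. The function names are illustrative, not part of zamba.

```python
import math
from typing import List


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function for a single value."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    # For negative inputs, exp(logit) cannot overflow.
    e = math.exp(logit)
    return e / (1.0 + e)


def logits_to_probs(batch_logits: List[List[float]]) -> List[List[float]]:
    """Apply the sigmoid element-wise to a batch of per-class logits."""
    return [[sigmoid(z) for z in row] for row in batch_logits]


probs = logits_to_probs([[0.0, 2.0, -2.0]])
```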
prepare_data(self) -> None
inherited
¶
Use this to download and prepare data.
!!! warning
DO NOT set state to the model (use setup instead), since this is NOT called on every GPU in DDP/TPU.
Example::
def prepare_data(self):
# good
download_data()
tokenize()
etc()
# bad
self.split = data_split
self.some_state = some_other_state()
In DDP prepare_data can be called in two ways (using Trainer(prepare_data_per_node)):
- Once per node. This is the default and is only called on LOCAL_RANK=0.
- Once in total. Only called on GLOBAL_RANK=0.
Example::
# DEFAULT
# called once per node on LOCAL_RANK=0 of that node
Trainer(prepare_data_per_node=True)
# call on GLOBAL_RANK=0 (great for shared file systems)
Trainer(prepare_data_per_node=False)
This is called before requesting the dataloaders:
model.prepare_data()
initialize_distributed()
model.setup(stage)
model.train_dataloader()
model.val_dataloader()
model.test_dataloader()
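The two calling modes described above reduce to a rank check. The following sketch is a simplified model of that rule, not Lightning's implementation. The function `should_prepare_data` and the two-node layout are assumptions made for illustration.

```python
def should_prepare_data(local_rank: int, global_rank: int,
                        prepare_data_per_node: bool = True) -> bool:
    """Decide whether a process would run prepare_data under the documented rule."""
    if prepare_data_per_node:
        # Once per node: only the first process on each node.
        return local_rank == 0
    # Once in total: only the first process overall.
    return global_rank == 0


# Hypothetical cluster: 2 nodes x 2 processes, as (global_rank, local_rank) pairs.
ranks = [(0, 0), (1, 1), (2, 0), (3, 1)]
per_node = [g for g, l in ranks if should_prepare_data(l, g, True)]
once_total = [g for g, l in ranks if should_prepare_data(l, g, False)]
```

With a shared file system, the `prepare_data_per_node=False` case avoids every node downloading the same data.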
Source code in zamba/models/slowfast_models.py
def prepare_data(self) -> None:
"""
Use this to download and prepare data.
.. warning:: DO NOT set state to the model (use `setup` instead)
since this is NOT called on every GPU in DDP/TPU
Example::
def prepare_data(self):
# good
download_data()
tokenize()
etc()
# bad
self.split = data_split
self.some_state = some_other_state()
In DDP prepare_data can be called in two ways (using Trainer(prepare_data_per_node)):
1. Once per node. This is the default and is only called on LOCAL_RANK=0.
2. Once in total. Only called on GLOBAL_RANK=0.
Example::
# DEFAULT
# called once per node on LOCAL_RANK=0 of that node
Trainer(prepare_data_per_node=True)
# call on GLOBAL_RANK=0 (great for shared file systems)
Trainer(prepare_data_per_node=False)
This is called before requesting the dataloaders:
.. code-block:: python
model.prepare_data()
initialize_distributed()
model.setup(stage)
model.train_dataloader()
model.val_dataloader()
model.test_dataloader()
"""
print(self, *args, **kwargs) -> None
inherited
¶
Prints only from process 0. Use this in any distributed mode to log only once.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*args |
|
The thing to print. The same as for Python's built-in print function. |
() |
**kwargs |
|
The same as for Python's built-in print function. |
{} |
Example::
def forward(self, x):
self.print(x, 'in forward')
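The idea behind this method is a simple guard on the process rank. The sketch below shows that guard in isolation. `rank_zero_print` and its `is_global_zero` argument are hypothetical stand-ins for the check against `self.trainer.is_global_zero`, and the progress-bar handling of the real method is left out.

```python
import io


def rank_zero_print(is_global_zero: bool, *args, **kwargs) -> None:
    """Forward to the built-in print only when called from process 0."""
    if is_global_zero:
        print(*args, **kwargs)


# Simulate two processes writing to the same stream.
buffer = io.StringIO()
rank_zero_print(True, "loss", 0.25, file=buffer)   # process 0: printed
rank_zero_print(False, "loss", 0.25, file=buffer)  # any other process: skipped
```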
Source code in zamba/models/slowfast_models.py
def print(self, *args, **kwargs) -> None:
r"""
Prints only from process 0. Use this in any distributed mode to log only once.
Args:
*args: The thing to print. The same as for Python's built-in print function.
**kwargs: The same as for Python's built-in print function.
Example::
def forward(self, x):
self.print(x, 'in forward')
"""
if self.trainer.is_global_zero:
progress_bar = self.trainer.progress_bar_callback
if progress_bar is not None and progress_bar.is_enabled:
progress_bar.print(*args, **kwargs)
else:
print(*args, **kwargs)
register_backward_hook(self, hook: Callable[[Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[NoneType, torch.Tensor]]) -> RemovableHandle
inherited
¶
Registers a backward hook on the module.
This function is deprecated in favor of :meth:~torch.nn.Module.register_full_backward_hook
and
the behavior of this function will change in future versions.
Returns:
Type | Description |
---|---|
RemovableHandle |
A handle that can be used to remove the added hook by calling handle.remove(). |
Source code in zamba/models/slowfast_models.py
def register_backward_hook(
self, hook: Callable[['Module', _grad_t, _grad_t], Union[None, Tensor]]
) -> RemovableHandle:
r"""Registers a backward hook on the module.
This function is deprecated in favor of :meth:`~torch.nn.Module.register_full_backward_hook` and
the behavior of this function will change in future versions.
Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
"""
if self._is_full_backward_hook is True:
raise RuntimeError("Cannot use both regular backward hooks and full backward hooks on a "
"single Module. Please use only one of them.")
self._is_full_backward_hook = False
handle = hooks.RemovableHandle(self._backward_hooks)
self._backward_hooks[handle.id] = hook
return handle
register_buffer(self, name: str, tensor: Optional[torch.Tensor], persistent: bool = True) -> None
inherited
¶
Adds a buffer to the module.
This is typically used to register a buffer that should not be
considered a model parameter. For example, BatchNorm's running_mean
is not a parameter, but is part of the module's state. Buffers, by
default, are persistent and will be saved alongside parameters. This
behavior can be changed by setting :attr:persistent
to False
. The
only difference between a persistent buffer and a non-persistent buffer
is that the latter will not be a part of this module's
:attr:state_dict
.
Buffers can be accessed as attributes using given names.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name |
string |
name of the buffer. The buffer can be accessed from this module using the given name |
required |
tensor |
Tensor or None |
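buffer to be registered. If None, then operations that run on buffers, such as :attr:cuda, are ignored. If None, the buffer is not included in the module's :attr:state_dict. |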
required |
persistent |
bool |
whether the buffer is part of this module's :attr:state_dict. |
True |
Example::
>>> self.register_buffer('running_mean', torch.zeros(num_features))
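To make the persistent versus non-persistent distinction concrete, here is a small sketch that assumes PyTorch is installed. The module `RunningStats` and the buffer name `scratch` are made up for illustration.

```python
import torch
from torch import nn


class RunningStats(nn.Module):
    """Toy module holding one persistent and one non-persistent buffer."""

    def __init__(self, num_features: int) -> None:
        super().__init__()
        # Saved in state_dict alongside parameters (the default).
        self.register_buffer("running_mean", torch.zeros(num_features))
        # Still moves with .to()/.cuda(), but is left out of state_dict.
        self.register_buffer("scratch", torch.ones(num_features), persistent=False)


module = RunningStats(4)
state_keys = list(module.state_dict().keys())
param_count = len(list(module.parameters()))
```

Both buffers are reachable as attributes (`module.running_mean`, `module.scratch`), and neither is returned by `parameters()`, so an optimizer will not update them.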
Source code in zamba/models/slowfast_models.py
def register_buffer(self, name: str, tensor: Optional[Tensor], persistent: bool = True) -> None:
r"""Adds a buffer to the module.
This is typically used to register a buffer that should not be
considered a model parameter. For example, BatchNorm's ``running_mean``
is not a parameter, but is part of the module's state. Buffers, by
default, are persistent and will be saved alongside parameters. This
behavior can be changed by setting :attr:`persistent` to ``False``. The
only difference between a persistent buffer and a non-persistent buffer
is that the latter will not be a part of this module's
:attr:`state_dict`.
Buffers can be accessed as attributes using given names.
Args:
name (string): name of the buffer. The buffer can be accessed
from this module using the given name
tensor (Tensor or None): buffer to be registered. If ``None``, then operations
that run on buffers, such as :attr:`cuda`, are ignored. If ``None``,
the buffer is **not** included in the module's :attr:`state_dict`.
persistent (bool): whether the buffer is part of this module's
:attr:`state_dict`.
Example::
>>> self.register_buffer('running_mean', torch.zeros(num_features))
"""
if persistent is False and isinstance(self, torch.jit.ScriptModule):
raise RuntimeError("ScriptModule does not support non-persistent buffers")
if '_buffers' not in self.__dict__:
raise AttributeError(
"cannot assign buffer before Module.__init__() call")
elif not isinstance(name, torch._six.string_classes):
raise TypeError("buffer name should be a string. "
"Got {}".format(torch.typename(name)))
elif '.' in name:
raise KeyError("buffer name can't contain \".\"")
elif name == '':
raise KeyError("buffer name can't be empty string \"\"")
elif hasattr(self, name) and name not in self._buffers:
raise KeyError("attribute '{}' already exists".format(name))
elif tensor is not None and not isinstance(tensor, torch.Tensor):
raise TypeError("cannot assign '{}' object to buffer '{}' "
"(torch Tensor or None required)"
.format(torch.typename(tensor), name))
else:
self._buffers[name] = tensor
if persistent:
self._non_persistent_buffers_set.discard(name)
else:
self._non_persistent_buffers_set.add(name)
register_forward_hook(self, hook: Callable[..., NoneType]) -> RemovableHandle
inherited
¶
Registers a forward hook on the module.
The hook will be called every time after :func:forward
has computed an output.
It should have the following signature::
hook(module, input, output) -> None or modified output
The input contains only the positional arguments given to the module.
Keyword arguments won't be passed to the hooks and only to the forward
.
The hook can modify the output. It can modify the input inplace but
it will not have effect on forward since this is called after
:func:forward
is called.
Returns:
Type | Description |
---|---|
RemovableHandle |
A handle that can be used to remove the added hook by calling handle.remove(). |
Source code in zamba/models/slowfast_models.py
def register_forward_hook(self, hook: Callable[..., None]) -> RemovableHandle:
r"""Registers a forward hook on the module.
The hook will be called every time after :func:`forward` has computed an output.
It should have the following signature::
hook(module, input, output) -> None or modified output
The input contains only the positional arguments given to the module.
Keyword arguments won't be passed to the hooks and only to the ``forward``.
The hook can modify the output. It can modify the input inplace but
it will not have effect on forward since this is called after
:func:`forward` is called.
Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
"""
handle = hooks.RemovableHandle(self._forward_hooks)
self._forward_hooks[handle.id] = hook
return handle
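A short usage sketch for the forward hook and its handle, assuming PyTorch is installed. The layer sizes and the `record_shape` function are arbitrary choices for the example.

```python
import torch
from torch import nn

layer = nn.Linear(3, 2)
seen_shapes = []


def record_shape(module, inputs, output):
    """Forward hook: note the output shape and leave the output unchanged."""
    seen_shapes.append(tuple(output.shape))


handle = layer.register_forward_hook(record_shape)
layer(torch.zeros(5, 3))   # hook fires after forward
handle.remove()            # detach the hook
layer(torch.zeros(7, 3))   # hook no longer fires
```

Keeping the returned handle is what allows the hook to be removed later. Without it, the hook stays registered for the lifetime of the module.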
register_forward_pre_hook(self, hook: Callable[..., NoneType]) -> RemovableHandle
inherited
¶
Registers a forward pre-hook on the module.
The hook will be called every time before :func:forward
is invoked.
It should have the following signature::
hook(module, input) -> None or modified input
The input contains only the positional arguments given to the module.
Keyword arguments won't be passed to the hooks and only to the forward
.
The hook can modify the input. User can either return a tuple or a
single modified value in the hook. We will wrap the value into a tuple
if a single value is returned(unless that value is already a tuple).
Returns:
Type | Description |
---|---|
RemovableHandle |
A handle that can be used to remove the added hook by calling handle.remove(). |
Source code in zamba/models/slowfast_models.py
def register_forward_pre_hook(self, hook: Callable[..., None]) -> RemovableHandle:
r"""Registers a forward pre-hook on the module.
The hook will be called every time before :func:`forward` is invoked.
It should have the following signature::
hook(module, input) -> None or modified input
The input contains only the positional arguments given to the module.
Keyword arguments won't be passed to the hooks and only to the ``forward``.
The hook can modify the input. User can either return a tuple or a
single modified value in the hook. We will wrap the value into a tuple
if a single value is returned(unless that value is already a tuple).
Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
"""
handle = hooks.RemovableHandle(self._forward_pre_hooks)
self._forward_pre_hooks[handle.id] = hook
return handle
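The following sketch, which assumes PyTorch is installed, shows a pre-hook that returns a single modified value and relies on the wrapping into a tuple described above. `nn.Identity` and the `double_input` function are used only to keep the example small.

```python
import torch
from torch import nn

layer = nn.Identity()


def double_input(module, inputs):
    """Forward pre-hook: `inputs` is the tuple of positional arguments."""
    (x,) = inputs
    # Returning a single tensor is allowed; it is wrapped into a tuple.
    return x * 2


handle = layer.register_forward_pre_hook(double_input)
out = layer(torch.ones(2))
handle.remove()
out_without_hook = layer(torch.ones(2))
```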
register_full_backward_hook(self, hook: Callable[[Module, Union[Tuple[torch.Tensor, ...], torch.Tensor], Union[Tuple[torch.Tensor, ...], torch.Tensor]], Union[NoneType, torch.Tensor]]) -> RemovableHandle
inherited
¶
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature::
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The :attr:grad_input
and :attr:grad_output
are tuples that contain the gradients
with respect to the inputs and outputs respectively. The hook should
not modify its arguments, but it can optionally return a new gradient with
respect to the input that will be used in place of :attr:grad_input
in
subsequent computations. :attr:grad_input
will only correspond to the inputs given
as positional arguments and all kwarg arguments are ignored. Entries
in :attr:grad_input
and :attr:grad_output
will be None
for all non-Tensor
arguments.
For technical reasons, when this hook is applied to a Module, its forward function will receive a view of each Tensor passed to the Module. Similarly the caller will receive a view of each Tensor returned by the Module's forward function.
!!! warning
Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error.
Returns:
Type | Description |
---|---|
RemovableHandle |
A handle that can be used to remove the added hook by calling handle.remove(). |
Source code in zamba/models/slowfast_models.py
def register_full_backward_hook(
self, hook: Callable[['Module', _grad_t, _grad_t], Union[None, Tensor]]
) -> RemovableHandle:
r"""Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module
inputs are computed. The hook should have the following signature::
hook(module, grad_input, grad_output) -> tuple(Tensor) or None
The :attr:`grad_input` and :attr:`grad_output` are tuples that contain the gradients
with respect to the inputs and outputs respectively. The hook should
not modify its arguments, but it can optionally return a new gradient with
respect to the input that will be used in place of :attr:`grad_input` in
subsequent computations. :attr:`grad_input` will only correspond to the inputs given
as positional arguments and all kwarg arguments are ignored. Entries
in :attr:`grad_input` and :attr:`grad_output` will be ``None`` for all non-Tensor
arguments.
For technical reasons, when this hook is applied to a Module, its forward function will
receive a view of each Tensor passed to the Module. Similarly the caller will receive a view
of each Tensor returned by the Module's forward function.
.. warning ::
Modifying inputs or outputs inplace is not allowed when using backward hooks and
will raise an error.
Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
"""
if self._is_full_backward_hook is False:
raise RuntimeError("Cannot use both regular backward hooks and full backward hooks on a "
"single Module. Please use only one of them.")
self._is_full_backward_hook = True
handle = hooks.RemovableHandle(self._backward_hooks)
self._backward_hooks[handle.id] = hook
return handle
register_parameter(self, name: str, param: Optional[torch.nn.parameter.Parameter]) -> None
inherited
¶
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name |
string |
name of the parameter. The parameter can be accessed from this module using the given name |
required |
param |
Parameter or None |
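parameter to be added to the module. If None, then operations that run on parameters, such as :attr:cuda, are ignored. If None, the parameter is not included in the module's :attr:state_dict. |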
required |
Source code in zamba/models/slowfast_models.py
def register_parameter(self, name: str, param: Optional[Parameter]) -> None:
r"""Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
Args:
name (string): name of the parameter. The parameter can be accessed
from this module using the given name
param (Parameter or None): parameter to be added to the module. If
``None``, then operations that run on parameters, such as :attr:`cuda`,
are ignored. If ``None``, the parameter is **not** included in the
module's :attr:`state_dict`.
"""
if '_parameters' not in self.__dict__:
raise AttributeError(
"cannot assign parameter before Module.__init__() call")
elif not isinstance(name, torch._six.string_classes):
raise TypeError("parameter name should be a string. "
"Got {}".format(torch.typename(name)))
elif '.' in name:
raise KeyError("parameter name can't contain \".\"")
elif name == '':
raise KeyError("parameter name can't be empty string \"\"")
elif hasattr(self, name) and name not in self._parameters:
raise KeyError("attribute '{}' already exists".format(name))
if param is None:
self._parameters[name] = None
elif not isinstance(param, Parameter):
raise TypeError("cannot assign '{}' object to parameter '{}' "
"(torch.nn.Parameter or None required)"
.format(torch.typename(param), name))
elif param.grad_fn:
raise ValueError(
"Cannot assign non-leaf Tensor to parameter '{0}'. Model "
"parameters must be created explicitly. To express '{0}' "
"as a function of another Tensor, compute the value in "
"the forward() method.".format(name))
else:
self._parameters[name] = param
requires_grad_(self: ~T, requires_grad: bool = True) -> ~T
inherited
¶
Change if autograd should record operations on parameters in this module.
This method sets the parameters' :attr:requires_grad
attributes
in-place.
This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training).
See :ref:locally-disable-grad-doc
for a comparison between
.requires_grad_()
and several similar mechanisms that may be confused with it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
requires_grad |
bool |
whether autograd should record operations on parameters in this module. Default: True. |
True |
Returns:
Type | Description |
---|---|
Module |
self |
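As an illustration of the fine-tuning use case mentioned above, the sketch below freezes the first layer of a toy model and leaves the last one trainable. It assumes PyTorch is installed. The three-layer `nn.Sequential` and the names `backbone` and `head` are stand-ins, not the actual SlowFast modules.

```python
import torch
from torch import nn

# Toy stand-in: first layer acts as the "backbone", last layer as the "head".
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
backbone, head = model[0], model[2]

# Freeze the backbone in place; the head keeps requires_grad=True.
backbone.requires_grad_(False)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```

Because the method returns `self`, it can also be chained, for example `backbone.requires_grad_(False).eval()`.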
Source code in zamba/models/slowfast_models.py
def requires_grad_(self: T, requires_grad: bool = True) -> T:
r"""Change if autograd should record operations on parameters in this
module.
This method sets the parameters' :attr:`requires_grad` attributes
in-place.
This method is helpful for freezing part of the module for finetuning
or training parts of a model individually (e.g., GAN training).
See :ref:`locally-disable-grad-doc` for a comparison between
`.requires_grad_()` and several similar mechanisms that may be confused with it.
Args:
requires_grad (bool): whether autograd should record operations on
parameters in this module. Default: ``True``.
Returns:
Module: self
"""
for p in self.parameters():
p.requires_grad_(requires_grad)
return self
save_hyperparameters(self, *args, ignore: Union[Sequence[str], str] = None, frame: Optional[frame] = None, logger: bool = True) -> None
inherited
¶
Save arguments to `hparams` attribute.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
args | | single object of `dict`, `NameSpace` or `OmegaConf` or string names or arguments from class `__init__` | () |
ignore | Union[Sequence[str], str] | an argument name or a list of argument names from class `__init__` to be ignored | None |
frame | Optional[frame] | a frame object. Default is None | None |
logger | bool | Whether to send the hyperparameters to the logger. Default: True | True |
Example::
    >>> class ManuallyArgsModel(HyperparametersMixin):
    ...     def __init__(self, arg1, arg2, arg3):
    ...         super().__init__()
    ...         # manually assign arguments
    ...         self.save_hyperparameters('arg1', 'arg3')
    ...     def forward(self, *args, **kwargs):
    ...         ...
    >>> model = ManuallyArgsModel(1, 'abc', 3.14)
    >>> model.hparams
    "arg1": 1
    "arg3": 3.14
    >>> class AutomaticArgsModel(HyperparametersMixin):
    ...     def __init__(self, arg1, arg2, arg3):
    ...         super().__init__()
    ...         # equivalent automatic
    ...         self.save_hyperparameters()
    ...     def forward(self, *args, **kwargs):
    ...         ...
    >>> model = AutomaticArgsModel(1, 'abc', 3.14)
    >>> model.hparams
    "arg1": 1
    "arg2": abc
    "arg3": 3.14
    >>> class SingleArgModel(HyperparametersMixin):
    ...     def __init__(self, params):
    ...         super().__init__()
    ...         # manually assign single argument
    ...         self.save_hyperparameters(params)
    ...     def forward(self, *args, **kwargs):
    ...         ...
    >>> model = SingleArgModel(Namespace(p1=1, p2='abc', p3=3.14))
    >>> model.hparams
    "p1": 1
    "p2": abc
    "p3": 3.14
    >>> class ManuallyArgsModel(HyperparametersMixin):
    ...     def __init__(self, arg1, arg2, arg3):
    ...         super().__init__()
    ...         # pass argument(s) to ignore as a string or in a list
    ...         self.save_hyperparameters(ignore='arg2')
    ...     def forward(self, *args, **kwargs):
    ...         ...
    >>> model = ManuallyArgsModel(1, 'abc', 3.14)
    >>> model.hparams
    "arg1": 1
    "arg3": 3.14
Source code in zamba/models/slowfast_models.py
def save_hyperparameters(
self,
*args,
ignore: Optional[Union[Sequence[str], str]] = None,
frame: Optional[types.FrameType] = None,
logger: bool = True,
) -> None:
"""Save arguments to ``hparams`` attribute.
Args:
args: single object of `dict`, `NameSpace` or `OmegaConf`
or string names or arguments from class ``__init__``
ignore: an argument name or a list of argument names from
class ``__init__`` to be ignored
frame: a frame object. Default is None
logger: Whether to send the hyperparameters to the logger. Default: True
Example::
>>> class ManuallyArgsModel(HyperparametersMixin):
... def __init__(self, arg1, arg2, arg3):
... super().__init__()
... # manually assign arguments
... self.save_hyperparameters('arg1', 'arg3')
... def forward(self, *args, **kwargs):
... ...
>>> model = ManuallyArgsModel(1, 'abc', 3.14)
>>> model.hparams
"arg1": 1
"arg3": 3.14
>>> class AutomaticArgsModel(HyperparametersMixin):
... def __init__(self, arg1, arg2, arg3):
... super().__init__()
... # equivalent automatic
... self.save_hyperparameters()
... def forward(self, *args, **kwargs):
... ...
>>> model = AutomaticArgsModel(1, 'abc', 3.14)
>>> model.hparams
"arg1": 1
"arg2": abc
"arg3": 3.14
>>> class SingleArgModel(HyperparametersMixin):
... def __init__(self, params):
... super().__init__()
... # manually assign single argument
... self.save_hyperparameters(params)
... def forward(self, *args, **kwargs):
... ...
>>> model = SingleArgModel(Namespace(p1=1, p2='abc', p3=3.14))
>>> model.hparams
"p1": 1
"p2": abc
"p3": 3.14
>>> class ManuallyArgsModel(HyperparametersMixin):
... def __init__(self, arg1, arg2, arg3):
... super().__init__()
... # pass argument(s) to ignore as a string or in a list
... self.save_hyperparameters(ignore='arg2')
... def forward(self, *args, **kwargs):
... ...
>>> model = ManuallyArgsModel(1, 'abc', 3.14)
>>> model.hparams
"arg1": 1
"arg3": 3.14
"""
self._log_hyperparams = logger
# the frame needs to be created in this file.
if not frame:
frame = inspect.currentframe().f_back
save_hyperparameters(self, *args, ignore=ignore, frame=frame)
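The listing above defaults `frame` to `inspect.currentframe().f_back`, i.e. the frame of the caller (normally `__init__`). The sketch below shows, using only the standard library, how the arguments of such a caller frame can be read. `HParamsSketch` and `save_hparams` are hypothetical names for illustration and are much simpler than Lightning's actual helper.

```python
import inspect


class HParamsSketch:
    """Toy mixin that records the caller's arguments (illustration only)."""

    def save_hparams(self, *names, ignore=()):
        # f_back is the frame that called save_hparams, i.e. __init__.
        frame = inspect.currentframe().f_back
        info = inspect.getargvalues(frame)
        candidates = {n: info.locals[n] for n in info.args if n != "self"}
        if names:
            # Keep only the explicitly requested argument names.
            candidates = {n: candidates[n] for n in names}
        ignore = {ignore} if isinstance(ignore, str) else set(ignore)
        self.hparams = {n: v for n, v in candidates.items() if n not in ignore}


class Model(HParamsSketch):
    def __init__(self, arg1, arg2, arg3):
        self.save_hparams(ignore="arg2")


model = Model(1, "abc", 3.14)
print(model.hparams)  # {'arg1': 1, 'arg3': 3.14}
```

Because the frame is looked up relative to the call site, the helper has to be called directly from `__init__` for the captured arguments to be the constructor's.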
set_extra_state(self, state: Any)
inherited
¶
This function is called from :func:`load_state_dict` to handle any extra state found within the `state_dict`. Implement this function and a corresponding :func:`get_extra_state` for your module if you need to store extra state within its `state_dict`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state |
dict |
Extra state from the `state_dict` |
required |
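A minimal sketch of a matching `get_extra_state` / `set_extra_state` pair follows. The `Counter` module and its `calls` attribute are hypothetical, and the example assumes a PyTorch version that provides these hooks (the inherited method documented here suggests it does).

```python
import torch
from torch import nn


class Counter(nn.Module):
    """Hypothetical module that keeps a plain Python counter next to its weights."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 2)
        self.calls = 0

    def get_extra_state(self):
        # Whatever is returned here is stored alongside parameters and buffers.
        return {"calls": self.calls}

    def set_extra_state(self, state):
        # Called by load_state_dict with the object saved above.
        self.calls = state["calls"]


source = Counter()
source.calls = 7

target = Counter()
target.load_state_dict(source.state_dict())
print(target.calls)  # 7
```

Without the override, the base implementation raises the `RuntimeError` shown in the source listing below.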
Source code in zamba/models/slowfast_models.py
def set_extra_state(self, state: Any):
"""
This function is called from :func:`load_state_dict` to handle any extra state
found within the `state_dict`. Implement this function and a corresponding
:func:`get_extra_state` for your module if you need to store extra state within its
`state_dict`.
Args:
state (dict): Extra state from the `state_dict`
"""
raise RuntimeError(
"Reached a code path in Module.set_extra_state() that should never be called. "
"Please file an issue at https://github.com/pytorch/pytorch/issues/new?template=bug-report.md "
"to report this bug.")
setup(self, stage: Optional[str] = None) -> None
inherited
¶
Called at the beginning of fit (train + validate), validate, test, and predict. This is a good hook when you need to build models dynamically or adjust something about them. This hook is called on every process when using DDP.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
stage | Optional[str] | either 'fit', 'validate', 'test', or 'predict' | None |
Example::
    class LitModel(...):
        def __init__(self):
            self.l1 = None
        def prepare_data(self):
            download_data()
            tokenize()
            # don't do this
            self.something = something_else
        def setup(self, stage):
            data = Load_data(...)
            self.l1 = nn.Linear(28, data.num_classes)
Source code in zamba/models/slowfast_models.py
def setup(self, stage: Optional[str] = None) -> None:
"""
Called at the beginning of fit (train + validate), validate, test, and predict.
This is a good hook when you need to build models dynamically or adjust something about them.
This hook is called on every process when using DDP.
Args:
stage: either ``'fit'``, ``'validate'``, ``'test'``, or ``'predict'``
Example::
class LitModel(...):
def __init__(self):
self.l1 = None
def prepare_data(self):
download_data()
tokenize()
# don't do this
self.something = else
def setup(stage):
data = Load_data(...)
self.l1 = nn.Linear(28, data.num_classes)
"""
share_memory(self: ~T) -> ~T
inherited
¶
See :meth:torch.Tensor.share_memory_
Source code in zamba/models/slowfast_models.py
def share_memory(self: T) -> T:
r"""See :meth:`torch.Tensor.share_memory_`"""
return self._apply(lambda t: t.share_memory_())
state_dict(self, destination = None, prefix = '', keep_vars = False)
inherited
¶
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are
included. Keys are corresponding parameter and buffer names.
Parameters and buffers set to None
are not included.
Returns:
Type | Description |
---|---|
dict |
a dictionary containing a whole state of the module |
Example::
>>> module.state_dict().keys()
['bias', 'weight']
Source code in zamba/models/slowfast_models.py
def state_dict(self, destination=None, prefix='', keep_vars=False):
r"""Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are
included. Keys are corresponding parameter and buffer names.
Parameters and buffers set to ``None`` are not included.
Returns:
dict:
a dictionary containing a whole state of the module
Example::
>>> module.state_dict().keys()
['bias', 'weight']
"""
if destination is None:
destination = OrderedDict()
destination._metadata = OrderedDict()
destination._metadata[prefix[:-1]] = local_metadata = dict(version=self._version)
self._save_to_state_dict(destination, prefix, keep_vars)
for name, module in self._modules.items():
if module is not None:
module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
for hook in self._state_dict_hooks.values():
hook_result = hook(self, destination, prefix, local_metadata)
if hook_result is not None:
destination = hook_result
return destination
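The recursion in the listing above builds dotted keys by passing `prefix + name + '.'` down to each submodule. The torch-free sketch below reproduces only that key-building logic; `Node`, `params`, and `children` are hypothetical stand-ins for a module, its parameters, and its submodules.

```python
from collections import OrderedDict


class Node:
    """Toy stand-in for a module with named values and named children."""

    def __init__(self, params=None, children=None):
        self.params = params or {}
        self.children = children or {}

    def state_dict(self, destination=None, prefix=""):
        if destination is None:
            destination = OrderedDict()
        # Own entries first; entries set to None are skipped.
        for name, value in self.params.items():
            if value is not None:
                destination[prefix + name] = value
        # Then recurse, extending the prefix with the child's name.
        for name, child in self.children.items():
            if child is not None:
                child.state_dict(destination, prefix + name + ".")
        return destination


head = Node(params={"weight": [0.1, 0.2], "bias": [0.0]})
model = Node(params={"scale": 1.0, "unused": None}, children={"head": head})
print(list(model.state_dict().keys()))  # ['scale', 'head.weight', 'head.bias']
```

The real method additionally records version metadata and runs state-dict hooks, which this sketch leaves out.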
summarize(self, mode: Optional[str] = 'top', max_depth: Optional[int] = None) -> Optional[pytorch_lightning.core.memory.ModelSummary]
inherited
¶
Summarize this LightningModule.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mode | Optional[str] | Can be either 'top' (summarize only direct submodules) or 'full' (summarize all layers). Deprecated in v1.4 in favor of `max_depth`; will be removed in v1.6. | 'top' |
max_depth | Optional[int] | The maximum depth of layer nesting that the summary will include. A value of 0 turns the layer summary off. Default: 1. | None |
Returns:
Type | Description |
---|---|
Optional[pytorch_lightning.core.memory.ModelSummary] |
The model summary object |
Source code in zamba/models/slowfast_models.py
def summarize(self, mode: Optional[str] = "top", max_depth: Optional[int] = None) -> Optional[ModelSummary]:
"""
Summarize this LightningModule.
Args:
mode: Can be either ``'top'`` (summarize only direct submodules) or ``'full'`` (summarize all layers).
.. deprecated:: v1.4
This parameter was deprecated in v1.4 in favor of `max_depth` and will be removed in v1.6.
max_depth: The maximum depth of layer nesting that the summary will include. A value of 0 turns the
layer summary off. Default: 1.
Return:
The model summary object
"""
model_summary = None
# temporary mapping from mode to max_depth
if max_depth is None:
if mode in ModelSummary.MODES:
max_depth = ModelSummary.MODES[mode]
rank_zero_deprecation(
f"Argument `mode` in `LightningModule.summarize` is deprecated in v1.4"
f" and will be removed in v1.6. Use `max_depth={max_depth}` to replicate `mode={mode}` behavior."
)
model_summary = ModelSummary(self, max_depth=max_depth)
elif mode is not None:
raise MisconfigurationException(f"`mode` can be None, {', '.join(ModelSummary.MODES)}, got {mode}")
else:
model_summary = ModelSummary(self, max_depth=max_depth)
log.info("\n" + str(model_summary))
return model_summary
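The branch structure above can be summarized as "an explicit `max_depth` wins, `mode` is only a fallback". The sketch below mirrors that decision in isolation. The contents of `ASSUMED_MODES` are an assumption about `ModelSummary.MODES` and should be checked against the installed Lightning version; `resolve_max_depth` is a hypothetical helper, not part of the library.

```python
from typing import Optional

# Assumed mapping; the real one is ModelSummary.MODES in pytorch_lightning.
ASSUMED_MODES = {"top": 1, "full": -1}


def resolve_max_depth(mode: Optional[str] = "top",
                      max_depth: Optional[int] = None) -> Optional[int]:
    """Return the depth summarize() would use, or None if no summary is built."""
    if max_depth is not None:
        return max_depth  # explicit depth takes priority; mode is ignored
    if mode in ASSUMED_MODES:
        return ASSUMED_MODES[mode]  # deprecated path, triggers a warning upstream
    if mode is not None:
        raise ValueError(f"`mode` can be None, {', '.join(ASSUMED_MODES)}, got {mode}")
    return None  # mode=None and max_depth=None: nothing to summarize


print(resolve_max_depth())                     # 1
print(resolve_max_depth("full", max_depth=2))  # 2
```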
tbptt_split_batch(self, batch: Tensor, split_size: int) -> list
inherited
¶
When using truncated backpropagation through time, each batch must be split along the time dimension. Lightning handles this by default, but for custom behavior override this function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch |
Tensor |
Current batch |
required |
split_size |
int |
The size of the split |
required |
Returns:
Type | Description |
---|---|
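list | List of batch splits. Each split will be passed to :meth:`training_step` to enable truncated back propagation through time. The default implementation splits root level Tensors and Sequences at dim=1 (i.e. time dim). It assumes that each time dim is the same length. |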
Examples::
    def tbptt_split_batch(self, batch, split_size):
        # length of the time dimension, read from the first sample of each element
        time_dims = [len(x[0]) for x in batch if isinstance(x, (torch.Tensor, collections.abc.Sequence))]
        splits = []
        for t in range(0, time_dims[0], split_size):
            batch_split = []
            for i, x in enumerate(batch):
                if isinstance(x, torch.Tensor):
                    split_x = x[:, t:t + split_size]
                elif isinstance(x, collections.abc.Sequence):
                    split_x = [None] * len(x)
                    for batch_idx in range(len(x)):
                        split_x[batch_idx] = x[batch_idx][t:t + split_size]
                batch_split.append(split_x)
            splits.append(batch_split)
        return splits
!!! note
Called in the training loop after :meth:`~pytorch_lightning.callbacks.base.Callback.on_batch_start` if :paramref:`~pytorch_lightning.core.lightning.LightningModule.truncated_bptt_steps` > 0. Each returned batch split is passed separately to :meth:`training_step`.
Source code in zamba/models/slowfast_models.py
def tbptt_split_batch(self, batch: Tensor, split_size: int) -> list:
r"""
When using truncated backpropagation through time, each batch must be split along the
time dimension. Lightning handles this by default, but for custom behavior override
this function.
Args:
batch: Current batch
split_size: The size of the split
Return:
List of batch splits. Each split will be passed to :meth:`training_step` to enable truncated
back propagation through time. The default implementation splits root level Tensors and
Sequences at dim=1 (i.e. time dim). It assumes that each time dim is the same length.
Examples::
def tbptt_split_batch(self, batch, split_size):
splits = []
for t in range(0, time_dims[0], split_size):
batch_split = []
for i, x in enumerate(batch):
if isinstance(x, torch.Tensor):
split_x = x[:, t:t + split_size]
elif isinstance(x, collections.Sequence):
split_x = [None] * len(x)
for batch_idx in range(len(x)):
split_x[batch_idx] = x[batch_idx][t:t + split_size]
batch_split.append(split_x)
splits.append(batch_split)
return splits
Note:
Called in the training loop after
:meth:`~pytorch_lightning.callbacks.base.Callback.on_batch_start`
if :paramref:`~pytorch_lightning.core.lightning.LightningModule.truncated_bptt_steps` > 0.
Each returned batch split is passed separately to :meth:`training_step`.
"""
time_dims = [len(x[0]) for x in batch if isinstance(x, (torch.Tensor, collections.Sequence))]
assert len(time_dims) >= 1, "Unable to determine batch time dimension"
assert all(x == time_dims[0] for x in time_dims), "Batch time dimension length is ambiguous"
splits = []
for t in range(0, time_dims[0], split_size):
batch_split = []
for i, x in enumerate(batch):
if isinstance(x, torch.Tensor):
split_x = x[:, t : t + split_size]
elif isinstance(x, collections.Sequence):
split_x = [None] * len(x)
for batch_idx in range(len(x)):
split_x[batch_idx] = x[batch_idx][t : t + split_size]
batch_split.append(split_x)
splits.append(batch_split)
return splits
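To make the splitting behaviour concrete, the following torch-free sketch applies the same idea to nested lists, where each batch element has shape (samples, time steps). `split_along_time` is a hypothetical helper that only covers the `Sequence` branch of the listing above.

```python
from collections.abc import Sequence


def split_along_time(batch, split_size):
    """Split every element of `batch` into chunks of `split_size` time steps."""
    time_dims = [len(x[0]) for x in batch if isinstance(x, Sequence)]
    assert len(time_dims) >= 1, "Unable to determine batch time dimension"
    assert all(t == time_dims[0] for t in time_dims), "Ambiguous time dimension"

    splits = []
    for t in range(0, time_dims[0], split_size):
        # Slice the same time window out of every sample of every element.
        splits.append([[sample[t:t + split_size] for sample in x] for x in batch])
    return splits


inputs = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]   # 2 samples, 5 time steps
targets = [[0, 0, 1, 1, 1], [1, 1, 0, 0, 0]]
splits = split_along_time([inputs, targets], split_size=2)

print(len(splits))  # 3
print(splits[0])    # [[[1, 2], [6, 7]], [[0, 0], [1, 1]]]
print(splits[2])    # [[[5], [10]], [[1], [0]]]
```

The last split is shorter than `split_size` whenever the time dimension is not an exact multiple of it.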
teardown(self, stage: Optional[str] = None) -> None
inherited
¶
Called at the end of fit (train + validate), validate, test, predict, or tune.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
stage | Optional[str] | either 'fit', 'validate', 'test', or 'predict' | None |
Source code in zamba/models/slowfast_models.py
def teardown(self, stage: Optional[str] = None) -> None:
"""
Called at the end of fit (train + validate), validate, test, predict, or tune.
Args:
stage: either ``'fit'``, ``'validate'``, ``'test'``, or ``'predict'``
"""
test_dataloader(self) -> Union[torch.utils.data.dataloader.DataLoader, Sequence[torch.utils.data.dataloader.DataLoader]]
inherited
¶
Implement one or multiple PyTorch DataLoaders for testing.
The dataloader you return will not be reloaded unless you set :paramref:`~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs` to a positive integer.
For data processing use the following pattern:
- download in :meth:`prepare_data`
- process and split in :meth:`setup`
However, the above are only necessary for distributed processing.
.. warning:: do not assign state in prepare_data
- :meth:`~pytorch_lightning.trainer.Trainer.fit`
- ...
- :meth:`prepare_data`
- :meth:`setup`
- :meth:`train_dataloader`
- :meth:`val_dataloader`
- :meth:`test_dataloader`
!!! note Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Returns:
Type | Description |
---|---|
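Union[torch.utils.data.dataloader.DataLoader, Sequence[torch.utils.data.dataloader.DataLoader]] | A :class:`torch.utils.data.DataLoader` or a sequence of them specifying testing samples. |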
Example::
    def test_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (1.0,))])
        dataset = MNIST(root='/path/to/mnist/', train=False, transform=transform,
                        download=True)
        loader = torch.utils.data.DataLoader(
            dataset=dataset,
            batch_size=self.batch_size,
            shuffle=False
        )
        return loader
    # can also return multiple dataloaders
    def test_dataloader(self):
        return [loader_a, loader_b, ..., loader_n]
!!! note
If you don't need a test dataset and a :meth:`test_step`, you don't need to implement this method.
!!! note
In the case where you return multiple test dataloaders, the :meth:`test_step` will have an argument `dataloader_idx` which matches the order here.
Source code in zamba/models/slowfast_models.py
def test_dataloader(self) -> EVAL_DATALOADERS:
r"""
Implement one or multiple PyTorch DataLoaders for testing.
The dataloader you return will not be reloaded unless you set
:paramref:`~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs` to
a positive integer.
For data processing use the following pattern:
- download in :meth:`prepare_data`
- process and split in :meth:`setup`
However, the above are only necessary for distributed processing.
.. warning:: do not assign state in prepare_data
- :meth:`~pytorch_lightning.trainer.Trainer.fit`
- ...
- :meth:`prepare_data`
- :meth:`setup`
- :meth:`train_dataloader`
- :meth:`val_dataloader`
- :meth:`test_dataloader`
Note:
Lightning adds the correct sampler for distributed and arbitrary hardware.
There is no need to set it yourself.
Return:
A :class:`torch.utils.data.DataLoader` or a sequence of them specifying testing samples.
Example::
def test_dataloader(self):
transform = transforms.Compose([transforms.ToTensor(),
transforms.Normalize((0.5,), (1.0,))])
dataset = MNIST(root='/path/to/mnist/', train=False, transform=transform,
download=True)
loader = torch.utils.data.DataLoader(
dataset=dataset,
batch_size=self.batch_size,
shuffle=False
)
return loader
# can also return multiple dataloaders
def test_dataloader(self):
return [loader_a, loader_b, ..., loader_n]
Note:
If you don't need a test dataset and a :meth:`test_step`, you don't need to implement
this method.
Note:
In the case where you return multiple test dataloaders, the :meth:`test_step`
will have an argument ``dataloader_idx`` which matches the order here.
"""
test_epoch_end(self, outputs: List[Dict[str, numpy.ndarray]])
inherited
¶
Called at the end of a test epoch with the output of all test steps.
.. code-block:: python
    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs |
List[Dict[str, numpy.ndarray]] |
List of outputs you defined in :meth: |
required |
Returns:
Type | Description |
---|---|
None |
!!! note
If you didn't define a :meth:`test_step`, this won't be called.
Examples:
With a single dataloader:
.. code-block:: python
    def test_epoch_end(self, outputs):
        # do something with the outputs of all test batches
        all_test_preds = test_step_outputs.predictions
        some_result = calc_all_results(all_test_preds)
        self.log(some_result)
With multiple dataloaders, `outputs` will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each test step for that dataloader.
.. code-block:: python
    def test_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for test_step_out in dataloader_outputs:
                # do something
                final_value += test_step_out
        self.log("final_metric", final_value)
Source code in zamba/models/slowfast_models.py
def test_epoch_end(self, outputs: List[Dict[str, np.ndarray]]):
y_true, y_pred, y_proba = self.aggregate_step_outputs(outputs)
self.compute_and_log_metrics(y_true, y_pred, y_proba, subset="test")
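The override above first merges the per-batch outputs and then computes metrics once over the whole test set. The sketch below illustrates that two-step pattern with plain lists standing in for the `numpy.ndarray` values. The dictionary keys `y_true`, `y_pred`, and `y_proba` are assumed from the variable names in the listing, and both helper functions are hypothetical simplifications rather than zamba's implementations.

```python
def aggregate_step_outputs(outputs):
    """Concatenate the rows of every per-batch output dict, key by key."""
    y_true = [row for out in outputs for row in out["y_true"]]
    y_pred = [row for out in outputs for row in out["y_pred"]]
    y_proba = [row for out in outputs for row in out["y_proba"]]
    return y_true, y_pred, y_proba


def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label row equals the true row."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Two test batches of sizes 2 and 1, with one-hot style labels.
outputs = [
    {"y_true": [[1, 0], [0, 1]], "y_pred": [[1, 0], [1, 0]],
     "y_proba": [[0.9, 0.1], [0.6, 0.4]]},
    {"y_true": [[0, 1]], "y_pred": [[0, 1]], "y_proba": [[0.2, 0.8]]},
]

y_true, y_pred, y_proba = aggregate_step_outputs(outputs)
accuracy = exact_match_accuracy(y_true, y_pred)
print(len(y_true), round(accuracy, 3))  # 3 0.667
```

Computing the metric after aggregation avoids averaging per-batch scores, which would weight batches of different sizes unequally.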
test_step(self, batch, batch_idx)
inherited
¶
Operates on a single batch of data from the test set. In this step you'd normally generate examples or calculate anything of interest such as accuracy.
.. code-block:: python
    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
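batch | Tensor, or a tuple or list of Tensors | The output of your :class:`~torch.utils.data.DataLoader`. A tensor, tuple or list. | required |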
batch_idx |
int |
The index of this batch. |
required |
dataloader_idx |
int |
The index of the dataloader that produced this batch (only if multiple test dataloaders used). |
required |
Returns:
Type | Description |
---|---|
Any of.
|
.. code-block:: python
    # if you have one test dataloader:
    def test_step(self, batch, batch_idx):
        ...
    # if you have multiple test dataloaders:
    def test_step(self, batch, batch_idx, dataloader_idx):
        ...
Examples::
    # CASE 1: A single test dataset
    def test_step(self, batch, batch_idx):
        x, y = batch
        # implement your own
        out = self(x)
        loss = self.loss(out, y)
        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)
        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)
        # log the outputs!
        self.log_dict({'test_loss': loss, 'test_acc': test_acc})
If you pass in multiple test dataloaders, :meth:`test_step` will have an additional argument.
.. code-block:: python
    # CASE 2: multiple test dataloaders
    def test_step(self, batch, batch_idx, dataloader_idx):
        # dataloader_idx tells you which dataset this is.
        ...
!!! note
If you don't need to test you don't need to implement this method.
!!! note
When the :meth:`test_step` is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of the test epoch, the model goes back to training mode and gradients are enabled.
Source code in zamba/models/slowfast_models.py
def test_step(self, batch, batch_idx):
    return self.validation_step(batch, batch_idx)
test_step_end(self, *args, **kwargs) -> Union[torch.Tensor, Dict[str, Any]]
inherited
¶
Use this when testing with dp or ddp2 because :meth:`test_step` will operate on only part of the batch. However, this is still optional and only needed for things like softmax or NCE loss.
!!! note If you later switch to ddp or some other mode, this will still be called so that you don't have to change your code.
.. code-block:: python
# pseudocode
sub_batches = split_batches_for_dp(batch)
batch_parts_outputs = [test_step(sub_batch) for sub_batch in sub_batches]
test_step_end(batch_parts_outputs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_parts_outputs | | What you return in :meth:`test_step` for each batch part. | required |
Returns:
Type | Description |
---|---|
Union[torch.Tensor, Dict[str, Any]] |
None or anything |
.. code-block:: python
    # WITHOUT test_step_end
    # if used in DP or DDP2, this batch is 1/num_gpus large
    def test_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch
        out = self(x)
        loss = self.softmax(out)
        self.log("test_loss", loss)
    # --------------
    # with test_step_end to do softmax over the full batch
    def test_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch
        out = self.encoder(x)
        return out
    def test_step_end(self, output_results):
        # this out is now the full size of the batch
        all_test_step_outs = output_results.out
        loss = nce_loss(all_test_step_outs)
        self.log("test_loss", loss)
See Also:
See the :ref:advanced/multi_gpu:Multi-GPU training
guide for more details.
Source code in zamba/models/slowfast_models.py
def test_step_end(self, *args, **kwargs) -> Optional[STEP_OUTPUT]:
"""
Use this when testing with dp or ddp2 because :meth:`test_step` will operate
on only part of the batch. However, this is still optional
and only needed for things like softmax or NCE loss.
Note:
If you later switch to ddp or some other mode, this will still be called
so that you don't have to change your code.
.. code-block:: python
# pseudocode
sub_batches = split_batches_for_dp(batch)
batch_parts_outputs = [test_step(sub_batch) for sub_batch in sub_batches]
test_step_end(batch_parts_outputs)
Args:
batch_parts_outputs: What you return in :meth:`test_step` for each batch part.
Return:
None or anything
.. code-block:: python
# WITHOUT test_step_end
# if used in DP or DDP2, this batch is 1/num_gpus large
def test_step(self, batch, batch_idx):
# batch is 1/num_gpus big
x, y = batch
out = self(x)
loss = self.softmax(out)
self.log("test_loss", loss)
# --------------
# with test_step_end to do softmax over the full batch
def test_step(self, batch, batch_idx):
# batch is 1/num_gpus big
x, y = batch
out = self.encoder(x)
return out
def test_step_end(self, output_results):
# this out is now the full size of the batch
all_test_step_outs = output_results.out
loss = nce_loss(all_test_step_outs)
self.log("test_loss", loss)
See Also:
See the :ref:`advanced/multi_gpu:Multi-GPU training` guide for more details.
"""
to(self, *args: Any, **kwargs: Any) -> DeviceDtypeModuleMixin
inherited
¶
Moves and/or casts the parameters and buffers.
This can be called as
.. function:: to(device=None, dtype=None, non_blocking=False)
.. function:: to(dtype, non_blocking=False)
.. function:: to(tensor, non_blocking=False)
Its signature is similar to :meth:`torch.Tensor.to`, but it only accepts floating point :attr:`dtype` values. In addition, this method will only cast the floating point parameters and buffers to :attr:`dtype` (if given). The integral parameters and buffers will be moved to :attr:`device`, if that is given, but with dtypes unchanged. When :attr:`non_blocking` is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
!!! note This method modifies the module in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | | the desired device of the parameters and buffers in this module | required |
dtype | | the desired floating point type of the floating point parameters and buffers in this module | required |
tensor | | Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module | required |
Returns:
Type | Description |
---|---|
Module |
self |
Example::
    >>> class ExampleModule(DeviceDtypeModuleMixin):
    ...     def __init__(self, weight: torch.Tensor):
    ...         super().__init__()
    ...         self.register_buffer('weight', weight)
    >>> _ = torch.manual_seed(0)
    >>> module = ExampleModule(torch.rand(3, 4))
    >>> module.weight #doctest: +ELLIPSIS
    tensor([[...]])
    >>> module.to(torch.double)
    ExampleModule()
    >>> module.weight #doctest: +ELLIPSIS
    tensor([[...]], dtype=torch.float64)
    >>> cpu = torch.device('cpu')
    >>> module.to(cpu, dtype=torch.half, non_blocking=True)
    ExampleModule()
    >>> module.weight #doctest: +ELLIPSIS
    tensor([[...]], dtype=torch.float16)
    >>> module.to(cpu)
    ExampleModule()
    >>> module.weight #doctest: +ELLIPSIS
    tensor([[...]], dtype=torch.float16)
    >>> module.device
    device(type='cpu')
    >>> module.dtype
    torch.float16
Source code in zamba/models/slowfast_models.py
def to(self, *args: Any, **kwargs: Any) -> "DeviceDtypeModuleMixin":
"""Moves and/or casts the parameters and buffers.
This can be called as
.. function:: to(device=None, dtype=None, non_blocking=False)
.. function:: to(dtype, non_blocking=False)
.. function:: to(tensor, non_blocking=False)
Its signature is similar to :meth:`torch.Tensor.to`, but only accepts
floating point desired :attr:`dtype` s. In addition, this method will
only cast the floating point parameters and buffers to :attr:`dtype`
(if given). The integral parameters and buffers will be moved
:attr:`device`, if that is given, but with dtypes unchanged. When
:attr:`non_blocking` is set, it tries to convert/move asynchronously
with respect to the host if possible, e.g., moving CPU Tensors with
pinned memory to CUDA devices.
See below for examples.
Note:
This method modifies the module in-place.
Args:
device: the desired device of the parameters
and buffers in this module
dtype: the desired floating point type of
the floating point parameters and buffers in this module
tensor: Tensor whose dtype and device are the desired
dtype and device for all parameters and buffers in this module
Returns:
Module: self
Example::
>>> class ExampleModule(DeviceDtypeModuleMixin):
... def __init__(self, weight: torch.Tensor):
... super().__init__()
... self.register_buffer('weight', weight)
>>> _ = torch.manual_seed(0)
>>> module = ExampleModule(torch.rand(3, 4))
>>> module.weight #doctest: +ELLIPSIS
tensor([[...]])
>>> module.to(torch.double)
ExampleModule()
>>> module.weight #doctest: +ELLIPSIS
tensor([[...]], dtype=torch.float64)
>>> cpu = torch.device('cpu')
>>> module.to(cpu, dtype=torch.half, non_blocking=True)
ExampleModule()
>>> module.weight #doctest: +ELLIPSIS
tensor([[...]], dtype=torch.float16)
>>> module.to(cpu)
ExampleModule()
>>> module.weight #doctest: +ELLIPSIS
tensor([[...]], dtype=torch.float16)
>>> module.device
device(type='cpu')
>>> module.dtype
torch.float16
"""
# there is diff nb vars in PT 1.5
out = torch._C._nn._parse_to(*args, **kwargs)
self.__update_properties(device=out[0], dtype=out[1])
return super().to(*args, **kwargs)
to_disk(self, path: PathLike)
inherited
¶
Source code in zamba/models/slowfast_models.py
def to_disk(self, path: os.PathLike):
checkpoint = {
"state_dict": self.state_dict(),
"hyper_parameters": self.hparams,
}
torch.save(checkpoint, path)
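The checkpoint written by `to_disk` is a plain dictionary with the keys `state_dict` and `hyper_parameters`. The round trip below sketches how such a file could be read back. The `nn.Linear` model, the `num_frames` entry, and the in-memory buffer used in place of a file path are all assumptions for the example.

```python
import io

import torch
from torch import nn

model = nn.Linear(3, 1)  # stand-in for the real network

# Same layout as the dictionary assembled in to_disk above.
checkpoint = {
    "state_dict": model.state_dict(),
    "hyper_parameters": {"num_frames": 16},  # hypothetical hyperparameters
}

buffer = io.BytesIO()  # in-memory stand-in for a path on disk
torch.save(checkpoint, buffer)
buffer.seek(0)

# Read the checkpoint back and restore the weights into a fresh module.
restored = torch.load(buffer)
clone = nn.Linear(3, 1)
clone.load_state_dict(restored["state_dict"])

print(restored["hyper_parameters"])            # {'num_frames': 16}
print(torch.equal(clone.weight, model.weight))  # True
```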
to_empty(self: ~T, *, device: Union[str, torch.device]) -> ~T
inherited
¶
Moves the parameters and buffers to the specified device without copying storage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | :class:`torch.device` | The desired device of the parameters and buffers in this module. | required |
Returns:
Type | Description |
---|---|
Module |
self |
Source code in zamba/models/slowfast_models.py
def to_empty(self: T, *, device: Union[str, device]) -> T:
r"""Moves the parameters and buffers to the specified device without copying storage.
Args:
device (:class:`torch.device`): The desired device of the parameters
and buffers in this module.
Returns:
Module: self
"""
return self._apply(lambda t: torch.empty_like(t, device=device))
to_onnx(self, file_path: Union[str, pathlib.Path], input_sample: Optional[Any] = None, **kwargs)
inherited
¶
Saves the model in ONNX format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path |
Union[str, pathlib.Path] |
The path of the file the onnx model should be saved to. |
required |
input_sample |
Optional[Any] |
An input for tracing. Default: None (Use self.example_input_array) |
None |
**kwargs | | Will be passed to torch.onnx.export function. | {} |
Examples:
>>> class SimpleModel(LightningModule):
... def __init__(self):
... super().__init__()
... self.l1 = torch.nn.Linear(in_features=64, out_features=4)
...
... def forward(self, x):
... return torch.relu(self.l1(x.view(x.size(0), -1)))
>>> with tempfile.NamedTemporaryFile(suffix='.onnx', delete=False) as tmpfile:
... model = SimpleModel()
... input_sample = torch.randn((1, 64))
... model.to_onnx(tmpfile.name, input_sample, export_params=True)
... os.path.isfile(tmpfile.name)
True
Source code in zamba/models/slowfast_models.py
@torch.no_grad()
def to_onnx(self, file_path: Union[str, Path], input_sample: Optional[Any] = None, **kwargs):
"""
Saves the model in ONNX format.
Args:
file_path: The path of the file the onnx model should be saved to.
input_sample: An input for tracing. Default: None (Use self.example_input_array)
**kwargs: Will be passed to torch.onnx.export function.
Example:
>>> class SimpleModel(LightningModule):
... def __init__(self):
... super().__init__()
... self.l1 = torch.nn.Linear(in_features=64, out_features=4)
...
... def forward(self, x):
... return torch.relu(self.l1(x.view(x.size(0), -1)))
>>> with tempfile.NamedTemporaryFile(suffix='.onnx', delete=False) as tmpfile:
... model = SimpleModel()
... input_sample = torch.randn((1, 64))
... model.to_onnx(tmpfile.name, input_sample, export_params=True)
... os.path.isfile(tmpfile.name)
True
"""
mode = self.training
if input_sample is None:
if self.example_input_array is None:
raise ValueError(
"Could not export to ONNX since neither `input_sample` nor"
" `model.example_input_array` attribute is set."
)
input_sample = self.example_input_array
input_sample = self._apply_batch_transfer_handler(input_sample)
if "example_outputs" not in kwargs:
self.eval()
if isinstance(input_sample, Tuple):
kwargs["example_outputs"] = self(*input_sample)
else:
kwargs["example_outputs"] = self(input_sample)
torch.onnx.export(self, input_sample, file_path, **kwargs)
self.train(mode)
to_torchscript(self, file_path: Union[str, pathlib.Path] = None, method: Optional[str] = 'script', example_inputs: Optional[Any] = None, **kwargs) -> Union[torch._C.ScriptModule, Dict[str, torch._C.ScriptModule]]
inherited
¶
By default compiles the whole model to a :class:`~torch.jit.ScriptModule`.
If you want to use tracing, please provide the argument `method='trace'` and make sure that either the `example_inputs` argument is provided, or the model has :attr:`example_input_array` set.
If you would like to customize the modules that are scripted you should override this method.
In case you want to return multiple modules, we recommend using a dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | Union[str, pathlib.Path] | Path where to save the torchscript. Default: None (no file saved). | None |
method | Optional[str] | Whether to use TorchScript's script or trace method. Default: 'script' | 'script' |
example_inputs | Optional[Any] | An input to be used to do tracing when method is set to 'trace'. Default: None (uses :attr:example_input_array) | None |
**kwargs | | Additional arguments that will be passed to the :func:torch.jit.script or :func:torch.jit.trace function. | {} |
!!! note
- Requires the implementation of the :meth:~pytorch_lightning.core.lightning.LightningModule.forward method.
- The exported script will be set to evaluation mode.
- It is recommended that you install the latest supported version of PyTorch to use this feature without limitations. See also the :mod:torch.jit documentation for supported features.
Examples:
>>> class SimpleModel(LightningModule):
...     def __init__(self):
...         super().__init__()
...         self.l1 = torch.nn.Linear(in_features=64, out_features=4)
...
...     def forward(self, x):
...         return torch.relu(self.l1(x.view(x.size(0), -1)))
...
>>> model = SimpleModel()
>>> torch.jit.save(model.to_torchscript(), "model.pt")  # doctest: +SKIP
>>> os.path.isfile("model.pt")  # doctest: +SKIP
>>> # with file_path set, to_torchscript writes the file itself
>>> traced = model.to_torchscript(file_path="model_trace.pt", method='trace',  # doctest: +SKIP
...                               example_inputs=torch.randn(1, 64))  # doctest: +SKIP
>>> os.path.isfile("model_trace.pt")  # doctest: +SKIP
True
Returns:
Type | Description |
---|---|
Union[torch._C.ScriptModule, Dict[str, torch._C.ScriptModule]] | This LightningModule as a torchscript, regardless of whether file_path is defined or not. |
Source code in zamba/models/slowfast_models.py
@torch.no_grad()
def to_torchscript(
self,
file_path: Optional[Union[str, Path]] = None,
method: Optional[str] = "script",
example_inputs: Optional[Any] = None,
**kwargs,
) -> Union[ScriptModule, Dict[str, ScriptModule]]:
"""
By default compiles the whole model to a :class:`~torch.jit.ScriptModule`.
If you want to use tracing, please provide the argument ``method='trace'`` and make sure that either the
`example_inputs` argument is provided, or the model has :attr:`example_input_array` set.
If you would like to customize the modules that are scripted you should override this method.
In case you want to return multiple modules, we recommend using a dictionary.
Args:
file_path: Path where to save the torchscript. Default: None (no file saved).
method: Whether to use TorchScript's script or trace method. Default: 'script'
example_inputs: An input to be used to do tracing when method is set to 'trace'.
Default: None (uses :attr:`example_input_array`)
**kwargs: Additional arguments that will be passed to the :func:`torch.jit.script` or
:func:`torch.jit.trace` function.
Note:
- Requires the implementation of the
:meth:`~pytorch_lightning.core.lightning.LightningModule.forward` method.
- The exported script will be set to evaluation mode.
- It is recommended that you install the latest supported version of PyTorch
to use this feature without limitations. See also the :mod:`torch.jit`
documentation for supported features.
Example:
>>> class SimpleModel(LightningModule):
...     def __init__(self):
...         super().__init__()
...         self.l1 = torch.nn.Linear(in_features=64, out_features=4)
...
...     def forward(self, x):
...         return torch.relu(self.l1(x.view(x.size(0), -1)))
...
>>> model = SimpleModel()
>>> torch.jit.save(model.to_torchscript(), "model.pt")  # doctest: +SKIP
>>> os.path.isfile("model.pt")  # doctest: +SKIP
>>> # with file_path set, to_torchscript writes the file itself
>>> traced = model.to_torchscript(file_path="model_trace.pt", method='trace',  # doctest: +SKIP
...                               example_inputs=torch.randn(1, 64))  # doctest: +SKIP
>>> os.path.isfile("model_trace.pt")  # doctest: +SKIP
True
Return:
This LightningModule as a torchscript, regardless of whether `file_path` is
defined or not.
"""
    mode = self.training
    if method == "script":
        torchscript_module = torch.jit.script(self.eval(), **kwargs)
    elif method == "trace":
        # if no example inputs are provided, try to see if model has example_input_array set
        if example_inputs is None:
            if self.example_input_array is None:
                raise ValueError(
                    "Choosing method=`trace` requires either `example_inputs`"
                    " or `model.example_input_array` to be defined."
                )
            example_inputs = self.example_input_array
        # automatically send example inputs to the right device and use trace
        example_inputs = self._apply_batch_transfer_handler(example_inputs)
        torchscript_module = torch.jit.trace(func=self.eval(), example_inputs=example_inputs, **kwargs)
    else:
        raise ValueError(f"The 'method' parameter only supports 'script' or 'trace', but value given was: {method}")
    self.train(mode)
    if file_path is not None:
        fs = get_filesystem(file_path)
        with fs.open(file_path, "wb") as f:
            torch.jit.save(torchscript_module, f)
    return torchscript_module
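The source above records `self.training` before switching to evaluation mode for the export and restores it afterwards. The following is a minimal, dependency-free sketch of that save/restore pattern. `TinyModule` and `exported_in_eval_mode` are hypothetical stand-ins invented for illustration; they are not part of PyTorch or Lightning. Unlike the source, the sketch restores the mode in a `finally` block so that it is also restored when the export raises.

```python
from contextlib import contextmanager


class TinyModule:
    """Hypothetical stand-in that only tracks a training flag."""

    def __init__(self):
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)


@contextmanager
def exported_in_eval_mode(module):
    """Switch to eval mode for an export, then restore the previous mode."""
    mode = module.training      # remember the caller's mode
    try:
        yield module.eval()     # export code runs against the eval-mode module
    finally:
        module.train(mode)      # restore the mode, even if the export raised


model = TinyModule()
with exported_in_eval_mode(model) as m:
    mode_during_export = m.training
mode_after_export = model.training
```

A module that was already in evaluation mode before the call stays in evaluation mode afterwards, because the saved flag is what gets restored.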
toggle_optimizer(self, optimizer: Optimizer, optimizer_idx: int)
inherited
¶
Makes sure only the gradients of the current optimizer's parameters are calculated
in the training step to prevent dangling gradients in multiple-optimizer setup.
It works with :meth:untoggle_optimizer to make sure param_requires_grad_state is properly reset.
Override for your own behavior.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
optimizer | Optimizer | Current optimizer used in the training loop | required |
optimizer_idx | int | Current optimizer idx in the training loop | required |
!!! note Only called when using multiple optimizers
Source code in zamba/models/slowfast_models.py
def toggle_optimizer(self, optimizer: Optimizer, optimizer_idx: int):
    """
    Makes sure only the gradients of the current optimizer's parameters are calculated
    in the training step to prevent dangling gradients in multiple-optimizer setup.
    It works with :meth:`untoggle_optimizer` to make sure ``param_requires_grad_state`` is properly reset.
    Override for your own behavior.
    Args:
        optimizer: Current optimizer used in the training loop
        optimizer_idx: Current optimizer idx in the training loop
    Note:
        Only called when using multiple optimizers
    """
    # Iterate over all optimizer parameters to preserve their `requires_grad` information
    # in case these are pre-defined during `configure_optimizers`
    param_requires_grad_state = {}
    for opt in self.optimizers(use_pl_optimizer=False):
        for group in opt.param_groups:
            for param in group["params"]:
                # If a param already appears in param_requires_grad_state, continue
                if param in param_requires_grad_state:
                    continue
                param_requires_grad_state[param] = param.requires_grad
                param.requires_grad = False
    # Then iterate over the current optimizer's parameters and set its `requires_grad`
    # properties accordingly
    for group in optimizer.param_groups:
        for param in group["params"]:
            param.requires_grad = param_requires_grad_state[param]
    self._param_requires_grad_state = param_requires_grad_state
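The toggle logic above can be followed without PyTorch. The sketch below is a simplified illustration under stated assumptions: `Param`, `toggle`, and `untoggle` are hypothetical names, optimizers are reduced to plain lists of parameters, and `untoggle` restores every recorded flag rather than only those of the other optimizers, which gives the same end state.

```python
class Param:
    """Hypothetical stand-in for a tensor parameter; only tracks requires_grad."""

    def __init__(self, name, requires_grad=True):
        self.name = name
        self.requires_grad = requires_grad


def toggle(all_param_lists, current_params):
    """Freeze every parameter, then re-enable those of the current optimizer."""
    saved = {}
    for params in all_param_lists:
        for p in params:
            if p in saved:                  # shared parameter, already recorded
                continue
            saved[p] = p.requires_grad      # remember the original flag
            p.requires_grad = False
    for p in current_params:
        p.requires_grad = saved[p]          # original flag, not a blanket True
    return saved


def untoggle(saved):
    """Restore every recorded flag."""
    for p, flag in saved.items():
        p.requires_grad = flag


gen = [Param("g.weight"), Param("g.bias", requires_grad=False)]
disc = [Param("d.weight")]

saved = toggle([gen, disc], current_params=gen)
during = {p.name: p.requires_grad for p in gen + disc}
untoggle(saved)
after = {p.name: p.requires_grad for p in gen + disc}
```

Note that `g.bias`, which was frozen before toggling, stays frozen while its own optimizer is active. That is the reason the original flags are recorded instead of simply setting `requires_grad = True`.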
train(self: ~T, mode: bool = True) -> ~T
inherited
¶
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. :class:Dropout, :class:BatchNorm, etc.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mode | bool | whether to set training mode (True) or evaluation mode (False). Default: True. | True |
Returns:
Type | Description |
---|---|
Module |
self |
Source code in zamba/models/slowfast_models.py
def train(self: T, mode: bool = True) -> T:
    r"""Sets the module in training mode.
    This has any effect only on certain modules. See documentations of
    particular modules for details of their behaviors in training/evaluation
    mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
    etc.
    Args:
        mode (bool): whether to set training mode (``True``) or evaluation
                     mode (``False``). Default: ``True``.
    Returns:
        Module: self
    """
    if not isinstance(mode, bool):
        raise ValueError("training mode is expected to be boolean")
    self.training = mode
    for module in self.children():
        module.train(mode)
    return self
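The recursion in `train` is easiest to see on a small tree. The sketch below reproduces the same three steps (validate `mode`, set the flag, recurse into children) on a hypothetical `Node` class, which is a stand-in invented for this example and not a PyTorch type.

```python
class Node:
    """Hypothetical stand-in for a module that owns child modules."""

    def __init__(self, *children):
        self.training = True
        self._children = list(children)

    def children(self):
        return iter(self._children)

    def train(self, mode=True):
        if not isinstance(mode, bool):
            raise ValueError("training mode is expected to be boolean")
        self.training = mode
        for child in self.children():
            child.train(mode)       # each child repeats the same steps
        return self                 # returning self allows call chaining


leaf_a, leaf_b = Node(), Node()
root = Node(Node(leaf_a), leaf_b)

returned = root.train(False)
flags = [root.training, leaf_a.training, leaf_b.training]
```

Because only direct children are visited at each level, a flag set on the root reaches nested leaves through the intermediate nodes.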
train_dataloader(self) -> Union[torch.utils.data.dataloader.DataLoader, Sequence[torch.utils.data.dataloader.DataLoader], Sequence[Sequence[torch.utils.data.dataloader.DataLoader]], Sequence[Dict[str, torch.utils.data.dataloader.DataLoader]], Dict[str, torch.utils.data.dataloader.DataLoader], Dict[str, Dict[str, torch.utils.data.dataloader.DataLoader]], Dict[str, Sequence[torch.utils.data.dataloader.DataLoader]]]
inherited
¶
Implement one or more PyTorch DataLoaders for training.
Returns:
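A collection of :class:torch.utils.data.DataLoader specifying training samples. In the case of multiple dataloaders, please see this :ref:page <multiple-training-dataloaders>.
The dataloader you return will not be reloaded unless you set :paramref:~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs to a positive integer.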
For data processing use the following pattern:
- download in :meth:`prepare_data`
- process and split in :meth:`setup`
However, the above are only necessary for distributed processing.
.. warning:: do not assign state in prepare_data
- :meth:~pytorch_lightning.trainer.Trainer.fit
- ...
- :meth:prepare_data
- :meth:setup
- :meth:train_dataloader
!!! note Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Example::
    # single dataloader
    def train_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (1.0,))])
        dataset = MNIST(root='/path/to/mnist/', train=True, transform=transform,
                        download=True)
        loader = torch.utils.data.DataLoader(
            dataset=dataset,
            batch_size=self.batch_size,
            shuffle=True
        )
        return loader

    # multiple dataloaders, return as list
    def train_dataloader(self):
        mnist = MNIST(...)
        cifar = CIFAR(...)
        mnist_loader = torch.utils.data.DataLoader(
            dataset=mnist, batch_size=self.batch_size, shuffle=True
        )
        cifar_loader = torch.utils.data.DataLoader(
            dataset=cifar, batch_size=self.batch_size, shuffle=True
        )
        # each batch will be a list of tensors: [batch_mnist, batch_cifar]
        return [mnist_loader, cifar_loader]

    # multiple dataloader, return as dict
    def train_dataloader(self):
        mnist = MNIST(...)
        cifar = CIFAR(...)
        mnist_loader = torch.utils.data.DataLoader(
            dataset=mnist, batch_size=self.batch_size, shuffle=True
        )
        cifar_loader = torch.utils.data.DataLoader(
            dataset=cifar, batch_size=self.batch_size, shuffle=True
        )
        # each batch will be a dict of tensors: {'mnist': batch_mnist, 'cifar': batch_cifar}
        return {'mnist': mnist_loader, 'cifar': cifar_loader}
Source code in zamba/models/slowfast_models.py
def train_dataloader(self) -> TRAIN_DATALOADERS:
"""
Implement one or more PyTorch DataLoaders for training.
Return:
A collection of :class:`torch.utils.data.DataLoader` specifying training samples.
In the case of multiple dataloaders, please see this :ref:`page <multiple-training-dataloaders>`.
The dataloader you return will not be reloaded unless you set
:paramref:`~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs` to
a positive integer.
For data processing use the following pattern:
- download in :meth:`prepare_data`
- process and split in :meth:`setup`
However, the above are only necessary for distributed processing.
.. warning:: do not assign state in prepare_data
- :meth:`~pytorch_lightning.trainer.Trainer.fit`
- ...
- :meth:`prepare_data`
- :meth:`setup`
- :meth:`train_dataloader`
Note:
Lightning adds the correct sampler for distributed and arbitrary hardware.
There is no need to set it yourself.
Example::
# single dataloader
def train_dataloader(self):
transform = transforms.Compose([transforms.ToTensor(),
transforms.Normalize((0.5,), (1.0,))])
dataset = MNIST(root='/path/to/mnist/', train=True, transform=transform,
download=True)
loader = torch.utils.data.DataLoader(
dataset=dataset,
batch_size=self.batch_size,
shuffle=True
)
return loader
# multiple dataloaders, return as list
def train_dataloader(self):
mnist = MNIST(...)
cifar = CIFAR(...)
mnist_loader = torch.utils.data.DataLoader(
dataset=mnist, batch_size=self.batch_size, shuffle=True
)
cifar_loader = torch.utils.data.DataLoader(
dataset=cifar, batch_size=self.batch_size, shuffle=True
)
# each batch will be a list of tensors: [batch_mnist, batch_cifar]
return [mnist_loader, cifar_loader]
# multiple dataloader, return as dict
def train_dataloader(self):
mnist = MNIST(...)
cifar = CIFAR(...)
mnist_loader = torch.utils.data.DataLoader(
dataset=mnist, batch_size=self.batch_size, shuffle=True
)
cifar_loader = torch.utils.data.DataLoader(
dataset=cifar, batch_size=self.batch_size, shuffle=True
)
# each batch will be a dict of tensors: {'mnist': batch_mnist, 'cifar': batch_cifar}
return {'mnist': mnist_loader, 'cifar': cifar_loader}
"""
rank_zero_warn("`train_dataloader` must be implemented to be used with the Lightning Trainer")
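The docstring states that returning a list of dataloaders yields list batches and returning a dict yields dict batches. The sketch below only illustrates that batch shape. It is not Lightning's implementation: `combined_batches` is a hypothetical helper, plain lists stand in for dataloaders, and the loaders are assumed to have equal length (how loaders of unequal length are combined is not modeled here).

```python
def combined_batches(loaders):
    """Yield batches whose container mirrors the container of the loaders."""
    if isinstance(loaders, dict):
        keys = list(loaders)
        for parts in zip(*(loaders[k] for k in keys)):
            yield dict(zip(keys, parts))    # {'mnist': batch, 'cifar': batch}
    else:
        for parts in zip(*loaders):
            yield list(parts)               # [batch_mnist, batch_cifar]


# Each "loader" is a list of batches; each batch is a list of sample names.
mnist_loader = [["m0", "m1"], ["m2", "m3"]]
cifar_loader = [["c0", "c1"], ["c2", "c3"]]

list_batches = list(combined_batches([mnist_loader, cifar_loader]))
dict_batches = list(combined_batches({"mnist": mnist_loader, "cifar": cifar_loader}))
```

In `training_step`, the dict form lets the batch parts be addressed by name instead of by position.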
training_epoch_end(self, outputs: List[Union[torch.Tensor, Dict[str, Any]]]) -> None
inherited
¶
Called at the end of the training epoch with the outputs of all training steps.
Use this in case you need to do something with all the outputs returned by :meth:training_step.
.. code-block:: python
    # the pseudocode for these calls
    train_outs = []
    for train_batch in train_data:
        out = training_step(train_batch)
        train_outs.append(out)
    training_epoch_end(train_outs)
Parameters:
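Name | Type | Description | Default |
---|---|---|---|
outputs | List[Union[torch.Tensor, Dict[str, Any]]] | List of outputs you defined in :meth:training_step, or if there are multiple dataloaders, a list containing a list of outputs for each dataloader. | required |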
Returns:
Type | Description |
---|---|
None |
None |
!!! note If this method is not overridden, this won't be called.
Example::
    def training_epoch_end(self, training_step_outputs):
        # do something with all training_step outputs
        ...
With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each training step for that dataloader.
.. code-block:: python
    def training_epoch_end(self, training_step_outputs):
        for out in training_step_outputs:
            ...
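The call order described for this hook can be made concrete with a runnable version of the pseudocode. This is a sketch under assumptions: `run_epoch` is a hypothetical driver, the hooks are free functions rather than methods, and the "loss" is just the mean of the numbers in a batch.

```python
def run_epoch(train_data, training_step, training_epoch_end):
    """Collect every step output, then hand the full list to the epoch-end hook."""
    train_outs = []
    for batch_idx, batch in enumerate(train_data):
        train_outs.append(training_step(batch, batch_idx))
    training_epoch_end(train_outs)


epoch_summary = {}


def training_step(batch, batch_idx):
    # Hypothetical loss: the mean of the values in the batch.
    return {"loss": sum(batch) / len(batch)}


def training_epoch_end(outputs):
    # The hook returns None, so results are stored as a side effect.
    losses = [out["loss"] for out in outputs]
    epoch_summary["mean_loss"] = sum(losses) / len(losses)


run_epoch([[1.0, 3.0], [2.0, 6.0]], training_step, training_epoch_end)
```

In a real module the side effect would typically be a logging call rather than a write to a module-level dict.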
Source code in zamba/models/slowfast_models.py
def training_epoch_end(self, outputs: EPOCH_OUTPUT) -> None:
"""
Called at the end of the training epoch with the outputs of all training steps.
Use this in case you need to do something with all the outputs returned by :meth:`training_step`.
.. code-block:: python
# the pseudocode for these calls
train_outs = []
for train_batch in train_data:
out = training_step(train_batch)
train_outs.append(out)
training_epoch_end(train_outs)
Args:
outputs: List of outputs you defined in :meth:`training_step`, or if there are
multiple dataloaders, a list containing a list of outputs for each dataloader.
Return:
None
Note:
If this method is not overridden, this won't be called.
Example::
def training_epoch_end(self, training_step_outputs):
# do something with all training_step outputs
return result
With multiple dataloaders, ``outputs`` will be a list of lists. The outer list contains
one entry per dataloader, while the inner list contains the individual outputs of
each training step for that dataloader.
.. code-block:: python
def training_epoch_end(self, training_step_outputs):
for out in training_step_outputs:
...
"""
training_step(self, batch, batch_idx)
inherited
¶
Here you compute and return the training loss and some additional metrics for e.g. the progress bar or logger.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch |
class: |
required | |
batch_idx |
int |
Integer displaying index of this batch |
required |
optimizer_idx |
int |
When using multiple optimizers, this argument will also be present. |
required |
hiddens( |
class: |
required |
Returns:
Type | Description |
---|---|
Any of.
- |
class: |
In this step you'd normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model specific.
Example::
    def training_step(self, batch, batch_idx):
        x, y, z = batch
        out = self.encoder(x)
        loss = self.loss(out, x)
        return loss
If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.
.. code-block:: python
    # Multiple optimizers (e.g.: GANs)
    def training_step(self, batch, batch_idx, optimizer_idx):
        if optimizer_idx == 0:
            # do training_step with encoder
            ...
        if optimizer_idx == 1:
            # do training_step with decoder
            ...
If you add truncated back propagation through time you will also get an additional argument with the hidden states of the previous step.
.. code-block:: python
    # Truncated back-propagation through time
    def training_step(self, batch, batch_idx, hiddens):
        # hiddens are the hidden states from the previous truncated backprop step
        ...
        out, hiddens = self.lstm(data, hiddens)
        ...
        return {"loss": loss, "hiddens": hiddens}
!!! note The loss value shown in the progress bar is smoothed (averaged) over the last values, so it differs from the actual loss returned in train/validation step.
Source code in zamba/models/slowfast_models.py
def training_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self(x)
    loss = F.binary_cross_entropy_with_logits(y_hat, y)
    self.log("train_loss", loss.detach())
    return loss
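The `training_step` above computes binary cross-entropy directly from logits. As a rough illustration of what that loss evaluates to, the sketch below implements the mean-reduced formula in plain Python using the numerically stable form. `bce_with_logits` is a hypothetical helper written for this example, and flat lists of floats stand in for the logit and target tensors.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Stable rewrite of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# A logit of 0 means probability 0.5, so each element costs log(2).
loss_uncertain = bce_with_logits([0.0, 0.0], [1.0, 0.0])

# Large logits with the correct sign give a loss close to zero.
loss_confident = bce_with_logits([10.0, -10.0], [1.0, 0.0])
```

Working from logits avoids taking the logarithm of a sigmoid output that has already saturated to 0 or 1.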
training_step_end(self, *args, **kwargs) -> Union[torch.Tensor, Dict[str, Any]]
inherited
¶
Use this when training with dp or ddp2 because :meth:training_step will operate on only part of the batch. However, this is still optional and only needed for things like softmax or NCE loss.
!!! note If you later switch to ddp or some other mode, this will still be called so that you don't have to change your code.
.. code-block:: python
    # pseudocode
    sub_batches = split_batches_for_dp(batch)
    batch_parts_outputs = [training_step(sub_batch) for sub_batch in sub_batches]
    training_step_end(batch_parts_outputs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_parts_outputs | | What you return in training_step for each batch part. | required |
Returns:
Type | Description |
---|---|
Union[torch.Tensor, Dict[str, Any]] |
Anything |
When using dp/ddp2 distributed backends, only a portion of the batch is inside the training_step:
.. code-block:: python
    def training_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch
        out = self(x)
        # softmax uses only a portion of the batch in the denominator
        loss = self.softmax(out)
        loss = nce_loss(loss)
        return loss
If you wish to do something with all the parts of the batch, then use this method to do it:
.. code-block:: python
    def training_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch
        out = self.encoder(x)
        return {"pred": out}

    def training_step_end(self, training_step_outputs):
        gpu_0_pred = training_step_outputs[0]["pred"]
        gpu_1_pred = training_step_outputs[1]["pred"]
        gpu_n_pred = training_step_outputs[n]["pred"]
        # this softmax now uses the full batch
        loss = nce_loss([gpu_0_pred, gpu_1_pred, gpu_n_pred])
        return loss
See Also:
See the :ref:advanced/multi_gpu:Multi-GPU training guide for more details.
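Why a softmax over the batch dimension needs the full batch can be checked numerically. The sketch below is illustrative only: it uses plain floats instead of tensors, a hand-written `softmax`, and two hypothetical sub-batches standing in for the per-GPU splits.

```python
import math


def softmax(values):
    """Softmax over a list of floats, shifted by the max for stability."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 2.0, 3.0, 4.0]
sub_batches = [scores[:2], scores[2:]]      # what each of two devices would see

# Normalizing inside each sub-batch: every part sums to 1 on its own.
per_part = [p for part in sub_batches for p in softmax(part)]

# Normalizing after gathering the parts: the whole batch sums to 1.
full_batch = softmax([s for part in sub_batches for s in part])
```

The per-part result sums to 2.0 and assigns the first score a much larger weight than the full-batch result does, which is the discrepancy that gathering the parts in `training_step_end` removes.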
Source code in zamba/models/slowfast_models.py
def training_step_end(self, *args, **kwargs) -> STEP_OUTPUT:
"""
Use this when training with dp or ddp2 because :meth:`training_step`
will operate on only part of the batch. However, this is still optional
and only needed for things like softmax or NCE loss.
Note:
If you later switch to ddp or some other mode, this will still be called
so that you don't have to change your code
.. code-block:: python
# pseudocode
sub_batches = split_batches_for_dp(batch)
batch_parts_outputs = [training_step(sub_batch) for sub_batch in sub_batches]
training_step_end(batch_parts_outputs)
Args:
batch_parts_outputs: What you return in `training_step` for each batch part.
Return:
Anything
When using dp/ddp2 distributed backends, only a portion of the batch is inside the training_step:
.. code-block:: python
def training_step(self, batch, batch_idx):
# batch is 1/num_gpus big
x, y = batch
out = self(x)
# softmax uses only a portion of the batch in the denominator
loss = self.softmax(out)
loss = nce_loss(loss)
return loss
If you wish to do something with all the parts of the batch, then use this method to do it:
.. code-block:: python
def training_step(self, batch, batch_idx):
# batch is 1/num_gpus big
x, y = batch
out = self.encoder(x)
return {"pred": out}
def training_step_end(self, training_step_outputs):
gpu_0_pred = training_step_outputs[0]["pred"]
gpu_1_pred = training_step_outputs[1]["pred"]
gpu_n_pred = training_step_outputs[n]["pred"]
# this softmax now uses the full batch
loss = nce_loss([gpu_0_pred, gpu_1_pred, gpu_n_pred])
return loss
See Also:
See the :ref:`advanced/multi_gpu:Multi-GPU training` guide for more details.
"""
transfer_batch_to_device(self, batch: Any, device: device, dataloader_idx: int) -> Any
inherited
¶
Override this hook if your :class:~torch.utils.data.DataLoader returns tensors wrapped in a custom data structure.
The data types listed below (and any arbitrary nesting of them) are supported out of the box:
- :class:torch.Tensor or anything that implements .to(...)
- :class:list
- :class:dict
- :class:tuple
- :class:torchtext.data.batch.Batch
For anything else, you need to define how the data is moved to the target device (CPU, GPU, TPU, ...).
!!! note
This hook should only transfer the data and not modify it, nor should it move the data to any device other than the one passed in as argument (unless you know what you are doing).
To check the current state of execution of this hook you can use self.trainer.training/testing/validating/predicting so that you can add different logic as per your requirement.
!!! note This hook only runs on single GPU training and DDP (no data-parallel). Data-Parallel support will come in the near future.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch | Any | A batch of data that needs to be transferred to a new device. | required |
device | device | The target device as defined in PyTorch. | required |
dataloader_idx | int | The index of the dataloader to which the batch belongs. | required |
Returns:
Type | Description |
---|---|
Any |
A reference to the data on the new device. |
Example::
    def transfer_batch_to_device(self, batch, device, dataloader_idx):
        if isinstance(batch, CustomBatch):
            # move all tensors in your custom data structure to the device
            batch.samples = batch.samples.to(device)
            batch.targets = batch.targets.to(device)
        else:
            batch = super().transfer_batch_to_device(batch, device, dataloader_idx)
        return batch
See Also:
- :meth:move_data_to_device
- :meth:apply_to_collection
Source code in zamba/models/slowfast_models.py
def transfer_batch_to_device(self, batch: Any, device: torch.device, dataloader_idx: int) -> Any:
"""
Override this hook if your :class:`~torch.utils.data.DataLoader` returns tensors
wrapped in a custom data structure.
The data types listed below (and any arbitrary nesting of them) are supported out of the box:
- :class:`torch.Tensor` or anything that implements `.to(...)`
- :class:`list`
- :class:`dict`
- :class:`tuple`
- :class:`torchtext.data.batch.Batch`
For anything else, you need to define how the data is moved to the target device (CPU, GPU, TPU, ...).
Note:
This hook should only transfer the data and not modify it, nor should it move the data to
any other device than the one passed in as argument (unless you know what you are doing).
To check the current state of execution of this hook you can use
``self.trainer.training/testing/validating/predicting`` so that you can
add different logic as per your requirement.
Note:
This hook only runs on single GPU training and DDP (no data-parallel).
Data-Parallel support will come in near future.
Args:
batch: A batch of data that needs to be transferred to a new device.
device: The target device as defined in PyTorch.
dataloader_idx: The index of the dataloader to which the batch belongs.
Returns:
A reference to the data on the new device.
Example::
def transfer_batch_to_device(self, batch, device, dataloader_idx):
    if isinstance(batch, CustomBatch):
        # move all tensors in your custom data structure to the device
        batch.samples = batch.samples.to(device)
        batch.targets = batch.targets.to(device)
    else:
        batch = super().transfer_batch_to_device(batch, device, dataloader_idx)
    return batch
Raises:
MisconfigurationException:
If using data-parallel, ``Trainer(accelerator='dp')``.
See Also:
- :meth:`move_data_to_device`
- :meth:`apply_to_collection`
"""
return move_data_to_device(batch, device)
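The default behavior, moving anything with a `.to(...)` method and recursing through lists, tuples, and dicts, can be sketched without PyTorch. The code below is not Lightning's `move_data_to_device`; `FakeTensor` and `move_to_device` are hypothetical names used to show the recursion, and devices are represented as plain strings.

```python
class FakeTensor:
    """Hypothetical stand-in for anything that implements ``.to(device)``."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)   # returns a new object, like a tensor copy


def move_to_device(batch, device):
    """Recursively move every ``.to``-capable leaf in a nested batch."""
    if hasattr(batch, "to"):
        return batch.to(device)
    if isinstance(batch, dict):
        return {k: move_to_device(v, device) for k, v in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_to_device(v, device) for v in batch)
    return batch                    # leave strings, ints, etc. untouched


batch = {
    "video": FakeTensor(),
    "meta": [FakeTensor(), "label"],
    "pair": (FakeTensor(), 3),
}
moved = move_to_device(batch, "cuda:0")
```

A custom container that is neither a sequence nor a mapping falls through to the last branch unchanged, which is the situation in which overriding the hook becomes necessary.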
type(self, dst_type: Union[str, torch.dtype]) -> DeviceDtypeModuleMixin
inherited
¶
Casts all parameters and buffers to :attr:dst_type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dst_type | type or string | the desired type | required |
Returns:
Type | Description |
---|---|
Module |
self |
Source code in zamba/models/slowfast_models.py
def type(self, dst_type: Union[str, torch.dtype]) -> "DeviceDtypeModuleMixin":
    """Casts all parameters and buffers to :attr:`dst_type`.
    Arguments:
        dst_type (type or string): the desired type
    Returns:
        Module: self
    """
    self.__update_properties(dtype=dst_type)
    return super().type(dst_type=dst_type)
unfreeze(self) -> None
inherited
¶
Unfreeze all parameters for training.
.. code-block:: python
    model = MyLightningModule(...)
    model.unfreeze()
Source code in zamba/models/slowfast_models.py
def unfreeze(self) -> None:
    """
    Unfreeze all parameters for training.
    .. code-block:: python
        model = MyLightningModule(...)
        model.unfreeze()
    """
    for param in self.parameters():
        param.requires_grad = True
    self.train()
untoggle_optimizer(self, optimizer_idx: int)
inherited
¶
Resets the state of required gradients that were toggled with :meth:toggle_optimizer.
Override for your own behavior.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
optimizer_idx | int | Current optimizer idx in the training loop | required |
!!! note Only called when using multiple optimizers
Source code in zamba/models/slowfast_models.py
def untoggle_optimizer(self, optimizer_idx: int):
    """
    Resets the state of required gradients that were toggled with :meth:`toggle_optimizer`.
    Override for your own behavior.
    Args:
        optimizer_idx: Current optimizer idx in the training loop
    Note:
        Only called when using multiple optimizers
    """
    for opt_idx, opt in enumerate(self.optimizers(use_pl_optimizer=False)):
        if optimizer_idx != opt_idx:
            for group in opt.param_groups:
                for param in group["params"]:
                    if param in self._param_requires_grad_state:
                        param.requires_grad = self._param_requires_grad_state[param]
    # save memory
    self._param_requires_grad_state = {}
val_dataloader(self) -> Union[torch.utils.data.dataloader.DataLoader, Sequence[torch.utils.data.dataloader.DataLoader]]
inherited
¶
Implement one or multiple PyTorch DataLoaders for validation.
The dataloader you return will not be reloaded unless you set :paramref:~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs to a positive integer.
It's recommended that all data downloads and preparation happen in :meth:prepare_data.
- :meth:~pytorch_lightning.trainer.Trainer.fit
- ...
- :meth:prepare_data
- :meth:train_dataloader
- :meth:val_dataloader
- :meth:test_dataloader
!!! note Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Returns:
A :class:torch.utils.data.DataLoader or a sequence of them specifying validation samples.
Examples::
    def val_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (1.0,))])
        dataset = MNIST(root='/path/to/mnist/', train=False,
                        transform=transform, download=True)
        loader = torch.utils.data.DataLoader(
            dataset=dataset,
            batch_size=self.batch_size,
            shuffle=False
        )
        return loader

    # can also return multiple dataloaders
    def val_dataloader(self):
        return [loader_a, loader_b, ..., loader_n]
!!! note
If you don't need a validation dataset and a :meth:validation_step, you don't need to implement this method.
!!! note
In the case where you return multiple validation dataloaders, the :meth:validation_step will have an argument dataloader_idx which matches the order here.
Source code in zamba/models/slowfast_models.py
def val_dataloader(self) -> EVAL_DATALOADERS:
r"""
Implement one or multiple PyTorch DataLoaders for validation.
The dataloader you return will not be reloaded unless you set
:paramref:`~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs` to
a positive integer.
It's recommended that all data downloads and preparation happen in :meth:`prepare_data`.
- :meth:`~pytorch_lightning.trainer.Trainer.fit`
- ...
- :meth:`prepare_data`
- :meth:`train_dataloader`
- :meth:`val_dataloader`
- :meth:`test_dataloader`
Note:
Lightning adds the correct sampler for distributed and arbitrary hardware.
There is no need to set it yourself.
Return:
A :class:`torch.utils.data.DataLoader` or a sequence of them specifying validation samples.
Examples::
def val_dataloader(self):
transform = transforms.Compose([transforms.ToTensor(),
transforms.Normalize((0.5,), (1.0,))])
dataset = MNIST(root='/path/to/mnist/', train=False,
transform=transform, download=True)
loader = torch.utils.data.DataLoader(
dataset=dataset,
batch_size=self.batch_size,
shuffle=False
)
return loader
# can also return multiple dataloaders
def val_dataloader(self):
return [loader_a, loader_b, ..., loader_n]
Note:
If you don't need a validation dataset and a :meth:`validation_step`, you don't need to
implement this method.
Note:
In the case where you return multiple validation dataloaders, the :meth:`validation_step`
will have an argument ``dataloader_idx`` which matches the order here.
"""
validation_epoch_end(self, outputs: List[Dict[str, numpy.ndarray]])
inherited
¶
Aggregates validation_step outputs to compute and log the validation macro F1 and top K metrics.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs | List[dict] | list of output dictionaries from each validation step containing y_pred and y_true. | required |
Source code in zamba/models/slowfast_models.py
def validation_epoch_end(self, outputs: List[Dict[str, np.ndarray]]):
    """Aggregates validation_step outputs to compute and log the validation macro F1 and top K
    metrics.
    Args:
        outputs (List[dict]): list of output dictionaries from each validation step
            containing y_pred and y_true.
    """
    y_true, y_pred, y_proba = self.aggregate_step_outputs(outputs)
    self.compute_and_log_metrics(y_true, y_pred, y_proba, subset="val")
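The bodies of `aggregate_step_outputs` and `compute_and_log_metrics` are not shown here, so the following is only a guess at the general shape of the computation, not zamba's implementation. It concatenates per-step outputs and computes a macro-averaged F1 for multi-label 0/1 rows in plain Python. `macro_f1` is a hypothetical helper, nested lists stand in for NumPy arrays, and scoring a class with no positives as 0.0 is an assumption.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over multi-label 0/1 rows."""
    n_classes = len(y_true[0])
    scores = []
    for c in range(n_classes):
        pairs = [(t[c], p[c]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted positives scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


# Two validation steps with batch sizes 2 and 1, two classes each.
step_outputs = [
    {"y_true": [[1, 0], [0, 1]], "y_pred": [[1, 0], [0, 0]]},
    {"y_true": [[1, 1]], "y_pred": [[1, 1]]},
]

# Aggregate: stack the rows of every step into one list per key.
y_true = [row for out in step_outputs for row in out["y_true"]]
y_pred = [row for out in step_outputs for row in out["y_pred"]]

score = macro_f1(y_true, y_pred)
```

Here class 0 is predicted perfectly (F1 = 1) and class 1 misses one positive (F1 = 2/3), so the macro average is 5/6. Macro averaging weights every class equally regardless of how often it occurs.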
validation_step(self, batch, batch_idx)
inherited
¶
Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest like accuracy.
.. code-block:: python
    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch |
class: |
required | |
batch_idx |
int |
The index of this batch |
required |
dataloader_idx |
int |
The index of the dataloader that produced this batch (only if multiple val dataloaders used) |
required |
Returns:
Type | Description |
---|---|
|
.. code-block:: python
    # pseudocode of order
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        if defined("validation_step_end"):
            out = validation_step_end(out)
        val_outs.append(out)
    val_outs = validation_epoch_end(val_outs)
.. code-block:: python
    # if you have one val dataloader:
    def validation_step(self, batch, batch_idx):
        ...

    # if you have multiple val dataloaders:
    def validation_step(self, batch, batch_idx, dataloader_idx):
        ...
Examples::
    # CASE 1: A single validation dataset
    def validation_step(self, batch, batch_idx):
        x, y = batch
        # implement your own
        out = self(x)
        loss = self.loss(out, y)
        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)
        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)
        # log the outputs!
        self.log_dict({'val_loss': loss, 'val_acc': val_acc})
If you pass in multiple val dataloaders, :meth:validation_step will have an additional argument.
.. code-block:: python
    # CASE 2: multiple validation dataloaders
    def validation_step(self, batch, batch_idx, dataloader_idx):
        # dataloader_idx tells you which dataset this is.
        ...
!!! note If you don't need to validate you don't need to implement this method.
!!! note
When :meth:validation_step is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.
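The call order in the pseudocode above can be sketched as a small runnable loop. This is only an illustration in plain Python, not the Trainer's actual implementation: `run_validation_epoch`, `count_correct`, and `summarize` are hypothetical names, and each toy "batch" is just a list of (prediction, label) pairs.

```python
from typing import Any, Callable, Dict, List, Optional


def run_validation_epoch(
    val_data: List[Any],
    validation_step: Callable[[Any, int], Any],
    validation_epoch_end: Callable[[List[Any]], None],
    validation_step_end: Optional[Callable[[Any], Any]] = None,
) -> List[Any]:
    """Call the hooks in the order described by the pseudocode (hypothetical helper)."""
    val_outs = []
    for batch_idx, val_batch in enumerate(val_data):
        out = validation_step(val_batch, batch_idx)
        # validation_step_end is optional and only runs when it is defined.
        if validation_step_end is not None:
            out = validation_step_end(out)
        val_outs.append(out)
    # The epoch-end hook sees the collected per-batch outputs once.
    validation_epoch_end(val_outs)
    return val_outs


# Toy hooks standing in for LightningModule methods.
def count_correct(batch, batch_idx):
    return sum(1 for pred, label in batch if pred == label)


epoch_summary: Dict[str, int] = {}


def summarize(outs):
    epoch_summary["total_correct"] = sum(outs)


batches = [[(1, 1), (0, 1)], [(1, 1), (0, 0)]]
outs = run_validation_epoch(batches, count_correct, summarize)
```

Here `outs` holds one entry per batch, and `summarize` receives the full list only after the last batch has been processed.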
Source code in zamba/models/slowfast_models.py
def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self(x)
    loss = F.binary_cross_entropy_with_logits(y_hat, y)
    self.log("val_loss", loss.detach())
    y_proba = torch.sigmoid(y_hat.cpu()).numpy()
    return {
        "y_true": y.cpu().numpy().astype(int),
        "y_pred": y_proba.round().astype(int),
        "y_proba": y_proba,
    }
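The listing above turns logits into probabilities with a sigmoid and then rounds them to obtain 0/1 predictions. The same arithmetic can be sketched without tensors. This is a plain-Python illustration under the assumption of small, well-behaved logits; `sigmoid` and `to_outputs` are hypothetical helpers and not part of zamba.

```python
import math
from typing import Dict, List


def sigmoid(logit: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def to_outputs(logits: List[List[float]], labels: List[List[float]]) -> Dict[str, list]:
    """Mirror the keys returned by validation_step, using nested lists."""
    y_proba = [[sigmoid(z) for z in row] for row in logits]
    return {
        "y_true": [[int(v) for v in row] for row in labels],
        # Rounding a probability gives the 0/1 prediction for that label.
        "y_pred": [[int(round(p)) for p in row] for row in y_proba],
        "y_proba": y_proba,
    }


# Two samples, two labels each.
outputs = to_outputs(
    logits=[[2.0, -1.0], [-3.0, 0.5]],
    labels=[[1.0, 0.0], [0.0, 1.0]],
)
```

Each label is scored independently, which is consistent with the use of `binary_cross_entropy_with_logits` in the listing: a positive logit maps to a probability above 0.5 and rounds to 1, a negative logit rounds to 0.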
validation_step_end(self, *args, **kwargs) -> Union[torch.Tensor, Dict[str, Any]]
inherited
¶
Use this when validating with dp or ddp2 because :meth:validation_step will operate on only part of the batch. However, this is still optional and only needed for things like softmax or NCE loss.
!!! note If you later switch to ddp or some other mode, this will still be called so that you don't have to change your code.
.. code-block:: python
# pseudocode
sub_batches = split_batches_for_dp(batch)
batch_parts_outputs = [validation_step(sub_batch) for sub_batch in sub_batches]
validation_step_end(batch_parts_outputs)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_parts_outputs |
What you return in :meth: |
required |
Returns:
Type | Description |
---|---|
Union[torch.Tensor, Dict[str, Any]] |
None or anything |
.. code-block:: python
# WITHOUT validation_step_end
# if used in DP or DDP2, this batch is 1/num_gpus large
def validation_step(self, batch, batch_idx):
# batch is 1/num_gpus big
x, y = batch
out = self.encoder(x)
loss = self.softmax(out)
loss = nce_loss(loss)
self.log("val_loss", loss)
# --------------
# with validation_step_end to do softmax over the full batch
def validation_step(self, batch, batch_idx):
# batch is 1/num_gpus big
x, y = batch
out = self(x)
return out
def validation_step_end(self, val_step_outputs):
for out in val_step_outputs:
...
See Also:
See the :ref:advanced/multi_gpu:Multi-GPU training guide for more details.
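To see why a softmax over the batch needs the gathered outputs, compare normalizing inside each sub-batch with normalizing once over the full batch. The sketch below is plain Python and makes assumptions: `split_batch` is a hypothetical stand-in for how dp divides a batch, and the scores are made-up values.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_batch(batch: List[float], num_parts: int) -> List[List[float]]:
    """Hypothetical stand-in for splitting one batch across devices."""
    size = math.ceil(len(batch) / num_parts)
    return [batch[i:i + size] for i in range(0, len(batch), size)]


scores = [1.0, 2.0, 3.0, 4.0]
parts = split_batch(scores, num_parts=2)

# Normalizing inside each part: every part sums to 1 on its own.
per_part = [softmax(part) for part in parts]

# Gathering the raw outputs first, then normalizing once over the full batch.
gathered = [s for part in parts for s in part]
full = softmax(gathered)
```

The first score gets a different weight in `per_part` than in `full`, because the normalizing sum covers two items in one case and four in the other. Returning raw outputs from `validation_step` and normalizing in `validation_step_end` corresponds to the second computation.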
Source code in zamba/models/slowfast_models.py
def validation_step_end(self, *args, **kwargs) -> Optional[STEP_OUTPUT]:
"""
Use this when validating with dp or ddp2 because :meth:`validation_step`
will operate on only part of the batch. However, this is still optional
and only needed for things like softmax or NCE loss.
Note:
If you later switch to ddp or some other mode, this will still be called
so that you don't have to change your code.
.. code-block:: python
# pseudocode
sub_batches = split_batches_for_dp(batch)
batch_parts_outputs = [validation_step(sub_batch) for sub_batch in sub_batches]
validation_step_end(batch_parts_outputs)
Args:
batch_parts_outputs: What you return in :meth:`validation_step`
for each batch part.
Return:
None or anything
.. code-block:: python
# WITHOUT validation_step_end
# if used in DP or DDP2, this batch is 1/num_gpus large
def validation_step(self, batch, batch_idx):
# batch is 1/num_gpus big
x, y = batch
out = self.encoder(x)
loss = self.softmax(out)
loss = nce_loss(loss)
self.log("val_loss", loss)
# --------------
# with validation_step_end to do softmax over the full batch
def validation_step(self, batch, batch_idx):
# batch is 1/num_gpus big
x, y = batch
out = self(x)
return out
def validation_step_end(self, val_step_outputs):
for out in val_step_outputs:
...
See Also:
See the :ref:`advanced/multi_gpu:Multi-GPU training` guide for more details.
"""
write_prediction(self, name: str, value: Union[torch.Tensor, List[torch.Tensor]], filename: str = 'predictions.pt')
inherited
¶
Write predictions to disk using torch.save
Example::
self.write_prediction('pred', torch.tensor(...), filename='my_predictions.pt')
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name |
str |
a string indicating the name to save the predictions under |
required |
value |
Union[torch.Tensor, List[torch.Tensor]] |
the predictions, either a single :class: |
required |
filename |
str |
name of the file to save the predictions to |
'predictions.pt' |
!!! note
When running in distributed mode, calling write_prediction will create a file for each device with respective names: filename_rank_0.pt, filename_rank_1.pt, ...
.. deprecated::v1.3 Will be removed in v1.5.0.
Source code in zamba/models/slowfast_models.py
def write_prediction(
self, name: str, value: Union[torch.Tensor, List[torch.Tensor]], filename: str = "predictions.pt"
):
"""
Write predictions to disk using ``torch.save``
Example::
self.write_prediction('pred', torch.tensor(...), filename='my_predictions.pt')
Args:
name: a string indicating the name to save the predictions under
value: the predictions, either a single :class:`~torch.Tensor` or a list of them
filename: name of the file to save the predictions to
Note:
when running in distributed mode, calling ``write_prediction`` will create a file for
each device with respective names: ``filename_rank_0.pt``, ``filename_rank_1.pt``, ...
.. deprecated::v1.3
Will be removed in v1.5.0.
"""
rank_zero_deprecation(
"LightningModule method `write_prediction` was deprecated in v1.3 and will be removed in v1.5."
)
self.trainer._evaluation_loop.predictions._add_prediction(name, value, filename)
write_prediction_dict(self, predictions_dict: Dict[str, Any], filename: str = 'predictions.pt')
inherited
¶
Write a dictionary of predictions to disk at once using torch.save
Example::
pred_dict = {'pred1': torch.tensor(...), 'pred2': torch.tensor(...)}
self.write_prediction_dict(pred_dict)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predictions_dict |
Dict[str, Any] |
dict containing predictions, where each prediction should
either be single :class: |
required |
!!! note
When running in distributed mode, calling write_prediction_dict will create a file for each device with respective names: filename_rank_0.pt, filename_rank_1.pt, ...
.. deprecated::v1.3 Will be removed in v1.5.0.
Source code in zamba/models/slowfast_models.py
def write_prediction_dict(self, predictions_dict: Dict[str, Any], filename: str = "predictions.pt"):
"""
Write a dictonary of predictions to disk at once using ``torch.save``
Example::
pred_dict = {'pred1': torch.tensor(...), 'pred2': torch.tensor(...)}
self.write_prediction_dict(pred_dict)
Args:
predictions_dict: dict containing predictions, where each prediction should
either be single :class:`~torch.Tensor` or a list of them
Note:
when running in distributed mode, calling ``write_prediction_dict`` will create a file for
each device with respective names: ``filename_rank_0.pt``, ``filename_rank_1.pt``, ...
.. deprecated::v1.3
Will be removed in v1.5.0.
"""
rank_zero_deprecation(
"LightningModule method `write_prediction_dict` was deprecated in v1.3 and will be removed in v1.5."
)
for k, v in predictions_dict.items():
self.write_prediction(k, v, filename)
xpu(self: ~T, device: Union[int, torch.device] = None) -> ~T
inherited
¶
Moves all model parameters and buffers to the XPU.
This also makes the associated parameters and buffers different objects, so it should be called before constructing the optimizer if the module will live on XPU while being optimized.
.. note:: This method modifies the module in-place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device |
int |
if specified, all parameters will be copied to that device |
None |
Returns:
Type | Description |
---|---|
Module |
self |
Source code in zamba/models/slowfast_models.py
def xpu(self: T, device: Optional[Union[int, device]] = None) -> T:
r"""Moves all model parameters and buffers to the XPU.
This also makes associated parameters and buffers different objects. So
it should be called before constructing optimizer if the module will
live on XPU while being optimized.
.. note::
This method modifies the module in-place.
Arguments:
device (int, optional): if specified, all parameters will be
copied to that device
Returns:
Module: self
"""
return self._apply(lambda t: t.xpu(device))
zero_grad(self, set_to_none: bool = False) -> None
inherited
¶
Sets gradients of all model parameters to zero. See similar function under :class:torch.optim.Optimizer for more context.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
set_to_none |
bool |
instead of setting to zero, set the grads to None.
See :meth: |
False |
Source code in zamba/models/slowfast_models.py
def zero_grad(self, set_to_none: bool = False) -> None:
r"""Sets gradients of all model parameters to zero. See similar function
under :class:`torch.optim.Optimizer` for more context.
Args:
set_to_none (bool): instead of setting to zero, set the grads to None.
See :meth:`torch.optim.Optimizer.zero_grad` for details.
"""
if getattr(self, '_is_replica', False):
warnings.warn(
"Calling .zero_grad() from a module created with nn.DataParallel() has no effect. "
"The parameters are copied (in a differentiable manner) from the original module. "
"This means they are not leaf nodes in autograd and so don't accumulate gradients. "
"If you need gradients in your forward method, consider using autograd.grad instead.")
for p in self.parameters():
if p.grad is not None:
if set_to_none:
p.grad = None
else:
if p.grad.grad_fn is not None:
p.grad.detach_()
else:
p.grad.requires_grad_(False)
p.grad.zero_()
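The difference between zeroing gradients and setting them to None can be illustrated with a toy stand-in. This sketch assumes nothing about PyTorch internals: `ToyParam` and this `zero_grad` function are hypothetical, and gradients are plain Python lists rather than tensors.

```python
from typing import Iterable, List, Optional


class ToyParam:
    """Hypothetical stand-in for a parameter with an optional gradient."""

    def __init__(self, grad: Optional[List[float]] = None):
        self.grad = grad


def zero_grad(params: Iterable[ToyParam], set_to_none: bool = False) -> None:
    """Mimic the branch structure of the listing above on toy parameters."""
    for p in params:
        if p.grad is None:
            continue  # nothing has been accumulated for this parameter
        if set_to_none:
            p.grad = None  # drop the gradient storage entirely
        else:
            p.grad[:] = [0.0] * len(p.grad)  # zero the values in place


a = ToyParam([1.0, -2.0])
b = ToyParam()  # never received a gradient

zero_grad([a, b])
after_zeroing = list(a.grad)

zero_grad([a, b], set_to_none=True)
```

With the default, `a.grad` keeps its storage and holds zeros. With `set_to_none=True` the attribute becomes None, so code that later reads the gradient has to handle that case.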