
Module bastionlab.torch.psg.nn

Functions

expanded_convolution(conv_fn: Callable, tuple_fn: Callable[[~T], Tuple[int, ...]]) ‑> Callable

Classes

ConvLinear(in_features: int, out_features: int, max_batch_size: int, bias: bool = True, device: Union[torch.device, str, None] = None, dtype: Optional[torch.dtype] = None)

Linear layer with expanded weights that internally uses an expanded 1D convolution.

Refer to the documentation of the convolution layers for more about the internals, and to PyTorch's Linear layer documentation for more about the parameters and their usage.

Initializes internal Module state, shared by both nn.Module and ScriptModule.
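
Below is a minimal usage sketch, assuming the layer accepts a (batch, in_features) input like torch.nn.Linear; the sizes are illustrative only.

    import torch
    from bastionlab.torch.psg.nn import ConvLinear

    # Layer sized for batches of up to 64 samples (max_batch_size).
    layer = ConvLinear(in_features=128, out_features=10, max_batch_size=64)

    x = torch.randn(32, 128)  # any batch of at most max_batch_size samples
    y = layer(x)              # forward pass through the expanded 1D convolution
    print(y.shape)            # expected: torch.Size([32, 10])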

Ancestors (in MRO)

  • torch.nn.modules.module.Module

Class variables

dump_patches: bool :

training: bool :

Methods

forward(self, x) ‑> Callable[..., Any]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

Embedding(num_embeddings: int, embedding_dim: int, max_batch_size: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, device: Union[torch.device, str, None] = None, dtype: Optional[torch.dtype] = None)

Embedding layer with expanded weights to be used with DP-SGD.

Weights are expanded to max_batch_size so that autodiff computes the per-sample gradients needed by the DP-SGD algorithm.

An embedding layer is essentially a lookup table that internally stores all the vectors of the vocabulary and returns the vector associated with each input index. To compute per-sample gradients, we "copy" the lookup table as many times as the maximum number of samples in a batch. The input indices are offset by their sample number times the vocabulary size before the actual lookup, so that each sample uses a different "copy" of the lookup table.

The copy of the lookup table is itself costless, as we only use an expanded view (similar to broadcasting). The runtime cost is also low, as we only need to remap the input indices.

Refer to the PyTorch documentation for more on how to use the various parameters: https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html#torch.nn.Embedding.

Initializes internal Module state, shared by both nn.Module and ScriptModule.
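
The snippet below illustrates the index-offsetting trick described above in plain PyTorch; it is a sketch of the mechanism, not the library's actual implementation, and all sizes are made up.

    import torch
    import torch.nn.functional as F

    num_embeddings, embedding_dim, max_batch_size = 1000, 16, 8
    weight = torch.randn(num_embeddings, embedding_dim, requires_grad=True)

    # One "copy" of the lookup table per sample via an expanded view
    # (the reshape may materialize it here; the real layer keeps this cheap).
    expanded = (
        weight.unsqueeze(0)
        .expand(max_batch_size, num_embeddings, embedding_dim)
        .reshape(max_batch_size * num_embeddings, embedding_dim)
    )

    # Offset each sample's indices by sample_number * num_embeddings so that
    # each sample reads from its own "copy"; autodiff then yields separable
    # per-sample gradients.
    indices = torch.randint(0, num_embeddings, (max_batch_size, 4))  # (batch, seq)
    offsets = torch.arange(max_batch_size).unsqueeze(1) * num_embeddings
    out = F.embedding(indices + offsets, expanded)  # (batch, seq, embedding_dim)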

Ancestors (in MRO)

  • torch.nn.modules.sparse.Embedding
  • torch.nn.modules.module.Module

Class variables

embedding_dim: int :

max_norm: Optional[float] :

norm_type: float :

num_embeddings: int :

padding_idx: Optional[int] :

scale_grad_by_freq: bool :

sparse: bool :

weight: torch.Tensor :

Methods

extra_repr(self)

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(self, x: torch.Tensor) ‑> torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

LayerNorm(normalized_shape: Union[int, List[int], torch.Size], max_batch_size: int, eps: float = 1e-05, elementwise_affine: bool = True, device: Union[torch.device, str, None] = None, dtype: Optional[torch.dtype] = None)

LayerNorm layer with expanded weights to be used with DP-SGD.

Weights are expanded to max_batch_size so that autodiff computes the per-sample gradients needed by the DP-SGD algorithm.

Expansion is done without copying or allocating more memory, as expanded weights are just a view on the original weights (similar to broadcasting).

This comes at no additional cost during the forward pass, as LayerNorm involves an elementwise affine operation that can be performed directly on the expanded weights with proper views.

Refer to the PyTorch documentation for more on how to use the various parameters: https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html#torch.nn.LayerNorm.

Initializes internal Module state, shared by both nn.Module and ScriptModule.
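
A minimal usage sketch, assuming the usual torch.nn.LayerNorm input conventions; all sizes are illustrative.

    import torch
    from bastionlab.torch.psg.nn import LayerNorm

    # Normalize the last dimension (64 features), for batches of up to 32 samples.
    ln = LayerNorm(normalized_shape=64, max_batch_size=32)

    x = torch.randn(16, 64)  # batch of 16 <= max_batch_size
    y = ln(x)                # elementwise affine normalization with expanded weights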

Ancestors (in MRO)

  • torch.nn.modules.normalization.LayerNorm
  • torch.nn.modules.module.Module

Class variables

elementwise_affine: bool :

eps: float :

normalized_shape: Tuple[int, ...] :

Methods

extra_repr(self)

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(self, x: torch.Tensor) ‑> torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

Linear(in_features: int, out_features: int, max_batch_size: int, bias: bool = True, device: Union[torch.device, str, None] = None, dtype: Optional[torch.dtype] = None)

Linear layer with expanded weights to be used with DP-SGD.

Weights are expanded to max_batch_size so that autodiff computes the per-sample gradients needed by the DP-SGD algorithm.

Expansion is done without copying or allocating more memory, as expanded weights are just a view on the original weights (similar to broadcasting).

However, this implies that the forward pass is performed with einsum, which may slightly decrease performance.

Refer to the PyTorch documentation for more on how to use the various parameters: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear.

Initializes internal Module state, shared by both nn.Module and ScriptModule.
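
The snippet below sketches, in plain PyTorch, how an einsum forward pass over batch-expanded weights yields per-sample gradients; it illustrates the idea only and is not the library's code.

    import torch

    in_features, out_features, max_batch_size = 8, 4, 5
    weight = torch.randn(out_features, in_features)

    # Expanded view: one weight "copy" per sample, made a leaf so that autodiff
    # returns one gradient slice per sample.
    expanded = (
        weight.unsqueeze(0)
        .expand(max_batch_size, out_features, in_features)
        .detach()
        .requires_grad_()
    )

    x = torch.randn(max_batch_size, in_features)
    # Per-sample matrix multiply: y[b, o] = sum_i expanded[b, o, i] * x[b, i]
    y = torch.einsum("boi,bi->bo", expanded, x)
    y.sum().backward()
    print(expanded.grad.shape)  # torch.Size([5, 4, 8]): one gradient per sample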

Ancestors (in MRO)

  • torch.nn.modules.linear.Linear
  • torch.nn.modules.module.Module

Class variables

in_features: int :

out_features: int :

weight: torch.Tensor :

Methods

extra_repr(self)

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(self, x: torch.Tensor) ‑> torch.Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

Conv1d(in_channels: int, out_channels: int, kernel_size: ~T, max_batch_size: int, stride: ~T = 1, padding: Union[str, ~T] = 0, dilation: ~T = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device: Union[torch.device, str, None] = None, dtype: Optional[torch.dtype] = None)

Convolutional layer with expanded weights to be used with DP-SGD.

Weights are expanded to the provided max_batch_size so that autodiff computes the per-sample gradients needed by the DP-SGD algorithm.

Expansion is done without copying or allocating more memory over the model's lifetime, as expanded weights are just a view on the original weights (similar to broadcasting).

However, weights are briefly reallocated during the forward pass, as the computation needs them in a contiguous format. As layers are typically used one after the other, the overall memory impact is negligible.

To speed up the forward pass with expanded weights, we use grouped convolutions with a number of groups equal to the number of samples: the convolution operator uses one kernel group per sample (which makes sample computations independent), and the groups' weights are shared thanks to the expansion.

Refer to the PyTorch documentation for more on how to use the various parameters: 1D: https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html#torch.nn.Conv1d, 2D: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d, 3D: https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html#torch.nn.Conv3d.
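
The snippet below illustrates the grouped-convolution trick in plain PyTorch (one group per sample, kernel shared via expansion); it is a sketch of the mechanism, not the library's code, and all sizes are made up.

    import torch
    import torch.nn.functional as F

    batch, in_channels, out_channels, kernel_size, length = 4, 3, 6, 5, 32
    weight = torch.randn(out_channels, in_channels, kernel_size)

    # Expand to one kernel group per sample; the reshape briefly materializes
    # the weights, matching the temporary reallocation mentioned above.
    expanded = (
        weight.unsqueeze(0)
        .expand(batch, out_channels, in_channels, kernel_size)
        .reshape(batch * out_channels, in_channels, kernel_size)
    )

    x = torch.randn(batch, in_channels, length)
    # Fold the batch into the channel dimension so each sample is its own group.
    y = F.conv1d(x.reshape(1, batch * in_channels, length), expanded, groups=batch)
    y = y.reshape(batch, out_channels, -1)  # (4, 6, 28)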

Ancestors (in MRO)

  • bastionlab.torch.psg.nn._ConvNd
  • torch.nn.modules.conv._ConvNd
  • torch.nn.modules.module.Module

Class variables

bias: Optional[torch.Tensor] :

dilation: Tuple[int, ...] :

groups: int :

in_channels: int :

kernel_size: Tuple[int, ...] :

out_channels: int :

output_padding: Tuple[int, ...] :

padding: Union[str, Tuple[int, ...]] :

padding_mode: str :

stride: Tuple[int, ...] :

transposed: bool :

weight: torch.Tensor :

Methods

forward(self, x: torch.Tensor) ‑> torch.Tensor :

Conv2d(in_channels: int, out_channels: int, kernel_size: ~T, max_batch_size: int, stride: ~T = 1, padding: Union[str, ~T] = 0, dilation: ~T = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device: Union[torch.device, str, None] = None, dtype: Optional[torch.dtype] = None)

Convolutional layer with expanded weights to be used with DP-SGD.

Weights are expanded to the provided max_batch_size so that autodiff computes the per-sample gradients needed by the DP-SGD algorithm.

Expansion is done without copying or allocating more memory over the model's lifetime, as expanded weights are just a view on the original weights (similar to broadcasting).

However, weights are briefly reallocated during the forward pass, as the computation needs them in a contiguous format. As layers are typically used one after the other, the overall memory impact is negligible.

To speed up the forward pass with expanded weights, we use grouped convolutions with a number of groups equal to the number of samples: the convolution operator uses one kernel group per sample (which makes sample computations independent), and the groups' weights are shared thanks to the expansion.

Refer to the PyTorch documentation for more on how to use the various parameters: 1D: https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html#torch.nn.Conv1d, 2D: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d, 3D: https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html#torch.nn.Conv3d.
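
A minimal usage sketch, assuming the usual NCHW input layout of torch.nn.Conv2d; all sizes are illustrative.

    import torch
    from bastionlab.torch.psg.nn import Conv2d

    # 3x3 convolution sized for batches of up to 16 samples.
    conv = Conv2d(in_channels=3, out_channels=8, kernel_size=3, max_batch_size=16, padding=1)

    x = torch.randn(8, 3, 28, 28)  # batch of 8 <= max_batch_size
    y = conv(x)                    # expected shape: (8, 8, 28, 28)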

Ancestors (in MRO)

  • bastionlab.torch.psg.nn._ConvNd
  • torch.nn.modules.conv._ConvNd
  • torch.nn.modules.module.Module

Class variables

bias: Optional[torch.Tensor] :

dilation: Tuple[int, ...] :

groups: int :

in_channels: int :

kernel_size: Tuple[int, ...] :

out_channels: int :

output_padding: Tuple[int, ...] :

padding: Union[str, Tuple[int, ...]] :

padding_mode: str :

stride: Tuple[int, ...] :

transposed: bool :

weight: torch.Tensor :

Methods

forward(self, x: torch.Tensor) ‑> torch.Tensor :

Conv3d(in_channels: int, out_channels: int, kernel_size: ~T, max_batch_size: int, stride: ~T = 1, padding: Union[str, ~T] = 0, dilation: ~T = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device: Union[torch.device, str, None] = None, dtype: Optional[torch.dtype] = None)

Convolutional layer with expanded weights to be used with DP-SGD.

Weights are expanded to the provided max_batch_size so that autodiff computes the per-sample gradients needed by the DP-SGD algorithm.

Expansion is done without copying or allocating more memory over the model's lifetime, as expanded weights are just a view on the original weights (similar to broadcasting).

However, weights are briefly reallocated during the forward pass, as the computation needs them in a contiguous format. As layers are typically used one after the other, the overall memory impact is negligible.

To speed up the forward pass with expanded weights, we use grouped convolutions with a number of groups equal to the number of samples: the convolution operator uses one kernel group per sample (which makes sample computations independent), and the groups' weights are shared thanks to the expansion.

Refer to the PyTorch documentation for more on how to use the various parameters: 1D: https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html#torch.nn.Conv1d, 2D: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d, 3D: https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html#torch.nn.Conv3d.
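
A minimal usage sketch for volumetric inputs, assuming the usual NCDHW layout of torch.nn.Conv3d; all sizes are illustrative.

    import torch
    from bastionlab.torch.psg.nn import Conv3d

    # 3x3x3 convolution sized for batches of up to 8 samples.
    conv = Conv3d(in_channels=1, out_channels=4, kernel_size=3, max_batch_size=8)

    x = torch.randn(2, 1, 16, 16, 16)  # batch of 2 <= max_batch_size
    y = conv(x)                        # expected shape: (2, 4, 14, 14, 14)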

Ancestors (in MRO)

  • bastionlab.torch.psg.nn._ConvNd
  • torch.nn.modules.conv._ConvNd
  • torch.nn.modules.module.Module

Class variables

bias: Optional[torch.Tensor] :

dilation: Tuple[int, ...] :

groups: int :

in_channels: int :

kernel_size: Tuple[int, ...] :

out_channels: int :

output_padding: Tuple[int, ...] :

padding: Union[str, Tuple[int, ...]] :

padding_mode: str :

stride: Tuple[int, ...] :

transposed: bool :

weight: torch.Tensor :

Methods

forward(self, x: torch.Tensor) ‑> torch.Tensor :