Autoregressive bijections

Autoregressive bijections fall into one of two categories: coupling bijections or masked autoregressive bijections. Architectures like IAF use the inverse masked autoregressive bijection, which simply swaps the forward and inverse methods of its masked autoregressive counterpart. Multiscale architectures are special cases of coupling architectures. Each autoregressive bijection consists of a transformer (a parameterized bijection that transforms part of the input), a conditioner, and a conditioner transform (a model that predicts the transformer parameters). See the List of transformers and List of conditioner transforms sections for details. To improve performance, we define subclasses according to the conditioner type; these subclasses are listed in the rest of this document.
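The forward/inverse swap behind inverse masked autoregressive bijections can be illustrated with a minimal NumPy sketch. ElementwiseAffine and Inverted are hypothetical stand-ins, not torchflows classes. We assume, as in the torchflows examples further down, that forward maps x to z and that both directions return an (output, log_det) pair with one log-determinant per batch element.

```python
import numpy as np


class ElementwiseAffine:
    """Hypothetical toy bijection y = a * x + b with fixed scalars (a != 0)."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def forward(self, x):
        # x has shape (batch, dim); one log|det J| per batch element.
        y = self.a * x + self.b
        log_det = np.full(x.shape[0], x.shape[1] * np.log(abs(self.a)))
        return y, log_det

    def inverse(self, y):
        x = (y - self.b) / self.a
        log_det = np.full(y.shape[0], -y.shape[1] * np.log(abs(self.a)))
        return x, log_det


class Inverted:
    """Hypothetical wrapper that swaps the forward and inverse methods of a bijection."""

    def __init__(self, bijection):
        self.bijection = bijection

    def forward(self, x):
        return self.bijection.inverse(x)

    def inverse(self, z):
        return self.bijection.forward(z)
```

Because the wrapper only reroutes calls, any cost asymmetry between the two directions of the wrapped bijection is reversed as well.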

Coupling bijections

Coupling architectures are compositions of coupling bijections, which extend the following base class:

class torchflows.bijections.finite.autoregressive.layers_base.CouplingBijection(event_shape: ~typing.Tuple[int, ...] | ~torch.Size, transformer_class: ~typing.Type[~torchflows.bijections.finite.autoregressive.transformers.base.TensorTransformer], context_shape: ~typing.Tuple[int, ...] | ~torch.Size = None, coupling: ~torchflows.bijections.finite.autoregressive.conditioning.coupling_masks.PartialCoupling = None, conditioner_transform_class: ~typing.Type[~torchflows.bijections.finite.autoregressive.conditioning.transforms.ConditionerTransform] = <class 'torchflows.bijections.finite.autoregressive.conditioning.transforms.FeedForward'>, coupling_kwargs: dict = None, conditioner_kwargs: dict = None, transformer_kwargs: dict = None, **kwargs)

Base coupling bijection object.

A coupling bijection is defined using a transformer, conditioner transform, and always a coupling conditioner (specifying how to partition the input tensor).

The coupling conditioner receives an event tensor \(x\) as input and partitions it into a constant part \(x_A\) and a modifiable part \(x_B\). For \(x_A\), the conditioner always outputs the same set of parameters; for \(x_B\), it outputs parameters predicted from \(x_A\). Coupling conditioners differ in how they partition the input. By default, the event is flattened; the first half is \(x_A\) and the second half is \(x_B\). When coupling bijections are used in a normalizing flow, permutation layers between them can shuffle event dimensions so that every dimension is eventually transformed.
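A minimal NumPy sketch of the default partitioning: flatten the event, take the first half as \(x_A\) and the second half as \(x_B\), and optionally shuffle the flattened dimensions beforehand, as a permutation layer would. The helpers half_split, merge, and permute_events are hypothetical and not part of torchflows. For odd event sizes we assume \(x_A\) gets the smaller half, which may differ from the library's HalfSplit.

```python
import numpy as np


def _flatten_event(x, event_shape):
    # Keep the leading batch dimensions and flatten the trailing event dimensions.
    batch_shape = x.shape[: x.ndim - len(event_shape)]
    return x.reshape(*batch_shape, -1), batch_shape


def half_split(x, event_shape):
    """Split the flattened event into x_A (first half) and x_B (second half)."""
    flat, _ = _flatten_event(x, event_shape)
    n_a = flat.shape[-1] // 2
    return flat[..., :n_a], flat[..., n_a:]


def merge(x_a, x_b, event_shape):
    """Inverse of half_split: concatenate the halves and restore the event shape."""
    flat = np.concatenate([x_a, x_b], axis=-1)
    return flat.reshape(*flat.shape[:-1], *event_shape)


def permute_events(x, perm, event_shape):
    """Shuffle the flattened event dimensions with the index array perm."""
    flat, batch_shape = _flatten_event(x, event_shape)
    return flat[..., perm].reshape(*batch_shape, *event_shape)
```

For example, reversing the two halves with perm = [3, 4, 5, 0, 1, 2] on a (2, 3) event turns the former \(x_B\) into the new \(x_A\), so the next coupling layer modifies the dimensions the previous one kept constant.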

For improved performance, this implementation does not use a standalone coupling conditioner, but implements a method to partition x into \(x_A\) and \(x_B\) and then predict parameters for \(x_B\).
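The following NumPy sketch combines partitioning and parameter prediction in one class, in the spirit of the paragraph above. ToyAffineCoupling is hypothetical: its conditioner transform is a single linear layer followed by tanh (a stand-in for a real conditioner transform such as FeedForward), and its transformer is affine, \(z_B = x_B \exp(s) + t\), so the log-determinant is the sum of the log-scales \(s\).

```python
import numpy as np


class ToyAffineCoupling:
    """Hypothetical affine coupling on flat vectors: x_A is kept, x_B is transformed."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.n_a = dim // 2
        n_b = dim - self.n_a
        # Stand-in conditioner transform: x_A -> (log_scale, shift) for every entry of x_B.
        self.w = 0.1 * rng.standard_normal((self.n_a, 2 * n_b))
        self.b = np.zeros(2 * n_b)

    def _params(self, x_a):
        h = np.tanh(x_a @ self.w + self.b)  # tanh bounds both outputs
        log_scale, shift = np.split(h, 2, axis=-1)
        return log_scale, shift

    def forward(self, x):
        x_a, x_b = x[..., : self.n_a], x[..., self.n_a :]
        log_scale, shift = self._params(x_a)
        z_b = x_b * np.exp(log_scale) + shift
        # The Jacobian is triangular, so log|det J| is the sum of the log-scales.
        return np.concatenate([x_a, z_b], axis=-1), log_scale.sum(axis=-1)

    def inverse(self, z):
        z_a, z_b = z[..., : self.n_a], z[..., self.n_a :]
        # z_A equals x_A, so the same parameters are predicted in one pass.
        log_scale, shift = self._params(z_a)
        x_b = (z_b - shift) * np.exp(-log_scale)
        return np.concatenate([z_a, x_b], axis=-1), -log_scale.sum(axis=-1)
```

Since \(x_A\) passes through unchanged, both directions need only one conditioner evaluation, which is what makes coupling bijections cheap to invert.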

__init__(event_shape: ~typing.Tuple[int, ...] | ~torch.Size, transformer_class: ~typing.Type[~torchflows.bijections.finite.autoregressive.transformers.base.TensorTransformer], context_shape: ~typing.Tuple[int, ...] | ~torch.Size = None, coupling: ~torchflows.bijections.finite.autoregressive.conditioning.coupling_masks.PartialCoupling = None, conditioner_transform_class: ~typing.Type[~torchflows.bijections.finite.autoregressive.conditioning.transforms.ConditionerTransform] = <class 'torchflows.bijections.finite.autoregressive.conditioning.transforms.FeedForward'>, coupling_kwargs: dict = None, conditioner_kwargs: dict = None, transformer_kwargs: dict = None, **kwargs)

CouplingBijection constructor.

Parameters:

event_shape (Union[Tuple[int, ...], torch.Size]) – shape of the event tensor.
transformer_class (Type[TensorTransformer]) – transformer class.
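context_shape (Union[Tuple[int, ...], torch.Size]) – shape of the context tensor.
coupling (PartialCoupling) – coupling strategy that partitions the input into \(x_A\) and \(x_B\). If None, the flattened event is split in half.
conditioner_transform_class (Type[ConditionerTransform]) – class of the conditioner transform that predicts transformer parameters from \(x_A\).
coupling_kwargs (Dict) – keyword arguments for the coupling.
conditioner_kwargs (Dict) – keyword arguments for the conditioner transform.
transformer_kwargs (Dict) – keyword arguments for the transformer.
kwargs –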

We give an example of how to create a custom coupling bijection from a transformer, a coupling strategy, and a conditioner transform:

from torchflows.bijections.finite.autoregressive.layers_base import CouplingBijection
from torchflows.bijections.finite.autoregressive.transformers.linear.affine import Affine
from torchflows.bijections.finite.autoregressive.conditioning.coupling_masks import HalfSplit
from torchflows.bijections.finite.autoregressive.conditioning.transforms import ResidualFeedForward

class AffineCoupling(CouplingBijection):
    def __init__(self, event_shape, **kwargs):
        coupling = HalfSplit(event_shape)  # split the flattened event into two halves
        super().__init__(
            event_shape,
            transformer_class=Affine,
            coupling=coupling,
            conditioner_transform_class=ResidualFeedForward,
            **kwargs  # forward remaining options (e.g. context_shape) to the base class
        )

event_shape = (10,)  # say we have vectors of size 10
bijection = AffineCoupling(event_shape)  # create the bijection

Masked autoregressive bijections

Masked autoregressive and inverse autoregressive architectures are compositions of their respective bijections, extending one of the following classes:

class torchflows.bijections.finite.autoregressive.layers_base.MaskedAutoregressiveBijection(event_shape: Tuple[int, ...] | Size, transformer_class: Type[ScalarTransformer], context_shape: Tuple[int, ...] | Size = None, transformer_kwargs: dict = None, conditioner_kwargs: dict = None, **kwargs)

Masked autoregressive bijection class.

This bijection is specified with a scalar transformer. Its conditioner is always MADE, which receives as input a tensor x with x.shape = (*batch_shape, *event_shape). MADE outputs parameters h for the scalar transformer with

h.shape = (*batch_shape, *event_shape, *parameter_shape_per_element).

The transformer then applies the bijection elementwise.
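To make the shape convention above concrete, here is a toy single-hidden-layer MADE in NumPy. ToyMADE is a hypothetical sketch, not the torchflows implementation: each input and hidden unit gets a degree, and masks keep only connections that preserve the autoregressive ordering, so the parameters for element d depend only on elements before d. The output has shape (*batch_shape, event_dim, n_params), which matches h.shape above for a one-dimensional event.

```python
import numpy as np


class ToyMADE:
    """Hypothetical one-hidden-layer MADE mapping (..., D) inputs to (..., D, P) parameters."""

    def __init__(self, event_dim, n_params, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.d, self.p = event_dim, n_params
        in_deg = np.arange(1, event_dim + 1)  # input d has degree d + 1
        # Hidden degrees lie in [1, D - 1] (just 1 when D == 1).
        hid_deg = rng.integers(1, max(event_dim - 1, 1) + 1, size=hidden)
        out_deg = np.repeat(in_deg, n_params)  # P outputs per event element
        # Masks keep input->hidden edges with hid >= in and hidden->output edges with out > hid.
        self.m1 = (hid_deg[None, :] >= in_deg[:, None]).astype(float)  # (D, hidden)
        self.m2 = (out_deg[None, :] > hid_deg[:, None]).astype(float)  # (hidden, D * P)
        self.w1 = rng.standard_normal((event_dim, hidden))
        self.b1 = rng.standard_normal(hidden)
        self.w2 = rng.standard_normal((hidden, event_dim * n_params))
        self.b2 = rng.standard_normal(event_dim * n_params)

    def __call__(self, x):
        hid = np.tanh(x @ (self.w1 * self.m1) + self.b1)
        out = hid @ (self.w2 * self.m2) + self.b2
        # Output unit d * P + p holds parameter p of event element d.
        return out.reshape(*x.shape[:-1], self.d, self.p)
```

For an affine scalar transformer one would use n_params=2 (a log-scale and a shift per element); a single masked pass then yields all parameters at once.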

__init__(event_shape: Tuple[int, ...] | Size, transformer_class: Type[ScalarTransformer], context_shape: Tuple[int, ...] | Size = None, transformer_kwargs: dict = None, conditioner_kwargs: dict = None, **kwargs)

Bijection constructor.

Parameters:

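event_shape – shape of the event tensor.
transformer_class – scalar transformer class.
context_shape – shape of the context tensor.
transformer_kwargs – keyword arguments for the transformer.
conditioner_kwargs – keyword arguments for the MADE conditioner.
kwargs – unused.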

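class torchflows.bijections.finite.autoregressive.layers_base.InverseMaskedAutoregressiveBijection(*args, **kwargs)

Inverse masked autoregressive bijection class.

This bijection swaps the forward and inverse methods of its masked autoregressive counterpart, MaskedAutoregressiveBijection. Architectures like IAF are compositions of these bijections.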

__init__(*args, **kwargs)

Bijection constructor.

Parameters:

event_shape – shape of the event tensor.
context_shape – shape of the context tensor.
kwargs – unused.
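A common reason to choose between the two classes is a cost asymmetry, sketched here with a toy masked affine autoregressive bijection in NumPy. ToyMaskedAffine is hypothetical, and its conditioner is a pair of strictly triangular linear maps rather than MADE. Assuming forward maps data x to latent z, the forward direction needs a single conditioner pass, while the inverse must recover the elements one at a time. Swapping forward and inverse, as the inverse masked autoregressive bijection does, moves the sequential loop to the other direction.

```python
import numpy as np


class ToyMaskedAffine:
    """Hypothetical masked affine autoregressive bijection on (batch, dim) arrays."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Strictly upper-triangular (input, output) weights: output d sees only inputs j < d.
        self.ws = np.triu(0.3 * rng.standard_normal((dim, dim)), k=1)
        self.wt = np.triu(rng.standard_normal((dim, dim)), k=1)

    def _params(self, x):
        return np.tanh(x @ self.ws), x @ self.wt  # (log_scale, shift)

    def forward(self, x):
        # All parameters are computed from x in one conditioner pass.
        log_scale, shift = self._params(x)
        return x * np.exp(log_scale) + shift, log_scale.sum(axis=-1)

    def inverse(self, z):
        # x_d depends on x_{<d}, so the elements are recovered one at a time (dim passes).
        x = np.zeros_like(z)
        for d in range(z.shape[-1]):
            log_scale, shift = self._params(x)
            x[..., d] = (z[..., d] - shift[..., d]) * np.exp(-log_scale[..., d])
        log_scale, _ = self._params(x)
        return x, -log_scale.sum(axis=-1)
```

At step d of the loop, entries at index d and above are still zero, but the triangular weights ensure they do not influence the parameters of element d.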

We give an example of how to create custom masked autoregressive and inverse masked autoregressive bijections using a scalar transformer (the conditioner is always MADE):

import torch
from torchflows.bijections.finite.autoregressive.layers_base import (
    MaskedAutoregressiveBijection,
    InverseMaskedAutoregressiveBijection,
)
from torchflows.bijections.finite.autoregressive.transformers.linear.affine import Affine

class AffineForwardMaskedAutoregressive(MaskedAutoregressiveBijection):
    def __init__(self, event_shape, **kwargs):
        super().__init__(event_shape, transformer_class=Affine, **kwargs)

class AffineInverseMaskedAutoregressive(InverseMaskedAutoregressiveBijection):
    def __init__(self, event_shape, **kwargs):
        super().__init__(event_shape, transformer_class=Affine, **kwargs)


# say we have 100 vectors of size 10
event_shape = (10,)
x = torch.randn(size=(100, *event_shape))

bijection = AffineForwardMaskedAutoregressive(event_shape)  # create the bijection
z, log_det_forward = bijection.forward(x)
y, log_det_inverse = bijection.inverse(z)  # y reconstructs x up to numerical error

Multiscale autoregressive bijections

Multiscale architectures are coupling architectures which are specialized for image modeling, extending the class below:

class torchflows.bijections.finite.multiscale.base.MultiscaleBijection(event_shape: ~torch.Size | ~typing.Tuple[int, ...], transformer_class: ~typing.Type[~torchflows.bijections.finite.autoregressive.transformers.base.TensorTransformer], n_blocks: int, n_checkerboard_layers: int = 3, n_channel_wise_layers: int = 3, use_resnet: bool = False, checkerboard_class: ~typing.Type[~torchflows.bijections.finite.multiscale.base.CheckerboardCoupling] | ~typing.Type[~torchflows.bijections.finite.multiscale.base.NormalizedCheckerboardCoupling] | ~typing.Type[~torchflows.bijections.finite.multiscale.base.GlowCheckerboardCoupling] = <class 'torchflows.bijections.finite.multiscale.base.NormalizedCheckerboardCoupling'>, channel_wise_class: ~typing.Type[~torchflows.bijections.finite.multiscale.base.ChannelWiseCoupling] | ~typing.Type[~torchflows.bijections.finite.multiscale.base.NormalizedChannelWiseCoupling] | ~typing.Type[~torchflows.bijections.finite.multiscale.base.GlowChannelWiseCoupling] = <class 'torchflows.bijections.finite.multiscale.base.NormalizedChannelWiseCoupling'>, first_layer: bool = True, **kwargs)

__init__(event_shape: ~torch.Size | ~typing.Tuple[int, ...], transformer_class: ~typing.Type[~torchflows.bijections.finite.autoregressive.transformers.base.TensorTransformer], n_blocks: int, n_checkerboard_layers: int = 3, n_channel_wise_layers: int = 3, use_resnet: bool = False, checkerboard_class: ~typing.Type[~torchflows.bijections.finite.multiscale.base.CheckerboardCoupling] | ~typing.Type[~torchflows.bijections.finite.multiscale.base.NormalizedCheckerboardCoupling] | ~typing.Type[~torchflows.bijections.finite.multiscale.base.GlowCheckerboardCoupling] = <class 'torchflows.bijections.finite.multiscale.base.NormalizedCheckerboardCoupling'>, channel_wise_class: ~typing.Type[~torchflows.bijections.finite.multiscale.base.ChannelWiseCoupling] | ~typing.Type[~torchflows.bijections.finite.multiscale.base.NormalizedChannelWiseCoupling] | ~typing.Type[~torchflows.bijections.finite.multiscale.base.GlowChannelWiseCoupling] = <class 'torchflows.bijections.finite.multiscale.base.NormalizedChannelWiseCoupling'>, first_layer: bool = True, **kwargs)

Bijection constructor.

Parameters:

event_shape – shape of the event tensor.
context_shape – shape of the context tensor.
kwargs – unused.
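The constructor combines checkerboard couplings and channel-wise couplings. Assuming these follow the usual RealNVP-style conventions (the parameters above do not spell them out), the two partitions can be sketched in NumPy for a (..., C, H, W) image tensor. checkerboard_mask, checkerboard_split, and channel_wise_split are hypothetical helpers, not torchflows functions.

```python
import numpy as np


def checkerboard_mask(height, width, parity=0):
    """1.0 where (row + col) % 2 == parity, else 0.0."""
    rows, cols = np.indices((height, width))
    return ((rows + cols) % 2 == parity).astype(float)


def checkerboard_split(x, mask):
    """Split a (..., C, H, W) tensor into a constant and a modifiable part with a spatial mask."""
    # The (H, W) mask broadcasts over the channel and batch dimensions.
    return x * mask, x * (1.0 - mask)


def channel_wise_split(x):
    """Split a (..., C, H, W) tensor into its first and second halves of channels."""
    c = x.shape[-3]
    return x[..., : c // 2, :, :], x[..., c // 2 :, :, :]
```

Alternating the parity between consecutive checkerboard layers lets every pixel be modified by some layer, mirroring the role of permutation layers in flat coupling flows.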
