Models

Most modern deep learning models are based on artificial neural networks, specifically convolutional neural networks (CNNs). During training, the algorithms use unknown elements in the input distribution to extract features, group objects, and discover useful data patterns. Much like training machines for self-learning, this occurs at multiple levels; at the end, the trained model is used to run inference on an image that has no annotation.

While no single network is considered perfect, some algorithms are better suited to specific tasks or to extracting specific patterns.

Here are some of the models available in GDL.

Segmentation

UNet

UNet is a fully convolutional neural network for semantic image segmentation. It consists of an encoder and a decoder connected by skip connections. The encoder extracts features at different spatial resolutions; these are passed through the skip connections to the decoder, which uses them to define an accurate segmentation mask. Concatenation is used to fuse the decoder blocks with the skip connections.
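The encoder, decoder, and skip-connection fusion described above can be illustrated with a minimal PyTorch sketch. The `TinyUNet` class, its layer sizes, and its two-level depth are assumptions made for illustration only; this is not the GDL UNet implementation.

```python
import torch
import torch.nn as nn


class TinyUNet(nn.Module):
    """Hypothetical two-level encoder-decoder (illustration only, not GDL's UNet)."""

    def __init__(self, in_channels: int = 3, classes: int = 2):
        super().__init__()
        # Encoder: one block at full resolution, one at half resolution.
        self.enc1 = nn.Sequential(nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        # Decoder: upsample back to full resolution.
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        # After concatenation: 16 upsampled channels + 16 skip channels.
        self.dec1 = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        # 1x1 convolution producing one output channel per class.
        self.head = nn.Conv2d(16, classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skip = self.enc1(x)                    # (N, 16, H, W)
        bottom = self.enc2(self.pool(skip))    # (N, 32, H/2, W/2)
        up = self.up(bottom)                   # (N, 16, H, W)
        fused = torch.cat([up, skip], dim=1)   # (N, 32, H, W), fusion by concatenation
        return self.head(self.dec1(fused))     # (N, classes, H, W)


model = TinyUNet(in_channels=3, classes=2)
image = torch.randn(1, 3, 64, 64)  # H and W must be even for this sketch
mask_logits = model(image)
print(mask_logits.shape)
```

The output keeps the spatial size of the input and has one channel per class, which is the sense in which `classes` can be read as the number of channels of the output mask.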

Here are some implementations found in the config model folder.

class models.unet.UNetSmall(*args: Any, **kwargs: Any)[source]

Main UNetSmall architecture, a shallower version of UNet.

__init__(classes, in_channels, dropout=False, prob=0.5)[source]

Initialize the UNetSmall.

Parameters:
  • classes (int) – number of classes for the output mask (this can also be thought of as the number of channels of the output mask).

  • in_channels (int) – number of input channels for the model (3 for RGB images).

  • dropout (bool, optional) – whether to apply spatial dropout. Defaults to False.

  • prob (float, optional) – dropout probability. Defaults to 0.5.

forward(input_data)[source]

Forward function used during training.

Parameters:

input_data (Tensor) – tensor containing the image.

Returns:

tensor containing the result from the model.

Return type:

Tensor

class models.unet.UNet(*args: Any, **kwargs: Any)[source]

Main UNet architecture.

__init__(classes, in_channels, dropout=False, prob=0.5)[source]

Initialize the UNet.

Parameters:
  • classes (int) – number of classes for the output mask (this can also be thought of as the number of channels of the output mask).

  • in_channels (int) – number of input channels for the model (3 for RGB images).

  • dropout (bool, optional) – whether to apply spatial dropout. Defaults to False.

  • prob (float, optional) – dropout probability. Defaults to 0.5.

forward(input_data)[source]

Forward function used during training.

Parameters:

input_data (Tensor) – tensor containing the image.

Returns:

tensor containing the result from the model.

Return type:

Tensor

An implementation from the smp model library is also available. In addition, the folder contains some specific combinations of the smp models, such as unet++, unet pretrained on imagenet, unet with a senet154 encoder, unet with a resnext101 encoder, and more. We invite you to see the config model folder for the complete list of the different combinations.

DeepLabV3

DeepLabV3 implementation of the paper Rethinking Atrous Convolution for Semantic Image Segmentation, from the smp model library.

Also from the same library, another version of DeepLabV3, named DeepLabV3+, from the paper Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.

GDL has its own version of DeepLabV3 with a dual head, in which two models are concatenated at a specified point.

class models.deeplabv3_dualhead.DeepLabV3_dualhead(*args: Any, **kwargs: Any)[source]

Create a model where two models are concatenated at a specific point.

On initialization, this class copies the input model, then changes the input layer of the copy to match the input dimension of the second entry. The two models are concatenated at a specific point chosen at initialization. The result is a new model that takes two entries with different depths and combines them to produce an output whose depth corresponds to the number of classes.

Note

Only available for DeepLabV3 with a ResNet101 backbone.

__init__(encoder_name: str = 'resnet34', encoder_depth: int = 5, encoder_weights: str = 'imagenet', decoder_channels: int = 256, in_channels: int = 3, classes: int = 1, activation: Optional[str] = None, upsampling: int = 8, aux_params: Optional[dict] = None, conc_point: str = 'conv1')[source]

Initialize all the parts needed for the dual head:
  • modelRGB – contains all the backbone layers before the concatenation point.

  • modelNIR – contains all the backbone layers before the concatenation point, with the number of input channels changed to 1.

  • leftover – contains all the backbone layers after the concatenation point. For some concatenation points this variable is empty, since the concatenation happens after the backbone.

  • conv1x1 – a Conv2d layer applied after the concatenation to return to the depth of a single model, since the concatenation doubles the depth.

Parameters:
  • encoder_name (str, optional) – name of the encoder used for the DeepLabV3 network. Defaults to “resnet34”.

  • encoder_depth (int, optional) – depth of the network. Defaults to 5.

  • encoder_weights (str, optional) – name of the weights used to initialize DeepLabV3. Defaults to “imagenet”.

  • decoder_channels (int, optional) – size of the decoder. Defaults to 256.

  • in_channels (int, optional) – number of channels for the input. Defaults to 3.

  • classes (int, optional) – number of classes to predict. Defaults to 1.

  • activation (str, optional) – activation function applied to the model output. Defaults to None.

  • upsampling (int, optional) – level of upsampling. Defaults to 8.

  • aux_params (dict, optional) – other parameters for the DeepLabV3. Defaults to None.

  • conc_point (str, optional) – name of the layer where the concatenation takes place. Defaults to ‘conv1’.

forward(inputs)[source]

Forward function used during training.

Accepts a list of Tensors as input data and returns a Tensor of output data. The input holds two tensors: the first contains the RGB images and the second contains the other modality.

Note

For now, this only accepts NIR as the second entry, with a shape of [1, h, w].

The results of the two partial models (RGB and the other modality) are concatenated along the depth dimension and passed through a convolution to recover the depth that matches the leftover entry. The classifier and the interpolation follow, returning a Tensor with the same [h, w] as the input Tensor.

Parameters:

inputs (list) – list containing two Tensors, one containing the RGB tensor and the other containing the NIR tensor.

Returns:

Result from the blend of the two models.

Return type:

Tensor
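The concatenate-then-reduce step described for the dual head can be sketched in PyTorch as follows. `TwoBranchFusion`, its single-convolution branches, and the feature width of 64 are hypothetical stand-ins for the backbone layers; this is an illustration under those assumptions, not the DeepLabV3_dualhead code.

```python
import torch
import torch.nn as nn


class TwoBranchFusion(nn.Module):
    """Hypothetical two-branch fusion (illustration only, not DeepLabV3_dualhead)."""

    def __init__(self, rgb_channels: int = 3, nir_channels: int = 1, features: int = 64):
        super().__init__()
        # Stand-ins for the backbone layers before the concatenation point.
        self.branch_rgb = nn.Conv2d(rgb_channels, features, kernel_size=3, padding=1)
        self.branch_nir = nn.Conv2d(nir_channels, features, kernel_size=3, padding=1)
        # Concatenation doubles the depth; a 1x1 convolution brings it back.
        self.conv1x1 = nn.Conv2d(2 * features, features, kernel_size=1)

    def forward(self, inputs):
        rgb, nir = inputs  # list of two tensors: RGB first, NIR second
        merged = torch.cat([self.branch_rgb(rgb), self.branch_nir(nir)], dim=1)
        return self.conv1x1(merged)  # depth of a single branch again


fusion = TwoBranchFusion()
rgb = torch.randn(2, 3, 32, 32)
nir = torch.randn(2, 1, 32, 32)
fused = fusion([rgb, nir])
print(fused.shape)
```

In this sketch the fused tensor has the same depth as either branch alone, so whatever layers follow the concatenation point can be reused without changing their input size.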

static split_RGB_NIR(inputs)[source]

Split RGB and NIR in input imagery being fed to models for training.

Parameters:

inputs (Tensor) – Images with 4 channels RGBN, shape (N, C, H, W).

Returns:

Two tensors: one with all but the last channel, with shape (N, C-1, H, W), and the other with NIR, with shape (N, 1, H, W).

Return type:

Tensor
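A split with the shapes documented above can be sketched using tensor slicing. The function name `split_rgb_nir_sketch` is hypothetical and the body is an assumption about one way to obtain those shapes; the actual GDL method may differ.

```python
import torch


def split_rgb_nir_sketch(inputs: torch.Tensor):
    """Hypothetical split of an (N, C, H, W) batch whose last channel is NIR."""
    rgb = inputs[:, :-1, ...]  # all but the last channel: (N, C-1, H, W)
    nir = inputs[:, -1:, ...]  # last channel, channel axis kept: (N, 1, H, W)
    return rgb, nir


batch = torch.randn(4, 4, 16, 16)  # 4 images with RGBN channels
rgb, nir = split_rgb_nir_sketch(batch)
print(rgb.shape, nir.shape)
```

Slicing with `-1:` rather than indexing with `-1` keeps the channel axis, so the NIR tensor stays four-dimensional.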