{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "T02_Padding_and_Stride_Tutorial.ipynb", "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "MU9igqUmmzl4" }, "source": [ "#Introduction\n", "When working with neural networks, it's fundamental to keep in mind the **dimension**, or shape, of the tensors we're working with. Wrongly shaped tensors can break a computational graph, causing an error when using machine learning libraries. \n", "When we use fully connected layers, it is generally easy to keep in mind what's the output dimension of a tensor: after all, a **fully connected layer is simply a linear matrix multiplication** followed by a non-linearity. When using libraries like PyTorch or TensorFlow, we explicitly specify the input dimension of the tensor, the output dimension we want, and we know what to expect as an output. \n", "\n", "Things are a bit more complicated when we use **convolutional layers**. You're probably familiar with the convolution operation done by a **convolutional filter**. A filter, say, a $(3 \\times 3)$ filter, slides over the input matrix of dimension $(m \\times n)$ (for simplicity I'm not considering the third dimensions of both the filter and the input tensor, since they are not relevant to this tutorial). How do you predict the output shape? And more importantly, how can you control the output shape to be the one you need for layers futher down in the computational graph?\n", "\n", "In this tutorial, we will see two parameters that allow us to change the behaviour of the convolutional filters, effectively changing the shape of the output tensor: **padding** and **stride**." ] }, { "cell_type": "markdown", "metadata": { "id": "w16m5Ym0o4yy" }, "source": [ "![link image](https://d2l.ai/_images/correlation.svg)\n", "\n", "(*images taken from https://d2l.ai/chapter_convolutional-neural-networks/padding-and-strides.html*)\n", "\n", "Given the input tensor *Input* and the convolutional filter (or kernel) *Kernel*, the output of the convolutional operation, as generally defined in the Deep Learning community, is reported in the image. The $(2 \\times 2)$ kernel slides over the image, shifting of one position to the right at each step, and going down of one row each time the previous row is completed. In this case, the output shape is $(2 \\times 2)$.\n", "\n", "In this case the shape of the input is effectively reduced. This can be undesired, especially if we plan to use several convolutional filters. \n", "\n", "*Question 1: What would be the shape of the output if we applied the same kernel of the figure the the Output tensor given in the figure?*\n", "\n", "#Padding\n", "\n", "A technique to effectively allow the output tensor to have the same or even a larger shape than the input tensor is called **padding**. This is nothing more than \"surrounding\" the input matrix with zeros. Thus, we add two or more columns and rows around the original matrix, and normally apply the filter (which now start further up and left than it did before). \n", "\n", "![](https://d2l.ai/_images/conv-pad.svg)\n", "\n", "In the prevous figure you can see how padding can be used in this case. Notice how the output tensor is now actually larger than the original, unpadded input matrix. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "#Padding\n", "\n", "A technique that allows the output tensor to have the same, or even a larger, shape than the input tensor is called **padding**. This is nothing more than \"surrounding\" the input matrix with zeros: we add one or more rows and columns of zeros on each side of the original matrix and then apply the filter as usual (it now starts further up and further to the left than it did before).\n", "\n", "![](https://d2l.ai/_images/conv-pad.svg)\n", "\n", "In the previous figure you can see how padding is applied in this case: one row and one column of zeros are added on each side of the input. Notice how the output tensor is now actually larger than the original, unpadded input matrix. If we were to apply another kernel with the same shape (and no padding) to this output, the output dimension would be $(3 \\times 3)$ again, i.e. the same as the original input.\n", "\n", "In the following example, we show how padding works in PyTorch and how it affects the shape of the output tensor. We use a $(3 \\times 3)$ kernel and a padding of 1 (i.e. we add one row and one column on each side of the matrix, for a total of two additional rows and two additional columns)." ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "R8uTKJqmsKKl", "outputId": "38d66436-e82a-4856-d1a1-767b5d3f6f56" }, "source": [ "\"\"\"\n", "Code taken from https://d2l.ai/chapter_convolutional-neural-networks/padding-and-strides.html\n", "\"\"\"\n", "\n", "import torch\n", "from torch import nn\n", "\n", "# We define a convenience function to calculate the convolutional layer. This\n", "# function initializes the convolutional layer weights and adds/removes the\n", "# batch and channel dimensions on the input and output\n", "def comp_conv2d(conv2d, X):\n", "    # Here (1, 1) indicates that the batch size and the number of channels\n", "    # are both 1\n", "    X = X.reshape((1, 1) + X.shape)\n", "    Y = conv2d(X)\n", "    # Exclude the first two dimensions that do not interest us: examples and\n", "    # channels\n", "    return Y.reshape(Y.shape[2:])\n", "\n", "# Note that here 1 row or column is padded on either side, so a total of 2\n", "# rows or columns are added\n", "conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1)\n", "X = torch.rand(size=(8, 8))\n", "comp_conv2d(conv2d, X).shape" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "torch.Size([8, 8])" ] }, "metadata": { "tags": [] }, "execution_count": 1 } ] }, { "cell_type": "code", "metadata": { "id": "i7CpvCGUsKj2" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "Snr78sY-syD2" }, "source": [ "Although rarely done in the Deep Learning community, it is also possible to specify different amounts of padding for rows and columns. This is especially useful when the kernel itself has a different number of rows and columns (also rare in practice)." ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "47VM40gas_H0", "outputId": "7cc1b115-2ba9-4a75-af41-16a63d633781" }, "source": [ "# Here, we use a convolution kernel with a height of 5 and a width of 3. The\n", "# padding numbers on either side of the height and width are 2 and 1,\n", "# respectively\n", "conv2d = nn.Conv2d(1, 1, kernel_size=(5, 3), padding=(2, 1))\n", "comp_conv2d(conv2d, X).shape" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "torch.Size([8, 8])" ] }, "metadata": { "tags": [] }, "execution_count": 2 } ] },
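{ "cell_type": "markdown", "metadata": {}, "source": [ "As a side note: recent versions of PyTorch (1.9 and later) also accept the strings `'same'` and `'valid'` for the `padding` argument of `nn.Conv2d`. With `'same'`, PyTorch chooses the padding so that the output has the same height and width as the input; note that this only works with a stride of 1. The cell below is a small illustration that reuses `comp_conv2d` and `X` defined above." ] }, { "cell_type": "code", "metadata": {}, "source": [ "# 'same' lets PyTorch pick the padding so that the output keeps the input's\n", "# height and width (only supported when the stride is 1); requires PyTorch >= 1.9\n", "conv2d = nn.Conv2d(1, 1, kernel_size=5, padding='same')\n", "comp_conv2d(conv2d, X).shape" ], "execution_count": null, "outputs": [] },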
{ "cell_type": "markdown", "metadata": { "id": "UBnjVgFBtIGB" }, "source": [ "*Question 2: Let \"input\" be a matrix of shape $(m \\times m)$ and \"kernel\" be a convolutional kernel of shape $(5 \\times 5)$. What padding would you need to apply in order for the output matrix to also have shape $(m \\times m)$?*\n", "\n", "*Question 3: Applying too large a padding can be useless, even if we want a large output shape. Try to understand why by visualizing the following case: imagine you have an input matrix of shape $(4 \\times 4)$ and a convolutional kernel of shape $(3 \\times 3)$. What's the difference between applying a padding of 2 (i.e. adding two rows at the top and two rows at the bottom, for a total of four, and the same for the columns) and a padding of 3? Is it actually useful to go beyond 2?*" ] }, { "cell_type": "markdown", "metadata": { "id": "eVnmPhKluiu0" }, "source": [ "#Stride\n", "\n", "In the previous section, we saw how padding can solve the problem of the output tensor generally having a smaller dimension than the input tensor; without it, a series of convolutional filters may make the output tensor too small too quickly.\n", "\n", "But in certain circumstances we may actually have the opposite problem: we may want the output tensor to be smaller than it normally would be, e.g. when we have a large input matrix. Normally, as we saw before, the convolutional filter slides over the input matrix, moving right by one position at each step and then down by one position when the row is completed. At each position we apply the convolution operation and obtain one entry of the output tensor. To obtain a smaller output shape, we can let the convolutional filter **slide by more than one position at a time, both horizontally and vertically**. We refer to this sliding amount as the **stride**. By default, Deep Learning libraries use a stride of 1, as we have done so far, because it is the smallest possible movement of the filter between two positions.\n", "\n", "![](https://d2l.ai/_images/conv-stride.svg)\n", "\n", "In the previous figure we use a stride of 2 horizontally (sliding right by 2 positions) and 3 vertically (sliding down by 3 when the row is over). Notice that this is relatively unusual; in practice, strides also tend to be equal horizontally and vertically. In the figure, the white and blue areas are mapped to white and blue numbers in the output. Notice how the last column is always ignored, because the stride does not allow the filter to reach it. The same applies to the third row.\n", "\n", "You can see an example of how to change the stride in PyTorch below, and its effect on the output shape, by comparing it to the previous code cells.\n", "\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "wk9ymUkps_jC", "outputId": "d40e4170-d5f0-414b-c8e0-8f61a56e6774" }, "source": [ "conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1, stride=2)\n", "comp_conv2d(conv2d, X).shape" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "torch.Size([4, 4])" ] }, "metadata": { "tags": [] }, "execution_count": 3 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "E1V2caBJ0YRp", "outputId": "c1d8f1ae-d670-437d-c23f-107cee7049f8" }, "source": [ "conv2d = nn.Conv2d(1, 1, kernel_size=(3, 5), padding=(0, 1), stride=(3, 4))\n", "comp_conv2d(conv2d, X).shape" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "torch.Size([2, 2])" ] }, "metadata": { "tags": [] }, "execution_count": 4 } ] },
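{ "cell_type": "markdown", "metadata": {}, "source": [ "Putting padding and stride together, we can now answer the question from the introduction. Along each dimension, an input of size $n$, a kernel of size $k$, a padding of $p$ (added on each side, as in PyTorch) and a stride of $s$ produce an output of size $\\lfloor (n + 2p - k) / s \\rfloor + 1$ (assuming no dilation). The small helper below is not part of PyTorch; it is just a convenience, with an illustrative name, for double-checking the shapes of the examples above." ] }, { "cell_type": "code", "metadata": {}, "source": [ "import math\n", "\n", "def conv_output_size(n, k, p=0, s=1):\n", "    \"\"\"Output size along one dimension for input size n, kernel size k,\n", "    padding p (added on each side, as in nn.Conv2d) and stride s.\"\"\"\n", "    return math.floor((n + 2 * p - k) / s) + 1\n", "\n", "# Check against the strided examples above, where the input X is (8 x 8):\n", "print(conv_output_size(8, k=3, p=1, s=2))  # first example: both dimensions\n", "print(conv_output_size(8, k=3, p=0, s=3),  # second example: height\n", "      conv_output_size(8, k=5, p=1, s=4))  # second example: width" ], "execution_count": null, "outputs": [] },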
{ "cell_type": "markdown", "metadata": { "id": "HwHqpaNO0qBb" }, "source": [ "*Question 4: compute the output of the convolution operation of the figure below (the same example used previously in this tutorial), but using a stride of $(2, 2)$ first, then $(2, 3)$ and finally $(3, 3)$. Given the original output tensor, notice how you could easily compute the result by just selecting certain parts of the matrix.*\n", "\n", "![](https://d2l.ai/_images/conv-pad.svg)\n", "\n", "*Question 5: consider the same figure. Using a padding of 1, the output shape is doubled, going from $(2 \\times 2)$ to $(4 \\times 4)$. At the same time, in the previous question you should have noticed that applying a stride of $(2, 2)$ halves the output shape, bringing it back to $(2 \\times 2)$. Still, this output matrix is different from the one obtained by simply applying the kernel with no padding and a stride of 1. What are the differences?*" ] }, { "cell_type": "code", "metadata": { "id": "BH3xnbwp0bkT" }, "source": [ "" ], "execution_count": null, "outputs": [] } ] }