{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    },
    "colab": {
      "name": "notebook1_pytorch_demo.ipynb",
      "provenance": [],
      "collapsed_sections": []
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Igb2TlfTOBn3"
      },
      "source": [
        "# Implementing and training a one hidden layer fully-connected network to classify MNIST digits using PyTorch."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2kiLr6UMOBn6"
      },
      "source": [
        "This notebook illustrates how we would iterate through the following workflow in PyTorch:\n",
        "* Defining a model\n",
        "* Fetching batches of training data\n",
        "* Obtaining model predictions\n",
        "* Computing the loss\n",
        "* Computing gradient of the loss wrt model parameters\n",
        "* Updating the model parameters using some gradient-based optimization method.\n",
        "\n",
        "Acknowledgments: based on official PyTorch tutorials."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "NQoMwXaDOBn8"
      },
      "source": [
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "import torch\n",
        "import torch.nn as nn\n",
        "import torch.nn.functional as F\n",
        "import torch.optim as optim\n",
        "from torch.utils.data import Dataset, DataLoader\n",
        "from torchvision import datasets, transforms\n",
        "\n",
        "# You should set a random seed to ensure that your results are reproducible.\n",
        "torch.manual_seed(0)\n",
        "use_cuda = torch.cuda.is_available()\n",
        "device = torch.device('cuda' if use_cuda else 'cpu')\n",
        "    \n",
        "print(\"Using GPU: {}\".format(use_cuda))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SK29LbIJOBoD"
      },
      "source": [
        "### Defining a model\n",
        "\n",
        "It's usually a good idea to define your model as classes which inherit from `nn.Module`. The parameters of any submodules of the `torch.nn` module that you declare attributes of the class (e.g. `nn.Linear` below) will automatically be registered as parameters of your model, which will prove to be convenient when constructing an optimizer, as we shall see below."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "I-HrY-lyOBoF"
      },
      "source": [
        "class OneHiddenLayerMNISTClassifier(nn.Module):\n",
        "    # Define entities containing model weights in the constructor.\n",
        "    def __init__(self, n_hidden):\n",
        "        super().__init__()\n",
        "        self.linear1 = nn.Linear(\n",
        "            in_features=784, out_features=n_hidden, bias=True\n",
        "        )\n",
        "        self.linear2 = nn.Linear(\n",
        "            in_features=n_hidden, out_features=10, bias=True\n",
        "        )\n",
        "\n",
        "    # Then, all you need to do is implement a `forward` method to define the\n",
        "    # computation that takes place on the forward pass. A corresponding\n",
        "    # `backward` method, which computes gradients, is automatically defined!\n",
        "    def forward(self, inputs):\n",
        "        h = self.linear1(inputs.view(-1, 784))\n",
        "        h = F.relu(h)\n",
        "        h = self.linear2(h)\n",
        "        return F.log_softmax(h, dim=1)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "361GWOfQGCaB"
      },
      "source": [
        "# (B, 1, 28, 28) -> (B, 784)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RSNXYF4uOBoL"
      },
      "source": [
        "We'll also define some utility functions that perform a single iteration of training and evalution:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "2RrgdRSWOBoM"
      },
      "source": [
        "def train(model, train_loader, optimizer, epoch, log_interval=100):\n",
        "    \"\"\"\n",
        "    A utility function that performs a basic training loop.\n",
        "\n",
        "    For each batch in the training set, fetched using `train_loader`:\n",
        "        - Zeroes the gradient used by `optimizer`\n",
        "        - Performs forward pass through `model` on the given batch\n",
        "        - Computes loss on batch\n",
        "        - Performs backward pass\n",
        "        - `optimizer` updates model parameters using computed gradient\n",
        "\n",
        "    Prints the training loss on the current batch every `log_interval` batches.\n",
        "    \"\"\"\n",
        "    for batch_idx, (inputs, targets) in enumerate(train_loader):\n",
        "        # We need to send our batch to the device we are using. If this is not\n",
        "        # it will default to using the CPU.\n",
        "        inputs = inputs.to(device)\n",
        "        targets = targets.to(device)\n",
        "        \n",
        "        # Zeroes the gradient used by `optimizer`; NOTE: if this is not done,\n",
        "        # then gradients will be accumulated across batches!\n",
        "        optimizer.zero_grad()\n",
        "\n",
        "        # Performs forward pass through `model` on the given batch; equivalent\n",
        "        # to `model.forward(inputs)`. Any information needed to compute\n",
        "        # gradients is automatically thanks to autograd running under the hood.\n",
        "        outputs = model(inputs)\n",
        "\n",
        "        # Computes loss on batch; `F.nll_loss` computes the mean negative log-\n",
        "        # likelihood on the batch.\n",
        "        loss = F.nll_loss(outputs, targets)\n",
        "\n",
        "        # Performs backward pass; steps backward through the computation graph,\n",
        "        # computing the gradient of the loss wrt model parameters.\n",
        "        loss.backward()\n",
        "\n",
        "        # `optimizer` updates model parameters using computed gradient.\n",
        "        optimizer.step()\n",
        "\n",
        "        # Prints the training loss on the current batch every `log_interval`\n",
        "        # batches.\n",
        "        if batch_idx % log_interval == 0:\n",
        "            print(\n",
        "                \"Train Epoch: {:02d} -- Batch: {:03d} -- Loss: {:.4f}\".format(\n",
        "                    epoch,\n",
        "                    batch_idx,\n",
        "                    # Calling `loss.item()` returns the scalar loss as a Python\n",
        "                    # number.\n",
        "                    loss.item(),\n",
        "                )\n",
        "            )\n",
        "\n",
        "\n",
        "def test(model, test_loader):\n",
        "    \"\"\"\n",
        "    A utility function to compute the loss and accuracy on a test set by\n",
        "    iterating through the test set using the provided `test_loader` and\n",
        "    accumulating the loss and accuracy on each batch.\n",
        "    \"\"\"\n",
        "    test_loss = 0.0\n",
        "    correct = 0\n",
        "\n",
        "    # You should use the `torch.no_grad()` context when you want to perform a\n",
        "    # forward pass but do not need gradients. This effectively disables\n",
        "    # autograd and results in fewer resources being used to perform the forward\n",
        "    # pass (since information needed to compute gradients is not logged).\n",
        "    with torch.no_grad():\n",
        "        for inputs, targets in test_loader:\n",
        "            inputs = inputs.to(device)\n",
        "            targets = targets.to(device)\n",
        "            outputs = model(inputs)\n",
        "            # We use `reduction=\"sum\"` to aggregate losses across batches using\n",
        "            # summation instead of taking the mean - we will take the mean at\n",
        "            # the end once we have accumulated all the losses.\n",
        "            test_loss += F.nll_loss(outputs, targets, reduction=\"sum\").item()\n",
        "            pred = outputs.argmax(dim=1, keepdim=True)\n",
        "            correct += pred.eq(targets.view_as(pred)).sum().item()\n",
        "\n",
        "    test_loss /= len(test_loader.dataset)\n",
        "\n",
        "    print(\n",
        "        \"\\nTest set: Average loss: {:.4f}, Accuracy: {:.4f}\\n\".format(\n",
        "            test_loss, correct / len(test_loader.dataset)\n",
        "        )\n",
        "    )"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "D3RAlOr4OBoR"
      },
      "source": [
        "### Fetching data\n",
        "\n",
        "* https://pytorch.org/docs/stable/data.html\n",
        "* https://pytorch.org/tutorials/beginner/data_loading_tutorial.html"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "1oyUl62SOBoT"
      },
      "source": [
        "transform = transforms.Compose(\n",
        "    [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]\n",
        ")\n",
        "\n",
        "train_dat = datasets.MNIST(\n",
        "    \"data/\", train=True, download=True, transform=transform\n",
        ")\n",
        "test_dat = datasets.MNIST(\"data/\", train=False, transform=transform)\n",
        "\n",
        "sample_image, sample_target = train_dat[0]\n",
        "\n",
        "print(sample_target)\n",
        "plt.imshow(sample_image.squeeze(), cmap='gray')\n",
        "\n",
        "print(sample_image.shape)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "rFFQ-iUeOBoX"
      },
      "source": [
        "# Create dataloaders\n",
        "train_loader = DataLoader(train_dat, batch_size=64, shuffle=True)\n",
        "test_loader = DataLoader(test_dat, batch_size=1024, shuffle=False)\n",
        "\n",
        "it = iter(train_loader)\n",
        "sample_inputs, sample_targets = next(it)\n",
        "\n",
        "print(sample_inputs.shape, sample_targets.shape, '\\n')\n",
        "print(sample_inputs)\n",
        "print(sample_targets)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UQSpdPWsOBoa"
      },
      "source": [
        "### Running the training loop"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Kz-c759yOBoc"
      },
      "source": [
        "# Create instance of model; we also need to call `.to(device)` to\n",
        "# indicate the device we would like to use. This defaults to CPU if\n",
        "# not specified.\n",
        "model = OneHiddenLayerMNISTClassifier(n_hidden=32).to(device)\n",
        "\n",
        "# Create instance of optimizer\n",
        "optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)\n",
        "\n",
        "# Train-test loop\n",
        "for epoch in range(3):\n",
        "    train(model, train_loader, optimizer, epoch)\n",
        "    test(model, test_loader)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CRKB7H3LOBoh"
      },
      "source": [
        "### Saving and loading model weights"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "o85pY_3ZOBoi"
      },
      "source": [
        "# Saving / restoring model weights\n",
        "torch.save(model.state_dict(), \"mnist_fc_model.pt\")\n",
        "model.load_state_dict(torch.load(\"mnist_fc_model.pt\"))"
      ],
      "execution_count": null,
      "outputs": []
    }
  ]
}