DNN Course

Configuring Google Compute Engine

We are going to use Google Compute Engine in our practicals.
You’ll need to create an account to access Google Compute Engine. If you already have a Gmail account, you are all set. If you don’t, you can either create one or register with your own email address.
First, be sure to redeem your Google Compute Engine coupon by following the instructions in the email you received (you’ll need your @cs.ox.ac.uk or @kellogg.ox.ac.uk email for this).
Once you have successfully logged in, head over to the top left corner and click on the navigation menu icon (the three stacked horizontal lines, similar to 三, which by the way means “three” in Chinese and is pronounced sān). Then go to Compute Engine –> VM instances. Here you’ll probably see a message saying Compute Engine is getting ready. Wait a few seconds (if the message doesn’t disappear within 1 minute, reload the page).
Go to 三 –> IAM & admin –> Quotas. Here you’ll need to request an increase in GPU quotas before you can launch VMs with GPUs. We’ll need to raise two quotas. You can use the Metric menu to filter the entries (you may find it useful to first click on None to unselect all entries and then use the top bar to find the ones whose labels match the quotas we want to raise):
- GPUs (all regions) quota: tick the square next to it.
- NVIDIA K80 GPUs (us-east1): tick the square next to it.
- Now that both quotas are selected, click on [+]EDIT QUOTAS (you’ll find it at the top of the page). A new panel will appear on the right hand side prompting you to set the new quota limits. Set both to 4. Under Request description, state that you need them for a Deep Learning course at Oxford University. Click Done. Then click Submit request.
- You’ll now receive an email notifying you that your request has been submitted. In about 5 minutes you should receive another email notifying you that the new quotas have been approved. You can proceed with the next step before this happens (but you’ll need the approval before starting the VM).
We are going to be working with Jupyter Notebooks and use TensorBoard to monitor the training of the neural networks. We’ll access these via our browser. In order to do this, we will need to add two firewall exceptions for two specific ports: 7001 and 7002. This is done as follows:
- Go to 三 –> VPC Network –> Firewall rules.
- Click on [+] Create firewall rule at the top of the page.
- Now edit the following fields in the panel as follows:
  - Name: (up to you)
  - Targets: All Instances in the network
  - Source IP ranges: 0.0.0.0/0
  - Protocols and ports: enable tcp with port 7001
  - Click Create. This will create the rule for Jupyter Notebook access.
  - Repeat the same process but use port 7002 instead. This will be used for TensorBoard access.
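
If you prefer the command line, the two rules above could also be created with the gcloud CLI. This is only a minimal sketch and not part of the course material: it assumes gcloud is installed and authenticated against your project, and the rule names allow-jupyter and allow-tensorboard are hypothetical. By default it runs in dry-run mode and only prints the commands.

```shell
# Sketch only: assumes the gcloud CLI is installed and authenticated.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# One rule per port. With no target tags given, a rule applies to
# all instances in the network.
create_rule() {
  # $1 = rule name (hypothetical), $2 = TCP port
  run gcloud compute firewall-rules create "$1" \
    --allow "tcp:$2" \
    --source-ranges 0.0.0.0/0
}

create_rule allow-jupyter 7001       # Jupyter Notebook
create_rule allow-tensorboard 7002   # TensorBoard
```

Check the printed commands first, then set DRY_RUN=0 to actually create the rules.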

Creating a VM

Now let’s create a VM. Click on the menu icon 三 and go to Compute Engine –> VM instances. Click on [+] CREATE INSTANCE and follow these steps:
- Name: (your choice)
- Region: us-east1 (South Carolina)
- Zone: us-east1-c
- Machine Type: click on customize
  - Cores: 8
  - Memory: 30 GB
  - GPUs: 1 of type K80
- Boot disk: click on change
  - On the panel that pops up go to Custom images and under Show images from choose COMP-GI23-M089-G98. If you don’t see this option, tell me.
  - If the above is successful, click on oxdnn-master and then click Select at the bottom of the page.
- Firewall: enable both HTTP and HTTPS traffic
- Click on Management, security, disks, networking, sole tenancy:
  - Go to the Networking sub-menu and, under Network interfaces click on the pencil icon to edit it.
  - Towards the bottom of the panel you’ll see the option IP forwarding. Set it to ON.
  - Click Done.
- Click on the blue button [Create]. Hooray!!
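
The same configuration can be written as a single gcloud command. This is a rough sketch under several assumptions: the VM name dnn-vm is hypothetical, IMAGE_PROJECT is a placeholder for the ID of the project that hosts the oxdnn-master image (not given here), and the memory setting is taken to be 30 GB. The http-server and https-server tags correspond to the two firewall checkboxes in the console. It prints the command by default instead of running it.

```shell
# Sketch only: assumes the gcloud CLI is installed and authenticated.
# DRY_RUN=1 (the default here) prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

VM_NAME="dnn-vm"                        # hypothetical name, pick your own
ZONE="us-east1-c"
IMAGE_PROJECT="your-image-project-id"   # placeholder: project hosting oxdnn-master

# GPU instances cannot live-migrate, hence --maintenance-policy TERMINATE.
run gcloud compute instances create "$VM_NAME" \
  --zone "$ZONE" \
  --custom-cpu 8 \
  --custom-memory 30GB \
  --accelerator type=nvidia-tesla-k80,count=1 \
  --maintenance-policy TERMINATE \
  --image oxdnn-master \
  --image-project "$IMAGE_PROJECT" \
  --tags http-server,https-server \
  --can-ip-forward
```

Check the printed command first, then set DRY_RUN=0 to create the VM.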

Connecting to your VM

You’ll find your VMs (whether they are active or not) in 三 –> Compute Engine –> VM Instances. The one you just created should already be running (this is indicated by a green tick next to the VM’s name). If it’s not running, you can start it by selecting it and then clicking on START above. Once it’s launched, click on SSH to connect to it. This will open a new window with a standard UNIX terminal. Click here for a quick overview of the basic UNIX commands.
If successful, you can proceed to the next section, in which you will be setting up your VM with all the necessary software for this course.

Setting up your machine

All the base software comes with the disk image oxdnn-master that you’ll be using to create your own VM instance. This image includes the NVIDIA driver for Tesla K80 GPUs, a basic Miniconda installation, htop and tmux, and comes with the MNIST and CIFAR-10 datasets pre-loaded. You’ll need to set up your Miniconda environment by executing the bash script in this zip file. The instructions to set up your environment are as follows:

First we need to get the setup scripts in the link above. The easiest way to get this file in your Google Cloud VM is by using the wget command as follows:

$ wget "http://jafermarq.com/media/practicals/setup.zip"

Then unzip the file by doing:

$ unzip setup.zip

You can now delete the zip file if you wish. To launch the setup process described in setup.sh, run the commands below. Note the . before the script path. Don’t miss it! At some point you will be prompted to enter a password. This password will be used to access the Jupyter Notebooks that we’ll be using for the practicals in this course. Choose any password you like but make sure to remember it. After this, the setup will continue and generate an SSL certificate, prompting you for some details (feel free to skip them all by pressing Enter at each prompt).

$ cd setup # this gets you in the directory
$ . ./setup.sh

(if you are curious to know exactly what the script is doing, you can print it on your terminal by typing cat setup.sh)
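
If you wonder what the leading dot changes: it makes the shell read the script in your current session instead of starting a child process, so any variables or settings the script defines stay available afterwards. Whether setup.sh relies on this is an assumption on my part, but it is the usual reason. The small demonstration below uses a throwaway script (demo_env.sh and DEMO_VAR are hypothetical names) to show the difference.

```shell
# Create a throwaway script that only sets a variable.
tmpdir=$(mktemp -d)
printf 'DEMO_VAR=set_by_script\n' > "$tmpdir/demo_env.sh"

unset DEMO_VAR

# Run it as a child process: the variable is set in the child
# and is lost as soon as the child exits.
sh "$tmpdir/demo_env.sh"
after_run="${DEMO_VAR:-unset}"

# Source it with a leading dot: the variable is set in the current shell.
. "$tmpdir/demo_env.sh"
after_source="${DEMO_VAR:-unset}"

echo "after running:  $after_run"
echo "after sourcing: $after_source"

# Clean up the throwaway directory.
rm -r "$tmpdir"
```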

Once the process is completed, a tmux session will be spawned. Tmux is a terminal multiplexer that lets you split a terminal window into multiple terminals without needing a new UI window. You can check all the basic commands on how to use it here. Now let’s start with practical one:

First you’ll need to activate the miniconda environment that the setup.sh script configured before. The name of the environment is Pytorch:

$ source activate Pytorch

Now let’s create a new directory for the practicals of day one and download the Jupyter Notebook for the practical. As you see below, we are using the wget command to download a file listed in the Practicals section below. To get the link to that file, right-click on Practical 1 and copy it (depending on your browser the option may be called copy link, copy link location or copy link address):

$ mkdir dayOne # this creates the directory
$ cd dayOne
$ wget "copy link to practical here"

Now that we have the Jupyter Notebook file, we just need to launch Jupyter Notebook. To do so type:

$ jupyter notebook

This will launch a no-browser instance of Jupyter Notebook running on port 7001. To connect to that port, go to your Google VM Instances panel and click on the IP under External IP. This will open a new tab in your browser. Edit the address by appending :7001 after the last number of the IP (if there’s a ‘/’, remove it). Then press Enter. The browser will warn you about an unsecured connection. Ignore this and click on add exception (again, the message you’ll see depends on your browser). You won’t be able to proceed if you are using Safari and you don’t have admin privileges on the machine you are using. Please use either Firefox or Chrome if you encounter this issue. After adding the security exception to connect to your VM, you’ll be asked to enter your Jupyter Notebook password. Now you should be able to see the Jupyter landing page and open the practical for day one.
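
The address edit described above (drop the trailing ‘/’, then append the port) can be sketched in the shell as follows. The IP below is a placeholder from the documentation address range, not a real VM, and the https scheme is an assumption based on the SSL certificate generated during setup.

```shell
# EXTERNAL_IP is a placeholder: replace it with your VM's external IP.
EXTERNAL_IP="203.0.113.10"

# Address as it might appear in the browser tab, with a trailing slash.
# The https scheme is an assumption (setup.sh generated an SSL certificate).
opened="https://${EXTERNAL_IP}/"

# Strip one trailing slash if present, then append the Jupyter port.
jupyter_url="${opened%/}:7001"
echo "$jupyter_url"
```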

Launching TensorBoard

Assuming you are in a tmux session running Jupyter Notebook, you’ll need to split the terminal into two. You can do this by pressing [ctrl]+[b] and then % for a horizontal split, which places the panes side by side (if you prefer, you can press " instead for a vertical split, which stacks them). Once you have your new terminal running, you’ll probably need to reactivate the Pytorch environment by typing source activate Pytorch. Then enter the command below to launch TensorBoard on port 7002:

$ tensorboard --logdir . --port 7002

The above command makes TensorBoard pick up all experiments stored in your current directory.

Create a Snapshot of your VM

In Practical 2c we’ll be doing multi-GPU training. Concretely, we’ll be using 4 NVIDIA K80 GPUs. The VM we’ve been using so far was configured to have a single K80, so we need to create a new VM with 4x K80 GPUs. You could follow exactly the same steps as you did yesterday to create your first VM. However, doing so means you’ll have to execute the script setup.sh again. We can avoid that by creating a snapshot of your current (single-GPU) VM and using it as the boot disk for the VM you are going to create. Below are the steps needed to create the snapshot:

Turn off your VM.
Once it’s off, go to Snapshots in the left side menu.
Click on [+] Create Snapshot:
- Name: (up to you)
- Source disk: choose your machine
- Click Create. This might take a few seconds.

Once the snapshot is ready, it’s time to create a VM with 4 GPUs. To do that, follow the steps in Creating a VM above, but select 4 GPUs instead of 1 and, for the boot disk, go to Snapshots on the panel that pops up and click on the snapshot you have generated. Complete the rest of the VM configuration as we did for the first VM (i.e. firewall traffic and IP forwarding).
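
For reference, here is a command-line sketch of the same snapshot workflow using the gcloud CLI. It is not the console procedure described above: instead of picking the snapshot directly as the boot disk, it first creates a disk from the snapshot and then boots the new VM from that disk. All names (dnn-vm, dnn-vm-snap, dnn-vm-4gpu-disk, dnn-vm-4gpu) are hypothetical, and the boot disk is assumed to share the VM’s name, which is the default. It prints the commands by default instead of running them.

```shell
# Sketch only: assumes the gcloud CLI is installed and authenticated.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

ZONE="us-east1-c"
VM_NAME="dnn-vm"              # hypothetical: your existing single-GPU VM
SNAPSHOT="dnn-vm-snap"        # hypothetical snapshot name
NEW_DISK="dnn-vm-4gpu-disk"   # hypothetical disk name
NEW_VM="dnn-vm-4gpu"          # hypothetical name for the 4-GPU VM

# 1. Stop the VM before taking the snapshot.
run gcloud compute instances stop "$VM_NAME" --zone "$ZONE"

# 2. Snapshot its boot disk (assumed to be named after the VM).
run gcloud compute disks snapshot "$VM_NAME" \
  --snapshot-names "$SNAPSHOT" --zone "$ZONE"

# 3. Create a new disk from the snapshot.
run gcloud compute disks create "$NEW_DISK" \
  --source-snapshot "$SNAPSHOT" --zone "$ZONE"

# 4. Create the 4-GPU VM that boots from that disk.
run gcloud compute instances create "$NEW_VM" \
  --zone "$ZONE" \
  --custom-cpu 8 --custom-memory 30GB \
  --accelerator type=nvidia-tesla-k80,count=4 \
  --maintenance-policy TERMINATE \
  --disk "name=$NEW_DISK,boot=yes" \
  --tags http-server,https-server \
  --can-ip-forward
```

Check the printed commands first, then set DRY_RUN=0 to run them.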

Once you complete your practicals, please remember to STOP your VMs. Otherwise you’ll be charged even though you are not using them.

Practicals

Day 1: Practical 1 - Solution
Day 2: Practical 2 - Solution
Day 3: Practical 3 - Solution
Day 4: Practical 4 - Solution
Day 5: Practical 5 - model (GAN) - model (PSNR) - Solution
Other images: Classroom - Kremlin