# Implementing a Neural Network

In this exercise we will develop a neural network with fully-connected layers to perform classification, and test it out on the CIFAR-10 dataset.

We will use the class TwoLayerNet in the file cs231n/classifiers/neural_net.py to represent instances of our network. The network parameters are stored in the instance variable self.params where keys are string parameter names and values are numpy arrays. Below, we initialize toy data and a toy model that we will use to develop your implementation.

# Forward pass: compute scores

Open the file cs231n/classifiers/neural_net.py and look at the method TwoLayerNet.loss. This function is very similar to the loss functions you have written for the SVM and Softmax exercises: It takes the data and weights and computes the class scores, the loss, and the gradients on the parameters.

Implement the first part of the forward pass which uses the weights and biases to compute the scores for all inputs.

Your scores:
[[-0.81233741 -1.27654624 -0.70335995]
[-0.17129677 -1.18803311 -0.47310444]
[-0.51590475 -1.01354314 -0.8504215 ]
[-0.15419291 -0.48629638 -0.52901952]
[-0.00618733 -0.12435261 -0.15226949]]

correct scores:
[[-0.81233741 -1.27654624 -0.70335995]
[-0.17129677 -1.18803311 -0.47310444]
[-0.51590475 -1.01354314 -0.8504215 ]
[-0.15419291 -0.48629638 -0.52901952]
[-0.00618733 -0.12435261 -0.15226949]]

Difference between your scores and correct scores:
3.68027204961e-08


# Forward pass: compute loss

In the same function, implement the second part that computes the data and regularizaion loss.

Difference between your loss and correct loss:
1.79412040779e-13


# Backward pass

Implement the rest of the function. This will compute the gradient of the loss with respect to the variables W1, b1, W2, and b2. Now that you (hopefully!) have a correctly implemented forward pass, you can debug your backward pass using a numeric gradient check:

W1 max relative error: 3.669858e-09
W2 max relative error: 3.440708e-09
b2 max relative error: 3.865028e-11
b1 max relative error: 2.738422e-09


# Train the network

To train the network we will use stochastic gradient descent (SGD), similar to the SVM and Softmax classifiers. Look at the function TwoLayerNet.train and fill in the missing sections to implement the training procedure. This should be very similar to the training procedure you used for the SVM and Softmax classifiers. You will also have to implement TwoLayerNet.predict, as the training process periodically performs prediction to keep track of accuracy over time while the network trains.

Once you have implemented the method, run the code below to train a two-layer network on toy data. You should achieve a training loss less than 0.2.

Final training loss:  0.0171496079387


Now that you have implemented a two-layer network that passes gradient checks and works on toy data, it’s time to load up our favorite CIFAR-10 data so we can use it to train a classifier on a real dataset.

Train data shape:  (49000L, 3072L)
Train labels shape:  (49000L,)
Validation data shape:  (1000L, 3072L)
Validation labels shape:  (1000L,)
Test data shape:  (1000L, 3072L)
Test labels shape:  (1000L,)


# Train a network

To train our network we will use SGD with momentum. In addition, we will adjust the learning rate with an exponential learning rate schedule as optimization proceeds; after each epoch, we will reduce the learning rate by multiplying it by a decay rate.

iteration 0 / 1000: loss 2.302954
iteration 100 / 1000: loss 2.302550
iteration 200 / 1000: loss 2.297648
iteration 300 / 1000: loss 2.259602
iteration 400 / 1000: loss 2.204170
iteration 500 / 1000: loss 2.118565
iteration 600 / 1000: loss 2.051535
iteration 700 / 1000: loss 1.988466
iteration 800 / 1000: loss 2.006591
iteration 900 / 1000: loss 1.951473
Validation accuracy:  0.287


# Debug the training

With the default parameters we provided above, you should get a validation accuracy of about 0.29 on the validation set. This isn’t very good.

One strategy for getting insight into what’s wrong is to plot the loss function and the accuracies on the training and validation sets during optimization.

Another strategy is to visualize the weights that were learned in the first layer of the network. In most neural networks trained on visual data, the first layer weights typically show some visible structure when visualized.

What’s wrong?. Looking at the visualizations above, we see that the loss is decreasing more or less linearly, which seems to suggest that the learning rate may be too low. Moreover, there is no gap between the training and validation accuracy, suggesting that the model we used has low capacity, and that we should increase its size. On the other hand, with a very large model we would expect to see more overfitting, which would manifest itself as a very large gap between the training and validation accuracy.

Tuning. Tuning the hyperparameters and developing intuition for how they affect the final performance is a large part of using Neural Networks, so we want you to get a lot of practice. Below, you should experiment with different values of the various hyperparameters, including hidden layer size, learning rate, numer of training epochs, and regularization strength. You might also consider tuning the learning rate decay, but you should be able to get good performance using the default value.

Approximate results. You should be aim to achieve a classification accuracy of greater than 48% on the validation set. Our best network gets over 52% on the validation set.

Experiment: You goal in this exercise is to get as good of a result on CIFAR-10 as you can, with a fully-connected Neural Network. For every 1% above 52% on the Test set we will award you with one extra bonus point. Feel free implement your own techniques (e.g. PCA to reduce dimensionality, or adding dropout, or adding features to the solver, etc.).

1 / 144


cs231n\classifiers\neural_net.py:104: RuntimeWarning: overflow encountered in exp
exp_scores = np.exp(scores)
cs231n\classifiers\neural_net.py:105: RuntimeWarning: invalid value encountered in divide
a2 = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
cs231n\classifiers\neural_net.py:107: RuntimeWarning: divide by zero encountered in log
correct_log_probs = -np.log(a2[range(N), y])
cs231n\classifiers\neural_net.py:81: RuntimeWarning: invalid value encountered in maximum
a1 = np.maximum(0, z1)
cs231n\classifiers\neural_net.py:131: RuntimeWarning: invalid value encountered in less_equal
dhidden[z1 <= 0] = 0
cs231n\classifiers\neural_net.py:247: RuntimeWarning: invalid value encountered in maximum
a1 = np.maximum(0, z1)  # pass through ReLU activation function

2 / 144
3 / 144
4 / 144
5 / 144
6 / 144
7 / 144
8 / 144
9 / 144
10 / 144
11 / 144
12 / 144
13 / 144
14 / 144
15 / 144
16 / 144
17 / 144
18 / 144
19 / 144
20 / 144
21 / 144
22 / 144
23 / 144
24 / 144
25 / 144
26 / 144
27 / 144
28 / 144
29 / 144
30 / 144
31 / 144
32 / 144
33 / 144
34 / 144
35 / 144
36 / 144
37 / 144
38 / 144
39 / 144
40 / 144
41 / 144
42 / 144
43 / 144
44 / 144
45 / 144
46 / 144
47 / 144
48 / 144
49 / 144
50 / 144
51 / 144
52 / 144
53 / 144
54 / 144
55 / 144
56 / 144
57 / 144
58 / 144
59 / 144
60 / 144
61 / 144
62 / 144
63 / 144
64 / 144
65 / 144
66 / 144
67 / 144
68 / 144
69 / 144
70 / 144
71 / 144
72 / 144
73 / 144
74 / 144
75 / 144
76 / 144
77 / 144
78 / 144
79 / 144
80 / 144
81 / 144
82 / 144
83 / 144
84 / 144
85 / 144
86 / 144
87 / 144
88 / 144
89 / 144
90 / 144
91 / 144
92 / 144
93 / 144
94 / 144
95 / 144
96 / 144
97 / 144
98 / 144
99 / 144
100 / 144
101 / 144
102 / 144
103 / 144
104 / 144
105 / 144
106 / 144
107 / 144
108 / 144
109 / 144
110 / 144
111 / 144
112 / 144
113 / 144
114 / 144
115 / 144
116 / 144
117 / 144
118 / 144
119 / 144
120 / 144
121 / 144
122 / 144
123 / 144
124 / 144
125 / 144
126 / 144
127 / 144
128 / 144
129 / 144
130 / 144
131 / 144
132 / 144
133 / 144
134 / 144
135 / 144
136 / 144
137 / 144
138 / 144
139 / 144
140 / 144
141 / 144
142 / 144
143 / 144
144 / 144
best validation accuracy achieved during cross-validation: 0.540000


# Run on the test set

When you are done experimenting, you should evaluate your final trained network on the test set; you should get above 48%.

We will give you extra bonus point for every 1% of accuracy above 52%.

Test accuracy:  0.531