# Multiclass Support Vector Machine exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

In this exercise you will:

• implement a fully-vectorized loss function for the SVM
• implement the fully-vectorized expression for its analytic gradient
• use a validation set to tune the learning rate and regularization strength
• optimize the loss function with SGD
• visualize the final learned weights

Training data shape:  (50000L, 32L, 32L, 3L)
Training labels shape:  (50000L,)
Test data shape:  (10000L, 32L, 32L, 3L)
Test labels shape:  (10000L,)


Train data shape:  (49000L, 32L, 32L, 3L)
Train labels shape:  (49000L,)
Validation data shape:  (1000L, 32L, 32L, 3L)
Validation labels shape:  (1000L,)
Test data shape:  (1000L, 32L, 32L, 3L)
Test labels shape:  (1000L,)

Training data shape:  (49000L, 3072L)
Validation data shape:  (1000L, 3072L)
Test data shape:  (1000L, 3072L)
dev data shape:  (500L, 3072L)

[ 130.64189796  135.98173469  132.47391837  130.05569388  135.34804082
131.75402041  130.96055102  136.14328571  132.47636735  131.48467347]


(49000L, 3073L) (1000L, 3073L) (1000L, 3073L) (500L, 3073L)


## SVM Classifier

Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

As you can see, we have prefilled the function svm_loss_naive, which uses for loops to evaluate the multiclass SVM loss function.

loss: 8.831645


The grad returned from the function above is currently all zeros. Derive the gradient of the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code with the existing code in that function.
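As a rough illustration of interleaving the gradient with the loss loops, here is a minimal sketch. The name svm_loss_naive_sketch is hypothetical, and the margin of 1 and the penalty reg * sum(W * W) are assumptions; the assignment's own file may use a different regularization convention, so the values printed in this worksheet are not expected to match.

```python
import numpy as np

def svm_loss_naive_sketch(W, X, y, reg):
    """Multiclass SVM loss and gradient computed with explicit loops.

    W: (D, C) weights, X: (N, D) data, y: (N,) integer labels, reg: L2 strength.
    Assumes a margin of 1 and a penalty of reg * sum(W * W).
    """
    dW = np.zeros_like(W)
    num_train, num_classes = X.shape[0], W.shape[1]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_score + 1.0
            if margin > 0:
                loss += margin
                # Each violated margin pushes class j up and the true class down.
                dW[:, j] += X[i]
                dW[:, y[i]] -= X[i]
    # Average over the examples, then add the regularization term.
    loss = loss / num_train + reg * np.sum(W * W)
    dW = dW / num_train + 2.0 * reg * W
    return loss, dW
```

With all-zero weights every incorrect class has a margin of exactly 1, so the unregularized loss equals the number of classes minus one, which makes a convenient sanity check.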

To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numeric estimate to the gradient that you computed. We have provided code that does this for you:

numerical: -13.865929 analytic: -13.865929, relative error: 1.283977e-12
numerical: 7.842142 analytic: 7.735021, relative error: 6.876784e-03
numerical: 3.464393 analytic: 3.464393, relative error: 9.040092e-11
numerical: -23.034911 analytic: -23.034911, relative error: 6.876266e-12
numerical: -0.185311 analytic: -0.185311, relative error: 2.538774e-10
numerical: 25.825504 analytic: 25.825504, relative error: 1.336035e-11
numerical: 4.457836 analytic: 4.457836, relative error: 1.015819e-10
numerical: 3.184691 analytic: 3.184691, relative error: 8.849109e-11
numerical: 10.428446 analytic: 10.374317, relative error: 2.601982e-03
numerical: 12.479957 analytic: 12.479957, relative error: 6.825191e-12
numerical: 12.237949 analytic: 12.326308, relative error: 3.597051e-03
numerical: 4.377103 analytic: 4.377103, relative error: 3.904758e-11
numerical: -1.951930 analytic: -1.951930, relative error: 1.432276e-10
numerical: 33.752503 analytic: 33.752503, relative error: 4.254520e-12
numerical: 11.367149 analytic: 11.367149, relative error: 1.682727e-11
numerical: 16.461879 analytic: 16.461879, relative error: 4.766805e-12
numerical: 3.814562 analytic: 3.814562, relative error: 1.087469e-10
numerical: 13.931226 analytic: 13.931226, relative error: 9.578349e-12
numerical: -27.291095 analytic: -27.395406, relative error: 1.907445e-03
numerical: -7.610407 analytic: -7.610407, relative error: 1.015282e-12
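The kind of check that produces a log like the one above can be sketched as follows. This is not the course's provided helper: the name grad_check_sparse_sketch, the step h = 1e-5, and the number of sampled coordinates are assumptions. The relative error formula |numeric - analytic| / (|numeric| + |analytic|) is consistent with the values printed above.

```python
import numpy as np

def grad_check_sparse_sketch(f, x, analytic_grad, num_checks=5, h=1e-5, seed=0):
    """Compare analytic and centered-difference gradients at random coordinates.

    f maps an array shaped like x to a scalar loss. Returns the relative errors.
    """
    rng = np.random.default_rng(seed)
    rel_errors = []
    for _ in range(num_checks):
        ix = tuple(rng.integers(0, dim) for dim in x.shape)
        old = x[ix]
        x[ix] = old + h
        f_plus = f(x)
        x[ix] = old - h
        f_minus = f(x)
        x[ix] = old  # restore the original value
        numeric = (f_plus - f_minus) / (2.0 * h)
        analytic = analytic_grad[ix]
        denom = abs(numeric) + abs(analytic)
        rel = abs(numeric - analytic) / denom if denom > 0 else 0.0
        print("numerical: %f analytic: %f, relative error: %e" % (numeric, analytic, rel))
        rel_errors.append(rel)
    return rel_errors

# Usage on a smooth function with a known gradient: f(x) = sum(x**2), grad = 2x.
x = np.arange(6, dtype=float).reshape(2, 3) + 1.0
errors = grad_check_sparse_sketch(lambda v: np.sum(v ** 2), x, 2.0 * x)
```

Checking a handful of random coordinates instead of every entry keeps the check cheap when the weight matrix is large.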


### Inline Question 1:

It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? Hint: the SVM loss function is not, strictly speaking, differentiable.

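Your Answer: The discrepancy is most likely caused by the kinks in the SVM loss: each term max(0, x) is not differentiable at x = 0. If a margin lies within the finite-difference step h of a kink, the numerical estimate mixes the slopes on both sides, while the analytic gradient picks one of them. This is usually not a reason for concern as long as such mismatches are rare. A simple one-dimensional example is f(x) = max(0, x) checked at a point with |x| < h.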
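A one-dimensional case where a gradient check can disagree is the hinge function max(0, x) evaluated closer to the kink than the finite-difference step. The sketch below is illustrative only; the helper names and the step h = 1e-5 are assumptions, not part of the assignment code.

```python
def hinge(x):
    """One-dimensional hinge function max(0, x), with a kink at x = 0."""
    return max(0.0, x)

def hinge_grad(x):
    """Analytic derivative away from the kink (0 is chosen for x <= 0)."""
    return 1.0 if x > 0 else 0.0

def centered_difference(f, x, h):
    """Centered finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

h = 1e-5
x_near_kink = -1e-6  # closer to the kink than the step size h
x_far = -1.0         # well inside the flat region

# Near the kink the two evaluation points straddle x = 0, so the estimate
# falls strictly between the two one-sided slopes 0 and 1.
numeric_near = centered_difference(hinge, x_near_kink, h)
# Far from the kink both evaluation points lie on the flat side.
numeric_far = centered_difference(hinge, x_far, h)

print("near kink: numeric %f, analytic %f" % (numeric_near, hinge_grad(x_near_kink)))
print("far away:  numeric %f, analytic %f" % (numeric_far, hinge_grad(x_far)))
```

Shrinking h makes it less likely that a margin sits this close to a kink, which is one reason occasional mismatches tend to be rare.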

4915.822409730994

Naive loss: 8.831645e+00 computed in 0.071000s
Vectorized loss: 8.831645e+00 computed in 0.000000s
difference: 0.000000

Naive loss and gradient: computed in 0.084000s
Vectorized loss and gradient: computed in 0.005000s
difference: 0.000000
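The timings above compare a loop-based and a vectorized implementation. A minimal sketch of what a vectorized loss and gradient could look like is given below. The name svm_loss_vectorized_sketch is hypothetical, and the margin of 1 and the penalty reg * sum(W * W) are assumptions that may differ from the assignment's own code.

```python
import numpy as np

def svm_loss_vectorized_sketch(W, X, y, reg):
    """Multiclass SVM loss and gradient without explicit Python loops.

    W: (D, C) weights, X: (N, D) data, y: (N,) integer labels, reg: L2 strength.
    """
    num_train = X.shape[0]
    rows = np.arange(num_train)

    scores = X.dot(W)                                   # (N, C)
    correct = scores[rows, y][:, np.newaxis]            # (N, 1)
    margins = np.maximum(0.0, scores - correct + 1.0)   # (N, C)
    margins[rows, y] = 0.0                              # true class contributes no loss
    loss = margins.sum() / num_train + reg * np.sum(W * W)

    # coeff[i, j] is 1 for every violated margin; the true-class entry gets
    # minus the number of violations in that row.
    coeff = (margins > 0).astype(float)
    coeff[rows, y] = -coeff.sum(axis=1)
    dW = X.T.dot(coeff) / num_train + 2.0 * reg * W
    return loss, dW
```

The single matrix product X.T.dot(coeff) replaces the per-example, per-class accumulation of the loop version, which is where most of the speedup would be expected to come from.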


We now have vectorized and efficient expressions for the loss and the gradient, and our gradient matches the numerical gradient. We are therefore ready to use SGD to minimize the loss.

iteration 0 / 1500: loss 791.772037
iteration 100 / 1500: loss 286.021346
iteration 200 / 1500: loss 107.673095
iteration 300 / 1500: loss 41.812791
iteration 400 / 1500: loss 18.665578
iteration 500 / 1500: loss 10.614984
iteration 600 / 1500: loss 6.664814
iteration 700 / 1500: loss 6.509693
iteration 800 / 1500: loss 5.792204
iteration 900 / 1500: loss 4.986855
iteration 1000 / 1500: loss 5.914691
iteration 1100 / 1500: loss 5.058078
iteration 1200 / 1500: loss 5.491475
iteration 1300 / 1500: loss 5.609450
iteration 1400 / 1500: loss 5.376595
That took 5.454000s
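A training loop that produces a log like the one above can be sketched as follows. This is a toy illustration rather than the assignment's classifier: the function names, the default hyperparameters, the sampling with replacement, and the synthetic two-cluster data are all assumptions.

```python
import numpy as np

def hinge_loss_and_grad(W, X, y, reg):
    """Compact multiclass SVM loss and gradient (margin 1, penalty reg * sum(W * W))."""
    n = X.shape[0]
    rows = np.arange(n)
    scores = X.dot(W)
    margins = np.maximum(0.0, scores - scores[rows, y][:, None] + 1.0)
    margins[rows, y] = 0.0
    coeff = (margins > 0).astype(float)
    coeff[rows, y] = -coeff.sum(axis=1)
    loss = margins.sum() / n + reg * np.sum(W * W)
    return loss, X.T.dot(coeff) / n + 2.0 * reg * W

def train_sgd_sketch(X, y, num_classes, learning_rate=1e-2, reg=1e-3,
                     num_iters=300, batch_size=32, seed=0, verbose=False):
    """Minibatch SGD: sample a batch, evaluate loss and gradient, take a step."""
    rng = np.random.default_rng(seed)
    W = 0.001 * rng.standard_normal((X.shape[1], num_classes))
    loss_history = []
    for it in range(num_iters):
        idx = rng.choice(X.shape[0], batch_size, replace=True)
        loss, grad = hinge_loss_and_grad(W, X[idx], y[idx], reg)
        loss_history.append(loss)
        W -= learning_rate * grad
        if verbose and it % 100 == 0:
            print("iteration %d / %d: loss %f" % (it, num_iters, loss))
    return W, loss_history

# Toy data: two well-separated 2-D clusters plus a constant bias column.
rng = np.random.default_rng(42)
X0 = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(50, 2))
X1 = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(50, 2))
X_toy = np.hstack([np.vstack([X0, X1]), np.ones((100, 1))])
y_toy = np.array([0] * 50 + [1] * 50)

W_toy, history = train_sgd_sketch(X_toy, y_toy, num_classes=2)
train_acc = np.mean(np.argmax(X_toy.dot(W_toy), axis=1) == y_toy)
```

Because each step only sees a random minibatch, the recorded loss is noisy and need not decrease monotonically, which matches the small fluctuations in the log above.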


training accuracy: 0.364980
validation accuracy: 0.378000

lr 1.000000e-08 reg 1.000000e+04 train accuracy: 0.221898 val accuracy: 0.247000
lr 1.000000e-08 reg 2.000000e+04 train accuracy: 0.233653 val accuracy: 0.258000
lr 1.000000e-08 reg 3.000000e+04 train accuracy: 0.234694 val accuracy: 0.225000
lr 1.000000e-08 reg 4.000000e+04 train accuracy: 0.255959 val accuracy: 0.249000
lr 1.000000e-08 reg 5.000000e+04 train accuracy: 0.259755 val accuracy: 0.273000
lr 1.000000e-08 reg 6.000000e+04 train accuracy: 0.267408 val accuracy: 0.269000
lr 1.000000e-08 reg 7.000000e+04 train accuracy: 0.269102 val accuracy: 0.287000
lr 1.000000e-08 reg 8.000000e+04 train accuracy: 0.277102 val accuracy: 0.285000
lr 1.000000e-08 reg 1.000000e+05 train accuracy: 0.295306 val accuracy: 0.301000
lr 1.000000e-07 reg 1.000000e+04 train accuracy: 0.369388 val accuracy: 0.374000
lr 1.000000e-07 reg 2.000000e+04 train accuracy: 0.380265 val accuracy: 0.390000
lr 1.000000e-07 reg 3.000000e+04 train accuracy: 0.375490 val accuracy: 0.378000
lr 1.000000e-07 reg 4.000000e+04 train accuracy: 0.375633 val accuracy: 0.385000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.369694 val accuracy: 0.375000
lr 1.000000e-07 reg 6.000000e+04 train accuracy: 0.372469 val accuracy: 0.383000
lr 1.000000e-07 reg 7.000000e+04 train accuracy: 0.356000 val accuracy: 0.370000
lr 1.000000e-07 reg 8.000000e+04 train accuracy: 0.352816 val accuracy: 0.355000
lr 1.000000e-07 reg 1.000000e+05 train accuracy: 0.356796 val accuracy: 0.377000
lr 2.000000e-07 reg 1.000000e+04 train accuracy: 0.393510 val accuracy: 0.395000
lr 2.000000e-07 reg 2.000000e+04 train accuracy: 0.377020 val accuracy: 0.382000
lr 2.000000e-07 reg 3.000000e+04 train accuracy: 0.363857 val accuracy: 0.373000
lr 2.000000e-07 reg 4.000000e+04 train accuracy: 0.368714 val accuracy: 0.372000
lr 2.000000e-07 reg 5.000000e+04 train accuracy: 0.361531 val accuracy: 0.364000
lr 2.000000e-07 reg 6.000000e+04 train accuracy: 0.354714 val accuracy: 0.368000
lr 2.000000e-07 reg 7.000000e+04 train accuracy: 0.348306 val accuracy: 0.365000
lr 2.000000e-07 reg 8.000000e+04 train accuracy: 0.358082 val accuracy: 0.378000
lr 2.000000e-07 reg 1.000000e+05 train accuracy: 0.347898 val accuracy: 0.358000
best validation accuracy achieved during cross-validation: 0.395000
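The sweep above tries each pair of learning rate and regularization strength and keeps the model with the best validation accuracy. A minimal sketch of that procedure on synthetic data follows. The helper names, the full-batch trainer used as a stand-in, the toy clusters, and the grid values are assumptions chosen for illustration.

```python
import itertools
import numpy as np

def train_linear_svm(X, y, num_classes, learning_rate, reg, num_iters=200):
    """Full-batch gradient descent on the multiclass hinge loss (toy stand-in)."""
    n = X.shape[0]
    rows = np.arange(n)
    W = np.zeros((X.shape[1], num_classes))
    for _ in range(num_iters):
        scores = X.dot(W)
        margins = np.maximum(0.0, scores - scores[rows, y][:, None] + 1.0)
        margins[rows, y] = 0.0
        coeff = (margins > 0).astype(float)
        coeff[rows, y] = -coeff.sum(axis=1)
        W -= learning_rate * (X.T.dot(coeff) / n + 2.0 * reg * W)
    return W

def accuracy(W, X, y):
    """Fraction of rows whose highest-scoring class equals the label."""
    return float(np.mean(np.argmax(X.dot(W), axis=1) == y))

def grid_search_sketch(X_train, y_train, X_val, y_val, num_classes,
                       learning_rates, regularization_strengths):
    """Train one model per (lr, reg) pair; select by validation accuracy only."""
    results = {}
    best_val, best_W = -1.0, None
    for lr, reg in itertools.product(learning_rates, regularization_strengths):
        W = train_linear_svm(X_train, y_train, num_classes, lr, reg)
        train_acc = accuracy(W, X_train, y_train)
        val_acc = accuracy(W, X_val, y_val)
        results[(lr, reg)] = (train_acc, val_acc)
        if val_acc > best_val:
            best_val, best_W = val_acc, W
    return results, best_val, best_W

# Toy data: three 2-D clusters plus a bias column, split into train and validation.
rng = np.random.default_rng(0)
centers = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
y_all = rng.integers(0, 3, size=300)
X_all = centers[y_all] + 0.5 * rng.standard_normal((300, 2))
X_all = np.hstack([X_all, np.ones((300, 1))])
X_tr, y_tr, X_va, y_va = X_all[:200], y_all[:200], X_all[200:], y_all[200:]

results, best_val, best_W = grid_search_sketch(
    X_tr, y_tr, X_va, y_va, 3,
    learning_rates=[1e-3, 1e-1], regularization_strengths=[1e-3, 1e-1])
for (lr, reg), (tr, va) in sorted(results.items()):
    print("lr %e reg %e train accuracy: %f val accuracy: %f" % (lr, reg, tr, va))
print("best validation accuracy achieved during cross-validation: %f" % best_val)
```

The test set plays no role in the selection; it would only be evaluated once, with the chosen model, as in the final accuracy reported below.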


linear SVM on raw pixels final test set accuracy: 0.383000


array([0, 1, 2])


### Inline question 2:

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way that they do.
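One way to prepare learned weights for visualization is to drop the bias row, reshape each class column back into image form, and rescale to the 0 to 255 range. The sketch below assumes a weight matrix of shape (3073, C) with the bias in the last row, which is consistent with the data shapes printed earlier. The function name, the choice of 10 classes, and the random demo weights are hypothetical.

```python
import numpy as np

def weights_to_images_sketch(W, image_shape=(32, 32, 3)):
    """Turn a (D + 1, C) weight matrix into C uint8 images of shape image_shape.

    Assumes the last row of W holds the bias and that the remaining rows follow
    the same row-major flattening used to turn images into vectors.
    """
    num_classes = W.shape[1]
    templates = W[:-1, :].reshape(image_shape + (num_classes,))
    w_min, w_max = templates.min(), templates.max()
    images = []
    for c in range(num_classes):
        # Use one global range so relative magnitudes stay comparable across classes.
        scaled = 255.0 * (templates[..., c] - w_min) / (w_max - w_min)
        images.append(scaled.astype("uint8"))
    return images

# Demo with random weights standing in for a trained model (assumed 10 classes).
W_demo = np.random.default_rng(0).standard_normal((3073, 10))
imgs = weights_to_images_sketch(W_demo)
```

Each resulting array could then be passed to an image display routine, one panel per class.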