# Softmax exercise

Complete this worksheet and hand it in (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details, see the assignments page on the course website.

This exercise is analogous to the SVM exercise. You will:

- implement a fully-vectorized loss function for the Softmax classifier
- implement the fully-vectorized expression for its analytic gradient
- use a validation set to tune the learning rate and regularization strength
- optimize the loss function with SGD
- visualize the final learned weights
```
Train data shape:  (49000L, 3073L)
Train labels shape:  (49000L,)
Validation data shape:  (1000L, 3073L)
Validation labels shape:  (1000L,)
Test data shape:  (1000L, 3073L)
Test labels shape:  (1000L,)
dev data shape:  (500L, 3073L)
dev labels shape:  (500L,)
```
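The feature dimension of 3073 (rather than 3072) comes from the standard preprocessing: each 32×32×3 CIFAR-10 image is flattened into 3072 values, and a constant bias dimension is appended so the bias can be folded into the weight matrix. A minimal sketch of that bias trick (the array sizes here are illustrative, not the full 49000-image training set):

```python
import numpy as np

# Illustrative stand-in for flattened CIFAR-10 images (N x 3072);
# the real notebook uses 49000 training images.
X = np.random.randn(500, 3072)

# Bias trick: append a constant-one column so the bias folds into W,
# turning each image into a 3073-dimensional vector.
X = np.hstack([X, np.ones((X.shape[0], 1))])

print(X.shape)  # (500, 3073)
```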


## Softmax Classifier

Your code for this section will all be written inside `cs231n/classifiers/softmax.py`.

```
loss: 2.395985
sanity check: 2.302585
```
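One possible sketch of the `softmax_loss_naive` function you implement in `softmax.py`, using explicit loops. The score shift before exponentiation is the standard trick for numerical stability; the exact regularization convention (e.g. whether a 0.5 factor is used) should match your assignment's:

```python
import numpy as np

def softmax_loss_naive(W, X, y, reg):
    """Softmax loss and gradient with explicit loops.

    W: (D, C) weights; X: (N, D) data; y: (N,) integer labels;
    reg: regularization strength. Returns (loss, dW).
    """
    loss = 0.0
    dW = np.zeros_like(W)
    num_train = X.shape[0]
    for i in range(num_train):
        scores = X[i].dot(W)
        scores -= scores.max()  # shift scores for numerical stability
        probs = np.exp(scores) / np.sum(np.exp(scores))
        loss += -np.log(probs[y[i]])
        # Gradient of cross-entropy w.r.t. each column of W.
        for c in range(W.shape[1]):
            dW[:, c] += (probs[c] - (c == y[i])) * X[i]
    # Regularization convention assumed here: reg * sum(W^2);
    # your assignment may use 0.5 * reg * sum(W^2) instead.
    loss = loss / num_train + reg * np.sum(W * W)
    dW = dW / num_train + 2 * reg * W
    return loss, dW
```

With zero weights and no regularization, every class gets probability 1/C, so the loss is exactly log(C), matching the sanity check above for C = 10.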


## Inline Question 1:

Why do we expect our loss to be close to -log(0.1)? Explain briefly.

Your answer: Because W is initialized with small random values, the scores are roughly uniform across the 10 classes, so the probability assigned to the correct class is about 1/10 = 0.1. The expected loss is therefore about -log(0.1) ≈ 2.3026.
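This sanity value can be verified directly:

```python
import numpy as np

# Uniform probability over 10 classes gives a cross-entropy loss of
# -log(1/10) = log(10).
print(-np.log(0.1))  # ≈ 2.302585
```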

```
numerical: 2.368141 analytic: 2.368141, relative error: 2.349797e-08
numerical: 1.324690 analytic: 1.324690, relative error: 7.140560e-08
numerical: 3.170412 analytic: 3.170411, relative error: 1.324741e-08
numerical: 0.249509 analytic: 0.249509, relative error: 2.647240e-08
numerical: 1.536095 analytic: 1.536095, relative error: 4.345856e-08
numerical: 1.075819 analytic: 1.075819, relative error: 3.902323e-08
numerical: -0.198098 analytic: -0.198098, relative error: 5.737134e-08
numerical: -0.089902 analytic: -0.089902, relative error: 8.604010e-07
numerical: -0.339487 analytic: -0.339487, relative error: 3.992996e-08
numerical: -4.819781 analytic: -4.819781, relative error: 3.465667e-09
numerical: 1.869922 analytic: 1.869921, relative error: 7.536693e-08
numerical: 0.783465 analytic: 0.783465, relative error: 6.960291e-08
numerical: -3.206007 analytic: -3.206007, relative error: 2.337350e-09
numerical: 0.532183 analytic: 0.532183, relative error: 1.498128e-07
numerical: 0.900500 analytic: 0.900500, relative error: 6.954913e-09
numerical: -0.353224 analytic: -0.353224, relative error: 1.836960e-07
numerical: -1.331470 analytic: -1.331470, relative error: 2.726426e-08
numerical: -0.082452 analytic: -0.082452, relative error: 7.712355e-07
numerical: -1.322133 analytic: -1.322133, relative error: 5.516628e-09
numerical: 0.345814 analytic: 0.345814, relative error: 1.251858e-07
```
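The output above comes from a sparse gradient check: the analytic gradient is compared against a centered numerical difference at a handful of random coordinates, and a relative error around 1e-7 or smaller indicates the two agree. A sketch of such a checker (the cs231n utility `grad_check_sparse` works along these lines; the exact signature here is illustrative):

```python
import numpy as np

def grad_check_sparse(f, x, analytic_grad, num_checks=10, h=1e-5):
    """Compare an analytic gradient to a centered numerical gradient
    at a few randomly chosen coordinates of x."""
    for _ in range(num_checks):
        ix = tuple(np.random.randint(m) for m in x.shape)
        old = x[ix]
        x[ix] = old + h
        fxph = f(x)                 # f(x + h) at this coordinate
        x[ix] = old - h
        fxmh = f(x)                 # f(x - h) at this coordinate
        x[ix] = old                 # restore
        grad_numerical = (fxph - fxmh) / (2 * h)
        grad_analytic = analytic_grad[ix]
        rel_error = abs(grad_numerical - grad_analytic) / (
            abs(grad_numerical) + abs(grad_analytic) + 1e-12)
        print('numerical: %f analytic: %f, relative error: %e'
              % (grad_numerical, grad_analytic, rel_error))
```

The centered difference is O(h²) accurate, which is why it is preferred over the one-sided (f(x+h) − f(x)) / h formula for these checks.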

```
naive loss: 2.395985e+00 computed in 0.080000s
vectorized loss: 2.395985e+00 computed in 0.003000s
Loss difference: 0.000000
```
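The roughly 25× speedup comes from replacing the per-example loops with matrix operations. One possible fully vectorized sketch (again, the regularization convention should match your assignment's):

```python
import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    """Softmax loss and gradient with no explicit loops over examples."""
    num_train = X.shape[0]
    scores = X.dot(W)                             # (N, C) class scores
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    exp_scores = np.exp(scores)
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)
    # Mean cross-entropy over the correct-class probabilities.
    loss = -np.log(probs[np.arange(num_train), y]).mean()
    loss += reg * np.sum(W * W)
    # Gradient: softmax probabilities minus one-hot targets.
    dscores = probs
    dscores[np.arange(num_train), y] -= 1
    dW = X.T.dot(dscores) / num_train + 2 * reg * W
    return loss, dW
```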

```
lr 1.000000e-08 reg 1.000000e+04 train accuracy: 0.175633 val accuracy: 0.179000
lr 1.000000e-08 reg 2.000000e+04 train accuracy: 0.174102 val accuracy: 0.161000
lr 1.000000e-08 reg 3.000000e+04 train accuracy: 0.203490 val accuracy: 0.210000
lr 1.000000e-08 reg 4.000000e+04 train accuracy: 0.191367 val accuracy: 0.202000
lr 1.000000e-08 reg 5.000000e+04 train accuracy: 0.208000 val accuracy: 0.197000
lr 1.000000e-08 reg 6.000000e+04 train accuracy: 0.203571 val accuracy: 0.215000
lr 1.000000e-08 reg 7.000000e+04 train accuracy: 0.213551 val accuracy: 0.215000
lr 1.000000e-08 reg 8.000000e+04 train accuracy: 0.238347 val accuracy: 0.229000
lr 1.000000e-08 reg 1.000000e+05 train accuracy: 0.245102 val accuracy: 0.242000
lr 1.000000e-07 reg 1.000000e+04 train accuracy: 0.358265 val accuracy: 0.362000
lr 1.000000e-07 reg 2.000000e+04 train accuracy: 0.356306 val accuracy: 0.374000
lr 1.000000e-07 reg 3.000000e+04 train accuracy: 0.347327 val accuracy: 0.362000
lr 1.000000e-07 reg 4.000000e+04 train accuracy: 0.336347 val accuracy: 0.354000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.331490 val accuracy: 0.348000
lr 1.000000e-07 reg 6.000000e+04 train accuracy: 0.320163 val accuracy: 0.336000
lr 1.000000e-07 reg 7.000000e+04 train accuracy: 0.314551 val accuracy: 0.325000
lr 1.000000e-07 reg 8.000000e+04 train accuracy: 0.313082 val accuracy: 0.324000
lr 1.000000e-07 reg 1.000000e+05 train accuracy: 0.303000 val accuracy: 0.315000
lr 2.000000e-07 reg 1.000000e+04 train accuracy: 0.374163 val accuracy: 0.389000
lr 2.000000e-07 reg 2.000000e+04 train accuracy: 0.353184 val accuracy: 0.365000
lr 2.000000e-07 reg 3.000000e+04 train accuracy: 0.340265 val accuracy: 0.359000
lr 2.000000e-07 reg 4.000000e+04 train accuracy: 0.334673 val accuracy: 0.351000
lr 2.000000e-07 reg 5.000000e+04 train accuracy: 0.326531 val accuracy: 0.337000
lr 2.000000e-07 reg 6.000000e+04 train accuracy: 0.319857 val accuracy: 0.336000
lr 2.000000e-07 reg 7.000000e+04 train accuracy: 0.317878 val accuracy: 0.329000
lr 2.000000e-07 reg 8.000000e+04 train accuracy: 0.310449 val accuracy: 0.329000
lr 2.000000e-07 reg 1.000000e+05 train accuracy: 0.316286 val accuracy: 0.315000
best validation accuracy achieved during cross-validation: 0.389000
```
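The sweep above is a plain grid search over (learning rate, regularization strength) pairs, keeping the model with the best validation accuracy. A generic sketch of the loop structure; `train_and_eval` is a placeholder that, in the notebook, would train a Softmax classifier with SGD and return its train and validation accuracies:

```python
def grid_search(train_and_eval, learning_rates, regularization_strengths):
    """Sweep (lr, reg) pairs; return per-pair accuracies and the best
    validation result. train_and_eval(lr, reg) is a placeholder that
    returns (train_accuracy, val_accuracy)."""
    results = {}
    best_val = -1.0
    best_params = None
    for lr in learning_rates:
        for reg in regularization_strengths:
            train_acc, val_acc = train_and_eval(lr, reg)
            results[(lr, reg)] = (train_acc, val_acc)
            if val_acc > best_val:
                best_val, best_params = val_acc, (lr, reg)
    return results, best_val, best_params
```

In the sweep printed above, the best pair was lr 2e-7 with reg 1e4, reaching 0.389 validation accuracy; note that validation accuracy degrades monotonically as the regularization strength grows past that point.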

```
softmax on raw pixels final test set accuracy: 0.375000
```