# Working with CNNs in practice

- Making the most of your data
  - Data augmentation
  - Transfer learning
- All about convolutions
  - How to arrange them
  - How to compute them fast
- Implementation details
  - GPU / CPU, bottlenecks, distributed training

## Data Augmentation

### Horizontal flips
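Mirror each training image left-right (in PIL, `img.transpose(Image.FLIP_LEFT_RIGHT)`); for most natural-image labels this is label-preserving, so it doubles the effective training set for free.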

### Random crops/scales

**Training:** sample random crops / scales

ResNet:

- Pick random L in range [256, 480]
- Resize training image, short side = L
- Sample random 224 x 224 patch
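A minimal PIL sketch of this training-time augmentation (the function name is mine):

```python
import random
from PIL import Image

def resnet_train_crop(img):
    # Pick random L in [256, 480] and resize so the short side equals L
    L = random.randint(256, 480)
    w, h = img.size
    s = L / min(w, h)
    img = img.resize((round(w * s), round(h * s)))
    # Sample a random 224 x 224 patch
    w, h = img.size
    x, y = random.randint(0, w - 224), random.randint(0, h - 224)
    return img.crop((x, y, x + 224, y + 224))
```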

**Testing:** average a fixed set of crops

ResNet:

- Resize image at 5 scales: {224, 256, 384, 480, 640}
- For each size, use 10 crops of 224 x 224: 4 corners + center, plus their horizontal flips
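A minimal PIL sketch of this multi-scale evaluation (5 scales x 10 crops = 50 patches; function names are mine):

```python
from PIL import Image

def ten_crops(img, size=224):
    # 4 corner crops + center crop, each with its horizontal flip
    w, h = img.size
    xy = [(0, 0), (w - size, 0), (0, h - size), (w - size, h - size),
          ((w - size) // 2, (h - size) // 2)]
    crops = [img.crop((x, y, x + size, y + size)) for x, y in xy]
    return crops + [c.transpose(Image.FLIP_LEFT_RIGHT) for c in crops]

def test_crops(img, scales=(224, 256, 384, 480, 640)):
    out = []
    for L in scales:
        w, h = img.size
        s = L / min(w, h)
        out += ten_crops(img.resize((round(w * s), round(h * s))))
    return out  # average the network's scores over all 50 crops
```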

### Color jitter

**Simple:**

Randomly jitter contrast

**Complex:**

- Apply PCA to all [R, G, B] pixels in training set
- Sample a “color offset” along principal component directions
- Add offset to all pixels of a training image

(As seen in [Krizhevsky et al. 2012], ResNet, etc.)
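A minimal numpy sketch of the complex version (sigma = 0.1 follows Krizhevsky et al.; function names are mine):

```python
import numpy as np

def fit_rgb_pca(pixels):
    # pixels: (N, 3) array of all [R, G, B] values in the training set
    evals, evecs = np.linalg.eigh(np.cov(pixels, rowvar=False))
    return evals, evecs  # eigenvalues / eigenvectors (columns) of the RGB covariance

def pca_color_jitter(img, evals, evecs, sigma=0.1):
    # img: (H, W, 3) float array; sample one offset along the principal components
    alpha = np.random.normal(0.0, sigma, size=3)
    offset = evecs @ (alpha * evals)
    return img + offset  # broadcasting adds the same offset to every pixel
```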

## Transfer Learning

“You need a lot of data if you want to train/use CNNs”

Some tricks:

|  | very similar dataset | very different dataset |
|---|---|---|
| very little data | Use linear classifier on top layer | Try linear classifier from different stages |
| quite a lot of data | Finetune a few layers | Finetune a larger number of layers |
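As a concrete illustration of two cells of this table, a minimal PyTorch sketch (torchvision's resnet18 and the 10-class target task are stand-ins):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)        # ImageNet-pretrained backbone
for p in model.parameters():
    p.requires_grad = False                     # freeze all pretrained weights

# Very little data: train only a new linear classifier on the top layer
model.fc = nn.Linear(model.fc.in_features, 10)  # 10 = classes in the new task

# Quite a lot of data: also finetune the last block
for p in model.layer4.parameters():
    p.requires_grad = True

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-3, momentum=0.9)
```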

## All about Convolutions

### How to stack them

- Replace large convolutions (5 x 5, 7 x 7) with stacks of 3 x 3 convolutions
- 1 x 1 “bottleneck” convolutions are very efficient
- Can factor N x N convolutions into 1 x N and N x 1
- All of the above give fewer parameters, less compute, and more nonlinearity (see the parameter counts below)
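The arithmetic behind those claims, for C input and C output channels (bias ignored; C = 256 here):

```python
C = 256

one_5x5 = 5 * 5 * C * C          # single 5x5 conv
two_3x3 = 2 * 3 * 3 * C * C      # two stacked 3x3 convs, same receptive field
print(one_5x5, two_3x3)          # 1638400 vs 1179648, plus an extra nonlinearity

plain_3x3 = 3 * 3 * C * C                          # one 3x3 conv at full width
bottleneck = C*(C//4) + 3*3*(C//4)**2 + (C//4)*C   # 1x1 reduce, 3x3, 1x1 expand
print(plain_3x3, bottleneck)     # 589824 vs 69632: roughly 8.5x fewer parameters

N = 7
print(N * N * C * C, 2 * N * C * C)  # 7x7 vs 1x7 + 7x1: 3211264 vs 917504
```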

### How to compute them

#### im2col
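Stretch each receptive field into a column, so the convolution becomes one big matrix multiply. A minimal numpy sketch (stride 1, no padding; names are mine):

```python
import numpy as np

def im2col_conv(X, W):
    # X: (C, H, W) input; W: (F, C, K, K) filter bank
    C, H, Wi = X.shape
    F, _, K, _ = W.shape
    out_h, out_w = H - K + 1, Wi - K + 1
    # Stretch each K x K x C receptive field into one column
    cols = np.empty((C * K * K, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = X[:, i:i + K, j:j + K].ravel()
    # The convolution is now a single big matrix multiply
    return (W.reshape(F, -1) @ cols).reshape(F, out_h, out_w)
```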

#### BLAS
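The matrix multiply produced by im2col goes straight to a heavily tuned BLAS GEMM routine (e.g. sgemm on CPU, cuBLAS on GPU). The cost is extra memory: overlapping receptive fields are duplicated in the column matrix.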

#### FFT

- Compute FFT of weights: F(W)
- Compute FFT of image: F(X)
- Compute elementwise product: F(W) ∘ F(X)
- Compute inverse FFT: Y = F⁻¹(F(W) ∘ F(X))

FFT convolutions get a big speedup for larger filters

Not much speedup for 3x3 filters =(
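A minimal numpy sketch of the recipe above (note that the FFT computes true convolution, i.e. with a flipped kernel; flip the filter first to match the cross-correlation that frameworks call "convolution"):

```python
import numpy as np

def fft_conv2d(x, w):
    # x: (H, W) image; w: (k, k) filter; zero-pad so circular conv == linear conv
    H, W = x.shape
    k = w.shape[0]
    shape = (H + k - 1, W + k - 1)
    Fx = np.fft.rfft2(x, shape)          # F(X)
    Fw = np.fft.rfft2(w, shape)          # F(W)
    y = np.fft.irfft2(Fx * Fw, shape)    # Y = F^-1(F(W) . F(X))
    return y[k - 1:H, k - 1:W]           # keep the "valid" output region
```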

#### Fast algorithms

- Strassen’s algorithm: multiply matrices with roughly O(n^2.81) scalar multiplications instead of O(n^3)
- Winograd-style fast convolutions (Lavin & Gray, 2015) give similar savings for small 3 x 3 filters
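A toy numpy sketch of Strassen's recursion (assumes square matrices whose size is a power of two; real implementations fall back to plain GEMM below a cutoff):

```python
import numpy as np

def strassen(A, B):
    n = A.shape[0]
    if n <= 64:                              # small blocks: plain GEMM wins
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)      # 7 recursive multiplies
    M2 = strassen(A21 + A22, B11)            # (naive blocking needs 8)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])
```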

## Implementation Details

- GPUs much faster than CPUs
- Distributed training is sometimes used
  - Not needed for small problems

- Be aware of bottlenecks: CPU / GPU, CPU / disk (see the prefetching sketch below)
- Low precision makes things faster and still works
  - 32 bit is standard now, 16 bit soon
  - In the future: binary nets?
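The usual fix for the CPU / disk bottleneck is to prefetch and decode batches in background workers while the GPU computes. A minimal PyTorch sketch (FakeData is a stand-in for a real dataset):

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

dataset = datasets.FakeData(transform=transforms.ToTensor())  # stand-in dataset
loader = DataLoader(dataset, batch_size=256, shuffle=True,
                    num_workers=8,    # CPU workers load and decode in parallel
                    pin_memory=True)  # page-locked buffers speed CPU -> GPU copies

for x, y in loader:
    x = x.cuda(non_blocking=True)     # overlap the host-to-device copy with compute
    # ... forward / backward pass on the GPU ...
```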