# Working with CNNs in practice

• Making the most of your data
  • Data augmentation
  • Transfer learning
• How to arrange them
• How to compute them fast
• Implementation details
  • GPU / CPU, bottlenecks, distributed training

## Data Augmentation

### Random crops/scales

Training: sample random crops / scales

ResNet:

1. Pick random L in range [256, 480]
2. Resize training image, short side = L
3. Sample random 224 x 224 patch
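A minimal sketch of this training-time augmentation, assuming NumPy and Pillow are available (the function name and defaults are illustrative, not from the ResNet paper):

```python
import random

import numpy as np
from PIL import Image

def random_scale_crop(img: Image.Image, crop=224, lo=256, hi=480):
    # 1. Pick a random short-side length L in [lo, hi].
    L = random.randint(lo, hi)
    # 2. Resize so the short side equals L, preserving aspect ratio.
    w, h = img.size
    scale = L / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    # 3. Sample a random crop x crop patch.
    w, h = img.size
    x = random.randint(0, w - crop)
    y = random.randint(0, h - crop)
    return np.asarray(img.crop((x, y, x + crop, y + crop)))
```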

Testing: average a fixed set of crops

ResNet:

1. Resize image at 5 scales: {224, 256, 384, 480, 640}
2. For each size, take 10 crops of 224 x 224: 4 corners + center, plus their horizontal flips
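A rough sketch of the 10-crop step at a single scale (names are illustrative); in practice the class scores are averaged over all crops at all five scales:

```python
import numpy as np
from PIL import Image

def ten_crops(img: Image.Image, crop=224):
    w, h = img.size
    # 4 corners + center, as (left, top) offsets.
    offsets = [(0, 0), (w - crop, 0), (0, h - crop),
               (w - crop, h - crop), ((w - crop) // 2, (h - crop) // 2)]
    crops = [img.crop((x, y, x + crop, y + crop)) for x, y in offsets]
    # Add the horizontal flip of each crop, giving 10 crops total.
    crops += [c.transpose(Image.FLIP_LEFT_RIGHT) for c in crops]
    return [np.asarray(c) for c in crops]
```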

### Color jitter

Simple:
Randomly jitter contrast

Complex:

1. Apply PCA to all [R, G, B] pixels in training set
2. Sample a “color offset” along principal component directions
3. Add offset to all pixels of a training image

(As seen in [Krizhevsky et al. 2012], ResNet, etc)
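A minimal NumPy sketch of this "fancy PCA" jitter, assuming the PCA statistics are precomputed once over the training set; the function names are illustrative, and the sigma = 0.1 default follows [Krizhevsky et al. 2012]:

```python
import numpy as np

def pca_color_stats(images):
    # images: (N, H, W, 3) array; flatten every pixel into rows of shape (M, 3).
    pixels = images.reshape(-1, 3).astype(np.float64)
    cov = np.cov(pixels, rowvar=False)       # 3x3 covariance of R, G, B
    eigvals, eigvecs = np.linalg.eigh(cov)   # principal component directions
    return eigvals, eigvecs

def pca_color_jitter(img, eigvals, eigvecs, sigma=0.1):
    # Sample one offset per image: sum_i alpha_i * lambda_i * v_i,
    # with alpha_i ~ N(0, sigma), then add it to every pixel.
    alphas = np.random.normal(0.0, sigma, size=3)
    offset = eigvecs @ (alphas * eigvals)
    return img + offset                      # broadcasts over all H x W pixels
```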

## Transfer Learning

“You need a lot of data if you want to train/use CNNs”

Some tricks:

|                     | very similar dataset               | very different dataset                        |
| ------------------- | ---------------------------------- | --------------------------------------------- |
| very little data    | Use linear classifier on top layer | Try a linear classifier from different stages |
| quite a lot of data | Finetune a few layers              | Finetune a larger number of layers            |
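As a hedged PyTorch sketch of the top-left cell (the model choice and class count below are just examples, not prescribed by the source):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # any pretrained CNN works
for p in model.parameters():
    p.requires_grad = False                       # freeze the whole backbone
model.fc = nn.Linear(model.fc.in_features, 10)    # fresh linear classifier on top

# With more data (or a more different dataset), unfreeze the last few
# blocks as well and finetune them with a small learning rate.
```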

## How to stack them

• Replace large convolutions (5 x 5, 7 x 7) with stacks of 3 x 3 convolutions
• 1 x 1 “bottleneck” convolutions are very efficient
• Can factor N x N convolutions into 1 x N and N x 1
• All of the above give fewer parameters, less compute, more nonlinearity
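A quick parameter count behind the last point, assuming C input and C output channels and ignoring biases:

```python
C = 64
p_5x5     = 5 * 5 * C * C           # one 5x5 conv: 25 C^2 weights
p_two_3x3 = 2 * (3 * 3 * C * C)     # two stacked 3x3 convs, same receptive field: 18 C^2
p_1x3_3x1 = 2 * (3 * 1 * C * C)     # 1x3 followed by 3x1: 6 C^2
print(p_5x5, p_two_3x3, p_1x3_3x1)  # 102400 73728 24576
```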

## How to compute them

### FFT

1. Compute FFT of weights: F(W)
2. Compute FFT of image: F(X)
3. Compute elementwise product: F(W) ○ F(X)
4. Compute inverse FFT: Y = F⁻¹(F(W) ○ F(X))
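A minimal single-channel NumPy sketch of the recipe above (real implementations batch over channels and filters; note the FFT computes true convolution, whereas CNN libraries usually compute cross-correlation, i.e. with the filter flipped):

```python
import numpy as np

def fft_conv2d(x, w):
    # Pad both inputs to the full linear-convolution size to avoid wraparound.
    H, W = x.shape
    h, ww = w.shape
    s = (H + h - 1, W + ww - 1)
    Fx = np.fft.rfft2(x, s)        # F(X)
    Fw = np.fft.rfft2(w, s)        # F(W)
    y = np.fft.irfft2(Fx * Fw, s)  # Y = F^-1(F(W) ○ F(X))
    return y[h - 1:H, ww - 1:W]    # keep the 'valid' region, like a conv layer
```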

FFT convolutions get a big speedup for larger filters

Not much speedup for 3x3 filters =(

### Fast algorithms

• Strassen’s Algorithm: clever arithmetic reduces the multiplications in matrix multiply from O(N³) to about O(N^2.81)
• And so on…

## Implementation Details

• GPUs much faster than CPUs
• Distributed training is sometimes used
• Not needed for small problems
• Be aware of bottlenecks: CPU / GPU, CPU / disk
• Low precision makes things faster and still works
• 32 bit is standard now, 16 bit soon
• In the future: binary nets?
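As a tiny NumPy illustration of the low-precision point (this shows the memory saving only; actual speedups depend on hardware support for 16-bit arithmetic):

```python
import numpy as np

w32 = np.random.randn(1000, 1000).astype(np.float32)
w16 = w32.astype(np.float16)         # half the memory per weight
print(w32.nbytes, w16.nbytes)        # 4000000 vs 2000000 bytes
print(np.abs(w32 - w16.astype(np.float32)).max())  # rounding error stays small
```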