# Working with CNNs in practice

- Making the most of your data
  - Data augmentation
  - Transfer learning
- All about convolutions
  - How to arrange them
  - How to compute them fast
- Implementation details
  - GPU / CPU, bottlenecks, distributed training

## Data Augmentation

### Horizontal flips
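Mirror each training image left-right (in PIL, `img.transpose(Image.FLIP_LEFT_RIGHT)`); for most natural-image labels this is label-preserving, so it doubles the effective training set for free.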

### Random crops/scales

**Training:** sample random crops / scales

ResNet:

- Pick random L in range [256, 480]
- Resize training image, short side = L
- Sample random 224 x 224 patch
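A minimal PIL sketch of this training-time augmentation (the function name is mine):

```python
import random
from PIL import Image

def resnet_train_crop(img):
    # Pick random L in [256, 480] and resize so the short side equals L
    L = random.randint(256, 480)
    w, h = img.size
    s = L / min(w, h)
    img = img.resize((round(w * s), round(h * s)))
    # Sample a random 224 x 224 patch
    w, h = img.size
    x, y = random.randint(0, w - 224), random.randint(0, h - 224)
    return img.crop((x, y, x + 224, y + 224))
```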

**Testing:** average a fixed set of crops

ResNet:

- Resize image at 5 scales: {224, 256, 384, 480, 640}
- For each size, use 10 crops of 224 x 224: 4 corners + center, plus their horizontal flips
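A minimal PIL sketch of this multi-scale evaluation (5 scales x 10 crops = 50 patches; function names are mine):

```python
from PIL import Image

def ten_crops(img, size=224):
    # 4 corner crops + center crop, each with its horizontal flip
    w, h = img.size
    xy = [(0, 0), (w - size, 0), (0, h - size), (w - size, h - size),
          ((w - size) // 2, (h - size) // 2)]
    crops = [img.crop((x, y, x + size, y + size)) for x, y in xy]
    return crops + [c.transpose(Image.FLIP_LEFT_RIGHT) for c in crops]

def test_crops(img, scales=(224, 256, 384, 480, 640)):
    out = []
    for L in scales:
        w, h = img.size
        s = L / min(w, h)
        out += ten_crops(img.resize((round(w * s), round(h * s))))
    return out  # average the network's scores over all 50 crops
```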

### Color jitter

**Simple:**

Randomly jitter contrast

**Complex:**

- Apply PCA to all [R, G, B] pixels in training set
- Sample a “color offset” along principal component directions
- Add offset to all pixels of a training image

(As seen in [Krizhevsky et al. 2012], ResNet, etc.)
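A minimal numpy sketch of the complex version (sigma = 0.1 follows Krizhevsky et al.; function names are mine):

```python
import numpy as np

def fit_rgb_pca(pixels):
    # pixels: (N, 3) array of all [R, G, B] values in the training set
    evals, evecs = np.linalg.eigh(np.cov(pixels, rowvar=False))
    return evals, evecs  # eigenvalues / eigenvectors (columns) of the RGB covariance

def pca_color_jitter(img, evals, evecs, sigma=0.1):
    # img: (H, W, 3) float array; sample one offset along the principal components
    alpha = np.random.normal(0.0, sigma, size=3)
    offset = evecs @ (alpha * evals)
    return img + offset  # broadcasting adds the same offset to every pixel
```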

## Transfer Learning

“You need a lot of data if you want to train/use CNNs”

Some tricks:

|  | very similar dataset | very different dataset |
|---|---|---|
| very little data | Use linear classifier on top layer | Try linear classifier from different stages |
| quite a lot of data | Finetune a few layers | Finetune a larger number of layers |
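As a concrete illustration of two cells of this table, a minimal PyTorch sketch (torchvision's resnet18 and the 10-class target task are stand-ins):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)        # ImageNet-pretrained backbone
for p in model.parameters():
    p.requires_grad = False                     # freeze all pretrained weights

# Very little data: train only a new linear classifier on the top layer
model.fc = nn.Linear(model.fc.in_features, 10)  # 10 = classes in the new task

# Quite a lot of data: also finetune the last block
for p in model.layer4.parameters():
    p.requires_grad = True

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-3, momentum=0.9)
```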

## All about Convolutions

### How to stack them

- Replace large convolutions (5 x 5, 7 x 7) with stacks of 3 x 3 convolutions
- 1 x 1 “bottleneck” convolutions are very efficient
- Can factor N x N convolutions into 1 x N and N x 1
- All of the above give fewer parameters, less compute, and more nonlinearity (see the parameter counts below)
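The arithmetic behind those claims, for C input and C output channels (bias ignored; C = 256 here):

```python
C = 256

one_5x5 = 5 * 5 * C * C          # single 5x5 conv
two_3x3 = 2 * 3 * 3 * C * C      # two stacked 3x3 convs, same receptive field
print(one_5x5, two_3x3)          # 1638400 vs 1179648, plus an extra nonlinearity

plain_3x3 = 3 * 3 * C * C                          # one 3x3 conv at full width
bottleneck = C*(C//4) + 3*3*(C//4)**2 + (C//4)*C   # 1x1 reduce, 3x3, 1x1 expand
print(plain_3x3, bottleneck)     # 589824 vs 69632: roughly 8.5x fewer parameters

N = 7
print(N * N * C * C, 2 * N * C * C)  # 7x7 vs 1x7 + 7x1: 3211264 vs 917504
```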

### How to compute them

#### im2col
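Stretch each receptive field into a column, so the convolution becomes one big matrix multiply. A minimal numpy sketch (stride 1, no padding; names are mine):

```python
import numpy as np

def im2col_conv(X, W):
    # X: (C, H, W) input; W: (F, C, K, K) filter bank
    C, H, Wi = X.shape
    F, _, K, _ = W.shape
    out_h, out_w = H - K + 1, Wi - K + 1
    # Stretch each K x K x C receptive field into one column
    cols = np.empty((C * K * K, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = X[:, i:i + K, j:j + K].ravel()
    # The convolution is now a single big matrix multiply
    return (W.reshape(F, -1) @ cols).reshape(F, out_h, out_w)
```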

#### BLAS
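The matrix multiply produced by im2col goes straight to a heavily tuned BLAS GEMM routine (e.g. sgemm on CPU, cuBLAS on GPU). The cost is extra memory: overlapping receptive fields are duplicated in the column matrix.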

#### FFT

- Compute FFT of weights: F(W)
- Compute FFT of image: F(X)
- Compute elementwise product: F(W) ∘ F(X)
- Compute inverse FFT: Y = F⁻¹(F(W) ∘ F(X))

FFT convolutions get a big speedup for larger filters

Not much speedup for 3x3 filters =(
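A minimal numpy sketch of the recipe above (note that the FFT computes true convolution, i.e. with a flipped kernel; flip the filter first to match the cross-correlation that frameworks call "convolution"):

```python
import numpy as np

def fft_conv2d(x, w):
    # x: (H, W) image; w: (k, k) filter; zero-pad so circular conv == linear conv
    H, W = x.shape
    k = w.shape[0]
    shape = (H + k - 1, W + k - 1)
    Fx = np.fft.rfft2(x, shape)          # F(X)
    Fw = np.fft.rfft2(w, shape)          # F(W)
    y = np.fft.irfft2(Fx * Fw, shape)    # Y = F^-1(F(W) . F(X))
    return y[k - 1:H, k - 1:W]           # keep the "valid" output region
```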

#### Fast algorithms

- Strassen’s algorithm: multiply matrices with roughly O(n^2.81) scalar multiplications instead of O(n^3)
- Winograd-style fast convolutions (Lavin & Gray, 2015) give similar savings for small 3 x 3 filters
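A toy numpy sketch of Strassen's recursion (assumes square matrices whose size is a power of two; real implementations fall back to plain GEMM below a cutoff):

```python
import numpy as np

def strassen(A, B):
    n = A.shape[0]
    if n <= 64:                              # small blocks: plain GEMM wins
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)      # 7 recursive multiplies
    M2 = strassen(A21 + A22, B11)            # (naive blocking needs 8)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])
```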

## Implementation Details

- GPUs much faster than CPUs
- Distributed training is sometimes used
  - Not needed for small problems

- Be aware of bottlenecks: CPU / GPU, CPU / disk (see the prefetching sketch below)
- Low precision makes things faster and still works
  - 32 bit is standard now, 16 bit soon
  - In the future: binary nets?
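The usual fix for the CPU / disk bottleneck is to prefetch and decode batches in background workers while the GPU computes. A minimal PyTorch sketch (FakeData is a stand-in for a real dataset):

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

dataset = datasets.FakeData(transform=transforms.ToTensor())  # stand-in dataset
loader = DataLoader(dataset, batch_size=256, shuffle=True,
                    num_workers=8,    # CPU workers load and decode in parallel
                    pin_memory=True)  # page-locked buffers speed CPU -> GPU copies

for x, y in loader:
    x = x.cuda(non_blocking=True)     # overlap the host-to-device copy with compute
    # ... forward / backward pass on the GPU ...
```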