
CS231n Lecture 2 Notes

Image Classification

Goal

Assign to an input image a single label drawn from a fixed, predefined set of categories.

Significance

A core problem in computer vision.

Example

(figure: image classification example)

Challenges

  1. Viewpoint variation
  2. Scale variation (varying distance from the camera)
  3. Deformation
  4. Partial occlusion
  5. Illumination
  6. Background clutter
  7. Intra-class variation

(figure: examples of these challenges)

Approach

Data-driven (i.e., learn from a training set of labeled examples)

Pipeline

Input -> Learning -> Evaluation

Nearest Neighbor Classifier

(figure: nearest neighbor example)

Distance metrics

L1 distance

$$d_1 (I_1, I_2) = \sum_{p} \left| I^p_1 - I^p_2 \right|$$

(figure: L1 distance example)

L2 distance

$$d_2 (I_1, I_2) = \sqrt{\sum_{p} \left( I^p_1 - I^p_2 \right)^2}$$
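Both metrics are a few lines of numpy. A minimal sketch on two made-up flattened images (the vectors here are hypothetical, just to show the computation):

import numpy as np

# two hypothetical flattened images
I1 = np.array([10., 20., 30., 40.])
I2 = np.array([12., 18., 33., 44.])

d1 = np.sum(np.abs(I1 - I2))          # L1: 2 + 2 + 3 + 4 = 11.0
d2 = np.sqrt(np.sum((I1 - I2) ** 2))  # L2: sqrt(4 + 4 + 9 + 16) ~= 5.745
print('L1: %f, L2: %f' % (d1, d2))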

Example code

Loading the data

Xtr, Ytr, Xte, Yte = load_CIFAR10('data/cifar10/') # a magic function we provide
# flatten out all images to be one-dimensional
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3) # Xtr_rows becomes 50000 x 3072
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3) # Xte_rows becomes 10000 x 3072

Prediction and evaluation

nn = NearestNeighbor() # create a Nearest Neighbor classifier class
nn.train(Xtr_rows, Ytr) # train the classifier on the training images and labels
Yte_predict = nn.predict(Xte_rows) # predict labels on the test images
# and now print the classification accuracy, which is the fraction
# of test examples whose predicted label matches the true label
print('accuracy: %f' % (np.mean(Yte_predict == Yte),))

The basic implementation

import numpy as np

class NearestNeighbor(object):
  def __init__(self):
    pass

  def train(self, X, y):
    """ X is N x D where each row is an example. Y is 1-dimension of size N """
    # the nearest neighbor classifier simply remembers all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X):
    """ X is N x D where each row is an example we wish to predict label for """
    num_test = X.shape[0]
    # let's make sure that the output type matches the input type
    Ypred = np.zeros(num_test, dtype = self.ytr.dtype)

    # loop over all test rows
    for i in range(num_test):
      # find the nearest training image to the i'th test image
      # using the L1 distance (sum of absolute value differences)
      distances = np.sum(np.abs(self.Xtr - X[i,:]), axis = 1)
      min_index = np.argmin(distances) # get the index with smallest distance
      Ypred[i] = self.ytr[min_index] # predict the label of the nearest example

    return Ypred

Results

L1 distance: 38.6% accuracy on CIFAR-10

L2 distance: 35.4% accuracy on CIFAR-10

L1 vs. L2: the L2 distance is less forgiving of large coordinate-wise differences than L1; it prefers many medium disagreements to one large one.
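A small worked example (hypothetical difference vectors) makes this preference concrete:

import numpy as np

base = np.zeros(4)
one_big    = np.array([4., 0., 0., 0.])  # one concentrated difference
many_small = np.array([1., 1., 1., 1.])  # same total difference, spread out

# L1 cannot tell them apart: both distances are 4.0
print(np.sum(np.abs(base - one_big)), np.sum(np.abs(base - many_small)))
# L2 penalizes the concentrated difference more: 4.0 vs 2.0
print(np.sqrt(np.sum((base - one_big) ** 2)),
      np.sqrt(np.sum((base - many_small) ** 2)))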

k-Nearest Neighbor Classifier

(figure: nearest neighbor vs. k-nearest neighbor decision boundaries)

Use a validation set to tune the hyperparameter k.

Example code

# assume we have Xtr_rows, Ytr, Xte_rows, Yte as before
# recall Xtr_rows is 50,000 x 3072 matrix
Xval_rows = Xtr_rows[:1000, :] # take first 1000 for validation
Yval = Ytr[:1000]
Xtr_rows = Xtr_rows[1000:, :] # keep last 49,000 for train
Ytr = Ytr[1000:]
# find hyperparameters that work best on the validation set
validation_accuracies = []
for k in [1, 3, 5, 10, 20, 50, 100]:
  # use a particular value of k and evaluate on validation data
  nn = NearestNeighbor()
  nn.train(Xtr_rows, Ytr)
  # here we assume a modified NearestNeighbor class that can take a k as input
  Yval_predict = nn.predict(Xval_rows, k = k)
  acc = np.mean(Yval_predict == Yval)
  print('accuracy: %f' % (acc,))

  # keep track of what works on the validation set
  validation_accuracies.append((k, acc))
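The loop assumes a predict that accepts k. The notes do not show that modification, but a minimal sketch could replace the argmin in NearestNeighbor.predict with a majority vote (np.bincount assumes integer class labels, which holds for CIFAR-10):

# hypothetical replacement for NearestNeighbor.predict so that it takes k
def predict(self, X, k = 1):
  """ As before, but vote among the k nearest training examples. """
  num_test = X.shape[0]
  Ypred = np.zeros(num_test, dtype = self.ytr.dtype)
  for i in range(num_test):
    # L1 distances from the i'th test row to every training row
    distances = np.sum(np.abs(self.Xtr - X[i,:]), axis = 1)
    nearest = np.argsort(distances)[:k]  # indices of the k closest training rows
    # majority vote over the neighbors' labels (ties break toward smaller labels)
    Ypred[i] = np.bincount(self.ytr[nearest]).argmax()
  return Ypred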

Cross-validation

(figure: cross-validation data split)

(figure: cross-validation results)
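The notes do not include cross-validation code; a minimal 5-fold sketch, assuming the k-aware predict sketched above, could look like this:

num_folds = 5
X_folds = np.array_split(Xtr_rows, num_folds)
y_folds = np.array_split(Ytr, num_folds)

for k in [1, 3, 5, 10, 20, 50, 100]:
  accs = []
  for fold in range(num_folds):
    # hold out one fold for validation, train on the remaining folds
    X_val, y_val = X_folds[fold], y_folds[fold]
    X_trn = np.concatenate(X_folds[:fold] + X_folds[fold + 1:])
    y_trn = np.concatenate(y_folds[:fold] + y_folds[fold + 1:])
    nn = NearestNeighbor()
    nn.train(X_trn, y_trn)
    accs.append(np.mean(nn.predict(X_val, k = k) == y_val))
  print('k = %d, mean accuracy = %f' % (k, np.mean(accs)))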

Pros and cons of the Nearest Neighbor classifier

Pros

Training is fast: the classifier simply stores the training data.

Cons

  1. Testing is very slow, since every test image must be compared against the entire training set

    Workarounds: Approximate Nearest Neighbor (ANN) algorithms, e.g. the FLANN library (see the sketch after this list)

  2. Pixel-wise distance is a poor measure of perceptual similarity

    (figure: visually different images at equal pixel-wise distance)

(figure: CIFAR-10 images embedded in two dimensions by pixel-space similarity: http://cs231n.github.io/assets/pixels_embed_cifar10_big.jpg)
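FLANN itself is a C++ library with Python bindings and is not shown in the notes. As a rough illustration of handing neighbor search to an indexed implementation, here is the same classifier built on scikit-learn's KNeighborsClassifier (tree-backed exact search rather than true ANN; the speedup over brute force shrinks as dimensionality grows):

from sklearn.neighbors import KNeighborsClassifier

# a ball tree index makes queries sublinear in the number of training points
knn = KNeighborsClassifier(n_neighbors = 5, algorithm = 'ball_tree')
knn.fit(Xtr_rows, Ytr)
Yte_predict = knn.predict(Xte_rows)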

Further reading

t-SNE http://lvdmaaten.github.io/tsne/

random projection http://scikit-learn.org/stable/modules/random_projection.html

INTUITION FAILS IN HIGH DIMENSIONS http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf

Recognizing and Learning Object Categories http://people.csail.mit.edu/torralba/shortCourseRLOC/index.html

Linear Classification

A parameterized mapping from an input image to class scores:

$$f(x_i, W, b) = W x_i + b$$

$x_i$ has shape [D x 1], $W$ has shape [K x D], and $b$ has shape [K x 1], so the output is one score per class ([K x 1]).

Notes:

  1. W stacks the parameters of K classifiers, one row per class, so the model evaluates all K classifiers in a single matrix multiply (see the sketch after this list)
  2. Vectorizing the computation this way greatly improves speed
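A minimal sketch of the shapes involved, with random parameters purely for illustration (D = 3072 matches flattened CIFAR-10 images, K = 10 classes):

import numpy as np

D, K = 3072, 10
x = np.random.randn(D, 1)   # one flattened image as a [D x 1] column
W = np.random.randn(K, D)   # one row of weights per class
b = np.random.randn(K, 1)   # one bias per class

scores = W.dot(x) + b       # [K x 1]: one score per class
predicted_class = np.argmax(scores)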

Interpretations of a linear classifier

  1. Each row of W encodes how much that class cares about each pixel position and color. A "sun" classifier, for example, would likely put large weight on round regions and yellow pixels

(figure: per-class weight interpretation)

  2. Images as points in a high-dimensional space, with each class score a linear function over that space

(figure: images as points in high-dimensional space)

  3. A linear classifier as template matching

Each row of W acts as a template: the inner product compares every flattened image against each class's template, and the best-matching template wins. In this view the linear classifier is still doing a kind of nearest neighbor, but with a single learned template per class instead of the whole training set.

(figure: learned per-class templates)

As the figure shows, each learned template is a compromise across all the training images of its class.

Folding the bias into W (the bias trick)

(figure: the bias trick)
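A minimal sketch of the trick (reusing the made-up dimensions above): append a constant 1 to every input and the bias as an extra column of W, so one multiply computes W x + b:

import numpy as np

D, K = 3072, 10
x = np.random.randn(D, 1)
W = np.random.randn(K, D)
b = np.random.randn(K, 1)

x_ext = np.vstack([x, [[1.0]]])  # [D+1 x 1]: input with a constant 1 appended
W_ext = np.hstack([W, b])        # [K x D+1]: bias folded in as the last column

# the extended multiply reproduces W x + b exactly
assert np.allclose(W_ext.dot(x_ext), W.dot(x) + b)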

Data preprocessing

Zero-center the data: compute the mean image over the training set and subtract it from every image.
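A minimal sketch, reusing the Xtr_rows / Xte_rows arrays from the loading step (the training mean is applied to the test split too, since statistics should never be computed on test data):

import numpy as np

mean_image = np.mean(Xtr_rows, axis = 0)  # [3072]: per-dimension mean over the training set

Xtr_rows = Xtr_rows - mean_image  # training data is now zero-centered
Xte_rows = Xte_rows - mean_image  # apply the same shift to the test data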