Feature Transform
- To overcome the limits of a linear classifier, we transform the original data into a representation that is better suited to linear classification
- ex) Circular data that is not linearly separable becomes linearly separable after converting Cartesian coordinates to polar coordinates (see the sketch below)
- Need to think about the structure of the data we are dealing with
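A minimal numpy sketch of this idea (the ring radii and the helper name are illustrative, not from the notes): two concentric rings that a linear classifier cannot separate in $(x, y)$ become separable by radius alone after the polar transform.

```python
import numpy as np

def cartesian_to_polar(points):
    """points: (N, 2) array of (x, y); returns (N, 2) array of (r, theta)."""
    x, y = points[:, 0], points[:, 1]
    r = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    return np.stack([r, theta], axis=1)

# Inner circle (class 0) and outer ring (class 1) -- assumed toy data
angles = np.random.uniform(0, 2 * np.pi, 100)
inner = np.stack([1.0 * np.cos(angles), 1.0 * np.sin(angles)], axis=1)
outer = np.stack([3.0 * np.cos(angles), 3.0 * np.sin(angles)], axis=1)

polar = cartesian_to_polar(np.vstack([inner, outer]))
# In polar coordinates the classes split cleanly on r (r < 2 vs r > 2),
# so a linear classifier on the transformed features works.
```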
Image Features
Color Histogram
- Transform the image into a histogram over the colors of its pixels
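A rough sketch of one simple variant, assuming a per-channel intensity histogram (the bin count and helper name are made up for illustration):

```python
import numpy as np

def color_histogram(image, bins_per_channel=8):
    """image: (H, W, 3) uint8 array; returns a flattened histogram feature."""
    hists = []
    for c in range(3):  # one histogram per R, G, B channel
        hist, _ = np.histogram(image[:, :, c], bins=bins_per_channel, range=(0, 256))
        hists.append(hist)
    return np.concatenate(hists).astype(np.float32)

image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
feature = color_histogram(image)  # shape (24,) for 8 bins x 3 channels
```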
Histogram of Oriented Gradients (HoG)
- Compute Edge Direction/Strength at each pixel
- Divide image into 8x8 regions
- Within each region, compute a histogram of edge directions weighted by edge strength
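A simplified sketch of the idea, assuming unsigned gradient orientations and omitting the block normalization of the full HoG descriptor (the function name and sizes are illustrative):

```python
import numpy as np

def hog_sketch(gray, cell=8, n_bins=9):
    """gray: (H, W) float array; returns (H//cell, W//cell, n_bins) histograms."""
    gy, gx = np.gradient(gray)                  # per-pixel gradient components
    mag = np.sqrt(gx ** 2 + gy ** 2)            # edge strength
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned edge direction in degrees
    H, W = gray.shape
    out = np.zeros((H // cell, W // cell, n_bins))
    for i in range(H // cell):
        for j in range(W // cell):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            # Histogram of directions in this cell, weighted by edge strength
            hist, _ = np.histogram(a, bins=n_bins, range=(0, 180), weights=m)
            out[i, j] = hist
    return out

gray = np.random.rand(64, 64)
features = hog_sketch(gray)  # (8, 8, 9) before flattening into a feature vector
```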
Neural Networks
- Linear Score Functions: $f=Wx$
- 2-layer Neural Network: $f=W_2\max(0,W_1x)$
- $W_2\in\mathbb{R}^{C\times H}\quad W_1\in\mathbb{R}^{H\times D}\quad x\in\mathbb{R}^D$
- C = # of classes, H = hidden layer size, D = input dimension
- In practice, each layer also has a bias term
- These are called Fully Connected Neural Networks or Multi-Layer Perceptrons (MLP)
- Each unit in a layer is connected to every unit in the adjacent layers
- Depth = number of layers = number of weight matrices
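Before the full training loop below, a minimal sketch of just the 2-layer score function $f=W_2\max(0,W_1x)$ with biases; the sizes are illustrative assumptions (D = 3072 as for a flattened CIFAR-10 image), and note that the training code that follows uses a sigmoid activation instead of ReLU.

```python
import numpy as np

D, H, C = 3072, 100, 10                 # assumed input, hidden, and class sizes
x = np.random.randn(D)
W1, W2 = np.random.randn(H, D), np.random.randn(C, H)
b1, b2 = np.zeros(H), np.zeros(C)       # in practice each layer has a bias

hidden = np.maximum(0, W1.dot(x) + b1)  # ReLU activation, shape (H,)
scores = W2.dot(hidden) + b2            # one score per class, shape (C,)
```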
import numpy as np
from numpy.random import randn

# Initialize weights and data
N, Din, H, Dout = 64, 1000, 100, 10
x, y = randn(N, Din), randn(N, Dout)
w1, w2 = randn(Din, H), randn(H, Dout)

for t in range(10000):
    # Forward pass: compute predictions and loss (sigmoid activation, L2 loss)
    h = 1.0 / (1.0 + np.exp(-x.dot(w1)))  # hidden activations, shape (N, H)
    y_pred = h.dot(w2)                     # predictions, shape (N, Dout)
    loss = np.square(y_pred - y).sum()

    # Backward pass: gradients of the loss w.r.t. w2 and w1
    dy_pred = 2.0 * (y_pred - y)
    dw2 = h.T.dot(dy_pred)
    dh = dy_pred.dot(w2.T)
    dw1 = x.T.dot(dh * h * (1 - h))        # sigmoid derivative is h * (1 - h)

    # SGD step
    w1 -= 1e-4 * dw1
    w2 -= 1e-4 * dw2
Activation Function
- A non-linear function applied between the layers; it prevents the neural network from collapsing into just a linear classifier
- Without the ReLU, the 2-layer NN above would reduce to a linear classifier:
$$
s=W_2W_1x=W_3x,\quad W_3=W_2W_1
$$
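A quick numerical check of this collapse, with assumed shapes matching the earlier sketch:

```python
import numpy as np

# Stacking two linear layers with no activation equals one linear layer
# with W3 = W2 @ W1 (shapes here are illustrative assumptions).
W1, W2 = np.random.randn(100, 3072), np.random.randn(10, 100)
x = np.random.randn(3072)

two_layer = W2.dot(W1.dot(x))
one_layer = (W2.dot(W1)).dot(x)
assert np.allclose(two_layer, one_layer)
```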