Feature Transform
- To overcome the limits of a linear classifier, we transform the original data into a representation that is better suited to linear classification
- ex) Circular data that is not linearly separable becomes linearly separable after converting Cartesian coordinates to polar coordinates (see the sketch below)
- Need to think about the structure of the data we are dealing with
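A minimal numpy sketch of this idea (the ring radii and the helper name are illustrative, not from the notes): two concentric rings that a linear classifier cannot separate in $(x, y)$ become separable by radius alone after the polar transform.

```python
import numpy as np

def cartesian_to_polar(points):
    """points: (N, 2) array of (x, y); returns (N, 2) array of (r, theta)."""
    x, y = points[:, 0], points[:, 1]
    r = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    return np.stack([r, theta], axis=1)

# Inner circle (class 0) and outer ring (class 1) -- assumed toy data
angles = np.random.uniform(0, 2 * np.pi, 100)
inner = np.stack([1.0 * np.cos(angles), 1.0 * np.sin(angles)], axis=1)
outer = np.stack([3.0 * np.cos(angles), 3.0 * np.sin(angles)], axis=1)

polar = cartesian_to_polar(np.vstack([inner, outer]))
# In polar coordinates the classes split cleanly on r (r < 2 vs r > 2),
# so a linear classifier on the transformed features works.
```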
Image Features
Color Histogram
- Transform the image into a histogram over the colors of its pixels
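A rough sketch of one simple variant, assuming a per-channel intensity histogram (the bin count and helper name are made up for illustration):

```python
import numpy as np

def color_histogram(image, bins_per_channel=8):
    """image: (H, W, 3) uint8 array; returns a flattened histogram feature."""
    hists = []
    for c in range(3):  # one histogram per R, G, B channel
        hist, _ = np.histogram(image[:, :, c], bins=bins_per_channel, range=(0, 256))
        hists.append(hist)
    return np.concatenate(hists).astype(np.float32)

image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
feature = color_histogram(image)  # shape (24,) for 8 bins x 3 channels
```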
Histogram of Oriented Gradients (HoG)
- Compute Edge Direction/Strength at each pixel
- Divide image into 8x8 regions
- Within each region, compute a histogram of edge directions weighted by edge strength
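A simplified sketch of the idea, assuming unsigned gradient orientations and omitting the block normalization of the full HoG descriptor (the function name and sizes are illustrative):

```python
import numpy as np

def hog_sketch(gray, cell=8, n_bins=9):
    """gray: (H, W) float array; returns (H//cell, W//cell, n_bins) histograms."""
    gy, gx = np.gradient(gray)                  # per-pixel gradient components
    mag = np.sqrt(gx ** 2 + gy ** 2)            # edge strength
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned edge direction in degrees
    H, W = gray.shape
    out = np.zeros((H // cell, W // cell, n_bins))
    for i in range(H // cell):
        for j in range(W // cell):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            # Histogram of directions in this cell, weighted by edge strength
            hist, _ = np.histogram(a, bins=n_bins, range=(0, 180), weights=m)
            out[i, j] = hist
    return out

gray = np.random.rand(64, 64)
features = hog_sketch(gray)  # (8, 8, 9) before flattening into a feature vector
```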
Neural Networks
- Linear Score Functions: $f=Wx$
- 2-layer Neural Network: $f=W_2\max(0,W_1x)$
- $W_2\in\mathbb{R}^{C\times H}\quad W_1\in\mathbb{R}^{H\times D}\quad x\in\mathbb{R}^D$
- C = # of classes, H = hidden layer size, D = input dimension
- In practice, each layer also has a bias term
- These are called Fully Connected Neural Networks or Multi-Layer Perceptrons (MLP)
- Each unit in a layer is connected to every unit in the adjacent layers
- Depth = number of layers = number of weight matrices
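Before the full training loop below, a minimal sketch of just the 2-layer score function $f=W_2\max(0,W_1x)$ with biases; the sizes are illustrative assumptions (D = 3072 as for a flattened CIFAR-10 image), and note that the training code that follows uses a sigmoid activation instead of ReLU.

```python
import numpy as np

D, H, C = 3072, 100, 10                 # assumed input, hidden, and class sizes
x = np.random.randn(D)
W1, W2 = np.random.randn(H, D), np.random.randn(C, H)
b1, b2 = np.zeros(H), np.zeros(C)       # in practice each layer has a bias

hidden = np.maximum(0, W1.dot(x) + b1)  # ReLU activation, shape (H,)
scores = W2.dot(hidden) + b2            # one score per class, shape (C,)
```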
import numpy as np
from numpy.random import randn

# Initialize weights and data
N, Din, H, Dout = 64, 1000, 100, 10
x, y = randn(N, Din), randn(N, Dout)
w1, w2 = randn(Din, H), randn(H, Dout)

for t in range(10000):
    # Forward pass: compute predictions and loss (sigmoid activation, L2 loss)
    h = 1.0 / (1.0 + np.exp(-x.dot(w1)))  # hidden activations, shape (N, H)
    y_pred = h.dot(w2)                     # predictions, shape (N, Dout)
    loss = np.square(y_pred - y).sum()

    # Backward pass: gradients of the loss w.r.t. w2 and w1
    dy_pred = 2.0 * (y_pred - y)
    dw2 = h.T.dot(dy_pred)
    dh = dy_pred.dot(w2.T)
    dw1 = x.T.dot(dh * h * (1 - h))        # sigmoid derivative is h * (1 - h)

    # SGD step
    w1 -= 1e-4 * dw1
    w2 -= 1e-4 * dw2
Activation Function
- A non-linear function applied between the layers; it prevents the neural network from collapsing into just a linear classifier
- Without the ReLU, the 2-layer NN above would reduce to a linear classifier:
$$
s=W_2W_1x=W_3x,\quad W_3=W_2W_1
$$
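A quick numerical check of this collapse, with assumed shapes matching the earlier sketch:

```python
import numpy as np

# Stacking two linear layers with no activation equals one linear layer
# with W3 = W2 @ W1 (shapes here are illustrative assumptions).
W1, W2 = np.random.randn(100, 3072), np.random.randn(10, 100)
x = np.random.randn(3072)

two_layer = W2.dot(W1.dot(x))
one_layer = (W2.dot(W1)).dot(x)
assert np.allclose(two_layer, one_layer)
```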