How do we compute Gradients?

Simple Example

Let's say we have the function $f(x,y,z)=(x+y)z$, evaluated at the point $(x, y, z) = (-2, 5, -4)$.
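
Working this out with the chain rule, using an intermediate $q = x + y$ (a name introduced here for clarity), gives a minimal sketch of the forward and backward pass:

# Forward pass at (x, y, z) = (-2, 5, -4)
x, y, z = -2.0, 5.0, -4.0
q = x + y   # q = 3
f = q * z   # f = -12

# Backward pass: apply the chain rule starting from df/df = 1
grad_f = 1.0
grad_q = grad_f * z    # df/dq = z  -> -4
grad_z = grad_f * q    # df/dz = q  ->  3
grad_x = grad_q * 1.0  # dq/dx = 1  -> df/dx = -4
grad_y = grad_q * 1.0  # dq/dy = 1  -> df/dy = -4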

Patterns in Gradient Flow

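The commonly cited patterns: the add gate distributes the upstream gradient unchanged to its inputs, the multiply gate acts as a "swap multiplier" (each input receives the upstream gradient scaled by the other input), the copy gate (one value reused in several places) sums the upstream gradients, and the max gate routes the full gradient to whichever input was larger. A minimal sketch of these local rules (function names here are illustrative, not from the original figure):

def add_backward(grad_out):               # z = x + y: distribute the upstream gradient
	return grad_out, grad_out

def mul_backward(grad_out, x, y):         # z = x * y: "swap multiplier"
	return grad_out * y, grad_out * x

def copy_backward(grad_out1, grad_out2):  # x reused in two places: sum the gradients
	return grad_out1 + grad_out2

def max_backward(grad_out, x, y):         # z = max(x, y): route gradient to the larger input
	return (grad_out, 0.0) if x >= y else (0.0, grad_out)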

Back-Propagation Implementation

“Flat” Gradient Code

Given $f = \sigma(w_0 x_0 + w_1 x_1 + w_2)$, where $\sigma(s) = \frac{1}{1 + e^{-s}}$ is the sigmoid. Its derivative $\sigma'(s) = \sigma(s)\,(1 - \sigma(s))$ is what the backward pass below uses for `grad_s3`.

import math

def sigmoid(s):
	return 1.0 / (1.0 + math.exp(-s))

def f(w0, x0, w1, x1, w2):
	# Forward Pass
	s0 = w0 * x0
	s1 = w1 * x1
	s2 = s0 + s1
	s3 = s2 + w2
	L = sigmoid(s3)

	# Backward Pass (Back-Prop), starting from dL/dL = 1
	grad_L = 1.0
	grad_s3 = grad_L * (1 - L) * L  # sigmoid'(s3) = L * (1 - L)

	# add gates: distribute the upstream gradient
	grad_w2 = grad_s3
	grad_s2 = grad_s3

	grad_s0 = grad_s2
	grad_s1 = grad_s2

	# multiply gates: swap the inputs
	grad_w1 = grad_s1 * x1
	grad_x1 = grad_s1 * w1

	grad_w0 = grad_s0 * x0
	grad_x0 = grad_s0 * w0

	return L, (grad_w0, grad_x0, grad_w1, grad_x1, grad_w2)
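
A quick sanity check, assuming f returns the loss and gradients as above: the input values are arbitrary, and one analytic gradient is compared against a centered finite-difference estimate.

w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0
L, (grad_w0, grad_x0, grad_w1, grad_x1, grad_w2) = f(w0, x0, w1, x1, w2)

# Numerical gradient of L with respect to w0
eps = 1e-5
L_plus, _ = f(w0 + eps, x0, w1, x1, w2)
L_minus, _ = f(w0 - eps, x0, w1, x1, w2)
numeric_grad_w0 = (L_plus - L_minus) / (2 * eps)

print(grad_w0, numeric_grad_w0)  # should agree to several decimal places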

Modular API

Define a forward and a backward pass for each type of gate (node) in the computation graph.


# Pseudo Code
class ComputationGraph(object):
	def forward(self, inputs):
		# forward pass: visit nodes in topological order
		for gate in self.graph.nodes_topologically_sorted():
			gate.forward()
		return loss

	def backward(self):
		# backward pass: visit nodes in reverse topological order
		for gate in reversed(self.graph.nodes_topologically_sorted()):
			gate.backward()
		return inputs_gradients

# Example code for a Multiply gate
import torch

class Multiply(torch.autograd.Function):
	@staticmethod
	def forward(ctx, x, y):
		# save the inputs for use in the backward pass
		ctx.save_for_backward(x, y)
		z = x * y
		return z

	@staticmethod
	def backward(ctx, grad_z):
		x, y = ctx.saved_tensors
		grad_x = y * grad_z # dz/dx * dL/dz
		grad_y = x * grad_z # dz/dy * dL/dz
		return grad_x, grad_y
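
A short usage sketch, assuming the Multiply class above and a working PyTorch install; custom autograd Functions are invoked through their .apply method:

x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(-4.0, requires_grad=True)

z = Multiply.apply(x, y)  # forward pass through the custom gate
z.backward()              # autograd calls Multiply.backward with dL/dz = 1

print(x.grad)  # dz/dx = y = -4
print(y.grad)  # dz/dy = x =  3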