How do we compute Gradients?

Simple Example

Let's say we have the function $f(x,y,z)=(x+y)z$, evaluated at the point $(x, y, z) = (-2, 5, -4)$.
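
Working this out with the chain rule, using an intermediate $q = x + y$ (a name introduced here for clarity), gives a minimal sketch of the forward and backward pass:

# Forward pass at (x, y, z) = (-2, 5, -4)
x, y, z = -2.0, 5.0, -4.0
q = x + y   # q = 3
f = q * z   # f = -12

# Backward pass: apply the chain rule starting from df/df = 1
grad_f = 1.0
grad_q = grad_f * z    # df/dq = z  -> -4
grad_z = grad_f * q    # df/dz = q  ->  3
grad_x = grad_q * 1.0  # dq/dx = 1  -> df/dx = -4
grad_y = grad_q * 1.0  # dq/dy = 1  -> df/dy = -4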

Patterns in Gradient Flow

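The commonly cited patterns: the add gate distributes the upstream gradient unchanged to its inputs, the multiply gate acts as a "swap multiplier" (each input receives the upstream gradient scaled by the other input), the copy gate (one value reused in several places) sums the upstream gradients, and the max gate routes the full gradient to whichever input was larger. A minimal sketch of these local rules (function names here are illustrative, not from the original figure):

def add_backward(grad_out):               # z = x + y: distribute the upstream gradient
	return grad_out, grad_out

def mul_backward(grad_out, x, y):         # z = x * y: "swap multiplier"
	return grad_out * y, grad_out * x

def copy_backward(grad_out1, grad_out2):  # x reused in two places: sum the gradients
	return grad_out1 + grad_out2

def max_backward(grad_out, x, y):         # z = max(x, y): route gradient to the larger input
	return (grad_out, 0.0) if x >= y else (0.0, grad_out)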

Back-Propagation Implementation

“Flat” Gradient Code

Given $f = \sigma(w_0 x_0 + w_1 x_1 + w_2)$, where $\sigma(s) = \frac{1}{1 + e^{-s}}$ is the sigmoid. Its derivative $\sigma'(s) = \sigma(s)\,(1 - \sigma(s))$ is what the backward pass below uses for `grad_s3`.

import math

def sigmoid(s):
	return 1.0 / (1.0 + math.exp(-s))

def f(w0, x0, w1, x1, w2):
	# Forward Pass
	s0 = w0 * x0
	s1 = w1 * x1
	s2 = s0 + s1
	s3 = s2 + w2
	L = sigmoid(s3)

	# Backward Pass (Back-Prop), starting from dL/dL = 1
	grad_L = 1.0
	grad_s3 = grad_L * (1 - L) * L  # sigmoid'(s3) = L * (1 - L)

	# add gates: distribute the upstream gradient
	grad_w2 = grad_s3
	grad_s2 = grad_s3

	grad_s0 = grad_s2
	grad_s1 = grad_s2

	# multiply gates: swap the inputs
	grad_w1 = grad_s1 * x1
	grad_x1 = grad_s1 * w1

	grad_w0 = grad_s0 * x0
	grad_x0 = grad_s0 * w0

	return L, (grad_w0, grad_x0, grad_w1, grad_x1, grad_w2)
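
A quick sanity check, assuming f returns the loss and gradients as above: the input values are arbitrary, and one analytic gradient is compared against a centered finite-difference estimate.

w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0
L, (grad_w0, grad_x0, grad_w1, grad_x1, grad_w2) = f(w0, x0, w1, x1, w2)

# Numerical gradient of L with respect to w0
eps = 1e-5
L_plus, _ = f(w0 + eps, x0, w1, x1, w2)
L_minus, _ = f(w0 - eps, x0, w1, x1, w2)
numeric_grad_w0 = (L_plus - L_minus) / (2 * eps)

print(grad_w0, numeric_grad_w0)  # should agree to several decimal places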

Modular API

Define a forward and a backward pass for each type of gate (node) in the computation graph.


# Pseudo Code
class ComputationGraph(object):
	def forward(self, inputs):
		# forward pass: visit nodes in topological order
		for gate in self.graph.nodes_topologically_sorted():
			gate.forward()
		return loss

	def backward(self):
		# backward pass: visit nodes in reverse topological order
		for gate in reversed(self.graph.nodes_topologically_sorted()):
			gate.backward()
		return inputs_gradients

# Example code for a Multiply gate
import torch

class Multiply(torch.autograd.Function):
	@staticmethod
	def forward(ctx, x, y):
		# save the inputs for use in the backward pass
		ctx.save_for_backward(x, y)
		z = x * y
		return z

	@staticmethod
	def backward(ctx, grad_z):
		x, y = ctx.saved_tensors
		grad_x = y * grad_z # dz/dx * dL/dz
		grad_y = x * grad_z # dz/dy * dL/dz
		return grad_x, grad_y
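
A short usage sketch, assuming the Multiply class above and a working PyTorch install; custom autograd Functions are invoked through their .apply method:

x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(-4.0, requires_grad=True)

z = Multiply.apply(x, y)  # forward pass through the custom gate
z.backward()              # autograd calls Multiply.backward with dL/dz = 1

print(x.grad)  # dz/dx = y = -4
print(y.grad)  # dz/dy = x =  3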