The perceptron is a good place to start before getting into deep learning, as it is the basic unit from which artificial neural networks (ANNs) are built. Multiple perceptrons in parallel form a dense layer, and stacking multiple dense layers yields a deep neural network (DNN).


Our first task is to train a perceptron for simple linear regression. By “simple” we mean predicting a single output from multiple inputs.
Purpose of this Notebook:
The purposes of this notebook are to:
Create a dataset for a linear regression task
Create our own Perceptron class from scratch
Implement gradient descent from scratch
Train our Perceptron
Compare our Perceptron to the one prebuilt by PyTorch
import torch
from torch import nn
from platform import python_version

python_version(), torch.__version__
('3.12.12', '2.9.0+cu128')

device = 'cpu'
if torch.cuda.is_available():
    device = 'cuda'
device
'cpu'

torch.set_default_dtype(torch.float64)

def add_to_class(Class):
    """Register functions as methods in created class."""
    def wrapper(obj):
        setattr(Class, obj.__name__, obj)
    return wrapper
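To see how add_to_class works, here is a minimal sketch; the Toy class and the shout method are hypothetical names used only for this illustration:

class Toy:
    pass

@add_to_class(Toy)
def shout(self) -> str:
    # registered on Toy after the class was defined
    return 'hello from a method added later'

print(Toy().shout())  # hello from a method added later

Note that wrapper returns nothing, so the module-level name shout becomes None after decoration; only the attribute registered on the class matters.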
Dataset¶
create dataset¶
For our supervised task, we need two sets $\mathbf{X}$ and $\mathbf{Y}$, where $\mathbf{X}$ is called the input data and $\mathbf{Y}$ is called the target data. In this case, $\mathbf{X}$ and $\mathbf{Y}$ are a matrix and a vector respectively,

$$\mathbf{X} \in \mathbb{R}^{M \times N}, \qquad \mathbf{Y} \in \mathbb{R}^{M}$$

where $M$ is the number of samples and $N$ is the number of features.
from sklearn.datasets import make_regression
import random

M: int = 10_100  # number of samples
N: int = 4  # number of features

X, Y = make_regression(
    n_samples=M,
    n_features=N,
    n_targets=1,
    n_informative=N - 1,
    bias=random.random(),  # random true bias
    noise=1
)
print(X.shape)
print(Y.shape)
(10100, 4)
(10100,)
split dataset¶
X_train = torch.tensor(X[:100], device=device)
Y_train = torch.tensor(Y[:100], device=device)
X_train.shape, Y_train.shape
(torch.Size([100, 4]), torch.Size([100]))

X_valid = torch.tensor(X[100:], device=device)
Y_valid = torch.tensor(Y[100:], device=device)
X_valid.shape, Y_valid.shape
(torch.Size([10000, 4]), torch.Size([10000]))

delete raw dataset¶
del X
del Y

Scratch model¶
weight and bias¶
Trainable parameters

$$\mathbf{w} \in \mathbb{R}^{N}, \qquad b \in \mathbb{R}$$

where $\mathbf{w}$ is called the weight and $b$ is called the bias.
class SimpleLinearRegression:
    def __init__(self, n_features: int):
        self.w = torch.randn(n_features, device=device)
        self.b = torch.randn(1, device=device)

    def copy_params(self, torch_layer: nn.modules.linear.Linear):
        """
        Copy the parameters from a nn.Linear layer to this model.

        Args:
            torch_layer: PyTorch module from which to copy the parameters.
        """
        self.b.copy_(torch_layer.bias.detach().clone())
        self.w.copy_(torch_layer.weight[0, :].detach().clone())

weighted sum¶
The prediction is the weighted sum

$$\hat{\mathbf{Y}} = \mathbf{X} \mathbf{w} + b$$

where $\hat{\mathbf{Y}} \in \mathbb{R}^{M}$ is called the predicted output data, and element-wise

$$\hat{y}_i = \sum_{j=1}^{N} x_{ij} w_j + b$$

where $i \in \{1, \ldots, M\}$.
@add_to_class(SimpleLinearRegression)
def predict(self, x: torch.Tensor) -> torch.Tensor:
    """
    Predict the output for input x.

    Args:
        x: Input tensor of shape (n_samples, n_features).
    Returns:
        y_pred: Predicted output tensor of shape (n_samples,).
    """
    return torch.matmul(x, self.w) + self.b
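As a quick sanity check that the vectorized torch.matmul matches the per-sample sum above, here is a minimal sketch; demo_model is a throwaway instance assumed only for this illustration, and it reuses X_train from earlier:

# compare vectorized predict with an explicit per-sample weighted sum
demo_model = SimpleLinearRegression(N)
manual = torch.stack([
    (x_i * demo_model.w).sum() + demo_model.b[0]
    for x_i in X_train[:5]
])
torch.allclose(demo_model.predict(X_train[:5]), manual)  # True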
MSE¶
We need a loss function. We will use the Mean Squared Error (MSE)

$$\mathcal{L} = \frac{1}{M} \sum_{i=1}^{M} (\hat{y}_i - y_i)^2$$

Vectorized form

$$\mathcal{L} = \frac{1}{M} (\hat{\mathbf{Y}} - \mathbf{Y})^\top (\hat{\mathbf{Y}} - \mathbf{Y})$$
@add_to_class(SimpleLinearRegression)
def mse_loss(self, y_true: torch.Tensor, y_pred: torch.Tensor):
    """
    MSE loss function between target y_true and y_pred.

    Args:
        y_true: Target tensor of shape (n_samples,).
        y_pred: Predicted tensor of shape (n_samples,).
    Returns:
        loss: MSE loss between predictions and true values.
    """
    return ((y_pred - y_true)**2).mean().item()

@add_to_class(SimpleLinearRegression)
def evaluate(self, x: torch.Tensor, y_true: torch.Tensor):
    """
    Evaluate the model on input x and target y_true using MSE.

    Args:
        x: Input tensor of shape (n_samples, n_features).
        y_true: Target tensor of shape (n_samples,).
    Returns:
        loss: MSE loss between predictions and true values.
    """
    y_pred = self.predict(x)
    return self.mse_loss(y_true, y_pred)
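Our mse_loss should agree with PyTorch's built-in nn.MSELoss. A quick cross-check on random tensors; y_a and y_b are hypothetical names used only for this sketch:

# cross-check our MSE against torch.nn.MSELoss on the same tensors
y_a = torch.randn(8, device=device)
y_b = torch.randn(8, device=device)
scratch_mse = SimpleLinearRegression(N).mse_loss(y_a, y_b)
torch_mse = nn.MSELoss()(y_b, y_a).item()
abs(scratch_mse - torch_mse) < 1e-12  # True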
computing gradients¶
Gradient descent is

$$\theta \leftarrow \theta - \eta \frac{\partial \mathcal{L}}{\partial \theta} \quad \text{for } \theta \in \{\mathbf{w}, b\}$$

where their shapes are

$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}} \in \mathbb{R}^{N}, \qquad \frac{\partial \mathcal{L}}{\partial b} \in \mathbb{R}$$

MSE derivative¶
$$\frac{\partial \mathcal{L}}{\partial \hat{y}_i} = \frac{2}{M} (\hat{y}_i - y_i)$$

for all $i \in \{1, \ldots, M\}$, therefore

$$\frac{\partial \mathcal{L}}{\partial \hat{\mathbf{Y}}} = \frac{2}{M} (\hat{\mathbf{Y}} - \mathbf{Y})$$

weighted sum derivative¶
respect to bias¶
$$\frac{\partial \hat{y}_i}{\partial b} = 1$$

for all $i \in \{1, \ldots, M\}$, therefore by the chain rule

$$\frac{\partial \mathcal{L}}{\partial b} = \sum_{i=1}^{M} \frac{\partial \mathcal{L}}{\partial \hat{y}_i} \frac{\partial \hat{y}_i}{\partial b} = \frac{2}{M} \sum_{i=1}^{M} (\hat{y}_i - y_i)$$

respect to weight¶
$$\frac{\partial \hat{y}_i}{\partial w_j} = x_{ij}$$

for all $i \in \{1, \ldots, M\}$ and $j \in \{1, \ldots, N\}$, therefore

$$\frac{\partial \mathcal{L}}{\partial w_j} = \sum_{i=1}^{M} \frac{\partial \mathcal{L}}{\partial \hat{y}_i} \frac{\partial \hat{y}_i}{\partial w_j} = \frac{2}{M} \sum_{i=1}^{M} (\hat{y}_i - y_i) x_{ij}$$

gradients¶
In vectorized form,

$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = \frac{2}{M} \mathbf{X}^\top (\hat{\mathbf{Y}} - \mathbf{Y}), \qquad \frac{\partial \mathcal{L}}{\partial b} = \frac{2}{M} \sum_{i=1}^{M} (\hat{y}_i - y_i)$$

parameters update¶
$$\mathbf{w} \leftarrow \mathbf{w} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{w}}, \qquad b \leftarrow b - \eta \frac{\partial \mathcal{L}}{\partial b}$$

where $\eta$ is called the learning rate.
@add_to_class(SimpleLinearRegression)
def update(self, x: torch.Tensor, y_true: torch.Tensor, y_pred: torch.Tensor, lr: float):
    """
    Update the model parameters.

    Args:
        x: Input tensor of shape (n_samples, n_features).
        y_true: Target tensor of shape (n_samples,).
        y_pred: Predicted output tensor of shape (n_samples,).
        lr: Learning rate.
    """
    delta = 2 * (y_pred - y_true) / len(y_true)  # dL/dY_hat
    self.b -= lr * delta.sum()
    self.w -= lr * torch.matmul(delta, x)
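One way to gain confidence in the analytic bias gradient used by update is a central finite-difference check. A minimal sketch; check_model, eps, and the chosen tolerance are assumptions for this illustration:

# perturb the bias and compare the numerical slope of the loss
# with the analytic gradient (2/M) * sum(y_pred - y_true)
check_model = SimpleLinearRegression(N)
eps = 1e-6
analytic = (2 * (check_model.predict(X_train) - Y_train) / len(Y_train)).sum().item()

check_model.b += eps
loss_plus = check_model.evaluate(X_train, Y_train)
check_model.b -= 2 * eps
loss_minus = check_model.evaluate(X_train, Y_train)
check_model.b += eps  # restore the original bias

numerical = (loss_plus - loss_minus) / (2 * eps)
abs(analytic - numerical) < 1e-4  # expected: True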
gradient descent¶
So far we have assumed that the entire training dataset is used for every parameter update, but we can also use only a fraction of the samples per update. There are mainly three ways to apply gradient descent (GD):
batch GD
stochastic GD (SGD)
mini-batch GD
batch GD¶
The batch GD uses all samples of the training dataset to update our parameters:

$$\theta_{t+1} = \theta_t - \eta \frac{\partial \mathcal{L}(\mathbf{X}, \mathbf{Y})}{\partial \theta_t}$$

where $t \in \{0, \ldots, T-1\}$ and $T$ is the number of epochs.
Remark: $\theta$ is an arbitrary parameter; for this model we have to update $\mathbf{w}$ and $b$.
stochastic GD (SGD)¶
With SGD, in each epoch we update our parameters once for every single sample in the training dataset:

$$\theta_{t+1} = \theta_t - \eta \frac{\partial \mathcal{L}(\mathbf{x}_i, y_i)}{\partial \theta_t}$$

where $\mathbf{x}_i$ and $y_i$ are the $i$-th input and $i$-th output sample of the training dataset respectively.
Note: $\mathbf{x}_i \in \mathbb{R}^{1 \times N}$ and $y_i \in \mathbb{R}$.
mini-batch GD¶
The mini-batch GD is intermediate between SGD and batch GD, since each parameter update uses a fraction of the dataset larger than one sample but smaller than the whole training set:

$$\theta_{t+1} = \theta_t - \eta \frac{\partial \mathcal{L}(\mathbf{X}_{i:i+B}, \mathbf{Y}_{i:i+B})}{\partial \theta_t}$$

where $B$ is the number of samples per mini-batch and $\mathbf{X}_{i:i+B}$ and $\mathbf{Y}_{i:i+B}$ are the $i$-th to $(i+B)$-th samples.
Note: If $B = 1$, then mini-batch GD becomes SGD. And if $B = M$, then mini-batch GD becomes batch GD. The fit method below covers all three cases through its batch_size argument, as shown in the sketch after this note.
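The following sketch shows how the inner loop of fit (defined next) partitions the training set; B = 32 is an arbitrary mini-batch size chosen only for this illustration:

# one parameter update happens per slice; with B = 1 this loop is SGD,
# and with B = len(Y_train) it is batch GD
B = 32
n_updates = 0
for start in range(0, len(Y_train), B):
    xb = X_train[start:start + B]  # up to B samples
    yb = Y_train[start:start + B]
    n_updates += 1
print(n_updates)  # ceil(100 / 32) = 4 updates per epoch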
@add_to_class(SimpleLinearRegression)
def fit(self, x: torch.Tensor, y: torch.Tensor,
        epochs: int, lr: float, batch_size: int,
        x_valid: torch.Tensor, y_valid: torch.Tensor):
    """
    Fit the model using gradient descent.

    Args:
        x: Input tensor of shape (n_samples, n_features).
        y: Target tensor of shape (n_samples,).
        epochs: Number of epochs to fit.
        lr: Learning rate.
        batch_size: Number of samples per mini-batch.
        x_valid: Input tensor of shape (n_valid_samples, n_features).
        y_valid: Target tensor of shape (n_valid_samples,).
    """
    for epoch in range(epochs):
        loss = []
        for batch in range(0, len(y), batch_size):
            end_batch = batch + batch_size
            y_pred = self.predict(x[batch:end_batch])
            loss.append(self.mse_loss(
                y[batch:end_batch],
                y_pred
            ))
            self.update(
                x[batch:end_batch],
                y[batch:end_batch],
                y_pred,
                lr
            )
        loss = round(sum(loss) / len(loss), 4)
        loss_v = round(self.evaluate(x_valid, y_valid), 4)
        print(f'epoch: {epoch} - MSE: {loss} - MSE_v: {loss_v}')

Scratch vs Torch.nn¶
Torch.nn model¶
class TorchLinearRegression(nn.Module):
    def __init__(self, n_features):
        super(TorchLinearRegression, self).__init__()
        self.layer = nn.Linear(n_features, 1, device=device)
        self.loss = nn.MSELoss()

    def forward(self, x):
        return self.layer(x)

    def evaluate(self, x, y):
        self.eval()
        with torch.no_grad():
            y_pred = self.forward(x)
            return self.loss(y_pred, y).item()

    def fit(self, x, y, epochs, lr, batch_size, x_valid, y_valid):
        optimizer = torch.optim.SGD(self.parameters(), lr=lr)
        for epoch in range(epochs):
            loss_t = []  # train loss
            for batch in range(0, len(y), batch_size):
                end_batch = batch + batch_size
                y_pred = self.forward(x[batch:end_batch])
                loss = self.loss(y_pred, y[batch:end_batch])
                loss_t.append(loss.item())
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            loss_t = round(sum(loss_t) / len(loss_t), 4)
            loss_v = round(self.evaluate(x_valid, y_valid), 4)
            print(f'epoch: {epoch} - MSE: {loss_t} - MSE_v: {loss_v}')
        optimizer.zero_grad()

torch_model = TorchLinearRegression(N)
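As a quick check, nn.Linear(N, 1) holds exactly as many trainable parameters as our scratch model: N weights plus one bias.

# count trainable parameters of the PyTorch model: N weights + 1 bias
sum(p.numel() for p in torch_model.parameters())  # 5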
scratch model¶
model = SimpleLinearRegression(N)

evals¶
import MAPE modified¶
# This cell imports torch_mape
# if you are running this notebook locally
# or from Google Colab.
import os
import sys

module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

try:
    from tools.torch_metrics import torch_mape as mape
    print('mape imported locally.')
except ModuleNotFoundError:
    import subprocess
    repo_url = 'https://raw.githubusercontent.com/PilotLeoYan/inside-deep-learning/main/content/tools/torch_metrics.py'
    local_file = 'torch_metrics.py'
    subprocess.run(['wget', repo_url, '-O', local_file], check=True)
    try:
        from torch_metrics import torch_mape as mape  # type: ignore
        print('mape imported from GitHub.')
    except Exception as e:
        print(e)
mape imported locally.
predict¶
mape(
    model.predict(X_valid),
    torch_model.forward(X_valid).squeeze(-1)
)
3447.2241178621794

copy parameters¶
model.copy_params(torch_model.layer)

predict after copy parameters¶
mape(
    model.predict(X_valid),
    torch_model.forward(X_valid).squeeze(-1)
)
0.0

loss¶
mape(
    model.evaluate(X_valid, Y_valid),
    torch_model.evaluate(X_valid, Y_valid.unsqueeze(-1))
)
0.0

training¶
LR = 0.01  # learning rate
EPOCHS = 16  # number of epochs
BATCH = len(X_train) // 3  # mini-batch size

torch_model.fit(
    X_train,
    Y_train.unsqueeze(-1),
    EPOCHS, LR, BATCH,
    X_valid,
    Y_valid.unsqueeze(-1)
)
epoch: 0 - MSE: 8327.4864 - MSE_v: 8157.4615
epoch: 1 - MSE: 7137.4902 - MSE_v: 7025.4753
epoch: 2 - MSE: 6120.8559 - MSE_v: 6054.2807
epoch: 3 - MSE: 5251.8738 - MSE_v: 5220.5685
epoch: 4 - MSE: 4508.7119 - MSE_v: 4504.4714
epoch: 5 - MSE: 3872.8212 - MSE_v: 3889.0472
epoch: 6 - MSE: 3328.4343 - MSE_v: 3359.8405
epoch: 7 - MSE: 2862.1411 - MSE_v: 2904.5115
epoch: 8 - MSE: 2462.5309 - MSE_v: 2512.5203
epoch: 9 - MSE: 2119.8892 - MSE_v: 2174.8604
epoch: 10 - MSE: 1825.9413 - MSE_v: 1883.831
epoch: 11 - MSE: 1573.6354 - MSE_v: 1632.8447
epoch: 12 - MSE: 1356.9592 - MSE_v: 1416.2634
epoch: 13 - MSE: 1170.7837 - MSE_v: 1229.2591
epoch: 14 - MSE: 1010.7313 - MSE_v: 1067.6952
epoch: 15 - MSE: 873.0641 - MSE_v: 928.0258
model.fit(
    X_train, Y_train,
    EPOCHS, LR, BATCH,
    X_valid, Y_valid
)
epoch: 0 - MSE: 8327.4864 - MSE_v: 8157.4615
epoch: 1 - MSE: 7137.4902 - MSE_v: 7025.4753
epoch: 2 - MSE: 6120.8559 - MSE_v: 6054.2807
epoch: 3 - MSE: 5251.8738 - MSE_v: 5220.5685
epoch: 4 - MSE: 4508.7119 - MSE_v: 4504.4714
epoch: 5 - MSE: 3872.8212 - MSE_v: 3889.0472
epoch: 6 - MSE: 3328.4343 - MSE_v: 3359.8405
epoch: 7 - MSE: 2862.1411 - MSE_v: 2904.5115
epoch: 8 - MSE: 2462.5309 - MSE_v: 2512.5203
epoch: 9 - MSE: 2119.8892 - MSE_v: 2174.8604
epoch: 10 - MSE: 1825.9413 - MSE_v: 1883.831
epoch: 11 - MSE: 1573.6354 - MSE_v: 1632.8447
epoch: 12 - MSE: 1356.9592 - MSE_v: 1416.2634
epoch: 13 - MSE: 1170.7837 - MSE_v: 1229.2591
epoch: 14 - MSE: 1010.7313 - MSE_v: 1067.6952
epoch: 15 - MSE: 873.0641 - MSE_v: 928.0258
predict after training¶
mape(
    model.predict(X_valid),
    torch_model.forward(X_valid).squeeze(-1)
)
5.088568285940422e-14

weight¶
mape(
    model.w.clone(),
    torch_model.layer.weight.detach().squeeze(0)
)
1.8137912640341185e-14

bias¶
mape(
    model.b.clone(),
    torch_model.layer.bias.detach()
)
0.0
