Neural Network Fundamentals: Understanding the Core Components

Feb 14, 2023

Once upon a time, there lived the wisest and most powerful creatures in the world, known as the “Neurons.” The Neurons could solve any problem that came their way.

One day, the Neurons faced a challenge. Various mystical creatures inhabited their wonderful world, and the Neurons wanted to classify them into two groups based on their unique powers and abilities.

To solve this problem, the Neurons wielded their powers to create an incredible classification system. They built a network made up of many Neurons. Each Neuron was responsible for a specific task, and the Neurons could communicate with one another to reach a final decision.

🦸 A neuron is a mathematical function that represents the basic processing unit of a neural network. A neuron takes in multiple inputs (a vector of values), computes a weighted sum of these inputs, and passes the result through an activation function.

$$ z = \sum_{i=1}^{n} w_i x_i + b $$

where:
w are the weights of the neuron
x are the inputs to the neuron
n is the number of inputs
b is the bias
z is the weighted sum

The weighted sum z is passed through an activation function, which determines the neuron's output based on the value of z.
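As a worked example of this formula, here is a single neuron computed by hand in plain Python. The weights, inputs, and bias are made-up values chosen only for illustration, and the sigmoid is used as the activation function, as in the rest of this article.

```python
import math

# Made-up values for illustration: n = 3 inputs
w = [0.5, -1.0, 2.0]   # weights
x = [1.0, 2.0, 0.5]    # inputs
b = 0.1                # bias

# Weighted sum: z = w1*x1 + w2*x2 + w3*x3 + b
z = sum(wi * xi for wi, xi in zip(w, x)) + b

# Sigmoid activation turns z into the neuron's output
output = 1.0 / (1.0 + math.exp(-z))

print(round(z, 4))       # -0.4
print(round(output, 4))  # 0.4013
```

By hand: z = 0.5 - 2.0 + 1.0 + 0.1 = -0.4, and a negative z gives an output below 0.5.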

import torch
import torch.nn as nn

class Net(nn.Module):

    def __init__(self, input_dim):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_dim, 1)

    def forward(self, x_in):
        return torch.sigmoid(self.fc1(x_in))

The Net class inherits from PyTorch’s nn.Module, which provides convenient functionality for building neural networks. The __init__ method sets up the model's linear layer, which takes input_dim input features and outputs a single value. The forward method defines the model's forward pass: the input x_in is passed through the linear layer, and the result is transformed by the sigmoid activation function to produce the final output.
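To check that Net really computes the weighted sum followed by a sigmoid, the sketch below compares its output with the formula applied directly to the layer's own parameters, net.fc1.weight and net.fc1.bias. The batch of four random samples and the seed are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 1)

    def forward(self, x_in):
        return torch.sigmoid(self.fc1(x_in))

torch.manual_seed(0)
net = Net(input_dim=5)
x = torch.randn(4, 5)  # a batch of 4 samples, 5 features each

out = net(x)  # shape (4, 1): one probability per sample

# The same computation written out: sigmoid(x . w + b) for every row of x
W, b = net.fc1.weight, net.fc1.bias  # shapes (1, 5) and (1,)
manual = torch.sigmoid(x @ W.T + b)

print(out.shape)                     # torch.Size([4, 1])
print(torch.allclose(out, manual))   # True
```

nn.Linear stores its weight with shape (out_features, in_features), which is why the manual version uses W.T.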

But how did these Neurons decide what information to pass on and what to discard? This remained a mystery until they discovered activation functions. Activation functions were like the guardians of each Neuron: they decided whether information could pass through it or not.

The most popular activation function was the Sigmoid function, a wise old dragon. The Sigmoid dragon would listen to the information and decide if it was necessary based on its value, and if it were, the dragon would allow it to pass. Otherwise, it would be discarded.

🐉 The sigmoid function, σ(x) = 1 / (1 + e^(-x)), maps any real number to the open interval (0, 1), with a smooth S-shaped transition in between.

# sigmoid function
# https://en.wikipedia.org/wiki/Sigmoid_function
import torch
import matplotlib.pyplot as plt

# 81 evenly spaced points from -4 to 4 (torch.range is deprecated)
x = torch.linspace(-4.0, 4.0, steps=81)
y = torch.sigmoid(x)
plt.plot(x.numpy(), y.numpy())
plt.show()
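A quick numerical check of the sigmoid formula, σ(x) = 1 / (1 + e^(-x)): the sketch below compares torch.sigmoid with the formula written out by hand, and confirms two properties visible in the plot. The output is 0.5 at x = 0, and the curve is symmetric around that point, σ(-x) = 1 - σ(x).

```python
import torch

x = torch.linspace(-4.0, 4.0, steps=81)

# Sigmoid written out by hand: 1 / (1 + e^(-x))
manual = 1.0 / (1.0 + torch.exp(-x))
print(torch.allclose(torch.sigmoid(x), manual))  # True

# The midpoint of the S-curve: sigma(0) = 0.5
print(torch.sigmoid(torch.tensor(0.0)).item())   # 0.5

# Symmetry around the midpoint: sigma(-x) = 1 - sigma(x)
print(torch.allclose(torch.sigmoid(-x), 1 - torch.sigmoid(x), atol=1e-6))  # True
```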

However, passing information through the network was not enough. The Neurons wanted to make sure the network was learning and improving. This was where loss functions came into play. Loss functions were like referees in a game: they judged the network’s performance and provided feedback on how to improve it.

The Binary Cross-Entropy loss function was represented by a kind and fair unicorn. The Binary Cross-Entropy unicorn would calculate the difference between the network’s predictions and the actual output and provide a score indicating how well the network was doing.

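🦄 Binary Cross-Entropy Loss measures the discrepancy between the predicted probability of an event and the actual label. For a data point with label y and predicted probability p of the positive class, the loss is -log(p) when y = 1 and -log(1 - p) when y = 0. These values are then averaged over all N data points (nn.BCELoss averages by default):

$$ \mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right] $$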

# binary cross-entropy loss
import torch
import torch.nn as nn

# Define the loss function
bce_loss = nn.BCELoss()
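To see what nn.BCELoss computes, the sketch below applies it to three made-up predictions and compares the result with the binary cross-entropy formula, -(y log p + (1 - y) log(1 - p)) averaged over the samples, written out by hand. The prediction and label values are hypothetical.

```python
import torch
import torch.nn as nn

y_pred = torch.tensor([0.9, 0.2, 0.6])  # hypothetical predicted probabilities of class 1
y_true = torch.tensor([1.0, 0.0, 1.0])  # ground-truth labels (must be floats for BCELoss)

bce_loss = nn.BCELoss()  # reduction='mean' by default
auto = bce_loss(y_pred, y_true)

# Same quantity by hand: -log(p) for label 1, -log(1 - p) for label 0, then the mean
manual = -(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred)).mean()

print(auto.item(), manual.item())  # both about 0.2798
```

The per-sample losses are -log(0.9), -log(0.8), and -log(0.6): the confident correct prediction (0.9 for a 1) is penalized least, and the hesitant one (0.6 for a 1) most.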

Finally, the Neurons also used an “Optimization” algorithm. Optimization algorithms, such as Adam, were like coaches who would help the network adjust its parameters to minimize the loss and improve its performance. The optimization algorithms were represented by a group of energetic and enthusiastic fairies who loved to see the network improve with each iteration.

🧚 Adam (Adaptive Moment Estimation) is an optimization algorithm that maintains exponential moving averages of the gradients (the first moment) and of the squared gradients (the second moment), and uses them to adapt the learning rate for each weight parameter in the model.

import torch.optim as optim

lr = 0.01
input_dim = 5

net = Net(input_dim=input_dim)
optimizer = optim.Adam(params=net.parameters(), lr=lr)
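To make the description of Adam concrete, the sketch below writes out the textbook Adam update (moving averages of the gradient and of the squared gradient, bias correction, per-parameter step) for a single scalar, and runs it side by side with torch.optim.Adam. The toy objective f(θ) = (θ - 3)², the five steps, and the float64 dtype are illustrative choices; the hyperparameters are PyTorch's defaults (betas=(0.9, 0.999), eps=1e-8) with lr=0.01 as in the article.

```python
import math
import torch

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a scalar parameter (step counter t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad       # moving average of gradients (1st moment)
    v = beta2 * v + (1 - beta2) * grad ** 2  # moving average of squared gradients (2nd moment)
    m_hat = m / (1 - beta1 ** t)             # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = (theta - 3)^2 by hand; its gradient is 2 * (theta - 3)
theta, m, v = 0.0, 0.0, 0.0
manual = []
for t in range(1, 6):
    grad = 2 * (theta - 3)
    theta, m, v = adam_step(theta, grad, m, v, t)
    manual.append(theta)

# The same five steps with torch.optim.Adam
p = torch.tensor([0.0], dtype=torch.float64, requires_grad=True)
opt = torch.optim.Adam([p], lr=0.01)
auto = []
for _ in range(5):
    opt.zero_grad()
    loss = ((p - 3) ** 2).sum()
    loss.backward()
    opt.step()
    auto.append(p.item())

print(manual)
print(auto)  # matches the manual values up to floating-point rounding
```

On the first step, bias correction makes m̂ equal to the gradient g and √v̂ equal to |g|, so θ moves by almost exactly lr (here from 0 to about 0.01). This is why early Adam steps are roughly bounded by the learning rate regardless of the gradient's scale.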

As time passed, the Neurons continued to train and fine-tune their network. Soon, they could classify creatures with remarkable accuracy, much to the delight of the kingdom. And so, the Neurons became the kingdom’s heroes, using their magical abilities to solve even the toughest of problems.

I hope you liked my story about Neurons and their power. Now let’s see how to build a binary classification model from scratch.

Binary Classification

Binary classification is a supervised learning task in which new observations are assigned to one of two categories. The two categories are represented by the class labels 0 and 1.

The model will be trained on data that includes features of the creatures and their corresponding group label.

Data Collection

First, I create synthetic data for our binary classification task using the make_classification function from the sklearn.datasets module.

import torch
from sklearn.datasets import make_classification

n = 1000
n_features = 5
seed = 123
n_classes = 2
X, y = make_classification(n_samples=n, n_features=n_features, 
                           n_classes=n_classes, random_state=seed)

x_data = torch.tensor(X, dtype=torch.float32)
y_truth = torch.tensor(y, dtype=torch.float32)

n is the number of samples
n_features is the number of features
seed is the random seed (passed as random_state) that makes the generated data reproducible
n_classes is the number of classes
make_classification returns two arrays: X and y
X is a 2-D array with a shape (n, n_features) that contains the feature values for each sample
y is a 1-D array with a shape (n,) that contains the binary labels (0,1) for each sample

The torch.tensor function converts the generated NumPy arrays into tensors. It converts the feature data X into a tensor x_data of type torch.float32, and the label data y into a tensor y_truth of the same type.
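A quick way to confirm the data has the expected form is to inspect the shapes and dtypes after conversion. The labels are converted to float32 rather than kept as integers because nn.BCELoss expects floating-point targets of the same type as the predictions. This sketch simply repeats the data-generation step with the same parameters.

```python
import torch
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=5, n_classes=2, random_state=123)
x_data = torch.tensor(X, dtype=torch.float32)
y_truth = torch.tensor(y, dtype=torch.float32)

print(X.dtype, y.dtype)              # float64 features, integer labels from NumPy
print(x_data.shape, x_data.dtype)    # torch.Size([1000, 5]) torch.float32
print(y_truth.shape, y_truth.dtype)  # torch.Size([1000]) torch.float32
print(torch.unique(y_truth))         # tensor([0., 1.])
```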

The Architecture of the Neural Network

Next, I define a simple feed-forward neural network in PyTorch with a single linear layer and a sigmoid activation function.

class Net(nn.Module):

    def __init__(self, input_dim):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_dim, 1)

    def forward(self, x_in):
        return torch.sigmoid(self.fc1(x_in))

nn.Linear(input_dim, 1) creates a single linear layer in the neural network
The input_dim argument specifies the number of input features
The second argument, 1, specifies the number of output features, i.e. a single output neuron

This single linear layer receives the input features, computes the dot product between the input values and the weights, and adds the bias. The result goes through the sigmoid activation function, torch.sigmoid(self.fc1(x_in)), which maps it to a probability in the range (0, 1).

This final probability can be interpreted as the predicted probability that the input belongs to the positive class (label 1).
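To turn these probabilities into class labels, a common choice (an assumption here; the article does not specify a decision rule) is to threshold at 0.5: predict 1 when the probability is at least 0.5 and 0 otherwise. The sketch below applies that rule to an untrained Net and four random, hypothetical samples.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 1)

    def forward(self, x_in):
        return torch.sigmoid(self.fc1(x_in))

torch.manual_seed(0)
net = Net(input_dim=5)
x = torch.randn(4, 5)  # 4 hypothetical samples with 5 features each

with torch.no_grad():                # no gradients needed just to predict
    probs = net(x).squeeze(1)        # P(label = 1) for each sample, shape (4,)
    labels = (probs >= 0.5).long()   # threshold at 0.5: 1 if P >= 0.5, else 0

print(probs)
print(labels)
```

Since σ(z) ≥ 0.5 exactly when z ≥ 0, this threshold is the same as checking the sign of the linear layer's output.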

Loss Function and Optimizer

I use the binary cross-entropy (BCE) loss function to evaluate the network’s performance and the Adam optimizer to update its parameters.

lr = 0.01
input_dim = 5

net = Net(input_dim=input_dim)
optimizer = optim.Adam(params=net.parameters(), lr=lr)
bce_loss = nn.BCELoss()

lr is the learning rate that determines the step size of the optimizer algorithm in adjusting the model parameters
input_dim represents the number of features in the input data
net is an instance of a PyTorch neural network class Net that takes input_dim as a parameter.
optimizer is an instance of the Adam optimizer from the PyTorch optim module. The Adam optimizer updates the model parameters based on the gradients computed during backpropagation. The parameters it updates are those of the net model, and its learning rate is set to lr.
bce_loss is an instance of the binary cross-entropy loss function from the PyTorch nn module. This loss function will evaluate the difference between the predicted outputs and the true labels and compute the model's training loss.
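To see exactly which tensors the optimizer is responsible for, the sketch below lists the model's parameters and then performs a single update on a small batch of random, hypothetical data to confirm that the weights change. The seed and batch size are arbitrary.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class Net(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 1)

    def forward(self, x_in):
        return torch.sigmoid(self.fc1(x_in))

torch.manual_seed(0)
net = Net(input_dim=5)
optimizer = optim.Adam(params=net.parameters(), lr=0.01)
bce_loss = nn.BCELoss()

# The optimizer updates exactly these tensors: 5 weights and 1 bias
for name, param in net.named_parameters():
    print(name, tuple(param.shape))  # fc1.weight (1, 5), then fc1.bias (1,)
n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # 6

# One update on a hypothetical mini-batch of 8 samples
x = torch.randn(8, 5)
y = torch.randint(0, 2, (8,)).float()
before = net.fc1.weight.detach().clone()

optimizer.zero_grad()
loss = bce_loss(net(x).squeeze(1), y)
loss.backward()
optimizer.step()

after = net.fc1.weight.detach().clone()
print(torch.allclose(before, after))  # False: the step moved the weights
```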

Training Model

Finally, we can train our neural network.

n_epochs = 100

losses = []

change = 1.0
last_loss = 10.0
epsilon = 1e-3

for epoch in range(n_epochs):

    optimizer.zero_grad()
    y_pred = net(x_data).squeeze()
    loss = bce_loss(y_pred, y_truth)
    loss.backward()
    optimizer.step()
    
    losses.append(loss.item())
    
    if epoch % 10 == 0:
        print(f"Epoch: {epoch}; Loss: {loss.item()}; Change: {change}")
        
    change = abs(last_loss - loss.item())
    last_loss = loss.item()
    
    if change <= epsilon:
        break

Let’s take a look at how the training works:

The training starts with a for loop that runs for a specified number of epochs. An epoch is one complete pass over the training data; here, the entire dataset is processed as a single batch in each epoch.
At the start of each iteration, the gradients of the parameters are reset to zero using the optimizer.zero_grad() method, because PyTorch accumulates gradients by default.
The model’s predictions, y_pred, are obtained by passing the input x_data through the net model. The .squeeze() method removes dimensions of size 1, turning the (n, 1) output into a tensor of shape (n,) that matches y_truth.
The binary cross-entropy loss function bce_loss calculates the loss between the model’s predictions and the ground-truth labels, y_truth.
The loss.backward() method computes the gradients of the loss with respect to the parameters (backpropagation).
The optimizer (Adam) updates the parameters using the optimizer.step().
The loss value is recorded and appended to the losses list, and every 10 epochs the epoch number, loss, and change in loss are printed.
The change in the loss value is calculated as the absolute difference between the current loss value and the last loss value.
If the change in the loss value is less than or equal to a specified threshold (epsilon), the training process is terminated.

The complete code:

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_classification

class Net(nn.Module):

    def __init__(self, input_dim):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_dim, 1)

    def forward(self, x_in):
        return torch.sigmoid(self.fc1(x_in))


n = 1000
n_features = 5
seed = 123
n_classes = 2
X, y = make_classification(n_samples=n, n_features=n_features, 
                           n_classes=n_classes, random_state=seed)

x_data = torch.tensor(X, dtype=torch.float32)
y_truth = torch.tensor(y, dtype=torch.float32)

lr = 0.01
input_dim = 5

n_epochs = 100

net = Net(input_dim=input_dim)
optimizer = optim.Adam(params=net.parameters(), lr=lr)
bce_loss = nn.BCELoss()

losses = []

change = 1.0
last_loss = 10.0
epsilon = 1e-3
for epoch in range(n_epochs):

    optimizer.zero_grad()
    y_pred = net(x_data).squeeze()
    loss = bce_loss(y_pred, y_truth)
    loss.backward()
    optimizer.step()
    
    losses.append(loss.item())
    
    if epoch % 10 == 0:
        print(f"Epoch: {epoch}; Loss: {loss.item()}; Change: {change}")
        
    change = abs(last_loss - loss.item())
    last_loss = loss.item()
    
    if change <= epsilon:
        break
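The code above records the losses but does not yet measure how well the model classifies. As one possible next step, the sketch below holds out 20% of the data with scikit-learn's train_test_split (an assumption: the article trains on all 1,000 samples), trains the same Net for a fixed 100 epochs without early stopping, and converts the predicted probabilities into labels with a 0.5 threshold to compute test accuracy. The split ratio, seeds, and threshold are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

class Net(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 1)

    def forward(self, x_in):
        return torch.sigmoid(self.fc1(x_in))

X, y = make_classification(n_samples=1000, n_features=5, n_classes=2, random_state=123)

# Hold out 20% of the samples so accuracy is measured on data the model never saw
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
x_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
x_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)

torch.manual_seed(0)
net = Net(input_dim=5)
optimizer = optim.Adam(params=net.parameters(), lr=0.01)
bce_loss = nn.BCELoss()

losses = []
for epoch in range(100):  # fixed number of full-batch epochs, no early stopping
    optimizer.zero_grad()
    loss = bce_loss(net(x_train).squeeze(1), y_train)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

with torch.no_grad():
    probs = net(x_test).squeeze(1)         # P(label = 1) for each test sample
    preds = (probs >= 0.5).float()         # 0.5 decision threshold
    accuracy = (preds == y_test).float().mean().item()

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
print(f"test accuracy {accuracy:.3f}")
```

Plotting the losses list with matplotlib, as was done for the sigmoid curve, is a simple way to see whether training has flattened out.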

The end ✨.