# Neural Network Fundamentals: Understanding the Core Components


Once upon a time, there lived the wisest and most powerful creatures in the world, known as the “Neurons.” The Neurons could solve any problem that came their way.

One day, the Neurons faced a challenge. Various mystical creatures inhabited their wonderful world, and the Neurons wanted to classify them into two groups based on their unique powers and abilities.

To solve this problem, the Neurons wielded their powers to create an incredible classification system. They built a network that contained multiple Neurons. Each Neuron was responsible for a specific task and could communicate with the others to make the final decision.

🦸 A neuron is a mathematical function that represents the basic processing unit of a neural network. A neuron takes in multiple inputs (a vector of values), computes a weighted sum of these inputs, and passes the result through an activation function.

The weighted sum is computed as:

$$z = \sum_{i=1}^{n} w_i x_i + b$$

• w are the weights of the neuron
• x are the inputs to the neuron
• n is the number of inputs
• b is the bias
• z is the weighted sum

The weighted sum z is passed through an activation function, which determines the neuron's output based on the value of z.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, input_dim):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_dim, 1)

    def forward(self, x_in):
        return torch.sigmoid(self.fc1(x_in))
```

The `Net` class inherits from PyTorch’s `nn.Module`, which provides convenient functionality for building neural networks. The `__init__` method sets up the linear layer of the model, which takes in `input_dim` input features and outputs a single value. The `forward` method defines the forward pass of the model: the input `x_in` is passed through the linear layer, and the result is transformed by the sigmoid activation function to give the final output.
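To see what a single neuron computes without any framework, the same weighted sum and activation can be spelled out in plain Python. This is only a sketch: the helper name `neuron_output` and the numeric values are made up for illustration and do not come from the model above.

```python
import math


def neuron_output(weights, inputs, bias):
    """Return the weighted sum z and its sigmoid activation."""
    # z = w_1 * x_1 + ... + w_n * x_n + b
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The sigmoid activation squashes z into the interval (0, 1)
    activation = 1.0 / (1.0 + math.exp(-z))
    return z, activation


# Illustrative values only (assumed, not learned from data)
weights = [0.5, -1.0, 2.0]
inputs = [1.0, 2.0, 0.5]
bias = 0.5

z, activation = neuron_output(weights, inputs, bias)
print(z)           # the weighted sum
print(activation)  # the neuron's output
```

With these particular numbers the weighted sum is zero, so the neuron sits exactly on the fence and outputs 0.5.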

But how do these Neurons decide what information to pass on and what to discard? This was a mystery until they discovered the concept of activation functions. Activation functions were like the guardians of each Neuron. They decide if the information can pass through it or not.

The most popular activation function was the Sigmoid function, a wise old dragon. The Sigmoid dragon would listen to the information and decide if it was necessary based on its value, and if it were, the dragon would allow it to pass. Otherwise, it would be discarded.

🐉 A sigmoid function, σ(z) = 1 / (1 + e^(−z)), maps any real-valued number to the open interval (0, 1), with a smooth transition in between.

```python
# sigmoid function
# https://en.wikipedia.org/wiki/Sigmoid_function
import torch
import matplotlib.pyplot as plt

# torch.arange replaces the deprecated torch.range
x = torch.arange(-4.0, 4.0, 0.1)
y = torch.sigmoid(x)

plt.plot(x.numpy(), y.numpy())
plt.show()
```
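If you would rather check a few values than read them off a plot, a small plain-Python sketch does the job. The `sigmoid` helper below is written by hand for illustration; it is not PyTorch's implementation.

```python
import math


def sigmoid(z):
    """Hand-written logistic sigmoid for scalar inputs."""
    return 1.0 / (1.0 + math.exp(-z))


# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of zero maps to exactly one half.
for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 4))

# The curve is symmetric around (0, 0.5): sigmoid(-z) = 1 - sigmoid(z)
symmetry_gap = abs(sigmoid(-2.0) - (1.0 - sigmoid(2.0)))
print(symmetry_gap)
```

The dragon never quite says a flat "no" or "yes": the output gets close to 0 or 1 but stays strictly between them.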

However, passing information through the network was not enough. The Neurons wanted to ensure the network was learning and improving. This was where Loss functions came into play. Loss functions were like referees in a game. They would judge the network’s performance and provide feedback on how to improve it.

The Binary Cross-Entropy loss function was represented by a kind and fair unicorn. The Binary Cross-Entropy unicorn would calculate the difference between the network’s predictions and the actual labels and provide a score indicating how well the network was doing.

🦄 Binary Cross-Entropy Loss measures the discrepancy between the predicted probability of the positive class and the actual label. For each data point, the loss is the negative logarithm of the probability the model assigned to the true class: −log(p) when the label is 1 and −log(1 − p) when the label is 0. PyTorch's `nn.BCELoss` averages these values over all data points by default.

```python
# binary cross-entropy loss
import torch
import torch.nn as nn

# Define the loss function
bce_loss = nn.BCELoss()
```
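To get a feel for the score the unicorn hands out, here is a hand-rolled version of the mean binary cross-entropy in plain Python. It is a sketch rather than PyTorch's implementation: the function name `binary_cross_entropy` and the sample numbers are assumptions, and it expects probabilities strictly between 0 and 1.

```python
import math


def binary_cross_entropy(probs, labels):
    """Mean BCE for predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        # -log(p) if the label is 1, -log(1 - p) if the label is 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


# Illustrative predictions and labels (assumed values)
probs = [0.9, 0.2, 0.6]
labels = [1, 0, 1]

loss = binary_cross_entropy(probs, labels)
print(loss)

# A confident but wrong prediction is punished much harder
confident_wrong = binary_cross_entropy([0.01], [1])
print(confident_wrong)
```

The second call shows why the loss is a useful referee: predicting 0.01 for a creature that actually belongs to class 1 costs far more than a hesitant 0.6.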

And finally, Neurons also used an “Optimization” algorithm. Optimization algorithms, such as Adam, were like coaches who would help the network adjust its parameters to minimize the loss and improve its performance. The optimization algorithms were represented by a group of energetic and enthusiastic fairies who loved to see the network improve with each iteration.

🧚 Adam (Adaptive Moment Estimation) is an optimization algorithm that keeps exponential moving averages of the gradients (first moment) and of the squared gradients (second moment), and uses them to adapt the step size for each weight parameter in the model.

```python
import torch.optim as optim

lr = 0.01
input_dim = 5
net = Net(input_dim=input_dim)
optimizer = optim.Adam(params=net.parameters(), lr=lr)
```
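What the fairies do on each iteration can be sketched for a single scalar parameter in plain Python. This is a simplified illustration of the Adam update rule, not the code inside `torch.optim.Adam`; the helper name `adam_step` and the toy objective f(p) = p² are assumptions made for the example.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad       # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem (assumed): minimize f(p) = p^2, whose gradient is 2p
param, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2 * param
    param, m, v = adam_step(param, grad, m, v, t)
    print(t, param)
```

Because the update divides the averaged gradient by the square root of the averaged squared gradient, the very first step has a size of roughly `lr` regardless of how large the raw gradient is.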

As time passed, the Neurons continued to train and fine-tune their network. Soon, they could classify creatures with remarkable accuracy, much to the delight of the kingdom. And so, the Neurons became the kingdom’s heroes, using their magical abilities to solve even the toughest of problems.

I hope you liked my story about Neurons and their power. Now let’s see how to build a binary classification model from scratch.

# Binary Classification

Binary classification is a supervised learning method that assigns new observations to one of two categories. The categories are represented by the two classes, 0 and 1.

The model will be trained on data that includes features of the creatures and their corresponding group label.

## Data Collection

First, I create synthetic data for our binary classification task using the `make_classification` function from the `sklearn.datasets` module.

```python
import torch
from sklearn.datasets import make_classification

n = 1000
n_features = 5
seed = 123
n_classes = 2

# Generate a reproducible synthetic dataset
X, y = make_classification(n_samples=n, n_features=n_features,
                           n_classes=n_classes, random_state=seed)

# Convert the NumPy arrays to float32 tensors
x_data = torch.tensor(X, dtype=torch.float32)
y_truth = torch.tensor(y, dtype=torch.float32)
```

• `n` is the number of samples
• `n_features` is the number of features
• `n_classes` is the number of classes
• `make_classification` returns two arrays: `X` and `y`
• `X` is a 2-D array with a shape `(n, n_features)` that contains the feature values for each sample
• `y` is a 1-D array with a shape `(n,)` that contains the binary label (0 or 1) for each sample

The `torch.tensor` function converts the generated data into tensors: the feature data `X` becomes `x_data` and the label data `y` becomes `y_truth`, both with type `torch.float32`. The labels are converted to floats as well because `nn.BCELoss` expects floating-point targets.

## The Architecture of the Neural Network

Next, I define a simple feed-forward neural network in PyTorch with a single linear layer and Sigmoid activation function.

```python
class Net(nn.Module):
    def __init__(self, input_dim):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_dim, 1)

    def forward(self, x_in):
        return torch.sigmoid(self.fc1(x_in))
```

• `Linear(input_dim, 1)` creates a single linear layer in the neural network
• `input_dim` argument specifies the size of the input data
• `1` argument specifies the number of neurons in the output layer of the network.

This single linear layer receives the input features and computes the dot product between the input values and the weights, plus the bias. The result goes through a sigmoid activation function, `torch.sigmoid(self.fc1(x_in))`, which maps it to a probability in the range (0, 1).

This final probability can be interpreted as the predicted likelihood that the input belongs to class 1; the likelihood of class 0 is one minus that value.
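A probability is not yet a decision. One common way to turn it into a class label is to apply a cutoff, as in the plain-Python sketch below. The 0.5 threshold, the helper names `to_labels` and `accuracy`, and the sample values are all assumptions for illustration; the article itself does not prescribe a threshold.

```python
def to_labels(probs, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else class 0."""
    return [1 if p >= threshold else 0 for p in probs]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(int(p == a) for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative model outputs and true labels (assumed values)
probs = [0.91, 0.08, 0.55, 0.40]
actual = [1, 0, 0, 0]

predicted = to_labels(probs)
print(predicted)                    # [1, 0, 1, 0]
print(accuracy(predicted, actual))  # 0.75
```

Raising or lowering the threshold trades one kind of mistake for the other, so 0.5 is a starting point rather than a rule.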

## Loss Function and Optimizer

I use binary cross-entropy (BCE) as the loss function to evaluate the network’s performance, and the Adam optimizer to update its parameters.

```python
import torch.nn as nn
import torch.optim as optim

lr = 0.01
input_dim = 5

net = Net(input_dim=input_dim)
optimizer = optim.Adam(params=net.parameters(), lr=lr)
bce_loss = nn.BCELoss()
```

• `lr` is the learning rate, which determines the step size the optimizer uses when adjusting the model parameters
• `input_dim` is the number of features in the input data
• `net` is an instance of the PyTorch neural network class `Net`, which takes `input_dim` as a parameter
• `optimizer` is an instance of the Adam optimizer from the PyTorch `optim` module. It updates the parameters of the `net` model based on the gradients computed during backpropagation, with its learning rate set to `lr`
• `bce_loss` is an instance of the binary cross-entropy loss function from the PyTorch `nn` module. It measures the difference between the predicted outputs and the true labels, which gives the model's training loss

## Training the Model

Finally, we can train our neural network.

```python
n_epochs = 100
losses = []
change = 1.0
last_loss = 10.0
epsilon = 1e-3

for epoch in range(n_epochs):
    optimizer.zero_grad()            # reset gradients from the previous step
    y_pred = net(x_data).squeeze()   # forward pass, shape (n, 1) -> (n,)
    loss = bce_loss(y_pred, y_truth)
    loss.backward()                  # backpropagate
    optimizer.step()                 # update the parameters

    losses.append(loss.item())

    if epoch % 10 == 0:
        print(f"Epoch: {epoch}; Loss: {loss.item()}; Change: {change}")

    change = abs(last_loss - loss.item())
    last_loss = loss.item()

    # Stop early once the loss has stopped changing
    if change <= epsilon:
        break
```

Let’s take a look at how the training works:

1. The training starts with a for loop that runs for a specified number of epochs. An epoch is one complete pass over the training data; here each epoch processes the whole dataset in a single batch.
2. At the start of each iteration, the gradients of the parameters are reset to zero using the `optimizer.zero_grad()` method, because PyTorch accumulates gradients by default.
3. The model’s predictions, `y_pred`, are obtained by passing the input `x_data` through the `net` model. The `.squeeze()` method removes the size-1 dimension from the output, so its shape changes from `(n, 1)` to `(n,)` and matches `y_truth`.
4. The binary cross-entropy loss function `bce_loss` calculates the loss between the model’s predictions and the ground truth labels, `y_truth`.
5. The `loss.backward()` method calculates the gradients of the loss with respect to the parameters.
6. The optimizer (Adam) updates the parameters using `optimizer.step()`.
7. The loss value is recorded and appended to the `losses` list. Every tenth epoch, the current loss and the latest change are printed.
8. The change in the loss value is calculated as the absolute difference between the current loss value and the last loss value.
9. If the change in the loss value is less than or equal to a specified threshold (`epsilon`), the training process is terminated.
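The stopping rule in steps 8 and 9 can be tried on its own, without a network. The plain-Python sketch below replays a made-up sequence of loss values through the same check; the helper name `run_with_early_stop` and the numbers are hypothetical, and a real run would compute each loss from the model instead.

```python
def run_with_early_stop(loss_values, epsilon=1e-3, last_loss=10.0):
    """Replay loss values and stop once the change drops to epsilon or below."""
    losses = []
    stopped_at = None
    for epoch, loss in enumerate(loss_values):
        losses.append(loss)
        change = abs(last_loss - loss)  # step 8: absolute change in the loss
        last_loss = loss
        if change <= epsilon:           # step 9: converged, stop training
            stopped_at = epoch
            break
    return stopped_at, losses


# Made-up loss curve that flattens out (assumed values)
fake_losses = [0.70, 0.60, 0.55, 0.5495, 0.5490]
stopped_at, losses = run_with_early_stop(fake_losses)
print(stopped_at)   # 3
print(len(losses))  # 4
```

Starting `last_loss` at a large value such as 10.0 guarantees that the first epoch never triggers the stop, which is why the training loop initializes it that way.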

The complete code:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_classification


class Net(nn.Module):
    def __init__(self, input_dim):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_dim, 1)

    def forward(self, x_in):
        return torch.sigmoid(self.fc1(x_in))


# Data
n = 1000
n_features = 5
seed = 123
n_classes = 2

X, y = make_classification(n_samples=n, n_features=n_features,
                           n_classes=n_classes, random_state=seed)
x_data = torch.tensor(X, dtype=torch.float32)
y_truth = torch.tensor(y, dtype=torch.float32)

# Model, optimizer, and loss
lr = 0.01
input_dim = 5
n_epochs = 100

net = Net(input_dim=input_dim)
optimizer = optim.Adam(params=net.parameters(), lr=lr)
bce_loss = nn.BCELoss()

# Training loop with early stopping
losses = []
change = 1.0
last_loss = 10.0
epsilon = 1e-3

for epoch in range(n_epochs):
    optimizer.zero_grad()
    y_pred = net(x_data).squeeze()
    loss = bce_loss(y_pred, y_truth)
    loss.backward()
    optimizer.step()

    losses.append(loss.item())

    if epoch % 10 == 0:
        print(f"Epoch: {epoch}; Loss: {loss.item()}; Change: {change}")

    change = abs(last_loss - loss.item())
    last_loss = loss.item()

    if change <= epsilon:
        break
```

The end ✨.