Part 1: Deep Learning for Engineers

Evolution of NN: the basics

Leonardo De Marchi
Bumble Tech

--

Introduction

Deep learning has become one of the most discussed topics in the Data Science community. It has long made sense for Bumble — the parent company operating the Badoo and Bumble apps — to apply deep learning to a multitude of tasks. With millions of users around the world on both apps, it is important for us to create a safe environment for all of them.

In this first post, we will look at a bit of the history of neural networks and some of the main concepts, like perceptrons, layers, activation functions, etc.

To better explain the main concepts, we will see how they can be implemented from scratch in Python. We will then see how much easier it is using libraries like Keras. It's very important to understand the basics: the devil is in the detail, and not knowing the theory can prevent practitioners from obtaining good results.

In a second article, we will explore why deep learning is so successful and why there’s so much room to grow in this field.

History of Machine Learning

Machine Learning has been around for years. Between the 1950s and the 1970s, the world saw the first major era of AI discoveries, with applications in algebra, geometry, language, and robotics. The results were so astonishing that the field gained a lot of attention, but when the huge expectations were not met, research funding was cut off and interest in AI dwindled. The first Artificial Neural Network (ANN) was only able to perform simple logical functions, but nowadays we are not sure what the limits of ANNs are, and we already have a multitude of networks that can solve even the most complex problems.

Rise of Neural Networks

Fast-forward to recent years, when increasing numbers of devices started producing huge amounts of data and massive computational power became available: Machine Learning (ML) techniques started becoming more and more useful in business. In particular, the advent of the Graphics Processing Unit (GPU) made it possible to train huge neural networks, commonly known as Deep Neural Networks (DNNs), efficiently on very big datasets.

Despite the huge success of these techniques, it seems that we are only at the very beginning of a huge revolution: for the first time in history, machines are able to make better decisions than humans on certain tasks. This opens up whole new possibilities, not only on the engineering side but also on the philosophical one, the hope being that by improving artificial intelligence we will be able to better understand human intelligence.

What is a Neural Network?

Neural Networks (NNs) are algorithms that try to mimic the human brain. A neural network is composed of interconnected neurons, arranged in layers, that process a signal to produce an output.

They are used extensively in supervised learning: a machine learning technique that learns to predict outputs from certain inputs, through exposure to multiple input/output examples.

NNs are so powerful because they are universal approximators, meaning that in theory they can approximate any function. And when I say 'any', I really mean any. In practice, it is just a question of how much time and computational power you have.

All you need to do to train your network is provide a lot of inputs with the corresponding output. I should also mention that you might need to experiment with a few network configurations to find the right one.

If that sounds like a lot of work, there is something to help: transfer learning.

Transfer Learning is a technique that allows the reuse of pre-trained networks for purposes different from those they were originally trained for. That might appear to be quite a convenient feature, but it does not come cheap: you will probably have to retrain at least part of the neural network. We will look at this in another post.

How Neural Networks work

First of all, let’s start with the basics. In this blog post, we will be focusing on supervised learning. The concept is quite simple: using some data with inputs and outputs, we create a model that, given the input, produces an output close to the real one.

For example, an input might be an image and the output might be whether or not that image contains hot dogs, like the famous 'Not Hotdog' application. Ideally, this function will also be able to correctly classify unseen inputs, avoiding overfitting.

Neural networks are based on perceptrons. A perceptron is a simple function: it accepts several inputs and applies some weights to them. Then it sums all the weighted inputs and passes the sum to a function (called the 'activation function') that determines whether the signal will pass or not.

The perceptron can be represented mathematically as:

y = step(z), where z = w0 + w1*x1 + w2*x2

Here w0 is the bias, w1 and w2 are the weights applied to the two inputs, and step is the activation function that we will define below.

When training the neuron to solve a particular task, we only change the weights; everything else is decided before the training phase.

Implementation in Python

Python has become the go-to language for Data Science and has a clear advantage over languages like R in Deep Learning, because advanced libraries like TensorFlow and PyTorch have Python APIs. Python is also the go-to language for the DS team here at MagicLab. To gain a better understanding of the theory, we will see how to implement a perceptron and a simple NN in Python. Let's start by defining the perceptron class, which must have a set of weights, an activation function and a way to train the network.

class Perceptron(object):
    """
    Simple implementation of the perceptron algorithm
    """
    def __init__(self, w0=1, w1=0.1, w2=0.1):
        # Initialising the weights
        self.w0 = w0  # bias
        self.w1 = w1
        self.w2 = w2

The perceptron needs an activation function, which is quite simple: if the weighted sum of all inputs (z) is greater than or equal to zero, it will let the signal pass (by returning 1); otherwise, it will block the signal by returning 0.

Prior to the activation function, we want to define the function that calculates the weighted sum.

def weighted_sum_inputs(self, x1, x2):
    return sum([1 * self.w0, x1 * self.w1, x2 * self.w2])

Now we can define the step function:

def step_function(self, z):
    if z >= 0:
        return 1
    else:
        return 0

We need to add a function that calculates the final prediction that the perceptron will make. This is simply the sum of all weighted inputs passed through the activation function.

Note: all functions that accept self as input are part of the perceptron class.

def predict(self, x1, x2):
    """
    Uses the step function to determine the output
    """
    z = self.weighted_sum_inputs(x1, x2)
    return self.step_function(z)

We will also define a simple function to display how the perceptron divides the input space with its predictions.

def predict_boundary(self, x):
    """
    Used to predict the boundary of our classifier
    """
    return -(self.w1 * x + self.w0) / self.w2
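This expression comes from setting the weighted sum to zero, which is exactly where the step function flips from 0 to 1: solving w0 + w1*x1 + w2*x2 = 0 for x2 gives x2 = -(w1*x1 + w0) / w2, a straight line in the input plane.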

Finally, we need a function to train the perceptron by finding the weights that minimise the difference between the predictions made by our model and the real output.

def fit(self, X, y, epochs=1, step=0.1, verbose=True):
    """
    Train the model on the given dataset
    """
    errors = []
    for epoch in range(epochs):
        error = 0
        for i in range(0, len(X.index)):
            x1, x2, target = X.values[i][0], X.values[i][1], y.values[i]
            # The update is proportional to the step size and the error
            update = step * (target - self.predict(x1, x2))
            self.w1 += update * x1
            self.w2 += update * x2
            self.w0 += update
            error += int(update != 0.0)
        errors.append(error)
        if verbose:
            print('Epochs: {} - Error: {} - Errors from all epochs: {}'
                  .format(epoch, error, errors))
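As a quick sanity check, here is a hypothetical usage sketch (the toy dataset below is made up for illustration). Note that fit() expects the inputs as a pandas DataFrame and the targets as a Series, since it uses X.index, X.values and y.values:

import pandas as pd

# Hypothetical, linearly separable toy dataset: label 1 when x2 > x1
X = pd.DataFrame({'x1': [0.1, 0.4, 0.6, 0.9],
                  'x2': [0.9, 0.7, 0.2, 0.1]})
y = pd.Series([1, 1, 0, 0])

perceptron = Perceptron()
perceptron.fit(X, y, epochs=10, step=0.1, verbose=False)
print(perceptron.predict(0.2, 0.8))  # prints 1 once training has converged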

Making it easier

Even for a simple perceptron we had to write quite a few lines of code. Complex networks have a huge number of perceptrons (which in a network are called 'neurons'), making the task very complicated. Luckily for us, it's not necessary to rewrite all the components of the algorithm every time: we can use a library for that. The three most popular libraries for such tasks are Keras, TensorFlow and PyTorch. Among the three, Keras is the easiest, being specifically designed for neural networks, while the other two are geared towards more generic computations, such as performing mathematical operations with tensors on computation graphs.

Using Keras our perceptron is only a few lines of code away.

First of all, we need to define the type of model to use. We opted for the Sequential model, which allows us to add layers one by one.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

my_perceptron = Sequential()

Our perceptron only requires one layer: a dense layer, which connects all the inputs with all the outputs. We also know that we only need one neuron (as we are implementing the perceptron) and that we have two different inputs, x1 and x2. Finally, we want a linear activation function, and we initialise all weights to zero.

my_perceptron.add(Dense(1, input_dim=2, kernel_initializer='zeros', activation='linear'))

Now we need to compile the sequential model which simply specifies what loss function we will be using to measure our model’s errors, and the optimiser we are going to use to find the weights that minimise the loss.

my_perceptron.compile(loss='mse', optimizer=SGD(learning_rate=0.01))

Now we can train the model providing inputs and outputs. We also specify the number of epochs (how many times the dataset will go through the network) and the batch size, which is how much data we will process before updating the weights.

my_perceptron.fit(train_x.values, train_y, epochs=1, batch_size=1, shuffle=False)
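If you want to check what the model has learned, one option (a small sketch using the standard Keras API) is to read the weights back from the layer:

# The Dense layer stores its kernel (the input weights) and bias separately
weights, bias = my_perceptron.layers[0].get_weights()
print('weights:', weights.flatten(), 'bias:', bias)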

And we are done! This algorithm is quite simple, but it is also very limited: it can only be used on linearly separable inputs, which means that we can describe the classification rule using a straight line. This makes sense, given that you can recognise the linear equation in the formula above.

But many problems are not linearly separable, such as the XOR problem, which returns 0 when its two inputs are equal and 1 otherwise.
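For reference, here is the XOR truth table; no single straight line in the (x1, x2) plane can separate the rows labelled 1 from the rows labelled 0:

x1  x2  XOR(x1, x2)
0   0   0
0   1   1
1   0   1
1   1   0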

Fortunately, we can add non-linearity by simply arranging more neurons in a layered structure.

Neural networks are incredibly powerful, so much so that the universal approximation theorem states that a NN with two layers (one hidden layer plus the output) can approximate any continuous function under mild assumptions.

Solving everything with a two-layer network is impractical, though; in practice the best approach is to create deeper networks and provide a good, healthy dataset.

Loading the Data Set

Some standard datasets can be loaded from the Keras library itself. For example, we can use the MNIST dataset, which consists of images of handwritten digits.

import keras
from keras.datasets import mnist

We also know that there are 10 different categories (one for each digit), so we can define a variable to store this information:

num_classes = 10

Let’s now define some of the main features of a neural network.

This time we want to define a larger batch size, because updating the weights after every single example would be inefficient and would make the training quite erratic. It's better to select a size that fits in memory, so this will depend on your hardware. We decided to pick 256 for this example:

batch_size = 256

Also, we will pass the dataset through the network only twice, just to finish the training sooner. Normally, the number of epochs would be much higher.

epochs = 2
# the data, split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# input image dimensions
img_rows, img_cols = X_train[0].shape
# Reshaping the data to use it in our network
X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
# Scaling the data
X_train = X_train / 255.0
X_test = X_test / 255.0
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
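To make the last step concrete, to_categorical turns each digit label into a one-hot vector:

# The digit 3 becomes a 10-dimensional vector with a 1 in position 3
print(keras.utils.to_categorical(3, num_classes))
# [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]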

Coding a Neural Network

Now let's see how it's possible to create a full neural network in Python. As the perceptron example showed, a pure Python implementation can be lengthy.

We need to define the activation function and its derivative, as we will use the gradient to find the weights that minimise the network error.

import numpy as np

def sigmoid(s):
    # Activation function
    return 1 / (1 + np.exp(-s))

def sigmoid_prime(s):
    # Derivative of the sigmoid, as a function of the pre-activation s
    return sigmoid(s) * (1 - sigmoid(s))

Now we can create a class to define the Feed Forward Neural Network, with all the weights, the neural layers and the backward and forward function for training.

class FFNN(object):
    def __init__(self, input_size=2, hidden_size=2, output_size=1):
        # Adding 1 as it will be our bias
        self.input_size = input_size + 1
        self.hidden_size = hidden_size + 1
        self.output_size = output_size
        self.o_error = 0
        self.o_delta = 0
        self.z1 = 0
        self.z2 = 0
        self.z3 = 0
        self.z2_error = 0
        # The whole weight matrix, from the inputs to the hidden layer
        self.w1 = np.random.randn(self.input_size, self.hidden_size)
        # The final set of weights, from the hidden layer to the output layer
        self.w2 = np.random.randn(self.hidden_size, self.output_size)

    def forward(self, X):
        # Forward propagation through our network
        X['bias'] = 1  # Adding 1 to the inputs to include the bias in the weights
        self.z1 = np.dot(X, self.w1)  # dot product of X (input) and the first set of 3x3 weights
        self.z2 = sigmoid(self.z1)  # activation function
        self.z3 = np.dot(self.z2, self.w2)  # dot product of the hidden layer (z2) and the second set of 3x1 weights
        o = sigmoid(self.z3)  # final activation function
        return o

    def backward(self, X, y, output, step):
        # Backward propagation of the errors
        X['bias'] = 1  # Adding 1 to the inputs to include the bias in the weights
        self.o_error = y - output  # error in the output
        # Applying the derivative of the sigmoid (at the pre-activation z3) to the error
        self.o_delta = self.o_error * sigmoid_prime(self.z3) * step
        # z2 error: how much our hidden layer weights contributed to the output error
        self.z2_error = self.o_delta.dot(self.w2.T)
        # Applying the derivative of the sigmoid (at the pre-activation z1) to the z2 error
        self.z2_delta = self.z2_error * sigmoid_prime(self.z1) * step
        self.w1 += np.asarray(X).T.dot(self.z2_delta)  # adjusting the first set of weights
        self.w2 += self.z2.T.dot(self.o_delta)  # adjusting the second set of weights

    def predict(self, X):
        return self.forward(X)

    def fit(self, X, y, epochs=10, step=0.05):
        for epoch in range(epochs):
            X['bias'] = 1  # Adding 1 to the inputs to include the bias in the weights
            output = self.forward(X)
            self.backward(X, y, output, step)
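To see the network in action, here is a hypothetical usage sketch: we train it on the XOR truth table itself, passing the inputs as a pandas DataFrame (which is what the implementation above expects, since it adds a 'bias' column):

import numpy as np
import pandas as pd

# The XOR truth table as training data
X = pd.DataFrame({'x1': [0, 0, 1, 1], 'x2': [0, 1, 0, 1]})
y = np.array([[0], [1], [1], [0]])

ffnn = FFNN()
ffnn.fit(X, y, epochs=1000, step=0.1)
# The predictions should move towards 0, 1, 1, 0
# (how close they get depends on the random initial weights)
print(ffnn.predict(X))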

As the sketch above suggests, at this point we have a network that can be trained to model our XOR problem, and we can use it to predict the labels of unseen inputs. But what if we have more than two classes?

Convolutional NN with Keras

Let's see how a multiclass classifier can be implemented in Keras. Again, we need to define a sequential model and add the layers of the neural network. We will define a very simple network with a few basic operations. This time we will use Conv2D layers, which allow us to perform convolutions. The convolution operation slides a small matrix (a filter) over the input, multiplying it element-wise with each patch of the data and summing the results to obtain a single number. In this case, we have 32 filters of size 3x3 that will be convolved with the input data. These matrices will change their weights during training to reduce the output error. We will also perform a few other operations: MaxPooling2D, Dropout and Flatten. The max pooling operation is a way to downsample the input image and save some computational power; the signal we care about will likely still be present in the downsampled image.

We will also use dropout to avoid overfitting: dropout randomly drops a certain percentage of the connections between neurons, which helps the network focus on the most important patterns. Lastly, we use the Flatten function to transform the data from a matrix to a vector for the final classification. The final layer will do the actual classification, using this vector as input.
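To make the two operations concrete, here is a minimal sketch (with made-up values) of a single convolution step and a single max pooling step:

import numpy as np

# One convolution step: multiply a 3x3 patch of the input element-wise
# with a 3x3 filter and sum the results into a single number
patch = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]])
conv_filter = np.random.randn(3, 3)  # in a Conv2D layer these weights are learned
single_output = np.sum(patch * conv_filter)

# One max pooling step: a 2x2 window keeps only its largest value,
# reducing four numbers to one
pool_patch = np.array([[0.2, 0.9],
                       [0.4, 0.1]])
pooled = pool_patch.max()  # 0.9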

from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(num_classes, activation='softmax'))

This time we decided to use categorical cross-entropy as the loss function, as we have multiple categories. We also use Adam as the optimiser, a variant of gradient descent with adaptive learning rates.

loss = 'categorical_crossentropy'
optimizer = 'adam'

Now we can compile the model with the chosen loss and optimiser. We also ask Keras to track accuracy as a metric.

model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])

Now we can train the model.

model.fit(X_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(X_test, y_test))
score = model.evaluate(X_test, y_test, verbose=0)
print(f'Test loss: {score[0]} - Test accuracy: {score[1]}')

Conclusion

In this first post, we wanted to provide an in-depth explanation of some of the major concepts around Neural Networks. To recap, we looked at:

  • Basic concepts of neural networks: perceptrons, activation functions and connections (weights)
  • Implementation of a perceptron from scratch, to gain a practical understanding of the main NN concepts, and the same implementation leveraging Keras
  • Implementation of a full neural network, using the same approach: from scratch in Python and then in Keras
  • More complex concepts like convolutional layers, max pooling and dropout

You now have more tools to help you understand what's really going on inside Neural Networks, regarded by many as black boxes.

In a second post, we’ll explore how these concepts can be used to automatically extract features.
