Ariel University

Course name: Deep learning and natural language processing

Course number: 7061510-1

Lecturer: Dr. Amos Azaria

Edited by: Moshe Hanukoglu

Date: First Semester 2018-2019

Multilayer Perceptron

Introduction

We will define some definitions before we begin to engage in this subject in depth.

Perceptron

This is a small unit that has inputs, a function to process the inputs, and an output.

This unit looks like this:

Part One: Inputs
Each input has a weight. We denote input $i$ as $x_i$ and the weight of $x_i$ as $w_i$. Each input can be 0 or 1. Sometimes we add one extra input, called the bias, whose value is always 1 and whose weight is denoted $b$.

Part Two: Activation function
The unit multiplies each input by its weight and sums the products, that is $\sum_{i = 1}^{n} {w_ix_i}$, and then passes the result through a given function. We will discuss the types of functions below.

Part Three: Output
The output of the unit will be the output given by the function. If this unit is connected to other units (as we will see below) then each of the other units connected to it will receive the output of this unit.
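As a minimal sketch (plain Python with illustrative weights; for illustration we use a simple step function as the activation), a single perceptron can be written as:

import numpy as np

def perceptron(inputs, weights, b):
    # Weighted sum of the inputs plus the bias, as defined above.
    total = sum(w * x for w, x in zip(weights, inputs)) + b
    # Step activation: output 1 if the sum is positive, otherwise 0.
    return 1 if total > 0 else 0

# Example: two binary inputs with illustrative weights.
print(perceptron([1, 0], [0.6, -0.4], b=-0.5))  # 1*0.6 + 0*(-0.4) - 0.5 = 0.1 -> 1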

Multilayer Perceptron (Neural Network, NN)

A multilayer perceptron is a structure composed of several perceptrons that are connected to each other.

This structure looks like this:


Each row of perceptrons in this structure is called a layer: the first layer is called the input layer, the last layer is called the output layer, and the other layers are called hidden layers.

Note: If all the weights were initialized to the same value, all neurons would compute the same thing; therefore we initialize the weights to small random values so that we do not get the same result in all neurons.

Fully-connected

A fully-connected network is a multilayer perceptron in which each perceptron in layer $i$ is connected to every perceptron in layer $i + 1$, as in the image above.

Feedforward and backpropagation

This method is used to find the right weights on the edges so that the output is correct. In linear / logistic regression we looked for the right weights so that we would get the right result for new input; here, too, we want to find the right weights.
The difference between the cases is the difficulty of finding the weights: in linear / logistic regression we found a single group of weights, each of which is almost independent of the others. In a neural network we have to find all the weights on the edges between the layers, and any change in one weight affects all the weights connected to it.
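To make this dependence concrete, here is a minimal sketch (plain Python, illustrative numbers only) of a tiny two-layer network with one weight per layer; the gradient of the loss with respect to the first weight contains the second weight, so the weights cannot be tuned independently:

# Tiny network: hidden = relu(w1 * x), output = w2 * hidden, loss = (output - target)^2
x, target = 2.0, 1.0          # a single training example (illustrative values)
w1, w2 = 0.5, -0.3            # initial weights on the two edges

hidden = max(0.0, w1 * x)     # feedforward through the hidden neuron (ReLU)
output = w2 * hidden          # feedforward through the output neuron
loss = (output - target) ** 2

# Backpropagation: apply the chain rule from the output back toward the input.
d_output = 2 * (output - target)            # dloss/doutput
d_w2 = d_output * hidden                    # dloss/dw2
d_hidden = d_output * w2                    # dloss/dhidden (depends on w2!)
d_w1 = d_hidden * (x if w1 * x > 0 else 0)  # dloss/dw1 -- flows through w2

# One gradient-descent step on both weights.
lr = 0.1
w1 -= lr * d_w1
w2 -= lr * d_w2
print(w1, w2, loss)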

For further reading: link

^ Contents

Why use Multilayer Perceptron?

Until now, all learning was based on features of the data that we knew in advance, and we taught the model to predict based solely on those features. We could therefore learn only about these properties.

A multilayer perceptron helps the model learn additional features and also more abstract connections.

^ Contents

Activation Function

An activation function is a function into which we feed the sum $\sum_{i = 1}^{n} {w_ix_i}$ and which returns some value other than the sum itself.

The question is: why not simply output the sum as it is, instead of passing it through an activation function?

If we do not use an activation function and each neuron simply outputs the sum $\sum_{i = 1}^{n} {w_ix_i}$, then if we expand the calculation of the final output (opening the parentheses) we find that the final output (for each neuron in the output layer) is of the form $x_i*(Linear\ combination\ of\ weights)$, summed over the inputs.

This means that the equation has exactly the same form as linear regression, and we build the neural network precisely so that we do not arrive at the same answer as linear regression, but at a different one.

Therefore, we apply an activation function to the output of each neuron in order to "break" the linearity.
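For example, for two layers without an activation function (writing the layers in matrix form), the composition collapses into a single linear map:

$W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2)$

which is exactly one linear layer with weights $W_2 W_1$ and bias $W_2 b_1 + b_2$, no matter how many layers we stack.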

There are many types of activation functions (as can be seen in the link).

We'll display three common functions:

  1. $ReLU(x) = \left\{\begin{array}{ll}x & x > 0 \\0 & otherwise\end{array}\right.$
  2. $Leaky\ ReLU(\alpha,x) = \left\{\begin{array}{ll}x & x > 0 \\\alpha x & otherwise\end{array}\right.$
  3. $ELU(\alpha,x) = \left\{\begin{array}{ll}x & x > 0 \\\alpha(e^x-1) & otherwise\end{array} \right.$

The $Leaky\ ReLU(\alpha,x)$ and $ELU(\alpha,x)$ functions prevent dead neurons, because negative inputs also receive a nonzero value. This means there is a nonzero gradient for any input value, so the neuron's weights can continue to be updated.
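A minimal NumPy sketch of the three functions above ($\alpha$ values here are illustrative constants):

import numpy as np

def relu(x):
    return np.where(x > 0, x, 0.0)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), leaky_relu(z), elu(z))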

^ Contents

MultiLayer Perceptron For Identifying Distance

The data consists of pairs of numbers, and each pair is given a label.
We must build a model that can give the correct label to a pair of numbers it has not seen before.

In [8]:
import tensorflow as tf
import numpy as np

features = 2
hidden_layer_nodes = 10
x = tf.placeholder(tf.float32, [None, features])
y_ = tf.placeholder(tf.float32, [None, 1])

Construction of the first layer.

The truncated_normal function initializes the weights randomly from a truncated normal distribution with standard deviation stddev.

We give the bias a positive initial value to help prevent dead neurons, because once a neuron's input falls below 0 the ReLU outputs 0 and its gradient is 0, so it can remain stuck at 0.

In [9]:
W1 = tf.Variable(tf.truncated_normal([features,hidden_layer_nodes], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[hidden_layer_nodes]))
z1 = tf.add(tf.matmul(x,W1),b1)
a1 = tf.nn.relu(z1)
In [10]:
W2 = tf.Variable(tf.truncated_normal([hidden_layer_nodes,1], stddev=0.1))
b2 = tf.Variable(0.)
z2 = tf.matmul(a1,W2) + b2
In [11]:
y = tf.nn.sigmoid(z2)
loss = tf.reduce_mean(-(y_ * tf.log(y) + (1 - y_) * tf.log(1 - y)))
update = tf.train.GradientDescentOptimizer(0.0001).minimize(loss)

data_x = np.array([[2,32], [25,1.2], [5,25.2], [23,2], [56,8.5], [60,60], [3,3], [46,53], [3.5,2]])
data_y = np.array([[1], [1], [1], [1], [1], [0], [0], [0], [0]])

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(0, 50000):
    sess.run(update, feed_dict={x: data_x, y_: data_y})
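The loss defined in the code is the binary cross-entropy. Writing the network's output as $\hat{y}$ (y in the code) and the true label as $y$ (y_ in the code), it averages

$loss = -\frac{1}{N}\sum_{j=1}^{N}\left[y_j\log(\hat{y}_j) + (1-y_j)\log(1-\hat{y}_j)\right]$

over the $N$ training examples.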

The structure of this network is 2 neurons in the input layer, 10 neurons in the middle layer (hidden layer) and one neuron in the output layer.

We will present the structure of the network in the form of matrices.
(The addition of the bias values $b$ is not shown; this is only a general illustration.)
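Concretely, for a single input pair the shapes in the code above are:

$\underbrace{x}_{1 \times 2}\,\underbrace{W_1}_{2 \times 10} \rightarrow \underbrace{a_1}_{1 \times 10}, \qquad \underbrace{a_1}_{1 \times 10}\,\underbrace{W_2}_{10 \times 1} \rightarrow \underbrace{z_2}_{1 \times 1}$

ReLU is applied elementwise to obtain $a_1$, and the sigmoid of $z_2$ gives the final output $y$.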

In [12]:
print('prediction: ', y.eval(session=sess, feed_dict={x: [[13,12], [0,33], [40,3], [1,1], [50,50]]}))
prediction:  [[6.6102937e-02]
 [9.9863952e-01]
 [9.9742377e-01]
 [3.8266489e-01]
 [1.0349377e-04]]

^ Contents

Predicting Handwritten Numbers

In order to begin predicting, we need to train our model on an existing dataset in which each data point is labeled with the actual digit it represents.

A very well-known dataset of handwritten digits is MNIST, which contains 60,000 labeled training samples.

We build the NN model and use softmax to decide which digit each image represents.
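Softmax turns the 10 output scores $z_1,\dots,z_{10}$ into a probability distribution over the digits:

$softmax(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{10} e^{z_j}}$

and the predicted digit is the one with the highest probability.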

Source: link

In [13]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import matplotlib.pyplot as plt
import numpy as np

# Download the MNIST dataset and save it in the MNIST_data folder.
# one_hot means that for each data point we create an array the size of the number of classes,
# fill it with '0', and put '1' only in the position of the class the data point belongs to.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

(hidden1_size, hidden2_size) = (100, 50)
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

W1 = tf.Variable(tf.truncated_normal([784, hidden1_size], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[hidden1_size]))
z1 = tf.nn.relu(tf.matmul(x,W1) + b1)

W2 = tf.Variable(tf.truncated_normal([hidden1_size, hidden2_size], stddev=0.1))
b2 = tf.Variable(tf.constant(0.1, shape=[hidden2_size]))
z2 = tf.nn.relu(tf.matmul(z1,W2) + b2)

W3 = tf.Variable(tf.truncated_normal([hidden2_size, 10], stddev=0.1))
b3 = tf.Variable(tf.constant(0.1, shape=[10]))
y = tf.nn.softmax(tf.matmul(z2, W3) + b3)

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cross_entropy)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

iteration_axis = np.array([])
accuracy_axis = np.array([])

for i in range(500):
    for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    iteration_axis = np.append(iteration_axis, [i], axis=0)
    accuracy_axis = np.append(accuracy_axis, [sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})], axis=0)
    print("Iteration: ", i, " Accuracy:", sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Iteration:  0  Accuracy: 0.354
Iteration:  1  Accuracy: 0.5606
Iteration:  2  Accuracy: 0.6813
Iteration:  3  Accuracy: 0.7551
Iteration:  4  Accuracy: 0.7943
Iteration:  5  Accuracy: 0.8169
Iteration:  6  Accuracy: 0.834
Iteration:  7  Accuracy: 0.8478
Iteration:  8  Accuracy: 0.8568
Iteration:  9  Accuracy: 0.8623
Iteration:  10  Accuracy: 0.8681
Iteration:  11  Accuracy: 0.8751
Iteration:  12  Accuracy: 0.8803
Iteration:  13  Accuracy: 0.8835
Iteration:  14  Accuracy: 0.8872
Iteration:  15  Accuracy: 0.8889
Iteration:  16  Accuracy: 0.8914
Iteration:  17  Accuracy: 0.8934
Iteration:  18  Accuracy: 0.8961
Iteration:  19  Accuracy: 0.898
Iteration:  20  Accuracy: 0.8997
Iteration:  21  Accuracy: 0.9007
Iteration:  22  Accuracy: 0.902
Iteration:  23  Accuracy: 0.9036
Iteration:  24  Accuracy: 0.905
Iteration:  25  Accuracy: 0.9063
Iteration:  26  Accuracy: 0.907
Iteration:  27  Accuracy: 0.9074
Iteration:  28  Accuracy: 0.9077
Iteration:  29  Accuracy: 0.9084
Iteration:  30  Accuracy: 0.9099
Iteration:  31  Accuracy: 0.9113
Iteration:  32  Accuracy: 0.9115
Iteration:  33  Accuracy: 0.9128
Iteration:  34  Accuracy: 0.913
Iteration:  35  Accuracy: 0.9134
Iteration:  36  Accuracy: 0.9153
Iteration:  37  Accuracy: 0.9143
Iteration:  38  Accuracy: 0.9155
Iteration:  39  Accuracy: 0.9161
Iteration:  40  Accuracy: 0.9162
Iteration:  41  Accuracy: 0.9168
Iteration:  42  Accuracy: 0.9166
Iteration:  43  Accuracy: 0.9192
Iteration:  44  Accuracy: 0.9192
Iteration:  45  Accuracy: 0.9204
Iteration:  46  Accuracy: 0.92
Iteration:  47  Accuracy: 0.9215
Iteration:  48  Accuracy: 0.9209
Iteration:  49  Accuracy: 0.9217
Iteration:  50  Accuracy: 0.9218
Iteration:  51  Accuracy: 0.9232
Iteration:  52  Accuracy: 0.9238
Iteration:  53  Accuracy: 0.9248
Iteration:  54  Accuracy: 0.9239
Iteration:  55  Accuracy: 0.9254
Iteration:  56  Accuracy: 0.9269
Iteration:  57  Accuracy: 0.9267
Iteration:  58  Accuracy: 0.9269
Iteration:  59  Accuracy: 0.9273
Iteration:  60  Accuracy: 0.9275
Iteration:  61  Accuracy: 0.9279
Iteration:  62  Accuracy: 0.9295
Iteration:  63  Accuracy: 0.9303
Iteration:  64  Accuracy: 0.9307
Iteration:  65  Accuracy: 0.9301
Iteration:  66  Accuracy: 0.9309
Iteration:  67  Accuracy: 0.9316
Iteration:  68  Accuracy: 0.931
Iteration:  69  Accuracy: 0.9313
Iteration:  70  Accuracy: 0.9316
Iteration:  71  Accuracy: 0.9333
Iteration:  72  Accuracy: 0.9332
Iteration:  73  Accuracy: 0.9331
Iteration:  74  Accuracy: 0.9337
Iteration:  75  Accuracy: 0.9349
Iteration:  76  Accuracy: 0.9342
Iteration:  77  Accuracy: 0.936
Iteration:  78  Accuracy: 0.9352
Iteration:  79  Accuracy: 0.9353
Iteration:  80  Accuracy: 0.9366
Iteration:  81  Accuracy: 0.9367
Iteration:  82  Accuracy: 0.9372
Iteration:  83  Accuracy: 0.9369
Iteration:  84  Accuracy: 0.9381
Iteration:  85  Accuracy: 0.9387
Iteration:  86  Accuracy: 0.9379
Iteration:  87  Accuracy: 0.9381
Iteration:  88  Accuracy: 0.9388
Iteration:  89  Accuracy: 0.9384
Iteration:  90  Accuracy: 0.939
Iteration:  91  Accuracy: 0.9399
Iteration:  92  Accuracy: 0.9395
Iteration:  93  Accuracy: 0.9404
Iteration:  94  Accuracy: 0.9402
Iteration:  95  Accuracy: 0.9405
Iteration:  96  Accuracy: 0.9409
Iteration:  97  Accuracy: 0.9409
Iteration:  98  Accuracy: 0.9418
Iteration:  99  Accuracy: 0.9419
Iteration:  100  Accuracy: 0.9426
Iteration:  101  Accuracy: 0.9422
Iteration:  102  Accuracy: 0.9422
Iteration:  103  Accuracy: 0.9424
Iteration:  104  Accuracy: 0.9429
Iteration:  105  Accuracy: 0.9428
Iteration:  106  Accuracy: 0.9433
Iteration:  107  Accuracy: 0.9436
Iteration:  108  Accuracy: 0.9439
Iteration:  109  Accuracy: 0.9442
Iteration:  110  Accuracy: 0.9443
Iteration:  111  Accuracy: 0.9451
Iteration:  112  Accuracy: 0.9446
Iteration:  113  Accuracy: 0.945
Iteration:  114  Accuracy: 0.9453
Iteration:  115  Accuracy: 0.9456
Iteration:  116  Accuracy: 0.945
Iteration:  117  Accuracy: 0.9454
Iteration:  118  Accuracy: 0.9459
Iteration:  119  Accuracy: 0.9463
Iteration:  120  Accuracy: 0.9458
Iteration:  121  Accuracy: 0.946
Iteration:  122  Accuracy: 0.9469
Iteration:  123  Accuracy: 0.9468
Iteration:  124  Accuracy: 0.9471
Iteration:  125  Accuracy: 0.947
Iteration:  126  Accuracy: 0.9477
Iteration:  127  Accuracy: 0.9478
Iteration:  128  Accuracy: 0.9482
Iteration:  129  Accuracy: 0.9483
Iteration:  130  Accuracy: 0.9485
Iteration:  131  Accuracy: 0.949
Iteration:  132  Accuracy: 0.9486
Iteration:  133  Accuracy: 0.9493
Iteration:  134  Accuracy: 0.9492
Iteration:  135  Accuracy: 0.9494
Iteration:  136  Accuracy: 0.9495
Iteration:  137  Accuracy: 0.9503
Iteration:  138  Accuracy: 0.95
Iteration:  139  Accuracy: 0.9498
Iteration:  140  Accuracy: 0.9506
Iteration:  141  Accuracy: 0.9508
Iteration:  142  Accuracy: 0.9512
Iteration:  143  Accuracy: 0.9508
Iteration:  144  Accuracy: 0.9512
Iteration:  145  Accuracy: 0.951
Iteration:  146  Accuracy: 0.9516
Iteration:  147  Accuracy: 0.9516
Iteration:  148  Accuracy: 0.9518
Iteration:  149  Accuracy: 0.9516
Iteration:  150  Accuracy: 0.9524
Iteration:  151  Accuracy: 0.9518
Iteration:  152  Accuracy: 0.9524
Iteration:  153  Accuracy: 0.9524
Iteration:  154  Accuracy: 0.9525
Iteration:  155  Accuracy: 0.9521
Iteration:  156  Accuracy: 0.9528
Iteration:  157  Accuracy: 0.9532
Iteration:  158  Accuracy: 0.9529
Iteration:  159  Accuracy: 0.9533
Iteration:  160  Accuracy: 0.9534
Iteration:  161  Accuracy: 0.9531
Iteration:  162  Accuracy: 0.9534
Iteration:  163  Accuracy: 0.954
Iteration:  164  Accuracy: 0.9542
Iteration:  165  Accuracy: 0.9537
Iteration:  166  Accuracy: 0.9541
Iteration:  167  Accuracy: 0.9543
Iteration:  168  Accuracy: 0.9544
Iteration:  169  Accuracy: 0.9542
Iteration:  170  Accuracy: 0.9543
Iteration:  171  Accuracy: 0.955
Iteration:  172  Accuracy: 0.955
Iteration:  173  Accuracy: 0.9559
Iteration:  174  Accuracy: 0.9553
Iteration:  175  Accuracy: 0.9556
Iteration:  176  Accuracy: 0.9561
Iteration:  177  Accuracy: 0.956
Iteration:  178  Accuracy: 0.9561
Iteration:  179  Accuracy: 0.9562
Iteration:  180  Accuracy: 0.9561
Iteration:  181  Accuracy: 0.9563
Iteration:  182  Accuracy: 0.9559
Iteration:  183  Accuracy: 0.957
Iteration:  184  Accuracy: 0.9565
Iteration:  185  Accuracy: 0.9567
Iteration:  186  Accuracy: 0.957
Iteration:  187  Accuracy: 0.9565
Iteration:  188  Accuracy: 0.9574
Iteration:  189  Accuracy: 0.9571
Iteration:  190  Accuracy: 0.9572
Iteration:  191  Accuracy: 0.9568
Iteration:  192  Accuracy: 0.9579
Iteration:  193  Accuracy: 0.9576
Iteration:  194  Accuracy: 0.9577
Iteration:  195  Accuracy: 0.9581
Iteration:  196  Accuracy: 0.9579
Iteration:  197  Accuracy: 0.958
Iteration:  198  Accuracy: 0.958
Iteration:  199  Accuracy: 0.9577
Iteration:  200  Accuracy: 0.9583
Iteration:  201  Accuracy: 0.9583
Iteration:  202  Accuracy: 0.9581
Iteration:  203  Accuracy: 0.9582
Iteration:  204  Accuracy: 0.9586
Iteration:  205  Accuracy: 0.9583
Iteration:  206  Accuracy: 0.9588
Iteration:  207  Accuracy: 0.9587
Iteration:  208  Accuracy: 0.9588
Iteration:  209  Accuracy: 0.9592
Iteration:  210  Accuracy: 0.9593
Iteration:  211  Accuracy: 0.9595
Iteration:  212  Accuracy: 0.9592
Iteration:  213  Accuracy: 0.9594
Iteration:  214  Accuracy: 0.9596
Iteration:  215  Accuracy: 0.9596
Iteration:  216  Accuracy: 0.96
Iteration:  217  Accuracy: 0.9602
Iteration:  218  Accuracy: 0.9599
Iteration:  219  Accuracy: 0.96
Iteration:  220  Accuracy: 0.9603
Iteration:  221  Accuracy: 0.9601
Iteration:  222  Accuracy: 0.9609
Iteration:  223  Accuracy: 0.9602
Iteration:  224  Accuracy: 0.9607
Iteration:  225  Accuracy: 0.9603
Iteration:  226  Accuracy: 0.961
Iteration:  227  Accuracy: 0.9614
Iteration:  228  Accuracy: 0.9616
Iteration:  229  Accuracy: 0.961
Iteration:  230  Accuracy: 0.9609
Iteration:  231  Accuracy: 0.9611
Iteration:  232  Accuracy: 0.9618
Iteration:  233  Accuracy: 0.961
Iteration:  234  Accuracy: 0.9609
Iteration:  235  Accuracy: 0.962
Iteration:  236  Accuracy: 0.9619
Iteration:  237  Accuracy: 0.9612
Iteration:  238  Accuracy: 0.9617
Iteration:  239  Accuracy: 0.9615
Iteration:  240  Accuracy: 0.9617
Iteration:  241  Accuracy: 0.9613
Iteration:  242  Accuracy: 0.962
Iteration:  243  Accuracy: 0.9616
Iteration:  244  Accuracy: 0.9619
Iteration:  245  Accuracy: 0.9617
Iteration:  246  Accuracy: 0.9616
Iteration:  247  Accuracy: 0.9622
Iteration:  248  Accuracy: 0.9617
Iteration:  249  Accuracy: 0.9616
Iteration:  250  Accuracy: 0.962
Iteration:  251  Accuracy: 0.9623
Iteration:  252  Accuracy: 0.9624
Iteration:  253  Accuracy: 0.9618
Iteration:  254  Accuracy: 0.9621
Iteration:  255  Accuracy: 0.9625
Iteration:  256  Accuracy: 0.9626
Iteration:  257  Accuracy: 0.9625
Iteration:  258  Accuracy: 0.9623
Iteration:  259  Accuracy: 0.9627
Iteration:  260  Accuracy: 0.9624
Iteration:  261  Accuracy: 0.9631
Iteration:  262  Accuracy: 0.963
Iteration:  263  Accuracy: 0.9626
Iteration:  264  Accuracy: 0.9629
Iteration:  265  Accuracy: 0.9633
Iteration:  266  Accuracy: 0.9633
Iteration:  267  Accuracy: 0.9628
Iteration:  268  Accuracy: 0.9632
Iteration:  269  Accuracy: 0.9631
Iteration:  270  Accuracy: 0.9635
Iteration:  271  Accuracy: 0.9634
Iteration:  272  Accuracy: 0.9633
Iteration:  273  Accuracy: 0.9637
Iteration:  274  Accuracy: 0.9633
Iteration:  275  Accuracy: 0.9636
Iteration:  276  Accuracy: 0.9633
Iteration:  277  Accuracy: 0.9637
Iteration:  278  Accuracy: 0.9637
Iteration:  279  Accuracy: 0.9638
Iteration:  280  Accuracy: 0.9635
Iteration:  281  Accuracy: 0.9642
Iteration:  282  Accuracy: 0.9637
Iteration:  283  Accuracy: 0.9635
Iteration:  284  Accuracy: 0.9643
Iteration:  285  Accuracy: 0.964
Iteration:  286  Accuracy: 0.9642
Iteration:  287  Accuracy: 0.9639
Iteration:  288  Accuracy: 0.9641
Iteration:  289  Accuracy: 0.9646
Iteration:  290  Accuracy: 0.9643
Iteration:  291  Accuracy: 0.9641
Iteration:  292  Accuracy: 0.965
Iteration:  293  Accuracy: 0.9649
Iteration:  294  Accuracy: 0.965
Iteration:  295  Accuracy: 0.9648
Iteration:  296  Accuracy: 0.965
Iteration:  297  Accuracy: 0.965
Iteration:  298  Accuracy: 0.9652
Iteration:  299  Accuracy: 0.9652
Iteration:  300  Accuracy: 0.965
Iteration:  301  Accuracy: 0.9652
Iteration:  302  Accuracy: 0.9648
Iteration:  303  Accuracy: 0.9652
Iteration:  304  Accuracy: 0.9653
Iteration:  305  Accuracy: 0.9647
Iteration:  306  Accuracy: 0.9651
Iteration:  307  Accuracy: 0.9651
Iteration:  308  Accuracy: 0.9649
Iteration:  309  Accuracy: 0.9652
Iteration:  310  Accuracy: 0.9653
Iteration:  311  Accuracy: 0.9651
Iteration:  312  Accuracy: 0.9648
Iteration:  313  Accuracy: 0.9654
Iteration:  314  Accuracy: 0.9653
Iteration:  315  Accuracy: 0.9653
Iteration:  316  Accuracy: 0.9656
Iteration:  317  Accuracy: 0.9651
Iteration:  318  Accuracy: 0.9656
Iteration:  319  Accuracy: 0.9655
Iteration:  320  Accuracy: 0.9657
Iteration:  321  Accuracy: 0.9657
Iteration:  322  Accuracy: 0.9659
Iteration:  323  Accuracy: 0.9656
Iteration:  324  Accuracy: 0.9654
Iteration:  325  Accuracy: 0.9657
Iteration:  326  Accuracy: 0.966
Iteration:  327  Accuracy: 0.9656
Iteration:  328  Accuracy: 0.9661
Iteration:  329  Accuracy: 0.9657
Iteration:  330  Accuracy: 0.9659
Iteration:  331  Accuracy: 0.966
Iteration:  332  Accuracy: 0.9661
Iteration:  333  Accuracy: 0.9664
Iteration:  334  Accuracy: 0.9662
Iteration:  335  Accuracy: 0.9662
Iteration:  336  Accuracy: 0.9663
Iteration:  337  Accuracy: 0.9664
Iteration:  338  Accuracy: 0.9659
Iteration:  339  Accuracy: 0.9667
Iteration:  340  Accuracy: 0.9666
Iteration:  341  Accuracy: 0.9664
Iteration:  342  Accuracy: 0.9665
Iteration:  343  Accuracy: 0.9665
Iteration:  344  Accuracy: 0.9665
Iteration:  345  Accuracy: 0.9664
Iteration:  346  Accuracy: 0.9667
Iteration:  347  Accuracy: 0.9664
Iteration:  348  Accuracy: 0.9672
Iteration:  349  Accuracy: 0.9667
Iteration:  350  Accuracy: 0.967
Iteration:  351  Accuracy: 0.9673
Iteration:  352  Accuracy: 0.9671
Iteration:  353  Accuracy: 0.967
Iteration:  354  Accuracy: 0.9668
Iteration:  355  Accuracy: 0.9672
Iteration:  356  Accuracy: 0.9675
Iteration:  357  Accuracy: 0.967
Iteration:  358  Accuracy: 0.9678
Iteration:  359  Accuracy: 0.9674
Iteration:  360  Accuracy: 0.9671
Iteration:  361  Accuracy: 0.9671
Iteration:  362  Accuracy: 0.9669
Iteration:  363  Accuracy: 0.9674
Iteration:  364  Accuracy: 0.9675
Iteration:  365  Accuracy: 0.9675
Iteration:  366  Accuracy: 0.9675
Iteration:  367  Accuracy: 0.9677
Iteration:  368  Accuracy: 0.9675
Iteration:  369  Accuracy: 0.9677
Iteration:  370  Accuracy: 0.9679
Iteration:  371  Accuracy: 0.9682
Iteration:  372  Accuracy: 0.9675
Iteration:  373  Accuracy: 0.9678
Iteration:  374  Accuracy: 0.9678
Iteration:  375  Accuracy: 0.9677
Iteration:  376  Accuracy: 0.9681
Iteration:  377  Accuracy: 0.9681
Iteration:  378  Accuracy: 0.9681
Iteration:  379  Accuracy: 0.9677
Iteration:  380  Accuracy: 0.9681
Iteration:  381  Accuracy: 0.9682
Iteration:  382  Accuracy: 0.9681
Iteration:  383  Accuracy: 0.968
Iteration:  384  Accuracy: 0.9684
Iteration:  385  Accuracy: 0.9682
Iteration:  386  Accuracy: 0.968
Iteration:  387  Accuracy: 0.9684
Iteration:  388  Accuracy: 0.9681
Iteration:  389  Accuracy: 0.9686
Iteration:  390  Accuracy: 0.9685
Iteration:  391  Accuracy: 0.9686
Iteration:  392  Accuracy: 0.9685
Iteration:  393  Accuracy: 0.9685
Iteration:  394  Accuracy: 0.9686
Iteration:  395  Accuracy: 0.9686
Iteration:  396  Accuracy: 0.9687
Iteration:  397  Accuracy: 0.9687
Iteration:  398  Accuracy: 0.9685
Iteration:  399  Accuracy: 0.9684
Iteration:  400  Accuracy: 0.9687
Iteration:  401  Accuracy: 0.9689
Iteration:  402  Accuracy: 0.9688
Iteration:  403  Accuracy: 0.9689
Iteration:  404  Accuracy: 0.9691
Iteration:  405  Accuracy: 0.9688
Iteration:  406  Accuracy: 0.969
Iteration:  407  Accuracy: 0.969
Iteration:  408  Accuracy: 0.9693
Iteration:  409  Accuracy: 0.9692
Iteration:  410  Accuracy: 0.969
Iteration:  411  Accuracy: 0.969
Iteration:  412  Accuracy: 0.9693
Iteration:  413  Accuracy: 0.9693
Iteration:  414  Accuracy: 0.9693
Iteration:  415  Accuracy: 0.9693
Iteration:  416  Accuracy: 0.9698
Iteration:  417  Accuracy: 0.9693
Iteration:  418  Accuracy: 0.9694
Iteration:  419  Accuracy: 0.9694
Iteration:  420  Accuracy: 0.9697
Iteration:  421  Accuracy: 0.9697
Iteration:  422  Accuracy: 0.9698
Iteration:  423  Accuracy: 0.9695
Iteration:  424  Accuracy: 0.9697
Iteration:  425  Accuracy: 0.9694
Iteration:  426  Accuracy: 0.97
Iteration:  427  Accuracy: 0.9699
Iteration:  428  Accuracy: 0.9696
Iteration:  429  Accuracy: 0.9697
Iteration:  430  Accuracy: 0.9699
Iteration:  431  Accuracy: 0.97
Iteration:  432  Accuracy: 0.97
Iteration:  433  Accuracy: 0.9704
Iteration:  434  Accuracy: 0.9698
Iteration:  435  Accuracy: 0.9699
Iteration:  436  Accuracy: 0.97
Iteration:  437  Accuracy: 0.97
Iteration:  438  Accuracy: 0.97
Iteration:  439  Accuracy: 0.9704
Iteration:  440  Accuracy: 0.97
Iteration:  441  Accuracy: 0.9707
Iteration:  442  Accuracy: 0.97
Iteration:  443  Accuracy: 0.9705
Iteration:  444  Accuracy: 0.9705
Iteration:  445  Accuracy: 0.9705
Iteration:  446  Accuracy: 0.9701
Iteration:  447  Accuracy: 0.9703
Iteration:  448  Accuracy: 0.9705
Iteration:  449  Accuracy: 0.9703
Iteration:  450  Accuracy: 0.9705
Iteration:  451  Accuracy: 0.9709
Iteration:  452  Accuracy: 0.9708
Iteration:  453  Accuracy: 0.9707
Iteration:  454  Accuracy: 0.9708
Iteration:  455  Accuracy: 0.9708
Iteration:  456  Accuracy: 0.9706
Iteration:  457  Accuracy: 0.9708
Iteration:  458  Accuracy: 0.9708
Iteration:  459  Accuracy: 0.9707
Iteration:  460  Accuracy: 0.9708
Iteration:  461  Accuracy: 0.9708
Iteration:  462  Accuracy: 0.9709
Iteration:  463  Accuracy: 0.9709
Iteration:  464  Accuracy: 0.9708
Iteration:  465  Accuracy: 0.9709
Iteration:  466  Accuracy: 0.9709
Iteration:  467  Accuracy: 0.9708
Iteration:  468  Accuracy: 0.9708
Iteration:  469  Accuracy: 0.971
Iteration:  470  Accuracy: 0.9708
Iteration:  471  Accuracy: 0.9709
Iteration:  472  Accuracy: 0.9707
Iteration:  473  Accuracy: 0.9708
Iteration:  474  Accuracy: 0.9707
Iteration:  475  Accuracy: 0.971
Iteration:  476  Accuracy: 0.9708
Iteration:  477  Accuracy: 0.9708
Iteration:  478  Accuracy: 0.9713
Iteration:  479  Accuracy: 0.971
Iteration:  480  Accuracy: 0.9715
Iteration:  481  Accuracy: 0.971
Iteration:  482  Accuracy: 0.9713
Iteration:  483  Accuracy: 0.971
Iteration:  484  Accuracy: 0.9713
Iteration:  485  Accuracy: 0.9712
Iteration:  486  Accuracy: 0.9712
Iteration:  487  Accuracy: 0.9712
Iteration:  488  Accuracy: 0.9711
Iteration:  489  Accuracy: 0.9711
Iteration:  490  Accuracy: 0.9714
Iteration:  491  Accuracy: 0.9709
Iteration:  492  Accuracy: 0.9712
Iteration:  493  Accuracy: 0.9713
Iteration:  494  Accuracy: 0.9713
Iteration:  495  Accuracy: 0.9711
Iteration:  496  Accuracy: 0.9713
Iteration:  497  Accuracy: 0.9711
Iteration:  498  Accuracy: 0.9714
Iteration:  499  Accuracy: 0.9712

The structure of this network is 784 neurons in the input layer, 100 neurons in the first hidden layer, 50 neurons in the second hidden layer, and 10 neurons in the output layer. Finally, the outputs are passed through softmax.

In [14]:
plt.plot(iteration_axis, accuracy_axis)
plt.show()

^ Contents

Resolving Errors

If train error is too high:

  • Error in code
  • Extend the training time
  • Add more layers or more features
  • Change the $\alpha$ or the initial values of the weights
  • Change the activation function, or change the optimization algorithm (e.g. SGD, RMSProp, Adagrad, Adam)

If test error is too high (but train-error is ok):

  • Add more data to dataset
  • Use early stopping
  • Use regularization
    • Dropout (see the sketch after this list)
    • Ridge/LASSO
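For example, dropout can be added to a hidden layer of the MNIST network above using the TensorFlow 1.x API. This is a minimal sketch that modifies the existing z1/W2/b2 definitions; keep_prob = 0.5 is an illustrative value, and it should be fed as 1.0 at evaluation time:

keep_prob = tf.placeholder(tf.float32)          # probability of keeping each neuron
z1_drop = tf.nn.dropout(z1, keep_prob)          # randomly zero out neurons of the first hidden layer
z2 = tf.nn.relu(tf.matmul(z1_drop, W2) + b2)    # the next layer is built on the dropped-out activations

# During training, feed e.g. keep_prob=0.5; during evaluation, feed keep_prob=1.0:
# sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys, keep_prob: 0.5})
# sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})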