
TwoLayerPerceptron Class Reference

#include <src/TwoLayerPerceptron.h>



Detailed Description

This is a straightforward Two-Layer Perceptron implementation.

It uses a custom transfer function I devised for this project (though of course I was not the first to think of it, see below): f(x) = x / (1 + abs(x)). This function is sigmoid-like, bounded in (-1, +1), continuous with a continuous first derivative, and much faster to compute than tanh. Unfortunately it is also slower to converge; see the note below.

Many neural network packages use tanh, to the point that it has become the standard function to use. It has the very nice property that tanh' = 1 - tanh^2. Thus, backpropagation can be made faster by reusing the forward computation for the derivative, and since the goal is very often to train the network as well as possible, this is a very desirable property. Others use the sigmoid function s(x) = 1 / (1 + exp(-x)), where s' = s*(1-s), for the same reason.

However, in this project, the neural network is used primarily in forward mode, and backpropagation/explicit training is only used to provide a starting point for the genetic algorithm to work on.

Thus, the most important point is to have the fastest transfer function: we are going to apply it a lot, nhidden + noutput times per agent, and for many agents! On the other hand, it doesn't matter much if we cannot reuse some computations in backpropagation: the learning phase is done only once, and it is fast enough as it is with the function above anyway.

As a matter of fact, with the function above we CAN reuse the results in exactly the same way as for tanh: f'(x) = 1 / (1 + abs(x))^2 = (1 - abs(f))^2.

Note: It seems this function was first proposed by David Elliott in "A Better Activation Function for Artificial Neural Networks". See http://www.dontveter.com/bpr/activate.html for a discussion of its merits and drawbacks relative to other activation functions. In short, this function converges much more slowly on training data than tanh. But numerical experiments in this project have shown that it is close to what the sigmoid achieves for our purpose. And since, once again, all we want here is a good starting point for the genetic algorithm, we don't care much.

The training is a very simple on-line gradient descent. All you have to do is provide an input and a target to the train function, plus a learning rate (controlling how steep the gradient descent is; 0.2 is already quite big). This trains the network to minimize the half sum of squared errors on the outputs. You can also use your own error function, in a three-step algorithm. It works as follows:

1. Call computeOutput to run the forward pass on the input.
2. Compute the gradient of your error function on each of the outputs.
3. Call backPropagate with that gradient, then learn to update the weights.

You can repeatedly do this on all the data you want to learn, in turn.

This MLP can also be used for batch learning, where a large input/output data set is trained on at once. To do this, use backPropagate on the first mapping, then call batchBackPropagateAccumulate to accumulate the gradients for all the other mappings. At the end, call batchBackPropagateTerminate with the total number of mappings. You can then use the learn algorithm as above.

For advanced learning techniques, you may consider looking at the CheapMatrix framework I created for the occasion. You'll find a scaled conjugate gradient algorithm, which converges faster than this simple gradient descent.

Author:
Nicolas Brodu


Public Member Functions

 TwoLayerPerceptron (int ninput, int nhidden, int noutput)
 Creates a network with the given dimensions.
 TwoLayerPerceptron (const TwoLayerPerceptron &tlp)
 Copy constructor: create this network from the other.
TwoLayerPerceptron & operator= (const TwoLayerPerceptron &tlp)
 Copy all the values from the other network. The networks must be the same size.
virtual void computeOutput (const double *input, double *output)
 Gets the output corresponding to this input vector.
virtual void backPropagate (const double *input, const double *output, const double *gradout)
 Backpropagates the error function gradients in the network. This function presupposes that the current internal values of the hidden units match the given input-to-output mapping.
virtual void batchBackPropagateAccumulate (const double *input, const double *output, const double *gradout)
 Batch backpropagation accumulates all gradients from all mappings one by one.
virtual void batchBackPropagateTerminate (int nmappings)
 Terminates batch backpropagation, normalizing by the total number of mappings.
virtual void learn (double learningRate=defaultLearningRate)
 Very simple gradient descent, by the given amount.
virtual double train (const double *input, const double *target, double learningRate=defaultLearningRate)
 Convenience function to train the network toward the given target using the very common 'half sum of squared errors on the outputs'.
virtual void getHidden (double *hidden)
 Read-write accessors to the hidden values allow storing the results of computeOutput for later backpropagation.
virtual void setHidden (const double *hidden)
 Read-write accessors to the hidden values allow storing the results of computeOutput for later backpropagation.
int getNInput ()
 Read-only accessors.
int getNOutput ()
int getNHidden ()
virtual void mutate (double ihwRate, double ihwJitter, double howRate, double howJitter, double hbRate, double hbJitter, double obRate, double obJitter)
 Mutate this network weights and biases with the given parameters.

Static Public Attributes

static double(* transfer )(double) = &defaultTransfer
 Set a transfer function.
static double(* transferDerivativeAsF )(double) = &defaultTransferDerivativeAsF
 Set the derivative of the transfer function, expressed in terms of the original function.
static const double defaultLearningRate = 0.1
 Default learning rate for the training by gradient descent. Default is 0.1.

Protected Attributes

int ninput
int nhidden
int noutput
int nih
int nho
double * ihw
double * how
double * ihwg
double * howg
double * hb
double * ob
double * hbg
double * obg
double * hv

Friends

std::ostream & operator<< (std::ostream &os, const TwoLayerPerceptron &tlp)
std::istream & operator>> (std::istream &is, TwoLayerPerceptron &tlp)


Constructor & Destructor Documentation

TwoLayerPerceptron::TwoLayerPerceptron (int ninput, int nhidden, int noutput)

Creates a network with the given dimensions.

The weights are initially set to random values from a normal distribution, scaled by the layer dimensions. The Utility random methods are used, so you can set the seed for reproducible results.


Member Function Documentation

void TwoLayerPerceptron::backPropagate (const double *input, const double *output, const double *gradout) [virtual]

Backpropagates the error function gradients in the network. This function presupposes that the current internal values of the hidden units match the given input-to-output mapping.

This is the case if computeOutput was called prior to this function. This is usually necessary anyway to compute the error gradient, so it isn't a big requirement.

Parameters:
input The training data requested input. Array of size ninput
output The training data desired output. Array of size noutput
gradout The gradient of the error function on each of the outputs. Array of size noutput

void TwoLayerPerceptron::batchBackPropagateAccumulate (const double *input, const double *output, const double *gradout) [virtual]

Batch backpropagation accumulates all gradients from all mappings one by one.

Use backPropagate on the first mapping, then call this function to accumulate the gradients for all the other mappings. At the end, call batchBackPropagateTerminate with the total number of mappings. This function presupposes that the current internal values of the hidden units match the given input-to-output mapping. This is the case if computeOutput was called prior to this function. This is usually necessary anyway to compute the error gradient, so it isn't a big requirement.

Parameters:
input The training data requested input. Array of size ninput
output The training data desired output. Array of size noutput
gradout The gradient of the error function on each of the outputs. Array of size noutput

void TwoLayerPerceptron::batchBackPropagateTerminate (int nmappings) [virtual]
 

Terminates batch backpropagation, normalizing by the total number of mappings.

Use backPropagate on the first mapping, then call batchBackPropagateAccumulate to accumulate the gradients for all the other mappings. At the end, call this function with the total number of mappings. You can then use learn as usual.

Parameters:
nmappings The total number of mappings whose gradients were accumulated.

void TwoLayerPerceptron::learn (double learningRate = defaultLearningRate) [virtual]
 

Very simple gradient descent, by the given amount.

Uses the current gradients to update the weights and biases.

Parameters:
learningRate The amount of descent to go along the gradient.

void TwoLayerPerceptron::mutate (double ihwRate, double ihwJitter, double howRate, double howJitter, double hbRate, double hbJitter, double obRate, double obJitter) [virtual]

Mutate this network weights and biases with the given parameters.

This does not really belong in this generic class, but I'm too lazy to split it cleanly.

For each "ihw" input-to-hidden weight, each "how" hidden-to-output weight, each "hb" hidden bias, and each "ob" output bias, the following parameters apply.

Parameters:
Rate A random number between -rate*currentValue and +rate*currentValue is added to each weight. 0 thus makes a perfect copy, which may be useful at the beginning to build a large population from only a few teachers.
Jitter A random number between -jitter and +jitter, normalized by the network layer dimensions, is also added to each weight. This gives a chance to null weights, which would otherwise not be affected by the mutation rate.

double TwoLayerPerceptron::train (const double *input, const double *target, double learningRate = defaultLearningRate) [virtual]

Convenience function to train the network toward the given target using the very common 'half sum of squared errors on the outputs'.

You may call this function repeatedly to train the network, checking the results until you're satisfied.

Parameters:
input The training data input. Array of size ninput
target The training data desired output. Array of size noutput
learningRate The amount of descent to go along the gradient.
Returns:
the current error.


Member Data Documentation

double(* TwoLayerPerceptron::transfer)(double) = &defaultTransfer [static]
 

Set a transfer function.

Default is a custom transfer function: f(x) = x / (1 + abs(x)). This function is sigmoid-like, bounded in (-1, +1), continuous with a continuous first derivative, and much faster to compute than tanh. It is also unfortunately slower to converge, but about the same speed as the sigmoid function for this project according to preliminary experiments.

double(* TwoLayerPerceptron::transferDerivativeAsF)(double) = &defaultTransferDerivativeAsF [static]
 

Set the derivative of the transfer function, expressed in terms of the original function.

This is the differential equation relating f' to f. Such an equation does not always exist, but when it does, it provides a big boost for backpropagation. In practice, neural networks thus use only such functions. Sorry, but this class does not handle the more generic case.

Default is the derivative of the custom transfer function devised especially for this project, expressed in terms of f: f' = (1 - abs(f))^2, i.e. f'(x) = 1 / (1 + abs(x))^2.


The documentation for this class was generated from the following files:
Generated on Mon Mar 28 11:28:11 2005 for Crogai by  doxygen 1.4.1