#include <src/TwoLayerPerceptron.h>
It uses a custom transfer function I devised for this project (though of course I was not the first to think of it, see below): f(x) = x / (1 + abs(x)). This function is sigmoid-like, bounded in -1 / +1, continuous with a continuous first derivative, and much faster to compute than tanh. It is also unfortunately slower to converge, see the note below.
Many neural network packages use tanh, to the point that it has become the standard function to use. It has the very nice property that tanh' = 1 - tanh^2. Thus, backpropagation can be made faster by reusing previous computations for the derivative, and since very often the goal is to train the network as well as possible, this is a very desirable property. Others use the sigmoid function s(x) = 1 / (1+exp(-x)), where s' = s*(1-s), for the same reason.
However, in this project, the neural network is used primarily in forward mode, and backpropagation/explicit training is only used to provide a starting point for the genetic algorithm to work on.
Thus, the most important point is to have the fastest possible transfer function: we are going to apply it a lot, nhidden+noutput times per agent, and for many agents! On the other hand, it doesn't matter much whether some computations can be reused in backpropagation: the learning phase is done only once, and is fast enough as it is with the function above anyway.
As a matter of fact, with the function above, we CAN reuse the results in exactly the same way as for tanh: f'(x) = 1 / (1 + abs(x))^2 = (1 - abs(f))^2.
Note: It seems this function was first published by David Elliott, in "A Better Activation Function for Artificial Neural Networks". See http://www.dontveter.com/bpr/activate.html for a discussion of its merits and drawbacks relative to other activation functions. In short, this function converges to the data much more slowly than tanh. But numerical experiments in this project have shown that it is close to what the sigmoid does for our purpose. And since, once again, all we want here is a good starting point for the genetic algorithm, we don't care much.
The training is a very simple on-line gradient descent. All you have to do is provide an input and a target to the train function, plus a learning rate (controlling how large the gradient-descent steps are; 0.2 is already quite big). This trains the network to minimize the (half) sum of squared errors on the outputs. You can also use your own error function, in a three-step algorithm:
1. Call computeOutput to get the network outputs for your input.
2. Compute the gradient of your error function with respect to the outputs, and pass it to backPropagate together with the input and output vectors.
3. Call learn to update the weights by gradient descent.
You can repeatedly do this on all the data you want to learn, in turn.
This MLP can also be used for batch learning, where you provide a large set of input/output mappings to train on at once. To do this, use backPropagate on the first mapping, then call batchBackPropagateAccumulate to accumulate the gradients for all other mappings. In the end, call batchBackPropagateTerminate with the total number of mappings. You can then use the learn algorithm as above.
For advanced learning techniques, you may consider looking at the CheapMatrix framework I created for the occasion. You'll find a scaled conjugate gradient algorithm, which converges faster than this simple gradient descent.
Public Member Functions

TwoLayerPerceptron (int ninput, int nhidden, int noutput)
    Creates a network with the given dimensions.

TwoLayerPerceptron (const TwoLayerPerceptron &tlp)
    Copy constructor: creates this network from the other.

TwoLayerPerceptron & operator= (const TwoLayerPerceptron &tlp)
    Copies all the values from the other network. The networks must be the same size.

virtual void computeOutput (const double *input, double *output)
    Gets the output corresponding to this input vector.

virtual void backPropagate (const double *input, const double *output, const double *gradout)
    Backpropagates the error function gradients in the network. Presupposes the current internal values of the hidden units match the given input-to-output mapping.

virtual void batchBackPropagateAccumulate (const double *input, const double *output, const double *gradout)
    Batch backpropagation: accumulates the gradients of all mappings, one by one.

virtual void batchBackPropagateTerminate (int nmappings)
    Batch backpropagation: terminates the accumulation over the given number of mappings.

virtual void learn (double learningRate=defaultLearningRate)
    Very simple gradient descent, by the given amount.

virtual double train (const double *input, const double *target, double learningRate=defaultLearningRate)
    Convenience function to train the network for the given target using the very common 'half sum of squared output errors'.

virtual void getHidden (double *hidden)
virtual void setHidden (const double *hidden)
    Read-write accessors to the hidden values; allow storing the results of computeOutput for later backpropagation.

int getNInput ()
int getNOutput ()
int getNHidden ()
    Read-only accessors.

virtual void mutate (double ihwRate, double ihwJitter, double howRate, double howJitter, double hbRate, double hbJitter, double obRate, double obJitter)
    Mutates this network's weights and biases with the given parameters.
Static Public Attributes

static double(* transfer )(double) = &defaultTransfer
    Set a transfer function.

static double(* transferDerivativeAsF )(double) = &defaultTransferDerivativeAsF
    Set the derivative of the transfer function, expressed in terms of the original function.

static const double defaultLearningRate = 0.1
    Default learning rate for training by gradient descent. Default is 0.1.
Protected Attributes

int ninput
int nhidden
int noutput
int nih
int nho
double * ihw
double * how
double * ihwg
double * howg
double * hb
double * ob
double * hbg
double * obg
double * hv
Friends

std::ostream & operator<< (std::ostream &os, const TwoLayerPerceptron &tlp)
std::istream & operator>> (std::istream &is, TwoLayerPerceptron &tlp)
Creates a network with the given dimensions. The weights are initially set to random values drawn from a normal distribution, scaled by the layer dimensions. The Utility random methods are used, so you can set the seed for reproducible results.

Backpropagates the error function gradients in the network. This function presupposes that the current internal values of the hidden units match the given input-to-output mapping. This is the case if computeOutput was called previously to this function; that is usually necessary anyway to compute the error gradient, so it isn't a big requirement.

Batch backpropagation accumulates the gradients of all mappings, one by one. Use backPropagate on the first mapping, then call this function to accumulate the gradients for all other mappings. In the end, call batchBackPropagateTerminate with the total number of mappings. This function presupposes that the current internal values of the hidden units match the given input-to-output mapping. This is the case if computeOutput was called previously to this function; that is usually necessary anyway to compute the error gradient, so it isn't a big requirement.

Batch backpropagation: terminates the accumulation over the given number of mappings. Use backPropagate on the first mapping, then call batchBackPropagateAccumulate to accumulate the gradients for all other mappings. In the end, call this function with the total number of mappings. This function presupposes that the current internal values of the hidden units match the given input-to-output mapping. This is the case if computeOutput was called previously to this function; that is usually necessary anyway to compute the error gradient, so it isn't a big requirement.

Very simple gradient descent, by the given amount. Uses the current gradients to update the weights and biases.

Mutates this network's weights and biases with the given parameters. This has nothing to do in this generic class, but I'm too lazy to split it out cleanly. For each "ihw" input-to-hidden weight, each "how" hidden-to-output weight, each "hb" hidden bias, and each "ob" output bias, the corresponding rate and jitter parameters apply.

Convenience function to train the network for the given target using the very common 'half sum of squared output errors'. You may call this function repeatedly to train the network, checking the results until you're satisfied.

Set a transfer function. Default is a custom transfer function: f(x) = x / (1 + abs(x)). This function is sigmoid-like, bounded in -1 / +1, continuous with a continuous first derivative, and much faster to compute than tanh. It is also unfortunately slower to converge, but about the same as the sigmoid function for this project according to preliminary experiments.

Set the derivative of the transfer function, expressed in terms of the original function. This is the differential equation relating f' and f. Such an equation does not always exist, but when it does, it provides a big boost for backpropagation; in practice, neural networks thus use only such functions. Sorry, but this class does not handle the more generic case. Default is the derivative of the custom transfer function devised for this project: f'(x) = 1 / (1 + abs(x))^2 = (1 - abs(f))^2.