Professional Version Basis of AI Backprop Hypertext Documentation

Copyright (c) 1990-97 by Donald R. Tveter

Training

This menu window deals with initializing the weights and training the network.

Run

The "Run" button runs a number of iterations (one iteration is one pass through the whole training set) and prints the status of the training and test set patterns at regular interval. One entry box lets you enter the total number of iterations however this value is not set until the "Run" button is clicked. The same applies to the printing rate. The typed form of the command looks like:

r 100 10   * run 100 iterations, printing the status every 10

Seed Values

The seed is a randomly chosen number used to generate random initial values for the weights. This pro version allows you to have multiple seeds, a capability that is especially useful when benchmarking. Seed values can also be advanced outside of benchmarking; for details see the Incrementing Seeds section below. Use the entry box to type in a new seed value, which should be less than 32767. The typed commands look like:

s 99         * seed is 99
s 1 2 3 4 5  * seeds are 1 2 3 4 5
s            * seed is CPU time mod 32768
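
The point of a seed is reproducibility: the same seed always produces the same stream of pseudo-random numbers and therefore the same initial weights. A minimal C sketch of the idea (this only illustrates the principle, it is not the program's own code):

   #include <stdio.h>
   #include <stdlib.h>

   int main(void)
   {
       srand(99);                          /* seed is 99                 */
       printf("%d %d\n", rand(), rand());  /* two pseudo-random values   */
       srand(99);                          /* re-seed with the same seed */
       printf("%d %d\n", rand(), rand());  /* prints the same two values */
       return 0;
   }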

Incrementing Seeds

Whenever you clear and initialize, the default action is to re-set the seed value so that the initial random weights will always be the same. Sometimes you may want different weights when you clear and initialize. If the incrementing seeds flag is on you will get the next set of random weights that come from the seed value instead of the same ones, or if there is more than one seed you will advance to the next seed value. The button here allows you to turn this option on and off. The typed command to turn on incrementing seeds is "s i+" and the typed command to turn it off is "s i-". Turning incrementing seeds off resets the seed to the first value in the seed list.
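
The logic can be pictured with a short C sketch. This is only an illustration of the behavior described above; the names seed_list, seed_index and increment_seeds are made up for the example and are not the program's internals.

   #include <stdlib.h>

   int seed_list[]     = {1, 2, 3, 4, 5};
   int n_seeds         = 5;
   int seed_index      = 0;
   int increment_seeds = 0;                  /* 0 = "s i-", 1 = "s i+"   */

   /* Called on every "clear and initialize". */
   void reseed(void)
   {
       if (!increment_seeds) {
           seed_index = 0;
           srand(seed_list[0]);              /* same seed, same weights  */
       } else if (seed_index + 1 < n_seeds) {
           seed_index = seed_index + 1;
           srand(seed_list[seed_index]);     /* advance to the next seed */
       }
       /* with "s i+" and a single seed there is no re-seeding, so the   */
       /* next draws from the random stream give a new set of weights    */
   }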

Initialize Weights to +/-:

This button actually initializes the weights based on the seed value. If you enter a new weight range in the entry box and type a carriage return, it also initializes the weights. The typed command to initialize to +/- 0.5 is:

ci 0.5   * clear and initialize to +/-0.5
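
As an illustration of what initializing to +/- 0.5 means, here is a minimal C sketch that fills a weight array with uniform random values in [-range, +range]. The function name and arguments are made up for the example; this is not the program's own code.

   #include <stdlib.h>

   /* Fill w[0..n-1] with uniform random values in [-range, +range]. */
   void initialize_weights(double *w, int n, double range, unsigned seed)
   {
       int i;
       srand(seed);
       for (i = 0; i < n; i++)
           w[i] = range * (2.0 * rand() / (double) RAND_MAX - 1.0);
   }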

To read weights from a file, click the button to read from the file named in the entry box, or type a file name into the entry box and end with a carriage return.

Set Weights To 0

This button sets the weights to 0. Ordinarily this is not a good idea, since random initial weights will almost certainly work better. The typed command is:

c   * set the weights to 0 (c for clear)

Weight Decay

One way to improve generalization is to use weight decay. In this procedure each weight in the network is decreased by a small fraction of its value at the end of each training cycle. If the weight is w and the weight decay term is 0.0001 then the decrease is given by:

   w = w - 0.0001 * w

Reasonable values to try for weight decay are 0.001 or less. There is one report that the best time to start weight decay is when the network reaches a minimum on the test set, but it is difficult to decide when the network has reached a minimum. There is no automatic way of turning on weight decay at a certain point, although it's one of those things I should add. The typed command is "a wd 0.0005" to set the weight decay factor to 0.0005.
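
The update is simple enough to show as a minimal C sketch, applied to every weight at the end of each training cycle. The function name is made up for the example; this is not the program's own code.

   /* Shrink every weight toward zero by the decay factor, e.g. 0.0001. */
   void decay_weights(double *w, int n, double decay)
   {
       int i;
       for (i = 0; i < n; i++)
           w[i] = w[i] - decay * w[i];
   }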

Giving the Network a "Kick"

Sometimes networks get themselves trapped in local minima with no way out. One way to get the network out of the trap is to drastically alter the weights. To do this there is a "kick" command. For instance the command:

k 8 4
will take all the weights in the network that are greater than or equal to 8 (this is the kick size, hardly a good term, sigh) and decrease them by a random number between 0 and 4 (this is the kick range). Weights less than or equal to -8 will be increased by a random number between 0 and 4. The poor terminology came from the initial use of the kick command: it was designed to initialize a network before the clear and initialize command was added. There is no guarantee that this procedure is particularly good; however, I have found that it works at times.
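
A minimal C sketch of the kick operation, assuming the "k 8 4" example above; the function name is made up for the example and this is not the program's own code.

   #include <stdlib.h>

   /* Pull large weights back toward zero by a random amount in [0, range],
      e.g. kick_weights(w, n, 8.0, 4.0) for the command "k 8 4".            */
   void kick_weights(double *w, int n, double size, double range)
   {
       int i;
       for (i = 0; i < n; i++) {
           double r = range * rand() / (double) RAND_MAX;
           if (w[i] >= size)
               w[i] = w[i] - r;
           else if (w[i] <= -size)
               w[i] = w[i] + r;
       }
   }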

The Paging Mechanism

When there is a large amount of output from a command it is convenient to get it page by page and to stop the output of the command if necessary. For example, you may have 100 patterns to list, and if you decide to list them you may be happy seeing only the first 24. To implement paging, the program resets a line count for every command, and when the number of lines output from the command reaches the page size a "More?" box comes up. It works like the traditional more command, only here you click your choice rather than type it. The choices are one more page, half a page, one line, stop paging or break the command loop. The default page size is 24 and it can be reset by entering a new value in the entry box in the GUI menu window.
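
The line counting itself amounts to something like the following C sketch; the names lines_printed and page_size are made up for the example and this is not the program's own code.

   #include <stdio.h>

   int lines_printed = 0;
   int page_size     = 24;

   void print_line(const char *text)
   {
       puts(text);
       lines_printed = lines_printed + 1;
       if (lines_printed >= page_size) {
           /* put up the "More?" box here and act on the user's choice */
           lines_printed = 0;
       }
   }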

Tolerance

The program will stop training when the output values are close enough to the target values. Close enough is defined by the tpu command as in:

tpu 0.1  * tolerance per unit is 0.1
where every output unit for every pattern must be within 0.1 of its target value. Another, looser standard is to simply make the average error smaller than some value, but this program does not implement that. In practice, in classification problems you only care about the right answer getting the largest output value. In the A menu window there is an entry box where you can type in a new tolerance value.
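
The per-unit test amounts to the following C sketch; the names are made up for the example and this is not the program's own code.

   #include <math.h>

   /* Return 1 if every output of every pattern is within tol of its target. */
   int all_within_tolerance(double **out, double **target,
                            int n_patterns, int n_outputs, double tol)
   {
       int p, j;
       for (p = 0; p < n_patterns; p++)
           for (j = 0; j < n_outputs; j++)
               if (fabs(out[p][j] - target[p][j]) > tol)
                   return 0;
       return 1;
   }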

Older versions of the program had the 't' command, where the tolerance could only be set to a value in the range [0..1), but 'tpu' allows any positive value.