Professional Basis of AI Backprop Hypertext Documentation

Copyright (c) 1990-97 by Donald R. Tveter

Quickprop

Quickprop may be one of the fastest network training algorithms. It is loosely based on Newton's method. It works by using a different weight change value for each weight as the training proceeds and scaling back the size of weight changes when they are too large.

A parameter mu limits the size of each weight change to at most mu times the previous weight change. Fahlman suggests that mu = 1.75 is generally quite good, so this is the initial value for mu, but slightly larger or slightly smaller values are sometimes better. There is an entry box to change this value.

To get the process started, quickprop makes the typical backprop weight change of - eta * slope. I have found that a good value for the quickprop eta is around 1/n or 2/n, where n is the number of patterns in the training set. Other people often use much larger values. In addition, Fahlman adds this slope term to later weight changes as well, not just the first one. I had to wonder if this was a good idea, so in this code I've included the ability to add it in or leave it out. So far it seems to me that adding in this extra term sometimes helps and sometimes doesn't. The default is to use the extra term. There is an entry box for eta and a button that turns the extra slope term on or off.

Another factor in quickprop comes from the fact that the weights often grow very large very quickly. To minimize this problem there is a decay factor designed to keep the weights small. The weight decay is implemented by decreasing the value of the slope, and it is different from the general weight decay that people use in backprop programs. Fahlman recently mentioned that he no longer uses this unless the weights get very large. I've found that too large a decay factor can stall the learning process, so if your network isn't learning fast enough or isn't learning at all, one possible fix is to decrease the decay factor. There are entry boxes for the hidden and output layer decay values.
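One way to picture a decay that acts through the slope is sketched below. It is an assumption on my part rather than the program's actual code: slope is taken to be dE/dw, the decay term is decay * weight, and the function name is made up for the example. The sign convention inside the program may differ, but the intended effect is the same: the adjusted slope pulls the weight toward zero.

```python
def slope_with_decay(slope, weight, decay=0.0):
    """Fold a weight-decay term into the slope for one weight (sketch).

    Assumes slope is dE/dw, so that a step of -eta * slope moves the
    weight downhill. Adding decay * weight then biases every step
    toward zero, more strongly for larger weights.
    """
    return slope + decay * weight


# With no error slope at all, a weight of 2.0 and decay 0.1 still gets
# an adjusted slope of about 0.2, so a backprop step with eta = 0.5
# shrinks it slightly.
eta = 0.5
weight = 2.0
adjusted = slope_with_decay(0.0, weight, decay=0.1)
new_weight = weight - eta * adjusted
```

With a decay of 0, the default for both layers, the slope is left unchanged. A large decay can outweigh the error slope entirely, which is one way to see why too large a value can stall learning.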

The parameters for quickprop are all set in the `qp' command like so:

qp d <value>   * set the weight decay factor for all layers to <value>
qp d h 0       * the default weight decay for hidden layer units
qp d o 0       * the default weight decay for output layer units
qp e 0.5       * the default value for eta
qp m 1.75      * the default value for mu
qp s+          * the default is to always include the extra slope term

Or a whole series of settings can go on one line:

qp d 0.1 e 0.5 m 1.75 s+