A Neural Network in Python, Part 2: activation functions, bias, SGD, etc.

This is Part 2 of A Neural Network in Python; Part 1 built a very simple neural network to learn the XOR function. This part builds on that example to demonstrate more activation functions, learning a simple math function, adding a bias, improvements to the initial random weights, stochastic gradient descent, a mean square error loss function, and graphical visualisation. Phew! I won’t go into much, if any, theory, but I will provide links to resources where you can find out more. What this program does is give you an example you can tinker with, to see what effect those various improvements have. In particular, it’s a lot faster!

Main variables:

  • Wh & Wz are the weight matrices, of dimension previous layer size * next layer size.
  • X is the input vector of evenly spaced values from which to compute Y.
  • Y is the corresponding target value, Y = f(X).
  • Z is the vector of learned values for f(X), Z = activate(H.Wz).
#   A Very Simple Neural Network in Python 3 with Numpy, Part 2
#   Alan Richmond @ Python3.codes
import numpy as np
import matplotlib.pyplot as plt
import math, time
 
epochs = 3000
batchSize = 4
activation = 'sigmoid'
#activation = 'tanh'
#activation = 'ReLU'
 
def f(x): return np.sin(x)
 
minx, maxx = 0, 6.28
miny, maxy = -1, 1
numx = int(maxx * 5 + 1)
inputLayerSize, hiddenLayerSize, outputLayerSize = 2, 5, 1
 
funcs = {'sigmoid':  (lambda x: 1/(1 + np.exp(-x)),
                      lambda x: x * (1 - x),  (0,  1), .45),
            'tanh':  (lambda x: np.tanh(x),
                      lambda x: 1 - x**2,     (0, -1), 0.005),
            'ReLU':  (lambda x: x * (x > 0),
                      lambda x: x > 0,        (0, maxx), 0.0005),
        }
(activate, activatePrime, (mina, maxa), L) = funcs[activation]
 
X = x = np.linspace(minx, maxx, num=numx)
X.shape = (numx, 1)
Y = y = f(X)
Y = (Y - miny)*(maxa - mina)/(maxy - miny) + mina   # normalise into activation
 
# add a bias unit to the input layer
X = np.concatenate((np.atleast_2d(np.ones(X.shape[0])).T, X), axis=1)
 
# Random initial weights
r0 = math.sqrt(2.0/(inputLayerSize))
r1 = math.sqrt(2.0/(hiddenLayerSize))
Wh = np.random.uniform(size=(inputLayerSize, hiddenLayerSize),low=-r0,high=r0)
Wz = np.random.uniform(size=(hiddenLayerSize,outputLayerSize),low=-r1,high=r1)
 
def next_batch(X, Y):
    for i in np.arange(0, X.shape[0], batchSize):
        yield (X[i:i + batchSize], Y[i:i + batchSize])
 
start = time.time()
lossHistory = []
 
for i in range(epochs):         # Training:
    epochLoss = []
 
    for (Xb, Yb) in next_batch(X, Y):
 
        H = activate(np.dot(Xb, Wh))            # hidden layer results
        Z = activate(np.dot(H,  Wz))            # output layer results
        E = Yb - Z                              # how much we missed (error)
        epochLoss.append(np.sum(E**2))
 
        dZ = E * activatePrime(Z)               # delta Z
        dH = dZ.dot(Wz.T) * activatePrime(H)    # delta H
        Wz += H.T.dot(dZ) * L                   # update output layer weights
        Wh += Xb.T.dot(dH) * L                  # update hidden layer weights
 
    mse = np.average(epochLoss)
    lossHistory.append(mse)
 
X[:, 1] += maxx/(numx-1)/2      # shift the test points by half a step so they differ from the training points
H = activate(np.dot(X, Wh))     # forward-propagate with the learned weights
Z = activate(np.dot(H, Wz))
Z = ((miny - maxy) * Z - maxa * miny + maxy * mina)/(mina - maxa)   # de-normalise back to the range of f(X)
Y = y                           # restore the un-normalised targets for plotting
 
end = time.time()
 
plt.figure(figsize=(12, 9))
plt.subplot(311)
plt.plot(lossHistory)
plt.subplot(312)
plt.plot(H, '-*')
plt.subplot(313)
plt.plot(x, Y, 'ro')    # training data
plt.plot(X[:, 1], Z, 'bo')   # learned
plt.show()
 
print('[', inputLayerSize, hiddenLayerSize, outputLayerSize, ']',
      'Activation:', activation, 'Iterations:', epochs,
      'Learning rate:', L, 'Final loss:', mse, 'Time:', end - start)

Walkthrough

  1. We import some libraries: numpy, pyplot, math, time.
  2. Set the hyperparameters, including choice of activation function.
  3. Instead of the XOR in Part 1, we’re going to learn the sine function.
    1. Set min and max values and number of x points for graph plotting etc.
  4. The activation functions (logistic sigmoid, tanh, ReLU) and their parameters; mina and maxa are used for normalising. They are stored in a dictionary for convenience, using lambda expressions.
  5. Set linear increments for X and x. X is going to get an extra bias column, so the copy x is kept for the graph plotting. Similarly for Y = y = f(X).
  6. Y is normalised to match the activation function’s output range.
  7. Adding a bias unit to X.
  8. The weights are randomly initialised to best practice recommendations.
  9. We’re not going to use all the input data in a single weight update; rather, we’ll process it in small batches, which is much faster.
  10. Start the timer and prepare an empty loss history.
  11. Training:
    1. Grab a batch of training data and process it just as in Part 1, except for accumulating some loss data, and applying the learning factor L to the weight updates.
  12. We re-use the X vector to test the results, except we shift it along a bit so as not to test the same values that were used for training; otherwise the results could simply have been memorised. The notation X[:,1] selects the second column, avoiding the first column which holds the bias.
  13. Apply the learned weights to the training data (same as the forward propagation) and de-normalise the results.
  14. Plot graphs and print some stats. The first graph shows how the error decreased over time. The second shows the values in the hidden layer, giving some insight into how the final output is calculated (as a linear sum of the hidden layer values). The final graph shows the target function (red dots) and the learned function (blue dots). You’ll notice that the learned function often starts out quite close to the target function, but may sometimes drift away, and sometimes even ‘give up’ trying to match the target!

Experiments

  • Try changing the number of epochs and the batch size.
  • Try selecting different activation functions by commenting out or uncommenting.
  • Try different functions. Don’t forget to change the min and max values on the following lines! Try extending the function domain further left or right.
  • Try changing the hidden layer size. The other 2 layers need to stay fixed.
  • Try varying the learning rate L – the last value in the funcs dictionary.
  • Try removing the bias unit (comment out the ‘concatenate’ instruction). Don’t forget to reduce the input layer size to 1.
  • Is there a better way to initialise the weights?
  • Keep an eye on the graphs and printed stats. Try to minimise the final error and the time taken.

Bias Nodes

If a neural network has no bias node in a given layer, then whenever all the feature values are 0 it can only produce an output of 0 in the next layer (on the linear scale, or whatever value the activation function maps 0 to).
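
Here is a standalone sketch of that effect (separate from the program above; the sizes just mirror its 1 feature + bias and 5 hidden units). With no bias column, a zero input always produces sigmoid(0) = 0.5 at the hidden layer, no matter what the weights are.

import numpy as np

sigmoid = lambda x: 1/(1 + np.exp(-x))

x0 = np.zeros((1, 1))                                 # one zero-valued input, no bias
Wh = np.random.uniform(-1, 1, size=(1, 5))            # any weights at all
print(sigmoid(np.dot(x0, Wh)))                        # always [[0.5 0.5 0.5 0.5 0.5]]

xb = np.concatenate((np.ones((1, 1)), x0), axis=1)    # prepend a bias unit of 1
Wb = np.random.uniform(-1, 1, size=(2, 5))
print(sigmoid(np.dot(xb, Wb)))                        # the bias row of Wb can now shift each output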

Activation Functions

Activation functions are generally used to provide non-linearity. Without it, adding layers adds nothing that couldn’t be done with just one layer, so, for example, the XOR function we saw in the last part couldn’t be learned. If you compose one linear function with another, the result is still linear.
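
A standalone sketch of why (the sizes here are arbitrary): two stacked linear layers with no activation in between collapse into a single linear layer, because the product of the two weight matrices is itself just one weight matrix.

import numpy as np

np.random.seed(0)
X  = np.random.randn(4, 2)
W1 = np.random.randn(2, 3)
W2 = np.random.randn(3, 1)

two_layers = X.dot(W1).dot(W2)               # two linear layers, no activation between them
one_layer  = X.dot(W1.dot(W2))               # one layer with the combined 2x1 weight matrix
print(np.allclose(two_layers, one_layer))    # True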

Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is a simple modification to the standard gradient descent algorithm: it computes the gradient and updates the weight matrix W on small batches of training data, rather than on the entire training set. Computing the cost and gradient for the entire training set can be very slow, and batch optimisation methods don’t give an easy way to incorporate new data in an ‘online’ setting. SGD addresses both of these issues by following the negative gradient of the objective after seeing only a single training example, or a few.
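
The next_batch() generator in the program above slices the data in its original order. A common refinement, not used above but easy to try, is to shuffle the examples each epoch so that every epoch sees different batches; a minimal sketch:

import numpy as np

def next_shuffled_batch(X, Y, batchSize):
    idx = np.random.permutation(X.shape[0])      # new random order each epoch
    for i in np.arange(0, X.shape[0], batchSize):
        j = idx[i:i + batchSize]
        yield (X[j], Y[j])

Dropping this in for next_batch() in the training loop (and passing batchSize explicitly) is one way to make the descent more ‘stochastic’.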

Initial Weights

The initial weights need to be different from each other in order for the learning process to gain traction, but they should not be too far from zero, because there the gradient becomes very shallow (think of the sigmoid curve, far from the origin) and learning will be very slow. Research has found that small random weights scaled by sqrt(2/fan-in) work well; the program above draws them from a uniform distribution over that range.
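
For reference, here is the scaling used above alongside a normal-distribution variant (sometimes called He initialisation), as a standalone sketch:

import numpy as np

fan_in, fan_out = 2, 5
r = np.sqrt(2.0 / fan_in)

Wh_uniform = np.random.uniform(low=-r, high=r, size=(fan_in, fan_out))   # as in the program above
Wh_normal  = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)    # normal-distribution variant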

A Neural Network in Python, Part 1: sigmoid function, gradient descent & backpropagation

In this article, I’ll show you a toy example to learn the XOR logical function. My objective is to make it as easy as possible for you to see how the basic ideas work, and to provide a basis from which you can experiment further. In real applications, you would not write these programs from scratch (except that we do use numpy for the low-level number crunching); you would use libraries such as Keras, Tensorflow, SciKit-Learn, etc.

What do you need to know to understand the code here? Python 3, numpy, and some linear algebra (e.g. vectors and matrices). If you want to proceed deeper into the topic, some calculus, e.g. partial derivatives would be very useful, if not essential. If you aren’t already familiar with the basic principles of ANNs, please read the sister article over on AILinux.net: A Brief Introduction to Artificial Neural Networks. When you have read this post, you might like to visit A Neural Network in Python, Part 2: activation functions, bias, SGD, etc.

This less-than-20-line program learns how the exclusive-or logic function works. That function is true only if its two inputs differ. Here is the truth table for XOR:

a   b   a xor b
0   0   0
0   1   1
1   0   1
1   1   0

Main variables:

  • Wh & Wz are the weight matrices, of dimension previous layer size * next layer size.
  • X is the input matrix, dimension 4 * 2 = all combinations of 2 truth values.
  • Y is the corresponding target value of XOR of the 4 pairs of values in X.
  • Z is the vector of learned values for XOR.
#   XOR.py-A very simple neural network to do exclusive or.
import numpy as np
 
epochs = 60000           # Number of iterations
inputLayerSize, hiddenLayerSize, outputLayerSize = 2, 3, 1
 
X = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = np.array([ [0],   [1],   [1],   [0]])
 
def sigmoid (x): return 1/(1 + np.exp(-x))      # activation function
def sigmoid_(x): return x * (1 - x)             # derivative of sigmoid
                                                # weights on layer inputs
Wh = np.random.uniform(size=(inputLayerSize, hiddenLayerSize))
Wz = np.random.uniform(size=(hiddenLayerSize,outputLayerSize))
 
for i in range(epochs):
 
    H = sigmoid(np.dot(X, Wh))                  # hidden layer results
    Z = sigmoid(np.dot(H, Wz))                  # output layer results
    E = Y - Z                                   # how much we missed (error)
    dZ = E * sigmoid_(Z)                        # delta Z
    dH = dZ.dot(Wz.T) * sigmoid_(H)             # delta H
    Wz +=  H.T.dot(dZ)                          # update output layer weights
    Wh +=  X.T.dot(dH)                          # update hidden layer weights
 
print(Z)                # what have we learnt?

Walk-through

We use numpy, because we’ll be using matrices and vectors. There are no ‘neuron’ objects in the code, rather, the neural network is encoded in the weight matrices.

Our hyperparameters (the AI term for the settings we choose rather than learn) are the number of epochs (lots) and the layer sizes. Since the input data comprises 2 operands for the XOR operation, the input layer devotes 1 neuron per operand. The result of the XOR operation is one truth value, so we have one output node. The hidden layer can have any number of nodes; 3 seems sufficient, but you should experiment with this.

Each row of the training data is one example, so stacking the examples adds another dimension at each layer: the input matrix X is 4 * 2, representing all possible combinations of truth-value pairs, and the training data Y is the 4 values corresponding to the result of XOR on those combinations.

An activation function corresponds to the biological phenomenon of a neuron ‘firing’, i.e. triggering a nerve signal when the neuron’s inputs combine in some appropriate way. It has to be chosen so as to cause reasonably proportionate outputs within a small range, for small changes of input. We’ll use the very popular sigmoid function, but note that there are others. We also need the sigmoid derivative for backpropagation.

Initialise the weights. Setting them all to the same value, e.g. zero, would be a poor choice, because the trained weights need to end up different from each other, and we should help that along by starting from random values: this is the ‘symmetry-breaking’.

Now for the learning process:

We’ll make an initial guess using the random initial weights, and propagate it through the hidden layer as the dot product of those weights and the input vector of truth-value pairs. Recall that a matrix-vector multiplication proceeds along each row, multiplying each element by the corresponding element of the vector and summing the products. The result goes into the sigmoid function to produce H. So H = sigmoid(X * Wh)

Same for the Z (output) layer, Z = sigmoid(H * Wz)

Now we compare the guess with the training data, i.e. Y – Z, giving the error E.

Finally, backpropagation. This comprises computing changes (deltas) which are multiplied (specifically, via the dot product) with the values at the hidden and input layers, to provide increments for the appropriate weights. If any neuron values are zero or very close, then they aren’t contributing much and might as well not be there. The sigmoid derivative (greatest at zero) used in the backprop will help to push values away from zero. The sigmoid activation function shapes the output at each layer.

  1. E is the final error Y – Z.
  2. dZ is a change factor dependent on this error, magnified by the slope of Z: if it’s steep we need to change more; if close to zero, not much. The slope is sigmoid_(Z).
  3. dH is dZ backpropagated through the weights Wz, amplified by the slope of H.

Finally, Wz and Wh are adjusted by applying those deltas to the inputs at their layers, because the larger the inputs are, the more the weights need to be tweaked to absorb the effect of the next forward prop. The layer inputs are part of the gradient that is being descended; we’re moving the weights down towards the minimum value of the cost function.

If you want to understand the code at more than a hand-wavey level, study a mathematical derivation of the backpropagation algorithm, such as this one or this one, so you appreciate the delta rule, which is used to update the weights. Essentially, it’s the partial-derivative chain rule doing the backprop grunt work. Even if you don’t fully grok the math derivation, at least check out the 4 equations of backprop, e.g. as listed here (click on the Backpropagation button near the bottom) and here, because those are where the code ultimately derives from.
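
As a minimal worked sketch of where the update Wz += H.T.dot(dZ) comes from (assuming a squared-error cost C = 0.5*(Y - Z)**2, which the code never states explicitly): the derivative of C with respect to Z is -(Y - Z) = -E, and the derivative of Z with respect to H.Wz is Z*(1 - Z), i.e. sigmoid_(Z), so the derivative of C with respect to H.Wz is -E * sigmoid_(Z) = -dZ. The gradient with respect to Wz is then -H.T.dot(dZ), and a gradient-descent step with a learning rate of 1 is Wz -= -H.T.dot(dZ), i.e. Wz += H.T.dot(dZ), which is exactly the update in the code.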

The X matrix holds the training data, excluding the required output values. Visualise it being rotated 90 degrees clockwise and fed one pair at a time into the input layer (X00 and X01, etc). They go across each column of the weight matrix Wh for the hidden layer to produce the first row of the result H, then the next etc, until all rows of the input data have gone in. H is then fed into the activation function, ready for the corresponding step from the hidden to the output layer Z.
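
If that description is hard to picture, this little shape check (a standalone sketch using the same layer sizes as the program) may help:

import numpy as np

X  = np.array([[0,0], [0,1], [1,0], [1,1]])    # 4 x 2: one row per training pair
Wh = np.random.uniform(size=(2, 3))            # 2 x 3: input layer to hidden layer
H  = 1/(1 + np.exp(-np.dot(X, Wh)))            # 4 x 3: one hidden-layer row per input row
Wz = np.random.uniform(size=(3, 1))            # 3 x 1: hidden layer to output layer
Z  = 1/(1 + np.exp(-np.dot(H, Wz)))            # 4 x 1: one output per input row
print(X.shape, H.shape, Z.shape)               # (4, 2) (4, 3) (4, 1)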

If you run this program, you should get something like:

[[ 0.01288433]
 [ 0.99223799]
 [ 0.99223787]
 [ 0.00199393]]

You won’t get the exact same results, but the first and last numbers should be close to zero, while the 2 inner numbers should be close to 1. You might have preferred exact 0s and 1s, but our learning process is analogue rather than digital; you could always just insert a final test to convert ‘nearly 0’ to 0, and ‘nearly 1’ to 1!
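
For example, one way to add that final test (a sketch; it assumes it is appended after the print(Z) line of the program above):

Z_digital = (Z > 0.5).astype(int)   # 'nearly 1' becomes 1, 'nearly 0' becomes 0
print(Z_digital)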

Here’s an improved version: it has no (or linear) activation on the output layer and gets more accurate results faster.
#   XOR.py-A very simple neural network to do exclusive or.
#   sigmoid activation for hidden layer, no (or linear) activation for output
 
import numpy as np
 
epochs = 20000                                  # Number of iterations
inputLayerSize, hiddenLayerSize, outputLayerSize = 2, 3, 1
L = .1                                          # learning rate      
 
X = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = np.array([ [0],   [1],   [1],   [0]])
 
def sigmoid (x): return 1/(1 + np.exp(-x))      # activation function
def sigmoid_(x): return x * (1 - x)             # derivative of sigmoid
                                                # weights on layer inputs
Wh = np.random.uniform(size=(inputLayerSize, hiddenLayerSize))
Wz = np.random.uniform(size=(hiddenLayerSize,outputLayerSize))
 
for i in range(epochs):
 
    H = sigmoid(np.dot(X, Wh))                  # hidden layer results
    Z = np.dot(H,Wz)                            # output layer, no activation
    E = Y - Z                                   # how much we missed (error)
    dZ = E * L                                  # delta Z
    Wz +=  H.T.dot(dZ)                          # update output layer weights
    dH = dZ.dot(Wz.T) * sigmoid_(H)             # delta H
    Wh +=  X.T.dot(dH)                          # update hidden layer weights
 
print(Z)                # what have we learnt?

Output should look something like this:

[[ 6.66133815e-15]
 [ 1.00000000e+00]
 [ 1.00000000e+00]
 [ 8.88178420e-15]]

Part 2 will build on this example, introducing biases, graphical visualisation, learning a math function (sine), etc…

Batteries Included: A quick look at Python modules

This article introduces the idea of importing Python modules that manage many common and/or special tasks that you would otherwise have to write programming code for. IDLE is featured in this article because of some of its handy features, but you can follow the examples in other ways as well.

The material in this article directly supplements the official Python Tutorial, and pretty much assumes you have at least skimmed over it or gotten the same information elsewhere. This article is intended to help you “get busy coding” right away and show you a few essential tricks.

WHAT A MODULE IS:
The short answer is that a module is a text file containing one or more Python statements and/or definitions (of functions and classes). Most modules also contain comments.

A statement is pretty much one or more lines of Python code that give Python orders. (The classic example of a first statement for programming students to write is to tell Python to print “Hello world”, but statements can be quite a bit more sophisticated.)

When you import a module, statements not included within the definitions of functions and classes are executed. These statements may even involve importing other modules in order to work.
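
For example, suppose you saved a little module of your own as greetings.py (a made-up name, just for illustration):

# greetings.py
def hello(name):
    return "Hello, " + name

print("greetings module loaded")    # a top-level statement: it runs once, at import time

Importing the module runs that top-level print straight away, while hello() only runs when you call it:

>>> import greetings
greetings module loaded
>>> greetings.hello("world")
'Hello, world'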


IMPORTING BASICS:
Importing a Python module is usually simpler than you might think. For instance, if you want to use the module named “time” in your program, you merely enter the line:

import time

If you wanted to import everything the time module has to export, without binding the module name itself, you could simply type:

from time import *

If you only wanted to import isleap() from the calendar module, all you would need to do is to type:

from calendar import isleap


See if you can follow what’s happening in the following code:

>>> import time
>>> timeHere = time.localtime()
>>> timeHere
(2024, 5, 24, 0, 45, 7, 4, 144, 1)
>>> readableTime = time.asctime(timeHere)
>>> readableTime
'Fri May 24 00:45:07 2024'
>>> readableTime = time.asctime(time.localtime())
>>> readableTime
'Fri May 24 00:46:33 2024'

See if you can tell in what way that differs from the following:

>>> from time import *
>>> timeHere = localtime()
>>> timeHere
(2024, 5, 24, 1, 0, 52, 4, 144, 1)
>>> readableTime = asctime(timeHere)
>>> readableTime
'Fri May 24 01:00:52 2024'
>>> readableTime = asctime(localtime())
>>> readableTime
'Fri May 24 01:01:52 2024'
>>>

One difference between the two approaches is that by importing * from the time module, you are importing each of the resources of the time module for individual use, so you don’t have to qualify them by tacking “time.” onto the front. Sometimes this is a good idea, but you want to make sure not to have two objects “by the same name in the same place”, in a manner of speaking. Spend some time experimenting with different modules, and see if you can find out for yourself what that means.


SEVERAL WAYS TO LEARN YOUR WAY AROUND MODULES:
Python Documentation is usually bundled with Python when it is installed on your computer. It may also be found online at http://python.org/doc/ for the current release of Python as well as archives going back several years. Here you can find the official Python Tutorial, Module Index, Library Reference, and more.

You may also find handy utilities like pydoc (which may be labeled “Module Docs” or similar) either included in your distribution or available through other sources. Using such a utility, you can search for modules on your local machine just like you would use a search engine such as google.com on the web.

Opening Modules in IDLE or your preferred text editor can expose many details not described elsewhere. Remember, modules are saved as text files, so you can read them without the use of special tools. Open them in the usual way with your favorite text editor/viewer, or select File>Open module from IDLE (the *Python Shell*). Just avoid the files ending in .pyc, because .pyc files are not plain text files, but files compiled for execution.

And fiddling with example code is definitely one of the best ways to learn what you can do with specific modules and the tools they offer. You should do as much code tinkering as you can in order to keep learning. This fiddling is the heart and soul of Useless Python!

Once you have imported a module into your program (including an interactive interpreter session, such as IDLE), you can perform dir() on it to list the names of the module’s contents. Just remember this will not work until after you have imported the module.

Many modules contain enough items that dir() will unattractively dump a listing that will be very little fun to read. In the following example, also note how dir() is correctly used to display the contents of the calendar module at the IDLE prompt:


>>> import calendar
>>> dir(calendar)
['EPOCH', 'FRIDAY', 'February', 'January', 'MONDAY', 'SATURDAY', 'SUNDAY', 'SliceType', 'THURSDAY', 'TUESDAY', 'WEDNESDAY', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '_center', '_colwidth', '_firstweekday', '_indexer', '_localized_day', '_localized_month', '_spacing', 'calendar', 'day_abbr', 'day_name', 'error', 'firstweekday', 'format3c', 'format3cstring', 'isleap', 'leapdays', 'localtime', 'mdays', 'mktime', 'month', 'month_abbr', 'month_name', 'monthcalendar', 'monthrange', 'prcal', 'prmonth', 'prweek', 'setfirstweekday', 'strftime', 'timegm', 'week', 'weekday', 'weekheader']


Not too pretty, eh? Even worse, it really isn’t easy to read and comprehend in this form. Fortunately, it is extremely easy to print this information out in a format that is both appealing and easy to follow with a simple “for” loop:

>>> for thingy in dir(calendar):
...     print thingy


EPOCH
FRIDAY
February
January
MONDAY
SATURDAY

.
.

It goes on like this for a while.

.
.

setfirstweekday
strftime
timegm
week
weekday
weekheader
>>>

Among its many uses, dir() can be a great quick-look-up tool to refresh your memory when you find yourself on the hunt for just the right module or function.

But if you think that’s impressive, help() should really blow you away! You don’t have to use any fancy “for” loops with help(), because help() provides a wealth of descriptive information about the module and its contents. Try help() for plenty of information in a flash.


>>> import socket
>>> help(socket)

This is an example of help() usage, but it provides so much information that I encourage you to experiment with it in your own favorite setting to see the output.

Jython and Swing

A simple demonstration of using Java Swing from the Jython interactive interpreter.

The following is an example of using the Jython interactive interpreter from the Windows 2000 command prompt. On this page, we demonstrate several Jython basics, including:

  • use of the jython interpreter from the MS Windows 2000 command prompt
  • collection of user input with javax.swing.JOptionPane.showInputDialog()
  • conversion of strings to integers, and integers to strings, plus simple addition
  • display of output with javax.swing.JOptionPane.showMessageDialog()

Please note that your web browser will probably wrap some longer lines of code.

In our example, Jython is installed in the \bin\ folder of a fairly typical Java installation. My comments are interjected into the session, indicated by the traditional #.

C:\j2sdk1.4.0\bin>jython
Jython 2.1 on java1.4.0 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> import javax.swing as sshwing
>>> firstNum = sshwing.JOptionPane.showInputDialog("Enter an integer: ")
>>> secondNum = sshwing.JOptionPane.showInputDialog("Enter an integer: ")

Each of the two previous lines of code caused a separate swing dialog box asking the user for an integer.

The user’s input is stored as a string, not dynamically determined to be an integer. This is easily demonstrated as follows:

>>> firstNum
'1'
>>> secondNum
'2'
>>> firstNum + secondNum
'12'

In the above lines of code, you see that the user input ‘1’ and ‘2’ in the input dialog boxes, which are strings and subject to concatenation.

Observe what happens if you try a traditional Java approach to converting the string to an integer:

>>> num1 = Integer.parseInt(firstNum)
Traceback (innermost last):
  File "<console>", line 1, in ?
NameError: Integer

If you want to use Integer.parseInt(), first use one of the following two import statements:

  1. from java.lang import Integer (which imports Integer from java.lang; see the sketch after this list), or
  2. from java.lang import * (which imports all classes in java.lang, in case you need several of them).
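
For example, the first approach might continue the session above like this (a sketch):

>>> from java.lang import Integer
>>> num1 = Integer.parseInt(firstNum)
>>> num1
1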

A Python built-in function, however, does the trick:

>>> num1 = int(firstNum)
>>> num1
1
>>> num2 = int(secondNum)
>>> num2
2
>>> num1 + num2
3
>>> sum12 = num1 + num2
>>> sum12
3

Another Python built-in handily converts the sum on-the-fly to a string:

>>> sum12 = str(num1 + num2)
>>> sum12
'3'

Output of the sum in a swing message box is relatively painless. But in Jython, you use “None” in place of Java’s “null” in the following examples:

>>> sshwing.JOptionPane.showMessageDialog(None, "The sum of your integers is " + sum12)

The same output may be tweaked to show the box with a label “Sum” and with a more generic look:

 >>> sshwing.JOptionPane.showMessageDialog(None, "The sum of your integers is " + sum12, "Sum", sshwing.JOptionPane.PLAIN_MESSAGE)

A little tweak adds a “.” to the end of the Sum dialog statement.

 >>> sshwing.JOptionPane.showMessageDialog(None,"The sum is " + sum12 + ".", "Sum", sshwing.JOptionPane.PLAIN_MESSAGE)

Easy as Py! And, of course, using the interactive prompt is only one option. Save a .py source file and run it without a separate compile step.
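
For instance, the whole session could be saved as something like addInts.py (a made-up file name) and started with jython addInts.py:

# addInts.py - the interactive session above, collected into a script
import javax.swing as sshwing

firstNum = sshwing.JOptionPane.showInputDialog("Enter an integer: ")
secondNum = sshwing.JOptionPane.showInputDialog("Enter an integer: ")
sum12 = str(int(firstNum) + int(secondNum))
sshwing.JOptionPane.showMessageDialog(None, "The sum of your integers is " + sum12 + ".",
                                      "Sum", sshwing.JOptionPane.PLAIN_MESSAGE)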

Joe Useless Writes A Program: An Everyday Person’s Guide to Software Development

Writing computer programs to distribute is a process, but it can be a simple one. This article follows our hero Joe Useless on his journey from wanting a simple problem solved to giving his friends the answer in a program they can use over and over. This article is a work in progress, and helpful suggestions are welcome.


Step 1: Have a problem to solve.
(You can also think of this step as “Think of something you want to do.”)

Joe Useless has taken a job working as a clerk at ACME, and enjoys his job for the most part. But clerks in his department have to do a lot of something that Joe hates. They have to compare obscure lines of text, and his supervisor insists that there is no room for error.

Fortunately for Joe, ACME takes the enlightened approach of allowing him to run Python on his PC. “Whatever gets the job done right,” insists his supervisor.


Step 2: Find a way to do it at all.

Joe has used Python before and feels certain he remembers a simple way to do comparisons, so after spending a few minutes with the Library Reference, he turns on IDLE and tries out the cmp() built-in function:

>>> cmp('Harrelson, Adrienne', 'Harrelson, Adrianne')
1
>>>

It looks like this might do the trick for some of their more common and tedious comparisons. Joe gets into the habit of keeping IDLE running in the background to double-check some of his work for a while just to make sure.


Step 3: Tweak it until you have code that consistently does it the way you like.

This little trick works well enough that Joe finds he has largely automated the most annoying part of his job, to the envy of his co-workers. But Joe isn’t a Computer Scientist, and he finds that the “1/0/-1” values returned by cmp() leave him feeling nervous that he might accidentally mistake “1” for “-1” and mess up the job. He doesn’t want to have to look up the meaning of the values cmp() returns, so he decides to write a function to dress up the output.

>>> def compare(firstArg, secondArg):

    if cmp(firstArg, secondArg) == -1:
        print str(firstArg) + " is less than " + str(secondArg)

    elif cmp(firstArg, secondArg) == 0:
        print str(firstArg) + " is equal to " + str(secondArg)

    else:
        print str(firstArg) + " is greater than " + str(secondArg)

This is much better. Joe’s function takes the output from cmp() and displays the result in plain English (by clerk standards, anyway), like so:

>>> compare('Harrelson, Adrienne', 'Harrelson, Adrianne')
Harrelson, Adrienne is greater than Harrelson, Adrianne
>>>

Joe decided to use terms like “greater than” and “less than” in the program’s output, because this seemed like a familiar way of thinking about it, especially since the same function can be used to compare the values of numbers.
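
For instance (a quick check at the IDLE prompt):

>>> compare(2, 10)
2 is less than 10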

Step 4: Package it to be used repeatedly.

Joe’s co-workers are beginning to take notice now, especially since the management is becoming less tolerant of mistakes. Joe decides to make the function into a module that other workers can run on their own workstations. (By this point, Joe is also becoming something of a Python zealot and wants to see more people using it.) He adds a few lines of code to the text file where he keeps his comparison function, turning it into a program the clerks can start from the command line.

def compare(firstArg, secondArg):

    if cmp(firstArg, secondArg) == -1:
        print str(firstArg) + " is less than " + str(secondArg)

    elif cmp(firstArg, secondArg) == 0:
        print str(firstArg) + " is equal to " + str(secondArg)

    else:
        print str(firstArg) + " is greater than " + str(secondArg)

if __name__ == '__main__':
    import sys
    compare(sys.argv[1], sys.argv[2])

Joe was surprised to discover how little effort he had to put into making his handy little function into a full-fledged program. In Programming Python by Mark Lutz, he found out that if a Python script is run as a program, a built-in variable called “__name__” is assigned a special string called “__main__”, and that you can check for this with a simple “if” statement like the one he added. (He even thinks he half-way understands what this means!).

Now if his file is started from the command line as a program instead of imported into an interactive interpreter such as IDLE, the clerk who starts the program can enter the two strings needing to be compared after the program name. (This is sometimes referred to as “calling a program with arguments”. The first of the two strings entered is the first argument, which the program sees as sys.argv[1], and the second string is the second argument, or sys.argv[2].)


The two command-line argument strings are passed to the compare function in the last line of the program. So compare() starts up with sys.argv[1] as firstArg and sys.argv[2] as secondArg. It then calls cmp() and passes firstArg and secondArg to it, then prints a statement to the screen about how the two strings compared.
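
A correct run from the command line might look something like this (reusing the names from Step 2):

C:\Python22>python comparison.py "Harrelson, Adrienne" "Harrelson, Adrianne"
Harrelson, Adrienne is greater than Harrelson, Adrianne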

Step 5: Make it bullet-proof.

Joe’s fellow clerks keep making minor mistakes when running the program, and the department uses a lot of temporary workers. Joe doesn’t want to spend too much time showing people how to use the program, so he decides to have the program itself remind the user of the right way to use it when the most common mistakes are made.

def compare(firstArg, secondArg):

    if cmp(firstArg, secondArg) == -1:
        print str(firstArg) + " is less than " + str(secondArg)

    elif cmp(firstArg, secondArg) == 0:
        print str(firstArg) + " is equal to " + str(secondArg)

    else:
        print str(firstArg) + " is greater than " + str(secondArg)

if __name__ == '__main__':
    import sys
    if len(sys.argv) != 3:
        print '''********************\n
Usage suggestion:\n
python comparison.py argument1 argument2\n
\n
For example:\n
C:\Python22\python comparison.py "Jolie, Angelina" "Useless, Joe"\n
********************'''
    else:
        compare(sys.argv[1], sys.argv[2])

Now, if a user gives the program too many or too few arguments, the program suggests an example of correct usage instead of producing an error message. For example, if the user enters only one name to compare:

C:\Python22>python comparison.py "Stallman, Richard"
********************
Usage suggestion:
python comparison.py argument1 argument2


For example:
C:\Python22\python comparison.py "Jolie, Angelina" "Useless, Joe"
********************

Joe hopes that this will be enough of a reminder for people that he will only have to spend a few minutes with them at first to train them in how to use the program meaningfully.

Step 6: If you haven’t already done it, comment your code.

Useless Joe’s supervisor is happy, as is the supervisor’s manager. The clerks are happy. And Joe wants this trend to continue, so he takes a little time to sit down and add comments to the source code of the program he has written. Even though this is a simple program, it would not make a lot of sense to someone who has never seen source code. If someone needs to change the code later (maybe even himself), he wants the code to look good and make sense without anyone having to waste much time figuring it out. Joe decides to add some comments describing what the different parts of the program are instead of describing every little technical detail. He also comments the date the program was last edited, so if the program is updated at some point, it will be easy to tell which version is in use on any given clerk’s PC.

#!/usr/bin/python
#
# comparison.py by Joe Useless for ACME clerks

# This is the function that does all the work.
# Arguments sent to the program are compared with the
# Python built-in function cmp()
def compare(firstArg, secondArg):

    # This if/elif/else block reformats cmp() output
    # so clerks can make out the meaning with fewer mistakes
    if cmp(firstArg, secondArg) == -1:
        print str(firstArg) + " is less than " + str(secondArg)

    elif cmp(firstArg, secondArg) == 0:
        print str(firstArg) + " is equal to " + str(secondArg)

    else:
        print str(firstArg) + " is greater than " + str(secondArg)

# This is the main method, which makes it possible to run the
# comparison as a top-level program instead of just by calling
# the compare function from within another program, such as
# a python interactive interpreter.
if __name__ == '__main__':
    import sys

    # If a clerk calls the program with the wrong number of
    # arguments, print an example of correct usage.
    if len(sys.argv) != 3:
        print '''********************\n
Usage suggestion:\n
python comparison.py argument1 argument2\n
\n
For example:\n
C:\Python22\python comparison.py "Jolie, Angelina" "Useless, Joe"\n
********************'''

    # If the clerk calls the program correctly, perform the
    # compare function with the arguments provided by the clerk.
    else:
        compare(sys.argv[1], sys.argv[2])

Now Joe Useless feels satisfied with the program. He feels the comments are okay for a program this small, and by putting his program in the hands of other clerks, he is beginning to get an idea of how the program can be improved some more.