folklore.org
The Newton, The First PDA:    4 of 11 
Newton Jabberwocky
Author: Paul Potts
Date: December 1993
Characters: Paul Potts
Topics: Handwriting Recognition, The Original MessagePad, The Print Recognizer
Summary: What recognizing "Jabberwocky" teaches us about the Newton's handwriting recognition algorithms. From a Usenet posting.

Lewis Carroll's famous poem, Jabberwocky, begins as follows:



'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

This is fun nonsense, but interestingly, nonsense can teach us a bit about how the very first MessagePad's handwriting recognition software works, by bringing about some "worst-case scenario" behavior. Handwriting recognition was one of the features of the original MessagePad that was perhaps too strongly hyped with respect to what it delivered, while the improved recognizers in later models never received the public credit they deserved.

There are two main settings that affect the way the Newton does recognition: a checkbox that says "words in word lists," and one that says "words not in word lists."

With the first one checked, the Newton does dictionary lookup *only*. That is, it uses a neural net to compare what you wrote against its dictionary and sends back the closest match. The recognizer actually returns the top five or ten matches, so if you're writing your own program, you could show the best guesses.

The results you get with "words in word lists" on therefore depend almost entirely on what you have put in the dictionary.
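To make the dictionary-only behavior concrete, here is a minimal Python sketch. It is not the Newton's recognizer: the word list is a hypothetical handful of words, and difflib's string similarity stands in for whatever scoring the real recognizer applies to pen strokes. The point it illustrates is only that a lookup restricted to a word list can never return anything outside that list.

```python
import difflib

# Hypothetical, tiny stand-in for the Newton's word list.
WORD_LIST = ["tones", "loves", "took", "the", "sticky", "slimy", "wave", "make"]


def dictionary_guesses(written, word_list=WORD_LIST, n=5):
    """Return up to n word-list entries ranked by similarity to `written`.

    String similarity is an assumption made for illustration; the real
    recognizer scores handwriting, not typed text. cutoff=0.0 means a
    guess is always returned, however poor the match.
    """
    return difflib.get_close_matches(written, word_list, n=n, cutoff=0.0)


if __name__ == "__main__":
    for word in ["the", "toves", "slithy"]:
        print(word, "->", dictionary_guesses(word, n=3))
```

A word that is in the list comes back as itself, while "toves" and "slithy" can only come back as real words from the list.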

So, with "words in word lists" on I got:

Thus ailing and the sticky tones
did gym owed visible in tire make
all injury mere the bookstores
and the movie baths originals.

Note that the results are all words. Very wrong words, but words nonetheless! When the Newton makes wild guesses based on dictionary recognition, if you write the same word several times it will usually come up with different results. The reason is this. Suppose the word is "highly." This is in the dictionary. The recognizer will return "highly" (93%), "higher" (61%) and "lightly" (58%). There is a wide jump between the top word and the next most likely word. If you write a word that isn't in the dictionary (say "mome"), you'll get something like this: "Moines" (61%), "Rome" (60.9%), and "mom's" (59%). This means that if you write it over again, very subtle alterations can cause the recognizer to re-order its guesses.
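The effect of that narrow gap can be sketched in a few lines of Python. The confidence figures are the ones quoted above; the half-point nudge to the runner-up is a hypothetical stand-in for the small variation you get when you write the same word a second time, and the function names are made up for this illustration.

```python
def rank(scores):
    """Sort (word, confidence) pairs from most to least confident."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def top_margin(scores):
    """Gap between the best guess and the runner-up, in percentage points."""
    ranked = rank(scores)
    return ranked[0][1] - ranked[1][1]


def nudge(scores, word, delta):
    """Return a copy of scores with one word's confidence shifted by delta."""
    nudged = dict(scores)
    nudged[word] += delta
    return nudged


# Confidence figures quoted in the text above.
in_dictionary = {"highly": 93.0, "higher": 61.0, "lightly": 58.0}
not_in_dictionary = {"Moines": 61.0, "Rome": 60.9, "mom's": 59.0}

if __name__ == "__main__":
    # Hypothetical perturbation: the runner-up gains half a point.
    for scores in (in_dictionary, not_in_dictionary):
        before = rank(scores)[0][0]
        runner_up = rank(scores)[1][0]
        after = rank(nudge(scores, runner_up, 0.5))[0][0]
        print(f"margin {top_margin(scores):.1f}: {before} -> {after}")
```

With a 32-point margin the top guess survives the nudge; with a margin of a tenth of a point it does not.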

So, I wrote the first verse again, and here is what I got:

Inks drilled Kirk the stifling trims
did gym and gamble in the wave
all mine were the drugstores
cried the Marie Ruth's original.

Here's the result of my girlfriend writing my name (Paul Potts) eight times (my name is not in the dictionary):

punt Potis
paid ports
Paul pour
Basil Ports
Purse doits
bank Ports
Panels ports
Pure Putz

(I'm particularly fond of the last one...)

Anyway, to show the difference in settings, I will write Jabberwocky again with "words not in word lists" *also* turned on. This means that if it can't make a good match with the dictionary, it will try character recognition. Here's what I get (my Newton is not particularly well trained; I just recently reset it).

Tcrac brillig and the slithcj loves
did gyrc and gimbli in tlu irabo
all mimsy were the boroojoves
and the jrrco me Ruth's ofgrrkbe

As you can see, it got most of the normal words, but on the words it didn't know it tried letter-by-letter recognition, which is more difficult (especially with my scrawl). So, for example, it got "brillig" and "mimsy," and "gimbli" was off by a letter, but on some of them it did really badly: "the" became "tlu," and "mome" was the worst, since it was actually split into two words ("jrrco me").
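The combined setting amounts to a fallback rule, which might be sketched in Python like this. The threshold and the confidence numbers are assumptions chosen for illustration, since the actual cutoff the Newton uses is not something I can state; the word pairs are taken from the transcripts in this article.

```python
def recognize(dictionary_guess, dictionary_confidence, letter_guess,
              threshold=70.0):
    """Choose between a word-list match and a letter-by-letter reading.

    threshold is a hypothetical cutoff: at or above it the word-list
    match is trusted, below it the letter-by-letter reading is used.
    """
    if dictionary_confidence >= threshold:
        return dictionary_guess
    return letter_guess


if __name__ == "__main__":
    # (word-list guess, assumed confidence, letter-by-letter reading)
    samples = [
        ("and", 93.0, "amd"),         # a good match: keep the real word
        ("sticky", 61.0, "slithcj"),  # a poor match: fall back to letters
    ]
    for guess, confidence, letters in samples:
        print(recognize(guess, confidence, letters))
```

This is why the known words come out clean while the nonsense words come out as near-miss letter strings rather than as unrelated real words.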

The recognition of individual characters is highly dependent on the recognizer settings; there are a lot of things to tweak.

If I try it with *just* character recognition, here's what I get. This is roughly the quality of the recognition you get with the Zoomer or EO, which do *only* character recognition.

Twas brillisj amb fth slithy fovui
Did gijre amd gimble in ine crribe
aii mimsy were flu borocjores auk
ilu rno mire rciilis outyrabi.

Take a look at the last line: "mome raths" became "rno mire rciilis." This looks really bad, but you can see how it happened: the "m" in "mome" was recognized as an "r" followed by an "n" instead of an "m," and then it got "o," got the word break wrong, then turned the vertical stroke of the t and horizontal stroke into two "i" characters, then broke the "h" into a downstroke ("l") and another downstroke ("i"). There are a bunch of timing options you can mess with that will improve character recognition. Zoomer wouldn't have this particular problem because on that machine you write your letters in little boxes.

Hope this helps clear up how the Newton works a bit. Obviously it is technology that needs some work, but keep in mind that Jabberwocky isn't your typical business memo.

By the way, I took the time to enter all the strange Jabberwocky words into the dictionary, wrote it again, and got:

Twas brillig and the slithy took
did gyre and gimble in the wabe
Ice mimsy were the borogoves
and the mome raths outgrabe.

As you can see, it was then better at recognizing some of the outrageous words than it was at recognizing common words (like "All" becoming "Ice"). This is often the case on the Newton: it won't get "the quick brown fox" right twice in a row, but it will get "discrimination" every time. The longer the word is, the sparser the search tree is, because fewer dictionary words resemble it closely.
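One way to see the intuition is to count how many dictionary words sit one letter away from a given word. The Python sketch below does that over a hypothetical miniature word list. It is a toy illustration of why short words have more look-alikes, not a model of the Newton's actual search.

```python
# Hypothetical miniature word list; the real dictionary is far larger.
WORD_LIST = [
    "all", "ill", "ale", "aid", "ice", "the", "tie", "toe",
    "discrimination", "determination", "recrimination",
]


def letter_differences(a, b):
    """Count positions where two equal-length words differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def near_neighbors(word, word_list=WORD_LIST, max_diff=1):
    """List same-length words that differ from `word` in 1..max_diff letters."""
    return [
        other for other in word_list
        if len(other) == len(word)
        and 0 < letter_differences(word, other) <= max_diff
    ]


if __name__ == "__main__":
    for word in ["all", "the", "discrimination"]:
        print(word, "->", near_neighbors(word))
```

In this list "all" and "the" each have two one-letter neighbors competing with them, while "discrimination" has none.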

Of course, you might want to remove these from your dictionary so that they don't pop up in your normal writing. Or maybe not!

It still doesn't know the title of the poem, though: Jabberwocky becomes:

job Democracy
Chemistry
Rubbing
Smoky
Job Dorothy
Job Bern lay
Sophomore
Jobs every city.

All of these examples were written just once.

How does the 2.0 print recognizer do by comparison? I get:

TWas brilly and the slithy tores
did gyre and gimbal in the Wabe
all mimsy were the borogroves
and the mome raths outgrabe.

There were a couple of errors in capitalization, "brillig" became "brilly," and there was one single-letter error. You can see why many people far preferred the print recognizer that shipped with Newton OS 2.0!
