Number 48
"by
Calvin N. Mooers
Introduction
The Foundations
The Alphabetical Index
Numerical Code and Sorting
Dewey Decimal Classification
Method of Exclusive Subfields
Unit Card System
Microfilm Rapid Selector
Atomic Energy Commission Joint Project
Zatocoding
Epilog
I - INTRODUCTION
The problem under discussion here is machine searching and
retrieval of information from storage according to specification by
subject. An example is the library problem of selection of technical
George Boole.
Cambridge, 1847,
Pairs of cards having coincidences are set aside, and these in turn
are then collated against the cards of the collection "c". Cards
having triple coincidences represent the desired documents.
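The collation step can be pictured as a sequential merge of sorted lists of document numbers, one list per descriptor. The following is a minimal sketch under that assumption; the collection names and the sample document numbers are purely illustrative and do not come from the text.

```python
def collate(left, right):
    """Merge-style collation of two sorted lists of document numbers.

    Returns the numbers present in both lists (the "coincidences").
    """
    i = j = 0
    matches = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            matches.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return matches


# Hypothetical unit-card collections, one per descriptor.
collection_a = [3, 8, 15, 21, 42]
collection_b = [8, 15, 16, 42, 50]
collection_c = [1, 15, 42, 99]

pairs = collate(collection_a, collection_b)   # double coincidences
triples = collate(pairs, collection_c)        # triple coincidences
print(triples)
```

Each pass reads its lists from front to back, so the work grows with the length of the lists. This is one way to see why broad descriptors, which produce long lists, make the collation time-consuming.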
It is one of the hopes of proponents of this method that the
time-consuming collation process can be held down by using very narrow
and precise descriptors. If this could be done, the unit card system
would be an excellent solution to information retrieval.
For my part, I have at least two objections to the unit card
method, besides the matter of the great overload on storage. The first
is that in all my experience working with retrieval systems, I have
found that the descriptors must be broad, not precise. This seems
fundamental to the whole retrieval situation, and enters in several
different ways. The second point is that large-scale mechanization
of the unit card system gives rise to difficulties with respect to
collation and the ease of insertion of new items in the storage
system. This is inherently a matter of the use of machines, and it
involves the sequential sorting problem. In particular, readjustment
of the record would become most difficult if the system went beyond
the use of cards into the use of a magnetic or film record. I bring
this up because a unit card system applied to more than one million
items must run into an enormous collection of cards, and in a project
of this size it would be desirable to completely mechanize the process
by the use of a film or tape record.
It can be concluded, though with some serious reservations,
that the unit card system is the first system considered that can
actually meet the requirements set for information retrieval.
VIII - MICROFILM RAPID SELECTOR
This is the machine constructed for the Department of
Agriculture by Engineering Research Associates along the lines of an
earlier though similar machine by V. Bush. It is a very interesting
device. As an electronic machine, it can be criticised for being at
least an order of magnitude too slow in its speed of scanning. It is
slow by a factor of 100 as compared to the internal processes of the
BINAC. Its present low rate of scanning of only 10,000 items per
minute makes its present cost difficult to justify when compared to the
speeds of about 1,000 items per minute that can be attained in a
comparable selection situation when sorting cards by a simple
hand-operated machine. It has been suggested, however, that succeeding
versions of the machine would be cheaper than the $75,000 cost of the
first model.
The full mechanical details of the machine are to be found
for machine selection, that unless the atomic ideas exist in a very
well-determined structure, the grammar can cause trouble by imposing
a "point of view". For instance, "I eat a banana" and "The banana
was eaten by me" mean exactly the same thing, though their form is
quite different because of the differing points of view. More complex
situations are even more difficult.
In chemistry the structures are quite determinate (at least
up to a certain point) and then grammar can often be used to advantage,
though even so it can be overdone.
X - ZATOCODING
Zatocoding is a new method of coding that I am very much
interested in, since I have been concerned with its mathematical
formulation. It is inherently a principle of coding rather than any
specific machine embodiment. It can be applied with a number of
different digital machines: electronic scanners, tabulating machinery,
and even with such simple hand-sorted punched cards as the one I am
showing in the Zator Company exhibit at Rutgers here today. Zatocoding
can be briefly characterized as the coding technique which uses the
superimposition of random subject codes in a single coding field.
Zatocoding is a system of coding which was designed primarily
for information retrieval, and it has revealed the need for some drastic
changes in the conventional library postulates or doctrines. For
instance, in Zatocoding the unwanted bulk of the material in the file
is rejected according to statistical rules, rather than by the
principles of the "exactness" implied in ordinary library systems.
Zatocoding is able to combine the best feature of the unit card system
(the independence of attributes) with the best feature of the method
of exclusive subfields (the use of a single card per information item).
Yet Zatocoding is able to leave behind the most serious disadvantages
of both methods: respectively, the many cards per information item in
the unit card system, and the indeterminacy of subfields in the method
of exclusive subfields.
The Zatocoding method is as follows: To each information
item there is delegated a single card which has a field for carrying
punches. Other carriers of digital information such as film could
be used quite as well. Each information item is characterized by a
set of attributes, which we can consider as having been written out
on the face of the card. There are as many cards as there are
information items in the collection. The set of all the attributes used in
the whole collection forms a "vocabulary" of descriptive terms. Codes
are assigned to the attributes in the vocabulary by starting at the
top of the list and giving the first attribute a random pattern of
punches ranging over the field. The second attribute is given a
second pattern also ranging over the field, and generated randomly
Those cards are selected which contain each and every one
of the selector attributes. If the patterns of these attributes have
been punched out on a particular card, the inclusion relation must hold
with this card, and it must be selected irrespective of the other
patterns on the card. In this respect, selection is according to the
logical product of the selector attributes; e.g., all cards bearing
punches for "large", "red", and "apples" simultaneously will come out
when these attributes are placed in the selector.
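The coding and selection rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field size, the number of punches per pattern, the vocabulary, and the function names make_codebook, punch_card, and select are all hypothetical and are not taken from the text.

```python
import random

FIELD_SIZE = 40        # assumed number of punch positions in the coding field
PUNCHES_PER_CODE = 4   # assumed number of punches in each attribute's pattern


def make_codebook(vocabulary, seed=0):
    """Assign each attribute a random pattern of punch positions."""
    rng = random.Random(seed)
    return {
        attribute: frozenset(rng.sample(range(FIELD_SIZE), PUNCHES_PER_CODE))
        for attribute in vocabulary
    }


def punch_card(attributes, codebook):
    """Superimpose the patterns of all attributes in the single field."""
    field = set()
    for attribute in attributes:
        field |= codebook[attribute]
    return frozenset(field)


def select(cards, selector_attributes, codebook):
    """Return the names of cards whose punches include the selector pattern."""
    selector = punch_card(selector_attributes, codebook)
    return [name for name, field in cards.items() if selector <= field]


vocabulary = ["large", "small", "red", "green", "apples", "pears"]
codebook = make_codebook(vocabulary)
cards = {
    "doc1": punch_card(["large", "red", "apples"], codebook),
    "doc2": punch_card(["small", "green", "pears"], codebook),
    "doc3": punch_card(["large", "green", "apples"], codebook),
}
hits = select(cards, ["large", "red", "apples"], codebook)
print(hits)
```

The card "doc1" is certain to be among the hits, because its field contains every punch of the selector pattern. Whether any other card also comes out depends on how the random patterns happen to overlap.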
While all cards fitting the selector prescription must be
selected by the inclusion principle, the strict converse does not hold
with respect to the exclusion of the unwanted cards. This peculiarity
of Zatocoding comes from the superimposition of many code patterns
in the single field of the card. There is an intermingling and
overlapping of the individual patterns. Because of this overlapping,
there is a finite statistical possibility that patterns having no
intellectual connection with the desired patterns can combine to
simulate the configuration of punches in the cards having the desired
patterns. Such cards do select out, and are called "extra cards".
However, and this is important, the relative frequency of such extra
cards with respect to the entire collection is under strict
statistical control. Typically, in a selection on two patterns, the
frequency will be .001 or less. Generally, where S is the total number
of positions in the selector pattern, the average ratio of extra cards
is always less than (1/2)^S, and often very much less.[9]
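A rough calculation shows how the ratio of extra cards stays under a bound of one half raised to the power S. The sketch below assumes that each code is an independent random choice of positions and, as a simplification, that the S selector positions are punched independently on an unrelated card. The field size, punches per code, and codes per card are illustrative values, not figures from the text.

```python
def punched_fraction(field_size, punches_per_code, codes_per_card):
    """Expected fraction of field positions punched on one card.

    Assumes each code is an independent random choice of
    `punches_per_code` distinct positions out of `field_size`.
    """
    unpunched = (1 - punches_per_code / field_size) ** codes_per_card
    return 1 - unpunched


def extra_card_ratio(density, selector_positions):
    """Approximate chance that an unrelated card shows every selector punch.

    Treats the selector positions as independently punched with
    probability `density`, which is a simplifying assumption.
    """
    return density ** selector_positions


density = punched_fraction(field_size=40, punches_per_code=4, codes_per_card=6)
ratio = extra_card_ratio(density, selector_positions=8)
bound = 0.5 ** 8
print(f"density={density:.3f} ratio={ratio:.5f} bound={bound:.5f}")
```

So long as no more than half of the field is punched on a typical card, the estimated ratio falls below the bound, and it drops quickly as the selector pattern covers more positions.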
One might say that selection of cards by Zatocoding is
according to the logical product of the selector attributes plus
"epsilon", where epsilon can be made as small as desired by design
of the system. While Zatocoding selection is not exact from a
perfectionist's standpoint, it is a good engineering solution to a
problem, particularly when epsilon can easily be brought to 10^-4 or
less if ever required.
Zatocoding, by accepting the existence of the inconsequential
epsilon, accomplishes these things:
1. There is no indeterminacy of subfields for the location
of an attribute on a card, because all the codes are in
a single coordinate frame.
2. To find an attribute, a selector mechanism need search
only in one location on the card.
3. Attributes are used entirely independently, both in
selection and in making up the card. Such independence,
in conjunction with good statistical control of extras,
is gained through the use of random code assignments.