CONNECTIONIST MODELS AND THEIR PROPERTIES

J. A. FELDMAN AND D. H. BALLARD
Computer Science Department
University of Rochester
Rochester, NY 14627

Much of the progress in the fields constituting cognitive science has been based upon the use of explicit information processing models, almost exclusively patterned after conventional serial computers. An extension of these ideas to massively parallel, connectionist models appears to offer a number of advantages. After a preliminary discussion, this paper introduces a general connectionist model and considers how it might be used in cognitive science. Among the issues addressed are: stability and noise-sensitivity, distributed decision-making, time and sequence problems, and the representation of complex concepts.

1. INTRODUCTION

Much of the progress in the fields constituting cognitive science has been based upon the use of concrete information processing models (IPM), almost exclusively patterned after conventional sequential computers. There are several reasons for trying to extend IPM to cases where the computations are carried out by a parallel computational engine with perhaps billions of active units. As an introduction, we will attempt to motivate the current interest in massively parallel models from four different perspectives: anatomy, computational complexity, technology, and the role of formal languages in science. It is the last of these which is of primary concern here. We will focus upon a particular formalism, connectionist models (CM), which is based explicitly on an abstraction of our current understanding of the information processing properties of neurons.

Animal brains do not compute like a conventional computer.
Comparatively slow (millisecond) neural computing elements with complex, parallel connections form a structure which is dramatically different from a high-speed, predominantly serial machine. Much of current research in the neurosciences is concerned with tracing out these connections and with discovering how they transfer information. One purpose of this paper is to suggest how connectionist theories of the brain can be used to produce testable, detailed models of interesting behaviors.

The distributed nature of information processing in the brain is not a new discovery. The traditional view (which we shared) is that conventional computers and languages were Turing universal and could be made to simulate any parallelism (or analog values) which might be required. Contemporary computer science has sharpened our notions of what is "computable" to include bounds on time, storage, and other resources. It does not seem unreasonable to require that computational models in cognitive science be at least plausible in their postulated resource requirements.

The critical resource that is most obvious is time. Neurons whose basic computational speed is a few milliseconds must be made to account for complex behaviors which are carried out in a few hundred milliseconds (Posner, 1978). This means that entire complex behaviors are carried out in less than a hundred time steps. Current AI and simulation programs require millions of time steps. It may appear that the problem posed here is inherently unsolvable and that there is an error in our formulation.
But recent results in computational complexity theory (Ja'Ja', 1980) suggest that networks of active computing elements can carry out at least simple computations in the required time range. In subsequent sections we present fast solutions to a variety of relevant computing problems. These solutions involve using massive numbers of units and connections, and we also address the questions of limitations on these resources.

Another recent development is the feasibility of building parallel computers. There is currently the capability to produce chips with 100,000 gates at a reproduction cost of a few cents each, and the technology to go to 1,000,000 gates/chip appears to be in hand. This has two important consequences for the study of CM. The obvious consequence is that it is now feasible to fabricate massively parallel computers, although no one has yet done so (Fahlman, 1980; Hillis, 1981). The second consequence of this development is the renewed interest in the basic properties of highly parallel computation. A major reason why there aren't yet any of these CM machines is that we do not yet know how to design, assemble, test, or program such engines. An important motivation for the careful study of CM is the hope that we will learn more about how to do parallel computing, but we will say no more about that in this paper.

The most important reason for a serious concern in cognitive science for CM is that they might lead to better science. It is obvious that the choice of technical language that is used for expressing hypotheses has a profound influence on the form in which theories are formulated and experiments undertaken.
Artificial intelligence and articulating cognitive sciences have made great progress by employing models based on conventional digital computers as theories of intelligent behavior. But a number of crucial phenomena such as associative memory, priming, perceptual rivalry, and the remarkable recovery ability of animals have not yielded to this treatment. A major goal of this paper is to lay a foundation for the systematic use of massively parallel connectionist models in the cognitive sciences, even where these are not yet reducible to physiology or silicon.

Over the past few years, a number of investigators in different fields have begun to employ highly parallel models (idiosyncratically) in their work. The general idea has been advocated for animal models by Arbib (1979) and for cognitive models by Anderson (Anderson et al., 1977) and Ratcliff (1978). Parallel search of semantic memory and various "spreading activation" theories have become common (though not quite consistent) parts of information processing modeling. In machine perception research, massively parallel, cooperative computational theories have become a dominant paradigm (Marr & Poggio, 1976; Rosenfeld et al., 1976) and many of our examples come from our own work in this area (Ballard, 1981; Sabbah, 1981). Scientists looking at performance errors and other non-repeatable behaviors have not found conventional IPM to be an adequate framework for their efforts. Norman (1981) has recently summarized arguments from cognitive psychology, and Kinsbourne and Hicks (1979) have been led to a similar view from a different perspective. It appears to us that all of these efforts could fit within the CM paradigm outlined here.

One of the most interesting recent studies employing CM techniques is the partial theory of reading developed in (McClelland & Rumelhart, 1981).
They were concerned with the word superiority effect and related questions in the perception of printed words, and had a large body of experimental data to explain. One major finding is that the presence of a printed letter in a brief display is easier to determine when the letter is presented in the context of a word than when it is presented alone. The model they developed (cf. Figure 1) explicitly represents three levels of processing: visual features of printed letters, letters, and words. The model assumes that there are positive and negative (circular tipped) connections from visual features to the letters that they can (respectively, cannot) be part of. The connections between letters and words can go in either direction and embody the constraints of English. The model assumes that many units can be simultaneously active, that units form algebraic sums of their inputs and output values proportionally. The activity of a unit is bounded from above and below, has some memory, and decays with time. All of these features, and several more, are captured in the abstract unit described in Section 2.

Figure 1. A few of the neighbors of the node for the letter "t" in the first position in a word, and their interconnections (McClelland & Rumelhart, 1981).

This idea of simultaneously evaluating many hypotheses (here words) has been successfully used in machine perception for some time (Hanson & Riseman, 1978). What has occurred to us relatively recently is that this is a natural mode of computation for widely interconnected networks of active elements like those envisioned in connectionist models. The generalization of these ideas to the connectionist view of brain and behavior is that all important encodings in the brain are in terms of the relative strengths of synaptic connections. The fundamental premise of connectionism is that individual neurons do not transmit large amounts of symbolic information.
Instead they compute by being appropriately connected to large numbers of similar units. This is in sharp contrast to the conventional computer model of intelligence prevalent in computer science and cognitive psychology.

The fundamental distinction between the conventional and connectionist computing models can be conveyed by the following example. When one sees an apple and says the phrase "wormy apple," some information must be transferred, however indirectly, from the visual system to the speech system. Either a sequence of special symbols that denote a wormy apple is transmitted to the speech system, or there are special connections to the speech command area for the words. Figure 2 is a graphic presentation of the two alternatives. The path on the right described by double-lined arrows depicts the situation (as in a computer) where the information that a wormy apple has been seen is encoded by the visual system and sent as an abstract message (perhaps frequency-coded) to a general receiver in the speech system which decodes the message and initiates the appropriate speech act. Notice that a complex message would presumably have to be transmitted sequentially on this channel, and that each end would have to learn the common code for every new concept. No one has yet produced a biologically and computationally plausible realization of this conventional computer model.

Figure 2. Connectionism vs. symbolic encoding. (Figure labels: "Assumes some general encoding"; "Assumes individual connections".)

The only alternative that we have been able to uncover is described by the path with single-width arrows. This suggests that there are (indirect) links from the units (cells, columns, centers, or what-have-you) that recognize an apple to some units responsible for speaking the word. The connectionist model requires only very simple messages (e.g.,
stimulus strength) to cross a channel but puts strong demands on the availability of the right connections. Questions concerning the learning and reinforcement of connections are addressed in Feldman (1981b).

For a number of reasons (including redundancy for reliability), it is highly unlikely that there is exactly one neuron for each concept, but the point of view taken here is that the activity of a small number of neurons (say 10) encodes a concept like apple. An alternative view (Hinton & Anderson, 1981) is that concepts are represented by a "pattern of activity" in a much larger set of neurons (say 1,000) which also represent many other concepts. We have not seen how to carry out a program of specific modeling in terms of these diffuse models. One of the major problems with diffuse models as a parallel computation scheme is cross-talk among concepts. For example, if concepts using units (10, 20, 30, ...) and (5, 15, 25, ...) were simultaneously activated, many other concepts, e.g., (20, 25, 30, 35, ...) would be active as well. In the example of Figure 2, this means that diffuse models would be more like the shared sequential channel. Although a single concept could be transmitted in parallel, complex concepts would have to go one at a time. Simultaneously transmitting multiple concepts that shared units would cause cross-talk. It is still true in our CM that many related units will be triggered by spreading activation, but the representation of each concept is taken to be compact.

Most cognitive scientists believe that the brain appears to be massively parallel and that such structures can compute special functions very well.
But massively parallel structures do not seem to be usable for general purpose computing and there is not nearly as much knowledge of how to construct and analyze such models. The common belief (which may well be right) is that there are one or more intermediate levels of computational organization layered on the neuronal structure, and that theories of intelligent behavior should be described in terms of these higher-level languages, such as Production Systems, Predicate Calculus, or LISP. We have not seen a reduction (interpreter, if you will) of any higher formalism which has plausible resource requirements, and this is a problem well worth pursuing.

Our attempts to develop cognitive science models directly in neural terms might fail for one of two reasons. It may be that there really is an interpreted symbol system in animal brains. In this case we would hope that our efforts would break down in a way that could shed light on the nature of this symbol system. The other possibility is that CM techniques are directly applicable but we are unable to figure out how to model some important capacity, e.g., planning. Our program is to continue the CM attack on problems of increasing difficulty (and to induce some of you to join us) until we encounter one that is intractable in our terms.

There are a number of problems that are known to be difficult for systems without an interpreted symbolic representation, including complex concepts, learning, and natural language understanding. The current paper is mainly concerned with laying out the formalism and showing how it applies in the easy cases, but we do address the problem of complex concepts in Section 4.
We have made some progress on the problem of learning in CM systems (Feldman, 1981b) and are beginning to work seriously on natural language processing and on higher-level vision. Our efforts on planning and long-term memory reorganization have not advanced significantly beyond the discursive presentation in (Feldman, 1980).

We will certainly not get very far in this program without developing some systematic methods of attacking CM tasks and some building-block circuits whose properties we understand. A first step towards a systematic development of CM is to define an abstract computing unit. Our unit is rather more general than previous proposals and is intended to capture the current understanding of the information processing capabilities of neurons. Some useful special cases of our general definition and some properties of very simple networks are developed in Section 2. Among the key ideas are local memory, non-homogeneous and non-linear functions, and the notions of mutual inhibition and stable coalitions.

A major purpose of the rest of the paper is to describe building blocks which we have found useful in constructing CM solutions to various tasks. The constructions are intended to be used to make specific models but the examples in this paper are only suggestive. We present a number of CM solutions to general problems arising in intelligent behavior, but we are not suggesting that any of these are necessarily employed by nature. Our notion of an adequate model is one that accounts for all of the established relevant findings and this is not a task to be undertaken lightly.
We are developing some preliminary sketches (Ballard & Sabbah, 1981; Sabbah, 1981) for a serious model of low and intermediate level vision. As we develop various building blocks and techniques we will also be trying to bury some of the contaminated debris of past neural modeling efforts. Many of our constructions are intended as answers to known hard problems in CM computation. Among the issues addressed are: stability and noise-sensitivity, distributed decision-making, time and sequence problems, and the representation of complex concepts. The crucial questions of learning and change in CM systems are discussed elsewhere (Feldman, 1981b).

2. NEURON-LIKE COMPUTING UNITS

As part of our effort to develop a generally useful framework for connectionist theories, we have developed a standard model of the individual unit. It will turn out that a "unit" may be used to model anything from a small part of a neuron to the external functionality of a major subsystem. But the basic notion of unit is meant to loosely correspond to an information processing model of our current understanding of neurons. The particular definitions here were chosen to make it easy to specify detailed examples of relatively complex behaviors. There is no attempt to be minimal or mathematically elegant. The various numerical values appearing in the definitions are arbitrary, but fixed finite bounds play a crucial role in the development. The presentation of the definitions will be in stages, accompanied by examples. A compact technical specification for reference purposes is included as Appendix A.
Each unit will be characterized by a small number of discrete states plus:

$p$: a continuous value in $[-10, 10]$, called potential (accuracy of several digits)
$v$: an output value, integers $0 \le v \le 9$
$\mathbf{i}$: a vector of inputs $i_1, \ldots, i_n$

P-Units

For some applications, we will be able to use a particularly simple kind of unit whose output $v$ is proportional to its potential $p$ (rounded) when $p > 0$ and which has only one state. In other words

$p \leftarrow p + \beta \sum_k w_k i_k$    $[0 \le w_k \le 1]$
$v \leftarrow \text{if } p > \theta \text{ then round}(p - \theta) \text{ else } 0$    $[v = 0 \ldots 9]$

where $\beta$, $\theta$ are constants and $w_k$ are weights on the input values. The weights are the sole locus of change with experience in the current model. Most often, the potential and output of a unit will be encoding its confidence, and we will sometimes use this term. The "$\leftarrow$" notation is borrowed from the assignment statement of programming languages. This notation covers both continuous and discrete time formulations and allows us to talk about some issues without any explicit mention of time. Of course, certain other questions will inherently involve time and computer simulation of any network of units will raise delicate questions of discretizing time.

The restriction that output take on small integer values is central to our enterprise. The firing frequencies of neurons range from a few to a few hundred impulses per second. In the 1/10 second needed for basic mental events, there can only be a limited amount of information encoded in frequencies. The ten output values are an attempt to capture this idea. A more accurate rendering of neural events would be to allow 100 discrete values with noise on transmission (cf. Sejnowski, 1977).
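The p-unit update rule can be illustrated with a minimal Python sketch. It is only one possible reading of the definition, under stated assumptions: time is discretized into single steps, ties are rounded upward, the potential is clipped to [-10, 10], and inhibition is expressed through negative input values because the weights are unsigned. The names `p_unit_step`, `round_half_up`, and `clamp` are hypothetical and do not come from the paper.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, ties upward (an assumed convention)."""
    return math.floor(x + 0.5)


def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def p_unit_step(p, inputs, weights, beta=1.0, theta=0.0):
    """One discrete update of a p-unit; returns (new_potential, output).

    p       -- current potential, kept in [-10, 10]
    inputs  -- input values i_k (negative values stand for inhibition)
    weights -- unsigned weights w_k, each in [0, 1]
    beta    -- scale constant; theta -- output threshold
    """
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie in [0, 1]")
    drive = sum(w * i for w, i in zip(weights, inputs))
    p = clamp(p + beta * drive, -10.0, 10.0)
    # Output is the rounded excess over threshold, limited to 0..9.
    v = clamp(round_half_up(p - theta), 0, 9) if p > theta else 0
    return p, v
```

For instance, `p_unit_step(0.0, [4, 6], [0.5, 0.25], theta=1.0)` accumulates a weighted drive of 3.5 and reports the rounded excess over the threshold as its output.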
Transmission time is assumed to be negligible; delay units can be added when transit time needs to be taken into account.

The p-unit is somewhat like classical linear threshold elements (Minsky & Papert, 1972), but there are several differences. The potential, p, is a crude form of memory and is an abstraction of the instantaneous membrane potential that characterizes neurons; it greatly reduces the noise sensitivity of our networks. Without local memory in the unit, one must guarantee that all the inputs required for a computation appear simultaneously at the unit.

One problem with the definition above of a p-unit is that its potential does not decay in the absence of input. This decay is both a physical property of neurons and an important computational feature for our highly parallel models. One computational trick to solve this is to have an inhibitory connection from the unit back to itself. Informally, we identify the negative self feedback with an exponential decay in potential which is mathematically equivalent. With this addition, p-units can be used for many CM tasks of intermediate difficulty. The Interactive Activation models of McClelland and Rumelhart can be described naturally with p-units, and some of our own work (Ballard, 1981) and that of others (Marr & Poggio, 1976) can be done with p-units. But there are a number of additional features which we have found valuable in more complex modeling tasks.

Disjunctive Firing Conditions and Conjunctive Connections

It is both computationally efficient and biologically realistic to allow a unit to respond to one of a number of alternative conditions.
One way to view this is to imagine the unit having "dendrites" each of which depicts an alternative enabling condition (Figure 3). For example, one could extend the network of Figure 1 to allow for several different type fonts activating the same letter node, with the higher connections unchanged. Biologically, the firing of a neuron depends, in many cases, on local spatio-temporal summation involving only a small part of the neuron's surface. So-called dendritic spikes transmit the activation to the rest of the cell.

Figure 3. Conjunctive connections and disjunctive input sites.

In terms of our formalism, this could be described in a variety of ways. One of the simplest is to define the potential in terms of the maximum of the separate computations, e.g.,

$p \leftarrow p + \beta \, \mathrm{Max}(i_1 + i_2 - \varphi,\; i_3 + i_4 - \varphi,\; i_5 + i_6 - i_7 - \varphi)$

where $\beta$ is a scale constant as in the p-unit and $\varphi$ is a constant chosen (usually > 10) to suppress noise and require the presence of multiple active inputs (Sabbah, 1981). The minus sign associated with $i_7$ corresponds to its being an inhibitory input. It does not seem unreasonable (given current data, Kuffler & Nicholls, 1976) to model the firing rate of some units as the maximum of the rates at its active sites. Units whose potential is changed according to the maximum of a set of algebraic sums will occur frequently in our specific models.

One advantage of keeping the processing power of our abstract unit close to that of a neuron is that it helps inform our counting arguments. When we attempt to model a particular function (e.g.
, stereopsis), we expect to require that the number of units and connections as well as the execution time required by the model are plausible.

The max-of-sum unit is the continuous analog of a logical OR-of-AND (disjunctive normal form) unit and we will sometimes use the latter as an approximate version of the former. The OR-of-AND unit corresponding to Figure 3 is:

$p \leftarrow p + \alpha \, \mathrm{OR}(i_1 \,\&\, i_2,\; i_3 \,\&\, i_4,\; i_5 \,\&\, i_6 \,\&\, (\text{not } i_7))$

This formulation stresses the importance that nearby spatial connections all be firing before the potential is affected. Hence, in the above example, $i_3$ and $i_4$ make a conjunctive connection with the unit. The effect of a conjunctive connection can always be simulated with more units but the number of extra units may be very large.

Q-Units and Compound Units

Another useful special case arises when one suppresses the numerical potential, p, and relies upon a finite-state set {q} for modeling. If we also identify each input of i with a separate named input signal, we can get classical finite automata. A simple example would be a unit that could be started or stopped from firing. One could describe the behavior of this unit by a table, with rows corresponding to states in {q} and columns to possible inputs, e.g.,

            $i_1$ (start)    $i_2$ (stop)
Firing      Firing           Null
Null        Firing           Null

One would also have to specify an output function, giving output values required by the rest of the network, e.g.,

$v \leftarrow \text{if } q = \text{Firing then } 6 \text{ else } 0$.

This could also be added to the table above. An equivalent notation would be transition networks with states as nodes and inputs and outputs on the arcs.

In order to build models of interesting behaviors we will need to employ many of the same techniques used by designers of complex computers and programs. One of the most powerful techniques will be encapsulation and abstraction of a subnetwork by an individual unit. For example, a system that had separate motor abilities for turning left and turning right (e.g., fins) could use two start-stop units to model a turn-unit, as shown in Figure 4.

Figure 4. A Turn Unit. (Figure labels: left start, left stop, "causes motion to left"; right start, right stop, "causes motion to right".)

Note that the compound unit here has two distinct outputs, where basic units have only one (which can branch, of course). In general, compound units will differ from basic ones only in that they can have several distinct outputs. The main point of this example is that the turn-unit can be described abstractly, independent of the details of how it is built. For example, using the tabular conventions described above,

            Left        Right       Output Values
a gauche    a gauche    adroit      $v_1 = 7$, $v_2 = 0$
adroit      a gauche    adroit      $v_1 = 0$, $v_2 = 8$

where the right-going output being larger than the left could mean that we have a right-finned robot. There is a great deal more that must be said about the use of states and symbolic input names, about multiple simultaneous inputs, etc., but the idea of describing the external behavior of a system only in enough detail for the task at hand is of great importance. This is one of the few ways known of coping with the complexity of the magnitude needed for serious modeling of biological functions. It is not strictly necessary that the same formalism be used at each level of functional abstraction and, in the long run, we may need to employ a wide range of models. For example, for certain purposes one might like to expand our units in terms of compartmental models of neurons like those of (Perkel, 1979). The advantage of keeping within the same formalism is that we preserve intuition, mathematics, and the ability to use existing simulation programs.
With sufficient care, we can use the units defined above to represent large subsystems without giving up the notion that each unit can stand for an abstract neuron. The crucial point is that a subsystem must be elaborated into its neuron-level units for timing and size calculations, but can (hopefully) be described much more simply when only its effects on other subsystems are of direct concern.

Units Employing p and q

It will already have occurred to the reader that a numerical value, like our p, would be useful for modeling the amount of turning to the left or right in the last example. It appears to be generally true that a single numerical value and a small set of discrete states combine to provide a powerful yet tractable modeling unit. This is one reason that the current definitions were chosen. Another reason is that the mixed unit seems to be a particularly convenient way of modeling the information processing behavior of neurons, as generally described. The discrete states enable one to model the effects in neurons of polypeptide modulators, abnormal chemical environments, fatigue, etc. Although these effects are often continuous functions of unit parameters, there are several advantages to using discrete states in our models. Scientists and laymen alike often give distinct names (e.g., cool, warm, hot) to parameter ranges that they want to treat differently. We also can exploit a large literature on understanding loosely-coupled systems as finite-state machines (Sunshine, 1979). It is also traditional to break up a function into separate ranges when it is simpler to describe that way.
We have already employed all of these uses of discrete states in our detailed work (Feldman, 1981b; Sabbah, 1981).

One example of a unit employing both p and q non-trivially is the following crude neuron model. This model is concerned with saturation and assumes that the output strength, v, is something like average firing frequency. It is not a model of individual action potentials and refractory periods. We suppose the distinct states of the unit $q \in \{\text{normal}, \text{recover}\}$. In normal state the unit behaves like a p-unit, but while it is recovering it ignores inputs. The following table captures almost all of this behavior.

(incomplete)
            $-1 < p < 9$                  $p > 9$                              Output Value
normal      $p \leftarrow p + \sum i$     $p \leftarrow -p$ / recover          $v \leftarrow \alpha p - \theta$
recover     normal                        (impossible)                         $v \leftarrow 0$

Here we have the change from one state to the other depending on the value of the potential, p, rather than on specific inputs. The recovering state is also characterized by the potential being set negative. The unspecified issue is what determines the duration of the recovering state; there are several possibilities. One is an explicit dishabituation signal like those in Kandel's experiments (Kandel, 1976). Another would be to have the unit sum inputs in the recovering state as well. The reader might want to consider how to add this to the table. The third possibility, which we will use frequently, is to assume that the potential, p, decays toward zero (from both directions) unless explicitly changed. This implicit decay $p \leftarrow p_0 e^{-kt}$ can be modeled by self inhibition; the decay constant, k, determines the length of the recovery period.
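A rough Python sketch of this saturating unit may help make the role of the decay constant concrete. It follows the third possibility (implicit decay toward zero) and fills the gaps in the table with assumptions that are not in the paper: time advances in unit steps, decay is applied once per step as multiplication by exp(-k), the potential is clipped to [-10, 10], and the output is rounded half-up and limited to 0..9. The names `saturating_step` and `recovery_length`, and the default parameter values, are hypothetical.

```python
import math


def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def saturating_step(p, q, inputs, alpha=1.0, theta=0.0, k=0.5):
    """Advance the unit by one time step; returns (p, q, v)."""
    p = p * math.exp(-k)                  # assumed per-step form of the decay
    if q == "normal":
        p = clamp(p + sum(inputs), -10.0, 10.0)
        if p > 9:                         # saturation: flip sign, start recovering
            p, q = -p, "recover"
    elif p > -1:                          # recovering units ignore their inputs
        q = "normal"
    if q == "normal":
        v = clamp(math.floor(alpha * p - theta + 0.5), 0, 9)
    else:
        v = 0
    return p, q, v


def recovery_length(k):
    """Count the steps spent recovering after one saturating input."""
    p, q, _ = saturating_step(0.0, "normal", [12], k=k)
    steps = 0
    while q == "recover":
        p, q, _ = saturating_step(p, q, [0], k=k)
        steps += 1
    return steps
```

Under these assumptions a larger decay constant shortens the recovery period, which can be checked by comparing `recovery_length(0.5)` with `recovery_length(1.0)`.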
The general definition of our abstract neural computing unit is just a formalization of the ideas presented above. To the previous notions of p, v, and i we formally add

{q}: a set of discrete states, < 10

and functions from old to new values of these

$p \leftarrow f(i, p, q)$
$q \leftarrow g(i, p, q)$
$v \leftarrow h(i, p, q)$

which we assume, for now, to compute continuously. The form of the f, g, and h functions will vary, but will generally be restricted to conditionals and simple functions. There are both biological and computational reasons for allowing units to respond (for example) logarithmically to their inputs and we have already seen important uses of the maximum function.

The only other notion that we will need is modifiers associated with the inputs of a unit. We elaborate the input vector i in terms of received values, weights, and modifiers:

$\forall j,\; i_j = r_j \cdot w_j \cdot m_j, \quad j = 1, \ldots, n$

where $r_j$ is the value received from a predecessor [$r = 0 \ldots 9$]; $w_j$ is a changeable weight, unsigned [$0 \le w_j \le 1$] (accuracy of several digits); and $m_j$ is a synapto-synaptic modifier which is either 0 or 1.

The weights are the only thing in the system which can change with experience. They are unsigned because we do not want a connection to change from excitatory to inhibitory. The modifier or gate simplifies many of our detailed models. Learning and change will not be treated technically in this paper, but the definitions are included in the Appendix for completeness (Feldman, 1981b).

We conclude this section with some preliminary examples of networks of our units, illustrating the key idea of mutual (lateral) inhibition (Fig. 5). Mutual inhibition is widespread in nature and has been one of the basic computational schemes used in modeling.
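The unit definition given earlier in this section can be sketched in a few lines of Python. The function names `effective_inputs` and `update_unit` are hypothetical, and passing f, g, and h as ordinary callables is an assumption made for illustration; the text only fixes the ranges of r, w, and m and the rule that new values are computed from old ones.

```python
def effective_inputs(received, weights, modifiers):
    """Compute i_j = r_j * w_j * m_j for every input line j."""
    if not (len(received) == len(weights) == len(modifiers)):
        raise ValueError("r, w, and m must have the same length")
    for r, w, m in zip(received, weights, modifiers):
        if not 0 <= r <= 9:
            raise ValueError("received values lie in 0..9")
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights are unsigned, in [0, 1]")
        if m not in (0, 1):
            raise ValueError("modifiers are binary")
    return [r * w * m for r, w, m in zip(received, weights, modifiers)]


def update_unit(p, q, inputs, f, g, h):
    """Apply p <- f(i, p, q), q <- g(i, p, q), v <- h(i, p, q).

    All three functions see the old values of p and q.
    """
    return f(inputs, p, q), g(inputs, p, q), h(inputs, p, q)
```

A modifier of 0 removes an input line entirely, which is the blocking behavior used for context effects later in the paper.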
We will present two examples of how mutual inhibition works, to aid intuition as well as to illustrate the notation. The basic situation is symmetric configurations of p-units which mutually inhibit one another. Time is broken into discrete intervals for these examples. The examples are too simple to be realistic, but do contain ideas which we will employ repeatedly.

Two P-Units Symmetrically Connected

Suppose

$w_1 = 1, \quad w_2 = -.5$
$p(t+1) = p(t) + r_1 - (.5)\, r_2$
$v = \mathrm{round}(p) \quad [0 \ldots 9]$
$r_j$ = received

Referring to Figure 5a, suppose the initial input to the unit A.1 is 6, then 2 per time step, and the initial input to B.1 is 5, then 2 per time step. At each time step, each unit changes its potential by adding the external value ($r_1$) and subtracting half the output value of its rival. This system will stabilize to the side of the larger of two instantaneous inputs.

Two Symmetric Coalitions of 2-Units

$w_1 = 1, \quad w_2 = .5, \quad w_3 = -.5$
$p(t+1) = p(t) + r_1 + .5\,(r_2 - r_3)$
$v = \mathrm{round}(p)$
A, C start at 6; B, D at 5; A, B, C, D have no external input for t > 1

The connections for this system are shown in Figure 5b. This system converges faster than the previous example. The idea here is that units A and C form a "coalition" with mutually reinforcing connections. The competing units are A vs. B and C vs. D. The last example is the smallest network depicting what we believe to be the basic mode of operation in connectionist systems. The faster convergence is not an artifact; the positive feedback among members of a coalition will generally lead to faster convergence than in separate competitions. It is the amount of positive feedback rather than just the size of the coalition that determines the rate of convergence (Feldman & Ballard, 1982).
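Both examples are small enough to simulate directly. The sketch below is one possible reading of the update rules, with several assumptions that are not in the text: the "initial input" is treated as the starting potential, updates are synchronous, halves are rounded up, and the potential is clamped to [-10, 10]. The helper names are hypothetical.

```python
import math


def output(p):
    """v = round(p), rounding halves up and clamping to the 0..9 range."""
    return max(0, min(9, math.floor(p + 0.5)))


def clamp(p, lo=-10.0, hi=10.0):
    """Keep the potential in an assumed bounded range."""
    return max(lo, min(hi, p))


def run(p, update, steps=10):
    """Iterate synchronously; return one dict of outputs per time step."""
    history = [{u: output(x) for u, x in p.items()}]
    for _ in range(steps):
        v = history[-1]
        p = {u: clamp(update(u, p[u], v)) for u in p}
        history.append({u: output(x) for u, x in p.items()})
    return history


def run_pair(steps=10):
    """Two rivals: p(t+1) = p(t) + r1 - .5 * r2, with r1 = 2 per step."""
    rival = {"A": "B", "B": "A"}
    return run({"A": 6.0, "B": 5.0},
               lambda u, p, v: p + 2.0 - 0.5 * v[rival[u]], steps)


def run_coalitions(steps=10):
    """Coalitions {A, C} and {B, D}: p(t+1) = p(t) + .5 * (r2 - r3)."""
    ally = {"A": "C", "C": "A", "B": "D", "D": "B"}
    rival = {"A": "B", "B": "A", "C": "D", "D": "C"}
    return run({"A": 6.0, "B": 5.0, "C": 6.0, "D": 5.0},
               lambda u, p, v: p + 0.5 * (v[ally[u]] - v[rival[u]]), steps)


def settle_time(history, winner="A", loser="B"):
    """First time step at which the winner outputs 9 and the loser 0."""
    for t, v in enumerate(history):
        if v[winner] == 9 and v[loser] == 0:
            return t
    return None
```

Under these assumptions `settle_time(run_pair())` is 5 and `settle_time(run_coalitions())` is 4, so the coalition network settles one step sooner even though it receives no external input after the start.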
In terms of Figure 1, the coalition example could represent the behavior of the rival letters A and T in conjunction with the rival words ABLE and TRAP, in the absence of other active nodes.

Competing coalitions of units will be the organizing principle behind most of our models. Consider the two alternative readings of the Necker cube shown in Figure 6. At each level of visual processing, there are mutually contradictory units representing alternative possibilities. The dashed lines denote the boundaries of coalitions which embody the alternative interpretations of the image. A number of interesting phenomena (e.g., priming, perceptual rivalry, filling, subjective contour) find natural expression in this formalism. We are engaged in an ongoing effort (Ballard, 1981; Sabbah, 1981) to model as much of visual processing as possible within the connectionist framework. The next section describes in some detail a variety of simple networks which we have found to be useful in this effort.

3. NETWORKS OF UNITS

The main restriction imposed by the connectionist paradigm is that no symbolic information is passed from unit to unit. This restriction makes it difficult to employ standard computational devices like parameterized functions. In this section, we present connectionist solutions to a variety of computational problems. The sections address two principal issues. One is: Can the networks be connected up in a way that is sufficient to represent the problem at hand? The other is: Given these connections, how can the networks exhibit appropriate dynamic behavior, such as making a decision at an appropriate time?

Using a Unit to Represent a Value

One key to many of our constructions is the dedication of a separate unit to each value of each parameter of interest, which we term the unit/value principle. We will show how to compute using unit/value networks and present arguments that the number of units required is not unreasonable.
In this representation the output of a unit may be thought of as a confidence measure. Suppose a network of depth units encodes the distance of some object from the retina. Then if the unit representing depth = 2 saturates, the network is expressing confidence that the distance is two units. Similarly, the "G-hidden" node in Figure 6 expresses confidence in its assertion. There is much neurophysiological evidence to suggest unit/value organizations in less abstract cortical maps. Examples are edge sensitive units (Hubel & Wiesel, 1979) and perceptual color units (Zeki, 1980), which are relatively insensitive to illumination spectra. Experiments with cortical motor control in the monkey and cat (Wurtz & Albano, 1980) suggest a unit/value organization. Our hypothesis is that the unit/value organization is widespread, and is a fundamental design principle.

Figure 6. The Necker Cube.

Although many physical neurons do seem to follow the unit/value rule and respond according to the reliability of a particular configuration, there are also other neurons whose output represents the range of some parameter, and apparently some units whose firing frequency reflects both range and strength information (Scientific American, 1979). Both of the latter types can be accommodated within our definition of a unit, but we will employ only unit/value networks in the remainder of this paper.

In the unit/value representation, much computation is done by table look-up. As a simple example, let us consider the multiplication of two variables, i.e., z = xy.
In the unit/value formalism there will be units for every value of x and y that is important. Appropriate pairs of these will make a conjunctive connection with another unit cell representing a specific value for the product. Figure 7 shows this for a small set of units representing values for x and y. Notice that the confidence (expressed as output value) that a particular product is an answer can be a linear function of the maximum of the sums of the confidences of its two inputs. A major problem with function tables (and with CM in general) is the potential combinatorial explosion in the number of units required for a computation. A naive approach would demand $N^2$ units to represent all products of numbers from 1 to N. The network of Figure 7 requires many fewer units because each product is represented only once, another advantage of conjunctive connections. We could use even fewer units by exploiting positional notation and replacing each output connection with a conjunction of outputs from units representing multiples of 1, 10, 100, etc. The question of efficient ways of building connection networks is treated in detail in Section 4 (cf. also Hinton, 1981a; 1981b).

Figure 7. Multiplication Units: z = f(x, y) = xy, with x-units, y-units, and z-units.
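The table look-up idea can be made concrete with a short Python sketch. It assumes that confidences are plain numbers attached to value units and that the "linear function" is a simple scaling by `scale`; both choices, and the function names, are illustrative rather than taken from the paper.

```python
def product_confidences(x_conf, y_conf, scale=0.5):
    """Confidence of each product unit z = x * y.

    x_conf and y_conf map a value to the confidence of its unit. Each
    (x, y) pair is one conjunctive connection into the unit for x * y;
    the unit reports scale * max over its pairs of (conf_x + conf_y).
    """
    best = {}
    for x, cx in x_conf.items():
        for y, cy in y_conf.items():
            z, s = x * y, cx + cy
            best[z] = max(best.get(z, s), s)
    return {z: scale * s for z, s in best.items()}


def count_product_units(n):
    """Number of distinct products of two integers in 1..n."""
    return len({x * y for x in range(1, n + 1) for y in range(1, n + 1)})
```

For N = 9 the naive table has 81 entries while `count_product_units(9)` is 36, since products such as 12 = 2 x 6 = 3 x 4 share a single z-unit.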
Modifiers and Mappings

The idea of function tables (Fig. 7) can be extended through the use of variable mappings. In our definition of the computational unit, we included a binary modifier, m, as an option on every connection. As the definition specifies, if the modifier associated with a connection is zero, the value v sent along that connection is ignored. Thus the modifier denotes inhibition, or blocking. There is considerable evidence in nature for synapses on synapses (Kandel, 1976) and the modifiers add greatly to the computational simplicity of our networks. Let us start with an initial informal example of the use of modifiers and mappings. Suppose that one has a model of grass as green except in California where it is brown (golden), as shown in Figure 8.

Figure 8. Grass is Green connection modified by California.

Here we can see that grass and green are potential members of a coalition (can reinforce one another) except when the link is blocked. This use is similar to the cancellation link of Fahlman (1979) and gives a crude idea of how context can affect perception in our models. Note that in Figure 8 we are using a shorthand notation. A modifier touching a double-ended arrow actually blocks two connections. (Sometimes we also omit the arrowheads when a connection is double-ended.)

Mappings can also be used to select among a number of possible values. Consider the example of the relation between depth, physical size, and retinal size of a circle. (For now, assume that the circle is centered on and orthogonal to the line of sight, that the focus is fixed, etc.) Then there is a fixed relation between the size of the retinal image and the size of the physical circle for any given depth. That is, each depth specifies a mapping from retinal to physical size (see Fig. 9). Here we suppose the scales for depth and the two sizes are chosen so that unit depth means the same numerical size. If we knew the depth of the object (by touch, context, or magic) we would know its physical size. The network of Figure 9 allows retinal size 2 to reinforce physical size 2 when depth = 1 but inhibits this connection for all other depths. Similarly, at depth 3, we should interpret retinal size 2 as physical size 8, and inhibit other interpretations.

Several remarks are in order. First, notice that this network implements a function phys = f(ret, dep) that maps from retinal size and depth to physical size, providing an example of how to replace functions with parameters by mappings. For the simple case of looking at one object perpendicular to the line of sight, there will be one consistent coalition of units which will be stable. The network does something more, and this is crucial to our enterprise; the network can represent the consistency relation R among the three quantities: depth, retinal size, and physical size. It embodies not only the function f, but its two inverse functions as well (dep = $f_1$(ret, phys), and ret = $f_2$(phys, dep)). (The network as shown does not include the links for $f_1$ and $f_2$, but these are similar to those for f.) Most of Section 5 is devoted to laying out networks that embody theories of particular visual consistency relations.

The idea of modifiers is, in a sense, complementary to that of conjunctive connections.
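The depth example can be sketched as a set of consistent triples from which f and its two inverses are read off. The scaling rule `ret * 2 ** (dep - 1)` is a hypothetical choice made only because it reproduces the cases quoted in the text (retinal size 2 maps to physical size 2 at depth 1 and to 8 at depth 3); the paper does not state the rule, and all function names here are illustrative.

```python
def physical_size(ret, dep):
    """Hypothetical scaling that matches the two cases quoted in the text."""
    return ret * 2 ** (dep - 1)


def build_relation(ret_values, dep_values):
    """Consistency relation R as a set of (ret, dep, phys) triples."""
    return {(r, d, physical_size(r, d)) for r in ret_values for d in dep_values}


def f(R, ret, dep):
    """phys = f(ret, dep)."""
    return {p for (r, d, p) in R if r == ret and d == dep}


def f1(R, ret, phys):
    """dep = f1(ret, phys)."""
    return {d for (r, d, p) in R if r == ret and p == phys}


def f2(R, phys, dep):
    """ret = f2(phys, dep)."""
    return {r for (r, d, p) in R if p == phys and d == dep}


def open_links(R, dep):
    """Modifier view: the ret -> phys links left unblocked at this depth."""
    return {(r, p) for (r, d, p) in R if d == dep}
```

Each active depth unit leaves exactly one outgoing link open per retinal size, which is the sense in which a depth "specifies a mapping" from retinal to physical size.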
The network of Figure 9, for example, could be transformed into the network of Figure 10. In this network the variables for physical size, depth, and retinal size are all given equal weight. For example, physical size = 4 and depth = 1 make a conjunctive connection with retinal size = 4. Each of the value units in a competing row could be connected to all of its competitors by inhibitory links and this would tend to make the network activate only one value in each category. The general issue of rivalry and coalitions will be discussed in the next two sub-sections.

When should a relation be implemented with modifiers and when should it be implemented with conjunctive connections? A simple, non-rigorous answer to this question can be obtained by examining the size of two sets of units: (1) the number of units that would have to be inhibited by modifiers; and (2) the number of units that would have to be reinforced with conjunctive connections. If (1) is smaller than (2), then one should choose modifiers; otherwise choose conjunctive connections. Sometimes the choice is obvious: to implement the brown Californian grass example of Figure 8 with conjunctive connections, one would have to reinforce all units representing places that had green grass! Clearly in this case it is easier to handle the exception with modifiers. On the other hand, the depth relation R(phy, dep, ret) is more cheaply implemented with conjunctive connections. Since our modifiers are strictly binary, conjunctive connections have the additional advantage of continuous modulation.

To see how the conjunctive connection strategy works in general, suppose a constraint relation to be satisfied involves a variable x, e.g., f(x, y, z, w) = 0.
For a particular value of x, there will be triples of values of y, z, and w that satisfy the relation f. Each of these triples should make a conjunctive connection with the unit representing the x-value. There could also be 3-input conjunctions at each value of y, z, w. Each of these four different kinds of conjunctive connections corresponds to an interpretation of the relation f(x, y, z, w) = 0 as a function, i.e., $x = f_1(y, z, w)$, $y = f_2(x, z, w)$, $z = f_3(x, y, w)$, or $w = f_4(x, y, z)$. Of course, these functions need not be single-valued. This network connection pattern could be extended to more than four variables, but high numbers of variables would tend to increase its sensitivity to noisy inputs. Hinton has suggested a special notation for the situation where a network exactly captures a consistency relation. The mutually consistent values are all shown to be centrally linked (Fig. 11). This notation provides an elegant way of presenting the interactions among networks, but must be used with care. Writing down a triangle diagram does not insure that the underlying mappings can be made consistent or computationally well-behaved.

Figure 11. Notation for consistency relations.

Winner-Take-All Networks and Regulated Networks

A very general problem that arises in any distributed computing situation is how to get the entire system to make a decision (or perform a coherent action, etc.). Biologically necessary examples of this behavior abound, ranging from turning left or right, through fight-or-flight responses, to interpretations of ambiguous words and images.
Decision-making is a particularly important issue for the current model because of its restrictions on information flow and because of the almost linear nature of the p-units used in many of our specific examples. Decision-making introduces the notions of stable states and convergence of networks.

One way to deal with the issue of coherent decisions in a connectionist framework is to introduce winner-take-all (WTA) networks, which have the property that only the unit with the highest potential (among a set of contenders) will have output above zero after some settling time (Fig. 12). There are a number of ways to construct WTA networks from the units described above. For our purposes it is enough to consider one example of a WTA network which will operate in one time step for a set of contenders each of whom can read the potential of all of the others. Each unit in the network computes its new potential according to the rule:

$p \leftarrow \text{if } p > \max(i_j, .1) \text{ then } p \text{ else } 0.$

That is, each unit sets itself to zero if it knows of a higher input. This is fast and simple, but probably a little too complex to be plausible as the behavior of a single neuron. There is a standard trick (apparently widely used by nature) to convert this into a more plausible scheme. Replace each unit above with two units; one computes the maximum of the competitors' inputs and inhibits the other. The circuit above can be strengthened by adding a reverse inhibitory link, or one could use a modifier on the output, etc. Obviously one could have a WTA layer that got inputs from some set of competitors and settled to a winner when triggered to do so by some downstream network. This is an exact analogy of strobing an output buffer in a conventional computer.

One problem with previous neural modeling attempts is that the circuits proposed were often unnaturally delicate (unstable). Small changes in parameter values would cause the networks to oscillate or converge to incorrect answers. We will have to be careful not to fall into this trap, but would like to avoid detailed analysis of each particular model for delicacy in this paper. What appears to be required is a set of building blocks and combination rules that preserve the desired properties. For example, the WTA subnetworks of the last example will not oscillate in the absence of oscillating inputs. This is also true of any symmetric mutually inhibitory subnetwork. This is intuitively clear and could be proven rigorously under a variety of assumptions (cf. Grossberg, 1980). If every unit receives inhibition proportional to the activity (potential) of each of its rivals, the instantaneous leader will receive less inhibition and thus not lose its lead unless the inputs change significantly.

Another useful principle is the employment of lower-bound and upper-bound cells to keep the total activity of a network within bounds (Fig. 13). Suppose that we add two extra units, LB and UB, to a network which has coordinated output. The LB cell compares the total (sum) activity of the units of the network with a lower bound and sends positive activation uniformly to all members if the sum is too low. The UB cell inhibits all units equally if the sum of activity is too high. Notice that LB and UB can be parameters set from outside the network.
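The one-step WTA rule and the LB/UB regulation described above can be sketched as follows. This is a minimal illustration with assumed details: each unit is taken to read all rival potentials directly, the LB and UB cells are modeled as a uniform shift of size `step`, and no output clamping is applied. The function names are hypothetical.

```python
def wta_step(potentials):
    """Apply p <- p if p > max(rival inputs, .1) else 0 to every unit."""
    new = []
    for j, p in enumerate(potentials):
        rivals = potentials[:j] + potentials[j + 1:]
        threshold = max(rivals + [0.1])
        new.append(p if p > threshold else 0.0)
    return new


def regulate(outputs, lower, upper, step=1.0):
    """LB/UB cells: shift every unit equally when the sum is out of range."""
    total = sum(outputs)
    if total < lower:
        shift = step       # LB cell sends uniform positive activation
    elif total > upper:
        shift = -step      # UB cell inhibits all units equally
    else:
        shift = 0.0
    return [v + shift for v in outputs]
```

Two design points follow from the sketch. An exact tie silences both leaders under the strict comparison, so a practical network needs some asymmetry or noise. And because `regulate` shifts all units equally it preserves their order, but clamping the results to a fixed output range could merge neighbors, which is one way the order-preservation property can fail.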
Under a wide range of conditions (but not all), the LB-UB augmented network can be designed to preserve order relationships among the outputs $v_j$ of the original network while keeping the sum between LB and UB.

We will often assume that LB-UB pairs are used to keep the sum of outputs from a network within a given range. This same mechanism also goes far towards eliminating the twin perils of uniform saturation and uniform silence which can easily arise in mutual inhibition networks. Thus we will often be able to reason about the computation of a network assuming that it stays active and bounded.

Stable Coalitions

For a massively parallel system to actually make a decision (or do something), there will have to be states in which some activity strongly dominates. Such stable, connected, high confidence units are termed stable coalitions. A stable coalition is our architecturally-biased term for the psychological notions of percept, action, etc. We have shown some simple instances of stable coalitions, in Figure 5b and the WTA network. In the depth networks of Figures 9 and 10, a stable coalition would be three units representing consistent values of retinal size, depth, and physical size. But the general idea is that a very large complex subsystem must stabilize, e.g., to a fixed interpretation of visual input, as in Figure 1. The way we believe this to happen is through mutually reinforcing coalitions which dominate all rival activity when the decision is required.
The simplest case of this is Figure 5b, where the two units A and C form a coalition which suppresses B and D. Formally, a coalition will be called stable when the output of all its members is non-decreasing. Notice that a coalition is not a particular anatomical structure, but an instantaneously mutually reinforcing set of units, in the spirit of Hebb's cell assemblies (Jusczyk & Klein, 1980).

What can we say about the conditions under which coalitions will become and remain stable? We will begin informally with an almost trivial condition. Consider a set of units {a, b, ...} which we wish to examine as a possible coalition, $\pi$. For now, we assume that the units in $\pi$ are all p-units and are in the non-saturated range and have no decay. Thus for each u in $\pi$,

$p(u) \leftarrow p(u) + \mathrm{Exc} - \mathrm{Inh}$,

where Exc is the weighted sum of excitatory inputs and Inh is the weighted sum of inhibitory inputs. Now suppose that $\mathrm{Exc}|\pi$, the excitation from the coalition $\pi$ only, were greater than INH, the largest possible inhibition receivable by u, for each unit u in $\pi$, i.e.,

(SC) $\forall u \in \pi:\ \mathrm{Exc}|\pi > \mathrm{INH}$

Then it follows that

$\forall u \in \pi:\ p(u) \leftarrow p(u) + \delta$ where $\delta > 0$.

That is, the potential of every unit in the coalition will increase. This is not only true instantaneously, but remains true as long as nothing external changes (we are ignoring state change, saturation, and decay). This is because $\mathrm{Exc}|\pi$ continues to increase as the potential of the members of $\pi$ increases. Taking saturation into account adds no new problems; if all of the units in $\pi$ are saturated, the change, $\delta$, will be zero, but the coalition will remain stable.
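Condition (SC) is simple enough to check mechanically for a small network. The sketch below assumes a representation that is not in the paper: `weights[u]` maps each sender to a signed link weight into u, outputs lie in 0..9, and INH is taken as every inhibitory sender firing at the maximum output. The function name is hypothetical.

```python
def satisfies_sc(coalition, weights, outputs, max_output=9):
    """Check (SC): for every u in the coalition, Exc|pi > INH.

    weights[u][s] is the signed weight of the link from s into u;
    outputs[s] is the current output of unit s.
    """
    for u in coalition:
        exc_from_coalition = sum(
            w * outputs[s]
            for s, w in weights[u].items()
            if w > 0 and s in coalition
        )
        # Largest inhibition receivable: every inhibitor at full output.
        inh_max = sum(-w * max_output for s, w in weights[u].items() if w < 0)
        if not exc_from_coalition > inh_max:
            return False
    return True
```

With the Figure 5b weights (.5 from the ally, -.5 from the rival) the check fails even at full output, since .5 x 9 is not strictly greater than .5 x 9. The network still settles in practice, so (SC) should be read as a sufficient condition rather than a necessary one.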
The condition that the excitation from other coalition members alone, $\mathrm{Exc}|\pi$, be greater than any possible inhibition INH for each unit may appear to be too strong to be useful. It is certainly true that coalitions can be stable without condition (SC) being met. The condition (SC) is useful for model building because it may be relatively easy to establish. Notice that INH is directly computable from the description of the unit; it is the largest negative weighted sum possible. If inhibition in our networks is mutual, the upper bound possible after a fixed time $\tau$, $\mathrm{INH}_\tau$, will depend on the current value of potential in each unit u. The simplest case of this is when two units are "deadly rivals": each gets all its inhibition from the other. In such cases, it may well be feasible to show that after some time $\tau$, the stable coalition condition will hold (in the absence of decay, fatigue, and changes external to the network). Often, it will be enough to show that the coalition has a stable "frontier," the set of units with outputs to some system under investigation.

There are a number of interesting properties of the stable coalition principle. First notice that it does not prohibit multiple stable coalitions nor single coalitions which contain units which mutually inhibit one another (although excessive mutual inhibition is precluded). If the units in the coalition had non-zero decay, the coalition excitation $\mathrm{Exc}|\pi$ would have to exceed both INH and decay for the coalition to be stable. We suppose that a stable coalition yields control when its input elements change (fatigue and explicit resets are also feasible). To model coalitions with changeable inputs, we add boundary elements, which also have external "Input" and thus whose condition for being part of a stable coalition, $\pi$, would be:

$\mathrm{Exc}|\pi + \mathrm{Input} > \mathrm{INH}$.

This kind of unit could disrupt the coalition if its Input went too low.
The mathematical analysis of CM networks and stable coalitions continues to be a problem of interest. We have achieved some understanding of special cases (Feldman & Ballard, 1982) and these results have been useful in designing CM too complex to analyze in closed form.

4. CONSERVING CONNECTIONS

It is currently estimated that there are about $10^{11}$ neurons and $10^{15}$ connections in the human brain and that each neuron receives input from about $10^3$ to $10^4$ other neurons. These numbers are quite large, but not so large as to present no problems for connectionist theories. It is also important to remember that neurons are not switching devices; the same signal is propagated along all of the outgoing branches. For example, suppose some model called for a separate, dedicated path between all possible pairs of units in two layers of size N. It is easy to show that this requires $N^2$ intermediate sites. This means, for example, that there are not enough neurons in the brain to provide such a cross-bar switch for substructures of a million elements each. Similarly, there are not enough neurons to provide one to represent each complex object at every position, orientation, and scale of visual space. Although the development of connectionist models is in its perinatal period, we have been able to accumulate a number of ideas on how some of the required computations can be carried out without excessive resource requirements. Five of the most important of these are described below: (1) functional decomposition; (2) limited precision computation; (3) coarse and coarse-fine coding; (4) tuning; and (5) spatial coherence.
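The cross-bar argument is a one-line calculation, sketched here in Python for concreteness. The neuron count of 10^11 is the estimate quoted above; the function names are hypothetical.

```python
import math

NEURONS = 10 ** 11  # estimate quoted in the text


def crossbar_sites(n):
    """Intermediate sites for a dedicated path between all pairs, two layers of size n."""
    return n * n


def fits_in_brain(n, budget=NEURONS):
    """True if an n-by-n cross-bar needs no more sites than there are neurons."""
    return crossbar_sites(n) <= budget


# Largest layer size whose cross-bar would fit within the budget.
LARGEST_LAYER = math.isqrt(NEURONS)
```

A million-element layer needs 10^12 sites, ten times the neuron estimate, and the break-even layer size is only about 3 x 10^5 units.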
Functional Decomposition

When the number of variables in the function becomes large, the fan-in or number of input connections could become unrealistically large. For example, the function t = f(u, v, w, x, y, z), implemented with 100 values of t when each of its arguments can have 100 distinct values, would require an average number of inputs per unit of $10^{12}/10^2$, or $10^{10}$. However, there are simple ways of trading units for connections. One is to replicate the number of units with each value. This is a good solution when the inputs can be partitioned in some natural way as in the vision examples in the next section. A more powerful technique is to use intermediate units when the computation can be decomposed in some way. For example, if $f(u, v, w, x, y, z) = g(u, v) \circ h(w, x, y, z)$, where $\circ$ is some composition, then separate networks of value units for f(g, h), g(u, v), and h(w, x, y, z) can be used. The outputs from the g and h units can be combined in conjunctive connections according to the composition operator $\circ$ in a third network representing f. An example is the case of word recognition. Letter-feature units would have to connect to vastly more word units without the imposition of the intermediate level of letter units. The letter units limit the ways letter-feature units can appear in a word.

Limited Precision Computation

In the multiplication example z = xy, the number of z units required is proportional to $N_x N_y$, even when redundant value units are eliminated, and in general the number of units could grow exponentially with the number of arguments. However, there are several refinements which can drastically reduce the number of required units.
One way to do this is to fix the number of units at the precision required for the computation. Figure 14 shows the network of Figure 7 modified when less computational accuracy is required.

Figure 14. Modified Multiplication Table using Less Units.

This is the same principle that is incorporated in integer calculations in a sequential computer: computations are rounded to within the machine's accuracy. Accuracy is related to the number of bits and the number representation. The main difference is that since the sequential computer is general purpose, the number representations are conservative, involving large numbers of bits. The neural units need only represent sufficient accuracy for the problem at hand. This will generally vary from network to network, and may involve very inhomogeneous, special purpose number representations.

Coarse and Coarse-Fine Coding

Coarse coding is a general technical device for reducing the number of units needed to represent a range of values with some fixed precision, due to Hinton (1980). As Figure 15a suggests, one can represent a more precise value as the simultaneous activation of several (here 3) overlapping coarse-valued units. In general, D simultaneous activations of coarse cells of diameter D precise units suffice. For a parameter space of dimension k, a range of F values can be captured by only $F^k / D^{k-1}$ units rather than $F^k$ in the naive method. The coarse coding trick and the related coarse-fine trick to be described next both depend on the input at any given time being sparse relative to the set of all values expressible by the network.

The coarse-fine coding technique is useful when the space of values to be represented has a natural structure which can be exploited. Suppose a set of units represents a vector parameter v which can be thought of as partitioned into two components (r, s).
Suppose further that the number of units required to represent the subspace r is $N_r$, and that required to represent s is $N_s$. Then the number of units required to represent v is $N_r N_s$. It is easy to construct examples in vision where the product $N_r N_s$ is too close to the upper bound of $10^{11}$ units to be realistic. Consider the case of trihedral (v) vertices, an important visual cue. Three angles and two position coordinates are necessary to uniquely define every possible trihedral vertex. (Two angles define the type of vertex (arrow, y-joint); the third specifies the rotation of the joint in space.) If we use 5 degree angle sensitivity and $10^5$ spatial sample points, the number of units is given by $N_r = 3.6 \times 10^5$ and $N_s = 10^5$, so that $N_r N_s = 3.6 \times 10^{10}$.

How can we achieve the required representation accuracy with fewer units? In many instances, one can take advantage of the fact that the actual occurrence of parameters is sparse. In terms of trihedral vertices, one assumes that in an image, such vertices will rarely occur in tight spatial clusters. (If they do, they cannot be resolved as individuals simultaneously.) Given that simultaneous proximal values of parameters are unlikely, they can be represented accurately for other computations, without excessive cost.

The solution is to decompose the space v into two subspaces, r and s, each with unilaterally reduced resolution. Instead of $N_r N_s$ units, we represent v with two spaces, one with $N_{r'} N_s$ units where $N_{r'} \ll N_r$ and another with $N_r N_{s'}$ units where $N_{s'} \ll N_s$. To illustrate this technique with the example of trihedral vertices we choose $N_{r'} = 0.01 N_r$ and $N_{s'} = 0.01 N_s$. Thus the dimensions of the two sets of units are: $N_{r'} N_s = 3.6 \times 10^8$ and $N_r N_{s'} = 3.6 \times 10^8$.
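
As a rough check on the arithmetic above, the unit counts can be reproduced in a few lines of Python. This is only an illustrative sketch: the function names and the small coarse-coding case (F = 100, k = 2, D = 10) are assumptions introduced here, while the trihedral-vertex figures are the ones quoted in the text.

```python
def naive_units(F, k):
    """Units needed to represent F values in each of k dimensions directly."""
    return F ** k


def coarse_coded_units(F, k, D):
    """Units needed with coarse cells of diameter D precise units: F^k / D^(k-1)."""
    return F ** k / D ** (k - 1)


def coarse_fine_units(n_r, n_s, shrink_r, shrink_s):
    """Sizes of the two reduced-resolution spaces: (N_r' * N_s, N_r * N_s').

    shrink_r and shrink_s are integer divisors (an assumption made to keep the
    arithmetic exact), so a value of 100 means 0.01 of the original resolution.
    """
    return (n_r // shrink_r) * n_s, n_r * (n_s // shrink_s)


# Hypothetical small case: 100 values per dimension, 2 dimensions, diameter 10.
print(naive_units(100, 2), coarse_coded_units(100, 2, 10))

# Trihedral-vertex figures quoted in the text.
N_R, N_S = 360_000, 100_000
full = N_R * N_S
coarse_angle, coarse_position = coarse_fine_units(N_R, N_S, 100, 100)
print(full, coarse_angle, coarse_position)
```

Under these figures the two reduced spaces together need $7.2 \times 10^8$ units, a factor of fifty fewer than the full product space.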
The choices result in one set of units which accurately represent the angle measurements and fire for a specific trihedral vertex anywhere in a fairly broad visual region, and another set of units which fire only if a general trihedral vertex is present at the precise position. The coarse-fine technique can be viewed as replacing the square coarse-valued covering in Figure 15a with rectangular (multi-dimensional) coverings, like those shown in Figure 16. In terms of our value units, the coarse-fine representation of trihedral vertices is shown in Figure 15b.

Figure 15a. Coarse coding example. In a two-dimensional measurement space, the presence of a measurement can be encoded by making a single unit in the fine resolution space have a high confidence value. The same measurement can be encoded by making overlapping coarse units in three distinct coarse arrays have high confidence values.

Figure 15b. Coarse angle-fine position and coarse position-fine angle units combine to yield precise values of all five parameters. (Example values shown in the figure: angles 95, 81, and 45; position x = 27, y = 31.)

If the trihedral angle enters into another relation, say $R(v, \alpha)$, where both its angle and position are required accurately, one conjunctively connects pairs of appropriate units from each of the reduced resolution spaces to appropriate R-units. The conjunctive connection represents the intersection of each of its components' fields.
Essentially the same mechanism will suffice for conjoining (e.g.) accurate color with coarse velocity information.

An important limitation of these techniques, however, is that the input must be sparse. If inputs are too closely spaced, "ghost" firings will occur. In Figure 16, two sets of overlapping fields are shown, each with unilaterally reduced resolution. Actual input at points A and B will produce an erroneous indication of an input at C, in addition to the correct signals. The sparseness requirement has been shown to be satisfied in a number of experiments with visual data (Ballard & Kimball, 1981a, 1981b; Ballard & Sabbah, 1981).

The resolution device involves a units/connections tradeoff, but in general, the tradeoff is attractive. To see this, consider a unit that receives input from a network representing a vector parameter v. If n is the number of places where the output is used, and conjunctive connections are used to conjoin the D firing units, then Dn synapses are required. Thus if A is the number of non-coarse-coded units needed to achieve a given acuity, then coarse coding is attractive when $A / D^{k-1} > Dn$, assuming connections and units are equally scarce. This result is optimistic in that, when other uses of conjunctive connections are taken into account, the number of conjunctive units could be unrealistically large.

Figure 16. Inputs at A & B cause ghosts at C & D.

Tuning

The idea of tuning further exploits networks composed of coarsely- and finely-grained units. Suppose there are n fine resolution units of a feature A and n fine resolution units for a feature B.
To have explicit units for feature values AB, $n^2$ units would be required. This is an untenable solution for large feature spaces (the number of units grows exponentially with the number of features), so alternatives must be sought. One solution to this problem is to vary the grain of the AB units so that they are only coarsely represented. This solution has its attendant disadvantages in that separate stimuli within the limits of the coarse resolution grain cannot be distinguished. Also, a set of weak stimuli can be misinterpreted. A better solution is to have a coarse unit that would respond only to a single saturated unit within its input range. In that way a collection of weak inputs is not misinterpreted. This situation can be achieved by having the units in each finely-tuned network that are in the field of a coarse unit laterally inhibit each other, e.g., in the WTA network of Figure 5a. The outputs of these individual feature units then form disjunctive connections with appropriate coarse resolution multiple feature units. If m is the grain of the coarse resolution units along each feature dimension, the number of disjunctions per coarse unit is $(n/m)^2$. The result of this connection strategy is that a coarse unit responds with a strength that varies as the strengths of the largest maximum in the subnetwork of each of the finely-tuned units that correspond to its field. The response of a coarse-tuned unit is the maximum of the sums of the conjunctive inputs from the finely tuned units which connect to it.
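
The difference between an unstructured coarse unit and a tuned one can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' network: the function names and potentials are invented here, and the `max` simply stands in for the effect of lateral inhibition, with no settling dynamics simulated.

```python
from itertools import product


def untuned_coarse_response(a_field, b_field):
    """Unstructured coarse unit: simply sums every fine input in its field."""
    return sum(a_field) + sum(b_field)


def tuned_coarse_response(a_field, b_field):
    """Tuned coarse unit: maximum over conjunctive sites of the summed pair.

    Each site conjoins one fine A unit with one fine B unit, so a field of
    len(a_field) by len(b_field) fine units has that many sites.
    """
    return max(a + b for a, b in product(a_field, b_field))


# Hypothetical potentials for the fine units inside one coarse field.
weak_a = weak_b = [0.25, 0.25, 0.25, 0.25]   # several weak stimuli
strong_a = [0.0, 1.0, 0.0, 0.0]              # one saturated A unit
strong_b = [0.0, 0.0, 1.0, 0.0]              # one saturated B unit

print(untuned_coarse_response(weak_a, weak_b),
      untuned_coarse_response(strong_a, strong_b))
print(tuned_coarse_response(weak_a, weak_b),
      tuned_coarse_response(strong_a, strong_b))
```

With these numbers the summing unit gives the same response to four weak stimuli per feature as to one saturated pair, whereas the tuned unit separates the two cases.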
In terms of Figure 15, a tuned coarse-angle cell would respond only to one high-confidence pair of angles in its range, and not to several weak ones (which couldn't correctly appear all at one position). This is a better property than just having unstructured coarse units and it will be exploited in the next section, when we deal with perceiving complex objects.

Spatial Coherence

The most serious problem which requires conserving connections is the representation of complex concepts. The obvious way of representing concepts (sets of properties) is to dedicate a separate unit to each conjunction of features. In fact, it first appears that one would need a separate unit for each combination at each location in the visual field. We will present here a simple way around the problem of separate units for each location and deal with the more general problem in the next section.

The basic problem can be readily seen in the example of Figure 17. Suppose there were one unit each for finally recognizing concepts like colored circles and squares. Now consider the case when a red circle (at x = 7) and a blue square (at x = 11) simultaneously appear in the visual field. If the various "colored figure" units simply summed their inputs, the incorrect "blue circle" unit would see two active inputs, just like the correct "red circle" and "blue square" units. This problem is known as cross-talk, and is always a potential hazard in CM networks. The solution presented in Figure 17 is quite general. Each unit is assumed to have a separate conjunctive connection site for each position of the visual field.
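
The cross-talk problem and the per-position conjunctive sites just described can be sketched in Python. This is a hedged illustration only: the set-of-pairs encoding, the firing threshold of two inputs, and the 16-position visual field are assumptions made for the example, not part of the model's definition.

```python
def summing_unit(color, shape, active):
    """Naive concept unit: counts active inputs for its two features anywhere."""
    hits = sum(1 for feature, _position in active if feature in (color, shape))
    return hits >= 2


def coherent_unit(color, shape, active, positions):
    """Concept unit with one conjunctive site per position of the visual field."""
    return any((color, x) in active and (shape, x) in active for x in positions)


# Red circle at x = 7 and blue square at x = 11, as in the example of Figure 17.
active = {("red", 7), ("circle", 7), ("blue", 11), ("square", 11)}
positions = range(16)  # assumed extent of a one-dimensional visual field

for color, shape in [("red", "circle"), ("blue", "square"), ("blue", "circle")]:
    print(color, shape,
          summing_unit(color, shape, active),
          coherent_unit(color, shape, active, positions))
```

The summing version fires for "blue circle" as well as for the two correct concepts; the version with spatially registered sites does not.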
In our example, the correct units get dual inputs to a single site (and are activated) while the partially matched units receive separated inputs and are not activated. Only sets of properties which are spatially coherent can serve to activate concept units.

This example was meant to show how spatial coherence could be used with conjunctive connections to eliminate cross-talk. There are a number of additional ways of using spatial coherence, each of which involves different tradeoffs. These are discussed in the next section, which considers some sample applications in more detail.

Figure 17. Spatial coherence on inputs can represent complex concepts without cross-talk. Solid lines show active inputs and dashed lines (some of the) inactive inputs.

5. APPLICATIONS

This section illustrates the power of the CM paradigm via two groups of examples. The first shows how the various techniques for conserving connections can be used in an idealized form of perception of a complex object. Here the point is that an object has multiple features which are computed in parallel via the transform methodology. The second group of examples starts with a relatively simple problem, that of vergence eye movements, to illustrate motor control using value units. In this example, control is immediate; a visual signal produces an instantaneous output (within the settling time constants of the units). Extensions of this idea use space as a buffer for time. For motor output, space allows the incorporation of more complex motor commands. For speech input, spatial buffering allows for phoneme recognition based on subsequent information.
These examples were chosen to show that CM can provide a unified representation for both perception and motor control. This is important since an animal is hardly ever passively responding to its environment. Instead, it seems involved in what Arbib has called a perception-action cycle (Arbib, 1979). Perceptions result in actions which in turn cause new perceptions, and so on. Massive parallelism changes the way the perception-action cycle is viewed. In the traditional view, one would convert the input to a language which uses variables, and then use these variables to direct motor commands. CM suggests that we think of accomplishing the same actions via a transformation: sensory input is transformed to (connected to) abstract representational units, which in turn are transformed to (connected to) motor units. This will obviously work for reflex actions. The examples are intended to suggest how more flexible command and control structures can also be represented by systems of value units.

Object Recognition

The examples of Figures 1 and 6 are representative of the problem of gestalt perception: that of seeing parts of an image as a single percept (object). An "object" is indicated by the "simultaneous" appearance of a number of "visual features" in the correct relative spatial positions. In any realistic case, this will involve a variety of features at several different levels of abstraction and complex interaction among them. A comprehensive model of this process would be a prototype theory of visual perception and is well beyond the scope of this paper. What we will do here is consider the prerequisite task of constructing CM solutions to the problems of detecting non-punctate visual features and of forming sets of the features which could help characterize a percept. We will refer throughout to the prototype problem of detecting Fred's frisbee, which is known to be round, baby-blue, and moving fairly fast.
The development suppresses many important issues such as hierarchical descriptions, perspective, occlusion, and the integration of separate fixations, not to mention learning. A brief discussion of how these might be tackled follows the technical material.

The first problem is to develop a general CM technique for detecting features and properties of images, given that these features are not usually detectable at a single point in some retinotopic map. The basic idea is to find parameters which characterize the feature in question and connect each retinotopic detector to the parameter values consistent with its detectand. Consider the problem of detecting lines in an image from short edge segments. Different lines can be represented by units having different discrete parameter values, e.g., in the line equation $\rho = x\cos\theta + y\sin\theta$, the parameters are $\rho$ and $\theta$. Thus edge units at $(x, y, \theta)$ could be connected to appropriate line units. Note that this example is analogous to the word recognition example (Fig. 1). Edges are analogous to letters and lines to words. As in the words-letter example, "top-down" connections allow the existence of a line to raise the confidence of a local edge. In our line detection example, lines in the image are high potential (confidence) units in a slope-intercept $(\theta, \rho)$ parameter space. High confidence edge units produce high confidence line units by virtue of the network connectivity. This general way of describing the relationship between parts of an image (e.g., edges) and the associated parameters (e.g., $\rho, \theta$ for a line) is a connectionist interpretation of the Hough transform (Duda & Hart, 1972). Since each parameter value is determined by a large number of inputs, the method is inherently noise-resistant and was invented for this purpose.
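
The line-detection scheme above can be sketched as a small voting procedure. The following is a minimal sketch under assumptions introduced here: the quantization steps, the function names, and the convention that an edge's $\theta$ (in degrees, in the range 0 to 180) is the direction of the line's normal are all illustrative choices, and a counter of votes stands in for the potentials of the line units.

```python
import math
from collections import Counter

THETA_STEP_DEG = 5   # assumed quantization of the angle parameter
RHO_STEP = 1.0       # assumed quantization of the distance parameter


def line_unit(x, y, theta_deg):
    """Return the (theta_bin, rho_bin) line unit consistent with one edge unit.

    Uses rho = x*cos(theta) + y*sin(theta), with theta quantized first.
    """
    theta_bin = round(theta_deg / THETA_STEP_DEG) % (180 // THETA_STEP_DEG)
    theta = math.radians(theta_bin * THETA_STEP_DEG)
    rho = x * math.cos(theta) + y * math.sin(theta)
    return theta_bin, round(rho / RHO_STEP)


def line_votes(edge_units):
    """Count, for each line unit, the active edge units connected to it."""
    votes = Counter()
    for x, y, theta_deg in edge_units:
        votes[line_unit(x, y, theta_deg)] += 1
    return votes


# Ten edge units on the vertical line x = 5, plus two stray edges (noise).
edges = [(5, y, 0) for y in range(10)] + [(1, 2, 45), (8, 3, 90)]
votes = line_votes(edges)
best_unit, best_count = votes.most_common(1)[0]
print(best_unit, best_count)
```

Because the ten collinear edges all feed the same line unit while each stray edge feeds a different one, the noise has no effect on which unit wins.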
A Hough transform network for circles (like Fred's frisbee) would involve one parameter for size plus two for spatial location, and exactly this method has been used for tumor detection in chest radiographs (Kimme et al., 1975). Notice that the circle parameter space is itself retinotopic in that the centers of circles have specified locations; this will be important in registering multiple features.

The Hough transform is a formalism for specifying excitatory links between units. The general requirements are that part of an image representation can be represented by a parameter vector a in an image space A and a feature can be represented by a vector b which is an element of a feature space B. Physical constraints $f(a, b) = 0$ relate a and b. The space A represents spatially indexed units, and each individual element $a_k$ is only consistent with certain elements in the space B, owing to the constraint imposed by the relation f. Thus for each $a_k$ it is possible to compute the set

$$B_k = \{\, b \mid a_k \text{ and } f(a_k, b) \le \delta_b \,\}$$

where $B_k$ is the set of units in the feature space network B that the $a_k$ unit must connect to, and the constant $\delta_b$ is related to the quantization in the space B. Let $H(b)$ be the number of active connections the value unit b receives from units in A. $H(b)$ is the number of image measurements which are consistent with the parameter value b. The potential of units in B is given by $p(b) = H(b) / \sum_b H(b)$. The value $p(b)$ can stand for the confidence that a segment with feature value b is present in the image. If the measurement represented by a is realized as groups of units, e.g., $a = (a_1, a_2)$, then conjunctive connections are required to implement the constraint relation. Implementing these networks often results in a set of very sparsely distributed high-confidence feature space units.
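
The general formalism can be made concrete with the circle case. The sketch below is illustrative only and rests on several assumptions introduced here: the constraint is taken as $f(a, b) = \lVert a - c \rVert - r$ for a boundary point a and a circle b = (c, r), the test uses $|f| \le \delta$ rather than the one-sided form, and the feature space is a small hypothetical grid of centers and radii.

```python
import math
from collections import Counter


def circle_constraint(a, b):
    """f(a, b): signed distance of image point a from the circle b = (cx, cy, r)."""
    (x, y), (cx, cy, r) = a, b
    return math.hypot(x - cx, y - cy) - r


def consistent_units(a, feature_space, f, delta):
    """B_k: the feature units that measurement a must connect to."""
    return [b for b in feature_space if abs(f(a, b)) <= delta]


def confidences(measurements, feature_space, f, delta):
    """p(b) = H(b) / sum_b H(b), where H(b) counts consistent measurements."""
    H = Counter()
    for a in measurements:
        for b in consistent_units(a, feature_space, f, delta):
            H[b] += 1
    total = sum(H.values())
    return {b: count / total for b, count in H.items()} if total else {}


# Hypothetical quantized feature space: centers on a 10 x 10 grid, radii 2 to 4.
feature_space = [(cx, cy, r) for cx in range(10) for cy in range(10) for r in (2, 3, 4)]

# Four boundary points of a circle of radius 3 centered at (5, 5).
measurements = [(8, 5), (2, 5), (5, 8), (5, 2)]

p = confidences(measurements, feature_space, circle_constraint, delta=0.5)
best = max(p, key=p.get)
print(best, p[best])
```

Each measurement connects to many circle units, but only the unit for the true circle is consistent with all four, so it ends up with the highest confidence.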
In implementations of the line detection example, only approximately 1% of the units have maximum confidence values. This figure is also typical of other modalities. In general, each $a_k$ and the relationship f will not determine a single unit in $B_k$ as in the line detection example, but there still will be isolated high-confidence units. Figure 1 shows why this is the case: different $a_k$ letter-feature units connect to common units in the letter space B.

We have found that parameter spaces combine with the growing body of knowledge on specific physical constraints to provide a powerful and robust model for the simultaneous computation of invariant object properties such as reflectance, curvature, and relative motion (Ballard, 1981). Of course segmentation must involve ways of associating peaks in several different feature spaces and methods for doing this are discussed presently, but the cornerstone of the techniques is high-confidence units in the individual-modality feature spaces.

In extending the single feature case to multiple features, the most serious problem is the immense size of the cross product of the spatial dimensions with those of interesting features such as color, velocity, and texture. Thus to explain how image-like input such as color and optical flow are related to abstract objects such as "a blue, fast-moving thing," it becomes necessary to use all the techniques of the previous sections. Even if we assume that there is a special unit for recognizing images of Fred's frisbee, it cannot be the case that there is a separate one of these units for each point in the visual field. One weak solution to this kind of problem was given in Figure 17 of the last section.
There could conceivably be a separate 3-way conjunctive connection on the Fred's frisbee unit for each position in space. Activation of one conjunct would require the simultaneous activation of circle, baby-blue, and fairly-fast in the same part of the visual field. The solution style with separate conjunctions for every point in space becomes increasingly implausible as we consider more complex objects with hierarchical and multiple descriptions. The spatially registered conjunctions would have to be preserved throughout the structure.

The problem of going from a set of descriptors (features) to the object which is the best match to the set is known in artificial intelligence as the indexing problem. The feature set is viewed as an index (as in a data base). There have been several proposed parallel hierarchical network solutions to the indexing problem (Fahlman, 1979; Hillis, 1981) and these can be mapped into CM terms. But these designs assume that the network is presented with sets of descriptors which are already partitioned, which is precisely the vision problem we are trying to solve. There are three additional mechanisms that seem to be necessary, two of which have already been discussed. Coarse coding and tuning (as discussed in Section 4) make it much less costly to represent conjunctions. In addition, some general concepts (e.g., blue frisbee) might be indexed more efficiently through less precise units. The new idea is an extension of spatial coherence that exploits the fact that the networks respond to activity that occurs together in time. If there were a way to focus the activity of the network on one area at a time, only properties detected in that area would compete to index objects.
The obvious way to focus attention on one area of the visual field is with eye movements, but there is evidence that focus can also be done within a fixation. The general idea of internal spatial focus is shown in Figure 18. In this network, the general "baby-blue" unit is configured to have separate conjunctive inputs for each point in space, like the blue-square units of Figure 17. The difference is that the second input to the conjunction comes from a "focus" unit, and this makes a much more general network. The idea of making a unit (e.g., baby blue) more responsive to inputs from a given spatial position can be implemented in different ways. The conjunctive connection at the x = 7 lobe of the baby-blue unit is the most direct way. But treating this conjunct as a strict AND would mean that all spatial units would have to be active when there was no focus.

Figure 18. Spatial focus unit can gate only input from attended positions.

An alternative would be to have the "focus on 7" unit boost the output of the "baby blue at 7" unit (and all of its rivals) as shown by the dashed line; this would eliminate the need for separate spatial conjunctions on the baby-blue unit, but would alter the potential of all the units at the position being attended. The trade-offs become even trickier when goal-directed input is taken into account, but both methods have the same effect on indexing. If the system has its attention directed only to x = 7, then the only feature units activated at all will be those whose local representatives are dominant (in their WTA) at x = 7.
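
The effect of spatial focus on indexing can be sketched as below. This is a simplified illustration, not the network of Figure 18 itself: the nested-dictionary scene, the potentials, and the function names are assumptions made here, the winner-take-all competition is reduced to picking the largest potential, and the conjunction is treated as a strict AND.

```python
def local_winner(potentials):
    """Winner-take-all among rival values of one property at one position."""
    return max(potentials, key=potentials.get)


def active_general_units(scene, focus_position):
    """Space-independent property units activated under spatial focus.

    scene maps position -> property kind -> {value: potential}. A general unit
    has one conjunctive site per position; a site fires only when its
    position-specific unit is the local winner AND the focus unit for that
    position is on.
    """
    active = set()
    for position, kinds in scene.items():
        focus_on = (position == focus_position)
        for potentials in kinds.values():
            winner = local_winner(potentials)
            if focus_on:  # conjunction of local winner and focus unit
                active.add(winner)
    return active


# Hypothetical scene: a baby-blue circle at x = 7 and a red square at x = 11.
scene = {
    7: {"color": {"baby-blue": 0.9, "red": 0.2},
        "shape": {"circle": 0.8, "square": 0.1}},
    11: {"color": {"red": 0.9, "baby-blue": 0.1},
         "shape": {"square": 0.9, "circle": 0.2}},
}

print(active_general_units(scene, 7))
print(active_general_units(scene, 11))
```

With attention at x = 7 only the properties dominant there reach the general units, so a concept such as "red circle" cannot be indexed by mistake.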
In such a case, there would be a time when the only concept units active in the entire network would be those for x = 7. This does not "solve" the problem of identifying objects in a visual scene, but it does suggest that sequentially focusing attention on separate places can help significantly. There is considerable reason to suppose (Posner, 1978; Triesman, 1980) that people do this even in tasks without eye movement.

There are other ways of looking at the network of Figure 18. Suppose the system had reason to focus on some particular property (e.g., baby-blue). If we make bi-directional the links from "focus on x = 7" to "baby-blue" and "baby-blue at 7," a nice possibility arises. The "focus on 7" unit could have a conjunctive connection for each separate property at its position. If, for example, baby-blue was chosen for focus and was the dominant color at x = 7, then the "focus on x = 7" unit would dominate its rivals. This suggests another way in which the recognition of complex objects could be helped by spatial focus. Figure 19 depicts the fairly general situation.

In Figure 19, the units representing baby-blue, circular, and fairly-fast are assumed to be for the entire visual field and moderately precise. The dotted arrows to the "Fred's frisbee" node suggest that there might be more levels of description in a realistic system. The spatial focus links involving baby-blue are the same as in Figure 18, and are replicated for the other two properties. Notice that the position-specific sensing units do not have their potentials affected by spatial focus units, so that the sensed data can remain intact. The network of Figure 19 can be used in several ways.
If attention has been focused on x = 7 for any reason, the various space-independent units whose representatives are most active at x = 7 will become most active, presumably leading to the activation (recognition) of Fred's frisbee. If a top-down goal of looking for Fred's frisbee (or even just something baby-blue) is active, then the "focus on x = 7" unit will tend to defeat its WTA rivals, leading to the same result. A third possibility is a little more complicated, but quite powerful. Suppose that a given image, even in context, activates too many property units so that no objects are effectively indexed. One strategy would be to systematically scan each area of the visual field, eliminating confounding activity from other areas. But it is also possible to be more efficient. If some property unit (say baby-blue) were strongly activated, the network could focus attention on all the positions with that property. In this case it is like putting a baby-blue filter in front of the scene, and should often lead to better convergence in the networks for shape, speed, etc.

One should compare the network of Figure 17 with Figures 18 and 19. In the former, parallel co-existing concepts are possible if we assume delicate arrangements of conjunctive connections. The latter networks are more robust but use sequentiality to eliminate cross-talk.

Time and Sequence

Connectionist models do not initially appear to be well-suited to representing changes with time. The network for computing some function can be made quite fast, but it will be fixed in functionality. There are two quite different aspects of time variability of connectionist structures. One is time-varying responses, i.e., long-term modification of the networks (through changing weights) and short-term changes in the behavior of a fixed network with time. The second aspect is sequence: the problem of analyzing inherently sequential input (such as speech) or producing inherently sequential output (such as motor commands) with parallel models. The problem of change will be deferred to (Feldman, 1981b). The problem of sequence is discussed here.

There are a number of biologically suggested mechanisms for changing the weight ($w_j$) of synaptic connections, but none of them are nearly rapid enough to account for our ability to hear, read, or speak. The ability to perceive a time-varying signal like speech or to integrate the images from successive fixations must be achieved (according to our dogma) by some dynamic (electrical) activity in the networks. As usual, we will present computational solutions to the problems of sequence that appear to be consistent with known structural and performance constraints. These are, again, too crude to be taken literally but do suggest that connectionist models can describe the phenomena.

Motor Control of the Eye. To see how the transform notion of distributed units might work for motor control, we present a simplistic model of vergence eye movements. (The same idea may be valid for fixations, but control probably takes place at higher levels of abstraction.) In this model retinotopic (spatial) units are connected directly to muscle control units. Each retinotopic unit can, if saturated, cause the appropriate contraction so that the new eye position is centered on that unit. When several retinotopic units saturate, each enables a muscle control unit independently and the muscle itself contracts an average amount.
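
The averaging behavior just described amounts to a mean over the saturated positions. The sketch below is a deliberately minimal illustration: the function name is an assumption introduced here, positions are plain numbers on a one-dimensional retina, and the sample input uses saturated units at positions 2, 4, 5, and 6.

```python
def muscle_command(saturated_positions):
    """Eye position commanded when the listed retinotopic units are saturated.

    Each saturated unit independently requests centering on its own position;
    the muscle is assumed to contract by the average of those requests.
    Returns None when no unit is saturated.
    """
    if not saturated_positions:
        return None
    return sum(saturated_positions) / len(saturated_positions)


print(muscle_command([4]))            # a single saturated unit
print(muscle_command([2, 4, 5, 6]))   # several saturated units
```

A single saturated unit centers the eye on itself, while several saturated units pull the eye to their mean position.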
Figure 20 shows the idea for a one-dimensional retina. For example, with units at positions 2, 4, 5, and 6 saturated, the net result is that the muscle is centered at 17/4 or 4.25. (This idea can be extended to the case where the retinotopic units have overlapping fields.) This kind of organization could be extended to more complex movement models such as that of the organization of the superior colliculus in the monkey (Wurtz & Albano, 1980).

Figure 19. Spatial focus and indexing.

Figure 20. Distributed Control of Eye Fixations. (Labels in the figure: retinal spatial units [C(x) in Fig. 19], current eye coordinate, muscle command units.)

Notice that each retinotopic unit is capable of enabling different muscle control units. The appropriate one is determined by the enabled x-origin unit which inhibits commands to the inappropriate control units via modifiers. One problem with this simple network arises when disparate groups of retinotopic units are saturated. The present configuration can send the eye to an average position if the features are truly identical. The network can be modified with additional connections so that only a single connected component of saturated units is enabled by using additional object primitives. A version of this WTA motor control idea has already been used in a computer model of the frog tectum (Didday, 1976).

There are still many details to be worked out before this could be considered a realistic model of vergence control, but it does illustrate the basic idea: local spatially separate sensors have distinct, active connections which could be averaged at the muscle for fine motor control or be fed to some intermediate network for the control of more complex behaviors.

Converting Space to Time. Consider the problem of controlling a simple physical motion, such as throwing a ball. It is not hard to imagine that in a skilled motor performance unit-groups fire each other in a fixed succession, leading to the motor sequence.
The computational problem is that there is a unique set of effector units (say at the spinal level) that must receive input from each group at the right time. Figure 21a depicts a simple case in which there are two effector units ($e_1$, $e_2$) that must be activated alternately. The circles marked 1 to 4 represent units (or groups of units) which activate their successor and inhibit their predecessor (cf. Delcomyn, 1980). The main point is that a succession of outputs to a single effector set can be modeled as a sequence of time-exclusive groups representing instantaneous coordinated signals. Moving from one time step to the next could be controlled by pure timing for ballistic movements, or by a proprioceptive feedback signal. There is, of course, an enormous amount more than this to motor control, and realistic models would have to model force control, ballistic movements, gravity compensation, etc.

The second part of Figure 21 depicts a somewhat fanciful notion of how a variety of output sequences could share a collection of lower level response units. The network shown has a single "Dixie" unit which can start a sequence and which joins in conjunctive connections with each note to specify its successor. At each time step, a WTA network decides what note gets sounded. One can imagine adding the rhythm network and transposition networks to other keys and to other modalities of output.

Figure 21. Mapping Space to Time. a. Sequence and Suppression. b. Whistling Dixie.

Converting Time to Space. The sequencer model for skilled movements was greatly simplified by the assumption that the sequence of activities was pre-wired.
How could one (still crudely, of course) model a situation like speech perception where there is a largely unpredictable time-varying computation to be carried out? One solution is to combine the sequencer model of Figure 21 with a simple vision-like scheme. We assume that speech is recognized by being sequenced into a buffer of about the length of a phrase and then is relaxed against context in the way described above for vision. For simplicity, assume that there are two identical buffers, each having a pervasive modifier (m_j) innervation so that either one can be switched into or out of its connections. We are particularly concerned with the process of going from a sequence of potential phonetic features into an interpreted phrase. Figure 22 gives an idea of how this might happen.

Figure 22. Mapping Time to Space. (Figure label: time = t0.)

Assume that there is a separate unit for each potential feature for each time step up to the length of the buffer. The network which analyzes sound is connected identically to each column, but conjunction allows only the connections to the active column to transmit values. Under ideal circumstances, at each time step exactly one feature unit would be active. A phrase would then be laid out on the buffer like an image on the "mind's eye," and the analogous kind of relaxation cones (cf. Figure 16) involving morphemes, words, etc., could be brought to bear. The more realistic case where sounds are locally ambiguous presents no additional problems. We assume that, at each time step, the various competing features get varying activation. Diphone constraints could be captured by (+ or -) links to the next column as suggested by Figure 22.
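The buffer just described can be sketched as follows. This is only an illustration under stated assumptions: the three feature names, the link weights, the `rate` constant, and a single forward relaxation pass are all hypothetical choices of ours, and a real model would iterate the relaxation and include higher-level units.

```python
# Hypothetical sketch of the time-to-space buffer (assumptions noted above).
FEATURES = ["b", "p", "a"]   # illustrative phonetic features


def load_buffer(frames, n_columns):
    """Lay successive feature frames out in successive buffer columns.

    The analyzer is wired identically to every column; the sequencer gate
    (a conjunctive connection) lets only the active column accept values.
    """
    buffer = [[0.0] * len(FEATURES) for _ in range(n_columns)]
    for t, frame in enumerate(frames[:n_columns]):
        for col in range(n_columns):
            gate = 1.0 if col == t else 0.0      # only column t is enabled
            for f, activation in enumerate(frame):
                buffer[col][f] += gate * activation
    return buffer


def relax(buffer, links, rate=0.5):
    """Apply the diphone links once.

    links[(f, g)] is a (+ or -) weight from feature f in one column to
    feature g in the next column.
    """
    new = [column[:] for column in buffer]
    for t in range(len(buffer) - 1):
        for (f, g), weight in links.items():
            source = buffer[t][FEATURES.index(f)]
            new[t + 1][FEATURES.index(g)] += rate * weight * source
    # Keep potentials inside the [-10, 10] range used for units.
    return [[max(-10.0, min(10.0, x)) for x in column] for column in new]


def strongest(buffer):
    """Name the most active feature in each column."""
    return [FEATURES[column.index(max(column))] for column in buffer]


frames = [[8.0, 0.0, 0.0], [0.0, 5.0, 5.0]]   # second time step is ambiguous
links = {("b", "a"): 1.0, ("b", "p"): -1.0}   # "b" favors a following "a"
buffer = relax(load_buffer(frames, n_columns=2), links)
```

In this toy case the second column starts out tied between "p" and "a"; after one pass the link from "b" in the first column raises "a" and lowers "p", so `strongest(buffer)` names "b" then "a".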
The result is a multiple possibility relaxation problem, again exactly like that in visual perception. The fact that each potential feature could be assigned a row of units is essential to this solution; we do not know how to make an analogous model for a sequence of sounds which cannot be clearly categorized and combined.

Recall that the purpose of this example is to indicate how time-varying input could be treated in connectionist models. The problem of actually laying out detailed models for language skills is enormous and our example may or may not be useful in its current form. Some of the considerations that arise in distributed modeling of language skills are presented in Arbib and Caplan (1979).

CONCLUSIONS

The CM paradigm advanced in this paper has been applied successfully only to relatively low-level tasks. There is no reason, as yet, to be confident that an intermediate symbolic representation will not be required for modeling higher cognitive processes. There is, however, the beginning of a collection of efforts which can be interpreted as attempting CM approaches to higher level tasks. These include work which explicitly uses parallelism in planning (Stefik, 1981) and deduction, and work which incorporates more connectionist architectural notions of value units (Forbus, 1981) and coarse coding (Garvey, 1981).

We have now completed six years of intensive effort on the development of connectionist models and their application to the description of complex tasks. While we have only touched the surface, the results to date are very encouraging. Somewhat to our surprise, we have yet to encounter a challenge to the basic formulation.
Our attempts to model in detail particular computations (Ballard & Sabbah, 1981; Sabbah, 1981) have led to a number of new insights (for us, at least) into these specific tasks. Attempts like this one to formulate and solve general computational problems in realistic connectionist terms have proven to be difficult, but less so than we would have guessed. There appear to be a number of interesting technical problems within the theory and a wide range of questions about brains and behavior which might benefit from an approach along the lines suggested in this paper.

APPENDIX: SUMMARY OF DEFINITIONS AND NOTATION

A unit is a computational entity comprising:

$\{q\}$: a set of discrete states, fewer than 10
$p$: a continuous value in $[-10, 10]$, called potential (accuracy of several digits)
$v$: an output value, integers $0 \le v \le 9$
$i$: a vector of inputs $i_1, \ldots, i_n$

and functions from old to new values of these

$$p \leftarrow f(i, p, q)$$
$$q \leftarrow g(i, p, q)$$
$$v \leftarrow h(i, p, q)$$

which we assume to compute continuously. The form of the $f$, $g$, and $h$ functions will vary, but will generally be restricted to conditionals and simple functions.

P-Units

For some applications, we will use a particularly simple kind of unit whose output $v$ is proportional to its potential $p$ (rounded) (when $p > 0$) and which has only one state. In other words

$$p \leftarrow p + \beta \sum_k w_k i_k \qquad [0 \le w_k \le 1]$$
$$v \leftarrow \text{if } p > \theta \text{ then round}(p - \theta) \text{ else } 0 \qquad [v = 0 \ldots 9]$$

where $\beta$, $\theta$ are constants and $w_k$ are weights on the input values.

Conjunctive Connections

In terms of our formalism, this could be described in a variety of ways. One of the simplest is to define the potential in terms of the maximum, e.g.
$$p \leftarrow p + \beta \max(i_1 + i_2 - \varphi,\; i_3 + i_4 - \varphi,\; i_5 + i_6 - i_7 - \varphi)$$

where $\beta$ is a scale constant as in the p-unit and $\varphi$ is a constant chosen (usually > 10) to suppress noise and require the presence of multiple active inputs. The minus sign associated with $i_7$ corresponds to its being an inhibitory input. The max-of-sum unit is the continuous analog of a logical OR-of-AND (disjunctive normal form) unit and we will sometimes use the latter as an approximate version of the former. The OR-of-AND unit corresponding to the above is:

$$p \leftarrow p + \alpha\, \mathrm{OR}(i_1 \,\&\, i_2,\; i_3 \,\&\, i_4,\; i_5 \,\&\, i_6 \,\&\, (\text{not } i_7))$$

Winner-take-all (WTA) networks have the property that only the unit with the highest potential (among a set of contenders) will have output above zero after some settling time.

A coalition will be called stable when the output of all of its members is non-decreasing.

Change

For our purposes, it is useful to have all the adaptability of networks be confined to changes in weights. While there is known to be some growth of new connections in adults, it does not appear to be fast or extensive enough to play a major role in learning. For technical reasons, we consider very local growth or decay of connections to be changes in existing connection patterns. Obviously, models concerned with developing systems would need a richer notion of change in connectionist networks (cf. von der Malsburg & Willshaw, 1977).
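The P-unit, conjunctive-connection, and winner-take-all definitions given earlier in this appendix can be sketched in a few lines of Python. This is an illustration only: the function names and default constants are hypothetical, the update is a single discrete step rather than the continuous computation assumed in the text, and `winner_take_all` simply returns the settled state rather than simulating the settling process.

```python
# Hypothetical sketch of the appendix definitions (assumptions noted above).


def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def p_unit(p, inputs, weights, beta=1.0, theta=0.0):
    """P-unit: p <- p + beta * sum(w_k * i_k); v <- round(p - theta) if p > theta."""
    p = clamp(p + beta * sum(w * i for w, i in zip(weights, inputs)), -10.0, 10.0)
    v = int(clamp(round(p - theta), 0, 9)) if p > theta else 0
    return p, v


def max_of_sum(p, i, beta=1.0, phi=11.0):
    """Conjunctive connections: i = (i1, ..., i7), with i7 inhibitory."""
    i1, i2, i3, i4, i5, i6, i7 = i
    best = max(i1 + i2 - phi, i3 + i4 - phi, i5 + i6 - i7 - phi)
    return clamp(p + beta * best, -10.0, 10.0)


def or_of_and(i):
    """Logical OR-of-AND approximation of max_of_sum on boolean inputs."""
    i1, i2, i3, i4, i5, i6, i7 = i
    return (i1 and i2) or (i3 and i4) or (i5 and i6 and not i7)


def winner_take_all(potentials):
    """Settled WTA state: only the highest potential keeps a nonzero value."""
    top = max(potentials)
    return [p if p == top else 0.0 for p in potentials]
```

With `phi` above the maximum single input of 9, one active input alone lowers the potential while two active inputs on the same conjunctive site raise it, which is the "require the presence of multiple active inputs" property. Exact ties in `winner_take_all` leave more than one unit active; the sketch does not address how ties would be broken.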
We provide each unit with a memory vector $\mu$ which can be updated:

$$\mu \leftarrow c(i, p, q, x, w, \mu)$$

where $\mu$ is the intermediate-term memory vector, $w$ is the weight vector, $i$, $p$, and $q$ are as always, and $x$ is an additional single integer input ($0 \le x \le 9$) which captures the notion of the importance and value of the current behavior. Instantaneous establishment of long-term memory imprinting would be equivalent to having $\mu = w$. The assumption is that the consolidation of long-term changes is a separate process. We postulate that important, favorable or unfavorable, behaviors can give rise to faster learning. The rationale for this is given in (Feldman, 1980; 1981a), which also lays out informally our views on how short- and long-term learning could occur in connectionist networks. A detailed technical discussion of this material, along the lines of this paper, is presented in (Feldman, 1981b). Obviously enough, a plausible model of learning and memory is a prerequisite for any serious scientific use of connectionism.

REFERENCES

Anderson, J. A., Silverstein, J. W., Ritz, S. A., & Jones, R. S. Distinctive features, categorical perception, and probability learning: Some applications of a neural model. Psychological Review, September 1977, 84(5), 413-451.
Arbib, M. A. Perceptual structures and distributed motor control. COINS (Tech. Rep. 79-11). University of Massachusetts, Computer and Information Science, and Center for Systems Neuroscience, June 1979.
Arbib, M. A., & Caplan, D. Neurolinguistics must be computational. The Brain and Behavioral Sciences, 1979, 2, 449-483.
Ballard, D. H. Parameter networks: Towards a theory of low-level vision. Proceedings of the 7th IJCAI, Vancouver, BC, August 1981.
Ballard, D. H., & Kimball, O. A.
Rigid body motion from depth and optical flow (Tech. Rep. 70). New York: University of Rochester, Computer Science Department, in press, 1981. (a)
Ballard, D. H., & Kimball, O. A. Shape and light source direction from shading (Tech. Rep.). Rochester, NY: University of Rochester, Computer Science Department, in press, 1981. (b)
Ballard, D. H., & Sabbah, D. On shapes. Proceedings of the 7th IJCAI, Vancouver, BC, August 1981.
Collins, A. M., & Loftus, E. F. A spreading-activation theory of semantic processing. Psychological Review, November 1975, 82, 407-429.
Delcomyn, F. Neural basis of rhythmic behavior in animals. Science, October 1980, 210, 492-498.
Dell, G. S., & Reich, P. A. Toward a unified model of slips of the tongue. In V. A. Fromkin (Ed.), Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen, and Hand. New York: Academic Press, 1980.
Didday, R. L. A model of visuomotor mechanisms in the frog optic tectum. Mathematical Biosciences, 1976, 30, 169-180.
Duda, R. O., & Hart, P. E. Use of the Hough transform to detect lines and curves in pictures. Communications of the ACM, January 1972, 15(1), 11-15.
Edelman, G., & Mountcastle, B. The Mindful Brain. Boston, MA: MIT Press, 1978.
Fahlman, S. E. NETL, A System for Representing and Using Real Knowledge. Boston, MA: MIT Press, 1979.
Fahlman, S. E. The Hashnet interconnection scheme. Computer Science Department, Carnegie-Mellon University, June 1980.
Feldman, J. A. A distributed information processing model of visual memory (Tech. Rep. 52). Rochester, NY: University of Rochester, Computer Science Department, 1980.
Feldman, J. A. A connectionist model of visual memory. In G. E. Hinton & J. A. Anderson (Eds.), Parallel Models of Associative Memory. Hillsdale, NJ: Lawrence Erlbaum Associates, 1981. (a)
Feldman, J. A. Memory and change in connection networks (Tech. Rep. 96).
Rochester, NY: University of Rochester, Computer Science Department, October 1981. (b)
Feldman, J. A. Four frames suffice (Tech. Rep. 99). Rochester, NY: University of Rochester, Computer Science Department, in press, 1982.
Feldman, J. A., & Ballard, D. H. Computing with connections (Tech. Rep. 72). Rochester, NY: University of Rochester, Computer Science Department, 1981; to appear in book by A. Rosenfeld & J. Beck (Eds.), 1982.
Forbus, K. D. Qualitative reasoning about physical processes. Proceedings of the 7th IJCAI, Vancouver, BC, August 1981, 326-330.
Freuder, E. C. Synthesizing constraint expressions. Communications of the ACM, November 1978, 21(11), 958-965.
Garvey, T. D., Lowrance, J. D., & Fischler, M. A. An inference technique for integrating knowledge from disparate sources. Proceedings of the 7th IJCAI, Vancouver, BC, August 1981, 319-325.
Grossberg, S. Biological competition: Decision rules, pattern formation, and oscillations. Proc. National Academy of Science USA, April 1980, 77(4), 2238-2342.
Hanson, A. R., & Riseman, E. M. (Eds.). Computer Vision Systems. New York: Academic Press, 1978.
Hillis, W. D. The connection machine (Computer architecture for the new wave). AI Memo 646, M.I.T., September 1981.
Hinton, G. E. Relaxation and its role in vision. (Ph.D. thesis, University of Edinburgh, December 1977.)
Hinton, G. E. Draft of Technical Report. La Jolla, CA: University of California at San Diego, 1980.
Hinton, G. E. The role of spatial working memory in shape perception. Proceedings of the Cognitive Science Conference, Berkeley, CA, August 1981, 56-60. (a)
Hinton, G. E., & Anderson, J. A. (Eds.). Parallel Models of Associative Memory. Hillsdale, NJ: Lawrence Erlbaum Associates, 1981.
Horn, B. K. P., & Schunck, B. G.
Determining Optical Flow. AI Memo 572, AI Lab, MIT, April 1980.
Hubel, D. H., & Wiesel, T. N. Brain mechanisms of vision. Scientific American, September 1979, 150-162.
Ja'Ja', J., & Simon, J. Parallel algorithms in graph theory: Planarity testing. CS 80-14, Computer Science Department, Pennsylvania State University, June 1980.
Jusczyk, P. W., & Klein, R. M. (Eds.). The Nature of Thought: Essays in Honor of D. O. Hebb. Hillsdale, NJ: Lawrence Erlbaum Associates, 1980.
Kandel, E. R. The Cellular Basis of Behavior. San Francisco, CA: Freeman, 1976.
Kimme, C., Sklansky, J., & Ballard, D. Finding circles by an array of accumulators. Communications of the ACM, February 1975.
Kinsbourne, M., & Hicks, R. E. Functional cerebral space: A model for overflow, transfer and interference effects in human performance: A tutorial review. In J. Requin (Ed.), Attention and Performance 7. Hillsdale, NJ: Lawrence Erlbaum Associates, 1979.
Kosslyn, S. M. Images and Mind. Cambridge, MA: Harvard University Press, 1980.
Kuffler, S. W., & Nicholls, J. G. From Neuron to Brain: A Cellular Approach to the Function of the Nervous System. Sunderland, MA: Sinauer Associates, Inc., Publishers, 1976.
Marr, D. C., & Poggio, T. Cooperative computation of stereo disparity. Science, 1976, 194, 283-287.
McClelland, J. L., & Rumelhart, D. E. An interactive activation model of the effect of context in perception: Part 1. Psychological Review, 1981.
Minsky, M., & Papert, S. Perceptrons. Cambridge, MA: The MIT Press, 1972.
Norman, D. A. A psychologist views human processing: Human errors and other phenomena suggest processing mechanisms. Proceedings of the 7th IJCAI, Vancouver, BC, August 1981, 1097-1101.
Perkel, D. H., & Mulloney, B. Calibrating compartmental models of neurons. American Journal of Physiology, 1979, 235(1), R93-R98.
Posner, M. I. Chronometric Explorations of Mind. Hillsdale, NJ: Lawrence Erlbaum Associates, 1978.
Prager, J. M.
Extracting and labeling boundary segments in natural scenes. IEEE Trans. PAMI, January 1980, 2(1), 16-27.
Ratcliff, R. A theory of memory retrieval. Psychological Review, March 1978, 85(2), 59-108.
Rosenfeld, A., Hummel, R. A., & Zucker, S. W. Scene labelling by relaxation operations. IEEE Trans. SMC 6, 1976.
Sabbah, D. Design of a highly parallel visual recognition system. Proceedings of the 7th IJCAI, Vancouver, BC, August 1981.
Scientific American. The Brain. San Francisco, CA: W. H. Freeman and Company, 1979.
Sejnowski, T. J. Strong covariance with nonlinearly interacting neurons. Journal of Mathematical Biology, 1977, 4(4), 303-321.
Smith, E. E., Shoben, E. J., & Rips, L. J. Structure and process in semantic memory: A featural model for semantic decisions. Psychological Review, 1974, 8(3), 214-241.
Stefik, M. Planning with Constraints (MOLGEN: Part I). Artificial Intelligence, 16(2), 1981.
Stent, G. S. A physiological mechanism for Hebb's postulate of learning. Proc. National Academy of Science USA, April 1973, 70(4), 997-1001.
Sunshine, C. A. Formal techniques for protocol specification and verification. IEEE Computer, August 1979.
Torioka, T. Pattern separability in a random neural net with inhibitory connections. Biological Cybernetics, 1979, 34, 53-62.
Treisman, A. M., & Gelade, G. A feature-integration theory of attention. Cognitive Psychology, 1980, 12, 97-136.
Ullman, S. Relaxation and constrained optimization by local processes. Computer Graphics and Image Processing, 1979, 10, 115-125.
von der Malsburg, Ch., & Willshaw, D. J. How to label nerve cells so that they can interconnect in an ordered fashion. Proc. National Academy of Science USA, November 1977, 74(11), 5176-5178.
Wickelgren, W. A.
Chunking and consolidation: A theoretical synthesis of semantic networks, configuring in conditioning, S-R versus cognitive learning, normal forgetting, the amnesic syndrome, and the hippocampal arousal system. Psychological Review, 1979, 86(1), 44-60.
Wurtz, R. H., & Albano, J. E. Visual-motor function of the primate superior colliculus. Annual Review of Neuroscience, 1980, 3, 189-226.
Zeki, S. The representation of colours in the cerebral cortex. Nature, April 1980, 284, 412-418.