HammingNN classifier – early ideas

Below is an article I wrote in 1989, describing my ideas for a neural network derived from my knowledge of physiology from medical student days, my studies in behaviour from psychiatry residency and practice, and computer principles from my time in engineering.

A Neural Network Model
Based On
Brain Physiology and Psychology

 

 

Henry Olders, P. Eng., MD, FRCPC

9 March 1989

Executive Summary

While the branch of artificial intelligence research called “neural networks” was initially based on the brain, the main thrust now seems to move ever farther away from principles founded on nature. If we assume that evolution has given us an optimised design, we would do well to incorporate findings in neurophysiology and neuropsychology into our neural network models. These findings include:

  1. Contrast enhancement and stability can be achieved with lateral inhibition, either recurrent inhibition (feedback) such as found in the cortex and subcortical structures, or feedforward inhibition as in the cerebellum.
  2. Classical conditioning (when two stimuli occur within a brief time interval) is a ubiquitous learning process which can be demonstrated at the level of individual neurons.
  3. Most of the information stored in the brain is in the form of temporal (ie serial) patterns; for example, sequences of muscle movements, or memory traces for speech and music. Associative feedback, together with the temporal summation and delay found in neurochemical synapses, enables storage, recall, and recognition of such temporal patterns, in which timing and sequence are as important as the spatial aspects of the patterns.
  4. Operant conditioning (stimulus-response learning) explains much higher level learning in which the organism interacts with its environment. Affective states such as anxiety, pain, and pleasure correspond to neurohormonally induced brain states which are thought to influence whether learning or unlearning takes place.
  5. The principle of reafference (extracting information about the environment through self-induced movements) helps to explain how organisms store and recall information such as object properties like hardness or inertia.

The proposed neural network model incorporates these physiological and psychological principles through the following:

  1. Forcing inputs (which provide classical conditioning through adequate stimulus substitution);
  2. Recurrent inhibition with a limited but adjustable sphere of influence;
  3. Facilitating inputs for the application of the reafference principle;
  4. Synaptic modification rules which incorporate global “affect” states.

These principles will permit the neural network to “learn” new patterns with a very small number of learning cycles, compared to the hundreds of learning trials necessary for other network paradigms such as “back propagation”. When the recurrent inhibition is adjusted so that on average half the total number of neurons is active, the information storage capacity of the network is increased considerably.

Most importantly, by coding inputs to the network as serial patterns rather than purely spatial ones, and incorporating associative feedback as well as synapses which exhibit temporal summation and delays, the neural network becomes capable of processing a vastly greater variety of information types, including auditory patterns such as speech.

 

Introduction

This proposal describes a computer model of the mammalian cortex, and is based on the hypothesis that a machine can be built which, modelled on neuronal networks and using processes derived from physiology and psychology, will be able to learn. An immediate commercial application would be a speech driven word processing machine, but the generality of the principle extends beyond speech recognition to most forms of pattern recognition in all biological perceptual modalities.

The concepts are not revolutionary: the McCulloch-Pitts neuron, wired as a “perceptron” and then further modified by adding lateral and feedforward inhibition for contrast enhancement; “forcing” inputs to facilitate learning; outputs wired back to inputs so that serial or temporal patterns can be processed as if they were spatial or “wire-labelled” patterns; incorporation into a system in which a “hypothesis” of the expected input is generated or synthesised and then used as a model pattern for the actual input (if the actual input matches the expected input, learning takes place); and finally, an approximation to emotional states occasioned by operant conditioning reinforcers – eg a positive environmental response to a learning trial will globally enhance learning.

Today’s computer technology permits large-scale networks to be built at modest cost: thousands of neurons can be simulated by equivalent numbers of microprocessors each with its own memory, interacting through common data and control busses as well as through shared memory. Although special hardware designs are needed, this is no longer a prohibitive constraint.

Furthermore, general-purpose computers have achieved speeds and memory capacities such that the many trials necessary to optimise the multiple parameters involved are now feasible.

This paper outlines a proposal for a project to explore such pattern recognition techniques, with the ultimate goal of producing designs for speech recognition machines which might be commercially exploited.

Because the principles are derived from physiology, the machine should allow for explorations of the mechanisms behind psychological experimental results, as well as providing an understanding of how some forms of psychopathology develop.

Basic Postulates

In considering the processing of information within the brain, one approach would be to apply what is known about neurophysiology in the generation of models which also satisfy a number of simple postulates, including spatial distribution of memory, parallel processing, and self-organisation.

Spatial Distribution of Memory

A traditional approach to the storage of information involves accessing it on the basis of location; this paradigm applies to a filing cabinet or to a random-access computer memory. An inescapable consequence is that irretrievable loss of specific items of information occurs if the corresponding locations are destroyed, unless duplicate copies are stored elsewhere.

An alternative approach is distributed storage, in which each storage location or element contributes to the storage of many information items, and any given item is stored not as a unit in one location, but distributed over many locations. A concrete metaphor would be if one were to copy this paper onto file cards, one word per file card, and then store each file card in a different box. It would require destruction of a large number of boxes before it would become impossible to retrieve at least the essential content of the stored data. If one were to store an additional paper in the same way, using the same set of boxes (ie with two words in each box) it would still be possible to retrieve each paper, although possibly with some imprecision, even without labelling which word in any box belongs to which paper. In general, only one of the two words in any box would “fit”, given a knowledge of its context, ie patterns of grammar, sentence construction, the author’s style, etc.

One could continue to store additional papers in the same boxes, although with decreasing precision because of “interference”. The example illustrates, however, not only distribution of information storage over many locations, but retrieval on the basis of “relationships” between elements of the data which can be inferred from the content of the data.

The first postulate, spatial distribution of memory, is suggested by the experiments of Lashley [1. Lashley KS. In search of the engram. Symposia of the Society for Experimental Biology. 1950;4:30.].

Parallel Processing

The brain appears to consist of processing units which operate on the order of milliseconds to perform processing feats impossible to simulate in hundreds of minutes on computers with processing units operating in nanoseconds. Clearly, the brain accomplishes this through the simultaneous operation of many units, each carrying out simple computations or reacting only to its own local set of inputs [1. Rumelhart DE, Norman DA. Parallel Models of Associative Memory. 1981].

Self-Organisation

This postulate may be the most difficult to accept; it is based on the apparent inability of the mammalian chromosomal complement to specify all of the interconnections between neurons found in the brain. What is more likely is that the DNA codes for specific groups, clusters, or nuclei of neurons to connect to other specific groups, via chemotactic or similar principles. For example, embryological retinal tissue, implanted in various locations on the surface of a brain, will send out axons which will eventually seek out and connect up with the superior colliculi.

Once the connections specified by genetic codes have been made, their specificity is probably brought about by modifications of individual connections (ie strengthening of connections between some neurons, and weakening or disappearance of other connections) made possible by the plasticity of the neuronal network, and in accord with sensory experiences or occurrences of behaviour. The findings of Hubel and Wiesel about changes in the visual cortex of kittens reared in specialised visual environments suggest that such effects can be both profound and long-lasting.

Pattern Recognition by Neural Networks

The model of a neural network which can perform pattern recognition will be described in an approach which begins with basic elements and then uses these as building blocks.

The McCulloch-Pitts Neuron

This “formal” neuron, proposed by W. McCulloch and W. Pitts in 1943 [1. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology. 1943;5:115-133.], is a special instance of the linear logic element shown in figure 1.


Figure 1. Linear Logic Element

In this model, the output ‘e0’ can assume only two states, either “0” (quiescent) or “1” (firing), depending on whether the sum of each input ‘xi’ multiplied by its synaptic weight ‘wi’ exceeds a threshold ‘q’. Weights correspond to synaptic effectiveness, and may be either positive or negative, representing excitatory or inhibitory actions. The inputs are binary signals from other linear logic element neurons.

It can be shown that, by appropriate selection of weights, a linear logic element can perform the Boolean logical functions AND, OR, and NOT. It is possible to construct a machine which can perform any well-defined input-output behaviour by using only these three logical functions. Thus, any logical relationship can be performed by linear logic elements of the type shown in figure 1.
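In present-day terms, such a linear logic element and one possible selection of weights and thresholds for AND, OR, and NOT can be sketched in a few lines of Python (the function names and weight values here are illustrative, not from the original paper):

```python
def linear_logic(inputs, weights, threshold):
    """McCulloch-Pitts unit: fire (1) iff the weighted input sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# One possible choice of weights and thresholds for the three Boolean functions:
AND = lambda a, b: linear_logic([a, b], [1, 1], 2)
OR  = lambda a, b: linear_logic([a, b], [1, 1], 1)
NOT = lambda a:    linear_logic([a], [-1], 0)

print(AND(1, 1), OR(0, 1), NOT(1))  # → 1 1 0
```

Since the three functions are functionally complete, any Boolean circuit can in principle be assembled from such units.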

The Perceptron

The term “Perceptron” was coined by Dr. Frank Rosenblatt [1. Rosenblatt F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Washington DC: Spartan; 1962.], and defined by him as a network of input elements, output elements, computing elements, and a memory which affects only the flow of signals between the elements. Although many different neural network analogues have been examined under the rubric “perceptron”, a simple starting point consists of a number of linear logic elements (also known as “threshold neurons”) of the McCulloch-Pitts type, connected as in figure 2.


Figure 2. A Perceptron

In this perceptron, if the synapse weights and threshold values of the association units are adjusted so that each acts as an AND unit, then the network can compute any Boolean function expressed in the disjunctive normal form [1. Hilbert D, Ackermann W. Principles of mathematical logic. American Mathematical Soc.; 1950.].
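As a concrete sketch (in Python, with illustrative names): XOR, which no single threshold unit can compute, has the disjunctive normal form (a AND NOT b) OR (NOT a AND b), and is therefore within reach of this two-layer arrangement of AND-like association units feeding an OR-like response unit:

```python
def threshold_unit(inputs, weights, theta):
    """A McCulloch-Pitts linear logic element."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= theta else 0

def xor(a, b):
    # Association layer: each unit computes one conjunct of the DNF.
    u1 = threshold_unit([a, b], [1, -1], 1)   # a AND (NOT b)
    u2 = threshold_unit([a, b], [-1, 1], 1)   # (NOT a) AND b
    # Response unit: OR over the association units.
    return threshold_unit([u1, u2], [1, 1], 1)

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # → [0, 1, 1, 0]
```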

In their book “Perceptrons”, Minsky and Papert (1969) provided mathematical proofs of the severe limitations of the simple perceptron as a pattern recognition device. For example, it is unable to calculate the property of “connectedness”, which can be easily performed by serial computation methods.

If a mechanism is provided for modifying the synaptic weights in a perceptron, in accordance with a training paradigm, the machine can be made to learn. Minsky and Papert [1. Minsky ML, Papert S. Perceptrons; an introduction to computational geometry. Cambridge, Mass.: MIT Press; 1969:258.] provide a proof for a “perceptron convergence theorem”, demonstrating that such learning can occur in a finite number of steps.

The learning process can be conceptualised as follows: a set of patterns is presented to the perceptron, one at a time. If the perceptron output is correct, a positive reinforcement signal provided by the machine’s operator is used by the machine to modify its internal state, ie to increase the synaptic weights for all synapses between response units and association units which were simultaneously active, ie firing, when the pattern was presented. In the absence of positive reinforcement, simultaneous firing of any association and response units would cause the synapses between them to have their weights decreased. After a sufficient number of repetitions, the perceptron can be said to have “learned” the patterns, in that the presentation of a pattern causes the correct output to appear with a low error rate.
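The classical error-correction form of this training procedure (Rosenblatt's rule, a close and provably convergent relative of the reinforcement scheme just described: active synapses are strengthened or weakened according to whether the output was too low or too high) can be sketched as follows; the training set, learning rate, and threshold are all invented for illustration:

```python
def predict(x, w, theta):
    """Threshold unit output for input vector x."""
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) >= theta else 0

# A small, linearly separable training set: target is 1 iff the first input is 1.
patterns = [((1, 0, 1), 1), ((0, 1, 1), 0), ((1, 1, 0), 1), ((0, 0, 1), 0)]
w, theta, lr = [0.0, 0.0, 0.0], 0.5, 0.1

for epoch in range(50):                      # repeated presentations of the set
    for x, target in patterns:
        err = target - predict(x, w, theta)  # +1, 0, or -1
        for i in range(len(w)):
            w[i] += lr * err * x[i]          # strengthen or weaken the active synapses

# After training, every pattern is classified correctly.
assert all(predict(x, w, theta) == t for x, t in patterns)
```

For a separable pattern set such as this one, the convergence theorem guarantees that the loop settles on correct weights in a finite number of corrections.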

Extensions of the Perceptron

In recognition of the severe limitations of the simple perceptron described above, a number of modifications and additions have been proposed. For example, Stafford and colleagues [1. Hawkins JK, Munsey CJ, Stafford RA. Research on Biax Type Elements and Associated Circuits (Biax Perceptron). DTIC Document; 1963] utilised learning in both layers of a simple perceptron, but no convergence proof was provided.

These simple perceptrons deal only with spatial patterns; to incorporate temporal processing capabilities, various properties of biological neurons, such as signal delay, temporal summation, and a refractory period, can be added, along with closed loops (feedback), as part of the network configuration.

Block et al [1. Block HD, Knight Jr BW, Rosenblatt F. Analysis of a four-layer series-coupled perceptron. II. Reviews of Modern Physics. 1962;34:135.] suggested a four-layer perceptron which provided some generality in recognition of spatial patterns, ie the ability to recognise similarities between patterns, independent (to a degree) of size, translation, or rotation transformations.

Of these extensions, possibly the most important relate to the capability of processing temporal sequences. The limitations of simple perceptrons explored by Minsky and Papert, such as recognition of the properties “connectedness” or “parity”, can be dealt with by a serial decision-making process.

Lateral Inhibition

For a neural network such as the perceptron, the concept of stability is important. Unless specific mechanisms are provided to meet this goal, then the network’s level of excitation would be very sensitive to the learning rule applied at the level of the synapse. For example, if the average synaptic weights for a neuron were to increase, while the threshold of excitation remained unchanged, then the rate of firing of that neuron would also increase.

A number of mechanisms can be used to achieve stability. At the level of individual neurons, one could apply feedback to maintain a constant average synaptic weight. Or again, feedback applied to a group of neurons (such as a perceptron) could provide a constant level of excitation for the ensemble.

It appears that mechanisms which operate at the level of the individual neuron would severely limit information capacity. It can be shown, however, that when stability is enforced by holding the number of firing neurons in a group constant, the group’s information capacity is maximised if half the neurons are firing at any given time.

A useful mechanism for improving the quality of images for pattern recognition via feature detection is contrast enhancement achieved by lateral inhibition. The cortical neuron recording studies of Hubel and Wiesel suggest that this mechanism is operative in mammalian visual cortex. It could be included in a neural network model by providing an inhibitory feedback from each neuron in an array to its nearest neighbours, possibly with the amount of inhibition related inversely by some rule to the distance between the two neurons. Such inhibition would also provide the overall stability in level of excitation as discussed above.

Two types of inhibition mechanism can be used: either feedback or feedforward inhibition. Feedback or “recurrent” inhibition is widely used in the cortex and in subcortical structures, while well-documented examples of forward inhibition can be found in the cerebellum. Depending on the network parameters chosen, either type can provide a wide range of operating modes. For example, in the network shown in figure 3, one form of recurrent inhibition is used; one can adjust the amount of feedback so that only one neuron in each group can fire, or so that on average half the total population in each group of neurons is active. This latter condition would be met if the firing frequency of each inhibitory interneuron were adjusted to be proportional to the number of active inputs to it.


Figure 3. Recurrent or Feedback Inhibition

For the first case, a network with N input neurons in each group has N possible output patterns, whereas in the second case, the number of possible output patterns for each group of N neurons is equal to the number of ways N/2 active neurons can be distributed over N places, or

C(N, N/2) = N! / [(N/2)! × (N/2)!]

Since this binomial coefficient grows exponentially with N, the second case provides for a much larger information content.
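The two regimes can be compared with a short sketch; the excitation values are arbitrary, and the inhibitory feedback is idealised as selecting exactly the k most-excited neurons in the group:

```python
from math import comb

def winners(excitation, k):
    """Recurrent inhibition tuned so that exactly the k most-excited neurons fire."""
    ranked = sorted(range(len(excitation)), key=excitation.__getitem__, reverse=True)
    firing = set(ranked[:k])
    return [1 if i in firing else 0 for i in range(len(excitation))]

N = 8
excitation = [0.9, 0.1, 0.7, 0.4, 0.8, 0.2, 0.6, 0.3]

print(winners(excitation, 1))       # one neuron per group active
print(winners(excitation, N // 2))  # half the population active
print(N, comb(N, N // 2))           # 8 possible patterns vs 70
```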

The second type of lateral inhibition is provided by “feedforward inhibition”, illustrated in figure 4.


Figure 4. Feedforward Inhibition

If we assume, in a system with feedforward inhibition, that the amount of inhibition is proportional to the total input activity, then only those neurons which have a selective preference for certain input patterns will ever be able to fire. For a system in which synaptic weights are assumed equal prior to learning, no neurons could ever fire unless there were a supplementary system of “forcing” or “facilitating” inputs which could force neuron firing, and thus enable learning to take place. This may be the situation in the cerebellum, in which basket cells appear to be arranged so as to “force” Purkinje cells to fire.

In line with the terminology introduced by Sommerhoff [1. Sommerhoff G. Logic of the Living Brain. John Wiley & Sons Ltd; 1974:424pp.], I shall call each group of neurons which share an inhibitory interneuron a “Lambda” module, with a symbolic representation as in figure 5.


Figure 5. Lambda Module

Here, the single “output” line represents a group of axons; the horizontal input lines represent the bundle of inputs (axons from other groups or systems of neurons) which provide synaptic connections to each of the neurons of the Lambda module; the two vertical lines at the base of the triangle represent the dendrites of all the neurons in the module. The blackened bar on the left of the triangle serves as a reminder that the symbol stands for a group of neurons related by having a common inhibitory interneuron, either recurrent or feedforward.

 

Classical Conditioning

In the discussion on lateral inhibition, the concept of “forcing” or “facilitating” inputs was introduced. The paradigm of classical conditioning may be taken as a psychological analogue for this (presumed) type of neural network interconnection, shown in figure 6.


Figure 6. Facilitating Inputs

In this figure, each neuron in each lambda module has two types of inputs:

  1. contextual or pattern inputs (ie the normal type of dendritic inputs), in which learning takes place via modification of synaptic weights, and
  2. facilitating inputs (shown as vertical lines with arrowheads at the base of the lambda symbol in figure 6).

These facilitating inputs, depending on the parameters chosen, can either cause the firing of the neurons involved with absolutely no activity required at the contextual inputs, in which case the facilitating inputs function as “forcing” inputs; or they can act simply as facilitators for the contextual inputs, thereby serving as “biasing” inputs.

A physiological correlate of such inputs which can have an overriding effect on their neuron, might be axons which synapse directly on cell somas, or perhaps non-synaptic junctions.

Such a system can undergo learning by the Pavlovian or classical conditioning paradigm. Using Pavlov’s example of the dog trained to salivate at the sound of a bell, the perception of food, ie the unconditioned stimulus, can serve as the facilitating input, adjusted as a forcing input so that this stimulus by itself is sufficient to produce the unconditioned response (output) of salivation. If a conditioned stimulus (the bell) is temporally paired with the food stimulus, the conditioned stimulus appears as a contextual input at the same time as a particular pattern of neurons is active (because of the forcing inputs) and producing the salivation response; the synaptic weights linking the contextual input pattern representing the sound of the bell to the output pattern corresponding to salivation will therefore be strengthened. If this process is repeated over a number of trials, the weights eventually become strong enough that the forcing input need no longer be present to produce the output. Classical conditioning can be said to have occurred, by means of a process called “adequate stimulus substitution” by Sommerhoff [1. Ibid.].
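A minimal single-neuron sketch of adequate stimulus substitution; the threshold, learning increment, and names are all illustrative:

```python
theta = 1.0   # firing threshold
w_cs = 0.0    # synaptic weight of the contextual (bell) input
lr = 0.25     # weight increment per paired trial

def fires(cs, forcing):
    """A forcing input overrides; otherwise the weighted contextual input decides."""
    return forcing or (cs * w_cs >= theta)

# Before conditioning: the bell alone produces no response.
assert not fires(cs=1, forcing=False)

# Paired trials: bell (CS) presented together with food (US, as a forcing input).
for trial in range(5):
    if fires(cs=1, forcing=True):   # the neuron fires while the CS is present
        w_cs += lr                  # Hebbian strengthening of the co-active synapse

# After conditioning: the bell alone now produces salivation.
assert fires(cs=1, forcing=False)
```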

Serial Processing

The neural networks considered so far are capable of recognising spatial patterns, such as visual patterns impinging on the retina. However, much of what takes place in the brain is related to temporal patterns, not only such things as processing of speech or music, but all types of behaviour and communication, including all motor activities, in which timing and sequence are intrinsic.

By the addition of associative feedback to lambda systems, such systems can be made to function as temporal pattern recognition machines. Furthermore, once temporal patterns have been learnt by such a system via synaptic modification, the system can be made to function as an associative memory, that is a subset of the original input can serve as a trigger to cause the same output (which could itself be a temporal sequence) to be generated. Such a system is shown in figure 7.


Figure 7. Associative Feedback

In this system, physiological realities such as temporal summation and delay become necessary to its functioning. The synaptic excitation received by each lambda neuron depends not only on the sensory input but also on which lambda neurons fired most recently, ie not only on the present spatial pattern but also on the particular spatial pattern of inputs and active neurons occurring just prior to the present pattern. This, in turn, depends on the pattern previous to it. Thus the output pattern is dependent on the temporal sequence of input patterns.
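A toy sketch of such associative feedback, with one-hot patterns, a one-step synaptic delay, and a Hebbian rule on the feedback synapses (the names and the stored sequence are invented for illustration):

```python
N = 4                                # model neurons / distinct one-hot patterns
W = [[0.0] * N for _ in range(N)]    # feedback synapse weights

sequence = [0, 2, 1, 3, 0]           # temporal pattern to store (active-neuron indices)

# Learning: strengthen the synapse from the previously active neuron to the
# currently active one (temporal pairing made possible by the synaptic delay).
for prev, curr in zip(sequence, sequence[1:]):
    W[prev][curr] += 1.0

# Recall: present only the first element; feedback regenerates the rest.
state = sequence[0]
recalled = [state]
for _ in range(len(sequence) - 1):
    state = max(range(N), key=lambda j: W[state][j])  # most-excited neuron fires
    recalled.append(state)

print(recalled)  # → [0, 2, 1, 3, 0]
```

Presenting any element of the stored sequence as a cue replays its continuation, which is the associative-memory behaviour described above.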

Operant Conditioning

The stimulus-response theory of learning as propounded by Skinner and others has received a great deal of experimental support. One can conceive of a number of ways in which a neural network could apply reinforcement of successful responses to enhance learning (Sommerhoff, 1974):

  1. Responses are synaptically facilitated only when there are indications of positive outcomes. Responses are extinguished through other interfering reactions;
  2. Responses are synaptically facilitated simply because they occur, and are extinguished as in (1);
  3. Responses are synaptically facilitated except when there are indications of negative outcomes. They are extinguished as in (1).

I am not aware of any experimental evidence to suggest which one of these mechanisms is operative in mammalian nervous systems. It may be that all are operative, if not in the same parts of the nervous system, then perhaps in different structures.
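Mechanism (1) above can be sketched as a Hebbian weight change gated by a global affect signal; this is a hypothetical rule, and the names and values are purely illustrative:

```python
def update_weight(w, pre_active, post_active, affect, lr=0.1):
    """Hebbian change gated by a global reinforcement ("affect") signal."""
    if pre_active and post_active and affect > 0:
        return w + lr   # facilitate the response only on positive outcomes
    return w            # otherwise leave the synapse unchanged

w = 0.5
w = update_weight(w, pre_active=True, post_active=True, affect=+1)  # facilitated
w = update_weight(w, pre_active=True, post_active=True, affect=0)   # unchanged
print(w)
```

Mechanisms (2) and (3) would differ only in the gating condition: facilitation on every co-active trial, or on every trial without a negative affect signal.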

The Principle of Reafference

The principle of extracting information about the environment through self-induced movements is known as the principle of reafference [1. Holst E, Mittelstaedt H. Das reafferenzprinzip. Naturwissenschaften. 1950;37:464-476.]. For example, an infant learns about the hardness and texture of a wooden block by attempting to bite it. We can extend this concept to hypothesise about how the brain might store information about the properties of objects, such as hardness or inertia. Although we might define “hardness” by making reference to classes of objects which we know to be hard or not hard, an infant has no such referents. It could, instead, operationally store the property of hardness as the sensory reaction it received (from pressure sensors in its gums, muscle spindles in its jaw muscles, etc.) when it bit the block with a given amount of force. In this case, the internal representation of a property needs no prior referents, and furthermore, it can be generalised to other objects as the infant’s experience widens.

In the same way, a visual representation of an object might be stored, not as an “image”, but as the set of eye muscle movements required to, for example, follow the object’s contours.

With reference to the neural network with associative feedback described earlier, one could imagine that after sufficient feedback learning, a particular sequence of eye muscle movements specifying an object’s shape might be recalled after only a small number of such movements are made, giving essentially instantaneous recognition of the object.

It is possible that we continue to learn about our environment by formulating expectations (usually unconsciously) about the sensory inputs that we might expect in response to our motor behaviour. A response which is in line with our expectations confirms our expectation of the environment, and thus is learned as being a property or characteristic of that environment. A response which does not meet our expectations would result, not in storing the new sensory information, but in generating alternative formulations which better explain the data. For example, when walking down the stairs, upon reaching the bottom, one expects that the downward movement of the foot will be checked by the presence of the floor. When this happens our knowledge of the environment is reinforced. However, if it is dark and we miscalculate, our foot does not meet the floor, and we become anxious. Our response now might be to feel out cautiously with our foot for the next step.

How does the reafference principle apply to the neural network model? I suggest that it has to do with the “facilitating inputs” discussed in a previous section. The expectation of the sensory input which may result from a self-initiated activity, whether actually performed or only hypothesised, is information which might appear at these facilitating inputs, and biases the relevant neurons towards a recognition of the expected sensory input. When the actual sensory input does appear, on the contextual inputs this time, if it corresponds to the expected input, it will be readily recognised as such, even if distorted or modified or otherwise different from the input pattern which was initially learned.
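One way to make this concrete is a toy recogniser in which a facilitating input adds a fixed bias to the score of the expected pattern, so that a distorted input which would otherwise fall below the recognition threshold is still recognised. The patterns, bias, and threshold here are invented for illustration:

```python
def match(actual, stored):
    """Fraction of positions where the actual input agrees with a stored pattern."""
    return sum(a == s for a, s in zip(actual, stored)) / len(stored)

def recognise(actual, stored_patterns, expected=None, bias=0.2, theta=0.75):
    scores = {name: match(actual, p) for name, p in stored_patterns.items()}
    if expected in scores:
        scores[expected] += bias      # facilitating input biases the expected pattern
    best = max(scores, key=scores.get)
    return best if scores[best] >= theta else None

stored = {"cookie": [1, 1, 0, 1, 0, 1], "car": [0, 0, 1, 1, 1, 0]}
noisy = [1, 0, 0, 1, 0, 0]            # distorted "cookie" (two bits flipped)

print(recognise(noisy, stored))                     # → None (too distorted)
print(recognise(noisy, stored, expected="cookie"))  # → cookie (biased recognition)
```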

For example, an infant learning to speak may, upon pointing to a cookie, eventually come to recognise the sound pattern for “cookie” when spoken by its mother. The initial process might be as follows:

  1. from previous experiences, the infant has learned that the act of pointing to an object during play with mother may stimulate her to utter a sound which for the baby becomes associated with the object;
  2. when pointing to a new object (ie one that has not been “named”) such as the cookie, there will appear at the facilitating inputs of the auditory processing neural network a pattern which is related to the properties of cookies, whether they be taste, texture, shape, colour, smell, location, etc., or a combination of these; the pattern represents the class “cookie” and is the unconditioned stimulus; it causes a neural network output to appear (the unconditioned response) which is symbolic for “cookie”;
  3. pointing to the cookie stimulates mother to speak the word, whose sound pattern then appears at the contextual inputs. This sound pattern is “learned” by the process of “adequate stimulus substitution”, ie it is the conditioned stimulus, and, by repetition which brings about modification of synaptic weights, it comes to substitute for the unconditioned stimulus which appears at the same time at the facilitating inputs;
  4. once learned, the sound pattern of mother saying “cookie” will be sufficient to bring about the same output (ie the response) which was initially brought about by only the facilitating inputs (as in (2) above). We can assume, however, that this facilitating input no longer behaves as a “forcing” input; rather, it operates as a “biasing” input, and thus makes the “cookie” output more likely to occur. Thus, if by context (which can be stimulated by other modalities such as vision, smell, etc.) the word “cookie” is likely to be heard, then the auditory processing network will be biased via the facilitating inputs towards recognising this word, and this permits it to be recognised even when spoken in a noisy environment, by speakers other than mother, or with different voice inflections or rates of speaking.
  5. perhaps the most important biasing information comes to the speech recognition network via “long loop” feedback, which may involve speech generation networks. For example, if mother addresses the child in a questioning way (“would you like a …”), the number of possibilities to complete the question is relatively circumscribed, and it is possible that multiple patterns appear at the facilitating inputs to accommodate the various possibilities. The number of possibilities will, of course, be reduced if the child is hungry, if mother is holding an object in her hand, etc.

Although multiple patterns being presented at the facilitating inputs could be conceived of as being a serial process, it is more likely that multiple speech recognition networks are involved, in a parallel process.

Comparison to Other Experimental Work

Psychological and Physiological Research

A computer model of a given neural network paradigm may, of course, be tested without reference to biological systems. However, if we assume that evolution has come up with effective and efficient mechanisms for pattern recognition, then it makes sense to compare results obtained with network models against the vast body of experimental data in brain and nervous system psychology and physiology: for example, ganglionic circuitry studied by monitoring individual neurons, as in Aplysia (Kandel [1. Kandel ER, Schwartz JH, Jessell TM. Principles of neural science. Appleton & Lange, New York. 1981.]); sensory cortical functioning in animals such as cats, again examined with micro-electrodes (Hubel & Wiesel); or learning and perception experiments carried out in the psychology laboratory using both animal and human subjects.

This process of verifying the computer model by attempting to replicate experimental results obtained with live material is complicated by the fact that pattern recognition is a component of two different paradigms under study: learning and memory. However, by making the assumption that learning is necessary for memory and vice versa, the two terms can be used almost interchangeably. Thus, the extremely vast experimental literature in both these fields can be tapped for experimental paradigms amenable to study using a computer model in place of a living neural network.

One example is the sensitisation and habituation processes occurring in the gill-withdrawal reflex of the marine snail Aplysia. If the computer model were configured as a single-layer network using feedforward inhibition, with “forcing” inputs to simulate the sensitisation of the reflex caused by a noxious stimulus to the head, and a synaptic modification rule chosen to produce habituation, then the model could be used as a test bed to explore which parameter values produce responses over time similar to those obtained in the live preparation (Kandel, 1981).
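
The habituation and sensitisation dynamics described above can be caricatured in a few lines. This is a minimal sketch with illustrative parameter values, not ones fitted to the Aplysia data: repeated stimulation depresses a single synaptic weight, while a “forcing” (noxious) input partially restores it.

```python
# Toy model of habituation and sensitisation in a gill-withdrawal reflex.
# The decay rate and sensitisation gain are illustrative assumptions, not
# parameters taken from the Aplysia literature.

def simulate(touches, noxious_at=None, w0=1.0, habit=0.8, sensit=1.5):
    """Return reflex amplitudes for a series of siphon touches."""
    w = w0
    responses = []
    for t in range(touches):
        if t == noxious_at:
            w = min(w0, w * sensit)  # "forcing" input partly restores the reflex
        responses.append(w)          # reflex amplitude tracks synaptic weight
        w *= habit                   # repeated stimulation depresses the synapse
    return responses
```

Sweeping values such as `habit` and `sensit` is the kind of parameter exploration envisaged for matching the live preparation.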

A learning task which has previously been used to draw conclusions about possible memory models is the item recognition task (Cavanagh [1. Cavanagh P. Holographic and trace strength models of rehearsal effects in the item recognition task. Mem Cognit. 1976;4:186-199.]).

Other Neural Network Research

A three-layer model, which permits synaptic modification in the topmost layer controlled by an external instructor, has been described (Reilly et al [1. Reilly DL, Cooper LN, Elbaum C. A neural model for category learning. Biol Cybern. 1982;45:35-41.]). This model uses a scaling factor which can be varied for each cell. It is hypothesised that such a multiplicative scaling factor might be the function performed in real neurons by inhibitory synapses located on or near the cell body, which may have a divisive or shunting effect.
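
The hypothesised multiplicative scaling factor can be sketched as follows; the divisive effect of shunting inhibition is reduced here to a single per-cell multiplier, and all names and values are illustrative assumptions.

```python
# Sketch of a cell whose output is the weighted sum of its excitatory
# inputs, multiplied by a per-cell scaling factor modelling the shunting
# (divisive) effect of inhibitory synapses near the cell body.

def cell_output(inputs, weights, scale=1.0):
    excitation = sum(x * w for x, w in zip(inputs, weights))
    return excitation * scale  # scale < 1 corresponds to shunting inhibition

full = cell_output([1.0, 0.5], [0.6, 0.4])            # no inhibition
shunted = cell_output([1.0, 0.5], [0.6, 0.4], 0.25)   # strong inhibition
```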

Barto et al (1982 [1. Barto AG, Anderson CW, Sutton RS. Synthesis of nonlinear control surfaces by a layered associative search network. Biol Cybern. 1982;43:175-185.]) describe a two-layer network which can perform a sensorimotor task for non-linear control problems. Instead of a simple binary “teacher”, the supervised learning uses an “attractant” (a real value which is available during learning) whose increase indicates that the correct motor activity took place. The “attractant” value is used in adjusting synaptic weights for both layers. Their network is based on an associative search network (ASN) (Barto et al, 1981 [1. Barto AG, Sutton RS. Landmark learning: an illustration of associative search. Biol Cybern. 1981;42:1-8.]), consisting of neuron-like elements in which synaptic weights are modified according to a “payoff” value provided as a reinforcement signal by the environment. The ASN acts to maximise the payoff, and thus “searches” for the correct output pattern corresponding to a given input or context vector. They also describe use of a “predictor” element which uses the context vector as input and biases all the other elements in the ASN. This particular model explicitly provides for classical conditioning via the “predictor” as well as operant conditioning using the “payoff” function.
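
A minimal element in the spirit of the associative search network might look like this; the action rule, payoff function, and constants are simplifying assumptions rather than Barto et al’s actual formulation.

```python
import random

# Sketch of an associative-search element: a noisy unit "searches" by
# emitting exploratory actions, and a scalar payoff from the environment
# gates a Hebbian-style weight update. Learning rate, noise level, and
# the payoff rule are illustrative assumptions.

def asn_train(context, target, trials=100, lr=0.3, noise=0.3, seed=1):
    rng = random.Random(seed)
    w = [0.0] * len(context)
    for _ in range(trials):
        s = sum(wi * xi for wi, xi in zip(w, context)) + rng.gauss(0.0, noise)
        y = 1 if s > 0 else -1                # exploratory action
        payoff = 1.0 if y == target else 0.0  # environment rewards success
        w = [wi + lr * payoff * y * xi for wi, xi in zip(w, context)]
    return w
```

Because the payoff is zero for incorrect actions, only rewarded actions leave a trace, so the weights drift toward the output pattern that maximises payoff for a given context vector.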

Bobrowski [1. Bobrowski L. Rules of forming receptive fields of formal neurons during unsupervised learning processes. Biological Cybernetics. 1982;43:23-28.] provides a mathematical analysis of learning processes in two-layer networks consisting of formal neurons, and demonstrates that, for the case of unsupervised learning (unsupervised in this instance refers to the absence of a “teacher’s decision” which determines the proper response of a given formal neuron to the input signal), convergence will occur after a finite number of steps in the learning sequence. The network can be made to function as a “high-pass filter”, ie a filter that passes signals which occur most frequently in the learning sequence, or as a “detector of rareness”, ie a filter which passes only signals which rarely occur. These two approaches result in convergence to different types of receptive fields, excitatory in the first case, and inhibitory in the second.
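
Bobrowski’s two filter types can be caricatured by reducing receptive-field formation to simple frequency counts over the learning sequence; the threshold and patterns below are illustrative assumptions.

```python
from collections import Counter

# Caricature of the two filter types: after unsupervised exposure to a
# learning sequence, a "high-pass" unit passes frequently seen signals,
# while a "detector of rareness" passes only infrequent ones.

def high_pass(pattern, counts, threshold):
    return counts[pattern] >= threshold       # passes frequent signals

def rareness(pattern, counts, threshold):
    return 0 < counts[pattern] < threshold    # passes rare signals

counts = Counter(["A", "A", "A", "B", "A", "C", "A"])
```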

Willwacher [1. Willwacher G. Storage of a temporal pattern sequence in a network. Biological Cybernetics. 1982;43:115-126.] presents results of a computer simulation of a single-layer network with associative feedback as well as feedback inhibition. He used a network with 100 elements, but each of the 100 inputs (representing a spatial, ie visual, pattern) was connected to only one of the control elements. The simulation is essentially analogue, and treats the elements as having transfer functions which can be expressed as electrical signals. With judicious choice of the amount of feedback inhibition, his model was able, after training, to correctly reproduce a temporal sequence of eight patterns after only two of the patterns were presented as input.
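
The core mechanism, sequence replay through associative feedback, can be sketched with a Hebbian hetero-associative matrix linking each pattern to its successor. One-hot patterns are used here for clarity; Willwacher’s network used distributed analogue patterns with feedback inhibition.

```python
# Store a temporal pattern sequence by Hebbian association of successive
# pairs, then replay it by feeding each output back as the next input.

def train(patterns):
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for prev, nxt in zip(patterns, patterns[1:]):
        for i in range(n):
            for j in range(n):
                w[i][j] += nxt[i] * prev[j]   # associate pattern with successor
    return w

def step(w, x):
    s = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    return [1.0 if v > 0 else 0.0 for v in s]

def replay(w, start, steps):
    seq, x = [start], start
    for _ in range(steps):
        x = step(w, x)        # associative feedback drives the next pattern
        seq.append(x)
    return seq
```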

Fuhrmann [1. Fuhrmann G. Modelling the visual cortex with “Modulo system” concept. Biol Cybern. 1981;40:39-48.] described a model for pattern recognition in the visual cortex, consisting of six functional layers of elements which individually are referred to as “neuron modules”. Each neuron module is made up of a group of “neurons” which produce a residue number system. Interneuronal signals consist of trains of impulses in close sequence, where only the number of impulses conveys information.

Summary of Novel Characteristics of the Proposed Network

Characteristics of the proposed model which have not been applied elsewhere (at least not all together):

  1. facilitating inputs (classical conditioning via adequate stimulus substitution)
  2. complementary coding for all inputs to avoid problems of instability
  3. recurrent inhibition with a limited (and adjustable) sphere of influence
  4. the application of the reafference principle to provide facilitating inputs to enhance learning
  5. “wire-labelled” or spatial coding of inputs and output patterns, instead of individual neuron-like elements representing specific properties or features, as in Barto et al (1982) or Feldman [1. Feldman AG, Latash ML. Afferent and efferent components of joint position sense; interpretation of kinaesthetic illusion. Biol Cybern. 1982;42:205-214.].

Neural Networks in Auditory Pattern Recognition

A typical application of the neural network paradigm described above is in recognition of auditory patterns, for example:

  1. Prosody. Prosody is the inflection in speech related to affect (mood). It is affected by a number of conditions, including right hemisphere brain damage, mental illness such as schizophrenia or manic-depressive illness, or treatment with antipsychotic medication. In principle, a neural network auditory pattern recogniser could be taught to differentiate between these conditions for the purpose of diagnosis, or be used to monitor the effectiveness of treatment.
  2. “Voiceprint” applications, to identify individuals through the peculiarities of their voice, eg for security purposes.
  3. Speech recognition. A neural network-based system would be able to recognise continuous speech, from a variety of speakers, and be able to function even in relatively noisy environments.
  4. Detection and location of moving sound sources, for collision avoidance (eg vehicles, or visually impaired individuals).
  5. Recognition of sound or vibration patterns generated by machinery, eg differentiating between different failure modes of bearings.

Applications such as these would make use of the following paradigms described above:

  1. use of associative feedback to provide for processing of temporal patterns, in this case auditory speech patterns;
  2. a synaptic modification rule in which synaptic weights are increased for synapses in which both participating neurons are concurrently active (a Hebbian rule);
  3. use of a “teacher” (ie supervised learning); that is, application of operant conditioning in which a correct response brings about reinforcement which is implemented as an increase in the amount of change that individual synaptic weights undergo;
  4. an implementation of classical conditioning.
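
Paradigms 2 through 4 can be tied together in a single sketch: a Hebbian weight change applied on coincident pre- and postsynaptic activity, scaled by a reinforcement signal, with a facilitating input lowering the postsynaptic threshold. All constants and names are illustrative assumptions.

```python
# Hebbian update gated by coincidence and scaled by reinforcement
# (paradigms 2 and 3), plus a facilitating input that lowers the firing
# threshold (paradigm 4, classical conditioning).

def update_weight(w, pre, post, reinforcement, lr=0.1):
    if pre and post:                 # Hebbian coincidence condition
        w += lr * reinforcement      # reward scales the size of the change
    return w

def fires(drive, threshold=1.0, facilitation=0.0):
    return drive >= threshold - facilitation   # facilitation eases firing
```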

Conclusions

Neural networks have broad applicability in pattern recognition and associative memory applications. Utilising principles derived from our understanding of the physiology and psychology of brain functioning, such networks can be made even more useful by extending their range of applications to include recognition and recall of temporal (serial) patterns, such as auditory patterns.

 
