TRANSLATION OF VISUAL INFORMATION INTO AUDITORY CODES
In: : Research Bulletin, 28, 19-56.

Bjarne Fjeldsenden, Dept. of Psychology, NTNU, 7491 Trondheim, NORWAY








Introduction

Most of this article was written during the period 1970-1972, and is composed for the most part of extracts from the theoretical part of a larger piece of work whose focus is on transformation of information from one modality to another. The points of view are basically long range and detached from immediate practical considerations; so despite new research, the fundamental ideas of this essay should not require much change.

If one takes the point of view that a blind person should navigate in the physical environment as skillfully as a normal person with unimpaired vision does, we are still very far from reaching this aim. Devices such as the Ultrasonic Torch and, later, the Sonic Glasses of Kay, and the projection of TV pictures onto the skin, either mechanically or electrically, as done by Bach-y-Rita, bring one closer; but it is, I think, overoptimistic and oversimplified to believe that these devices can perform anywhere near as well as the eye can. It seems appropriate to mention the enthusiasm in the Fifties over translation of one language into another via computer. Today one admits that the complexity of the task was underestimated. I think the Kay and Bach-y-Rita devices are very important and necessary steps in the right direction, but they are the beginning, like the first generation computers, not the ultimate answer to blind mobility. I think one should be very careful not to freeze the development at a certain stage by committing oneself too fully to a certain device. First, new and advanced technology opens up new avenues all the time, and second, blind people differ enormously in their aptitude for learning mobility and other skills, so a device may suit one person but not another.

At times it can be useful to stand back and try to see the forest.
 
 

Part 1

General Theoretical Considerations

In the discussion that follows, it will hopefully become quite clear that the human auditory system is far better equipped to handle certain classes of auditory material than others. It should be kept in mind that this paper attempts mainly to answer the question of the extent to which the auditory system in man can replace the visual system. All the experiments are concerned with "visual" information presented in an auditory code, but since the skin can also convey information about the "visual world" in a consistent manner, it will be discussed briefly and compared with the auditory system. Various perceptual problems will be discussed from an information-processing point of view, and the concepts and notions, more than the strict methods, of information theory will be used.
 
 

MODES OF PERCEPTION IN ANIMALS

One tends to think of perception as either a visual or an auditory process. It seems more appropriate to think of perception as the collecting and processing of data that enhance the organism's chance of survival. What is of paramount importance is not the form of the information but its relevance and accuracy, together with the organism's capacity to make an adequate response to it.

Examples will be given of animals that get information about the environment in a rather unusual form, illustrating that there are unusual ways by which information may be obtained. The visual world may be the "real" world for man, but not necessarily for certain animals.

Different types of bats gather information from the environment by emitting ultrasonic sound, at frequencies that mostly range from 20 kHz to 120 kHz, in pulses of 3- to 50-millisecond duration. The duration of a pulse varies from one type of bat to another. Some bats can send up to 200 pulses per second, and the emitted sound is reflected by objects and received back by the bat. In other words, they utilize the radar principle. The accuracy with which bats sense insects, and moths in particular, has been observed experimentally. The bats seemed capable of discriminating among different types of moths, and the bat's reaction when a moth changes direction is also very fast, indicating a very good or direct connection between the perceptual apparatus and the muscles controlling this particular type of behavior (Pye, 1963; Webster, 1963, 1966). One may say that the data processing and execution were highly efficient. But it was also found (Webster, 1963, pp. 59-76) that moths could detect sound waves ranging from 3 to 150 kHz and that, when hearing the bats' signals, they started to fly in an erratic or random way, making it more difficult for the bats to catch them. It may be mentioned that bats do have eyes, but they make little or no use of vision. Fifty meters seems to be the maximum range for the bats' echolocating system (Pye, 1963; Webster, 1963, 1966).
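The timing constraints implied by these figures can be sketched with a little arithmetic. The speed of sound used below (343 m/s, air at roughly 20 degrees C) and the simplifying rule that an echo should return before the next pulse is emitted are assumptions made for illustration; neither comes from the studies cited.

```python
# Rough echo-ranging arithmetic for the figures quoted in the text.
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed value)

def round_trip_time(distance_m: float) -> float:
    """Seconds for a pulse to reach a target and for its echo to return."""
    return 2.0 * distance_m / SPEED_OF_SOUND

def range_before_next_pulse(pulses_per_second: float) -> float:
    """Largest target distance whose echo returns before the next pulse
    is sent (a simplifying assumption, not a claim about real bats)."""
    return SPEED_OF_SOUND / pulses_per_second / 2.0

def wavelength_mm(frequency_hz: float) -> float:
    """Wavelength of the emitted sound in millimetres."""
    return SPEED_OF_SOUND / frequency_hz * 1000.0

print(round(round_trip_time(50.0), 3))          # 0.292 s at the 50 m maximum range
print(round(range_before_next_pulse(200), 2))   # 0.86 m at 200 pulses per second
print(round(wavelength_mm(20_000), 2))          # 17.15 mm at 20 kHz
print(round(wavelength_mm(120_000), 2))         # 2.86 mm at 120 kHz
```

Under these assumptions the highest pulse rates would only make sense at very short range, and the wavelengths are of the same order as the size of an insect.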

The rattlesnake has a highly sensitive infrared sensing mechanism which can respond to changes in temperature in its environment of one thousandth of a degree, sufficient to sense the presence of another animal (Gerardin, 1968, p. 95).

The torpedo fish sends out field producing currents of 100 Hz. Receptor neurons in the skin of the fish are highly sensitive to the pattern of electrical field strength resulting from the interaction of the emitted impulses with surrounding objects. Field changes of only $1 \times 10^{-6}$ V/ft$^2$ can be detected. In this way the fish can get vital information about surrounding objects and behave accordingly (Wooldridge, 1963, p. 57).
 
 

COMPARISON OF DIFFERENT SYSTEMS

One difference between the bat and the torpedo fish, on the one hand, and the rattlesnake and most other animals, on the other, is that the former group emits energy and produces a signal to get information from the environment, while the latter receives information without having to spend energy. The merits and demerits of the two modes of collecting information are interesting from various points of view.

An organism which produces its own signal is independent of energy radiation from the environment, and is consequently better suited for specific situations. The bat, for instance, operates mostly in darkness, and the torpedo fish would have a distinct advantage in murky water compared with fish dependent on visual information. One may also ponder whether it is not easier to decode emitted signals that interact with the environment when the same organism is both sender and receiver. One would think that, from the point of view of parsimony, the same structure in the brain connected with forming the signals would also be engaged in decoding them. There is evidence from research on humans that the brain areas associated with producing speech are also involved in decoding it (Luria, 1966, p. 99).

It seems compatible with common sense that a system relying on energy radiation from the environment has to be more complex in order to decode data than a system designed to extract from the environment certain relevant features that it selects itself. This is highly relevant to the building of devices that can recognize patterns. Should the computer be linked to photosensitive cells, which give indirect information, or should it be linked to devices that can give more direct information, such as distance data based on a type of radar principle? A general answer may not be possible; it may depend both on the task and on specific conditions. What applies to man-made systems may apply to living organisms. But while artificial systems are still in their infancy, the systems of living organisms have been in existence for millions of years. It appears that both computer people working with pattern recognition and people working in the area of perception may contribute to our understanding of how information is processed by humans and other animals.
 
 

INFORMATION THEORY AND PSYCHOLOGY

The ideas and formulas of information theory have their basis in communications language and were first formulated in 1948 by Wiener and Shannon. The two most important concepts are "information" and "bit." Attneave (1959, p. 1) defines information as ". . . that which removes or reduces uncertainty." Cherry's (1957, p. 306) definition is "The least number of binary digits (yes, no) required to encode some particular message (or alternatively to specify its selection from an alphabet)." Binary digit, or bit, is defined by Cherry (1957, p. 303) as ". . . a code which employs two distinguishable signs only (binary digits)." A bit can have the form of yes/no, on/off, black/white, etc. In its basic form, the amount of information can be expressed as $H = \log_2 A$, where $A$ is the number of alternatives and $H$ is the amount of information in bits. It can also be written:

$H = \log_2 (1/p_i)$, where $p_i$ stands for the probability that event $i$ will occur. One can, for example, consider language from an information theoretical point of view. If each word had a probability of occurrence of 1/1024, then each word would contain $\log_2 (1/(1/1024)) = \log_2 1024 = 10$ bits of information. Language is not like this. The probabilities of occurrence of different words differ, and the words in a sentence, where they usually occur, are interdependent. This makes the specific formula above invalid in this particular case, but the formula and the example illustrate basically what a bit is.
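The two forms of the formula can be checked numerically. The sketch below also shows what happens when probabilities are unequal, using the standard probability-weighted average (Shannon's entropy), which the text alludes to but does not write out. The four-word vocabulary and its probabilities are hypothetical, chosen only for illustration.

```python
import math

def bits_equiprobable(alternatives: int) -> float:
    """Information in bits when all alternatives are equally likely."""
    return math.log2(alternatives)

def surprisal(p: float) -> float:
    """Information in bits carried by one event of probability p."""
    return math.log2(1.0 / p)

def average_information(probabilities: list[float]) -> float:
    """Probability-weighted average of the surprisal (Shannon entropy)."""
    return sum(p * surprisal(p) for p in probabilities if p > 0)

print(bits_equiprobable(1024))   # 10.0
print(surprisal(1 / 1024))       # 10.0

# Hypothetical four-word vocabulary with unequal word probabilities.
skewed = [0.5, 0.25, 0.125, 0.125]
print(average_information(skewed))  # 1.75
print(bits_equiprobable(4))         # 2.0
```

With unequal probabilities the average information per word (1.75 bits here) falls below the equiprobable value (2 bits), which is one reason the simple formula overstates the information in real language.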
 
 

INFORMATION CAPACITY OF HUMANS

Perception can be considered as an information handling process, and one aspect of this is information transmission. Jacobson (1950, 1951) made an estimate of the information capacity of the eye and the ear. For the eye, he first calculated the number of acuity squares by assuming that the eye had a maximum acuity of 1/100 of a degree, at the same time taking into consideration that acuity decreases with increasing distance from the fovea. He based his calculations on Wertheim's data (Jacobson, 1951a, p. 293) and arrived at an estimate of 240,000 acuity squares. He further assumed that each square could be in an on or off position, that is, that it could discriminate between two states (e.g., black and white). In other words, each square could transmit one bit of information. He assumed also that the eye could react 18 times/second. This gave one eye an information capacity of $240{,}000 \times 18 = 4.32 \times 10^6$ bits/sec (Jacobson, 1951a). Color is not included in this figure. If it had been, the figure would of course have been higher. If one takes this a step further and assumes that each square, instead of being able to make only two discriminations, could make 1024 (that is, 10 bits per square), this would give $4.32 \times 10^7$ bits/sec.
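The estimate above is a product of three quantities, which the following sketch reproduces. The function name and the parameterization by number of discriminable levels are illustrative choices, not Jacobson's notation.

```python
import math

ACUITY_SQUARES = 240_000   # estimated acuity squares in one eye
REACTIONS_PER_SECOND = 18  # assumed number of times the eye can react per second

def eye_capacity(levels_per_square: int) -> float:
    """Bits per second if every acuity square independently signals one of
    `levels_per_square` equally likely states on each reaction."""
    bits_per_square = math.log2(levels_per_square)
    return ACUITY_SQUARES * bits_per_square * REACTIONS_PER_SECOND

print(eye_capacity(2))     # 4320000.0  (on/off squares)
print(eye_capacity(1024))  # 43200000.0 (1024 levels, 10 bits per square)
```

The tenfold increase in the second case follows directly from 1024 levels carrying 10 bits rather than 1.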

It is of interest to compare this figure with some physiological data. The optic nerve has approximately one million nerve fibers (Graham, 1965, p. 51), but one would strongly suspect that there are individual differences. The number of receptors in the eye is in the order of $75 \times 10^6$ to $150 \times 10^6$ (Graham, 1965, p. 47). According to Jacobson's (1951) estimate, each nerve fiber has an information carrying capacity of 4-5 bits/sec.

An estimate for the ear gives 0.3 bits/sec per nerve fiber for the cochlear nerve; the difference between the two nerves ". . . is clearly due to the greater independence with which the optic nerve signals are produced, in contrast to the prevalence of cooperative signals in the auditory bundles" (Jacobson, 1951).

Assuming that one eye has $130 \times 10^6$ receptors and each receptor is able to discriminate between two alternatives, the eye could have an information capacity of $130 \times 10^6 \times 18 = 2340 \times 10^6$ bits/sec. The difference between this figure and Jacobson's (1951a) is that Jacobson's estimate is made from a behavioral point of view, while this figure is calculated from a simplified physiological point of view. One reason for the discrepancy in figures is that the receptors in the eye are highly interdependent. One may get an indication of how the various receptors in the eye are interrelated by the following (from Graham, 1965, p. 51): ". . . it has been estimated that in the middle and far periphery approximately 100 rods converge on 17 diffuse bipolar, which in turn converge on a single ganglion cell (Vitter, 1949). Diffuse polysynaptic bipolar cells, as well as diffuse ganglion cells, make up a converging system which may provide the neural basis for both facilitation and inhibition."
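The size of the discrepancy between the receptor-based figure and the behavioral one can be made explicit. The sketch below uses only numbers quoted in the text; the variable names are illustrative.

```python
RECEPTORS = 130_000_000        # receptors assumed for one eye
OPTIC_NERVE_FIBERS = 1_000_000 # approximate fiber count of the optic nerve
REACTIONS_PER_SECOND = 18      # same reaction rate as in the behavioral estimate
BEHAVIORAL_ESTIMATE = 4_320_000  # bits/sec from the acuity-square calculation

# One bit per receptor per reaction: the simplified physiological figure.
receptor_bound = RECEPTORS * 1 * REACTIONS_PER_SECOND

# Behavioral capacity shared out over the optic nerve fibers.
bits_per_fiber = BEHAVIORAL_ESTIMATE / OPTIC_NERVE_FIBERS

# How many times larger the receptor-based figure is.
ratio = receptor_bound / BEHAVIORAL_ESTIMATE

print(receptor_bound)            # 2340000000
print(round(bits_per_fiber, 2))  # 4.32
print(round(ratio))              # 542
```

A ratio of several hundred is in keeping with the point made above that receptors are highly interdependent and converge on far fewer ganglion cells.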

Jacobson's (1950, 1951) procedure for the calculation of the information capacity of the ear was as follows: he first calculated how many frequency discriminations a human could make in the range of 30 to 16,000 Hz. Then he calculated how many intensity levels could be distinguished within each of these elementary signals or frequency groups. For example, at 2000 Hz, a person can make 325 intensity discriminations, which will convey 8.3 bits of information. The average between 30 and 16,000 Hz was 7.7 bits. The number of discriminations along the frequency scale would be 1450. He further assumed that a human could make four discriminations per second. The capacity of the ear from these data would be $1450 \times 4 \times 7.7 = 44{,}660$ bits/sec, assuming that all signals were independently and simultaneously perceived, but this is not so. By taking into consideration masking effects, Jacobson arrived at 8,000 bits/sec for random sound and 10,000 bits/sec for loud sounds. (The reason that loud sounds give a higher figure is that more frequencies can be heard.) He also calculated the total number of monaural sounds to be 330,000.
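The steps of this calculation can be retraced as follows. The sketch uses only the figures quoted above; the final fractions are a derived illustration of how much the masking correction removes, not numbers reported by Jacobson.

```python
import math

FREQUENCY_STEPS = 1450          # discriminable steps along the frequency scale
DISCRIMINATIONS_PER_SECOND = 4  # assumed rate of discrimination
AVERAGE_BITS_PER_STEP = 7.7     # average intensity information per frequency step

# 325 discriminable intensity levels at 2000 Hz, expressed in bits.
bits_at_2000_hz = math.log2(325)

# Upper figure if every frequency group were heard independently and at once.
independent_capacity = (
    FREQUENCY_STEPS * DISCRIMINATIONS_PER_SECOND * AVERAGE_BITS_PER_STEP
)

print(round(bits_at_2000_hz, 1))    # 8.3
print(round(independent_capacity))  # 44660

# Share of that figure remaining after the masking correction.
for label, corrected in (("random sound", 8_000), ("loud sound", 10_000)):
    print(label, round(corrected / independent_capacity, 2))
```

On these numbers the masking correction leaves roughly a fifth of the uncorrected figure.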

These calculations were set out in great detail, but rest on certain assumptions and certain data, which are far from adequate. One of the basic assumptions is that all the frequencies are presented simultaneously, but only one intensity discrimination can be detected within each frequency group.

One might argue with respect to this assumption that one does not know if a particular frequency could be detected if all the frequencies were present simultaneously. Even if masking effects were considered, it probably would only mean that two or more frequencies which interfered with each other when presented alone were eliminated in the final count because they could not be distinguished in that particular combination. One may argue that the ear can only react to a far more limited number of frequency groups than 1450 at the same time, because only a limited number of cells in the organ of Corti can react or be stimulated simultaneously. This could be supported by a more general argument: that the ear is a "sequential sense," while the eye is regarded as a parallel processor.

Jacobson, aware of inaccuracies due to lack of empirical data, says, "The results obtained may well be off by a factor of two or so, but may serve as guides to our ideas on the capabilities of the ear" (1951, p. 468). He may be a bit optimistic with respect to this assessment, but his general approach seems sound and worthwhile.
 

TRANSMITTING CAPACITY OF CNS

Many experiments have been conducted to determine the human brain's capacity for transmitting information. With respect to unidimensional stimuli, almost all studies, some herein mentioned, have found that man can transmit between 2 and 3 bits of information; many of the studies showed values of up to 2.3 bits. It should be noted that perfect transmission of 5 alternatives corresponds to 2.32 bits. The stimuli to be recognized could be lines of various length, squares of different sizes, positions of a pointer between two end markers, pure tones, various loudness levels, etc. Other conclusions based on this type of study were that it did not matter much if the number of alternatives were increased or if the stimuli were far apart on the dimension used.

If one increases the number of dimensions on which the stimuli vary, one gets different results. For example, one experiment (Attneave, 1959, pp. 73-75) that had six dimensions and five steps on each dimension found total information transmitted per stimulus presentation ranging from 3.2 bits for the poorest third of the subjects to 7.9 bits for the best third. Attneave (1959, p. 75) makes the following comment: ". . . the number of dimensions on which stimuli vary appears to be an extremely important psychological variable."
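To put the multidimensional result in perspective, one can compare the information available in such a stimulus set with what was transmitted. The sketch below is a derived illustration: the per-dimension averages and the "equivalent number of categories" are computed here and are not figures reported in the source.

```python
import math

DIMENSIONS = 6
STEPS_PER_DIMENSION = 5

# Number of distinct stimuli and the information needed to identify one exactly.
distinct_stimuli = STEPS_PER_DIMENSION ** DIMENSIONS
available_bits = math.log2(distinct_stimuli)

print(distinct_stimuli)          # 15625
print(round(available_bits, 2))  # 13.93

for group, transmitted in (("poorest third", 3.2), ("best third", 7.9)):
    per_dimension = transmitted / DIMENSIONS  # average bits per dimension
    equivalent_categories = 2 ** transmitted  # stimuli told apart without error
    print(group, round(per_dimension, 2), round(equivalent_categories, 1))
```

Even the best subjects transmit well under the 2.3 bits per dimension found for single dimensions, yet the total exceeds anything obtained with one dimension alone.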

For reasons other than psychological, MacKay (1950, 1952) proposed several years ago that two kinds of information, structural and metrical, or logon content and metron content, might profitably be distinguished in any body of data. The term logon content means essentially dimensionality or degrees of freedom. Metron content is some function of the number of discriminable steps or categories on each dimension or logon. These two concepts may be useful, but do not answer the question of why a stimulus attribute becomes a logon. They seem quite applicable to what is discussed above.

In these experiments, involving both unidimensional and multidimensional stimuli, no fixed time limit has been used as a rule. Other investigations have calculated the transmission rate per second.

Two experiments on central data processing and transmission were discussed by Attneave (1959, pp. 67-80). Quastler and Wulff estimated the rate of information transmission by expert pianists. Since music is ordinarily highly redundant, random sequences of notes were prepared. The range of alternative notes varied from 3 (1.6 bits per note) to 65 (6 bits per note). An estimated transmission of about 22 bits per second was the maximum obtained. An optimal range of alternatives seemed to be anywhere between 15 and 37 notes. With as many as 65 alternatives, the transmission rate declined. The basis for this decline may have been largely on the motor side, since so great a range of equiprobable notes made for large "jumps" on the keyboard between successive keys. The maximum transmission when typewriters were used with 32 alternatives was 15 bits per second.
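The bits-per-note figures follow from the number of equiprobable alternatives, and the quoted maxima can be turned into implied response rates. The response rates below are derived here under the assumption of error-free performance at the quoted maximum; they are not figures reported in the source.

```python
import math

def bits_per_symbol(alternatives: int) -> float:
    """Information per note or keystroke when all alternatives are equiprobable."""
    return math.log2(alternatives)

print(round(bits_per_symbol(3), 1))   # 1.6
print(round(bits_per_symbol(65), 1))  # 6.0

# Implied notes per second if 22 bits/sec were reached without errors
# at either end of the optimal range (an assumption for illustration).
PIANO_MAX_BITS_PER_SECOND = 22
for alternatives in (15, 37):
    rate = PIANO_MAX_BITS_PER_SECOND / bits_per_symbol(alternatives)
    print(alternatives, round(rate, 1))  # 15 -> 5.6, 37 -> 4.2

# Typing with 32 alternatives: 5 bits per keystroke.
TYPING_MAX_BITS_PER_SECOND = 15
print(TYPING_MAX_BITS_PER_SECOND / bits_per_symbol(32))  # 3.0
```

Read this way, the maxima correspond to a handful of responses per second in both tasks.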

Quastler and Wulff also estimated that the maximum rate of impromptu speaking is about 26 bits-per-second and the mean rate about 18 bits per second. In silent reading transmission may be as high as 44 bits per second.

When one estimates the amount of information in the example above, one uses the word as the basic unit. Words can be regarded as consisting of phonemes, and phonemes can be described from a binary point of view. The first linguists to present a fully binary description of phonemes were Roman Jakobson and his collaborators (Cherry, 1957, p. 95). The attributes chosen by Jakobson, et al., have been called distinctive features:

1. vocalic/non-vocalic;

2. consonantal/non-consonantal;

3. interrupted/continuant;

4. checked/unchecked;

5. strident/mellow;

6. voiced/unvoiced;

7. compact/diffuse;

8. grave/acute;

9. flat/plain;

10. sharp/plain;

11. tense/lax;

12. nasal/oral.

Each phoneme can be represented by a cubic cell lying in a hyperspace of twelve dimensions. If one accepts the above description, then each phoneme contains 12 bits of information. Therefore, if a person can understand 3 words a second, and there are on the average 3 phonemes in each word, then a person can receive $12 \times 3 \times 3 = 108$ bits/sec. This calculation assumes that different phonemes are independent. This does not correspond to reality, but is a useful approximation from the point of view of comparing various types of data processing. If one uses the letter as a unit, uses the same assumptions as above, and in addition assumes that each word consists of 6 letters, then one gets $4.7 \times 6 \times 3 = 84.6$ bits/sec, where $4.7 \approx \log_2 26$ is the information per letter. In this case a letter is described as unidimensional. If one assumes also that each letter had 12 dimensions, one would get $84.6 \times 12 = 1015.2$ bits/sec. The point of discussing phonemes, letters, and words in this way, i.e., by considering them as primitives or basic units of speech or print, is to present an idea of the complexity of the data processing involved in decoding these two manifestations of a language.
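The arithmetic of this paragraph can be retraced step by step. The sketch uses the paragraph's own assumptions (12 binary features per phoneme, 3 words per second, 3 phonemes or 6 letters per word, 26 equiprobable letters); the product 12 x 3 x 3 evaluates to 108.

```python
import math

FEATURES_PER_PHONEME = 12  # binary distinctive features, one bit each
WORDS_PER_SECOND = 3
PHONEMES_PER_WORD = 3
LETTERS_PER_WORD = 6
ALPHABET_SIZE = 26

phoneme_rate = FEATURES_PER_PHONEME * PHONEMES_PER_WORD * WORDS_PER_SECOND

# Information per letter, rounded to one decimal as in the text.
bits_per_letter = round(math.log2(ALPHABET_SIZE), 1)
letter_rate = bits_per_letter * LETTERS_PER_WORD * WORDS_PER_SECOND

# Letters treated as 12-dimensional instead of unidimensional.
letter_rate_12_dimensions = letter_rate * 12

print(phoneme_rate)                         # 108
print(bits_per_letter)                      # 4.7
print(round(letter_rate, 1))                # 84.6
print(round(letter_rate_12_dimensions, 1))  # 1015.2
```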

Experiments by Sumby and Pollack (Attneave, 1959, pp. 76-77) showed that a subject could respond almost as fast when he or she had to choose between 256 words as between 2 words. The difference was only of the order of 10 percent, but obviously the rate of information transmitted increased, because a choice between two words gives only 1 bit of information, while a choice between 256 gives 8 bits of information.
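The size of the increase in transmission rate can be estimated from the figures just quoted. The sketch assumes that the "10 percent" difference means the response time with 256 alternatives was about 1.1 times that with 2 alternatives, and that responses were correct; both are interpretations made here for illustration.

```python
import math

def relative_rate(alternatives: int, relative_response_time: float) -> float:
    """Bits per unit of time, with the two-alternative response time as the unit."""
    return math.log2(alternatives) / relative_response_time

rate_two_words = relative_rate(2, 1.0)    # 1 bit per unit time
rate_256_words = relative_rate(256, 1.1)  # 8 bits in 1.1 units of time

print(rate_two_words)                             # 1.0
print(round(rate_256_words, 2))                   # 7.27
print(round(rate_256_words / rate_two_words, 1))  # 7.3
```

Under these assumptions, an eightfold increase in information per choice costs only a tenth more time, so the transmission rate rises roughly sevenfold.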

One thing became fairly clear from these experiments and from thinking about problems in this area: only a very restricted number of choices or decisions can be made consciously per second. When we read, for example, we may read with a speed of 5 words per second, and an estimate of information transmitted in silent reading gives 44 bits/sec. This stands in stark contrast to, for example, Jacobson's results. Can these results be reconciled so that they do not contradict each other?

Before answering, it is necessary to differentiate between the terms information transmitting capacity and information processing capacity. Transmission seems to imply that data pass through an organism without being altered, while processing indicates that something happens to the information, that is, the output is different from the input. Transmission seems to be a mechanical process, a process which can be explained quite easily from the basic elements. Consider, for example, a voice transmitted over a telephone line. According to Fourier's theory the voice could be considered a composite of sine waves so that, in this instance, it would seem correct to talk about a mechanical transmission. What about a parrot that can repeat certain words? Is this a mechanical transmission of information? Not entirely. Some complex processes that we know very little about must take place between the auditory cortex and the motor cortex which innervates the vocal muscles of the bird. The word coming from the parrot will probably also be somewhat distorted compared to the original utterance. One may conclude that some information processing has taken place between input and output, even if the parrot's "parroting" of a human word sounds or appears to be mechanical.
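The remark that a signal can be regarded as a composite of sine waves can be illustrated with a small discrete Fourier transform. The test signal below (two sine components with 3 and 7 cycles per analysis window and amplitudes 1.0 and 0.5) is a hypothetical stand-in for a voice, chosen only so that the recovered components are easy to read.

```python
import cmath
import math

def dft_amplitudes(samples: list[float]) -> list[float]:
    """Amplitude of each sinusoidal component up to half the sample count,
    computed with a direct (slow) discrete Fourier transform."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        amplitudes.append(abs(total) * 2 / n)
    return amplitudes

N = 64  # samples in the analysis window
signal = [
    1.0 * math.sin(2 * math.pi * 3 * t / N)
    + 0.5 * math.sin(2 * math.pi * 7 * t / N)
    for t in range(N)
]

# Keep only components that are clearly above rounding noise.
components = {
    k: round(a, 3) for k, a in enumerate(dft_amplitudes(signal)) if a > 0.01
}
print(components)  # {3: 1.0, 7: 0.5}
```

The composite waveform is fully described by its two sine components, which is the sense in which a linear channel such as a telephone line transmits without processing.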

As a third example, consider a person reading aloud to another. The reader may utter the words in what to him seems a mechanical process, but it is a highly complex process. Some million receptors in the eye may be involved, and some hundred thousand bits of information may be transmitted on the level discussed, for example, by Jacobson (1951a). But it is not this information that is transmitted to the listener. Information on this level cannot be transmitted from one person to another. What is transmitted to the listener is a highly refined or processed product: words.

When contrasting transmission in an electromechanical system like the telephone with transmission of information from one human to another, as in the example of the reader and the listener, one may say that the telephone system transmits information on the same level of complexity, all the time, while a human decodes the input into a highly refined or processed product, and on the basis of this product transmits a very different and highly complex signal to the receiver.

In information transmission in humans, it should be kept in mind that much depends on how well one processes the incoming information. If it is easy to process, it is also easy to transmit to another person. But if a person cannot transform the input into a meaningful unit, his capacity to transmit this information to another is very limited. The limit seems to be around 2.3 bits with respect to raw data, as exemplified by the studies on unidimensional stimuli mentioned earlier.

Attneave (1959, pp. 42-80), in extensive discussions of transmission of information in humans, seems to have overlooked that stimuli vary enormously in complexity and in the ease with which they can be processed. These are very important points when considering how well humans transmit information. If information can easily be processed by a human, it also can easily be transmitted in most cases. A word among hundreds or thousands can easily be understood and transmitted by a human, while most people would have difficulty in identifying one tone if they were presented with more than five.

In answer to the question raised earlier, whether Jacobson's (1950, 1951, 1951a) reported results are compatible with Attneave's (1959, pp. 67-80), one can say "yes." The explanation is that they are concerned with two entirely different levels. Jacobson's transmitted information appears in one of the first links in the perceptual system, and this information transmission is something we cannot be directly aware of. We do not know which receptors or acuity squares have been stimulated when we have seen a chair, for example. But Attneave's type of information is easy to become aware of, and is, in most cases, a highly refined or processed product, as in the case of words.

What seems interesting is that by contrasting the figure of $4.3 \times 10^6$ bits/sec with figures of the order of 2 to 50 bits/sec, one may get a clear idea of the enormous amount of data processing that takes place on what may be called the perceptual level, and appreciate that only highly processed data reaches the brain. For each word read, 10 million receptors may have been activated, but we react to the "finished product" as one unit, gestalt (or whatever term one likes to use).
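The contrast between the two levels can be expressed as a reduction factor. The sketch below simply divides the peripheral estimate by the central rates mentioned in the text; the factors are derived here for illustration and are not reported in the source.

```python
PERIPHERAL_BITS_PER_SECOND = 4.3e6  # estimate for one eye at the receptor level

central_rates = {
    "lower end of quoted range": 2,
    "silent reading": 44,
    "upper end of quoted range": 50,
}

reduction = {
    label: PERIPHERAL_BITS_PER_SECOND / rate
    for label, rate in central_rates.items()
}

for label, factor in reduction.items():
    print(f"{label}: about {factor:,.0f} to 1")
```

Whatever rate is taken for the central level, the reduction is of the order of tens of thousands to millions to one.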

The gestalt psychologists most vigorously and clearly highlighted this problem. Their summary was that "the whole is more than the sum of its parts." They formulated laws and principles, about how we organize the world; two very important laws were those of "nearness" and "similarity." These laws gave what might be considered a reasonable explanation of why we perceive the world the way we do. The type of primitives they referred to would be elements like dots and lines; units that already had been through a fair bit of data processing and open to introspection or the awareness of a person. Other parts of the theories do not fit in with experimental data, such as field theory and the concept of isomorphism. Their idea was that there is an electrical field in the brain corresponding to a "real" world outside the observer. This theory was tested by inserting gold pins and also gold strips into the visual cortex of monkeys. These devices were supposed to disturb or "short circuit" the electrical fields. No disturbance or defects in pattern perception could be detected (Lashley, et al., 1951). In spite of this result, the gestalt psychologists' way of looking at perceptual problems may still be said to be valid.
 
 

THE SEARCH FOR BASIC UNITS

It seems true to say most sciences look for building blocks on which theories can be built. In physics the atom is one of the basic units; in chemistry, the molecule. In principle one can predict "chemical behavior" on the basis of physical theories, but in practice most of the equations that would have to be solved are too complex, even with present day computers.

In visual perception the receptor seems to be a natural unit to start with. It is not difficult to look at receptors under a microscope (Luria, 1966, p. 129). Finding out how they are organized or linked together is, however, another problem. It has been possible to observe a limited number of receptors, but this approach has made it fairly clear that one gets only a very inadequate idea of their functional organization. The bipolar and amacrine cells form a highly complex network in the eye, and physiologists and other scientists working in the area are very reluctant to draw any general conclusions.

Jacobson (1950, 1951, 1951a) starts on a somewhat higher level than the receptor, and the advantage of using the "bit" as a unit is that it is independent of a particular physical structure and can be based on how many discriminations a system can make per unit of time. This approach also makes it possible to compare one perceptual system directly with another. It is very important to have a reasonably clear estimate of the basic information capacity of a particular perceptual system to be able to predict or calculate the maximum performance that can be expected, but it is just as important to know how good this system is in organizing these barely noticeable differences.

Psychology has no basic concept that is accepted or has been fruitful to the same degree as the atom or molecule, but "bit" seems to be a useful term if properly used. One advantage is that it can be used on all levels of analysis, but one would have to be careful in specifying what level one is discussing, and not discuss different levels at the same time.

At this stage it may be worth taking a look at some studies that tell us something about the more primitive elements or building blocks of our percepts. This evidence is of both a physiological and a psychological nature.

Hubel and Wiesel (1966) have made many experimental recordings of the reaction of cortical cells to visual stimuli. They have shown that the eye responds to formal characteristics of stimuli, to such things as edges, curves, and straight lines of different slopes. A large proportion of their experiments have been on cats. One experiment illustrates the specificity of retinal cells in this animal and how they can be considered among the first links in the data processing. A cat was presented with a narrow slit or line of light 1/8 inch wide and three inches long. Only when the line had a certain orientation, i.e., when it was placed in a ten o'clock-four o'clock orientation, did the cortical cells in which the electrode was placed respond vigorously. One may say that they reacted to "straightness." In addition, a certain orientation was necessary: a five- to ten-degree deviation in orientation of the line reduced the response markedly.

Hubel and Wiesel (1966) have in other experiments shown that there are "on/off" zones in the retina of the cat, and that stimulation of one area inhibits the adjacent area. They also found that a line separating two areas differing in brightness gave the most efficient response if the line fell exactly on the boundary between the "on" and "off" zones in the retina. Furthermore, they reported that moving stimuli were very effective. Some cortical cells responded most vigorously to slow moving stimuli (1°/sec or lower) and other cells to a rapid movement (10°/sec or more).
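The effect of adjacent excitatory and inhibitory zones on a brightness edge can be illustrated with a deliberately simple one-dimensional toy model. Everything in it (the ten-element field, the unit weights, the function names) is a hypothetical simplification for illustration, not a model taken from the recordings described above.

```python
def edge_stimulus(width: int, edge_position: int) -> list[float]:
    """Luminance profile that is bright (1.0) left of the edge, dark (0.0) from it on."""
    return [1.0 if i < edge_position else 0.0 for i in range(width)]

def on_off_response(stimulus: list[float], boundary: int) -> float:
    """Toy unit: light in the 'on' zone (left of the boundary) excites,
    light in the adjacent 'off' zone inhibits."""
    excitation = sum(stimulus[:boundary])
    inhibition = sum(stimulus[boundary:])
    return excitation - inhibition

FIELD_WIDTH = 10
ZONE_BOUNDARY = 5  # elements 0-4 form the 'on' zone, 5-9 the 'off' zone

responses = {
    position: on_off_response(edge_stimulus(FIELD_WIDTH, position), ZONE_BOUNDARY)
    for position in range(FIELD_WIDTH + 1)
}
best_edge_position = max(responses, key=responses.get)

print(responses)
print(best_edge_position)  # 5: the edge lies exactly on the zone boundary
```

In this toy model the response peaks when the edge coincides with the boundary between the zones and falls to zero under uniform illumination, since excitation and inhibition then cancel.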

Data from these types of experiments tie in nicely with results in "stabilized vision experiments." If the same spot on the retina is stimulated for 10 to 15 seconds or longer, parts of the stimulus fall away, such as the top part of a letter or a line, or a curve, that form part of a letter.

If one accepts the gestalt psychologists' notion of "good figure," it may be said that a straight line or a curved line is "a good figure" or "a good gestalt." These straight lines and curves, then, may be considered as building blocks in the visual system. "A good figure" or "a good gestalt" may in this context mean a stimulus configuration that is easy to process. That is, the eye has a lot of straight line or curve analyzers, so the stimulus can be dealt with easily and effectively by the organism, and a simple signal can be sent to the brain. If the eye could not process the stimulus as one unit, many signals might have to be sent to the brain, and it would be more difficult to form a meaningful percept or building block out of which more complex ones could be built. Whether these primitives are something the organism is born with probably depends on what animal on the phylogenetic scale one is considering. The lower down one goes, the more likely it is that the analyzer can be attributed to "nature" rather than "nurture."

A recent experiment by Blakemore (1970) with a kitten brought up in an environment of only vertical stimuli indicated that the cat developed only vertical line analyzers and not horizontal ones. This experiment should be seen in the context of an experiment by Hubel and Wiesel (1963) reported by Gibson (1969, p. 235), in which two visually deprived kittens, 8 and 16 days old, had responses in the receptive field strongly resembling those of mature cats when exposed to patterned stimuli: "Visual experience is thus not necessary for the organization and development of striate nerve cells and their functional connection." But they found that prolonged rearing without patterned light did cause them to deteriorate. "Lack of stimulation may lead not simply to a failure in forming neural connections, but on the other hand to the disruption of connections that were there originally" (Gibson, 1969, p. 235).

In view of these results one would expect that a certain type of stimulation of the human eye would be necessary if the eyes were to develop feature analyzers such as those people seem to use in the recognition of letters. This argument is quite compatible with the point of view that one has an inborn figure-ground mechanism in the visual system. If that mechanism did not exist, we would not be able to form any units at all.

The experiments discussed above are concerned with some of the basic building blocks in the visual system. Evidence also exists to support the notion that organisms have an inborn capacity to perceive global aspects of a stimulus. This seems to be more apparent the lower the position in the phylogenetic scale.

Tinbergen's (1952) experiment with the mating behavior of the stickleback fish seems to illustrate this point. What triggered mating behavior was a certain form of a certain size, with red color on a particular part of the form. Tinbergen does not say whether any learning had taken place, but on the basis of the findings put forward it seems fairly safe to assume that this particular behavior might well have occurred when the fish was in the mating season regardless of learning. Fish are very poor at learning almost anything, so even if they had had the opportunity to learn, it would be difficult to see how they could have learned this complex behavior. Consequently it would seem reasonable to assume that this behavior is "instinctive" and that the stickleback reacts to global aspects, i.e., form and color, of the stimulus.

Another example is that of frogs. They react to a dark stimulus that moves across the field of vision by sticking out the tongue--if the stimulus is dark, of a certain size, and moves with a certain speed. This stimulus corresponds to insects which are the frog's main food supply (Wooldridge, 1963, pp. 48-50).

Hess' (1956) experiment with imprinting in ducks produces some evidence that not all animals are "wired in circuits" in every respect from birth. They can learn a global aspect of a stimulus if the learning occurs during a short critical period of their life. Sutherland's (1964) experiment with octopuses indicates that this animal has a rather limited capacity for learning. It can discriminate between vertical and horizontal, but not between two thick lines that lean 45° to the left and to the right, respectively. To discriminate whether an object is in a vertical or horizontal position is probably of survival value to the octopus, and this mechanism is probably innate or easily learnt early in life. It can therefore discriminate between these gross aspects of a stimulus in a reliable way.

An experiment reported in Forgus (1966, p. 167) showed that two vertical parallel lines were seen together 73 percent of the time, while a diagonal line between them was seen together with the left line only 22 percent of the time. One interpretation of this is that the two parallel lines belong to one system while the oblique line belongs to another system. That is, different analyzers process them and send the information to different functional parts of the cortex.

Senden's (1932) investigation of people with congenital cataracts who regained their sight after an operation indicates that humans also have an inbuilt capacity to react to global aspects of a stimulus. Hebb (1949, p. 21) has pointed out that "...what von Senden does show is the fact that patients always responded to certain objects as wholes and could on occasion detect differences between objects even in spite of nystagmus, and that there is a primitive or innate figure-ground mechanism." This, though, does not mean that the cataract patients can identify objects. Hebb claims that unity and identity have different determinants. If one of the patients were to recognize, say, a triangle, he would count the corners. Learning was also very slow, and there was no, or very poor, generalization. It is tempting to argue that because perceptual learning did not take place in a critical period of the patient's life, the capacity to generalize, i.e. shape constancy, did not develop or was severely handicapped.

Forgus (1966, pp. 27-28) divides the perceptual process into five stages:

1. Detection of light and change in light energy.

2. The gross discrimination of figural unity.

3. The resolution of a more clearly differentiated figure.

4. The identification of form.

5. The manipulation or modification of form, as in social perception and problem solving.

Stages 2 and 4 seem to correspond to Hebb's "unity" and "identification" respectively. It seems wise to consider the list more as an attempt to conceptualize what is happening in a perceptual process than as a set of absolute categories.
 
 

COMPUTERS AND PSYCHOLOGY

A third type of experiment in pattern recognition is related to the use of computers. One may say that one tries to simulate perceptual processes in certain areas by linking sensing devices such as photoelectric cells to the computer. Different approaches are used, but often one classifies the methods as either a template or a feature analysis method. Most pattern recognition programs also allow for a certain amount of learning. Computer programs can be considered to be theories of how humans perceive patterns. The template method basically "sees" which pattern in the device most closely matches, say, a letter. The patterns used have often been either letters or numbers. Another approach has been to pick out characteristics of the pattern, such as straight lines, curves, and crosses, and to give these features varying weight. A certain set of features is then likely to be a given letter or pattern. A combination of these two methods is also used to some extent.
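The difference between the two methods can be made concrete with a toy sketch. The 3x3 letter grids, the three feature names, and the weights below are all hypothetical, chosen only for illustration; they do not reproduce any of the programs referred to in the text.

```python
# Toy sketch: the 3x3 grids, feature names, and weights are hypothetical.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def template_match(pattern):
    """Template method: pick the stored pattern that agrees in the most cells."""
    def agreement(label):
        return sum(
            cell == ref
            for row, ref_row in zip(pattern, TEMPLATES[label])
            for cell, ref in zip(row, ref_row)
        )
    return max(TEMPLATES, key=agreement)


def extract_features(pattern):
    """Feature analysis, step 1: detect a few simple line features."""
    width = len(pattern[0])
    columns = ["".join(row[i] for row in pattern) for i in range(width)]
    return {
        "vertical_stroke": any(col == "1" * len(pattern) for col in columns),
        "top_bar": pattern[0] == "1" * width,
        "bottom_bar": pattern[-1] == "1" * width,
    }


# Positive weight: the feature is expected; negative weight: it counts against.
FEATURE_WEIGHTS = {
    "I": {"vertical_stroke": 1.0, "top_bar": -1.0, "bottom_bar": -1.0},
    "L": {"vertical_stroke": 1.0, "top_bar": -1.0, "bottom_bar": 1.0},
    "T": {"vertical_stroke": 1.0, "top_bar": 1.0, "bottom_bar": -1.0},
}


def feature_match(pattern):
    """Feature analysis, step 2: weigh present and absent features per letter."""
    features = extract_features(pattern)

    def score(label):
        return sum(
            weight if features[name] else -weight
            for name, weight in FEATURE_WEIGHTS[label].items()
        )
    return max(FEATURE_WEIGHTS, key=score)


# An "I" whose stroke has slipped one column to the left.
shifted_i = ["100", "100", "100"]
print(template_match(shifted_i), feature_match(shifted_i))
```

In this small case the template method is misled by the displaced stroke and answers "L", while the feature method still answers "I", because a full vertical stroke without bars is detected wherever it falls. The sketch leaves out learning, which in such a scheme would presumably mean adjusting the weights.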

My purpose here is not to give an account of computer-aided pattern recognition, but to point to certain similarities between this approach and physiological and psychological approaches to pattern recognition. The most obvious similarity seems to be in what, with varying names, have been called analyzer, operator, feature, n-tuples, etc. They might be regarded as the building blocks of the perceptual system. These features have to be combined. In the computer this is done by elaborate programs. In living organisms very little is known about the mechanisms that bring about a percept.

Another factor that also seems to be of importance is the process(es) by which a rough idea of the whole gestalt is arrived at. For humans, Forgus' account, above, gives a fair idea. For computers, one tries to "normalize" a pattern by bringing it up or down to a standard size, or getting at some of the gross or main features of letters with some variation of the template method.
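Size normalization of this kind can be sketched in a few lines. The sketch assumes a pattern stored as rows of "0" and "1" characters and uses nearest-neighbor sampling; both the representation and the function name normalize_size are illustrative assumptions, not a description of any particular program.

```python
def normalize_size(pattern, size=3):
    """Bring a rectangular grid of '0'/'1' rows up or down to size x size.

    Nearest-neighbor sampling: each output cell copies the input cell
    found at the proportionally equivalent position.
    """
    height, width = len(pattern), len(pattern[0])
    return [
        "".join(
            pattern[(r * height) // size][(c * width) // size]
            for c in range(size)
        )
        for r in range(size)
    ]


# A large "T" (6x6) reduced to the standard 3x3 size.
big_t = ["111111", "111111", "001100", "001100", "001100", "001100"]
print(normalize_size(big_t))
```

Once every pattern has been brought to the same size, a template or feature comparison no longer has to cope with differences in scale, only with differences in shape.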

The various pattern recognition programs are very far from as powerful as the human eye, but they may give us some idea how part of our visual system works. And they may help towards clearer and more explicit theories, which can be tested.
 
 


Part 2

Plasticity of Living Organisms

The higher up one gets in the phylogenetic scale, the more plasticity there seems to be in the organism. Man seems able to learn almost anything. He learns to write, read, adjust to the world when seen upside down as in Ivo Kohler's (1964) experiments, speak, understand Morse code directly, and so on. Considered from a rather superficial point of view, one gets the impression that everything can be learned; that different elements can be formed into gestalts or rearranged. But there is evidence that certain types of stimulus material are harder to organize, that is, to form gestalts of, than other types. For example, one can mention the experiences one has had so far with the Optophone (which transforms each printed letter into a sound pattern), the ultrasonic torch, and attempts to get people to decode speech sounds when presented visually, as on a screen. This is not to deny that the last-mentioned example could be of great help in teaching deaf and mute children to talk. But it would be very hard to understand speech when presented in a visual form without any preprocessing of the data; it would just be a lot of meaningless curves to the child. What extensive training could do is an open question.
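How a letter might be turned into a sound pattern by a device of the Optophone type can be sketched roughly as follows. The sketch assumes that the letter is scanned column by column and that each row of the scan is tied to one tone, so that every column yields a chord. The five tone frequencies and the 5x3 letter grids are hypothetical and are not taken from the actual instrument.

```python
# Hypothetical tone assignment: top row of the scan gives the highest tone.
ROW_TONES_HZ = [784, 659, 523, 392, 330]

# Hypothetical 5x3 letter shapes ('1' = ink, '0' = blank paper).
LETTER_L = ["100", "100", "100", "100", "111"]
LETTER_I = ["010", "010", "010", "010", "010"]


def optophone_scan(glyph):
    """Scan a glyph left to right; return one chord (tuple of Hz) per column."""
    chords = []
    for col in range(len(glyph[0])):
        chord = tuple(
            ROW_TONES_HZ[row]
            for row in range(len(glyph))
            if glyph[row][col] == "1"
        )
        chords.append(chord)
    return chords


print(optophone_scan(LETTER_L))
# [(784, 659, 523, 392, 330), (330,), (330,)]
```

Each letter gets its own sequence of chords, but nothing in the code ties one letter's sound to the next. The listener has to build words out of isolated letter patterns.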
 

ARTIFICIAL SPEECH

But the crucial point can be exemplified in the question, why is it easy to understand speech, but difficult to understand the Optophone? Since the Optophone is a device that gives a different sound pattern for each letter, it could be said that the auditory mechanism and the brain have a structure that makes it easy to process (or understand, if you like) speech sounds, but not the type of sounds produced by the Optophone. Max Clowes (1966, p. 345) puts it this way: that "it is the absence of higher-order auditory forms in the Optophone display which militates against its success." A third example may make this very important point clear. Wooldridge (1963, p. 164) says, "with respect to speech, the conclusion that we all employ the same areas of the cortex must appear to us to be rather remarkable, in view of the obviously artificial and acquired nature of the function." One may assume that he would use the same argument with respect to understanding speech, that it is "obviously artificial and acquired." This is very contrary to Chomsky's (1966) concept of language. Basically, Chomsky would say that we are born with the aptitude for understanding and producing speech. It is not foreign or artificial to humans, but a natural, integral, and very important part of the human structure. If not, we would need more than a lifetime to learn one language.

If learning to speak and understanding of speech were not "natural," we would never learn it to the extent we do. Part of our organism is particularly well suited to this type of information processing. It is easier for a person to produce speech sounds than arbitrary sounds or noise, even if the person's vocal cords could produce them. In the same way, it is easier to make sense out of speech than out of the sound from the Optophone because we may assume there is an underlying structure more suited to cope with this type of information. There is obviously room for variation, as indicated by the fact that there are three to four thousand different languages in the world. Speech sounds, though, will belong to a restricted group of sound patterns that may be defined in physical terms.

In the following discussion the above mentioned points will receive more detailed consideration.

The first part will present evidence for the flexibility of man's learning capacity, and the latter part will present some data that indicate a limitation--that everything is not learned with equal ease. This way of discussing the problem does not imply a dichotomy, that is, that certain things can be learned and other things cannot. Rather, it implies that certain stimulus material is more "natural" to learn than others. Normally, we learn what is "natural," but at times we classify it as "artificial," as in the above example from Wooldridge, on the basis of rather superficial criteria. One seldom encounters "real artificial" stimulus material in a learning situation. We are learning what is behaviorally important and the organism generally has a structure that makes it well suited for this type of learning.
 
 

TRANSFORMED VISUAL INPUT

Stratton, in 1897 (Bartley, 1969, pp. 400-402), was the first person to study how humans adjusted to inverted vision. The results indicated that humans could learn to recode the distorted visual input relatively quickly. Ivo Kohler (1964) has made more extensive studies in this field, and in the following some of his findings will be reported and discussed.

Kohler (1964, p. 31) mentioned that a "very interesting investigation by a Russian had been reported in Universum (1950 No. 2). According to this paper, patients suffering from cataracts regained their sight when their corneas were used as focusing screens for projecting real images. By the time those images reached the retina they were right side up. This, however, did not disturb the patients, who, so the author maintains, soon began to perceive objects as right side up. Apparently, inversion of the retinal image is not a necessary prerequisite for veridical perception."

This is a general statement that sounds rather convincing, but it is not elaborated upon with respect to data and methods. Kohler set out to investigate this and other types of distorted visual input. The first experiment he reported took place in February 1947 and had a duration of six days. The subject used spectacles through which he saw the world inverted. Kohler (1964, p. 31) reported, "At first, this subject saw everything inverted, could not grasp objects without making errors, was extremely unsure of himself, and had to be escorted at all times. After three days, marked improvement was noted in all respects. On the fourth day, the subject went on a bicycle trip, on the last day he went on a ski excursion. During all this time, however, his perceptions were only sporadically right side up, things appeared right side up only when they were simultaneously touched, when a plumb line was used, or when they happened to be in the subject's vicinity." After having removed the spectacles at the end of the six days, the subject reported having apparent movement experiences and slight spells of dizziness. Only occasionally, and only for a few minutes after the removal of the spectacles, did the subject see objects as inverted. In other words, the readjustment was rapid.

A second experiment, carried out shortly thereafter, is reported by Kohler (1964, p. 32); duration nine days, same spectacles but a different subject. After four to five days, the subject reported remarkable changes, and that the vertical dimension had become lost. For instance, two adjacent heads, one upright, the other inverted, were both reported as upright. One may say that the principle of economy seemed to operate here. It also seems to illustrate very nicely the fact that the visual system does not work like a camera. One may also interpret this result from the point of view that the visual system identifies things on the basis of a lot of perceived dimensions, and that whether a thing is upright or inverted is not very important, and not an outstanding feature for the visual system. When we have identified a thing, we know how to react to it. We do not have to decide if it is upright or inverted.

A third experiment (August, 1950) is reported by Kohler (1964, p. 33); duration 10 days (123 hours). The same spectacles and the same subject as in the previous experiment were used, but this study focused on the period of transition during which upright vision first began to emerge. The first experience the subject had of seeing something as reinverted, was when he touched the object with his hand. In other words, by touching the object with his hands, he managed to see the object as upright with the spectacles on. It was a sudden transformation. Gravitational pull and familiarity with objects seemed to be other factors that contributed to veridical perception; that is, helped the subject to see the world upright.

In these three experiments with inverted vision it looks as if four to five days are enough for a subject to make an almost perfect adaptation to the distorted input, and two to three days are enough to recover completely from it again.

Another series of experiments by Kohler (1964, pp. 34-42) involved the use of prismatic spectacles (wedge prisms):

1. January, 1933, duration 10 days.
"The subject wore binocular prismatic spectacles whose angle at the apex was 15°, the bases arranged to the left. All signs of behavioral difficulty disappeared after only one day". "After ten days of continuously wearing the spectacles, all objects had straightened out and were no longer distorted." After removing the spectacles, the subject experienced impressions of curvature distortions and apparent movement. Aftereffects lasted for four days.

2. February, 1933, duration 12 days.
The same subject as above wore a pair of spectacles of which only the bottom halves were prisms, the top halves being ordinary glass. After 12 days integration of the two images occurred sporadically.

3. April, 1933, duration 22 days.
This time the subject wore a monocular prismatic spectacle with a 15° angle. Perceptual aftereffect was transferred to some degree to the other eye, which had been covered during the entire experiment. This should clearly indicate that higher cortical processes are involved.

4. November, 1946, duration 124 days.
Kohler himself was the subject in a binocular study. Worth noting is that aftereffect interfered with normal vision for weeks, and "that those after effects which had taken longest to build up were the ones which persisted longest."

5. April, 1947, duration 50 days.
The upper part of the spectacles used distorted vision 10°, the lower part was normal glass. After 10 days there was a gradual adaptation to the prism without concomitant disturbances of normal vision. The subject's vision had become differentially adapted to both conditions.

This recorded series of experiments indicates that the visual system somehow manages to adapt to a distorted input with surprising plasticity. The transformation process is something that takes place mainly on the perceptual level and most of the learning takes place without awareness.

A third series of experiments (Kohler, 1964, pp. 42-46) used colored glasses. Two of them will be mentioned:

January, 1947, duration 20 days.
The subject wore a pair of spectacles of which the left half was colored blue and the right half yellow. In the course of the experiment, it was noted that both colors subjectively faded away. This suggests that the same retinal areas became simultaneously adapted to complementary stimuli. Looking to the right (without spectacles) resulted in increased sensitivity to blue, and looking to the left, the same sort of sensitivity to yellow.

March to April, 1947, duration 8 plus 19 days (with a two week interruption).
The spectacles were ordinary glasses covered with a red diagonal stripe one centimeter wide. This stripe was "overlooked"--that is, it faded away after some time, but a green beam was reported for three to four days after removal of the glasses on the corresponding spot.

From the above, one is struck by the workings of the visual system and man's amazing adaptability, or, put differently, the organism's remarkable ability to learn to make sense out of distorted visual input. How does it happen? There is not a satisfactory answer to this question; that is, no theory that can explain what is taking place with any degree of detail and confidence. But one becomes aware that an enormous amount of data processing takes place, and this mainly of an unconscious nature. It may be worth mentioning some of Kohler's comments (1964, p. 123) on these experiments. "The retinal area becomes the isomorphic equivalent of mnemonic details by 'sorting' all excitations according to the circumstances governing their recurrence, and not by merely summating them unselectively. If it did that, adaptation to alternating complementary stimuli (opposite distortions) would be completely impossible . . . ."

". . . the link between the original optical data and the situational factors is established not directly but in a roundabout way, via afterimages which are sometimes enhanced, sometimes suppressed." This should clearly indicate that man's visual system cannot satisfactorily be explained as "wired in circuits," or a camera; that is, a mechanical type of theory seems utterly inadequate.

Further, Kohler says (1964, p. 127): "It is time we give extra thought to this whole phenomenon of increasingly veridical perception which always occurs when experimental spectacles of any kind have been worn for some time. What is the advantage when a taut string, for example, begins to look straight to us no matter how curved the corresponding retinal image may be? Or when a rigid substance keeps its rigidity no matter how elastic it has been made to appear with the spectacles?"

"We are confronted here with a peculiar relationship between optical and physical facts. We always find that it is the physical dimensions of things which have a tendency to become visually correct." A bit later (pp. 127-128) he says: " . in the process of adaptation, it is always the world with which we are familiar which wins out in the end. It does so in the interest of simplicity and economy."

One paragraph later, on page 128, Kohler goes on to say: "What good is a theory of sensation which is not applicable to complex situations and which necessitates our formulating ad hoc hypotheses every time some incidental condition is found to be present? Yet this is precisely the state that the study of perception has been in."

It might be added that this adaptation helps the organism to adjust to the environment. One may say it has, ultimately, survival value. But at the same time it is surprising to what extent the subjects in these experimental situations could adapt. Experiments with animals indicate that their adaptability is very restricted, or that they cannot adapt at all, when they receive distorted visual input. In man it appears as if the input can be anything, and, provided there is opportunity for learning so that the distorted input can be related to reality, man seems able to make sense out of it. This is the type of conclusion one is tempted to draw from Kohler's experiments.

One would, on the basis of these studies, think that man would be able to learn to relate to the physical world in an adequate way if information about it were given in an auditory code and appropriate training were provided. But before formulating any hypotheses in this respect, more relevant material will be examined.
 
 

WHAT MAN CAN LEARN

Another example of the visual system's "ability to learn" is given in Bruner, Goodnow, and Austin (1956, pp. 46-47). The authors tell how a person trained in histology learned to distinguish the corpus luteum from its surroundings. The person was trained to see it as a gestalt: "What is happening here is a recording in the stimulus input in terms of those features of the object perceived that make possible the reconstruction of the remainder of the object. Such reconstruction is possible because in fact the defining features of most objects and events are redundant with respect to each other." In other words, the eye can extract a multitude of features from an object or an event and piece them together in a way that is desirable. These features then will, after some time, form a gestalt for us. That is, we organize part of our surroundings for a particular purpose that is useful to us.

Reading

Learning to read is a third example of how we form gestalts. At first a person has to learn to discriminate among 26 letters if he or she is living in an English-speaking country. After a rather short time this is learned and then the process of uniting the letters together to form words begins. This is a somewhat lengthier process, but can still be learned in a matter of a year or two. With further practice, one is able to take in perhaps a whole sentence with one glance. Higher order hierarchies are thus built up.

It is in a way surprising that these artificial signs that we call letters can be strung together in such a way; perhaps the letters are not that artificial. The gestalt psychologist would talk about "good figure," implying that certain configurations are easier to perceive than others, and letters seem, to some extent, to be of such a kind. Perhaps they are also made in such a way that they are maximally discriminable within the limits of economy, that is, they do not take up too much space and are easy to print or write. But this objective again may be contrary to, or at least may not help, their incorporation in an organic way into a larger gestalt, a larger hierarchy. It seems to be true, to some extent, that the speed of reading approaches an asymptote, say after five years, even though special training methods can attain more. If one, for example, considers how humans can perceive a picture or an object as a gestalt, even if it covers a larger area of the retina than, say, a page of a book, one must admit that that type of perception is more effective. When one considers that perhaps two minutes are needed to extract the meaning from a printed page, one is reminded of the Chinese proverb that one picture is better than a thousand words.

In spite of these reservations, it is surprising to what extent one can build up gestalts in reading. This is evident from the fact that we read much faster than we could possibly do if we paid attention to each letter. Certain types of dyslexia or aphasia make us aware of how vulnerable this system is. Small or perhaps undetectable organic injuries or malfunctioning can upset this learning process, rendering a person unable to build up these gestalts, or making it a much harder task.

Writing

Writing has much in common with reading, but speed here is limited by the speed with which we can move the hand. Initially, when we learn to write, we have to pay a lot of attention to how we form a letter, but after some years of practice it is enough just to think of the word, or, what is more common, a string of words or a sentence, and the right letters are formed.

Writing is in one way exactly the opposite of reading. While in reading we gather information from many letters and organize them so that the end product is a meaningful word or an idea, the writing process starts from an idea and the end product is a string of letters. But both processes involve an organization of a hierarchical nature. Both processes imply the organization of a gestalt or an idea of a unitary nature within the organism.

Speech

To understand speech and to speak can, in principle, be considered to be very similar to the reading and writing process. While the letter can be considered to be the basic unit in reading and writing, the phoneme can be considered the basic unit in speech. Bruner, et al. (1956, p. 249), indicate that the phoneme is often called the smallest unit of speech that "makes a difference" to a listener or a speaker. The gestalts built up here are of the same type as in reading and writing, which is natural enough. Speech precedes reading and writing. The latter are basically built up on the base of the spoken word. So listening and speaking are in a way "natural" to man, while reading and writing are "artificial." With respect to speed, it is surprising to discover that on the input side, the "artificial" input, reading, is two to three times more efficient in taking in words than listening; while on the output side, it seems to be the other way around.
 
 

PERCEPTION

Still, with respect to all the four functions--reading, writing, listening and speaking--it is clear that an enormous amount of data processing or organizing is taking place, and that almost all of this organizing is taking place on what may be called the perceptual level. Most of this data processing is taking place without the awareness of the human involved in it.

As far as the learning process goes, there seems to be more of a conscious element in the four processes just discussed than in the adaptation that took place in Kohler's (1964) experiments. One may say that a more cognitive process is involved in learning to read, write, speak, and understand the spoken word than in learning to adapt to distorted visual input.

It is not difficult to find examples of animals that cannot learn perceptual tasks. The chicken that cannot learn to pick the grain when the visual field is displaced 7° to the right or to the left with prisms (Hess, 1956) is one clear example of limitation in learning capacity of perceptual tasks in animals. However, with respect to man, things are rather different. It is difficult to find good examples that illustrate limitation in man's learning capacity. One area that does give good examples is work in relation to reading machines for the blind.
 
 

UNDERSTANDING SPEECH

Studdert-Kennedy and Cooper (1966, p. 317) are in agreement with others (Attneave, 1959, p. 80) when they say, "Normal speech may be comfortably followed at a rate of more than 200 words-a-minute. The listener handles some 40 to 50 bits of information a second. This rate is an order of magnitude greater than listeners have achieved with non-speech code so far developed. There seem to be two main reasons for this. First, speech signals are multidimensional and their dimensions are not arbitrary: they are determined and organized by characteristics of the articulatory apparatus that generates the signal. Second, a reason closely related to the first, speech signals form a complex pattern of overlapping, or shingled cues, a flowing auditory display of which elements are provided to the listener in parallel rather than in series" (italics added). This should support the view that the ear of a human is specifically well suited to process speech sounds. One may contrast this with the following: "Nye modified the conventional Optophone to permit variations in the number of dimensions in the display. He found that an increase in the number of dimensions could improve rate as well as accuracy of identification. But performance was still poor--less than two bits/sec with the more successful display" (Studdert-Kennedy and Cooper, 1966, p. 322).
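The figures quoted above can be put side by side with a little arithmetic. The sketch below takes the quoted values at face value (200 words a minute, 40 to 50 bits a second for speech, two bits a second as the upper bound for the modified Optophone). The per-word figure and the ratio are derived here and do not appear in the sources. The function names are illustrative only.

```python
def bits_per_word(bits_per_second, words_per_minute):
    """Average information per word implied by a bit rate and a word rate."""
    words_per_second = words_per_minute / 60.0
    return bits_per_second / words_per_second


def rate_ratio(speech_bits_per_second, code_bits_per_second):
    """How many times faster speech delivers information than another code."""
    return speech_bits_per_second / code_bits_per_second


SPEECH_WPM = 200             # "more than 200 words-a-minute"
SPEECH_BITS = (40, 50)       # "some 40 to 50 bits of information a second"
OPTOPHONE_BITS = 2           # "less than two bits/sec", used as an upper bound

for bits in SPEECH_BITS:
    print(
        bits, "bits/sec:",
        round(bits_per_word(bits, SPEECH_WPM), 1), "bits per word,",
        rate_ratio(bits, OPTOPHONE_BITS), "times the Optophone bound",
    )
```

On these assumptions speech carries roughly 12 to 15 bits per word and runs at least 20 to 25 times faster than the best Optophone display, which agrees with the "order of magnitude" in the quotation. Since the word rate is a lower figure and the Optophone rate an upper one, the true gap would be somewhat larger.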

Why is it that speech is easy to understand? "To sum up, we have argued that the only known acoustic signal adequate to the coding of written text for easy and rapid assimilation is speech. The advantages of speech stem from its intricate, multidimensional pattern of overlapping cues, determined and organized by the articulatory apparatus. Some other satisfactory set of dimensions and principle of organizations might be found. However, there is no rationale for the search and we seem little closer to the discovery today than we were fifty years ago. We are therefore inclined to accept our fate and to give our attention to the development of a reading machine that talks" (italics added) (Studdert-Kennedy and Cooper, 1966, p. 323). This seems to emphasize that we know very little about the perceptual mechanisms that underlie our processing of speech. The authors have earlier in the same article pointed out how one sound is dependent on sounds both before and after for its meaning. In other words, the sound elements are highly interdependent.
 

SPEECH SYNTHESIS

Work has been done in the field of synthetic speech. Studies on acoustic cues for speech perception made it possible in the late Fifties to consider the use of the cues in a set of rules that could be applied to a text (written in phonetic transcription) and would generate synthetic speech from it. Liberman, et al. (1959), and Cooper, et al. (1963) drew up the rules for the synthesis. The conversion of a phoneme sequence into intelligible speech was done in a rigidly prescribed manner, so that the task could be handed over to a computer. Gerstman and Kelly (1961) demonstrated the use of a computer to synthesize speech by essentially these rules. Studdert-Kennedy and Cooper (1966) seem to say indirectly that these methods may produce some bizarre sound sequences that remind us little of English. English, unfortunately, is not a language that always goes by the rules for pronunciation, because 20 percent of English words have their own unique pronunciation. Accepting this, one would expect 20 percent "bizarre sounds" to be generated! Many other languages (for example, Spanish) would be better in this respect, since letters are pronounced consistently far more often than in English. Another problem in reading machines is to develop a pattern recognizer that can recognize all or most types (fonts) of letters, but this is a different problem.
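The difficulty with irregular words can be illustrated by a deliberately crude sketch. It assumes one sound symbol per letter plus a small list of exceptions. The rule table, the sound symbols, and the exception list are hypothetical and have nothing to do with the actual rules of Liberman, Cooper, or Gerstman and Kelly, which operate on phonetic transcription rather than on spelling.

```python
# Hypothetical one-letter-one-sound rules (not the published synthesis rules).
LETTER_RULES = {
    "a": "AE", "c": "K", "d": "D", "e": "EH", "g": "G",
    "n": "N", "o": "AA", "s": "S", "t": "T",
}

# Hypothetical exception list for words that do not follow the rules.
EXCEPTIONS = {
    "one": ["W", "AH", "N"],
}


def to_sounds(word, use_exceptions=True):
    """Convert a spelled word to a list of sound symbols.

    Irregular words are looked up first; everything else goes through
    the letter rules. Letters without a rule are marked '?'.
    """
    word = word.lower()
    if use_exceptions and word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    return [LETTER_RULES.get(letter, "?") for letter in word]


print(to_sounds("cat"))                        # regular word, rules suffice
print(to_sounds("one", use_exceptions=False))  # rules alone: a "bizarre" result
print(to_sounds("one"))                        # exception list repairs it
```

A regular word such as "cat" comes out acceptably by rule, while "one" treated by rule alone comes out as something no listener would recognize. If a fifth of the vocabulary behaves like "one", the exception list becomes a substantial part of the machine.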

Riley (1966, p. 420) reported that the two best of six subjects reached reading speeds of 5.5 and 4.0 words-per-minute after 200 training lessons on the Optophone. Each lesson lasted about one hour. This can hardly be called an encouraging result. Another study at Battelle Memorial Institute quoted by Riley gave an average score of 12 words-per-minute for 3 subjects. They were ten years younger, on the average, than the subjects in Riley's study, and of normal intelligence. The subjects in Riley's group ranged from 110 to 141 IQ on the Wechsler Adult Intelligence Scale. Age and, probably more important, a slightly different "set up" were the main reasons for the discrepancy in results.

As these results indicate, one is far from having a reading machine that is of practical use for normal reading. What is needed is to have some data processing device between "the raw sound" and the receiver. Research data on the salient psychological dimensions would help, given relevant information, in constructing such a device.
 
 

THE BRAIN AND DATA PROCESSING

To gain further insight into various types of auditory data processing, it seems worthwhile to take a look at some of Luria's (1966) work. He begins by contrasting the view that particular functions are localized in particular areas of the brain with the view that the brain as a whole can carry out all sorts of functions. Luria takes no dogmatic stand, in the sense that he points out that this is not an either/or question. He emphasizes, for example, that what is called a "function" is a highly complex process. The following seems representative of his attitude: ". . . a sensation is always an active reflex process associated with the selection of the essential (signal) components of stimuli and the inhibition of the nonessential subsidiary components. It always incorporates effector mechanisms leading to the tuning of the peripheral receptor apparatus and responsible for carrying out the selective reactions to determine the signal components of the stimulus. It envisages a continuous process of increased excitability in respect to some components of the stimulus and of decreased excitability in respect to others (Granit, 1955; Sokolov, 1958). In other words, sensation incorporates the process of analysis and synthesis of signals while they are still in the first stages of arrival. These concepts, so fundamentally opposed to the previous hypothesis of dualism (the passivity of the first physiological and the activity of the subsequent psychological stages of perception), constitute the principal distinguishing feature of the Pavlovian view of the sensory organs as analyzers. According to this view, from the very beginning the sensory cortical divisions participate in the analysis and integration of complex, not elementary, signals.
The units of any sensory process (including hearing) are not only acts of reception of individual signals, measurable in terms of thresholds of sensation, but also acts of complex analysis and integration of signals, measurable in units of comparison and discrimination. The sensory divisions of the cortex are apparatuses responsible for this analysis, and indications of lesions of these apparatuses are to be found, not so much in a lowering of the acuity of the sensations, as in a disturbance in the analytic-synthetic function," (Luria, 1966, pp. 97-98).

Luria is not quite explicit about the use of the term "signal," but other parts of his writing seem to indicate that he considers signals somewhat more complex than the Just Noticeable Difference (JND). The passage quoted can be considered an explanation, on the physiological level, of how a person learns to see a cell, by combining the various features or signals from a complex stimulus. (Compare Bruner, et al., mentioned earlier.) Both Bruner's (1956) and Luria's (1966) points of view seem to be that a person can combine and form a gestalt or chunk out of whatever array of stimuli he chooses to pick. In other words, they seem to support "nurture" rather than "nature." One implication, from this point of view, would be that one could expect a person to learn to understand "visual" information in an auditory code when trained to do so.

While Luria's remarks may be regarded as evidence for the brain's plasticity, the following points to its specificity: "In the context of the present description one further fact, well known in clinical practice but not yet amenable to psychological and physiological explanation, may be mentioned. The disturbance of phonemic hearing as a result of a lesion of the temporal region is not necessarily associated with a disturbance of melodic (musical) hearing; in fact, the latter is more commonly preserved. Conversely, as several writers believe, (Feuchtwanger, 1930; Ustvedt, 1937; Ombredane, 1945; etc.), a lesion of the right temporal region, and (according to other findings) a lesion of the left temporal pole, does not affect phonemic hearing, but may lead to impaired discrimination between tones and rhythms and manifestation of amnesia. This finding implies considerable selectivity of disturbances of complex cortical functions in the presence of circumscribed brain lesions. However, the relationship between a disturbance of phonemic hearing and a disturbance of musical hearing requires further, careful investigation," (Luria, 1966, pp. 112-113).

In support of the point that music and speech are processed partly by different mechanisms, the following may be mentioned. G. Selby, in a paper entitled "Localization of the Parietal Lobe Function and Dysfunction," given at an interdisciplinary symposium on cerebral localization at the Royal North Shore Hospital of Sydney, Australia, on the 5th April, 1970, told the audience that he had had a 72-year-old man as a patient who could understand music or a melody either when it was hummed to him or when he was reading the notes, but that the same patient had a very poor understanding of speech and printed material.

The unique position of speech is very well expressed by Luria (1966, p. 100): "Modern linguistics tell us that the articulated sounds of speech differ radically from sounds not related to speech. Two features characterize the sounds of human speech. In their origin and structure they are always organized in a definite objective language system, and consequently, they are special, generalized sounds. Physiologically they are always complex and are produced with the aid of the phonation-articulation apparatus, without which they can be neither pronounced nor perceived."

With respect to the point that there are common mechanisms in speech and listening, G. A. Miller, in discussing what is universal in languages across cultures, points to the fact that we alternate in speaking and listening, and he says: "Perhaps there is some limit imposed by the agility and alteration, perhaps some critical component of the speech apparatus must be actively involved in the process of understanding speech . . . ." (Miller, 1965, p. 95).

The above evidence indicates rather clearly that speech sounds are easier for a human to process than other types of sounds, perhaps with the exception of musical sounds. What are the underlying features of musical and speech sounds that make them easy to form into gestalts? It would be of interest to investigate other types of sounds with respect to how easy or difficult they are to organize.
 
 

Part 3

General Discussion

Humans are highly dependent on information from the environment, and the eye is the most important and effective transmitter and transformer of various types of information. Two classes of information are especially important. One is information that enables us to move around; the other is symbolic information, most often in the form of ink print or pictorial displays. One has to approach these two classes of information separately for blind people. Some people have concentrated on building reading machines for the blind; others have put efforts into devices that would aid a blind person's mobility. The two groups do not seem to interact significantly. They listen to each other at meetings, but work in one field does not seem to influence work in the other.
 
 

COMPARISON OF EXPERIMENTAL RESULTS AND THEORETICAL EXPLANATIONS

The present analysis started out with the notion that both problems, mobility and reading, could be considered as information-processing and collection. The auditory modality was chosen for study. One experiment gave support to the notion that two types or classes of sound have much in common, psychologically speaking. A correlation of 0.65 between "15 Shapes" and part one of "Wing's Standardized Tests of Musical Intelligence" seems representative for this experiment (Appendix 1).

"15 Shapes" was a test in which the subject was asked to identify 15 shapes described auditorially. A mechanical contour follower was connected with an oscilloscope so that when the contour follower went up a slope an increasing frequency was heard, and a decreasing tone was heard when it descended. A horizontal plane gave no alteration in frequency. The subjects were trained for about one hour.

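The mapping from contour to tone used in "15 Shapes" can be sketched in a few lines. What follows is only a minimal illustration, assuming a linear relation between the height of the contour follower and the frequency of the tone. The base frequency, the scaling constant, and the function names are hypothetical and are not taken from the original apparatus.

```python
from typing import List

BASE_HZ = 440.0      # hypothetical pitch at the starting height
HZ_PER_UNIT = 40.0   # hypothetical pitch change per unit of height


def contour_to_frequencies(heights: List[float],
                           base_hz: float = BASE_HZ,
                           hz_per_unit: float = HZ_PER_UNIT) -> List[float]:
    """Map each sampled contour height to a tone frequency.

    Rising height gives rising frequency, falling height gives falling
    frequency, and a horizontal stretch leaves the frequency unchanged.
    """
    if not heights:
        return []
    start = heights[0]
    return [base_hz + hz_per_unit * (h - start) for h in heights]


def describe(freqs: List[float]) -> List[str]:
    """Label each step between successive tones as the listener hears it."""
    labels = []
    for prev, cur in zip(freqs, freqs[1:]):
        if cur > prev:
            labels.append("rising")
        elif cur < prev:
            labels.append("falling")
        else:
            labels.append("steady")
    return labels


# A roof-like shape: up a slope, along a flat top, down again.
roof = [0.0, 1.0, 2.0, 2.0, 1.0, 0.0]
tones = contour_to_frequencies(roof)
print(tones)
print(describe(tones))
```

The listener's task in the test corresponds to going the other way: recovering the shape from the sequence of rising, steady, and falling tones.
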
Another experiment shattered this notion. The correlation between the same part of the musical test and the score on an obstacle course using the Ultrasonic Torch was 0.18 (Appendix 2).

The main formal information-carrying aspect of the sound from the Ultrasonic Torch is pitch, but one cannot be sure that this is so psychologically. Listening to the sound from the Torch and to musical sounds gives, subjectively and psychologically speaking, very different experiences. One factor involved is that the Torch produces the sound in pulses, so the sound is coming and going the whole time. Another is that the sound is very complex and diffuse and does not have the clarity and definition of musical tones. A third factor is that pitch may easily be confused with loudness, because the Torch seems to have a maximum loudness at about five feet from an object. So, even if the main information-carrying feature is variation in pitch, it seems very hard for a person to pick out this aspect of the stimulus. To be able to do this may require long training and/or a special set of aptitudes. The low correlations among the three Torch Tests (Obstacle Course, Distance, and Recognition; Appendix 2) indicate that it is not only a matter of processing auditory information; that is, auditory information processing does not seem to play an overwhelming role. If that had been the case, one would have expected these three tests to correlate more highly with each other, because the sound from the Torch in all three test situations may be considered to be similar.

Assuming that each of the three Torch Tests has a reasonable reliability, one will have to conclude that other factors than the auditory aspects of the test situation were of importance, because of the low correlation between the three Torch Tests. One can see that the correlation between language and music is low, and this also applies to the correlations between the Torch Tests and these two aptitude tests (Appendix 2). One would think that these three groups of tests--the Torch Tests, the language test, and the music test--would have a reasonably high intercorrelation, because they all have sound as a key feature, an important element, in them. But the data indicate that psychologically these three test groups have very little in common. It seems consequently rather meaningless to speak about auditory information processing as if this is a unidimensional concept. The three types of sounds discussed are all transmitted by the human ear, but after that these sounds are apparently dealt with by different functional systems within the human organism. Both "15 Shapes" and music seem to measure fairly pure and simple aspects of the sound stimuli; the musical sounds and the sounds in "the 15 Shapes Test" are similar, psychologically speaking, and each to quite an extent belongs to a closed system that is fairly independent and an end in itself, while both language and the Torch Tests seem to be heavily involved with other structures of the brain. Language is integrated with the cognitive structures. Without that connection man would probably not have been able to develop a symbolic and objective language system. One of the requirements of the test situation with respect to the Torch Tests, the Obstacle Course in particular, was that the sounds from the Torch were to be linked up with motor behavior.
There probably is no antecedent in the human nervous system that would facilitate the processing of these alien sounds, or their connection in any specifically favorable way with the relevant part of the motor cortex. So the Torch sounds are probably propagated through the brain in a rather diffuse way, trying to link up with the appropriate motor areas. How well a person learns in such a situation is probably more dependent on higher order strategies, of which the subject may be conscious or unaware, than on any specific aptitude. In theory this could be tested (that is, how the signals or electric impulses were propagated) if one had microelectrodes in various parts of the brain, and then observed which area became aroused by the different types of stimuli. An EEG would probably also give information of value. One way of approaching the problem would be to use the conceptual framework outlined by Luria (1966, pp. 70-77). The main point he makes is that different tasks require different functions, and these functions can be considered as consisting of small data processing units. The various units are different with respect to importance for various functional systems, and lesion, malfunctioning, or poor performance of one unit can influence the whole functional system in various ways. This approach could be suitable for computer simulation.

In Kohler's (1964) experiments, the subjects learned relatively quickly to react appropriately to the transformed visual input. The Ultrasonic Torch can be considered as giving a transformed auditory picture of the physical world, but few people learn to decode this information in such a way that it greatly improves their mobility. What applies to the Torch also applies to the Optophone (Clowes, 1966; Riley, 1966). That is, they do not learn to use this information efficiently because it is too difficult to decode with sufficient speed. There is a clear contrast between the performance of the subjects in Kohler's (1964) experiments and the subjects in the experiments where the Ultrasonic Torch was used for mobility purposes.

The subjects in Kohler's (1964) experiments could do things like skiing and bicycling after about four days. These activities are considered rather complex, and require a fair amount of information from the environment. This information had to be processed, coded, or understood forthwith by the subject if these activities were to be performed successfully. No activity of this complexity has been tried, to the present writer's knowledge, using the Ultrasonic Torch for guidance or monitoring of behavior. The most proficient Torch users can walk down the street or on a pavement almost as fast, efficiently, and safely, under normal circumstances, and in a familiar environment, as a sighted person. But these blind people normally have had at least several months' training and experience with the Torch, and only a few seem to reach this stage of skill. One inherent restriction with respect to the Torch is that its maximum range is 30 feet. This probably makes it necessary for a blind person to adopt a different strategy for his mobility behavior than the person who sees the world inverted. One would think that orientation in particular is more difficult with the Ultrasonic Torch, while a person can orient himself or herself almost as easily in spite of seeing the world inverted. The range of sight would be the same as under normal circumstances.

When comparing the Ultrasonic Torch with the Optophone, one gets the impression that the former has been somewhat more successful than the latter. The author has observed only three out of fifteen blind people trained for at least some months in using the Torch who could be characterized as good Torch users, while the rest seemed to be equally well or better off with a cane or a guide dog. Out of another group of 12 visually handicapped people having three lessons with me in Oslo in using the Ultrasonic Torch, one managed after this training to walk along a guideline made up of grass and soft bitumen (the type used on a racing track) as fast and confidently as a sighted person walking with more than average speed. He went blindfolded because he had some vision, and could be characterized as musical, intelligent, average in sport, and open-minded. At least 3 students out of 102 could be categorized the same way. Maximum reading speed with the Optophone after extensive training is low (10 to 25 words-per-minute) (Smith, 1966, p. 367). The impression one gets from the literature is that the only person who seems to use the Optophone extensively and for whom it is a practical proposition is Miss Jameson (Dufton, 1966, pp. 317-407).

One may ask why the subjects in Kohler's (1964) experiments performed better than people using an Ultrasonic Torch. Two points can be made:

1. The recoding of the transformed visual input is a relatively simple process or task. For instance, inversion of the retinal image has no basic consequence for veridical perception (pp. 29-30).

2. The visual areas of the cortex would most probably still have the same good connections with the motor areas relevant for mobility in spite of the transformed visual input.

The first point is probably the most important one. Going back to the auditory input one realizes that two problems have to be overcome. One is to find auditory stimuli that easily form gestalts, and the second is to link these perceptual gestalts to the other structures relevant to a particular function. With respect to the reading function, the problem seems to have been solved in principle, in that reading machines for the blind have so far been most successful when speechlike sounds have been used. The superiority of speech-like sounds over other types of auditory displays has been clearly demonstrated (Clowes, 1966, pp. 344-350).

Consider for a moment that a class of sound stimuli other than speech sounds, such as Morse Code, were used in a language system. That is, are the speech sounds unique with respect to suitability for linking up with the cognitive structures of the brain? Would it be harder to integrate other types of sound structure with the cognitive structure related to language? There is evidence to suggest that this could be realized. One can use the objective language system for reference and meaningful units, despite the varieties of signs, as found in patterns of letters and the Morse Code. The most crucial factor seems to be that a gestalt or unit can be easily formed at the word level. But how would "an Ultrasonic Torch with a spoken word output" be for mobility? This question can actually mean two things. One is a Torch that actually acts like a human being, say the air traffic controller that "talks down" a pilot under poor visibility conditions. This probably conveys to the pilot a totality, as words become translated into the spatial positions of the aircraft. Such a Torch is not a practical proposition. Technology and science have not advanced that far. But say one could construct a type of Torch that could give a speech-like output, that was easy to organize in the same way as speech sounds are easy to organize into words. Would it be easy to link these gestalts up with the appropriate motor areas in the brain relevant for mobility? Would it be possible to get a good dancer to perform the same movements when directed by words instead of music? It seems doubtful. Words seem adequate when it comes to general behavior and to gross aspects of mobility behavior, but when it comes to the finer nuances of movement, music seems to be more effective. One may interpret Luria's (1966, p. 99) writing as support for the present argument, even if Luria seems to have a general tendency to argue that everything can be associated with everything else.
The consequence of the argument put forward here would then be that one should try to make the sound from an Ultrasonic Torch or a corresponding device more like musical sounds, while the present consensus to make reading machines that give speech-like output seems to be a sound one.
 
 

MOBILITY

The discussion above has been general in the sense that it has concentrated on what type of auditory output is most compatible with the human perceptual and motor systems in devices constructed with blind people in mind when these devices are to be used for reading and mobility. It seems reasonable to assume that orientation or keeping direction is an important part of the mobility function. Experimental results (Appendix 2) give some support to this notion. Other researchers in this field have been aware of the problem, and at least three different devices designed for this particular purpose have been constructed. Jacobson's (1963, Vol. 1, pp. 193-197) device was based on a magnetic compass. Swail's (1963, Vol. 1, pp. 199-204) was developed from radio receivers with ferrite rod antennas because of their extreme directivity, and Kohler's (1966, pp. 215-219) used galvanic stimulation that affected the vestibular apparatus. Kohler's (1966) idea in particular seems interesting. His device requires no learning and would probably have an advantage over the other two devices in real space. If the burden of keeping direction is taken away from a blind person, more concentration can be devoted to avoiding obstacles. So it seems quite feasible to use a directional device in connection with the Ultrasonic Torch.

The eyes in a human are so effective in monitoring our behavior, mobility behavior included, that they do this almost always without our awareness. When we walk from A to B we do it automatically and think of other things when we walk. If a blind or blindfolded person tries to do the same with the help of an Ultrasonic Torch, it requires full concentration, and certain things in the way will be uppermost in the mind of the blind traveler. When speaking to blind people one gets the impression that what they fear most is holes in the ground, or that a relatively level surface suddenly goes steeply downwards like a flight of stairs. This would suggest that a device like the Ultrasonic Torch should have special feature detectors built into it. To do this successfully a careful analysis of the physical environment would have to be carried out beforehand to find out what is behaviorally important for safe and reasonably efficient mobility. Besides holes in the ground, one can think of other things that it would be important to detect if safe travel is the objective: fast moving objects such as cars, motorcyclists, and bicyclists. Sharp edges and low obstacles are also often things that could endanger travel. It is important that the whole physical environment relevant for traveling be considered as one system, and that "the Ultrasonic Torch system" should be constructed with this in mind.

The point of view that the environment or ecology should be systematically studied is expressed in a general way by Brunswik (1966, pp. 510-511): "I have advocated that in psychology research not only individuals be representatively sampled from well-defined "populations" but also stimulus situations from well-defined natural cultural "ecologies" (Brunswik, 1947; 1949); only by such "representative design" of experiments can ecological generalizability of functional regularities of behavior and adaptation be ascertained. Representative sampling of situations from the ecology allows us to take cognizance of the occasional major failures that result from the fallibility of perceptual cues or behavioral means while at the same time fully recognizing the favorable cases also. Generalization of the achieved degree of success to the ecology as a whole becomes possible with the use of the routine technical criteria for sampling statistics hitherto confined to differential psychology."

How is sampling of the environment to be performed? Brunswik in an early size-constancy experiment gave the following indication, (1956, p. 44): " . . . randomly sampled from the normal environment of a university student, stopped in her daily routine. . . to write down her estimates of the extension that happened to be most prominently attended to by her as "figure" in her visual field of the moment, as well as of other elements of the situation, shifting from one of five attitudes to another."

How useful would this approach be when applied to a blind person? The basic notion of sampling the physical surroundings in which the blind person is traveling seems sound, but simply to stop the blind traveler at random time intervals seems rather pointless. It seems necessary to consider not only the actual travel paths of a blind person, which often are very restricted, but what their future travel paths would be. That is, what should be considered is not only what the blind person is doing, but what he or she would like to do if they could do it safely and with confidence. Conceivably, one could follow a blind person traveling under various conditions deemed representative of the type of traveling relevant for that particular person, to see what type of problems he got into, and what would be needed to solve them. For instance, a different type of traveling aid system might be needed for a person living in the country as compared with one living in a city. A farmer who wanted to detect his animals could have a Torch with a built-in infrared sensing mechanism, and one who traveled a lot in city areas could have a device that could detect "walk" at traffic lights by being able to discriminate between green and red.

The general approach and consideration when building sensing devices and other aids for the blind should be that man and the environment should be considered as two interacting parts in an overall system.

Technological and scientific developments have made many of the things just discussed potentially possible. For instance, it is essential that the minicomputers that seem necessary for partial processing of incoming data be small. An indication of how small they may be built is given in this excerpt from The Australian: "Theoretically, it is possible to store one million bits an inch by magnetic bubble techniques, but the present constraints are how to make the host material, garnet, large enough and defect-free" (Bennett, 1969). Small and efficiently integrated microcircuits (which consist of, for instance, AND and OR gates, which can process information in less than one millionth of a second, and which have dropped dramatically in price in the last few years, due to mass production) make minicomputers which can be used in connection with mobility aids for blind people a feasible proposition. The system of detectors and minicomputer(s) should be built in a hierarchical way, so that configurations in the environment, such as a sudden drop, would give a clear signal that could not be ignored because it represented a potential physical danger to the blind person and would make him or her stop immediately. This signal could then be given top priority. Hopefully a blind person may one day put on a suit that would be a combination of a sensing device and a minicomputer.
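
The hierarchical arrangement suggested here can be illustrated by a small sketch in which several detectors report at once and only the most urgent report is passed on to the traveler. This is an illustration under stated assumptions, not a design: the detector names are taken from the hazards mentioned in the discussion, but the priority ranking below the sudden drop, the tie-breaking by distance, and all identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical ranking; a lower number means a more urgent signal.
# Only the top place of the sudden drop follows from the text.
PRIORITY = {
    "sudden_drop": 0,
    "fast_moving_object": 1,
    "sharp_edge": 2,
    "low_obstacle": 2,
    "direction_cue": 3,
}


@dataclass(frozen=True)
class Detection:
    kind: str           # which feature detector fired
    distance_ft: float  # estimated distance to the feature


def select_signal(detections: List[Detection]) -> Optional[Detection]:
    """Return the one detection to present to the traveler.

    The most urgent kind wins; among equally urgent kinds the nearer
    feature wins. Unknown kinds are ignored.
    """
    known = [d for d in detections if d.kind in PRIORITY]
    if not known:
        return None
    return min(known, key=lambda d: (PRIORITY[d.kind], d.distance_ft))


readings = [
    Detection("low_obstacle", 3.0),
    Detection("direction_cue", 0.0),
    Detection("sudden_drop", 8.0),
]
print(select_signal(readings))
```

In this sketch the sudden drop is reported even though the low obstacle is nearer, which is the behavior the hierarchical principle asks for.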
 
 

COMPARISON OF SENSE MODALITIES

One very important problem to consider is what sensory system can best replace the visual system. The Ultrasonic Torch can give a resolution power of about 1° within a range of about 10 feet (Kay, 1963, p. 152).

As mentioned earlier one can also detect if an object is approaching or going away, down to an accuracy of 1/4" under optimal conditions. This would correspond to a resolution power of about seven minutes. The eye is considered to have a resolution power of one minute, but astronauts have reported seeing details on earth that would require a far greater resolution power than expected, which indicates that the method and the form of the stimulation are important variables. The Ultrasonic Torch could provide a rather detailed picture of the physical world if the subject could use all this information, but it is quite clear that only a fraction of this information is used because the sound is not compatible with the human auditory system. Starkiewicz and Kuliszewski (1963, pp. 157-166) experimented with an Elektroftalm. "The Elektroftalm is an apparatus to enable the blind to recognize objects in their surroundings through the use of the action of light rays coming from the objects upon photosensitive elements." The stimuli were tactile elements operating on a pressure principle, so that increasing light intensity was converted into proportionally stronger pressure. They used 80 elements and reported a resolution power of 2°, but because of the deterioration of part of the equipment, no clear indication of how easily the subjects formed images was reported. Muratov (1965) used photoelectric devices equipped with sound signalization with some degree of success. Bach-y-Rita (1972) used a TV camera and tactile vibrators in a 20 x 20 display placed on the subject's back. Results indicated that the subjects learned to form images rather quickly. What would be the information processing capacity of the skin when a display such as the one just described is used?
Given that the skin can discriminate between five different frequency levels and five different intensity levels, and that all the 400 spots could be separated from each other, the information capacity would be (log2 5 + log2 5) x 400 = 1840 bits. If the same assumption is made as was for the ear, that is, four discriminations-per-second, one gets 1840 bits x 4 = 7360 bits/sec. The calculated capacity is close to the information capacity of the ear, but far from that of the eye. The same system, with 400 vibrators, would have a resolution power of about 1° if it covered an angle of 20°. Does the skin have a greater capacity than indicated by the figure just mentioned? "We have found that experienced subjects can resolve stimulator tips spaced between 5 and 10 mm on the trunk, and closer elsewhere. This indicates that over 10,000 points may be available on the approximately 4000 cm^2 area of skin of the trunk. This would permit 100-line television picture projection, providing a relatively high resolution display" (Collins, 1971, p. 2). On the same premises as before, this will give a capacity of 7360 x 10,000/400 = 244,000 bits/sec. The area of skin of the whole body is approximately 18,000 cm^2 (Montague, 1971, p. 3). Using this whole area on the same premises as before will give 244,000 x 18,000/4,000 = 1.098 x 10^6 bits/sec. Now we get near the capacity of the eye (Jacobson, 1951a), and this figure for the skin also stands in a reasonable proportion to its capacity if one considers that the number of sensory fibers from the skin entering the spinal cord by the posterior roots is well over half a million (Montague, 1971, p. 3). It may be worth noting that by presentations of patterns, the acuity of the skin area can be greater by a factor of 10 compared with two-point limen studies (Bach-y-Rita, 1972, p. 16). Miller (1956, p. 89) reported some results from an experiment by Geldard, who measured the channel capacity by placing vibrators on the chest region.
According to this experiment a good observer could identify about four intensities, about five durations and about seven locations. This would give [(log2 4 + log2 5) x 7] bits = 30 bits if each point or location could transmit information independently of another point. Otherwise, if they were interdependent, that is, if only one point could receive at any specific time, the information capacity would be log2 (4 x 5 x 7) bits = 7.13 bits.
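
The capacity estimate for a tactile display can be checked with a short calculation. The sketch below evaluates the stated premises (independent points, five frequency and five intensity levels per point, four discriminations per second) without rounding, and also converts a 1/4" displacement into minutes of arc. The function names are hypothetical, and the viewing distance of 10 feet used for the angle is an assumption carried over from the range quoted for the Torch. Evaluated this way, the first two steps come out slightly above the printed 1840 bits and 7360 bits/sec, which correspond to taking log2 5 as 2.3, and straightforward proportional scaling does not reproduce the printed 244,000 bits/sec (7360 x 10,000/400 is 184,000). The sketch should therefore be read as a check on the premises rather than as a reproduction of the printed totals.

```python
import math


def bits_per_frame(levels_per_dimension, points):
    """Bits in one display frame if every point is independent."""
    bits_per_point = sum(math.log2(n) for n in levels_per_dimension)
    return bits_per_point * points


def arcminutes(size, distance):
    """Angle, in minutes of arc, subtended by `size` at `distance`."""
    return math.degrees(math.atan2(size, distance)) * 60


FRAMES_PER_SEC = 4  # four discriminations per second, as assumed for the ear

frame_400 = bits_per_frame([5, 5], 400)      # 20 x 20 display on the back
rate_400 = frame_400 * FRAMES_PER_SEC
rate_trunk = rate_400 * 10_000 / 400         # 10,000 points on the trunk
rate_body = rate_trunk * 18_000 / 4_000      # scaled to the whole skin area

# 1/4 inch at 10 feet (120 inches); the 10-foot distance is an assumption.
torch_minutes = arcminutes(0.25, 120.0)

for name, value in [("bits per frame, 400 points", frame_400),
                    ("bits/sec, 400 points", rate_400),
                    ("bits/sec, trunk", rate_trunk),
                    ("bits/sec, whole body", rate_body)]:
    print(f"{name}: {value:,.0f}")
print(f"1/4 inch at 10 feet: {torch_minutes:.1f} minutes of arc")
```

On these premises the whole-body figure is a little over 800,000 bits/sec, which is of the same order of magnitude as the printed figure, and the angle comes out at about seven minutes of arc.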

As one of the dimensions was duration, and the typical "duration period" is 0.1 to 0.5 seconds in this type of experiment (see Hahn, 1963, Vol. II, p. 178), one would not expect more than 60 bits at the most to be transmitted per second. Using frequency as a fourth dimension, and granted that a person could identify any one of five frequencies correctly, would give (log2 4 + log2 5 + log2 5) x 7 bits = 46.51 bits of information under the most favorable assumption. The 7.13 bits seem to correspond more to what the skin can identify, while 1.098 x 10⁶ bits/sec corresponds more closely to what the skin can transmit and is based on JNDs. The reason for this type of discrepant figure has been discussed earlier.
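The figures derived from Geldard's experiment can be checked in the same way. The sketch below contrasts the two assumptions made above: locations that transmit independently (bits add across locations) and locations of which only one is active at a time (location is just one more dimension). The function names are illustrative, and reading the 60 bits/sec ceiling as "one 30-bit stimulus every 0.5 seconds" is an assumption about how that figure was reached.

```python
import math

def independent_bits(levels, locations):
    """Every location carries its own message, so bits add across locations."""
    return sum(math.log2(n) for n in levels) * locations

def interdependent_bits(levels, locations):
    """Only one location is active at a time; location is one more dimension."""
    return math.log2(math.prod(levels) * locations)

# Geldard's observer: 4 intensities, 5 durations, 7 locations.
three_dim_independent = independent_bits([4, 5], 7)       # about 30.25 bits
three_dim_single = interdependent_bits([4, 5], 7)         # about 7.13 bits

# Adding frequency (5 levels) as a fourth dimension.
four_dim_independent = independent_bits([4, 5, 5], 7)     # about 46.51 bits

# Assumed reading of the per-second ceiling: one stimulus per 0.5 sec.
longest_duration_sec = 0.5
rate = three_dim_independent / longest_duration_sec       # about 60 bits/sec

print(f"{three_dim_independent:.2f} {three_dim_single:.2f}")
print(f"{four_dim_independent:.2f} {rate:.1f}")
```

The gap between roughly 30 bits and roughly 7 bits for the same observer shows how strongly the estimate depends on whether the stimulated points are treated as independent channels.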

When considering how the visual system can best be replaced, it looks as though the information capacity as expressed by Jacobson (1950, 1951, 1951a) is a less crucial question than how easily a person can learn to form images or gestalts of the information he receives. It is of course necessary that the various senses can make enough discriminations and have adequate resolution power, but the usefulness of this data is limited if a person cannot combine them into meaningful units or gestalts within a reasonable period of time. It is rather easy to get a fairly clear idea of how fine a discrimination we can make, and satisfactory data is available in this respect. But when it comes to the question of how we organize these primitive units, rather little is known. The type of knowledge we do have is mostly based on indirect data.

With respect to those aspects of the physical world important for mobility, the cutaneous modality seems to have some advantage over the auditory sense. Why? Vision and the sense of touch are in constant interaction, have much the same sort of relation to the physical world, and both have to relay precise information about it to the brain. Audition does not give any direct, precise information about physical objects. However, one difference between vision and touch is that the tactile system is only concerned with things that are close to us and of a limited size, while the eyes are involved with both near and far objects of various sizes.

Therefore, it may be that the skin is relatively good for forming images, but not very good at judging the distance of various objects on the same basis as the eye, for example, with cues such as "overlapping." A display on the skin that signaled the distance of an object along the intensity dimension, and light as high-frequency signals and dark as low-frequency signals, seems to be one approach that may be worth following up. To do this one could use an Ultrasonic Torch device or laser beams for collecting distance information, and a device like that of Bach-y-Rita (p. 261) for image formation. How feasible such devices will be in the foreseeable future will depend on developments of the types discussed. With respect to reading, the auditory sense seems most suitable seen against the background of developments in this field (Clark, 1963, Vol. 1, pp. 205-479; and Dufton, 1966, pp. 317-407).
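The display code suggested above (distance along the intensity dimension, light and dark along the frequency dimension) can be made concrete with a small sketch. Everything numerical here is a hypothetical choice for illustration only: the 6 m working range, the 50 to 400 Hz frequency band, and the decision to make nearer objects feel stronger are assumptions, not values from the text.

```python
def encode_point(distance_m, brightness,
                 max_range_m=6.0, f_low_hz=50.0, f_high_hz=400.0):
    """Map one picture point to a (relative intensity, frequency in Hz) pair.

    distance_m : distance to the object (hypothetical range sensor reading)
    brightness : 0.0 for dark up to 1.0 for light
    """
    # Clamp inputs to the assumed working ranges.
    d = min(max(distance_m, 0.0), max_range_m)
    b = min(max(brightness, 0.0), 1.0)

    # Assumption: nearer objects are signaled with stronger vibration.
    intensity = 1.0 - d / max_range_m

    # Light maps to high frequency, dark to low frequency.
    frequency_hz = f_low_hz + b * (f_high_hz - f_low_hz)
    return intensity, frequency_hz

# A bright object at mid range and a dark object at the edge of the range.
print(encode_point(3.0, 0.5))
print(encode_point(6.0, 0.0))
```

Whether a continuous mapping of this kind or a small number of discrete steps (in line with the four or five identifiable levels discussed earlier) would be easier to learn is exactly the kind of question the text leaves to experiment.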

The various human senses have far greater capacity to make discriminations than to organize them in a way that puts them at one's command. Even if man is far better at perceptual and cognitive learning than any other animal, there are still very clear limitations to the extent to which we can organize or form gestalts of the sensory stimuli that impinge upon us. Each sense modality has its own area of strength, in which it is very good at organizing sensory stimuli. Psychologists, among others, are interested in finding out what the underlying mechanisms are for the organizing process. This knowledge and recent developments of fast microcircuits and computer science may make it possible to build devices that do part of the organizing process, and deliver the partly processed stimulus constellation to the sensory modality best suited to decode this particular product.

One possible approach to get a better idea about the potential of the various senses to organize incoming information would be to look at the size of the primary reception area in the brain of a particular sense. One could hypothesize that the bigger the area, the more data processing could be done. Or in other words, the better the sense would be in learning to form gestalts. One weakness with this particular approach could be that one does not know enough about how various senses or areas in the brain cooperate or interact. Another unknown is how much data processing is taking place in the peripheral system. At present it looks as if the psychological approach, which also seems to be the simplest and most straightforward, is the most fruitful. The cutaneous sense, in particular, seems to need more research attention from the point of view of organizing ability with respect to physical stimuli that impinge upon the skin in various patterns. It is still too early to say anything definite about how the skin will compare with the ear as a replacement for the visual system, but if it turns out to be at least as effective as the ear, it would be an advantage, and recent evidence (Bach-y-Rita, 1972) points in that direction. In this way, the ear could be free for other essential functions, but it could also very well be that the two senses could be given complementary functions in regard to mobility.
 
 

DIRECT STIMULATION OF THE BRAIN

What type of experience is likely to arise if the visual cortex is stimulated? Penfield and Roberts' (1959, p. 31) work gives some indication: "Stimulation of Brodmann's area 17, which forms the banks of the calcarine fissure, causes the patient to see lights, shadows, colors usually moving or twinkling in the visual field." The people referred to here were sighted subjects. What about blind people? Shipley (1963, p. 249) has discussed this: ". . . even in long-term blindness cortical phosphenes can be generated by direct input signals." Later he says:

"Phosphenes in themselves do not constitute vision, though they may be visually exciting and psychologically reassuring to the blind. It is unwise to delude ourselves that they do. Visual forms cannot be evolved from gleanings of meaningless stimuli. The way to study form perception is by use of forms. The gestalt psychologists have shown long ago that forms can be elaborated neither phenomenologically nor physiologically by the mere adding up of raw elements (flashes)."
Forms are emergents above all else.

In 1953 Krieg first suggested that artificial visual perception in blind persons could be achieved through direct electrical stimulation of central visual structures. He pointed out that a spatial map of the external field of vision is projected onto the visual cortex of the brain in such a way as to permit a crude type of form perception by patterned electrical stimulation of the cortical surface. (Sterling, Bering, Pollack, and Vaughn, 1971, p. 1.)

The most important finding of Brindley and Lewin (1971), and Brindley (1971), using a 52-year-old female with 80 electrodes implanted into her visual cortex, was that she reported seeing light or phosphenes in connection with 39 of them. They were commonly a single very small spot of white light at a constant position in the visual field, but for some electrodes it was two or several such spots, or a small cloud. The authors found the results promising for the prospect of making a future visual prosthesis.

The results seem more promising than most people probably would have predicted beforehand, but it still seems safe to conclude that giving a blind person sight by direct electrical stimulation of the visual cortex belongs, at best, to the distant future.
 

THE ROLE OF THE BRAIN

The mind-body problem is a very old one. One consequence of this dichotomy is that the type of research that follows from this conceptualization tends to correlate experience with a physiological event that can be observed directly or indirectly. In other words, one tries to relate a physical event to a psychological event and considers this as a proper and ultimate goal in itself. Is this a fruitful approach? Sperry (1964, p. 410) for one argues that one should look at the problem differently, and sums up his point of view on this matter:

"Utilization of this motor approach immediately helps us to view the brain objectively for what it is, namely, a mechanism for governing motor activity. Its primary function is essentially the transforming of sensory patterns into patterns of motor coordination. Herein lies a fundamental basis for the interpretation, direct or indirect, of all higher brain functions." He also says that we need to look into the relationship between the sensory-associative functions of the brain in relation to its motor activity, and analogizes saying that the output from a machine is usually more revealing of the internal organization than the input.

Sperry's point of view is brought forward for two reasons: one is its relation to the discussion above about the necessity of linking up the sensory-associative area with the motor area in the brain. It seems to be of some importance to study these links. The second reason is related but more general, and can be tied in with the discussion above. It is important to analyze the environment with respect to what is behaviorally important, but it is also important to know how to innervate the muscle action necessary, e.g., to avoid an obstacle. It may be that a vibrator or another form of stimulation on the ankle would be more efficient in signaling a curb and getting a blind person to lift his feet than the same signal transmitted to the ear. For one thing, the subject would probably be much more quickly aware of what part of the body was in a sort of danger and could take appropriate action accordingly. Stimulation or signaling of this nature would seem to be more effective than, say, auditory stimulation because it seems to be more compatible or in line with a person's "natural" behavior.

The notion that we can learn something about perception by looking at our environment and finding out what is behaviorally important, and the notion that we can learn from motor activity by looking at what sensory information the brain needs for making the right types of movements or motor acts, have this in common: they point to perception as having a functional role. This point ought to be kept in mind when one tries to create devices that can give the blind information about the "visual world", but at the same time one must be aware of the strengths and weaknesses of the various sense modalities.
 
 

FUTURE RESEARCH

This problem has been implicit and discussed in most of the last section, but it may be worthwhile to be explicit about what the author sees as the most fruitful approach in more general terms and briefly take up things that have not been mentioned earlier.

The various disciplines or subjects seem to be rather strongly compartmentalized, and people working in one area seem to know little about what research workers in another area are doing. If one wants to improve blind people's mobility, and capacity to receive information from print, one should first gather together all the people who could make a contribution to a solution of these problems: psychologists, computer scientists, engineers, physiologists, doctors, architects, town planners, and politicians, among others.

The guidelines with respect to how information can be extracted from print seem to have been laid down along sound lines, and most of the problems in this area can be solved in principle. It is therefore more a matter of allocation of money and building of more efficient and cheaper pattern recognizer devices, cheaper computers and better computer programs (Clark, 1963, Vol. I, pp. 205-468, and Dufton, 1966, pp. 317-387).

With respect to mobility, we are far from any satisfactory solution. Generally, it seems more worthwhile to direct efforts toward making devices that are compatible with the human sensory system than toward finding out who will be good at using specific devices such as Kay's Ultrasonic Torch, or how to train people to use them in the most efficient way. One would think that much of the experience obtained in connection with space research could be useful, both with respect to specific results and with respect to how space researchers go about solving their problems. Basically, the astronauts have to interact in a precise manner with the environment via their space capsule. Direct use of their senses to solve problems is limited. An astronaut has to rely heavily on instruments and his machine if he is to be successful. A lot of data processing has to be done for him by computers. So one may consider space research as one area concerned with man-machine-environment interaction, and one can consider the mobility problem of the blind from the same point of view.

Although the Ultrasonic Torch resulted mainly from the efforts of one man, Leslie Kay, it is worthwhile to look into what is being done in space research. Far more money is available for this than for research toward improving the mobility of blind people. The allocation of large amounts of money and highly qualified researchers in space science would lead one to expect findings that are useful in other areas with similar problems to solve, such as blind mobility.

With respect to future research one might try to solve the mobility problem gradually by dividing it up. First, one should equip a blind person with a device that will help him to stay on course and observe whether that improves his mobility significantly. Second, one could gradually introduce devices that detect behaviorally important or dangerous features of environments. When giving information about the potentially dangerous obstacles one would have to find a code or set of codes that could easily be understood by the blind. Much research is needed to alleviate this problem. Research with respect to auditory stimuli might also contribute to development of pattern recognition devices for speech. To find out how the auditory system compares with the cutaneous sense, and how they can interact or complement each other, deserves serious attention. Interaction with the environment involves not only an understanding of the incoming information, but also the capacity to react adequately to it. This is a problem somewhat difficult to approach experimentally, but not impossible, and may deserve some attention. Presenting various classes of sounds constructed according to various models may help us to develop better theories about how we process auditory information. Computer simulation and generation of sound patterns by the computer seem to be one promising technique. Confusion matrices might be a useful tool in this context.
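As a minimal sketch of how a confusion matrix could be put to use here, the function below estimates the information transmitted (in bits) from a table of stimulus-by-response counts. This is the standard information-theoretic measure behind channel-capacity figures of the kind Miller (1956) reports; the function name and the example counts are invented for illustration and are not data from the experiments described in this article.

```python
import math

def transmitted_information(confusion):
    """Estimate transmitted information in bits.

    confusion[i][j] is the number of trials on which stimulus i
    was presented and response j was given.
    """
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]          # stimulus totals
    col_sums = [sum(col) for col in zip(*confusion)]    # response totals

    info = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count:
                p = count / total
                # p_ij / (p_i * p_j), written with raw counts
                ratio = p * total * total / (row_sums[i] * col_sums[j])
                info += p * math.log2(ratio)
    return info

# Invented example: four sound classes identified without error,
# then two classes that listeners cannot tell apart at all.
perfect = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
chance = [[5, 5], [5, 5]]
print(transmitted_information(perfect))
print(transmitted_information(chance))
```

Perfect identification of four equally likely sound classes transmits two bits, and pure guessing transmits none; real data would fall in between, and the off-diagonal cells show which classes of sounds are being confused with which.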

Further, is it of any value or help to hear a class of auditory stimuli without feedback if one has to learn this class of stimuli later? An answer or some clues to this question would be of value both for language teachers and mobility instructors training blind people to use the Ultrasonic Torch. My opinion is that giving a class of stimuli without some form of feedback, which helps to structure it, would be rather useless. Moreover, some evidence seems to exist that direct feedback is more effective than verbal feedback. This statement is based on experimental results (Appendix 1), observation of blind subjects, and the results of Riley's (1966) study, which seemed to indicate that an active person, interacting with the environment in a more vigorous way than the average person, would be a better Torch user.

There is no clear or fast way to go about solving the problems in this area, but one would believe that a world with a science and technology that can take detailed pictures of the planet Mars should also be capable of giving a blind person vital information which other people receive visually in a code that he or she can easily learn. It seems desirable that people focusing their attention on man, psychologists, for instance, should work more closely together with experts on machines (computers, electronic devices, etc.). Practical results such as aids for the blind, and theoretical results such as greater understanding of our perceptual systems and the brain, will follow more quickly and in more depth if research workers in one area are informed of what their colleagues in other areas are doing. Information theory seems to have concepts that can bridge the conceptual gap between various disciplines, and the computer seems to be one of the most useful tools for trying out or testing theories in this area. One should keep in mind that man is the crucial element; both practically and theoretically oriented research will most probably lead to greater understanding of man, hopefully to his advantage.
 
 

SUMMARY AND CONCLUSIONS

This analysis began with the notion that perception is a matter of collecting and processing information. Various animals interact with the environment through different sense modalities. The bat relies almost exclusively on the auditory sense, the rattlesnake has an infrared sensing mechanism, and the torpedo fish sends out field-producing currents and has a skin that can detect variation in the field strength. The perceptual system of each of these animals seems to be particularly suited to coping with or processing its own type of information. The information capacities of the human visual and auditory systems are also compared. In trying to analyze the information, where concepts from information theory are used, one encounters the problem of what the primitive or basic unit is on which the analysis can be built. Results and ideas from psychophysiological studies, behavior studies, stabilized vision studies, and computer simulation are briefly discussed. These studies point to certain units that may be useful as "atoms." To process or organize the "primitives" is an enormous task, and in humans it is quite clear that the eye is excellent at organizing the physical "visual" world around us, and the auditory modality is especially suited to process speech sounds and musical sounds. The sounds from an Optophone, in contrast, are very hard to process or "gestalt."

In an experiment (Appendix 1), a third class of sounds was introduced, that is, sound from an oscilloscope. This sound could be characterized as a "clear" sound increasing and decreasing along the frequency dimension. Ability to process this sound correlated clearly (about 0.65) with musical aptitude. One common and important factor seems to be the ability to analyze whether a frequency is increasing or decreasing. It was expected that this ability would be pronounced and important in another experiment (Appendix 2) also, but this turned out not to be the case. Two factors may explain it:

1. The sound from the Ultrasonic Torch, which is pulsating and varied in the experiment along the frequency dimension, was hard to analyze because it was not clear; for instance, many subjects seemed to confuse increase or decrease in frequency with increase or decrease in loudness.

2. Motor behavior had to be coordinated with the incoming information, and processing of the sound from the torch was only one part of the mobility task.

Blind subjects (N = 11) performed better on two out of three aspects of the mobility task, that is, keeping direction and speed of walking, but walked into just as many objects as sighted, blindfolded subjects.

It may be concluded that if one wishes to build devices that transform "visual" information into other sense modalities, one would have to partly process the raw data before they are "delivered," and furthermore, explore what sense modality is best suited to a particular type of information. In this context it should be pointed out that it is important to look at a human in interaction with the environment as one system, that is, the type of things in the environment that need to be detected and effectively processed if a blind person is to cope with the world around him. It may also be useful to consider perception as the type of information needed to initiate a certain type of motor behavior that is behaviorally important.

Modern technology and science make the suggestions put forward in this discussion realistic, but research and progress in this area seem mainly dependent on allocation of resources and coordination of the efforts of people working in various fields such as electronics, computer science, physiology, and psychology. The pioneering work of L. Kay, P. Bach-y-Rita, J. C. Bliss, and G. S. Brindley, to mention a few, shows what can be done with modest resources by a single devoted and highly qualified person.

References: Some new references can be found in the articles "Blind Mobility and Flying IFR" and "Blindness and Cognition."

Tables with main results
 

Bjarne Fjeldsenden 06.04.2000