你又在摸鱼了's notes on Vision (8)

Vision
  • Title: Vision
  • Author: David Marr
  • Subtitle: A Computational Investigation into the Human Representation and Processing of Visual Information
  • Pages: 432
  • Publisher: The MIT Press
  • Publication date: 2010-7-9
  • Page 3, General Introduction
    For if we are capable of knowing what is where in the world, our brains must somehow be capable of representing this information -- in all its profusion of color and form, beauty, motion, and detail. The study of vision must therefore include not only the study of how to extract from images the various aspects of the world that are useful to us, but also an inquiry into the nature of the internal representations by which we capture this information and thus make it available as a basis for decisions about our thoughts and actions. This duality -- the representation and the processing of information -- lies at the heart of most information-processing tasks and will profoundly shape our investigation of the particular problems posed by vision.
    Quoted from General Introduction

    2017-10-04 08:16:57
  • Page 5, General Introduction
    For the subject of vision, there is no single equation or view that explains everything. Each problem has to be addressed from several points of view -- as a problem in representing information, as a computation capable of deriving that representation, and as a problem in the architecture of a computer capable of carrying out both things quickly and reliably.
    Quoted from General Introduction
    2017-10-04 08:42:13
  • Page 21
    If one chooses the Arabic numeral representation, it is easy to discover whether a number is a power of 10 but difficult to discover whether it is a power of 2. If one chooses the binary representation, the situation is reversed. Thus, there is a trade-off; any particular representation makes certain information explicit at the expense of information that is pushed into the background and may be quite hard to recover. This issue is important, because how information is represented can greatly affect how easy it is to do different things with it. This is evident even from our numbers example: It is easy to add, to subtract, and even to multiply if the Arabic or binary representations are used, but it is not at all easy to do these things -- especially multiplication -- with Roman numerals. This is a key reason why the Roman culture failed to develop mathematics in the way the earlier Arabic cultures had.
    Quoted from page 21
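A small Python sketch of the trade-off described in this quotation, assuming non-negative integers; all function names are hypothetical. The same pure pattern test ("1" followed only by zeros) answers "is this a power of 10?" when applied to the decimal string and "is this a power of 2?" when applied to the binary string. Asked of the decimal string, the power-of-2 question has no such shortcut, so the information has to be recovered by arithmetic.

```python
def one_then_zeros(digits: str) -> bool:
    """Pure pattern test: '1' followed only by '0's. No arithmetic involved."""
    return digits[:1] == "1" and set(digits[1:]) <= {"0"}


def is_power_of_10(n: int) -> bool:
    # Explicit in decimal: just read the digit string.
    return one_then_zeros(str(n))


def is_power_of_2(n: int) -> bool:
    # Explicit in binary: the same pattern test on the base-2 string.
    return one_then_zeros(format(n, "b"))


def is_power_of_2_from_decimal_digits(digits: str) -> bool:
    # Asked of decimal digits, the question has no pattern shortcut:
    # the information is "in the background" and must be recovered by repeated halving.
    n = int(digits)
    if n < 1:
        return False
    while n % 2 == 0:
        n //= 2
    return n == 1
```

For example, 1024 is "1024" in decimal (no visible pattern) but "10000000000" in binary, while 1000 is "1000" in decimal and "1111101000" in binary.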
    2017-10-08 00:46:30
  • Page 27, Importance of Computational Theory

    The reason for this is that the nature of the computations that underlie perception depends more upon the computational problems that have to be solved than upon the particular hardware in which their solutions are implemented. To phrase the matter another way, an algorithm is likely to be understood more readily by understanding the nature of the problem being solved than by examining the mechanism (and the hardware) in which it is embodied.

    In a similar vein, trying to understand perception by studying only neurons is like trying to understand bird flight by studying only feathers: It just cannot be done. In order to understand bird flight, we have to understand aerodynamics; only then do the structure of feathers and the different shapes of birds' wings make sense. More to the point, as we shall see, we cannot understand why retinal ganglion cells and lateral geniculate neurons have the receptive fields they do just by studying their anatomy and physiology.

    Failure to recognize this theoretical distinction between what and how also greatly hampered communication between the fields of AI and linguistics. Chomsky's (1965) theory of transformational grammar is a true computational theory in the sense defined earlier. It is concerned solely with specifying what the syntactic decomposition of an English sentence should be, and not at all with how that decomposition should be achieved.
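A toy illustration of the what/how distinction in Python. The task (integer addition) and every name here are my own assumptions, not taken from the quoted passage. `satisfies_addition_theory` states what is computed, as properties any correct adder must have; `ripple_add` (a fixed-width 32-bit bitwise adder, roughly the way a circuit might do it) and `tally_add` (counting one unit at a time) are two different hows. Both pass the same check, and the check says nothing about which mechanism is inside.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit "hardware" word


def satisfies_addition_theory(add, samples) -> bool:
    """The 'what': properties any correct adder must have, whatever its mechanism."""
    for a in samples:
        if add(a, 0) != a:                     # adding zero changes nothing
            return False
        if add(a, -a) != 0:                    # every number can be undone
            return False
        for b in samples:
            if add(a, b) != add(b, a):         # order does not matter
                return False
            for c in samples:
                if add(add(a, b), c) != add(a, add(b, c)):  # grouping does not matter
                    return False
    return True


def ripple_add(a: int, b: int) -> int:
    """One 'how': XOR gives the sum bits, AND-then-shift gives the carries (32-bit)."""
    a, b = a & MASK, b & MASK
    while b:
        a, b = (a ^ b) & MASK, ((a & b) << 1) & MASK
    return a - (1 << 32) if a & 0x80000000 else a   # reinterpret as signed


def tally_add(a: int, b: int) -> int:
    """Another 'how': count one unit at a time, up or down."""
    step = 1 if b > 0 else -1
    for _ in range(abs(b)):
        a += step
    return a
```

A mechanism that computes something else, such as `lambda a, b: a - b`, fails the same check, which is the sense in which the computational theory constrains every possible algorithm without choosing among them.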

    2019-07-25 00:24:29
  • Page 29, The Approach of J.J. Gibson

    In perception, perhaps the nearest anyone came to the level of computational theory was Gibson (1966). Gibson's important contribution was to take the debate away from the philosophical considerations of sense-data and the affective qualities of sensation and to note instead that the important thing about the senses is that they are channels for perception of the real world outside or, in the case of vision, of the visible surfaces. He therefore asked the critically important question: How does one obtain constant perceptions in everyday life on the basis of continually changing sensations? This is exactly the right question, showing that Gibson correctly regarded the problem of perception as that of recovering from sensory information "valid" properties of the external world. His problem was that he had a much oversimplified view of how this should be done. His approach led him to consider higher-order variables -- stimulus energy, ratios, proportions, and so on -- as "invariants" of the movement of an observer and of changes in stimulation intensity.

    "These invariants correspond to permanent properties of the environment. They constitute, therefore, information about the permanent environment." This led him to a view in which the function of the brain was to "detect invariants" despite changes in "sensations" of light, pressure, or loudness of sound. Thus, he says that the "function of the brain, when looped with its perceptual organs, is not to decode signals, nor to interpret messages, nor to accept images, nor to organize the sensory input or to process the data, in modern terminology. It is to seek and extract information about the environment from the flowing array of ambient energy," and he thought of the nervous system as in some way "resonating" to these invariants.

    2019-07-25 01:35:44
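A minimal Python sketch of a ratio-type invariant of the kind mentioned in the Gibson note, assuming a toy image model in which the light reaching the eye from a patch is its reflectance multiplied by the illumination intensity (an assumption for illustration, not Gibson's or Marr's formulation). Raw intensities change when the illumination changes, but ratios between neighboring patches do not. In this toy the invariant falls out of a single division; the passage's criticism is that Gibson's view of how such recovery is done was much oversimplified.

```python
import math


def luminances(reflectances, illumination):
    # Toy image model (an assumption): light from each patch = reflectance * illumination.
    return [r * illumination for r in reflectances]


def adjacent_ratios(values):
    # Ratio of each patch to its left neighbor.
    return [b / a for a, b in zip(values, values[1:])]


surface = [0.2, 0.5, 0.8, 0.4]          # hypothetical reflectances of four adjacent patches
dim = luminances(surface, 10.0)          # same surfaces under weak light
bright = luminances(surface, 250.0)      # ... and under strong light

# The raw "sensations" differ, but the ratios (2.5, 1.6, 0.5) are unchanged.
same = all(math.isclose(x, y) for x, y in zip(adjacent_ratios(dim), adjacent_ratios(bright)))
```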
  • Page 37, A Representational Framework for Vision

    Representational framework for deriving shape information from images
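A rough sketch, as Python data types, of the representations summarized in the Synopsis note (image, primal sketch, 2½-D sketch, 3-D model). All field names and the stick-figure example are hypothetical illustrations, not Marr's notation, and the numbers are made up. What the types try to capture is the change of frame and organization: the image, primal sketch, and 2½-D sketch are tied to image positions (the 2½-D sketch explicitly viewer-centered), while the 3-D model is an object-centered hierarchy of volumetric primitives of different sizes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point2 = Tuple[float, float]            # image coordinates (x, y)
Vec3 = Tuple[float, float, float]       # a 3-D direction, e.g. a surface normal


@dataclass
class Image:
    intensity: List[List[float]]        # I(x, y): intensity value at each image point


@dataclass
class PrimalSketch:
    # 2-D image properties: where intensity changes occur and their local geometry.
    edge_segments: List[Tuple[Point2, Point2]]          # e.g. zero-crossing segments
    tokens: List[str] = field(default_factory=list)    # e.g. "termination", "blob", "group"


@dataclass
class Sketch2HalfD:
    # Viewer-centered: depth and orientation of visible surfaces at each image point.
    depth: List[List[float]]
    normals: List[List[Vec3]]
    discontinuities: List[Tuple[Point2, Point2]] = field(default_factory=list)


@dataclass
class Model3D:
    # Object-centered: one axis carrying a volumetric primitive, plus finer sub-models.
    name: str
    axis_length: float
    radius: float
    parts: List["Model3D"] = field(default_factory=list)

    def count_primitives(self) -> int:
        return 1 + sum(p.count_primitives() for p in self.parts)

    def find(self, name: str) -> Optional["Model3D"]:
        if self.name == name:
            return self
        for p in self.parts:
            hit = p.find(name)
            if hit is not None:
                return hit
        return None


# A coarse-to-fine stick figure (numbers are made up for illustration).
hand = Model3D("hand", axis_length=0.2, radius=0.05)
forearm = Model3D("forearm", axis_length=0.3, radius=0.04, parts=[hand])
arm = Model3D("arm", axis_length=0.7, radius=0.06,
              parts=[Model3D("upper arm", axis_length=0.3, radius=0.05), forearm])
human = Model3D("human", axis_length=1.8, radius=0.25, parts=[arm])
```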

    2019-07-25 01:48:36
  • Page 336, Synopsis

    1. The central tenet of different levels of explanation is that to understand what vision is and how it works, an understanding at only one level is insufficient. It is not enough to be able to describe the responses of single cells, nor is it enough to be able to predict locally the results of psychophysical experiments. Nor is it enough even to be able to write computer programs that perform approximately in the desired way. One has to do all these things at once and also be very aware of the additional level of explanation that I have called the level of computational theory.

    2. By taking an information-processing point of view, we have been able to formulate a rather clear overall framework for the process of vision. This framework is based on the idea that the critical issues in vision revolve around the nature of the representations used -- that is, the particular characteristics of the world that are made explicit during vision -- and the nature of the processes that recover these characteristics, create and maintain the representations, and eventually read them. By analyzing the spatial aspects of the problem of vision, we arrived at an overall framework for visual information processing that hinges on three principal representations: (1) the primal sketch, which is concerned with making explicit properties of the 2D image, ranging from the amount and disposition of the intensity changes there to primitive representations of the local image geometry, and including at the more sophisticated end a hierarchical description of any higher-order structure present in the underlying reflectance distributions; (2) the 2½-D sketch, which is a viewer-centered representation of the depth and orientation of the visible surfaces and includes contours of discontinuities in these quantities; (3) the 3-D model representation, whose important features are that its coordinate system is object centered, that it includes volumetric primitives (which make explicit the organization of the space occupied by an object and not just its visible surfaces), and that primitives of various sizes are included, arranged in a modular, hierarchical organization.

    3. The third main point concerns the study of processes for recovering the various aspects of the physical characteristics of a scene from images of it. The critical act in formulating computational theories for such processes is the discovery of valid constraints on the way the world behaves that provide sufficient additional information to allow recovery of the desired characteristic. Furthermore, once a computational theory for a process has been formulated, algorithms for implementing it may be designed, and their performance compared with that of the human visual processor. This allows two kinds of results. First, if performance is essentially identical, we have good evidence that the constraints of the underlying computational theory are valid and may be implicit in the human processor; second, if a process matches human performance, it is probably sufficiently powerful to form part of a general purpose vision machine.

    4. The final point concerns the methodology or style of this type of approach, and it involves two main observations.

    First, the duality between representations and processes. In the study both of representations and processes, general problems are often suggested by everyday experience or by psychophysical or even neurophysiological findings of a quite general nature. Such general observations can often lead to the formulation of a particular process or representational theory. Once we have sufficient confidence in the correctness of the process or representation at this level, we can inquire about its detailed implementation, which involves the ultimate and very difficult problems of neurophysiology and neuroanatomy.

    The second observation is that there is no real recipe for this type of research any more than there is a straightforward procedure for discovering things in any other branch of science.

    Relationships between representations and processes.

    2019-07-25 04:01:09
  • Page 336, In Defense of the Approach

    Horace Barlow's first dogma: A description of the activity of a single nerve cell which is transmitted to and influences other nerve cells, and of a nerve cell's response to such influences from other cells, is a complete enough description for functional understanding of the nervous system.

    Marr's response: I must disagree with Barlow's formulation, although I do agree with one of the thoughts behind this dogma, namely, that there is nothing else looking at what the cells are doing -- they are the ultimate correlates of perception. However, the dogma fails to take level one analysis -- the level of the computational theory -- into account. You cannot understand stereopsis simply by thinking about neurons. You have to understand uniqueness, continuity, and the fundamental theorem of stereopsis. In addition, and critically important for a researcher, the levels approach enforces a rigid intellectual discipline on one's endeavors. As long as you think in terms of mechanisms or neurons, you are liable to think too imprecisely, in similes.
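A much-simplified Python sketch of how the two constraints named here can turn an ambiguous matching problem into a solvable one, on a 1-D random-dot stereogram. My reading of the constraints (from Marr and Poggio's stereo work, not from this passage): uniqueness, each point has a single disparity; continuity, disparity varies smoothly almost everywhere. This is simple window voting, not Marr and Poggio's cooperative algorithm: continuity is approximated by counting matches among neighbors at the same disparity, and uniqueness is enforced only along the left eye's lines of sight, by picking one disparity per left-image position. The stereogram generator, its parameters, and all names are hypothetical.

```python
import random


def make_stereo_pair(n=200, region=(80, 120), shift=2, seed=0):
    """Hypothetical 1-D random-dot stereogram. Dots in `region` of the left row appear
    `shift` places further left in the right row; all other dots have disparity 0."""
    rng = random.Random(seed)
    left = [rng.randint(0, 1) for _ in range(n)]
    right = left[:]
    a, b = region
    for i in range(a, b):
        right[i - shift] = left[i]            # the displaced segment
    for i in range(b - shift, b):
        right[i] = rng.randint(0, 1)          # strip uncovered by the shift: fresh dots
    return left, right


def match_disparities(left, right, disparities=range(5), half_window=6):
    """Pick one disparity per left position (uniqueness, left eye only), namely the one
    whose matches are best supported by neighbors at the same disparity (continuity)."""
    n = len(left)
    chosen = []
    for x in range(n):
        best_d, best_support = None, -1
        for d in disparities:
            support = 0
            for xp in range(max(0, x - half_window), min(n, x + half_window + 1)):
                j = xp - d                    # where left[xp] would land in the right row
                if 0 <= j < n and left[xp] == right[j]:
                    support += 1
            if support > best_support:        # ties go to the smaller disparity
                best_d, best_support = d, support
        chosen.append(best_d)
    return chosen


left, right = make_stereo_pair()
disparity = match_disparities(left, right)
```

Any single dot matches about half of the candidate disparities by chance (the false targets); it is the support from neighbors at the same disparity, i.e. the continuity assumption, that singles out the right one away from the region boundaries.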

    Are the different levels of explanation really independent?

    Marr's response: Not really, though the computational theory of a process is rather independent of the algorithm or implementation levels, since it is determined solely by the information-processing task to be solved. The algorithm depends heavily on the computational theory, but it also depends on the characteristics of the hardware in which it is to be implemented.

    When and how vision "goes symbolic"

    Most would agree that an intensity array I(x,y) or even its convolution ∇²G * I is not a very symbolic object. It is a continuous 2-D array with few points of manifest interest. Yet by the time we talk about people or cars or fields or trees, we are clearly being very symbolic, and I think again that most would find suggestions of symbols in Hubel and Wiesel's (1962) recordings. Our view is that vision goes symbolic almost immediately, right at the level of zero-crossings, and the beauty of this is that the transition from the analogue arraylike representation to the discrete, oriented, sloped zero-crossing segments is probably accomplished without loss of information (Marr, Poggio, & Ullman, 1979; Nishihara, 1981).

    And the use of symbols does not stop there either. Almost the whole of early vision appears to be highly symbolic in character. Terminations, discontinuities, place tokens, virtual lines, groups, boundaries -- all these things are very abstract constructions, and few of their neurophysiological correlates have been found, but experiments like Stevens' (1978) tell us that such things must be there.
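A 1-D Python sketch of the step from an analogue array to symbols described in this answer: filter an intensity profile with the second derivative of a Gaussian (the 1-D analogue of ∇²G) and replace the continuous response by a short list of zero-crossings, each with its slope. Kernel width, truncation radius (about 4σ), edge handling (replicating border samples), and the threshold `eps` are my assumptions; the convolution is written as a correlation, which is equivalent here because the kernel is symmetric.

```python
import math


def second_derivative_of_gaussian(sigma, radius=None):
    """Sampled d²G/dx² for a Gaussian of width sigma: the 1-D analogue of ∇²G."""
    radius = radius or int(math.ceil(4 * sigma))     # truncate at about 4 sigma (assumption)
    kernel = []
    for x in range(-radius, radius + 1):
        g = math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)
        kernel.append((x * x / sigma**4 - 1 / sigma**2) * g)
    return kernel


def convolve_same(signal, kernel):
    """Output has the same length as the input; border samples are replicated."""
    r, n = len(kernel) // 2, len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def zero_crossings(response, eps=1e-9):
    """The symbolic step: a continuous array becomes a short list of (position, slope),
    one for each place where the response changes sign between i and i + 1.
    (A sample that is exactly zero is not treated as a crossing in this sketch.)"""
    found = []
    for i in range(len(response) - 1):
        a, b = response[i], response[i + 1]
        if (a > eps and b < -eps) or (a < -eps and b > eps):
            found.append((i, b - a))
    return found


step = [0.0] * 20 + [1.0] * 20      # dark-to-bright edge between samples 19 and 20
response = convolve_same(step, second_derivative_of_gaussian(2.0))
print([i for i, _ in zero_crossings(response)])   # → [19]
```

Forty numbers go in and a single token comes out, located at the edge and carrying the sign and size of the intensity change in its slope.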

    2019-07-25 11:39:25