Publisher: The MIT Press
Subtitle: A Computational Investigation into the Human Representation and Processing of Visual Information
Published: 2010-07-09
Pages: 432
Price: USD 45.00
Binding: Paperback
ISBN: 9780262514620
Description
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists. In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level. Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing efforts to integrate knowledge from cognition and computation to understand vision and the brain.
Contents
Foreword by Shimon Ullman xvii
Preface xxiii
PART I
INTRODUCTION AND
PHILOSOPHICAL PRELIMINARIES
GENERAL INTRODUCTION 3
Chapter 1
The Philosophy and the Approach 8
Background 8
Understanding Complex Information-Processing Systems 19
Representation and description 20
Process 22
The three levels 24
Importance of computational theory 27
The approach of J. J. Gibson 29
A Representational Framework for Vision 31
The purpose of vision 32
Advanced vision 34
To the desirable via the possible 36
PART II
VISION
Chapter 2
Representing the Image 41
Physical Background of Early Vision 41
Representing the image 44
Underlying physical assumptions 44
Existence of surfaces 44
Hierarchical organization 44
Similarity 47
Spatial continuity 49
Continuity of discontinuities 49
Continuity of flow 50
General nature of the representation 51
Zero-Crossings and the Raw Primal Sketch 54
Zero-Crossings 54
Biological implications 61
The psychophysics of early vision 61
The physiological realization of the ∇²G filters 64
The physiological detection of zero crossings 64
The first complete symbolic representation of the image 67
The raw primal sketch 68
Philosophical aside 75
Spatial Arrangement of an Image 79
Light Sources and Transparency 86
Other light source effects 88
Transparency 89
Conclusions 90
Grouping Processes and the Full Primal Sketch 91
Main points in the argument 96
The computational approach and the psychophysics of texture discrimination 96
Chapter 3
From Images to Surfaces 99
Modular Organization of the Human Visual Processor 99
Processes, Constraints, and the Available Representations of an Image 103
Stereopsis 111
Measuring stereo disparity 111
Computational theory 111
Algorithms for stereo matching 116
A cooperative algorithm 116
Cooperative algorithms and the stereo matching problem 122
Biological evidence 125
A second algorithm 127
Uniqueness, cooperativity, and the pulling effect 140
Panum’s fusional area 144
Impressions of depth from larger disparities 144
Have we solved the right problem? 148
Vergence movements and the 2½-D sketch 149
Neural implementation of stereo fusion 152
Computing distance and surface orientation from disparity 155
Computational theory 155
Distance from the viewer to the surface 155
Surface orientation from disparity change 156
Algorithm and implementation 159
Directional Selectivity 159
Introduction to visual motion 159
Computational theory 165
An algorithm 167
Neural implementation 169
Using directional selectivity to separate independently moving surfaces 175
Computational theory 175
Algorithm and implementation 177
Looming 182
Apparent Motion 182
Why apparent motion? 183
The two halves of the problem 184
The correspondence problem 188
Empirical findings 188
What is the input representation? 188
Two-dimensionality of the correspondence process 193
Ullman’s theory of the correspondence process 196
A critique of Ullman’s theory 199
A new look at the correspondence problem 202
One problem or two? 202
Separate systems for structure and object constancy 204
Structure from Motion 205
The problem 205
A previous approach 207
The rigidity constraint 209
The rigidity assumption 210
A note about the perspective projection 211
Optical flow 212
The input representation 212
Mathematical results 213
Shape Contours 215
Some examples 216
Occluding contours 218
Constraining assumptions 219
Implications of the assumptions 222
Surface orientation discontinuities 225
Surface contours 226
The puzzle and difficulty of surface contours 228
Determining the shape of the contour generator 229
The effects of more than one contour 230
Surface Texture 233
The isolation of texture elements 234
Surface parameters 234
Possible measurements 234
Estimating scaled distance directly 238
Summary 239
Shading and Photometric Stereo 239
Gradient space 240
Surface illumination, surface reflectance, and image intensity 243
The reflectance map 245
Recovery of shape from shading 248
Photometric stereo 249
Brightness, Lightness, and Color 250
The Helson–Judd approach 252
Retinex theory of lightness and color 253
Algorithms 255
Extension to color vision 256
Comments on the retinex theory 257
Some physical reasons for the importance of simultaneous contrast 259
Hypothesis of the superficial origin of nonlinear changes in intensity 26
Implications for measurements on a trichromatic image 262
Summary of the approach 264
Summary 264
Chapter 4
The Immediate Representation of Visible Surfaces 268
Introduction 268
Image Segmentation 270
Reformulating the Problem 272
The Information to be Represented 275
General Form of the 2½-D Sketch 277
Possible Forms for the Representation 279
Possible Coordinate Systems 283
Interpolation, Continuation, and Discontinuities 285
Computational Aspects of the Interpolation Problem 288
Discontinuities 289
Interpolation methods 290
Other Internal Computations 291
Chapter 5
Representing Shapes for Recognition 295
Introduction 295
Issues Raised by the Representation of Shape 296
Criteria for judging the effectiveness of a shape representation 296
Accessibility 297
Scope and uniqueness 297
Stability and sensitivity 298
Choices in the design of a shape representation 298
Coordinate systems 298
Primitives 300
Organization 302
The 3D Model Representation 302
Natural coordinate systems 303
Axis-based descriptions 304
Modular organization of the 3D model representation 305
Coordinate system of the 3D model 307
Natural Extensions 309
Deriving and Using the 3D Model Representation 313
Deriving a 3D model description 313
Relating viewer-centered to object-centered coordinates 317
Indexing and the catalogue of 3D models 318
Interaction between derivation and recognition 321
Finding the correspondence between image and catalogued model 322
Constraint analysis 322
Psychological Considerations 325
Chapter 6
Synopsis 329
PART III
EPILOGUE
Chapter 7
In Defense of the Approach 335
Introduction 335
A Conversation 336
Afterword by Tomaso Poggio 362
Glossary 368
Bibliography 375
Index 393
Reading notes
闭关中请稍后 · 2017-10-08 00:45
If one chooses the Arabic numeral representation, it is easy to discover whether a number is a power of 10 but difficult to discover whether it is a power of 2. If one chooses the binary representation, the situation is reversed. Thus, there is a trade-off; any particular representation makes certain information explicit at the expense of information that is pushed into the background and may be quite hard to recover.
This issue is important, because how information is represented can greatly affect how easy it is to do different things with it. This is evident even from our numbers example: It is easy to add, to subtract, and even to multiply if the Arabic or binary representations are used, but it is not at all easy to do these things (especially multiplication) with Roman numerals. This is a key reason why the Roman culture failed to develop mathematics in the way the earlier Arabic cultures had.
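The power-of-10 versus power-of-2 trade-off in the note can be made concrete with a small Python sketch. It assumes Python's `str(n)` (decimal digits) and `format(n, "b")` (binary digits) stand in for the two representations, and the function names are hypothetical, chosen for illustration rather than taken from the book. In the representation that suits the question, the answer can be read straight off the digit string. Asking the other question requires actual computation, here shown as repeated halving.

```python
def leading_one_then_zeros(digits: str) -> bool:
    """True if the digit string reads '1' followed only by '0's."""
    return digits[:1] == "1" and set(digits[1:]) <= {"0"}


def is_power_of_10(n: int) -> bool:
    # Decimal (Arabic numerals) makes this explicit: just inspect the digits.
    return n > 0 and leading_one_then_zeros(str(n))


def is_power_of_2(n: int) -> bool:
    # Binary makes this explicit: the same digit test on the base-2 string.
    return n > 0 and leading_one_then_zeros(format(n, "b"))


def is_power_of_2_by_arithmetic(n: int) -> bool:
    # Without the binary string, the property is implicit and must be
    # computed: strip factors of 2 and check that nothing else remains.
    if n <= 0:
        return False
    while n % 2 == 0:
        n //= 2
    return n == 1
```

For example, `is_power_of_10(1000)` is `True` while `is_power_of_2(1000)` is `False`, and the reverse holds for 1024. The same digit test answers both questions. Only the choice of representation decides which property is explicit.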
闭关中请稍后 · 2017-10-04 08:35
For the subject of vision, there is no single equation or view that explains everything. Each problem has to be addressed from several points of view: as a problem in representing information, as a computation capable of deriving that representation, and as a problem in the architecture of a computer capable of carrying out both things quickly and reliably.
闭关中请稍后 · 2017-10-04 08:13
For if we are capable of knowing what is where in the world, our brains must somehow be capable of representing this information, in all its profusion of color and form, beauty, motion, and detail. The study of vision must therefore include not only the study of how to extract from images the various aspects of the world that are useful to us, but also an inquiry into the nature of the internal representations by which we capture this information and thus make it available as a basis for decisions about our thoughts and actions. This duality (the representation and the processing of information) lies at the heart of most information-processing tasks and will profoundly shape our investigation of the particular problems posed by vision.
Other editions of this book (2 in total)
W. H. Freeman edition, 1983-9-13 / read by 14
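Featured in these Doulists
The Philosophy Geek's Ideal Bookshelf (Cynicmonkey)
[Personal] Readinglist Season 1011 (伞保护协会)
Machine Vision, Computer Vision, Visual Measurement (Jack L)
Data ([deactivated account])
[Cognition] Reading ([deactivated account])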
0 helpful · jdhks · 2019-06-01
In an era when data is easy to come by, Marr's three levels of analysis are worth chewing over again and again, especially in AI and neuroscience/cognitive psychology.
0 helpful · 乔不圆 · 2018-02-17
！
0 helpful · poringking · 2018-07-23
I'm extremely sad after reading this book, not only because a genius like Marr died at such an early age, but also because no one had ever told me to read this book before I went into computer vision, not even my own advisor. On page 18 Marr wrote, "Gone are the ad hoc programs of computer vision," yet that's all I see at CVPR/ICCV these days.
0 helpful · JOKER · 2017-02-10
landmark
0 helpful · 牧真 Eugene · 2019-04-29
Landmark of vision science. Great Marr!