Abstract: Approaches to how the visual brain might process information for the stabilisation and recognition of objects have produced a number of different theories as to how this can be achieved. No theory seems to be able to account for all the deficiencies in processing caused by brain lesions to the cerebral hemispheres. Here, an attempt is made to identify some of the shortcomings and strengths of these theories and thereby establish a consensus that can assimilate previous models within the context of a newer, more comprehensive, framework.
Introduction – Early Models of Object Processing.
It has been proposed that exemplars, or particularities, as retinotopic viewer-centred descriptions emanating from lower occipital areas, are encoded in the right cerebral hemisphere (R.H.) in the sense that perceptual variability can be overcome at this level (Warrington and Taylor 1978). From this standpoint, the R.H. is therefore perceptually able to encode form without assigning a semantic value, which is taken to be a function of the Left Cerebral Hemisphere (L.H.). Accordingly, R.H. processing has been taken as the locus of object-centred descriptions to the extent that 2D output from lower visual areas can be processed by the R.H., in order to attain stability or invariance. However, in the L.H., this level of analysis receives object-centred descriptions from the R.H., useful for semantic processing. In this sense, the L.H. connects object-centred descriptions (as embodied in a typical view) to visually-mediated semantic knowledge. Thus, it seems, the L.H. exploits object-centred descriptions constructed by the R.H. in order to extract the major, global co-ordinates necessary for the attribution of a broader categorical classification.
This model constituted a preliminary approximation, derived from the lesion studies of the time, of how information might flow through the visual cortex. Based on Marr’s (1982) seminal work on object processing, Warrington and associates originally tried to account for the way object categorisation and labelling occurs in the visual brain by way of a linear model. Owing to obvious discrepancies in her own neurophysiological research, which found differences in processing regimes between the left and right hemispheres, it was subsequently concluded that this model was no longer tenable. This was because, even though the R.H. was thought to constitute an earlier stage in visual processing for unusual views/detail, when this hemisphere sustained damage, the L.H. still appeared to achieve a level of functioning based upon the ability to process the conventional/global features of objects. As the L.H. was thought to receive input via the R.H., a linear model predicted that the later analysis carried out by the L.H. should be severely disrupted, but this did not appear to be the case. As Davidoff and Warrington (1999: p. 280) pointed out, one must presume that a strictly serial model must preclude an earlier stage being impaired if there is normal performance at a later stage. Warrington was therefore obliged to amend this account with a more complex model that took into consideration the apparent discrepancies. Accordingly, a parallel processing account was thought to be more applicable and thus came to supersede the original serial model (Warrington and James, 1988; Rudge and Warrington, 1991). Here, it was proposed that, as well as information proceeding first to the right and then to the left inferotemporal cortex, the inferotemporal cortex of both hemispheres also receives bilateral input directly from the lower areas of the visual cortex (V1 to V4). Given the putative existence of associative (L.H. damage) and apperceptive agnosia (R.H. damage) as syndromes describing the above decrements, Warrington’s reinterpretation of the processing sequence seemed plausible.
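The contrasting predictions of the serial and parallel accounts can be illustrated with a small reachability sketch. The node names, the two connection tables and the function `receives_input` are illustrative assumptions rather than part of Warrington’s models; the sketch only asks whether the L.H. can still receive occipital input once the R.H. is lesioned.

```python
# Hypothetical connection tables (assumed for illustration only).
# Serial account: occipital areas -> R.H. -> L.H.
SERIAL = {"occipital": ["RH"], "RH": ["LH"], "LH": []}
# Parallel account: both hemispheres also receive direct occipital input.
PARALLEL = {"occipital": ["RH", "LH"], "RH": ["LH"], "LH": []}


def receives_input(graph, target, lesioned=frozenset(), source="occipital"):
    """Return True if `target` is reachable from `source` without
    passing through (or ending at) a lesioned node."""
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node in lesioned or node in seen:
            continue  # lesioned nodes neither receive nor relay information
        if node == target:
            return True
        seen.add(node)
        stack.extend(graph[node])
    return False


# With the R.H. lesioned, the serial table cuts the L.H. off,
# whereas the parallel table leaves its direct input intact.
print(receives_input(SERIAL, "LH", lesioned={"RH"}))    # False
print(receives_input(PARALLEL, "LH", lesioned={"RH"}))  # True
```

On this toy reading, preserved L.H. performance after R.H. damage is compatible only with the table that includes a direct occipital connection to the L.H.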
Further Problems with Earlier Models.
Although the earlier serial model was changed to take account of the inconsistencies stemming from Marr’s description of object processing, this still left certain disparities that were not fully explained by Warrington’s later parallel processing model. In explicit terms, if the default mechanism in the R.H. is object-centred (viewer independent) (Warrington and Taylor, 1978) e.g. mediated by Marr’s axis scenario (or Biederman’s RBC: 1987), then most, if not all, views of an object should be accessible including usual and unusual views by way of R.H. systems, otherwise this would not constitute a true object-centred description. However, this seems not to be the case as conventional views are incapable of being accessed in the R.H. and, furthermore, it is precisely the conventional view (or views) that are generally held to encode for invariance in the first instance (see below). In order to circumvent these difficulties Warrington (Warrington and James, 1988; Rudge and Warrington, 1991; Davidoff and Warrington, 1999) proposed that R.H. processing should be regarded as an “optional resource” rather than an obligatory post-sensory, pre-semantic stage.
Interestingly, although Warrington and James (1986) supported Marr’s distinction between object-centred and viewer-centred co-ordinates, unlike Humphreys and Riddoch (1984), who also endorsed Marr’s approach, they suggested a possible alternative to Marr’s axes model based upon distinctive features. This was because Warrington and James (1986; see also McCarthy and Warrington, 1990: p. 49) found that some unusual views were difficult for R.H. lesioned patients to perceive where there was no foreshortening of the major axis, or where the angle of view had no systematic effect on object recognition. In a study of rotated silhouettes it was established that, for normals, each object has its own “fingerprint” or recognition threshold function. Appositely, it was also found that, although R.H. patients had elevated thresholds for each object, these followed the same pattern as in the control group, so there were no discontinuities between the performance of the controls and the critical lesion group as would be predicted on the “axis transformation” explanation. McCarthy and Warrington further proposed that this can be explained through “features” as relatively unique clusters of visual contours based, for example, on curvature and angle, and the relative positions of clusters independent of their absolute orientation in space. Therefore, the threshold of any object would depend on the number of critical features which become available when, for example, a shadow is rotated through a specific angle. Thus each object has not so much a usual view but rather a minimal view. In essence, this amounts to a quantitative rather than a qualitative inefficiency on the part of the R.H. lesioned group, which can be seen as a degradation of visual vocabulary such that more visual features need to be in view in order for an object to be recognised.
This has the advantage of being able to account for the evidence that object perception deficits can be triggered by a number of visual stimuli in which distinctive features are obscured. From this perspective, unusual views can be defined as those views that present certain ambiguities due to the location of decisive features along a contour and their relationship to one another as an object undergoes transformation. In this regard, the L.H. remains intact but R.H. lesioned subjects seem to be responding to the more obvious salient or major distinctive features that take precedence over lesser features because of their greater prominence. Accordingly, damage to the R.H. will lead to a loss of the ability to access the lesser features and a concomitant degradation of the features vocabulary so that more features will be required to disambiguate an object in the event of paradoxical, complex or depleted views.
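The features account of recognition thresholds can be sketched numerically. Everything in the block below is hypothetical: the object names, the feature counts per rotation step, the step size and the two criterion values are invented for illustration and are not taken from Warrington and James (1986). The sketch merely shows how raising the number of features required would elevate every threshold while leaving the pattern across objects unchanged.

```python
# Hypothetical profiles: number of critical features visible at each
# rotation step of a silhouette (step 0 = most foreshortened shadow).
FEATURES_BY_STEP = {
    "cup":    [0, 1, 2, 3, 4, 5],
    "kettle": [0, 0, 1, 2, 3, 4],
    "shoe":   [1, 2, 3, 4, 5, 5],
}
STEP_DEG = 15  # assumed rotation increment in degrees


def recognition_threshold(profile, features_needed):
    """Smallest rotation (in degrees) at which enough features are visible,
    or None if the criterion is never reached."""
    for step, visible in enumerate(profile):
        if visible >= features_needed:
            return step * STEP_DEG
    return None


def thresholds(features_needed):
    """Threshold for every object under a given feature criterion."""
    return {name: recognition_threshold(profile, features_needed)
            for name, profile in FEATURES_BY_STEP.items()}


control = thresholds(features_needed=2)   # assumed intact criterion
lesioned = thresholds(features_needed=4)  # assumed degraded "vocabulary"

print(control)   # {'cup': 30, 'kettle': 45, 'shoe': 15}
print(lesioned)  # {'cup': 60, 'kettle': 75, 'shoe': 45}
```

In this toy example the lesioned thresholds are higher for every object, yet the ordering of objects from easiest to hardest is the same in both groups, which is the quantitative rather than qualitative difference described above.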
Farah (2000) suggested that a neural network model can more parsimoniously explain the disparity, and that two tokens (a stable perceptual representation that is matched with a stored memory description) are therefore unnecessary. Here, the purported difficulty R.H. damaged subjects have with unusual views is regarded as suspect (if it exists at all) to the extent that patients with this deficit show no impairment in recognising objects in everyday life (Farah, 1990: p. 55). Moreover, Farah believes R.H. deficits should only be considered an impairment for stimuli that are difficult to see, in that the R.H. may play a special role in the perception of objects that have been degraded through reduction in brightness, contrast, size, exposure, duration and blurring. The R.H. contribution to object recognition when processing unconventional views may therefore be a function of such factors.
On one level Farah’s insight into apperceptive agnosia (or what she refers to as categorical deficit disorder) is in sympathy with Warrington’s later reading of R.H. deficits as a quantitative rather than qualitative difference. Farah (1990) also agrees that the R.H. may represent an optional resource rather than an obligatory stage in that it is concerned with a kind of visual problem solving. So, notwithstanding Farah’s reservations as to the precise neural mechanism underlying visual agnosia, there is a measure of agreement that R.H. deficits are simply a matter of degree in that the R.H. functions as a back-up system to the primary analysis carried out by the L.H. Interestingly, Humphreys and Quinlan (1987) propose the R.H. may process local salient features but that the visual system will use those cues, whether global or local, according to whichever solution best fits a particular problem, e.g. in ambiguous situations local feature descriptions may be elicited. Correspondingly, Bauer (1993) believes the notion of a two-stage model of apperception and association needs to be updated in favour of a more complex system that involves parallel processing streams occurring simultaneously at cortical and subcortical levels.
Viewpoint Dependent Theories of Object Processing Relating to Apperceptive and Associative Agnosia
This revised account may be more in sympathy with viewer-centred (viewpoint dependent) models of how object constancy can be achieved (e.g. Tarr and Bülthoff, 1995; Tarr et al., 1998; Tarr and Pinker, 1989) rather than the object-centred approach of Marr and Biederman, as the minimal view can be construed as equivalent to the threshold by which different but significant views of an object undergoing transformation can be demarcated. In other words, where clusters of features for one view converge on this threshold a different set will appear for each different view. Therefore, there will be separate co-ordinates established according to a limited number of sets of clusters stable enough to encode an object through its full range of transformations. Features may well be derived initially from such cues as curvature maxima that tag the features themselves (Hoffman and Richards, 1984; Hoffman and Singh, 1997; Singh et al., 1999; Richards et al., 1987; Norman et al., 2001) which, for statistical reasons, are more likely, but not exclusively, to be found on the longer axis of an object. This may be part of a hierarchical system in which the more obvious global features are tagged first, serving the general purpose of attaining constancy and expedient recognition (Hayward, 1998). However, where an object, for any reason, continues to remain ambiguous, R.H. functions will be recruited in order to facilitate more detailed analysis. Interestingly, Lawson (Lawson, Humphreys and Watson, 1994) also discounts Marr’s axis theory (and Biederman’s RBC) of object recognition in favour of viewpoint dependent approaches.
It would be interesting to determine whether R.H. lesioned individuals would be capable of coding for the full array of usual views for any given object. Warrington’s studies (Warrington and James, 1986; McCarthy and Warrington, 1990) seem to suggest this is the case, as the conventional view chosen for each object was determined approximately rather than absolutely. This is because an object is thought to have a range of usual views (defined by sets of features as outlined above) based upon global cues rather than one explicit view. A viewer-dependent approach would predict that R.H. lesioned subjects should be capable of disambiguating a number of such “conventional” views, although I am not aware of any research that has specifically addressed this issue. Warrington and James’s (1986) investigation with silhouettes tends to confirm this prediction, as their R.H. lesioned group showed no discontinuities compared to the performance of normals in a task that involved different conventional views for individual objects.
The Imprecise Role of Perceptual Factors in Object Processing.
Interestingly, Farah (2000) has suggested associative agnosia could also involve perceptual impairments that can lead to problems of recognition. In this respect, some researchers may have played down the importance of the perceptual factors that seem to be manifest in this syndrome (Bauer makes the same point). For example, associative visual agnosic patients appear to be abnormally sensitive to the visual quality of the stimulus (Levine and Calvanio, 1989; Riddoch and Humphreys, 1987; Rubens and Benson, 1971). Furthermore, although precise copying of pictures and preserved matching are often cited as showing that perception is intact, with the problem residing in the process of associating a normal percept with visual memory knowledge, this ability is nearly always compromised by an extremely slow, slavish copying procedure that involves characteristic errors and omissions (Farah, 2000: pp. 95-97). This suggests that perceptual factors may be important in the L.H. system. Indeed, Farah (2000) concludes that associative agnosics may be unable to recognise objects because they fail to represent shape normally. Correspondingly, Farah (1990) emphasises the point that Humphreys and Riddoch’s (1984) right-hemisphere damaged patients showed good performance on one of the unconventional views and were therefore not actually impaired at this task. This further supports the notion that the L.H. may independently attain a near normal level of perceptual invariance that can subsequently be exploited for semantic properties by this hemisphere. Marsolek (1995: p. 376) explicitly raises this possibility, suggesting the visual form system may be operational in both hemispheres but that the L.H. is more effective for abstract visual form while the R.H. works more efficiently with specific instances of form through the attribution of details.
In this respect, it is proposed that one system, involving the L.H., classifies different instances of an object as belonging to the same abstract category, while the other, in the R.H., preserves visual details in order to ascertain specific exemplars of particular object classes. Although Warrington and James (1988: 29) suggest the likelihood of a dedicated L.H. pre-semantic categorical system, at the same time they regard this as still an open question. Ellis and Young (1988: 38) go further in proposing that the unimpaired performance of subjects with R.H. lesions (L.H. undamaged) on conventional views suggests that both viewer-centred representations and what they refer to as “object recognition units” are relatively intact. Object recognition units are defined as stored structural descriptions of known objects. Although this implies an object-centred description existing between viewer-centred descriptions and object recognition units, this is not explicitly stated. Indeed, Ellis and Young seem to be proposing that viewer-centred descriptions can be mapped directly onto a corpus of object recognition units without the benefit of stability gained first from object-centred structural processing for each individual object!
Towards a Possible Alternative Explanation.
Given that Warrington’s and Farah’s accounts seem to downgrade the importance of the R.H. for the realisation of a stable structural description, the question arises as to how theories of object recognition can be assimilated according to this revised scenario. As indicated, Warrington’s earlier linear model hypothesised that information flowed from the lower visual cortices initially to the R.H. system before being transferred to the L.H. via callosal routes. An alternative to this processing arrangement would be to assume that Warrington’s model had the flow of information the wrong way round. This is partially supported by the fact that, although Warrington and Taylor (1978) indicated that feedback systems and multiple interactions were possible between hemispheres, this did not seem to constitute a major component of their model. In fact, McCarthy and Warrington (1990: Fig. 2.16, p. 43) continued to postulate that information flowed from R.H. to L.H. but included a question mark to highlight some hidden factor that occurred between sensory and visual semantic processing. However, if information can be redefined as flowing first to the L.H. and from there to the R.H. rather than vice versa, this would provide a more logical and consistent procedure in relation to neurophysiology and theoretical assumptions. Here, information would pass from the 2½D sketch first to the L.H. for rapid assimilation of incoming visual data thanks to previously assigned (through learning) “object-centred” processes. This would constitute a fast, unimpeded route to the semantic system previously mediated by familiarisation by way of interaction with any given object. Alternatively, in the event of ambiguous, paradoxical or degraded views, information would flow to the R.H. for views that might over-burden L.H. systems.
Disambiguation of these parameters would then rely on a reciprocal engagement of the two hemispheres in the event of a failure at a first pass in the L.H. Damage to the L.H. would mean the R.H. still receives direct input from the occipital cortex, but this would lead to some decrements in recognition as the system coding for visual details and parts in the R.H. would now be unable to communicate with the globally-defined conventional/usual view vectors of the L.H. in order to attain object invariance in the event of ambiguous or degraded stimuli. This would explain the apparent deficiencies in the parts vocabulary in apperceptive agnosia due to the fact the L.H. is now dependent on more global features for attaining perceptual stability and recognition.
As Farah (1990: 55) emphasises, how the R.H. encodes information to allow for the discernment of details, degraded objects, etc., remains to be resolved. However, Kosslyn (1980; 1983) raises the possibility of a separate system for the encoding and attribution of details that can be mapped onto a separately stored, global, outline template. It may, therefore, transpire that the R.H. actually provides a spectrum of local feature descriptions stabilised for transformations that can be matched to global descriptions if and when the occasion demands. Correspondingly, Humphreys and Quinlan (1987) propose that there might be two independent pathways for achieving object constancy, i.e. local distinctive features and global structure (based on principal axis). Here, the notion of “continuous local transforms” (p. 97) is invoked that might independently encode local form elements. Interestingly, Humphreys and Riddoch (1987: see Fig. 10.3) propose a viewpoint-dependent object description for both global shape and local geometric feature processing that is not wholly incompatible with the viewer-dependent model of object recognition presented here (Fig. 4).
This account predicts that the R.H., in coding for details and parts of objects, would search for certain aspects when the L.H. is unable to initially recognise an object. This would, therefore, assume that the L.H. delegates the task to the R.H. for further processing, after which information would be relayed back to the L.H. for confirmation. Notwithstanding degraded situations etc., the R.H. would only serve as a “back up system” in the event of idiosyncratic views (as opposed to usual and unusual views) of objects that have rarely, if ever, been experienced before. It seems more judicious to assume that the L.H. would encode for more obvious features, therefore realising invariance for the global aspects of an object, after which this could subsequently be relayed to the visual semantic system, also in the L.H., for immediate recognition. From there the output, in the form of an “icon”, could be used to attribute a verbal label (Davidoff and Warrington, 1999).
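The proposed sequence (a first pass in the L.H., delegation to the R.H. on failure, and relay back to the L.H. for confirmation) can be sketched as a simple routing procedure. The class `View`, the two match scores, the value of `CRITERION` and the function `recognise` are hypothetical constructs introduced purely for illustration; they are not measurements or mechanisms reported in the literature cited here.

```python
from dataclasses import dataclass

CRITERION = 0.7  # assumed evidence level needed for recognition


@dataclass
class View:
    global_match: float  # hypothetical fit (0 to 1) to stored global features
    local_match: float   # hypothetical fit (0 to 1) of local details and parts


def recognise(view, rh_intact=True):
    """Return (recognised, route), where route lists the stages visited."""
    route = ["LH"]  # first pass: fast global analysis in the L.H.
    if view.global_match >= CRITERION:
        return True, route
    if not rh_intact:
        return False, route  # no back-up system available
    route.append("RH")  # delegation: R.H. analyses local features
    route.append("LH")  # result relayed back to the L.H. for confirmation
    return view.local_match >= CRITERION, route


conventional = View(global_match=0.9, local_match=0.5)
idiosyncratic = View(global_match=0.4, local_match=0.8)

print(recognise(conventional, rh_intact=False))   # (True, ['LH'])
print(recognise(idiosyncratic))                   # (True, ['LH', 'RH', 'LH'])
print(recognise(idiosyncratic, rh_intact=False))  # (False, ['LH'])
```

Under these assumptions a simulated R.H. lesion leaves conventional views unaffected and impairs only those views that fail the first pass, in keeping with the back-up role suggested for the R.H.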
From an evolutionary standpoint this processing arrangement seems compelling as it would serve the interests of an organism to deal with incoming information rapidly. Importantly, as global features constitute the ecologically most prominent aspect of an object from a range of distances (Hayward, 1998: 428), thus providing constancy cues both for distance and rotation in depth, such parameters would not only take precedence over local details but also afford the most efficacious route for immediate recognition.
The modified version of object processing proposed here is based upon a viewer-dependent paradigm. Consequently, any co-ordinates deriving from viewer-centred information originating from lower visual areas of the brain that proved to be of utility are likely to be exploited in later processing systems (Tarr and Bülthoff, 1998). These contingencies are likely to be those clusters of global features for specific views of an object that remain stable in the face of various transformations. These constitute a fast, efficient route to object recognition that is exploited by the L.H. The R.H., however, seems to specialise more in the encoding of local features and details that would, as a back-up to the L.H., afford disambiguation of degraded objects or objects that appear paradoxical, e.g. through extreme foreshortening or deviations from the canonical view, as in accidental views as opposed to mere unusual views.
References.
Bauer, R. M. (1993) Agnosia. In, K. M. Heilman and E. Valenstein (eds.), Clinical Neuropsychology (3rd ed.). pp. 215-278 Oxford University Press: New York.
Biederman, I. (1987) Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115-145.
Davidoff, J. and Warrington, E. K. (1999) The bare bones of object recognition: implications from a case of object recognition impairment. Neuropsychologia, 37, 279-292.
Ellis, A. W. and Young, A. W. (1988) Human Cognitive Neuropsychology. Lawrence Erlbaum: Hove.
Farah, M. J. (1990) Visual Agnosia. Bradford, MIT Press: Cambridge, Mass.
Farah, M. J. (2000) The Cognitive Neuroscience of Vision. Blackwell: Malden, Mass.
Hayward, W. G. (1998) Effects of Outline Shape on Object Recognition. Journal of Experimental Psychology: Human Perception and Performance, 24 (2), 427-440.
Hoffman, D. D. and Richards, W. A. (1984) Parts of recognition. Cognition, 18, 65-96.
Hoffman, D. D. and Singh, M. (1997) Salience of visual parts. Cognition, 63, 29-78.
Humphreys, G. W. and Riddoch, M. J. (1984) Routes to object constancy: implications from neurological impairments of object constancy. Quarterly Journal of Experimental Psychology, 36A, 385-415.
Humphreys, G. W. and Riddoch, M. J. (1987) The Fractionation of Visual Agnosia. In, G. W. Humphreys and M. J. Riddoch (eds.), Visual Object Processing: A Cognitive Neuropsychological Approach. pp. 281-306 Lawrence Erlbaum: Hove.
Humphreys, G. W. and Quinlan, P. T. (1987) Normal and Pathological Processes in Visual Object Constancy. In, G. W. Humphreys and M. J. Riddoch (eds.) Visual Object Processing: A Cognitive Neuropsychological Approach. pp. 43-105 Lawrence Erlbaum: Hove.
Kosslyn, S. M. (1980) Image and Mind. Harvard University Press: Cambridge, Mass.
Kosslyn, S. M. (1983) Ghosts in the mind’s machine: creating and using images in the brain. Norton: New York.
Lawson, R., Humphreys, G.W. and Watson, D. G. (1994) Object recognition under sequential viewing conditions: evidence for viewpoint-specific recognition procedures. Perception, 23, 595-614.
Levine, D. and Calvanio, R. (1989) Prosopagnosia: A defect in visual configural processing. Brain and Cognition, 10, 149-170.
Marr, D. (1982) Vision. W. H. Freeman: San Francisco.
Marsolek, C. J. (1995) Abstract Visual-Form Representations in the Left Cerebral Hemisphere. Journal of Experimental Psychology: Human Perception and Performance, 21 (2), 375-386.
McCarthy, R. A. and Warrington, E. K. (1990) Cognitive Neuropsychology – A Clinical Introduction. Academic Press: San Diego.
Norman, J. F., Phillips, F. and Ross, H. E. (2001) Information concentrated along the boundary contours of naturally shaped solid objects. Perception, 30, 1285-1294.
Richards, W. A., Koenderink, J. J. and Hoffman, D. D. (1987) Inferring three-dimensional shapes from two-dimensional silhouettes. Journal of the Optical Society of America A, 4, 1168-1175.
Riddoch, M. J. and Humphreys, G. W. (1987) A case of integrative visual agnosia. Brain, 110, 1431-1462.
Rubens, A. B. and Benson, D. F. (1971) Associative visual agnosia. Archives of Neurology, 24, 305-316.
Rudge, P. and Warrington, E. K. (1991) Selective impairment of memory and visual perception in splenial tumours. Brain, 114, 349-360.
Singh, M., Seyranian, G. D. and Hoffman, D. D. (1999) Parsing silhouettes: The short-cut rule. Perception and Psychophysics, 61, 636-660.
Tarr, M. J. and Bülthoff, H. H. (1995) Is human object recognition better described by geon structural descriptions or by multiple views? Comment on Biederman and Gerhardstein (1993). Journal of Experimental Psychology: Human Perception and Performance, 21, 1494-1505.
Tarr, M. J. and Bülthoff, H. H. (1998) Image-based object recognition in man, monkey and machine. In, M. J. Tarr and H. H. Bülthoff (eds.), Object Recognition in Man, Monkey, and Machine. pp. 1-20 Bradford, MIT Press: Cambridge, Mass.
Tarr, M.J. and Pinker, S. (1989) Mental rotation and orientation dependence in shape recognition. Cognitive Psychology, 21, 233-282.
Tarr, M. J., Williams, P., Hayward, W. and Gauthier, I. (1998) Three-dimensional object recognition is viewpoint dependent. Nature Neuroscience, 1, 259-331.
Warrington, E. K. and James, M. (1986) Visual object recognition in patients with right-hemisphere lesions: axes or features? Perception, 15, 355-366.
Warrington, E. K. and James, M. (1988) Visual apperceptive agnosia: a clinico-anatomical study of three cases. Cortex, 24, 13-32.
Warrington, E. K. and Taylor, A. M. (1978) Two Categorical Stages of Object Perception. Perception, 7, 695-705.