Keywords
visual perception, object identification, multiple objects, global features
Visual information is processed in the brain both in parallel and serially. This is true not only for anatomical reasons (e.g. parallel neuronal pathways from the retina; Merigan & Maunsell, 1993) but also for perceptual phenomena. An example of parallel perceptual processing is target detection in a visual search task. Features such as color, orientation, motion, spatial frequency and stereodepth are detected in parallel (Enns & Rensink, 1991; Gilchrist et al., 1997; Treisman & Gelade, 1980; Wolfe, 1994), whereas ambiguous figures and stimuli in binocular rivalry are perceived serially (Alais & Blake, 2004; Leopold & Logothetis, 1999). The present study concerns the identification of visual objects in multiple-object stimuli and asks whether objects are identified in parallel or serially.
In scene perception, empirical findings demonstrate that the category of a scene (i.e. its gist) is processed in parallel with the objects in the scene (Brandman & Peelen, 2017; Gagne & MacEvoy, 2014; Hollingworth & Henderson, 1999; Joubert et al., 2007; Joubert et al., 2008). Rousselet et al. (2002, 2004a) demonstrated parallel processing of several scenes presented simultaneously. Rousselet and colleagues (2004b) argue, on the basis of neurophysiological findings, that neurons of the inferotemporal cortex with large receptive fields could encode the identity of several objects in parallel, so that simultaneous identification of several objects is theoretically possible. On the other hand, scene identification influences the identification of objects in the scene, and the identification of objects influences the identification of the scene (Davenport & Potter, 2004; Davenport, 2007; Joubert et al., 2007; Joubert et al., 2008; Mack & Palmeri, 2010). Such influences presuppose successive perceptual processes.
Much less research has been conducted on the interaction between objects presented simultaneously without scene context. Gronau et al. (2008) (see also Auckland et al., 2007; Green & Hummel, 2006) demonstrated that semantic and spatial relatedness facilitates the identification of objects, in a study in which two semantically related or unrelated objects were presented in a congruent or incongruent spatial relation.
If objects are identified successively, we can expect reaction time to increase directly with the number of objects. Successive identification also permits interaction between objects, which could make the dependence of reaction time on the number of objects nonlinear when reaction time depends on factors such as the similarity of the objects and whether they belong to the same or to different categories. Our recent study demonstrated that, for multiple-object stimuli, object identification time depends much more on the number of categories than on the number of objects (Soliunas et al., 2018). One, two or three objects (pictures from 10 categories of man-made objects) were presented simultaneously for 100 ms and then followed by the name of a category. Subjects were asked to indicate whether objects of this category were present in the stimulus. Performance accuracy and reaction time did not depend on the number of objects if the objects belonged to the same category.
The present study is a further test of the hypothesis that objects of the same category are identified simultaneously. For better control of the global features of objects, new categories were selected and new objects were produced. Shape is one feature of an object that could enable parallel identification. To test this possibility, we manipulated the global and local features of objects by distorting them. We predicted that parallel identification of objects would be observed for intact objects, but not for globally distorted objects.
All subjects were invited to participate in the study in person. Between February and May 2015, a total of 58 volunteer students from Vilnius University agreed to take part in the experiment (44 females and 14 males; 20–22 years of age). Each subject had normal or corrected-to-normal vision and verbally confirmed that they had no prior experience with psychophysical testing of a similar nature. The subjects were not informed about the specific goals of this particular experiment. All subjects signed an informed consent form approved by the Lithuanian Bioethics Committee (consent form No. ASI12, approval No. 158200-13-578-173; issued by the Vilnius Region Ethics Committee of Biomedical Research, Vilnius, Lithuania). All subjects took part in one experimental session.
A total of 62 objects from 10 categories (shoe, cap, clock, ashtray, cup, table, telephone, vase, mirror, and kettle/teapot) were selected using internet search engines in such a way that the shape of an object was not an exclusive feature of any particular category. When selecting objects and creating stimuli of multiple objects, seven outline shapes were taken into account: “8”-shaped, circle, ellipse, square, elongated, triangle and “L”-shaped. Objects of each category had at least three outline shapes, and each outline shape occurred in at least three categories of objects.
All objects were converted to grayscale pictures and resized so that they fitted into a 100×100-pixel area. There were eight types of stimuli that varied in the number of objects (one, two or three), number of categories (one, two or three), and number of outline shapes (one or two) (Figure 1): i) “1-1” stimuli (one object); ii) “1-2” stimuli (one category, two objects of different shapes); iii) “1-3” stimuli (one category, three objects of two shapes); iv) “2-2s” stimuli (two categories, two objects of the same shape); v) “2-2d” stimuli (two categories, two objects of different shapes); vi) “2-3s” stimuli (two categories, two objects of the same category and the same shape and a third object of a different category and a different shape); vii) “2-3d” stimuli (two categories, two objects of the same category but different shapes and a third object of a different category but the same shape as one of the two objects of the first category); and viii) “3-3” stimuli (three categories, three objects of two shapes). In total, 10 stimuli of each type were created, giving 80 stimuli altogether (Supplementary File 1).
Figure 1. Columns represent different types of stimulus; rows represent experimental conditions. Each individual column represents the same objects under three experimental conditions.
The objects were placed into a 200×200-pixel area around a fixation point that was located at the center of this area. Stimuli were presented at the center of the screen on a white background, and subjects did not see the limits of the 200×200-pixel stimulus area. The distance between the subject’s eyes and the screen was 60 cm, and consequently the angular size of the 200×200-pixel stimulus was 8°×8°. The orientation of a particular object was not constant across stimuli and could range between −45° and +45° with respect to its natural (vertical or horizontal) orientation.
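The relation between viewing distance, stimulus size and visual angle used above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and the physical size and pixel pitch it prints are only the values implied by the stated 8° at 60 cm, not measurements reported in the text.

```python
import math

VIEWING_DISTANCE_CM = 60.0   # stated in the text
STIMULUS_ANGLE_DEG = 8.0     # stated in the text
STIMULUS_SIDE_PX = 200       # stated in the text


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (degrees) subtended by an extent viewed head-on."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical extent (cm) that subtends angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# Physical side length implied by 8 degrees at 60 cm (an inference, not a reported value).
stimulus_side_cm = size_for_angle_cm(STIMULUS_ANGLE_DEG, VIEWING_DISTANCE_CM)
# Pixel pitch implied if that side spans 200 pixels (also an inference).
implied_pixel_pitch_cm = stimulus_side_cm / STIMULUS_SIDE_PX

print(f"implied stimulus side: {stimulus_side_cm:.2f} cm")
print(f"implied pixel pitch:   {implied_pixel_pitch_cm:.4f} cm/pixel")
```

Because the head was not fixed during the experiment, the actual angle would vary slightly with the subject's distance from the display.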
Stimuli were presented under three experimental conditions: i) original; ii) locally distorted; and iii) globally distorted. The “original” condition corresponds to the presentation of stimuli described above. Locally distorted stimuli were created by partially masking the original stimuli with white stripes: 9-pixel-wide stripes with 9-pixel gaps (Figure 1). This procedure partially or completely eliminates some local features of objects but essentially preserves the outline shape. The smaller the features, the higher the probability of elimination. Globally distorted objects were created by applying the Whirl and pinch and Ripple functions in the image editor GIMP 2.8.10 (Kimball et al., 2013). The same values of these functions distort objects of different shapes to different degrees; we therefore had to apply different values of these functions to equalize, more or less, the subjectively assessed degree of distortion in different objects. Whirl and pinch values ranged from −80 to +80 for elongated and rounded shapes and from −200 to +200 for more angular shapes, and pinch amount ranged from −1 to +1. Ripple values ranged from 40 to 70. The applied global distortion procedure affects the outline shape and, to a lesser degree, the local elements of the object. The smaller the elements, the less distortion there is.
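The local distortion described above amounts to a periodic stripe mask. The sketch below shows one way such a mask could be applied to a grayscale image stored as a list of pixel rows. It is an illustration under assumptions: the stripe orientation (horizontal here) and the starting phase are not stated in the text, and `apply_stripe_mask` is a hypothetical helper, not the tool the authors used.

```python
WHITE = 255  # 8-bit grayscale value used for the masking stripes


def apply_stripe_mask(image, stripe=9, gap=9, phase=0):
    """Return a copy of `image` with white stripes painted over it.

    `image` is a list of rows, each row a list of grayscale ints.
    Stripes are `stripe` pixels wide and separated by `gap` pixels.
    Horizontal stripes are an assumption made for this sketch.
    """
    period = stripe + gap
    masked = []
    for y, row in enumerate(image):
        if (y + phase) % period < stripe:
            masked.append([WHITE] * len(row))   # row falls inside a stripe
        else:
            masked.append(list(row))            # row falls inside a gap
    return masked


# A uniformly black 100x100 test image, matching the object size in the text.
image = [[0] * 100 for _ in range(100)]
masked = apply_stripe_mask(image)
white_rows = sum(1 for row in masked if all(p == WHITE for p in row))
print(f"{white_rows} of {len(masked)} rows are masked")
```

With equal stripe and gap widths, roughly half of the image is removed, so any feature narrower than 9 pixels can vanish completely while the overall outline remains visible through the gaps.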
To reduce memorization of stimuli during the experiment, the orientation and location of a particular object in a particular stimulus varied across conditions.
The experiment was performed at the Department of Neurobiology and Biophysics, Vilnius University. Experimental sessions were conducted during daytime (the precise time of the day was not controlled) in a room with natural daylight illumination.
Stimulus presentation and data registration were controlled by the E-Prime v.2.0 experiment generator (Psychology Software Tools, Inc., 2012) running on Windows OS. Stimuli were presented on the screen of a 19-inch CRT monitor running at an 85 Hz frame rate and 1024×768 resolution. The subject’s head was not fixed, but subjects were instructed to keep the same distance (about 60 cm) from the display during the experiment.
Before the experimental session, subjects performed a practice session that consisted of 16 trials (two trials of each stimulus type). Only original stimuli were presented during practice.
The trial procedure of the experimental session is shown in Figure 2. A fixation point was presented at the center of the screen for 306 ms, and the subjects were asked to keep their eyes focused on the fixation point during the test stimulus presentation. The fixation point was followed by a 106-ms blank interval, and then a test stimulus was displayed for 106 ms (i.e. for 9 frames of the CRT monitor) under “original” conditions and for 200 ms (i.e. 17 frames) under “locally distorted” and “globally distorted” conditions. The longer stimulus exposure duration under the two conditions with distorted stimuli was chosen to equalize response accuracy under all conditions. The test stimulus was followed by a 35-ms blank interval, and then a masking pattern was displayed for 306 ms (we used a backward masking procedure to control the time available for object identification). The masking pattern was an 8°×8° square filled with a chaotic pattern. After a further 35-ms blank interval, a probe word was presented. The probe word was the name of a category written in lowercase Arial font, 2° in height. Subjects had to decide whether an object of the category named by the probe word had been present on a given trial by pressing the “1” or “2” key on the right side of a keyboard. One half of the subjects were instructed to press “1” for Yes and “2” for No, whereas the other half received the inverse instruction. Subjects had four seconds to make their decision. The response time (the duration from the onset of the probe word to the keypress) and accuracy were recorded for each trial. The response initiated the next trial with a 106-ms delay.
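The durations in this procedure are whole numbers of refresh frames at 85 Hz. The short sketch below converts between frames and milliseconds to show the correspondence. The frame counts for the 106-ms and 200-ms intervals are those given in the text; the counts for the 306-ms and 35-ms intervals are inferred here by rounding and are an assumption.

```python
REFRESH_HZ = 85  # CRT frame rate stated in the text


def frames_to_ms(n_frames: int, refresh_hz: float = REFRESH_HZ) -> float:
    """Duration in milliseconds of n_frames at the given refresh rate."""
    return 1000.0 * n_frames / refresh_hz


def ms_to_frames(duration_ms: float, refresh_hz: float = REFRESH_HZ) -> int:
    """Nearest whole number of frames for a nominal duration."""
    return round(duration_ms * refresh_hz / 1000.0)


# Nominal durations reported in the procedure (ms).
reported_ms = {"fixation / mask": 306, "blank / original stimulus": 106,
               "distorted stimulus": 200, "short blank": 35}

for label, ms in reported_ms.items():
    n = ms_to_frames(ms)
    print(f"{label:28s} {ms:4d} ms ~ {n:2d} frames = {frames_to_ms(n):6.2f} ms")
```

One frame lasts about 11.76 ms, so the reported values are the exact frame-based durations rounded to the nearest millisecond.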
The order of experimental conditions was randomized across participants. There were 60-s rest intervals between conditions. Each condition consisted of 160 trials, i.e. 80 stimuli were presented twice in random order. Altogether, 480 stimuli were presented in the experimental session, with eight types of stimuli presented randomly under each condition. The whole experimental session lasted about 30 min.
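The session structure described above (three blocked conditions in random order, each presenting the 80 stimuli twice in random order) can be sketched as a simple trial-list generator. This is only an illustration: the experiment itself was run in E-Prime, and the function name, the seeding scheme and the use of integer stimulus identifiers are assumptions made for this sketch.

```python
import random

CONDITIONS = ["original", "locally distorted", "globally distorted"]
N_STIMULI = 80      # 8 stimulus types x 10 stimuli each
REPETITIONS = 2     # each stimulus shown twice per condition


def build_session(seed: int):
    """Return a list of three blocks, one per condition, in random order.

    Each block is a shuffled list of (condition, stimulus_id) trials.
    `seed` stands in for a participant-specific random state (hypothetical).
    """
    rng = random.Random(seed)
    order = CONDITIONS[:]
    rng.shuffle(order)                      # condition order randomized per participant
    session = []
    for condition in order:
        trials = [(condition, stim_id) for stim_id in range(N_STIMULI)] * REPETITIONS
        rng.shuffle(trials)                 # stimulus types intermixed within a block
        session.append(trials)
    return session


session = build_session(seed=1)
total_trials = sum(len(block) for block in session)
print(f"{len(session)} blocks, {total_trials} trials in total")
```

The counts reproduce the figures in the text: 160 trials per condition and 480 trials per session.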
A chi-square goodness-of-fit test indicated that the experimental data were normally distributed. The reaction time and response accuracy data were analyzed with a two-way ANOVA with the factors stimulus type (“1-1”, “1-2”, “1-3”, “2-2”, “2-3”, and “3-3”) and experimental condition (original, locally distorted, and globally distorted). Initially, there were eight stimulus types, but as there were no statistical differences between performance for the “2-2s” and “2-2d” stimuli, we merged these results into one group, “2-2”. For the same reason, we merged the results for “2-3s” and “2-3d” into one group, “2-3”. The Newman–Keuls post hoc test was applied to assess the significance of differences between means. All statistical analysis was performed using Statistica v.7 software (StatSoft Inc., 2004).
The results of the experiment are presented in Figure 3 and Dataset 1. The two-way (stimulus type and experimental condition) ANOVA indicated significant main effects of stimulus type (F(5,27822) = 281.7, P < 0.0001 for reaction time (RT) data and F(5,27822) = 203.4, P < 0.0001 for response accuracy) and of experimental condition (F(2,27822) = 27.7, P < 0.0001 for RT data and F(2,27822) = 49.5, P < 0.0001 for response accuracy). The interaction of the two factors was not significant (F(10,27822) = 1.7, P = 0.079 for RT and F(10,27822) = 1.2, P = 0.272 for accuracy data), which means that stimulus distortion did not change the pattern of performance that was observed for original stimuli. The significant main effect of experimental condition indicates that the RT was shorter (730 ms) and the accuracy was higher (86.4%) for original stimuli than for locally or globally distorted stimuli (760 ms and 80.9% for locally distorted stimuli and 762 ms and 82.1% for globally distorted stimuli). This finding is not notable, however, because the stimulus exposure duration differed between conditions, so the absolute values of performance under different conditions are not comparable. What we are interested in is how the identification of objects depends on the number of objects and on the number of categories.
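As a check on the reported degrees of freedom, the arithmetic below reproduces them from the design figures given in the Methods. It is a sketch under two assumptions that the text does not state explicitly: that every trial was entered as a separate observation, and that no trials were excluded.

```python
N_SUBJECTS = 58            # from the participant description
TRIALS_PER_SUBJECT = 480   # 3 conditions x 160 trials
N_STIMULUS_TYPES = 6       # after merging "2-2s"/"2-2d" and "2-3s"/"2-3d"
N_CONDITIONS = 3

# Assumption: each trial counts as one observation, none excluded.
n_observations = N_SUBJECTS * TRIALS_PER_SUBJECT

df_stimulus_type = N_STIMULUS_TYPES - 1
df_condition = N_CONDITIONS - 1
df_interaction = df_stimulus_type * df_condition
# Error df of a fully crossed two-way design: observations minus number of cells.
df_error = n_observations - N_STIMULUS_TYPES * N_CONDITIONS

print(f"observations: {n_observations}")
print(f"df: type={df_stimulus_type}, condition={df_condition}, "
      f"interaction={df_interaction}, error={df_error}")
```

Under these assumptions the values match the reported F(5,27822), F(2,27822) and F(10,27822).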
Figure 3. Mean values are presented with 95% confidence intervals. Stimulus type: the first digit represents the number of categories in the stimulus, the second digit represents the number of objects (e.g. “1-3” represents three objects of one category).
Figure 3 reveals the influence of the number of objects and the influence of the number of categories on object identification. For RT data, we can see four statistically different levels of performance (we should stress again that we compare values between “stimulus types” but not between conditions): the shortest RT is for “1-1” stimuli, with longer RTs for “1-2” and “1-3” stimuli, even longer RTs for “2-2” and “2-3” stimuli, and the longest RT for “3-3” stimuli. There is no significant difference between “1-2” and “1-3” cases (P = 0.391, 0.876, and 0.329 for “Original”, “Locally distorted”, and “Globally distorted” conditions, respectively, according to analysis using the Newman–Keuls post hoc test) and between “2-2” and “2-3” cases (P = 0.363, 0.442 and 0.472 for “Original”, “Locally distorted”, and “Globally distorted” conditions, respectively). The same four levels of performance were found under all conditions.
For accuracy data, we can see a similar pattern of performance: the highest accuracy is for the “1-1” stimulus type, with lower accuracy for the “1-2” and “1-3” stimulus types, even lower accuracy for the “2-2” and “2-3” stimulus types, and the lowest accuracy for the “3-3” stimulus type. There are two deviations from this pattern: accuracy was higher for the “2-2” stimulus type than for the “2-3” stimulus type under the “Original” (P < 0.01) and “Globally distorted” (P < 0.01) conditions.
A summary of performance as a function of the number of objects and the number of categories, pooled across experimental conditions, is presented in Figure 4.
Figure 4. Mean values are presented with 95% confidence intervals.
The dependence of performance effectiveness on the number of categories is more clearly expressed than the dependence of performance effectiveness on the number of objects. For one-category stimuli, there was no difference in RT and accuracy whether two or three objects were presented. For two-category stimuli, there was no difference in RT whether two or three objects were presented, but accuracy was higher in the case of two objects. We can state that “the more categories, the poorer the performance”, but not that “the more objects, the poorer the performance”, because it depends on whether objects belong to the same or to different categories.
The present study is a continuation of our previous investigation (Soliunas et al., 2018) described in the Introduction, whose findings suggested that objects of the same category could be identified in parallel. The experiment described here further tested this hypothesis.
The first important result is the replication of the principal findings of the previous experiment, despite the fact that a different set of stimuli was used (all stimuli were newly created) and a different group of subjects took part in the experiment. These findings indicate that the identification of objects in multiple-object stimuli depends primarily on the number of categories present, and not on the number of objects. This further supports the suggestion that objects of the same category are identified simultaneously.
The second aim of the study was to search for the features of stimuli that could enable the parallel identification of objects of the same category, i.e. the “category” features that are identified in parallel. One set of stimuli had more distorted local features and the other set had more distorted global features. We predicted that the parallel identification of objects would be based on global features, so that distortion of the outline shape should make response time depend on the number of objects regardless of whether the objects belong to the same or to different categories. The results of the experiment did not support this hypothesis. Both types of distortion affected only the absolute level of performance accuracy, and to reach the same accuracy level as with intact stimuli, the exposure time for distorted stimuli was approximately doubled. Distortion of global or local features did not change the pattern of task performance (i.e. the dependency of performance on the number of objects and on the number of categories). At this point we can only suggest that both local and global features are used to identify the category of objects in a multiple-object environment.
It is too early to conclude from the findings of this study that objects of the same category are identified in parallel in natural settings. Further investigations are required to test this suggestion. Here we can only speculate about the possible processes of identification of multiple objects. Our findings suggest the following scenario. As the identification of one object was faster and more accurate than the identification of two objects of the same category, it is possible that the visual system first identifies one category. Additional time is required to identify other objects of this category, but this could be done in parallel because this second stage could be regarded as a “detection stage”, in contrast to the first “identification stage”. Many studies and theories state that the detection of an object’s presence is a faster process than identification of the object’s category (Biederman, 1987; de la Rosa et al., 2011; Kobylka et al., 2017; Marr, 1982; Nakayama et al., 1995; but see Green, 1992; Grill-Spector & Kanwisher, 2005, which found no difference in performance time between identification and detection tasks). In our case, the visual system should detect whether an object such as a “shoe” (an object of the first identified category) is present or not, and all shoes are detected simultaneously if they are present. The next “identification stage” then follows, in which the next category is identified. It remains unclear what kind of object features are processed during the detection and identification of objects. A somewhat similar two-stage processing model for several simultaneously presented pictures was suggested by Potter & Fox (2009). They presented up to four photographs simultaneously in a rapid serial visual presentation procedure, and subjects had to either detect a verbally denoted target or memorize the pictures and perform a recognition test after each sequence of pictures. The two stages of visual processing suggested by the authors were fast global processing of all pictures in a stimulus, which is sufficient for target detection, and slower serial processing, which is required for object recognition.
In summary, the presented experimental data support the hypothesis that visual objects of the same category are identified in parallel in multiple-object stimuli. As the distortion of global or local features does not influence the performance pattern, we can suggest that both global and local features are processed during identification of object category. The number of simultaneously presented objects was restricted to three in our experiment; further research with a larger number of objects is therefore needed.
Dataset 1. Response time and accuracy data. "1-1", "1-2", "2-2s", "2-2d", "1-3", "2-3s", "2-3d", "3-3" are different types of stimuli where the first digit between quotation marks indicates the number of categories and the second digit indicates the number of objects. Original, original conditions; locally d., locally distorted conditions; globally d., globally distorted conditions; RT, reaction time; %, percentage accuracy. DOI: 10.5256/f1000research.14468.d204177 (Šoliūnas, 2018).
Free alternatives to the E-Prime software: PsychoPy (Peirce, 2007); DMDX (Forster & Forster, 2003).