Research Article

Possible significance of spatial heterogeneities of local visual features for face perception

[version 1; peer review: 1 approved, 1 approved with reservations]
PUBLISHED 12 Jan 2015

Abstract

Second-order visual filters are mechanisms that preattentively combine the rectified outputs of first-order filters (the linear striate neurons). This allows them to select the image areas that are characterized by spatial heterogeneity of the local visual features. The aim of our research was to determine whether information from these areas is sufficient to detect unfamiliar faces and to distinguish their gender. In our experiments we used digital photographs of real living things or artificial objects and of faces. All images were equated in average luminance, contrast and size (7 degrees of visual angle) and were processed to extract the areas that differ the most in contrast, orientation and spatial frequency at each of six spatial frequencies (0.5, 1, 2, 4, 8 and 16 cpd). The remaining parts of each image were set to the background. The resulting pictures were presented in random order, and the observer had to say what he or she saw after each presentation. When a face was presented, the observer’s answer was assigned to one of the categories ‘it is not clear’, ‘head’, ‘human face’ or ‘male/female’. We found that the information contained in the image areas with spatial heterogeneity of the local features is sufficient not only for detecting a face but also for distinguishing its gender. The best results were obtained at a carrier frequency of 2 cpd; the results were slightly worse at 0.5 and 1 cpd. The information extracted from the high-frequency half of the spectrum was significantly less useful. These results suggest that the information transmitted by second-order visual filters may be used for pattern recognition.

Keywords

visual filters, spatial heterogeneity, pattern recognition

Introduction

The question of how visual images are formed has a long history, and no theory proposed so far explains it completely. Three points of view on image formation have prevailed in visual neuroscience. According to the first, an image is a holistic description in which the most typical object of a class is taken as the standard; this is called ‘template theory’. The second also relies on a holistic description but takes the average description of the objects of a class as the standard; this is called ‘prototype theory’. The third assumes that every image can be described as the sum of its features; this is called ‘feature theory’. It remains unknown how the invariance of holistic descriptions could be achieved and which features could serve as the distinguishing ones.

Whichever point of view is closer to the truth, it is now well established that initial visual processing is a parallel local description of the input, which breaks a scene into a large number of fragments known as primitives. These are luminance gradients of various locations, orientations and spatial frequencies. This operation begins in the retina and ends in the visual cortex.

But this is only the start of visual processing. Image formation inevitably involves grouping the primitives that belong to a single object. At first, feature integration theory, according to which the binding mechanism is selective attention [1], was popular. More recently, however, a number of tasks have been described in which spatial grouping is carried out preattentively, for example the perception of second-order motion [2] and texture segregation [3]. These operations can be performed by so-called ‘second-order mechanisms’, which preattentively (following a certain rule) bind the outputs of first-order mechanisms (the linear striate neurons) [3–5]. Subsequent studies confirmed the existence of such mechanisms and determined their properties [6–9]. Whereas second-order motion may be regarded as a laboratory phenomenon, the ability to segregate textures quickly is very important in everyday life.

Is the role of second-order mechanisms limited to texture segregation? Given that these mechanisms can detect spatial modulations of local features, attempts have been made to establish whether this information plays a considerable role in the perception of complex scenes and objects. Analysis of natural images showed that first-order and second-order features notably overlap in space [10,11]. It was therefore concluded that second-order features remove ambiguities in the interpretation of luminance changes (the first-order features) [12,13].

Meanwhile, it is reasonable to assume that these modulations may carry important information about the shape of an object and its details. The goal of our study was therefore to determine whether information about spatial heterogeneity can be useful for identifying images in general and, in particular, for identifying faces among ‘non-faces’.

In approaching this task, we cannot ignore the fact that early visual processing is carried out by a system of parallel pathways tuned to different spatial frequencies [14–17]. It is known that these frequencies are not equivalent when it comes to identifying faces. It is also possible that the outputs of particular spatial-frequency channels are combined in certain ways.

In preparing the test images, we followed the assumptions about the organization of second-order mechanisms embodied in the ‘filter-rectify-filter’ model [18]. According to this model, the outputs of adjacent linear filters (the first-order filters) with the same frequency and orientation tuning are combined by a certain algorithm in the second-order filters. In other words, a second-order mechanism combines filters that differ only in their location in the visual field. Such filters, at different resolutions, pass those regions of the image that are heterogeneous in contrast, orientation and spatial frequency. We assumed that these regions, because of their heterogeneity, contain important information and can be regarded as ‘regions of interest’.

Methods

Apparatus

The stimuli were displayed on a 17” LG Flatron 775FT monitor driven by a PC (amd64-compatible) with an NVIDIA GeForce 7300 SE graphics card running Debian GNU/Linux 7.2 (wheezy). The screen resolution was 1152 × 864 pixels with a refresh rate of 75 Hz. The monitor luminance was calibrated with a digital photometer (manufactured by ‘TKA’, St. Petersburg, Russia) using 256 gray levels.

Stimuli

Digital photographs of real objects and faces were used as the initial images. All images were first adjusted to the same size (7 deg of visual angle). The average luminance of the stimulus equaled the luminance of the background, which was 19 cd/m². The initial images were processed so as to extract the areas that differed from their surroundings in contrast, orientation and spatial frequency in 6 frequency ranges corresponding to the frequency tuning of human visual pathways [19]. The object size was such that its maximum length along any axis corresponded to 0.5 period (the window diameter) of the second-order filter (SOF) tuned to the lowest carrier frequency (3 cycles per image, cpi).

The sequence of computations used to prepare the test images reproduced the sequence of operations in the basic ‘filter-rectify-filter’ model:

  1. Linear filtering of the initial image by the first-order filters (FOFs).

    The FOF kernel is a two-dimensional Gabor function [20,21]. The FOF bandwidth is 2 octaves. There were 6 peak spatial frequencies in steps of 1 octave (from 4 to 128 cpi) and 6 orientations in steps of 30 deg (from 0 to 150 deg).

  2. Rectification.

    Rectification was performed by taking the square root of the sum of the squared outputs of the two FOFs forming a quadrature pair.

  3. Linear filtering of the 36 resulting images (6 spatial frequencies × 6 orientations) by the SOFs.

    The SOF kernel is a two-dimensional Gabor function whose period is 8 times longer than the period of the FOFs it combines [9,13]. The orientation tuning of the FOF and the SOF was the same [22–24].

  4. Orientation integration.

    For each pixel, 6 values corresponding to the 6 FOF orientations were obtained, and the maximum of these values was assigned to the pixel.

  5. Finding the local peaks in the SOF outputs.

    The local maxima were found in each of the six matrices of SOF outputs (one for each of the 6 carrier spatial frequencies).

  6. Window allocation.

    Each maximum in each ‘frequency slice’ became the center of a window through which information from the FOFs was allowed to pass. The window diameter was 0.5 of the period of the SOFs forming that frequency slice.

  7. Filling the windows.

    Each window was filled with the image produced by the FOFs at the corresponding frequency. Pixel luminance was attenuated by a Gaussian from the window center to the periphery. Outside the windows the image was filled with the background. Where windows overlapped, each pixel took the larger luminance value.
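The seven steps above can be sketched in code. What follows is a minimal NumPy illustration for a single carrier frequency, not the authors' implementation. The envelope width (`sigma_periods`), the zero-mean kernels, the FFT-based circular convolution, the peak threshold (`rel_threshold`), the width of the Gaussian fall-off inside each window and the use of the summed even-phase FOF outputs as the band-passed image are all assumptions made for the sketch, and every function name is hypothetical.

```python
import numpy as np

def gabor(n, cpi, theta, phase, sigma_periods=0.3):
    """Zero-mean 2-D Gabor on an n x n grid; cpi = cycles per image."""
    half = n // 2
    y, x = np.mgrid[-half:n - half, -half:n - half].astype(float)
    period = n / cpi                       # pixels per cycle
    sigma = sigma_periods * period         # envelope width: an assumption
    xr = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / period + phase)
    return g - g.mean()

def filt(img, kernel):
    """Circular convolution via FFT; the kernel is centred at index n // 2."""
    k = np.fft.ifftshift(kernel)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k)))

def frf(img, cpi, n_orient=6):
    """Steps 1-4: returns (orientation-pooled SOF output, band-passed image)."""
    n = img.shape[0]
    pooled = np.full(img.shape, -np.inf)
    band = np.zeros(img.shape)
    for k in range(n_orient):
        theta = k * np.pi / n_orient                        # 0, 30, ..., 150 deg
        even = filt(img, gabor(n, cpi, theta, 0.0))         # step 1: quadrature pair
        odd = filt(img, gabor(n, cpi, theta, np.pi / 2))
        energy = np.sqrt(even**2 + odd**2)                  # step 2: rectification
        sof = filt(energy, gabor(n, cpi / 8, theta, 0.0))   # step 3: period 8x longer
        pooled = np.maximum(pooled, sof)                    # step 4: max over orientations
        band += even
    return pooled, band

def local_peaks(resp, rel_threshold=0.5):
    """Step 5: pixels not smaller than their 8 neighbours and above a threshold."""
    is_peak = resp >= rel_threshold * resp.max()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy or dx:
                is_peak &= resp >= np.roll(resp, (dy, dx), axis=(0, 1))
    return np.argwhere(is_peak)

def windowed(img, cpi):
    """Steps 6-7: Gaussian windows of diameter 0.5 SOF period around each peak."""
    n = img.shape[0]
    pooled, band = frf(img, cpi)
    peaks = local_peaks(pooled)
    radius = 0.25 * 8 * n / cpi             # half of 0.5 SOF period, in pixels
    y, x = np.mgrid[0:n, 0:n]
    mask = np.zeros(img.shape)
    for py, px in peaks:
        d2 = (y - py)**2 + (x - px)**2
        win = np.exp(-d2 / (2 * (radius / 2)**2)) * (d2 <= radius**2)
        mask = np.maximum(mask, win)        # overlapping windows keep the larger value
    return mask * band, peaks               # background is 0 outside the windows
```

Calling `windowed(image, cpi)` once for each of the six carrier frequencies would give six ‘frequency slices’ of the kind described above; the image is assumed to be square, with the mean luminance already subtracted so that the background is zero.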

Procedure

The subjects were seated 1.15 m from the monitor, on which the prepared stimuli were shown in random order. Looking at each image in the sequence, the observer had to say what he or she saw. Presentation time was not limited. Images based on photographs of an unfamiliar man’s face and an unfamiliar woman’s face were shown within the sequence of ‘non-face’ images. The subject’s responses to these images were assigned to one of three categories (‘head’, ‘human face’, ‘man or woman’); a wrong or missing answer was recorded as no response. No leading questions that could prompt the observer toward the right answer were asked.
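For orientation, the physical size that a 7 deg stimulus occupies on the screen at this viewing distance follows from the standard visual-angle relation, size = 2 d tan(angle / 2). The short sketch below only illustrates that arithmetic; the function name is hypothetical and the calculation is not taken from the article.

```python
import math

def stimulus_size_m(angle_deg, distance_m):
    """Physical extent that subtends angle_deg at the given viewing distance."""
    return 2 * distance_m * math.tan(math.radians(angle_deg) / 2)

# 7 deg stimulus viewed from 1.15 m, as stated in the Stimuli and Procedure sections.
size = stimulus_size_m(7, 1.15)
print(f"{size * 100:.1f} cm")
```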

Subjects

A total of 70 students (9 men and 61 women) aged between 17 and 21 took part in the experiment. All participants had normal or corrected-to-normal vision and reported no history of neurological or psychiatric disorders. The participants did not take any medication shortly before or during the tests. All participants were informed about the purpose and the procedures of the experiment; they all signed a consent form that outlined the risks and benefits of participating in the study and indicated that they believed the investigation to be safe. The study was conducted in accordance with the ethical standards of the Code of Ethics of the World Medical Association (Declaration of Helsinki) and was approved by the local ethics committee.

Results

When the initial (real) images were processed by the SOFs tuned to the lowest carrier frequency (0.5 cpd; we denote these filters F1), information was allowed to pass through only one ‘window’, centered on the face (Figure 1A,C). The result of the processing is shown in Figure 1B,D. Looking at these images, the observers determined the gender in 87.9% of cases and gave the more general response ‘a face’ in only 11.4%.

Figure 1. The face processing at the carrier frequency 0.5 cpd.

A, C – the initial images. The circles are the windows through which the filtered image is allowed to pass. B, D – the test (processed) images, which contain only ‘the regions of interest’ at the filtering frequency.

When the initial images were processed by the SOFs tuned to a higher carrier frequency (1 cpd; F2), only part of a face could be transmitted through a single window (Figure 2A,C). The information was transmitted through 2 windows because there were 2 local maxima in the F2 outputs. The result of the processing is shown in Figure 2B,D. The observers’ results were slightly worse than at 0.5 cpd: the gender was identified in 75.7% of cases and the response ‘a face’ was given in 20%.

Figure 2. The face processing at the carrier frequency 1 cpd.

The SOFs that spatially integrate still higher-frequency signals (2 cpd; F3) passed information through windows whose size was about 0.25 of the face (Figure 3A,C). The resulting test images are shown in Figure 3B,D. In this case performance improved again: the observers determined the gender in 94.3% of cases.

Figure 3. The face processing at the carrier frequency 2 cpd.

Further reduction of the SOF size with increasing carrier frequency led only to a deterioration in performance (Figure 4, Figure 5, Figure 6).

Figure 4. The face processing at the carrier frequency 4 cpd.

Figure 5. The face processing at the carrier frequency 8 cpd.

Figure 6. The face processing at the carrier frequency 16 cpd.

All the results are summarized in Table 1.

Table 1. Face categorization using the image areas with a spatial heterogeneity of the local visual features.

Carrier frequency (cpd)    Category (percentage of answers)
                           head     face     male/female
0.5 (F1)                   0.71     11.43    87.86
1.0 (F2)                   2.14     20.00    75.71
2.0 (F3)                   0.00      5.71    94.29
4.0 (F4)                   0.71     35.00    64.29
8.0 (F5)                   0.00     22.86    76.43
16.0 (F6)                  0.71     27.14    51.43

Integrating the information extracted by the three SOFs from the low-frequency half of the spectrum (F1+F2+F3) (Figure 7A) did not improve performance compared with using F3 alone (92.1% versus 94.3%, respectively).

However, this operation (F1+F2+F3) makes it possible to identify the person if the initial image is a familiar face [25] (Figure 7B).

Figure 7. The results of combining the windows at the frequencies 0.5, 1, and 2 cpd.

A – the unfamiliar face from our experiment, B – the familiar face.

Stimulus   head   face   male/female   shown   no response
f0.5       1      16     123           140     0
f1         3      28     106           140     3
f2         0      8      132           140     0
f4         1      49     90            140     0
f8         0      32     107           140     1
f16        1      38     72            140     29

Dataset 1. Frequencies of different types of the responses.
Each cell contains the frequency of a certain type of response from the participants to the type of stimuli as shown.
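The percentages in Table 1 can be recomputed from the counts in Dataset 1 by dividing each response count by the number of presentations (140 per carrier frequency). The sketch below shows this arithmetic; the variable and function names are hypothetical, and the counts are copied from the dataset.

```python
# Response counts from Dataset 1, keyed by carrier frequency (cpd):
# (head, face, male/female, shown, no response)
counts = {
    0.5:  (1, 16, 123, 140, 0),
    1.0:  (3, 28, 106, 140, 3),
    2.0:  (0, 8, 132, 140, 0),
    4.0:  (1, 49, 90, 140, 0),
    8.0:  (0, 32, 107, 140, 1),
    16.0: (1, 38, 72, 140, 29),
}

def percentages(row):
    """Percentage of 'head', 'face' and 'male/female' answers among all presentations."""
    head, face, gender, shown, _no_response = row
    return tuple(round(100 * c / shown, 2) for c in (head, face, gender))

table1 = {freq: percentages(row) for freq, row in counts.items()}
for freq, (head, face, gender) in table1.items():
    print(f"{freq:>4} cpd  head {head:5.2f}  face {face:5.2f}  male/female {gender:5.2f}")
```

In every row the three response categories and the ‘no response’ count add up to the 140 presentations, which is a quick consistency check on the dataset.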

Discussion

The problem of face perception can be conditionally divided into two parts: extracting the useful information (reducing redundancy) and building a classifier for the selected information (recognition). Our research concerns the first part of this problem.

Of all the known algorithms for finding ‘regions of interest’, only a small part can be regarded as neural [26–32]. These algorithms can be divided into modular and network-based ones. In the former the weights are given in advance; in the latter they are formed while the network is trained. The approach used in our work is based on the modular architecture of the early levels of processing, which ends with the automatic extraction of useful information from the incoming image.

In our research the first-order filters formed six copies of the incoming image at different resolutions, and the second-order filters served as windows placed where the difference from the surroundings in contrast, orientation or spatial frequency was maximal.

Our results show that face identification is most effective at a carrier frequency of 2 cpd. This agrees with the data of other authors showing that faces are identified faster and more accurately when middle-range frequencies are used [16,17,33–37]. So what is new in our findings?

We have shown that it is not the whole face that is informative, but only its regions with spatial heterogeneity. In other words, for detecting a human face and determining its gender, the information in the whole face is highly redundant. This does not contradict the evidence that the processing of a human face is holistic [38,39]: integrated information about its most informative areas may simply be enough for a holistic description.

If the sum of all second-order filters activated by our images is taken as 100%, we can presume that at a frequency of 2 cpd the volume of selected information is 1%. Recall that this amount of information is enough to determine gender confidently.

To identify a familiar face, finding the regions of interest at a single frequency is not enough. At the least, summation over the low-frequency half of the spectrum is necessary (Figure 8). High-frequency information is not crucial for gender determination and identification, but it is useful for perceiving details and making fine distinctions.

Figure 8. The result of processing familiar faces at the carrier frequency 2 cpd.

Left – Einstein; right – Princess Diana.

In summary, we took the areas of the image that differ most from their surroundings in contrast, orientation and spatial frequency to be the most informative. The second-order filters that we used form saliency (‘convexity’) maps for every carrier frequency. As a result we obtain ‘embedded maps’. At the lowest of the frequencies used, one of the filters selects the face as a whole. The subsequent maps select smaller and smaller areas of the face. If the object approaches or recedes, so that its size changes within a certain range, the embedded maps stay the same. The only difference is that, if the object approaches, the same regions of interest are selected by second-order filters tuned to a lower carrier frequency, and if it recedes, by filters tuned to a higher one.

The smaller the window that transmits the information, the higher its resolution. As a result, the same amount of information is transmitted through every window. If the size of an object or its orientation changes, the overall amount and nature of the information transmitted by second-order filters stay the same.
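The scaling argument can be illustrated with a little arithmetic. The sketch below assumes, following the Methods, that a window is 0.5 SOF period wide and that the SOF period is 8 carrier periods, so that a window spans 4 carrier cycles. The function name and the 8 deg and 16 deg object sizes are hypothetical and chosen only to give round numbers.

```python
def window_fraction(carrier_cpd, object_size_deg):
    """Window diameter as a fraction of the object size.

    Assumes a window of 0.5 SOF period with the SOF period equal to
    8 carrier periods, i.e. a window spanning 4 carrier cycles.
    """
    cycles_per_object = carrier_cpd * object_size_deg
    return 4.0 / cycles_per_object

# Hypothetical example: an 8 deg object, then the same object at twice the size.
for size_deg in (8.0, 16.0):
    for cpd in (0.5, 1.0, 2.0):
        print(f"object {size_deg:4.1f} deg, carrier {cpd} cpd: "
              f"window = {window_fraction(cpd, size_deg):.3f} of the object")
```

Under these assumptions, a window that covers a quarter of the object at 2 cpd covers the same quarter at 1 cpd once the object has doubled in size, which is the shift to a lower carrier frequency described above for an approaching object.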

Conclusions

Given that the information extracted by this algorithm is useful for perceiving faces, the following hypothetical model of image formation by second-order filters can be proposed. A face is described simultaneously at several levels of resolution. At a relatively low level the face is described as a whole. At a higher resolution, information about the large parts of the face is transmitted. Each higher level describes still smaller details. This information is extracted and transmitted by parallel frequency channels. As a result, a hierarchical description of the face is formed in parallel by an automatic algorithm. The decision-making system need not use all the available information: refinement continues until the specific visual task is solved.

The results suggest that second-order filters are suitable candidates for the mechanism that forms saliency (‘convexity’) maps, and that the information they extract can be used to form a face image.

Data availability

F1000Research: Dataset 1. Frequencies of different types of the responses, 10.5256/f1000research.5975.d4149940

How to cite this article
Babenko VV, Alekseeva DS and Yavna DV. Possible significance of spatial heterogeneities of local visual features for face perception [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2015, 4:10 (https://doi.org/10.12688/f1000research.5975.1)
Open Peer Review

Key to reviewer statuses:
Approved: the paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: a number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 09 Apr 2015
Talis Bachmann, Laboratory of Cognitive Neuroscience, Institute of Public Law, University of Tartu, Tartu, Estonia 
Approved with Reservations
This paper includes several interesting ideas (e.g., how different response categories by subjects allow to know what is salient in the test image, using spatial heterogeneity areas as the basic image processing strategy). However, in its present form it could ...
HOW TO CITE THIS REPORT
Bachmann T. Reviewer Report For: Possible significance of spatial heterogeneities of local visual features for face perception [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2015, 4:10 (https://doi.org/10.5256/f1000research.6392.r7885)
Reviewer Report 06 Mar 2015
Alexander Latanov, Faculty of Biology, Moscow State University, Moscow, Russian Federation 
Approved
The manuscript represents very interesting study on second-order visual filters that determine perception of face feature. These filters are associated with second-order neuronal populations that combine the outputs of striate neurons coding the primary visual features. The authors assume that ...
HOW TO CITE THIS REPORT
Latanov A. Reviewer Report For: Possible significance of spatial heterogeneities of local visual features for face perception [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2015, 4:10 (https://doi.org/10.5256/f1000research.6392.r7766)