Cortical long-range interactions embed statistical knowledge of natural sensory input: a voltage-sensitive dye imaging study

How is contextual processing, as demonstrated with simplified stimuli, cortically enacted in response to ecologically relevant complex and dynamic stimuli? Using voltage-sensitive dye imaging, we captured mesoscopic population dynamics across several square millimeters of cat primary visual cortex. By presenting natural movies locally through either one or two adjacent apertures, we show that simultaneous presentation leads to mutual facilitation of activity. These synergistic effects were most effective when both movie patches originated from the same natural movie, thus forming a coherent stimulus in which the inherent spatio-temporal structure of natural movies was preserved in accord with Gestalt principles of perceptual organization. These results suggest that natural sensory input triggers cooperative mechanisms that are imprinted into the cortical functional architecture as early as in primary visual cortex.

The early visual cortex comprises an extended and densely interwoven network, acting on millisecond time scales 1 . Radially, activity is rapidly distributed by local feedback loops 2,3 . Tangentially, long, horizontal fibers enable neurons to sense regions beyond their receptive field borders 4-7 . Investigating the dynamics of this circuitry with simple parametric stimuli reveals well-defined selective response properties. In carnivores and primates, where these neurons are further organized in overlaid maps 8-11 , it is possible to satisfactorily predict changes in the layout of these maps in accord with the distribution of the stimulus energy across different visual features 12 , but see 13 .
However, these stimulus-response relationships can flexibly be modified by the specific connectivity patterns between distant neurons 4,14,15 . For example, a surrounding stimulus which is in itself not sufficient to drive cortical neurons above firing thresholds may exert strong contextual influences on local processing 16-19 . These integrative phenomena are conceived as the functional backbone of various Gestalt criteria of perceptual organization 20 and naturally occurring visual tasks such as contour forming, figure-ground separation, object segmentation, or perceptual completion. Hence, local-to-local interactions are intrinsically tied to the integrative functionality of cortical operation. They can be conceptualized as biases originating from the cortical architecture that foster optimal coordination of large numbers of neurons in accord with the statistics of incoming signals 21-23 .
Today, a large body of evidence indicates that the functional properties of neurons are specifically adapted to process signals that are of ecological relevance 24-26 . Neuronal stimulus-response properties exhibit higher sensitivity 27 , selectivity 28 and reliability 29,30 in response to visual features when these are presented within their natural sensory context. However, direct functional evidence showing that cortical connectivity mediating local-to-local coupling embeds empirical statistical knowledge of natural inputs is at best scarce and it is not clear whether these interactions studied with simple stimulus configurations extrapolate to complex dynamic conditions that mimic natural input.
We recorded cortical activity using voltage-sensitive dye imaging 31 in response to locally presented natural movies recorded by cats in a natural habitat 26 . We characterized contextual effects by manipulating the spatiotemporal statistical regularities between two movie-patches; we tested the hypothesis that cortical circuits responsible for contextual effects are functionally adapted to natural input statistics. Our results show that, under dynamic natural stimulation conditions, facilitatory interactions across distances beyond the classical receptive field characterize contextual effects, demonstrating that cortical circuits embed functional knowledge about the spatiotemporal relationships inherent in natural scenes.

Stimulus conditions
Stimulus acquisition and presentation hardware was the same as in a previous study 26 . Natural movies (see Figure 1A, for two example frames) were recorded at a sampling rate of 25 Hz by freely moving cats exploring a natural habitat. The recorded natural movies were presented locally, through a single or a pair of Gaussian apertures (referred to as patches in the text) for a duration of 2 s including the 200 ms prestimulus interval. We included in the analysis presented here only the initial 750 ms. Local patches were created by modulating the contrast of the movies as a function of space according to a two dimensional Gaussian function with a full width at half maximum (FWHM) of ~3-4˚. The FWHM depended on the distance of the center point to the area centralis in individual experiments (~3-6˚). The average luminance value of the pixels overlapping with the two dimensional Gaussian mask was first subtracted, and the result was then multiplied pixel-wise with the Gaussian mask. This effectively reduced the luminance contrast from center to periphery. Following this step the average values were set back to the background luminance level, which was kept constant across the whole experiment.
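The masking steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' stimulus code: the function names, the pixel-based FWHM, and the use of a mask-weighted mean for the "average luminance" are all assumptions made for the example.

```python
import math

def fwhm_to_sigma(fwhm):
    """Convert a full width at half maximum to a Gaussian standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def gaussian_mask(height, width, center_row, center_col, fwhm_px):
    """Build a 2-D Gaussian mask with value 1 at the center (hypothetical helper)."""
    sigma = fwhm_to_sigma(fwhm_px)
    return [[math.exp(-((r - center_row) ** 2 + (c - center_col) ** 2)
                      / (2.0 * sigma ** 2))
             for c in range(width)]
            for r in range(height)]

def apply_aperture(frame, mask, background):
    """Mask one movie frame: subtract the mean, scale by the mask, restore background.

    ASSUMPTION: the mean luminance is computed as a mask-weighted average.
    """
    weight_sum = sum(sum(row) for row in mask)
    mean_lum = sum(f * m
                   for frow, mrow in zip(frame, mask)
                   for f, m in zip(frow, mrow)) / weight_sum
    # Contrast falls off with the mask; far from the center only background remains.
    return [[(f - mean_lum) * m + background for f, m in zip(frow, mrow)]
            for frow, mrow in zip(frame, mask)]
```

By construction the mask equals 0.5 at a distance of FWHM/2 from the center, so the contrast of the patch is halved there and approaches zero in the periphery.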
Local conditions were indexed by two parameters: position on the screen (A or B) and movie index (1 or 2) specifying which full-field movies were to be masked. We used different movies in different experimental sessions to increase the generalizability of our

(A) Movie frames extracted from two different movies (marked orange/green) are shown. Prior to optical recordings retinotopy across the imaged cortical region was evaluated by hand mapping of receptive field locations (colored rectangles) at various penetration sites. This ensured that the distance between the midpoint of apertures was bigger than the range of classical receptive fields. (B) For each experiment, two base natural movies (marked orange/green) were used to create different stimulation conditions. Movies were presented through either one or two Gaussian masks (3˚ or 4˚ FWHM, see Materials and Methods) centered at positions A (top position) or B (bottom position), illustrated by white circles. According to the position label (A or B) and the index of the natural movies Movie 1 or Movie 2, the following conditions were displayed: single (A1, B1, A2, B2), in which only one local patch was shown; coherent (A1B1, A2B2), in which two local patches belonged to the same full-field movie; and incoherent (A1B2 and A2B1), with local patches derived from different movies (see rightmost column).

Changes from Version 1
This article has been updated to include a new paragraph in the discussion about the potential benefits of using natural stimuli to investigate contextual effects. Additionally, we added a paragraph highlighting the relationship between VSD imaging signals and spiking activity.

device was used that allowed targeted penetrations at different locations without opening the sealed recording chamber.
The raw data was processed in two steps 26 . First, in order to remove differences in illumination across different pixels, divisive normalization was performed on all the recorded raw samples of a given pixel by its mean during prestimulus period. Second, heartbeat and respiration-related artifacts were removed by subtracting the average blank signal recorded in the absence of stimulation. These differences were later normalized by the blank signal in order to gain independence from the global activity level fluctuations occurring during the course of an experiment. As our recordings were synchronized with the heart-beat cycle of the animal, this blank subtraction step effectively removes these artifacts. Moreover, this method is preferred over the cocktail blank correction because our conditions were not composed of orthogonal stimuli. These steps were applied for each trial separately and the outcome was averaged across trials. The number of trials ranged from 25 to 35 for different experiments.
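The two preprocessing steps can be sketched for a single pixel as below. This is a simplified illustration rather than the authors' pipeline: the function names are hypothetical, each trace is a plain list of samples, and the blank trace is assumed to have been normalized in the same way as the stimulus trace before subtraction.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def normalize_by_prestimulus(trace, n_prestim):
    """Divide every sample of one pixel by its mean over the prestimulus samples."""
    baseline = mean(trace[:n_prestim])
    return [v / baseline for v in trace]

def blank_correct(stim_trace, blank_trace, n_prestim):
    """Subtract the blank signal and express the difference relative to the blank.

    ASSUMPTION: both traces are first normalized by their own prestimulus mean.
    """
    stim = normalize_by_prestimulus(stim_trace, n_prestim)
    blank = normalize_by_prestimulus(blank_trace, n_prestim)
    return [(s - b) / b for s, b in zip(stim, blank)]
```

In this sketch a trial that is identical to the blank yields zeros at every sample, and a deviation of the stimulus trace from the blank appears as a fractional change.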

Model fitting and statistical evaluation
We computed spatial profiles by averaging the evoked data across the temporal dimension. These were fitted with a two dimensional Gaussian function G(x,y), where A, σ x , σ y , μ x , μ y, and r represent the peak value, the horizontal (medio-lateral) and vertical (antero-posterior) spreads, the position of the Gaussian function (medio-lateral or antero-posterior), and the rotational parameter, respectively. We used the lsqcurvefit function provided by Matlab (2007b, The MathWorks, Natick, MA) with a large-scale trust-region-reflective algorithm. Prior to optimization runs, initial parameters of the Gaussian function were roughly estimated using different heuristics for each experiment and condition separately. As the fitting function we used the superposition of two Gaussian functions G 1 (x,y) + G 2 (x,y) centered roughly on separate activation spots. Almost always, the activity spots were clearly distant and well-isolated from each other. All parameters were estimated simultaneously. Using two Gaussians in all stimulation conditions, including single conditions, ensured that the activation at the distant locations did not influence the fit; possible cross-contributions were therefore negligible. Furthermore, the center position of each single Gaussian fit was constrained around the location of the peak responses at locations A and B. This helped to avoid complications where Gaussians could overfit noise during single stimulation conditions, and it was necessary to estimate the value of the indirect activation. We also constrained the value of different parameters to avoid fitting to irrelevant cortical activity; this was most useful in the single condition case, where there was only one single activity spot.
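The composite fitting function can be sketched as below. The exact parameterization is not recoverable from the text here, so this example assumes that the rotational parameter r enters as a correlation-like term in (-1, 1), as in a bivariate normal density scaled to peak A. The function names are hypothetical, and the sketch only evaluates the model and its squared error; the optimization itself (lsqcurvefit in the text) is not reproduced.

```python
import math

def gaussian_2d(x, y, A, mu_x, mu_y, sigma_x, sigma_y, r):
    """Evaluate one 2-D Gaussian with peak A.

    ASSUMPTION: r acts as a correlation-like coefficient with -1 < r < 1.
    """
    dx = (x - mu_x) / sigma_x
    dy = (y - mu_y) / sigma_y
    exponent = -(dx * dx - 2.0 * r * dx * dy + dy * dy) / (2.0 * (1.0 - r * r))
    return A * math.exp(exponent)

def two_spot_model(x, y, params_a, params_b):
    """Superposition G1(x, y) + G2(x, y); each params tuple holds six values."""
    return gaussian_2d(x, y, *params_a) + gaussian_2d(x, y, *params_b)

def sum_squared_error(profile, params_a, params_b):
    """Squared error between an observed profile[row][col] and the model.

    Rows index y and columns index x (an illustrative convention).
    """
    return sum((profile[yi][xi] - two_spot_model(xi, yi, params_a, params_b)) ** 2
               for yi in range(len(profile))
               for xi in range(len(profile[0])))
```

A bounded least-squares routine would minimize this error over all twelve parameters at once, with the two centers constrained near the peak responses at locations A and B.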
For the statistical evaluation of confidence intervals, we used the bootstrapping method with 1000 repetitions and an alpha value of 0.05. The confidence intervals are provided within square brackets following average values. We tested the significance of median activities using the Wilcoxon signed-rank test; the p-values are provided within brackets.
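A percentile bootstrap of the mean with these settings might look like the sketch below. The text does not state which bootstrap variant was used, so the percentile method, the function name, and the fixed random seed are assumptions for illustration.

```python
import random

def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    ASSUMPTION: percentile method; the seed only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and collect the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower_index = max(0, round((alpha / 2.0) * n_boot))
    upper_index = min(n_boot - 1, round((1.0 - alpha / 2.0) * n_boot) - 1)
    return means[lower_index], means[upper_index]
```

With 1000 repetitions and alpha = 0.05, the interval runs between the resampled means at roughly the 2.5th and 97.5th percentiles.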
results. By displaying either one or simultaneously two local movies, the conditions A1, B1, A1B1, and A2, B2, A2B2 were created ( Figure 1B and Supplemental Movie S1). For conditions with a pair of patches, the distance between the centers of the two Gaussian apertures was equal to 2.5 FWHM. Local movies were corrected for mean luminance so that the average of the pixels within the Gaussian apertures was always equal to the brightness of the background in which they were embedded. However, we did not equalize the contrast within each aperture, as it is not possible to do so without introducing strong artifacts, particularly in cases where the local portion of a movie frame contains regions with homogeneous brightness values belonging to object surfaces.
Prior to optical recordings, the topographic mapping between the cortical surface and the visual field was scrutinized by means of several electrode penetrations, and local stimuli were positioned so that the upper movie-patch matched the receptive field position of the simultaneously recorded multiunit activity. This ensured that the distance between the centers of the Gaussian masks extended beyond the borders of classical receptive fields.

Experimental setup
Animals were initially anesthetized with ketamine (15 mg kg⁻¹ intramuscularly (i.m.)) and xylazine (1 mg kg⁻¹ i.m.), supplemented with atropine (0.05 mg kg⁻¹ i.m.). After tracheotomy, animals were artificially respirated, continuously anesthetized with 0.8-1.5% isoflurane in a 1:1 mixture of O₂/N₂O, and fed intravenously. Heart rate, intratracheal pressure, expired CO₂, body temperature, and electroencephalograms (EEG) were monitored during the entire experiment. The skull was opened above the primary visual cortex and the dura was resected. Paralysis was induced and maintained by alcuronium dichloride (Alloferin®). Eyes were covered with zero-power contact lenses for protection. External lenses were used to focus the eyes on the screen. To control for eye drift, the position of the area centralis and receptive field positions were repeatedly measured. A stainless steel chamber was mounted and the cortex was stained for 2-3 hours with voltage-sensitive dye (RH-1691), and unbound dye was subsequently washed out with artificial cerebrospinal fluid. All surgical and experimental procedures were approved by the German Animal Care and Use Committee (AZ 9.93.2.10.32.07.032) in accordance with the Deutsche Tierschutzgesetz and NIH guidelines.

Data acquisition and preprocessing
Optical imaging was accomplished using an Imager 3001 (Optical Imaging Inc, Mountainside, NY) and a tandem lens macroscope 32 , 85 mm/1.2 toward camera and 50 mm/1.2 toward subject, attached to a CCD camera (DalStar, Dalsa, Colorado Springs). The camera was focused ~400 μm below the cortical surface. For detection of changes in fluorescence, the cortex was illuminated with light of 630 ± 10 nm wavelength and emitted light was high-pass filtered with a cutoff of 665 nm using a dichroic filter system. Cortical images were acquired at a frame rate of 220 Hz and covered regions of approximately 10 × 5 mm² of primary visual cortex. The relevant retinotopic region of area 18 was captured (lower contralateral quadrant of the visual field), and parts of area 17 were also occasionally captured. For electrophysiological recordings, a custom-built

Effect of contextual stimulation on activity amplitudes
Cortical responses to two different movies presented in single and coherent conditions for a duration of 750 ms are shown as space-time plots (Figure 2, top and bottom rows for Movie 1 and Movie 2, see also Video S2). Conditions are indicated on the left-most column. Upon localized stimulation by natural movie-patches, activity emerged from baseline level with variable delays among conditions. The cortical dynamics induced by the individual stimuli show different temporal profiles and suggest that instantaneous activity levels were determined by the specific properties of each natural movie. Each single movie evoked well-separated spots of activity on the cortical surface indicating that the thalamic input was spatially well resolved at this stimulus configuration. Furthermore, we observed large differences in the activity levels between single and coherent conditions (note the difference in color scale). In this experiment, while the peak activity (maximum activity level across all pixels) during single conditions (bottom two rows in each panel) was 6.67×10⁻⁴ ΔF/F, coherent conditions (top row in each panel) led to a value of 7.86×10⁻⁴ ΔF/F, corresponding to an increase of 18%. As the direct input to the recorded cortical region was identical during single and coherent conditions, we attribute these differences in activity to the impact of long-range interactions on cortical dynamics under natural stimulation conditions.
To quantify long-range interactions we computed spatial profiles of activity levels under stimulation conditions with or without

Results
We performed voltage-sensitive dye imaging (VSDI) in the primary visual cortex of anesthetized cats (n = 4). Multi-unit recordings complemented these measurements and provided information on receptive field properties, localization, and spatial extent (see Materials and methods). In order to investigate long-range cortical interactions under ecologically relevant dynamic stimulation conditions, natural movies were presented locally by applying either one or two Gaussian masks (3-4° FWHM) to the original full-field movies ( Figure 1A). Presentation of two movies (Movie 1 and Movie 2) viewed through apertures at two different positions (position A and B) creates a total of 8 different local stimulation conditions ( Figure 1B, please see also Video S1): Single conditions (A1, B1, A2 and B2) consisted of isolated local movie-patches that provided no contextual information ( Figure 1B, first and second column). In contrast, coherent (A1B1 and A2B2) and incoherent (A1B2 and A2B1) conditions provided contextual information in the form of another distant movie-patch. Here, two movie-patches stemming from either the same (coherent) or different (incoherent) original natural movies were presented simultaneously at two locations separated by a distance larger than the typical classical receptive field size ( Figure 1A, colored boxes representing receptive fields). Whereas coherent conditions leave the spatiotemporal characteristics of natural movies intact, incoherent stimulation eliminates naturally occurring correlations between apertures and induces an evident dissonance (please see Video S1).

movie that drives the cortex more strongly, can appear more highly activated than in coherent conditions. Therefore, cross-wise comparisons (as in Figure 5B) between both incoherent conditions are needed to calculate the net interaction effects between coherent and incoherent movies.
The precise shape of spatial activity profiles shown in Figure 3 varied considerably across different experiments. This is to be expected, as the location and extent of activity spots depend strongly on the recording conditions specific for each experimental session. It was therefore not straightforward to compare these spatial activity profiles across different experiments. We used a parametric approach in order to circumvent this problem and modeled spatial activity profiles recorded during different experiments using two-dimensional Gaussian functions with 6 free parameters. These parameters consisted of peak value (A), its horizontal and vertical position (μ x and μ y ), horizontal and vertical spread (σ x and σ y ) as well as a rotational parameter (r) (see Materials and methods). The modeled spatial activity profiles are presented together with the empirical data in Figure 3 (second and fourth rows, same colorbar). The correlation coefficient between these fits and the empirical data was on average 0.88 [0.82, 0.91] (average, [95% bootstrap confidence intervals], same convention in the following). For the whole data set, the distribution of correlation coefficients ( Figure 4D) was negatively skewed and equal to 0.82 [0.78, 0.86] on average. Hence, compared to many thousands of pixels typically recorded in optical imaging, our parametric approach provided a major reduction in dimensionality without compromising the precise characterization of response patterns.
We computed 4 types of characteristic spatial activity profiles from 3 different stimulation conditions ( Figure 4A, first row, three-dimensional depiction; second row, top view representation). From activity during single conditions we derived the characteristic activity profiles for direct (cyan border) and indirect (yellow border) stimulation types. Whereas the direct activity represents the baseline responses to a single movie-patch in the absence of any contextual stimuli, the indirect activity captures the influence of an isolated distant movie-patch. Similarly we computed the characteristic activity patterns in response to movie-patches presented either in coherent (dark gray border) or incoherent conditions (magenta border). In Figure 4A, we visualize these four characteristic activity profiles after normalizing separately peak and spread parameters by their corresponding values obtained during direct stimulation. This was done for each experiment separately and the median fitted values were computed subsequently (this was necessary in order to eliminate outliers that originated from the normalization procedure).
We observed major changes in the characteristic activity profiles that were reflected in peak and spread parameters (compare different columns in Figure 4A). Concerning the peak activity, the indirect effect ( Figure 4A, first column, yellow borders) of a single movie-patch presented at a distant location was on average slightly excitatory; however, this was not statistically significant (sign-test, p = 0.8).
However, it should be noted that we observed net-excitatory effects as frequently as suppressive effects with similar amplitudes at the distant non-stimulated locations. The occasional occurrence of

context ( Figure 3, first and third rows). These maps represent the average activity of each single pixel across stimulation duration and demonstrate clearly the spatially restricted non-overlapping foci of activity. Note that different movies produced different amplitudes of activity. Thus, occasionally, incoherent conditions, incorporating a

Figure 3. Observed and modeled spatial profiles of activity patterns. Spatial activity profiles (first and third rows) in response to two different natural movies (upper and lower panels) presented at various stimulation conditions with (third and fourth columns) or without (first and second columns) contextual information. Icons schematically represent the stimulation conditions. These are computed by time-averaging the data presented in Figure 2. Activity during no-context conditions was used to define a pair of interest regions per movie. This was done by selecting the pixels with highest activity located within the top 5th percentile (see contour lines). Within these regions of interest we derived direct (cyan), indirect (yellow), coherent (black) and incoherent (magenta) activity levels. The observed activity profiles were parametrically fitted using a composite function involving two 2-D Gaussians (second and fourth rows). The goodness of fit values characterized by the correlation coefficient is shown for the whole data set in Figure 4D. Color scale represents ΔF/F (×10⁻⁴).

This is depicted in (Figure 4B, blue dots) for each individual comparison (4 dots per experiment). In nearly all cases, the peak activity during coherent stimulation was higher than direct activity measured during single conditions. In 2 cases, a positive activation was observed only during coherent conditions.
While direct drive evoked an average peak activity of 0.

suppression of net activity below baseline levels in the far periphery has been shown with VSDI when presenting local stimuli without contextual surround (see Figure 1 in 33 ). During the two conditions where context was present ( Figure 4A, third and fourth columns, magenta and dark gray borders), the peaks were higher than during stimulation without context (compare to second column, cyan border). Importantly, among those conditions where context was present, coherent context resulted in higher peak activity values (compare third and fourth columns).
We quantified the total facilitation effect by comparing the activity induced by direct and coherent stimulation ( Figure 4A, see blue lines).

Figure 4C, right panel), we found a similar result. Here again only the σ y parameter was significantly different and an increase of 8.8% [0.5, 19.9%] was observed. Therefore, as in the case of peak activity modulations, coherent context had a stronger impact on the spread parameter compared to the case where the contextual information was absent or incoherent. We conclude that rather than a sharpening of the spatial profile, more cortical space is allocated when contextual information is present. Furthermore, the direction of this increase is biased towards the location of the contextual information.
Effect of contextual stimulation on time-course of activity levels
In order to have a better grasp on the temporal unfolding of long-range interactions, we next characterized the time-course of the facilitatory effects. To this end we used the evoked activity values and limited the analysis to pixels that were most strongly driven by the movie-patches. Based on the activity profiles during two single conditions (Figure 3, first and second rows), we defined two non-overlapping regions of interest for each movie condition by choosing those pixels that lay within the highest 5th percentile of activity (see contour lines). These most responsive pixels were typically located centrally with respect to the activity spot. For each of the afore-mentioned activation types (indirect, direct, incoherent and coherent) we computed the mean time-course of activity across all movies and experiments within these most strongly driven pixels ( Figure 5A, same colors as in Figure 4A and Figure 3). Please note that here experiments were conducted using different natural movies, leading to the loss of a specific temporal profile. Samples of the time course of the facilitatory effect with mean activity significantly different from zero are depicted with filled circles (t-test).
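The region-of-interest selection and the averaging within it can be sketched as below. This is an illustrative reading of the procedure, not the authors' code: pixels are flattened into a single list, "highest 5th percentile" is taken to mean the 5% most active pixels, and the function names are hypothetical.

```python
def top_percent_roi(profile, percent=5.0):
    """Return the indices of the most active pixels in a time-averaged profile.

    ASSUMPTION: the region of interest is the top `percent` of pixels by activity.
    """
    n_keep = max(1, round(len(profile) * percent / 100.0))
    order = sorted(range(len(profile)), key=lambda i: profile[i], reverse=True)
    return sorted(order[:n_keep])

def roi_time_course(data, roi):
    """Average data[t][pixel] over the ROI pixels for every time sample t."""
    return [sum(frame[i] for i in roi) / len(roi) for frame in data]
```

The region is defined once from the single-condition profile and then reused to extract the time-course for each activation type.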
As noted before, the indirect influence of the distant single movie-patch was slightly excitatory ( Figure 5A, yellow line). However, contrary to the previous parametric analysis, which was not temporally resolved, we detected here a significant effect of the indirect input at ~100 ms (p = 0.04, see filled circle). This confirms that the indirect influence of a movie-patch presented in isolation to its neighboring regions is of excitatory nature and occurs quickly.
During direct stimulation in the absence of context ( Figure 5A, cyan line), activity increased with stimulus onset and quickly reached a plateau at 100 ms, exactly where the indirect drive reached the significance level. At this point, the activity was 3.7-times stronger than the indirect drive. All samples following stimulus onset were statistically different from zero (p < 0.002). As expected, with the presence of a coherent context, the facilitatory interactions caused stronger activity levels throughout stimulus presentation. These were effective as early as 100 ms following stimulus onset ( Figure 5A, left panel, cyan). We computed the pair-wise differences between coherent and single conditions and evaluated whether these deviated significantly from zero level ( Figure 5A, right panel, black line). These facilitatory effects quickly followed after stimulus onset and reached significance around 300 ms (p < 0.049). The presence of an

was significantly different from zero (p = 0.0011, pairwise t-test). We therefore conclude that contextual stimuli presented at distant locations have a substantial modulatory effect on local activity. We further compared the peak values between direct and incoherent stimulation conditions (not shown as a scatter plot). The presence of an incoherent context resulted in an increase of 24.5%; this increase was, however, not significantly different from zero (sign-test, p = 0.8; t-test, p = 0.08).
To what extent can the total facilitatory effect be accounted for by the indirect additive effect of the distant movie-patch? We compared the activity during coherent stimulation conditions to the activity predicted by the sum of direct and indirect responses ( Figure 4A, green lines). We found a superadditive effect of contextual stimulation in nearly all comparisons ( Figure 4B, green dots). The superadditive facilitatory effect, quantified as the difference between coherent conditions and the sum of single and indirect activations, corresponded to 41.9% [20.9%, 76.2%] (sign-test, p = 0.004; t-test, p = 0.009); hence only about 3.3 percentage points of the 45.2% total contextual effect were accounted for by linear interactions. This shows that long-range interactions result to a large extent from non-linear interactions between cortical sites.
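The comparison against the linear prediction can be written out as a short calculation. The sketch below is illustrative only: the function names are hypothetical, effects are expressed relative to the direct response (the normalization is an assumption, since the text does not state the denominator), and the numbers in the usage note are made up for the example and are not measured data.

```python
def percent_change(observed, reference):
    """Change of `observed` relative to `reference`, in percent."""
    return 100.0 * (observed - reference) / reference

def facilitation_summary(direct, indirect, coherent):
    """Split the total facilitation into a linear and a superadditive part.

    ASSUMPTION: all effects are normalized by the direct response.
    """
    total = percent_change(coherent, direct)
    linear_prediction = direct + indirect  # additive model of the two inputs
    superadditive = 100.0 * (coherent - linear_prediction) / direct
    return {"total": total,
            "superadditive": superadditive,
            "linear": total - superadditive}
```

For example, with illustrative peak values direct = 1.0, indirect = 0.05 and coherent = 1.4, the total facilitation is about 40%, of which about 35 percentage points exceed the additive prediction.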
To what extent are the non-linear contextual influences adapted to the statistical regularities of natural movies? The total facilitatory effect quantified above incorporates both the specific and unspecific influences originating from contextual stimulation. While the modulation of peak activity by an incoherent context can be attributed to the unspecific effect of the distant stimuli, any incremental effect of a coherent context can be attributed to the specific adaptation of these interactions to the statistics of natural movies. To evaluate the specificity of these interactions we compared the peak values between coherent and incoherent conditions ( Figure 4A, red line; Figure 4B, red dots). We observed an increase of 20.7% [4.18%, 38.12%] in the peak activity level, and this was found to be marginally significant (sign-test, p = 0.21; t-test, p = 0.053). However, compared to the 45.2% observed for total facilitation, this analysis shows that about 54.2% of the facilitation results from specific interactions. Therefore, the non-linear facilitatory effects were fully effective only when the two movie-patches complied with the statistical regularities specific to natural movies.

Effect of contextual stimulation on spatial extent of activity
It is possible that long-range facilitation by the contextual sources of information is accompanied by a modification of the total spatial extent of cortical activity. For example, the presence of contextual information could result in a more tuned spatial activity profile leading to a decrease in spread parameters with the presence of context. Alternatively, contextual information could potentially cause a larger number of cortical neurons to be allocated. In order to test these different hypotheses, we evaluated the influence of context on the spatial extent of activated cortical space and compared the average spread parameters (σ x and σ y ) between single and coherent conditions. The cortical activation extended over larger surfaces during conditions of stimulation where context was present ( Figure 4C). We found that the presence of a coherent context increased the joint spread parameter by 10.9% [1.4, 29.2%] ( Figure 4C, left panel). Considering each dimension separately we found that contextual information increased the spread parameter only along the direction

incoherent context had a smaller impact on activity levels (left panel, magenta line) and consequently the time-course of activity was similar to conditions where no context was present. The difference between coherent and incoherent conditions (right panel, gray line) was positive throughout the stimulus presentation and reached significance at two time frames (p < 0.042, see filled circles). The time-resolved analysis presented here complements the parametric approach. We conclude that the interactions between different cortical locations occur quickly following stimulus onset and persist across the stimulus presentation.
During single conditions, the activity within the region of interest is mainly determined by direct sub-cortical input and, therefore, the bottom-up characteristics of the input stream are presumably the sole determinants of the precise time-course. Additionally, as natural movies contain non-zero correlations across long distances, it is expected that activity profiles at locations A and B exhibit a certain amount of similarity that would lead to correlations in the activity time-courses. We quantified these similarities at locations A and B during two single conditions by measuring the correlation coefficient ( Figure 5B, schematic representation). The correlations, r single , between the time-courses of activity recorded in both locations were never negative. Temporal resolution of the time-courses in this analysis was 200 Hz in order to capture its detailed structure. We observed an average correlation of r single = 0.57 [0.47, 0.67], suggesting that low-level characteristics of movies were to a large extent common to both locations. How do the lateral interactions, which are effective during simultaneous presentation of two movie-patches, influence the precise time-course of activity? To answer this question, we computed r coherent by quantifying the correlation between activities evoked by two simultaneously presented movie patches ( Figure 5B, see arrows). All correlation values were higher than corresponding r single values ( Figure 5, red dots). r coherent was equal to 0.84 [0.74, 0.89], resulting in an increase of 46.9%. This result suggests that long-range interactions increase the similarity of the activity time-course. In accord with this conclusion, we observed that an incoherent movie-patch presented simultaneously had a detrimental effect on the correlation values. Consequently, r incoherent was 30% smaller than r coherent and equal to 0.58 ([0.38, 0.74], Figure 5B, black dots).
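The similarity measure used here is the ordinary correlation coefficient between two activity time-courses. A minimal sketch in plain Python follows; the function name is hypothetical and the inputs are assumed to be equally long lists of samples from locations A and B.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long time-courses."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Applied to the time-course at location A and the one at location B, this gives one r value per condition (single, coherent or incoherent), which can then be compared pairwise.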
This result suggests that long-range interactions, in addition to their facilitatory effects, lead to an increase in the similarity of the time-course.
Whether the super-additive interactions across long-range cortical distances result from disinhibition 49 or from additional excitatory drive through "cross-orientation" mechanisms remains to be explored.
When the same movie was presented through both apertures, the two patches were perceptually grouped without effort and appeared to belong to a single scene. In contrast, when two different movies were used, the content of the two apertures appeared immediately incompatible (please see Video S1). A number of factors determine the coherence between patches taken from the same movie. First, stimulus motion was similar in the two distant apertures. This was due to the body and head motions of the recording cat, which induced large and equal motion fields across the visual scene captured by the camera. It has been noted earlier that such temporal phase relationships across distant regions are perceptually salient and enable object segmentation even in the absence of any spatial information 58 . Second, natural images tend to possess large spatial correlations because of the dominance of low spatial frequencies in their spectrum 21 . Moreover, autocorrelations of orientations may cover large portions of the visual field, reaching up to 8 degrees 59 . Therefore, our stimulus paradigm can be conceived as a dynamic illustration of the Gestalt criterion of good continuation.
There are different idealized mechanisms, each based on a different anatomical substrate, which could mediate the observed facilitatory long-range interactions. Overlapping feed-forward thalamocortical input could explain the increased cortical drive during stimulation with adjacent movie-patches. However, there are a number of arguments against this explanation. First, the cortical locations driven most strongly by the individual movies were separated by distances larger than the anatomical spread of direct thalamo-cortical projections 5,15,60,61 . This was in accord with the relatively small spatial extent of the mapped receptive fields. Second, the activity at the distant location during stimulation with a single movie-patch was only minimal and reached significant levels about 50 ms later than at directly stimulated locations. Third, and most decisively, the total drive to the recorded cortical area was constant across the coherent and incoherent conditions. Since only the order of presentation differed, a purely feed-forward scheme cannot account for the facilitatory interactions.
Rather, the dense network of horizontal connections linking distant neurons across several millimeters is a likely candidate for the observed long-range effects. It has been shown that unmyelinated intralaminar connections contribute to subthreshold responses evoked from distant stimuli placed outside of the classical receptive fields both with intracellular 5 and combined extracellular recordings and VSDI in cat 7 . Furthermore, the selective intracortical connectivity pattern of these tangential connections linking neurons with similar feature selectivity is well-suited to mediate the specific enhancement of activity levels dependent on the stimulus coherence. However, we cannot exclude that feedback signals originating from higher visual areas with larger receptive field sizes than in primary visual cortex could add to these interactions 62,63 . Back-propagating waves of activity have been shown to be initiated in further downstream cortical areas as early as ~100 ms after stimulus onset 64,65 .

Discussion
We used VSDI to investigate long-range cortical interactions during processing of natural images in the primary visual cortex at the mesoscopic population level. By using "keyhole-like" presentations of the original natural movies through either one or two distant Gaussian masks, we quantified the effect of surrounding stimulation on local activity. We provide evidence that contextual integrative mechanisms are indeed operative under natural stimulus conditions. We show that under these conditions the horizontal cortical network 34-37 forms the basis for synergistic interactions across several millimeters of cortex. Contextual stimulation led to a net facilitatory effect compared to the case when the movies were shown in isolation. An important attribute of these interactions was their sensitivity to the intrinsic spatiotemporal regularities of natural movies 38,39 . Moreover, these contextual interactions led to an increased similarity of the population dynamics across long-range cortical distances.
Contextual processing has been investigated extensively both experimentally at the single-neuron level 40 and in recent modeling approaches 41 . A large variety of facilitatory and/or inhibitory contextual effects have been observed; however, the final outcome depends crucially on the precise configuration of the parametrized stimulus used to stimulate the center and surround regions. While the surround effect was found to be mainly inhibitory 35,42,43 and spatially asymmetrically organized 44 , the precise nature of the effect depends on the contrast of the contextual stimuli relative to the contrast threshold of the recorded neuron 45-47 .
An important cornerstone of long-range facilitation is its dependence on the precise spatial configuration of the surrounding context 48 . It has been shown that facilitatory effects increase proportionally with the congruency of the contextual stimuli with respect to the center stimulus 18,49 . With static stimuli, such coherence is generally controlled parametrically by changing the orientation difference between center and surround patches 17,18,50,51 . Since we used natural movies recorded by cats that freely explored a natural habitat, our stimuli were complex and contained multiple features simultaneously. The head and body movements of the cats added to this complexity, as the recorded visual stimuli contained motion cues that were correlated across large visual distances. In order to control the coherence of the stimuli between the apertures, we adopted a non-parametric method that exploits the unique spatiotemporal characteristics of each original movie.
With surround gratings, both facilitatory 17,18,42,45,48,49,52,53 and suppressive 17,18,42,45,50,52,54-57 neuronal effects on a center stimulus have been described in the literature, depending on relative orientation, distance, and contrast. In contrast, using natural stimuli in our study, we exclusively observed facilitation. Interestingly in this regard, when using randomly placed oriented lines as contextual stimuli, Kapadia and colleagues (1995) found that "inhibition could be eliminated by changing the orientation of a few of these elements". This led them to suggest that with the "appropriate configuration of contours surrounding the RF, the cell is lifted from a rather profound level of inhibition, and its excitatory inputs are unmasked". In this sense, the specific regularities in our naturalistic movies could be viewed as an "appropriate configuration" and were shown here to trigger facilitatory interactions across long-range cortical distances.
Thus, these connections act fast 66,67 and are likely to mediate surround modulations spanning considerable distances in visual space, while lateral intra-laminar connectivity may account for modulations within shorter distances 68 .
The relationship between the VSD imaging signal and spiking activity can indeed be complex. For instance, a close relationship between spike rate and the derivative of the VSD response, rather than its magnitude, has been proposed 69 . This might apply especially to the rising phase of the membrane potential after stimulus onset 6,70 . Combined VSD and calcium-sensitive dye imaging suggests that the relationship between spiking activity and the amplitude of the VSD response also depends on stimulus intensity 71 . However, Chen et al. 72 found that these relationships are well captured by a power function with an exponent of ~4 (similar to the relationship between average membrane potential and spike rates in single V1 neurons 73,74 ), indicating that the threshold for observing significant spiking activity is about 30-40% of the maximal VSD response. Amplitudes evoked by our natural movies were on average well within this range when compared to the full-field moving gratings that we used as a standard control for maximal stimulation. Moreover, only pixels fitted at the center of the movie representations, where amplitudes were highest, entered our analysis, further ensuring that subthreshold activity was excluded. Thus, we are confident that we describe the effect of long-range connectivity on postsynaptic spiking activity.
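To make the power-function argument concrete, the sketch below evaluates a power law with an exponent of 4 at different VSD amplitudes. Normalizing both the VSD amplitude and the spike rate to their maxima, as well as the function name `relative_spike_rate`, are assumptions of this example rather than details given in the text.

```python
def relative_spike_rate(vsd_fraction, exponent=4.0):
    """Spike rate (as a fraction of maximum) predicted by a power function.

    vsd_fraction: VSD amplitude normalized to the maximal response (0..1).
    exponent: ~4, following the relationship quoted in the text.
    Normalizing both axes to 1 is an assumption of this sketch.
    """
    if not 0.0 <= vsd_fraction <= 1.0:
        raise ValueError("vsd_fraction must lie in [0, 1]")
    return vsd_fraction ** exponent


for fraction in (0.3, 0.4, 1.0):
    rate = relative_spike_rate(fraction)
    print(f"VSD {fraction:.0%} of max -> spike rate {rate:.2%} of max")
```

Under this assumed normalization, VSD amplitudes at 30% and 40% of the maximum map to roughly 0.8% and 2.6% of the maximal spike rate, which illustrates why spiking only becomes appreciable above that amplitude range.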
We observed that, compared to incoherent stimulation, stimulation with coherent pairs of natural stimulus patches led to stronger facilitatory effects. Since the total input analyzed across coherent and incoherent stimulation was identical across the recorded cortical area, these results cannot be solely explained by the local properties of movie-patches. Rather, this facilitation necessarily reflects the outcome of an integrative phenomenon sensitive to the content of both local movie-patches when presented simultaneously. Therefore, we suggest that the functional architecture of early visual cortical circuits may have empirically internalized the typical contextual relationships 21-23 found in dynamic natural visual scenes.
Author contributions
SO, DJ, PK: Conceived and designed the experiments, analyzed the data, wrote the manuscript, contributed materials and analysis tools. SO, PK: Stimulus recording and preparation. DJ: Performed the experiments.

Competing interests
No competing interests were disclosed.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Open Peer Review
What is the point of using "ecological" stimuli if the response is then reduced to 6 parameters? How is that different from using drifting gratings? The authors should either provide the comparison with simpler stimuli or better justify their choices. The main point here is that the larger summation of two stimuli from the same movie versus two stimuli from different movies may be simply due to objects that span the two masks generating a similarity in the orientation and spatial frequency in the two stimuli, which, in turn, would lead to very well-known orientation-specific effects of contextual stimuli.
How is presenting another stimulus "contextual"? The stimuli were either one or two stimuli of identical size. It would be easier to read if the authors used only one terminology, such as single, coherent and incoherent.
How was a 2D Gaussian fit to the response to one of the stimuli while presenting both? The response is merged; doesn't this affect the fitting procedure?
What was learned from the multi-unit recordings?
Figure 2: It is pretty but mostly useless without quantification. The authors should show values of dF/F or contour plots of the significant response (3 or 5 SDs above baseline noise) so that the movies can be properly compared.
It is remarkable that the responses to the same mask in movie 1 and 2 are so different; are they really statistically different? Is the VSD signal capable of discriminating the differences between the two movies? Looking at the movies, it seems that they have similar spectrotemporal composition.
Where were the values presented on top of page 5 taken from? What area? What time? What is "superadditive"? Does it mean supralinear? Very difficult to understand, mainly because the term is used in combinations such as "superadditive facilitation".
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
No competing interests were disclosed.
These are important questions (please also see our response to question 3 from Dr Rosa below). Clearly, there exist a remarkable number of studies (references in main text) about contextual effects using local grating patches of certain orientations and spatial frequencies.
In our work we follow instead the hypothesis that the low-level sensory regions (if not the whole nervous system) are wired to optimally process inputs that are most likely to be received from the sensory organs. Our recent report suggests that the cortical state (i.e. both the dynamic range and mean activity) is qualitatively different when stimulated with gratings as compared to ecological stimuli. Thus, response behavior to natural input can deviate significantly from predictions based on simple parameterized stimuli, probably due to the extensive spatial information in natural images. These observations may place the burden of justification on the use of simple stimuli, and not of natural stimuli as used in the present report. Many intra-areal long-range facilitation phenomena in the early visual system have been shown to occur in response to local presentations of simple stimuli. Whether these are also functional under more natural stimulation conditions, which are more complex and more ecological, is not clear from the existing evidence. For instance, both the extent and the impact of long-range connections may critically depend on the correlation structure across neuronal populations (Lindén et al., 2011), which in turn emerges from the specific spatio-temporal correlations of visual input. Therefore, in this manuscript, we aimed to demonstrate that long-range interactions are also functional and effective under more naturalistic stimulation conditions, which had not been investigated so far. We provide this evidence in this manuscript and have added a paragraph to the discussion section (paragraph 4).
Finally, the choice of a low number of parameters to characterize the response is data driven. We demonstrate that our analysis captures the larger part of the variance of the response as obtained by voltage-sensitive dye imaging (VSDI) and is therefore a valid procedure. A comparison of the dimensionality of stimulus and response space has to take into account the non-linearity of processing in the visual system.
Our use of "contextual" is inspired by the usage of the terms classical receptive field and non-classical receptive field/context in the visual sciences: here, one visual stimulus elicits a reliable response at a certain location. The other stimulus does not elicit detectable activity at that location but, as our data demonstrate, modulates the response to the former stimulus. The term "context" is intended to denote that dependence. That is, we consider this neighboring information a contextual input because it does not directly drive the processing at the distant center.
In all cases, including single stimulation conditions, two 2D Gaussians were combined to fit the observed responses, and their parameters were estimated simultaneously. Using two Gaussians in all stimulation conditions ensured that the activation at the distant location did not influence the fit. Most of the time, the activity spots were clearly distant and well isolated from each other, so possible cross-contributions were negligible. Furthermore, the center position of each Gaussian was constrained to lie around the location of the peak responses at locations A and B. This helped to avoid complications where the Gaussians could overfit noise. We have now included this information in the manuscript (Methods, page 8, last paragraph).
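The two-Gaussian description above can be sketched as follows. This example only evaluates such a model and does not reproduce the fitting procedure itself, which is not specified here. All names (`gaussian_2d`, `two_gaussian_model`, `clamp_center`) and all numerical values (a 3 mm separation, a spread of 0.7 mm, unit amplitudes) are hypothetical and chosen for illustration.

```python
import math


def gaussian_2d(x, y, amp, mu_x, mu_y, sigma_x, sigma_y):
    """Axis-aligned 2D Gaussian evaluated at cortical position (x, y)."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    return amp * math.exp(-0.5 * (zx * zx + zy * zy))


def two_gaussian_model(x, y, params_a, params_b):
    """Sum of two 2D Gaussians, one per stimulated location (A and B)."""
    return gaussian_2d(x, y, **params_a) + gaussian_2d(x, y, **params_b)


def clamp_center(mu, peak, max_shift):
    """Keep a candidate center within max_shift of the observed response peak."""
    return min(max(mu, peak - max_shift), peak + max_shift)


# Hypothetical parameters (positions and spreads in mm, amplitudes in arbitrary units).
params_a = dict(amp=1.0, mu_x=0.0, mu_y=0.0, sigma_x=0.7, sigma_y=0.7)
params_b = dict(amp=1.0, mu_x=3.0, mu_y=0.0, sigma_x=0.7, sigma_y=0.7)

# Model value at the center of A, and the part of it contributed by Gaussian B.
total_at_a = two_gaussian_model(0.0, 0.0, params_a, params_b)
cross_contribution = gaussian_2d(0.0, 0.0, **params_b)
```

With these illustrative values, the contribution of Gaussian B at the center of A is several orders of magnitude smaller than the amplitude of A, which is the sense in which cross-contributions between well-separated activity spots can be negligible. An optimizer that respects the center constraint would call something like `clamp_center` (or use bounded parameters) at each step.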
Electrical recordings were performed to allow for rapid retinotopic hand mapping of the imaged area before VSDI. Electrical recordings in parallel with VSDI were done to verify the occurrence of evoked spiking activity at those cortical locations where high levels of the dye signal were observed, i.e. around the center representation of one of the local movie patches. Note that a single recording site was used. Thus, analysis of reciprocal interactions at two locations simultaneously, as resolved by VSDI, was not feasible. Analysis of the electrical recordings in relation to particular stimulus features will be the subject of a subsequent report.
The colorbar depicts x10 dF/F values. This is now added to the legends of Figures 2 to 4. Pixels with an activity smaller than 1.5x10 dF/F are not shown. This corresponds to the 75th percentile of the activity distribution, as noted in the figure legend. Responses to each single movie are depicted with their own colormap so that the detailed structure of the evoked responses can be clearly seen for each single condition. Using a common color scale would essentially eliminate the possibility of clearly seeing responses to movies that evoked weaker responses. Furthermore, the colorbar helps in comparing amplitudes across different movies and positions.
VSDI can clearly distinguish these differences. In Onat et al., we have shown that VSDI can distinguish diverse stimulus-related components in response to drifting gratings, including the actual position, i.e. the retinotopic trajectory, of the grating stripes. In the current manuscript, in Figure 2, it is also possible to see that the same movie during double or single conditions leads to similar temporal activity profiles (as analyzed in Figure 5).
These are peak values computed across the entire stimulation duration and all recorded pixels. Figure 4A (top row) presents the spatial activity profiles observed on the cortical surface, reconstructed using the Gaussian parameters (mu, sigma, amp) averaged across ROIs and cats. Please note that averaging the raw recorded data across cats and ROIs (A and B) is not possible because of the large differences in the shape of the activity profiles across ROIs and cats; we therefore reconstructed these activity profiles using the fitted parameters. Each individual data point that contributes to these spatial profiles is depicted in Figure 4B. You might view this analysis technique as a low-dimensional parametric description of the experimental data.
We replaced the word "modelized" with "modeled".

Minor Points
We used the term dissonance to describe the perceptual quality of movie patches originating from different movies when shown simultaneously.
We used the term super-additive to indicate a response that is larger than the sum of the responses to the two individual components when these components are presented simultaneously. The meaning is essentially the same as supralinear, although in the literature super-additive is used more often.
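The definition above can be written as a simple comparison. The helper names `additivity_index` and `is_super_additive` are hypothetical and serve only to illustrate the definition; they are not measures used in the manuscript.

```python
def additivity_index(response_a_alone, response_b_alone, response_both):
    """Ratio of the paired response to the sum of the two single responses.

    Values above 1 indicate a super-additive (supralinear) response and
    values below 1 a sub-additive one. All responses are assumed to be
    measured at the same cortical location.
    """
    linear_sum = response_a_alone + response_b_alone
    if linear_sum <= 0:
        raise ValueError("sum of single responses must be positive")
    return response_both / linear_sum


def is_super_additive(response_a_alone, response_b_alone, response_both):
    """True if the paired response exceeds the sum of the single responses."""
    return response_both > response_a_alone + response_b_alone
```

For example, a paired response of 1.5 (arbitrary units) against single responses of 1.0 and 0.2 would count as super-additive, whereas a paired response of 1.1 would not.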

Reference numbers refer to the reference list in the manuscript
Competing Interests: No competing interests were disclosed.
3. Interestingly, using LFP as a population measure of postsynaptic activity, a recent study showed that superimposed gratings (i.e. plaids) lead to responses that were not predicted by activity in response to their single components (Bartolo et al., 2011). Thus, a simple parametric manipulation (addition) of very simple stimuli like gratings can lead to drastic changes in neuronal population response behavior, limiting explanatory power. If this is the case, the use of gratings may be a weak predictor with respect to complex stimulation, in particular with respect to natural scenes. Please also see above for our reply to Dr Contreras, who had a similar concern.
Reference numbers refer to the reference list in the manuscript.
Competing Interests: No competing interests were disclosed.