Individual typological differences in a neurally distributed semantic processing system: Revisiting the Science article by Mitchell et al. on computational neurolinguistics [version 1; peer review: 1 approved with reservations]

Background: A reanalysis of the 2008 Science article by Mitchell et al. on computational neurolinguistics revealed individual typological differences as striking characteristics of the patterns of informative voxels crucial for the distributed semantic processing system.
Methods: The results of two feature selection methods (ANOVA and Stability) were compared on the open datasets of each subject to evaluate how decisive the selected features were in predicting human brain activity associated with language meaning.
Results: In general, the two selection results were similar and the voxel-wise ranks were correlated, but the ranks diverged markedly for a subgroup of subjects whose precision was mediocre when modelled without regularization. Quite interestingly, looking at the anatomical location of these voxels, it appears that the modality-specific areas were likely to be monitored by the Stability score (indexing “identity”), and that the ANOVA (emphasizing “difference”) tended to detect supramodal semantic areas.
Conclusions: This minor finding indicates that in some cases, seemingly poor data may deeply and systematically conceal information that is significant and worthwhile. It may have potential for shedding new light on the controversy pertaining to cognitive semantics, which is divided into modality-biased (embodied) and amodal symbol theories.

The manuscript is a reanalysis of the study of Mitchell et al. (2008). The author claims that using different feature selection methods can help to identify areas that are functionally distinctive at the individual level. The findings of the presented study have substantial implications for studies that focus on how neural activation encodes semantic content. I have only a few minor comments.



Introduction
It is widely acknowledged that, among the challenges in multivoxel pattern analysis (MVPA), individual variability, which penalizes classification accuracy in cross-subject modelling, is difficult to overcome, particularly when targeting concepts and meanings conveyed by language 1. Admittedly, the precision rates in MVPA can be mostly uniform for experiments successfully performed at the individual first level, as in the case of the classic Science article authored by T. Mitchell and his group (Predicting Human Brain Activity Associated with the Meanings of Nouns) 2. In their study, distributed brain activation patterns were observed in nine subjects conceptualizing 60 concrete nouns in an fMRI scanner, and these neural patterns were regressed on the co-occurrence probability in a text corpus between each of these fMRI nouns and semantic features (25 basic verbs). The highly significant prediction accuracy that they obtained at the subject level was associated with a pattern separability built on a set of informative voxels as features, which are distributed across numerous brain areas and differ across subjects. The Science study and other computational neurolinguistic reports 3-7 could generate classifiers with high precision (owing to the L2-regularization technique that obliterates individual variability) and draw plausible semantic maps in the individual brain.
However, the functionality at work in the individual brain remains to be specified further, beyond unanimously acceptable modelling results. In this study, an alternative view is put forward to uncover systematicity and provide a typology of individual variability through a reanalysis of the open data used in the Science study by Mitchell et al. Several previous studies have reported that, despite the almost invariably accurate performance of the subjects in conceptualization tasks, a fundamental difference is seen in how widely the information on the feature voxels is dispersed across their anatomical locations 1,5. Unquestionably, the experimental paradigm developed by Mitchell et al. hinged on modality-specific factors, which promoted the distributed character of semantic processing systems. Their stimulus set, which used captioned drawings with considerable visual effects and concrete nouns with implications of some motion or perception, rendered the experiment sensitive to embodied cognition 8. The attribute generation task, which consisted of thinking about the properties of the object, could allow free association and perceptual simulation 9. Yet, even without accounting for the modality-specific factors, there remains some debate pertaining to the supramodal semantic centers in general, as to whether the left temporal pole and anterior temporal gyrus (hub and spoke model) 10-11 or the left middle temporal gyrus and the left angular gyrus (high level convergence model) 12-13 are the nodal loci of a genuinely semantic process. The primary goal of the present study was to determine whether such a topology of semantic processing can be elucidated through a typology of fMRI subjects, and to examine how that typology is determined by the subjects' hidden neural responses characteristic of the selection of the most informative voxels.

Methods
The datasets were nine .mat files corresponding to the nine subjects (P1-P9), downloaded from the website of Carnegie Mellon University. A '.mat' file, called 'runByVoxByNouns-P<number>.mat', was created (using MATLAB R2015a) from each subject's data as a three-dimensional array corresponding to x: runs (six repeated presentations) by y: voxels by z: words (60 fMRI nouns). This procedure facilitated the computation of informative voxel locations identified by two feature-selection methods: the F values of ANOVA, which evaluate the mean values across the items repeatedly presented in a learning set, and the Stability scores, which distinguish voxels that exhibit consistently similar activation patterns to the items for machine learning 14. Mitchell et al. adopted the latter method to select the top 500 voxels, which involves computing Pearson's correlation coefficient of the activation vectors for the stimulus nouns over the 15 (= 6C2, i.e., 6 choose 2) pairs of the six fMRI presentation runs. They computed the average pairwise correlation for each voxel over all pairs of rows in the matrix composed of six presentation runs by 58 nouns (reserving two nouns for testing). In this study, cross-validation was not performed except for computing the modelling accuracy for each subject using the ordinary least squares (OLS) method without L2-regularization. The top 500 voxels were selected from the overall runs by the ANOVA and by the Stability scores, and the number of voxels not shared by the two selection results (set subtraction, i.e., type 1: ANOVA set - Stability set, and type 2: Stability set - ANOVA set) was counted as "divergence" for each subject, as shown in Table 1.
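The two voxel scores and the "divergence" count described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the input is assumed to be an array laid out as runs by voxels by words, and the F value is the standard balanced one-way ANOVA with nouns as groups and runs as repeated observations.

```python
from itertools import combinations

import numpy as np


def stability_scores(data):
    """Mean pairwise Pearson correlation across runs, per voxel.

    data: array of shape (runs, voxels, words); this layout is an assumption.
    """
    n_runs = data.shape[0]
    # Center and scale each voxel's word profile within each run.
    centered = data - data.mean(axis=2, keepdims=True)
    norms = np.linalg.norm(centered, axis=2, keepdims=True)
    unit = centered / np.where(norms == 0, 1.0, norms)
    # With 6 runs this loop visits the 15 unordered run pairs.
    pair_corrs = [(unit[i] * unit[j]).sum(axis=1)
                  for i, j in combinations(range(n_runs), 2)]
    return np.mean(pair_corrs, axis=0)


def anova_f(data):
    """One-way ANOVA F value per voxel (groups = words, observations = runs)."""
    n_runs, _, n_words = data.shape
    word_means = data.mean(axis=0)                    # (voxels, words)
    grand_mean = word_means.mean(axis=1, keepdims=True)
    ss_between = n_runs * ((word_means - grand_mean) ** 2).sum(axis=1)
    ss_within = ((data - word_means[None]) ** 2).sum(axis=(0, 2))
    df_between = n_words - 1
    df_within = n_words * (n_runs - 1)
    return (ss_between / df_between) / (ss_within / df_within)


def top_k(scores, k):
    """Indices of the k highest-scoring voxels, as a set."""
    return set(np.argsort(scores)[::-1][:k].tolist())


def divergence(anova_top, stability_top):
    """Type 1 = ANOVA set - Stability set; type 2 = Stability set - ANOVA set."""
    return anova_top - stability_top, stability_top - anova_top
```

Because both sets contain the same number of voxels, the type 1 and type 2 counts are always equal for a given subject; only the identity of the voxels differs.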
The voxel-wise ranks in the two selection results were compared with each other using Spearman's rank correlation coefficient, and the subject-wise rho and the corresponding p values were computed. The ANOVA and Stability feature-ranking data were all mapped to anatomical regions according to the automated anatomical labeling (AAL) atlas 15.

Table 1. The top 500 voxels selected by the ANOVA and the Stability score, and the modelling accuracy based on the ordinary least squares (OLS) method without adjustment by L2-regularization. The subject-wise 'rho' and the corresponding p values represent Spearman's rank correlation coefficient between the voxel-wise ranks in the two feature selection results. 'Divergence' denotes the number of selected voxels extracted by the subtraction operation ANOVA set - Stability set or Stability set - ANOVA set.
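The rank comparison can be sketched as below. This is an illustration only: the helper names are hypothetical, ties are assumed to be absent, and the set of voxels over which the ranks are compared is left to the caller because the text does not pin it down. The p value is not computed here; `scipy.stats.spearmanr` returns both the coefficient and its p value if SciPy is available.

```python
import numpy as np


def ranks(scores):
    """Rank voxels by score, 1 = highest. Assumes no tied scores."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    result = np.empty(len(order), dtype=float)
    result[order] = np.arange(1, len(order) + 1)
    return result


def spearman_rho(scores_a, scores_b):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(ranks(scores_a), ranks(scores_b))[0, 1])
```

A rho near 1 means that the two methods order the voxels almost identically, whereas a value near 0 corresponds to the dispersed ranks reported for the poor-performance subjects.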

Results
The raw modelling accuracy, based on the ordinary least squares (OLS) method without L2-regularization, decayed with the magnitude of "divergence" between the ANOVA and the Stability score and with the decrease in rank similarity between the voxels selected by each method (Table 1). Admittedly, good modelling accuracy is associated with high F values and Stability scores for the selected voxels, as is seen in P1, which yielded the best precision by a wide margin, and in the next best group of P2-P4. However, there was no mean difference in feature scores among the subjects of the middle group, P5, P6, and P7, although P5 and P7 exhibited significant Spearman's rank correlation coefficients, divergence less than 100, and OLS precision higher than 70%, while none of these conditions was true for P6. When visualizing the score distributions of the top 500 voxels with a notched box plot, it is apparent that the poor-performance group (P6, P8, and P9) was characterized by narrow boxes and low upper whiskers (Figure 1) and by dispersion of the voxel-wise ranks in the two selection results (Figure 2).

Discussion
Our results support the notion that particular types of individuals differ markedly in their way of recruiting voxels with respect to different feature selection methods, i.e., Stability scores and F values of ANOVA. The Stability scores examine the extent to which each voxel reacted to the same stimulus across runs in a constant manner; therefore, it is the "identity" of an object that is emphasized by this index as invariable through repetition. Conversely, the F values of ANOVA pertain to the magnitude of between-group variance across the responses to the 60 nouns relative to the within-group variance across the 6 presentations of each individual noun; therefore, the "difference" is likely to be captured by that index, although it should be inextricably linked with the "identity" side. In consequence, regardless of the feature selection method, mostly the same voxels with a similar top 500 ranking order could be selected from the brains of P1-P5 and P7, but the remaining subjects (P6, P8, and P9) showed substantial divergence between the lists of the 500 important voxels selected by the two methods. The mean index values for the top voxels were significantly larger in the former subjects than in the latter ones for each method; the difference in raw classification accuracy without the regularization effect was conspicuous between these subject groups. However, an in-depth analysis raised questions that remain to be resolved, since P5 and P7 may be treated as members of an interesting subgroup in that, despite the highly significant rank correlation for the selected voxels, their mean values were relatively low and not significantly different from those of P6 in the poor-performance group.
When tapping into the anatomical regions from which feature voxels were selected, the most intriguing property was that high precision in modelling was guaranteed rather by extra-linguistic regions. It is noteworthy that in P1 (recording 82% classification accuracy by the OLS with no regularization), the majority of the top 487 informative voxels shared by the Stability score and the ANOVA were found in the visual areas of the temporal and occipital lobes, with several in the frontal and parietal lobes. Unlike in all the other subjects, the left inferior frontal gyrus, pars triangularis ("Frontal_Inf_Tri_L"), and the left precuneus ("Precuneus_L"), frequently considered to be involved in executive functions of language activity, were not recorded in the overlapping selected voxel areas of P1. When focusing on the poor-performance subject group (P6, P8, and P9), which exhibited a large divergence (larger than 1 standard deviation from the mean) between the voxel selections by the two methods, it appeared that the modality-specific areas were likely to be monitored by the Stability score (indexing "identity"), and that the ANOVA (emphasizing "difference") tended to detect supramodal semantic areas. The type 1 voxels (selected by the ANOVA but not by the Stability score as the top 500) and the type 2 voxels (selected by the Stability score but not by the ANOVA under the same criteria) were mostly extracted from different anatomical regions. The frequency distribution tables (Figure 3) represent the number of type 1 and type 2 voxels selected from P6, P8, and P9 and attributed to each anatomical area in the AAL brain atlas. The ANOVA highlighted, as locations of type 1 voxels, some areas for amodal or supramodal semantic processing, especially the left middle temporal gyrus ("Temporal_Mid_L"), which was the most populous by far in this category.
In contrast, the Stability score tended to introduce bias to the vision-related areas, notably the left middle occipital gyrus ("Occipital_Mid_L"), which may reflect stimulus modality or perceptual symbols in embodied cognition.
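The frequency tables of type 1 and type 2 voxels per anatomical area can be tallied along the following lines. This is a sketch under assumptions: the function name and the `voxel_to_label` mapping (voxel index to AAL region name) are hypothetical, and voxels labelled 'Not_Found' are dropped before counting.

```python
from collections import Counter


def region_counts(voxel_ids, voxel_to_label, skip=("Not_Found",)):
    """Count how many of the given voxels fall in each labelled region.

    voxel_to_label is a hypothetical dict from voxel index to AAL label.
    """
    labels = (voxel_to_label[v] for v in voxel_ids)
    return Counter(label for label in labels if label not in skip)


# Toy usage with made-up voxel indices and labels.
toy_labels = {
    0: "Temporal_Mid_L",
    1: "Temporal_Mid_L",
    2: "Occipital_Mid_L",
    3: "Not_Found",
}
type1_table = region_counts([0, 1, 3], toy_labels)
type2_table = region_counts([2], toy_labels)
```

Calling `most_common()` on each table lists the regions from most to least populated, which is the ordering a frequency table would normally present.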
This divergence allows us to shed new light on a traditionally controversial subject in neural semantics: where is the border that separates the brain regions selective for purely conceptual functions from those that are sensory-driven and modality-dominant, and thus extrinsic to meaning processing? The compatibility of the Stability score with the perceptual modalities may suggest, in line with the embodiment view (to which Mitchell et al. also referred for the neural signature of the verb "eat"), that the "identification" of a concept is materially founded upon a sensory-perceptual system and real-life experience with its referent instances (or stimuli) to shape its cognitively grounded symbols. However, the voxel information brought by the ANOVA enables us to propose an alternative view of the discrimination power in language, one having an affinity with the amodal (not to say disembodied) symbol theory. Descended from the school of Saussure, this theory (often relying on lexical co-occurrence information from language corpora, as in the case of the Science study) postulates that the value of a symbol (or a linguistic sign) is not derived from its intrinsic sense but from language itself as a computable system of "difference." It is quite intriguing that the reanalysis of the Science data revealed, through the variability of subjects performing a language task, a salient discrepancy between the brain regions informative of the manifold nature of semantic processing. Indeed, we are not yet in a position to adjudicate between these philosophically opposite views through a succinct review such as this report alone. However, we may at least conclude here that such a fundamental issue was, quite interestingly, readdressed by reanalyzing the data from a subject group that showed inconsistency across the feature selection methods and relatively low precision rates in fMRI machine learning classifiers.

Figure 3. Frequency tables of the informative voxels extracted from the poor-performance subjects. The voxels were classified into type 1 (above, ANOVA set minus Stability set) and type 2 (below, Stability set minus ANOVA set) and attributed to each anatomical area in the AAL brain atlas (the voxels labelled 'Not_Found' were removed from the lists). The type 1 voxels tend to belong to the supramodal semantic regions (such as "Temporal_Mid_L"), whereas the type 2 ones are characterized by the dominance of the visual area (such as "Occipital_Mid_L").

Data availability
The dataset of Mitchell

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declared that no grants were involved in supporting this work.

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.