AR2, a novel automatic muscle artifact reduction software method for ictal EEG interpretation: Validation and comparison of performance with commercially available software

Objective: To develop a novel software method (AR2) for reducing muscle contamination of ictal scalp electroencephalogram (EEG), and to validate this method by comparing its ability to accurately depict seizure-onset location with that of a commercially available software method (AR1). Methods: A blinded investigation used 23 EEG recordings of seizures from 8 patients. Each recording was uninterpretable with digital filtering because of muscle artifact, was processed using AR1 and AR2, and was reviewed by 26 EEG specialists. EEG readers assessed seizure-onset time, lateralization, and region, and specified confidence for each determination. The two methods were validated on the basis of the number of readers able to render assignments, confidence, the intra-class correlation (ICC), and agreement with other clinical findings. Results: Among the 23 seizures, two-thirds of the readers were able to delineate seizure-onset time in 10 of 23 using AR1, and 15 of 23 using AR2 (p<0.01). Fewer readers could lateralize seizure-onset (p<0.05). The confidence measures of the assignments were low (probable-unlikely), but increased using AR2 (p<0.05). The ICC for identifying the time of seizure-onset was 0.15 (95% confidence interval (CI), 0.11-0.18) using AR1 and 0.26 (95% CI 0.21-0.30) using AR2. The EEG interpretations were often consistent with behavioral, neurophysiological, and neuro-radiological findings, with left-sided assignments correct in 95.9% (95% CI 85.7-98.9%, n=4) of cases using AR2, and 91.9% (95% CI 77.0-97.5%, n=4) of cases using AR1. Conclusions: EEG artifact reduction methods for localizing seizure-onset do not result in high rates of interpretability, reader confidence, and inter-reader agreement. However, the assignments by groups of readers are often congruent with other clinical data. Utilization of the AR2 software method may improve the validity of ictal EEG artifact reduction.


Introduction
The scalp electroencephalogram (EEG) is a critical diagnostic tool in the evaluation of seizures, but artifact from muscle contraction often limits its use by obscuring cerebrally generated potentials. This problem is present in 11% of ictal EEGs overall and up to 70% of frontal lobe seizures [1-3]. The inability to discern the seizure-onset zone from scalp EEG, or to do so precisely, often necessitates additional testing, including positron emission tomography (PET), magnetoencephalography, ictal single-photon emission computed tomography (SPECT), and intracranial EEG [4]. Each of these tests adds undesired time and cost to the evaluation.
Digital filters are the common approach to maximizing the likelihood of identifying a seizure-onset zone from EEG with muscle artifact. This filtering reduces muscle artifact by attenuating all frequencies beyond a selected value [5], but it may impair the integrity of the EEG recording since brain-generated potentials may be in the same frequency band [6,7]. Recently, new technologies to reduce muscle artifact based on independent component analysis (ICA) [8-10] have become available. ICA derives spatial filters that can remove artifacts that have static scalp topographies and time courses of activity that are distinct from those of EEG sources. ICA artifact correction is necessarily imperfect and will remove some neurogenic components of the EEG as well. However, the degree of EEG distortion may be negligible, and ICA has proven effective at removing EMG and ocular artifacts from EEG data recorded from normal individuals in laboratory settings [11-20]. Prior studies have demonstrated that ICA-based methods improve the interpretation of artifact-laden ictal EEG recordings; in these studies researchers manually performed the ICA prior to performing the EEG interpretation [15,16]. Automatic artifact reduction using ICA [17-19] has become commercially available and is included in the latest versions of popular EEG viewer software [20]. Ictal scalp EEG recordings present extraordinary challenges to ICA artifact reduction algorithms because the number of EMG artifact sources increases.
Despite the utilization of these software products by neurologists around the globe, the clinical benefit has not been established. It is also unknown if the new approaches introduce confounding artifacts that may lead to erroneous interpretations.
The goal of this study was to assess the validity of a commercially available EEG artifact reduction tool (AR1), which uses different montages and within-electrode analysis to identify artifactual independent components [20], and to compare it with a novel automatic artifact reduction tool (AR2) developed at the University of California Los Angeles. Validity was assessed on the basis of inter-reader agreement, reader confidence, and congruence with other clinical findings.

Implementation
The custom software algorithm involved importing EEG scalp recordings as European Data Format (EDF) files in Matlab 8.4 (Mathworks, Natick, MA). Prior to performing ICA to remove muscle artifact, the algorithm first identified epochs of the scalp EEG record contaminated by muscle artifact and determined the electrodes that were suspected of having high recording impedance during that epoch. The purpose of these calculations was to exclude these electrodes from the ICA calculations.
The imported EEG, in referential montage, was band-pass filtered (16-70 Hz) using a 500th order finite impulse response (FIR) filter (i.e., FIR1). We then calculated the normalized instantaneous amplitude of the band-pass filtered signal using a Hilbert transform. This signal was smoothed with a moving average, and the algorithm identified the longest epoch in which the time series remained greater than one standard deviation. We next calculated the normalized mutual information (MI) [21] adjacency matrix across all scalp electrode contacts during the band-pass filtered (16-70 Hz) artifact epoch of greatest duration and assigned each scalp EEG electrode a single MI value, the maximum of its pairwise MI values in the adjacency matrix. We then determined whether this maximum MI value exceeded a threshold, which was defined by visual inspection of the scalp EEG in the experimental dataset, and used this to decide whether that electrode should be included in subsequent artifact reduction processing. If the recording lacked an artifact epoch, or all channels were excluded, artifact reduction was applied to the referential recordings from all recording electrodes.
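The epoch-detection step described above can be sketched as follows. This is a minimal, standard-library Python illustration and not the authors' Matlab implementation: it assumes the instantaneous amplitude of the 16-70 Hz signal has already been computed, and the function names, the centered smoothing window, and the use of a z-score as the normalization are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore(values):
    """Normalize to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def moving_average(values, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def longest_epoch_above(values, threshold=1.0):
    """Return (start, stop) of the longest run with values > threshold.

    Indices are half-open; returns None if no sample exceeds the threshold.
    """
    best = None
    start = None
    # A trailing sentinel closes a run that reaches the end of the signal.
    for i, v in enumerate(list(values) + [float("-inf")]):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            if best is None or (i - start) > (best[1] - best[0]):
                best = (start, i)
            start = None
    return best


def find_artifact_epoch(envelope, window=5, threshold=1.0):
    """Normalize, smooth, then locate the longest supra-threshold epoch."""
    return longest_epoch_above(moving_average(zscore(envelope), window), threshold)
```

When no sample of the smoothed, normalized envelope exceeds the threshold, `find_artifact_epoch` returns `None`, which corresponds to the case in which the recording lacks an artifact epoch.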
The high-pass filtered (>16 Hz) scalp EEG was then separated into consecutive 120-second trials (24,000 data points) and each trial was processed using CUDAICA [22,23]. A 120-second trial length was chosen to optimize processing time. The purpose of the ICA was to separate the >16 Hz seizure activity from the >16 Hz muscle artifact. The 16 Hz cut-off for the filter was chosen to isolate the vast majority of the muscle artifact. Independent components that explained an amount of variance above a particular threshold were excluded from the signal. The threshold was selected on the basis of the values of the raw and normalized mixing matrix (i.e., inverse weight matrix) calculated in each of the ICA iterations. We assumed that the last myogenic component and the first neurogenic component can be differentiated on the basis of the inverse weight matrix, which provides the spatial distribution of each component, by identifying the independent component that accounts for the most variance with a focal spatial topography [17]. A focal topography was defined as exceeding a normalized threshold of two standard deviations in at least one electrode of the inverse weight matrix. This threshold was chosen on the basis of visual inspection of the EEG in the experimental dataset and the resulting independent components.
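To make the trial segmentation and the focality criterion concrete, here is a hedged standard-library Python sketch. The 200 Hz sampling rate is inferred from the stated 24,000 data points per 120-second trial. Normalizing each component's column of the mixing matrix as a z-score across electrodes, and comparing absolute values with the two standard deviation threshold, are assumptions, since the text does not spell out the normalization. The function names are hypothetical, and this is not the CUDAICA or AR2 code.

```python
from statistics import mean, pstdev


def split_into_trials(n_samples, fs=200, trial_seconds=120):
    """Return half-open (start, stop) sample ranges of consecutive trials.

    The final trial may be shorter than trial_seconds * fs samples.
    """
    step = fs * trial_seconds
    return [(s, min(s + step, n_samples)) for s in range(0, n_samples, step)]


def focal_components(mixing, threshold_sd=2.0):
    """Flag components whose scalp topography is focal.

    mixing[e][c] is the weight of component c at electrode e (the inverse
    weight matrix). A component counts as focal when, after z-scoring its
    column across electrodes, at least one electrode exceeds threshold_sd
    in absolute value (an assumed reading of the criterion).
    """
    n_electrodes = len(mixing)
    n_components = len(mixing[0])
    flags = []
    for c in range(n_components):
        column = [mixing[e][c] for e in range(n_electrodes)]
        mu, sd = mean(column), pstdev(column)
        if sd == 0:
            flags.append(False)  # a flat topography cannot be focal
            continue
        flags.append(any(abs((w - mu) / sd) > threshold_sd for w in column))
    return flags
```

The sketch only flags focal topographies; how the flagged component is then used to set the variance cut-off between myogenic and neurogenic components follows the description in the text and is not reproduced here.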

Amendments from Version 1
We have substantially revised the manuscript in order to address the concerns of the three reviewers. In an effort to more transparently convey effect size, we have revised our statistical approach by performing Student's paired t-test and providing the reader with the t-values. We also correct for multiple comparisons using the Holm-Bonferroni method.
In the methods section we provide greater detail regarding the AR2 methodology and also indicate that the experimental dataset was used to derive parameters which could have overestimated the efficacy of the approach. In the introduction and discussion, we offer improved explanations of the approach and results derived from an expanded body of literature.

The pruned EEG calculated for each 120-second trial (i.e., each iteration of CUDAICA) was concatenated. The entire raw ictal EEG was then low-pass filtered (<16 Hz) using a 500th order symmetric digital FIR filter, and the EEG was reconstituted by adding the low-pass filtered waveforms to the high-pass filtered (>16 Hz) EEG from which the independent components suspected to represent muscle artifact had been excluded. The reconstituted and modified ictal EEG was exported from Matlab format to EDF for subsequent visual analysis.
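The band-split and recombination can be illustrated with a short standard-library Python sketch. A centered moving average stands in for the 500th order FIR low-pass filter, and the high band is taken as the residual so that the two bands sum back to the input. Both choices are simplifying assumptions rather than the filters used by AR2, and `prune` is a hypothetical placeholder for the ICA-based removal of putative myogenic components.

```python
def low_pass(signal, window=5):
    """Stand-in low-pass filter: centered moving average."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def reconstitute(signal, prune, window=5):
    """Split into low and high bands, prune only the high band, and re-add."""
    low = low_pass(signal, window)
    high = [x - l for x, l in zip(signal, low)]  # residual high band
    pruned_high = prune(high)
    return [l + h for l, h in zip(low, pruned_high)]
```

If `prune` returns the high band unchanged, the output equals the input up to rounding; if it zeroes the high band, only the low-pass signal remains. The low band is never modified, which mirrors the design in which artifact removal is confined to activity above 16 Hz.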

Operation
All computations were carried out using compiled

Performance measures of AR1 and AR2
The AR1 and AR2 processed data were reviewed in Persyst v12 without video by 26 neurologists with a specialization in EEG, 20 of whom were board certified. The readers were blinded to which records received AR1 or AR2, and each reader reviewed the 46 processed records (each of the 23 seizures processed with AR1 and with AR2) in random order. Following review of each ictal record, the reader completed a multiple-choice questionnaire (Supplementary File 1), which assessed the ability to visualize seizure-onset (Y, N), lateralize seizure-onset (L, R, N), and locate the region of ictal onset (anterior temporal, anterior frontal, mid-temporal, temporal-parietal-occipital, occipital, none), and asked the reader to self-identify confidence of interpretation for each measure on a 5-point scale: (5) entirely confident, (4) somewhat sure, (3) probable, (2) not confident, (1) unlikely, i.e., slight probability. When the time of onset, laterality, or seizure-onset region was not assigned, the confidence was taken as (0). Readers were not provided with a definition of seizure-onset.

EEG analysis
During the interpretation of the ictal EEG processed by AR1 or AR2, no restrictions were placed on the use of the Persyst v12 built-in EEG filters (low-pass, high-pass, band-pass) or on changes to montage. A comment in each recording was used to demarcate the time prior to the clinical seizure but not the EEG onset. The assessment was not time limited.

Statistical analysis
Differences in EEG interpretation utilizing AR1 and AR2 were assessed using the paired Student's t-test and the McNemar test on paired nominal data. The Bonferroni-Holm method was used to correct for multiple comparisons. Agreement across readers (Y, N, L, R), using either AR1 or AR2, was calculated using the intra-class correlation coefficient (ICC). For these outcomes, missing values were imputed to be in between non-missing values, and the data were analyzed using cumulative logit mixed effects models, which capture this ordering in the values and account for the clustering of readings into patients, and of seizures within patients. Agreement across readers for onset region was calculated using a Fleiss kappa, treating the missing values as a category of response. Errors are given as standard error of the mean (s.e.m.), unless otherwise specified.
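The Bonferroni-Holm step-down correction named above can be sketched as follows. This standard-library Python function is an illustration only; the function name and the example p-values in the usage note are hypothetical, and it is not the code used in the study. It sorts the p-values, multiplies the k-th smallest of m by (m - k + 1), caps the result at 1, and enforces monotonicity so that adjusted values never decrease with rank.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return (reject, adjusted) lists in the original order of p_values.

    adjusted[i] is the Holm-adjusted p-value; reject[i] is True when the
    adjusted value is at or below alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # rank is 0-based, so the multiplier is m - rank = m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return reject, adjusted
```

For example, with the hypothetical inputs `[0.01, 0.04, 0.03, 0.005]`, the two smallest p-values remain significant at alpha = 0.05 after correction, while the other two do not.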

Implementation of the AR2 method
We applied the AR2 method developed at UCLA to the 23 seizures in the dataset. The method was automatic and unsupervised and separated the high-pass filtered (>16 Hz) scalp EEG recordings into putative neurogenic and myogenic components (Figure 1). After pruning the putative myogenic components, the putative neurogenic components were reconstituted with the low-pass filtered (<16 Hz) scalp EEG (Figure 2). The AR2 and AR1 processed scalp EEG recordings were subsequently inspected by the 26 specialists (Figure 3).

Figure caption: Ictal scalp EEG recording from seizure 18 prior to artifact reduction processing (top), after processing with artifact reduction methodology 1 (AR1, middle), and after processing with artifact reduction methodology 2 (AR2, bottom). Only processing with AR2 reveals a right hemispheric onset followed by clear spread to right frontal regions.

Comparison of seizure-onset lateralization assignments with other clinical findings
We identified the patients with at least two consistent clinical findings that lateralized the suspected seizure-onset zone (SOZ). These clinical findings included seizure semiology, onset of seizures without EEG obscuration, and structural MRI, PET, or SPECT findings. If any of the clinical findings were contradictory with respect to the laterality of the suspected SOZ, the SOZ was designated unknown. Overall, 4 patients (#1,4,5,6) had clinical findings that supported a left-hemispheric SOZ, and 1 patient (#7) had clinical findings that supported a right-hemispheric SOZ (Table S1). Compared to AR1, more readers were able to render seizure-onset laterality assignments using AR2, and these assignments were more often congruent with other clinical data (Table 2). Among the 5 patients with clinical seizure-onset lateralization based on independent data, readers who lateralized the seizure-onset to the left were correct in 95.9% (95% CI 85.7-98.9%) of cases using AR2 and in 91.9% (95% CI 77.0-97.5%) of cases using AR1 (Table 3, p<0.0607).

Discussion
In this study, we present a new artifact reduction software, AR2, and its application compared with a commercially available tool, AR1. Twenty-six neurologists used the two methods to interpret 23 ictal EEG recordings that were uninterpretable due to muscle artifact when reviewed with conventional filtering. The major findings from this study include: 1) the utilization of artifact reduction software results in non-uniform interpretation of ictal EEG, with many readers not able to render assignments; 2) when readers did render seizure-onset laterality assignments, these often agreed with other clinical findings; 3) although the study size was small, the AR2 software method increased both the number of readers that rendered assignments and reader confidence, suggesting it aids in diagnosis.
Both AR1 and AR2 are digital signal processing software tools [8,15,20] that may confound accurate ictal EEG interpretation by altering the appearance of the EEG. Digital filtering can also mislead [5]. One concern about AR1 and AR2 is the uncertainty as to whether myogenic activity was fully removed and whether neurogenic components were unaffected during waveform alteration. Specifically, the readers were not confident in their interpretations, and the determination of seizure lateralization sometimes differed between the AR1 and AR2 methods. As such, the artifact reduction methods may introduce false positive findings. This demonstrates the limits of EEG artifact reduction approaches and puts the advantages into perspective.
The reliability of localization by ictal scalp EEG in the absence of artifact is 65-75% for lateralization [24]. Neurologists disagree more on the interpretation of ictal EEG processed with artifact reduction software; however, the seizure-onset laterality assignments rendered by a quorum are often correct. Further refinement of this technology may improve the efficiency of video-EEG monitoring and the utilization of epilepsy surgery; however, correlation with resective epilepsy surgery outcomes will be required for further validation.
With regard to AR2, the novel software method developed for this study, the slight improvement seen in ictal EEG interpretability after applying the method suggests that the algorithm can (1) sometimes produce signals that are, exclusively or mainly, EEG or EMG, and (2) identify which signals are of brain origin and which are contaminant. The effectiveness of AR2 could possibly be improved by utilizing autocorrelations to identify the myogenic independent components [17]. One explanation for AR2's ability to isolate myogenic from neurogenic activity may be related to the respective dipole generators of each. ICA produces independent components that may resemble single equivalent dipoles [14]. Presumably, networks of myocytes exhibit shorter distance connectivity than networks of neurons that produce beta and gamma oscillations, and thus the two generators can be distinguished on the basis of the focality [17] of the independent components' topography.

Data and software availability
All software code for the new AR2 software developed by S.A.W. is openly and permanently available at https://github.com/shennanw/AR2.

Figure caption: Differences in ictal onset region assignments using AR1 or AR2. Stacked bar plot of the ictal onset region assignments using either AR1 (lighter colors) or AR2 (darker colors) for all 23 seizures. Overall, across seizures, more readers were able to render an assignment using AR2 as compared to AR1 (p<0.05). Inter-reader agreement for assigning the ictal onset region was marginal using either AR1 or AR2.

Competing interests
No competing interests were declared.

In this manuscript, Weiss and colleagues present a novel algorithm for removing electromyographic (EMG) artifacts from ictal EEG recordings, called AR2. Moreover, they evaluate the performance of the algorithm on data from 8 patients and compare it to a similar commercial algorithm, AR1 (i.e., Persyst v12's artifact correction software), using readings by 26 neurologists. The data chosen were so corrupted by EMG artifacts that they were not interpretable using conventional frequency-based filtering. Both AR1 and AR2 rely on independent component analysis (ICA) to remove EMG artifacts via spatial filters that are learned from the data. There is strong evidence that ICA is effective at removing EMG (and other EEG artifacts) from data acquired in controlled, research settings [ref1]. However, there may be too many EMG sources in highly polluted ictal recordings for ICA to work.

Grant information
In general, the authors found that both algorithms (1) made around 50% of the seizures interpretable with typically low levels of rater confidence and (2) produced very low levels of inter-rater agreement. Nonetheless, when compelling seizure-onset lateralization was available from other sources of data (e.g., PET, SPECT), the algorithms led to EEG interpretations that were in concordance in about 80% of seizures (Table 2). Moreover, AR2 tended to slightly outperform AR1. Specifically, neurologists could interpret more seizures and tended to have more confidence in their interpretations following AR2 artifact correction. However, there was no statistically significant difference in inter-rater agreement between algorithms. The authors conclude from this that their AR2 algorithm "may improve the validity of ictal EEG artifact reduction." In general, I think the authors' work is laudable and that it is a valuable contribution to the literature. AR2 is well motivated given the evidence that ICA is successful at removing EMG (and other EEG artifacts) from data acquired in controlled research settings, and the approach they have taken to validate their algorithm is generally sound. Moreover, it is impressive that all of the seizures were read by a large number of neurologists (26; although it is not clear how many were board certified in epilepsy or clinical neurophysiology) and that they have made all of their code and data public.
However, there are some significant issues with this work that qualify their findings and should be addressed in revisions or future work:
-As the authors note, the data for this study was obtained from a small number of patients (8, only 5 of whom had lateralized seizure foci based on independent data). Thus, it is not clear how robust some of their findings are (e.g., the small differences between AR1 and AR2 performance).
-Although AR2 is a fully automatic algorithm, there are some arbitrary parameters of the algorithm (e.g., the mutual information threshold used to include an electrode in the artifact correction procedure) that must have been set based on exploratory analyses. If the data used to set these parameters are the same data used to validate the algorithm, then the authors are surely over-estimating, to some extent, the automatic performance of the algorithm. The authors need to specify what data were used to fix the parameters of AR2.
-It is important to note that the authors chose extremely contaminated data to evaluate AR1 and AR2 and that these algorithms might be more useful when applied to less contaminated data.
-If I understand the text correctly, AR2 excludes non-artifact contaminated electrodes from its analysis. You should include these electrodes in the ICA decompositions because they will help capture the neurogenic signal you are trying to preserve.
-Since ICA necessarily removes some neurogenic signal along with EEG artifacts, it can help to quantify this by applying your algorithm to non-artifact polluted data. Adding such an analysis to these findings would help us to understand how and how much AR2 might be distorting EEG seizure activity. Electrodes closest to muscles are likely most affected.
-For many statistical hypothesis tests the authors provide only p-values. It would be much more informative if the authors provided test statistics (e.g., t-scores, degrees of freedom), named the type of test (e.g., cumulative logit mixed effect model) and confidence intervals. In particular, confidence intervals will be much better than p-values at communicating how important and robust these effects are.
-Figures 4-5 report p<0.05 for the results of a large number of statistical tests (23 per subfigure) with no correction for multiple comparisons. You should perform some type of correction (e.g., Bonferroni-Holm or Benjamini & Hochberg's false discovery rate control algorithm).
-To interpret these results, it would greatly help to have inter-reader reliability and reader confidence values for non-artifact contaminated data. Can you get these from the existing literature?
-I think the primary finding of this work is that neither AR1 nor AR2 provide robust artifact correction when applied to such heavily contaminated data and need to be improved. You should discuss what improvements (if any) you think could be made. For example, using higher-density EEG recordings could greatly help. With more electrodes, ICA's performance should improve (given sufficient training data).
In addition to those major points, here are some additional suggestions and points of consideration/clarification:
-The abstract should specify the consistency of AR1-derived lateralization with behavioural, neurophysiological, and neuro-radiological findings. Currently, only the consistency with AR2-derived lateralization is reported.
-[pg 3]: Saying "ICA removes artifacts based on source-related features instead of frequencies." is too vague to be informative. You might consider providing more details, such as "ICA derives spatial filters that can remove artifacts that have static scalp topographies and time courses of activity that are distinct from that of EEG sources. ICA artifact correction is necessarily imperfect and will remove some neurogenic components of the EEG as well. However, the degree of EEG distortion may be negligible and ICA has proven effective at removing EMG and ocular artifacts from EEG data recorded from neuronormal individuals in laboratory settings."
-The introduction should note why ICA might not be able to correct for EMG-ictal artifact, even though it has proven useful for less artifact-polluted research data. Specifically, it may fail because the number of EEG artifact sources may be much greater in ictal data.
-You say that EEG readings were provided by "26 neurologists with a specialization in EEG." Please specify how many were board certified in epilepsy or clinical neurophysiology.
-It appears that AR2 is applied to epochs that are not contaminated with EMG (pg 3, bottom left). Why try to correct artifacts that aren't there?
-Instead of saying "independent components of greatest order," I think it is more conventional to say "independent components that account for the most variance."
-Please provide the specifications of the analog filter used to acquire the data.
-It would help to explicitly report the number of data points per electrode fed to ICA. The reliability of ICA is a function of this.
-It might help to clearly state that the AR1 and AR2 processed data were both read using the same graphical user interface (i.e., Persyst's). It took me a little while to figure this out and it's great that you did this.
-It would help to add titles to subfigures (if it is permitted by F1000's formatting guidelines).
-In Figure 1 there is no point to showing both the non-normalized ICA and normalized mixing matrix since the mixing matrix column scale is arbitrary. Just show the normalized mixing matrix. It would also help to view the mixing matrix weights as scalp topographies to see both the quality of the putative neurogenic and EMG ICs.
-[pg 4] You say "Compared to AR1, more readers were able to render seizure-onset laterality assignments using AR2, and these assignments were more often congruent with other clinical data (Table 2)." However in Table 2, 82% of the seizures that were lateralizable with AR1 (i.e., 145/177) agree with clinical findings in contrast to 81% of seizures using AR2 (i.e., 171/210). I think percentage of agreement is more important than the number of seizures in agreement.
-[pg 11] You say "Among the 8 patients, if the reader lateralized the seizure-onset to the left using AR2 they were correct in 95.9%….". Do you mean "Among the 5 patients" with clinical seizure onset lateralization based on independent data?
-I think your statement "With regard to AR2, the novel software method developed for this study, the slight improvement seen in ictal EEG interpretability after applying the method suggests that the algorithm can (1) reliably produce signals that are, exclusively or mainly, EEG or EMG, and (2) identify which signals are of brain origin and which are contaminant." is overly strong. I think "sometimes" is more accurate than "reliably" given the low reader confidence and inter-reader agreement.
-I don't understand your statement "One explanation for AR2's ability to isolate myogenic from neurogenic independent components may be that scalp EEG electrodes record weighted and summated far-field signals from all brain and muscle sources, as well as near-field electrode noise generated at the electrode/skin interface." ICA can separate myogenic from neurogenic activity because they have distinct scalp topographies and largely independent time courses of activity.
-It is fantastic that you have made both AR2's code and your data publicly available. However, there is not enough documentation on your GitHub repo for me to be able to easily understand how to use it (what is scalp_input_matrix.mat?). A little bit more documentation would greatly help.

We are grateful for your insightful and thoughtful comments and suggestions. Appended below are answers to your inquiries, and changes we have made to the manuscript.

References
-As the authors note, the data for this study was obtained from a small number of patients (8, only 5 of whom had lateralized seizure foci based on independent data). Thus, it is not clear how robust some of their findings are (e.g., the small differences between AR1 and AR2 performance).
--The authors agree that this study is underpowered. Our findings are exploratory at best.
-Although AR2 is a fully automatic algorithm, there are some arbitrary parameters of the algorithm (e.g., the mutual information threshold used to include an electrode in the artifact correction procedure) that must have been set based on exploratory analyses. If the data used to set these parameters are the same data used to validate the algorithm, then the authors are surely over-estimating, to some extent, the automatic performance of the algorithm. The authors need to specify what data were used to fix the parameters of AR2.
--You are correct that we used the experimental dataset to define the threshold values and thus we are likely over-estimating the performance of the algorithm. We clarify on (pg.3) and (pg.4) that the thresholds were defined using visual inspection of the experimental dataset in the revised manuscript.
-It is important to note that the authors chose extremely contaminated data to evaluate AR1 and AR2 and that these algorithms might be more useful when applied to less contaminated data.
--On (pg.2) we now specify "Ictal scalp EEG recordings present extraordinary challenges to ICA artifact reduction algorithms because the number of EMG artifact sources increases."
-If I understand the text correctly, AR2 excludes non-artifact contaminated electrodes from its analysis. You should include these electrodes in the ICA decompositions because they will help capture the neurogenic signal you are trying to preserve.
--We apologize for the lack of clarity. We only excluded electrodes that had suspected increases in impedance. We specify on (pg.3) "Prior to performing ICA to remove muscle artifact, the algorithm first identified epochs of the scalp EEG record contaminated by muscle artifact and determined the electrodes that were suspected of having high recording impedance during that epoch. The purpose of these calculations was to exclude these electrodes from the ICA calculations."
-Since ICA necessarily removes some neurogenic signal along with EEG artifacts, it can help to quantify this by applying your algorithm to non-artifact polluted data. Adding such an analysis to these findings would help us to understand how and how much AR2 might be distorting EEG seizure activity. Electrodes closest to muscles are likely most affected.
--We agree that this analysis would be helpful and should be a focus of future study. Unfortunately, the EEG reviewers who participated in this study are not available to review non-ictal scalp EEG recordings.
-For many statistical hypothesis tests the authors provide only p-values. It would be much more informative if the authors provided test statistics (e.g., t-scores, degrees of freedom), named the type of test (e.g., cumulative logit mixed effect model) and confidence intervals. In particular, confidence intervals will be much better than p-values at communicating how important and robust these effects are.
--As you suggested, we now provide t-scores and degrees of freedom, and have named the type of test in the results. We provide confidence intervals for the cumulative logit mixed effects model results and for the correlation with other clinical data. S.E.M. values are provided for the other comparisons in the figures included in the manuscript. We agree that confidence intervals are essential to convey effect size.
-Figures 4-5 report p<0.05 for the results of a large number of statistical tests (23 per subfigure) with no correction for multiple comparisons. You should perform some type of correction (e.g., Bonferroni-Holm or Benjamini & Hochberg's false discovery rate control algorithm).
--We have used your Matlab code to perform the Bonferroni-Holm correction on the p values obtained for the individual seizures. The results have been revised accordingly (see methods, statistical analysis).
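For readers unfamiliar with the procedure, the Bonferroni-Holm step-down correction mentioned above can be sketched as follows. This is a minimal Python illustration, not the reviewer's Matlab code; the function name `holm_bonferroni` and the example p-values are hypothetical and are not taken from the study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction.

    Returns (adjusted_p, reject), both in the original order of p_values.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiplier is m for the smallest p-value, then m-1, m-2, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted p-values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


# Illustrative p-values only (hypothetical, not results from the manuscript).
adjusted, reject = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
print(adjusted, reject)
```

In the study's setting, one such call would take the 23 per-seizure p-values of a subfigure as input.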
-To interpret these results, it would greatly help to have inter-reader reliability and reader confidence values for non-artifact contaminated data. Can you get these from the existing literature?
--We agree and the following sentence has been added to the discussion (pg. 13): The reliability of localization by ictal scalp EEG in the absence of artifact is between 65-75% for lateralization.
-I think the primary finding of this work is that neither AR1 nor AR2 provides robust artifact correction when applied to such heavily contaminated data, and both need to be improved. You should discuss what improvements (if any) you think could be made. For example, using higher-density EEG recordings could greatly help. With more electrodes, ICA's performance should improve (given sufficient training data).
--Thank you for this helpful suggestion, we have added the following sentence to the discussion (pg. 13): The effectiveness of AR2 could possibly be improved by utilizing autocorrelations to identify the myogenic independent components. We hope that this method can be optimized for 10/20 standard scalp EEG.
In addition to those major points, here are some additional suggestions and points of consideration/clarification:
-The abstract should specify the consistency of AR1-derived lateralization with behavioural, neurophysiological, and neuro-radiological findings. Currently, only the consistency with AR2-derived lateralization is reported.
--We have provided the results for AR1 in the abstract as you suggested.
-[pg 3]: Saying "ICA removes artifacts based on source-related features instead of frequencies." is too vague to be informative. You might consider providing more details, such as "ICA derives spatial filters that can remove artifacts that have static scalp topographies and time courses of activity that are distinct from that of EEG sources. ICA artifact correction is necessarily imperfect and will remove some neurogenic components of the EEG as well. However, the degree of EEG distortion may be negligible and ICA has proven effective at removing EMG and ocular artifacts from EEG data recorded from neuronormal individuals in laboratory settings."
--Thank you for your suggestion; we have made these verbatim changes to the introduction (pg. 2)
-The introduction should note why ICA might not be able to correct for EMG-ictal artifact, even though it has proven useful for less artifact-polluted research data. Specifically, it may fail because the number of EEG artifact sources may be much greater in ictal data.
--We have addressed this issue as mentioned in a prior comment to you.
-You say that EEG readings were provided by "26 neurologists with a specialization in EEG." Please specify how many were board certified in epilepsy or clinical neurophysiology.
--Done as suggested.
-It appears that AR2 is applied to epochs that are not contaminated with EMG (pg 3, bottom left). Why try to correct artifacts that aren't there?
--As specified in the methods we performed the ICA on 120 second trials irrespective of the beginning and end of the ictal EMG artifact. We used this approach in order to allow the algorithm to function in an automated and unsupervised manner.
-Instead of saying "independent components of greatest order," I think it is more conventional to say "independent components that account for the most variance."
--We have made this modification as you suggested (pg. 3)
-Please provide the specifications of the analog filter used to acquire the data. It would help to explicitly report the number of data points per electrode fed to ICA. The reliability of ICA is a function of this.
--We now specify 24,000 data points in the methods (pg.4)
-It might help to clearly state that the AR1 and AR2 processed data were both read using the same graphical user interface (i.e., Persyst's). It took me a little while to figure this out and it's great that you did this.
--We have modified the methods as follows (pg. 5): The AR1 and AR2 processed data were reviewed in Persyst v12 without video by 26 neurologists with a specialization in EEG, 20 of whom were board certified.
-It would help to add titles to subfigures (if it is permitted by F1000's formatting guidelines).
--As far as I know this is not possible.
-In Figure 1 there is no point to showing both the non-normalized ICA and normalized mixing matrix since the mixing matrix column scale is arbitrary. Just show the normalized mixing matrix. It would also help to view the mixing matrix weights as scalp topographies to see both the quality of the putative neurogenic and EMG ICs.
--We have changed the figure as you suggested and modified the legend.
-[pg 4] You say "Compared to AR1, more readers were able to render seizure-onset laterality assignments using AR2, and these assignments were more often congruent with other clinical data (Table 2)." However in Table 2, 82% of the seizures that were lateralizable with AR1 (i.e., 145/177) agree with clinical findings in contrast to 81% of seizures using AR2 (i.e., 171/210). I think percentage of agreement is more important than the number of seizures in agreement.
--Thank you for this insightful point. The numbers do not refer to the number of seizures in agreement but rather to the number of observations, i.e. assignments made that agreed with the laterality defined by other clinical data. Thus, more readers were able to render observations that agreed with other clinical data using AR2 as compared to AR1. However, as you point out, the percentage of readers that rendered a laterality decision that did not agree with the other clinical data using AR2 was comparable to AR1.
-[pg 11] You say "Among the 8 patients, if the reader lateralized the seizure-onset to the left using AR2 they were correct in 95.9%….". Do you mean "Among the 5 patients" with clinical seizure onset lateralization based on independent data?
--You are correct and we apologize for the lack of clarity. We have modified the results as follows (pg. 12): Among the 5 patients with clinical seizure onset lateralization based on independent data, …
-I think your statement "With regard to AR2, the novel software method developed for this study, the slight improvement seen in ictal EEG interpretability after applying the method suggests that the algorithm can (1) reliably produce signals that are, exclusively or mainly, EEG or EMG, and (2) identify which signals are of brain origin and which are contaminant." is overly strong. I think "sometimes" is more accurate than "reliably" given the low reader confidence and inter-reader agreement.
--We agree and have modified the sentence as you suggested (pg.13).
-I don't understand your statement "One explanation for AR2's ability to isolate myogenic from neurogenic independent components may be that scalp EEG electrodes record weighted and summated far-field signals from all brain and muscle sources, as well as near-field electrode noise generated at the electrode/skin interface." ICA can separate myogenic from neurogenic activity because they have distinct scalp topographies and largely independent time courses of activity.
--Thank you for pointing out that this sentence lacks clarity. We have modified this paragraph as follows (pg.13): One explanation for AR2's ability to isolate myogenic from neurogenic activity may be related to the respective dipole generators of each. ICA produces independent components that may resemble single equivalent dipoles. Presumably, networks of myocytes exhibit shorter distance connectivity than networks of neurons that produce beta and gamma oscillations, and thus the two generators can be distinguished on the basis of the focality of the independent components topography.
-It is fantastic that you have made both AR2's code and your data publicly available. However, there is not enough documentation on your GitHub repo for me to be able to easily understand how to use it (what is scalp_input_matrix.mat?). A little bit more documentation would greatly help.


Abstract:
2. (Results) The authors should include the consistency value also for AR1.
--As you suggested we have added the consistency values for AR1 to the abstract.

Introduction:
3. (page 3, first paragraph) "Each of these tests adds undesired time and cost to the evaluation". I would say that the necessity of using additional imaging techniques depends on how precise one wants seizure-onset zone delineation to be, as scalp EEG has a poor spatial resolution and localization power. Please elaborate and/or re-phrase the sentence accordingly.
--We have modified the introduction as follows (pg. 3): The inability, or lack of precision, to discern the seizure-onset zone from scalp EEG often necessitates additional testing, …
4. (page 3, second paragraph) "ICA removes artifacts based on source-related features instead of frequencies". What do Authors mean with "source-related features"? Actually, there are several studies that use frequency-based criteria for the selection and subsequent removal of artifact-related sources… Please explain.
--Thank you for this instructive feedback. We have modified the introduction as follows (pg.3): ICA derives spatial features that can remove artifacts that have static scalp topographies and time courses of activity that are distinct from that of EEG sources.
--Thank you for suggesting the inclusion of this important methods article. We now cite this article in the introduction and discussion.
6. (page 3, fourth paragraph) Authors refer to AR1 as a commercially available software, and in fact, detailed information about it is provided in reference [17]. However, the Authors should provide a brief description of the method because: 1) it is the only method which they compare their novel one with; and 2) so that future readers do not need to go through [17] in order to understand the overall rationale of AR1.
--Although the complete methods for AR1 are not included in reference 17, we have modified the introduction as follows (pg.3): The goal of this study was to assess the validity of a commercially available EEG artifact reduction tool (AR1) that uses different montages and within electrode analysis to identify artefactual independent components, and compare its validity to a novel automatic artifact reduction tool (AR2)…

Methods:
7. (page 3, Implementation, first paragraph) "(…) a power spectral density algorithm to find extended intervals of elevated high frequency power across electrodes". The Authors provide no information about how this algorithm works, nor references; thus, it is presently not possible to reproduce this part of the study.
8. (page 3, Implementation, first paragraph) The Authors need to justify their choices in general; particularly, why only compute the adjacency matrix between the epoch of greatest duration across all electrodes? Why compute the adjacency matrix in the first place, and not any other discriminative feature for the presence of muscle artifacts? Why only assign the maximum pairwise MI value in the adjacency matrix to a given electrode and ignore all the rest? How was the MI threshold determined?
--We agree with your comments #7 and #8. We now specify in the methods that the reason we performed this analysis was (pg.3): "Prior to performing ICA to remove muscle artifact, the algorithm first identified epochs of the scalp EEG record contaminated by muscle artifact and determined the electrodes that were suspected of having high recording impedance during that epoch. The purpose of these calculations was to exclude these electrodes from the ICA calculations." The method used to determine the artifact epoch had actually been modified prior to submission of version 1 of the manuscript. We now better describe this algorithm as "We then calculated the normalized instantaneous amplitude of the band-pass filtered signal using a Hilbert transform. This signal was smoothed using moving averaging, and the algorithm identified the longest epoch in which the time series remained greater than one standard deviation."
9. (page 3, Implementation, second paragraph) Again, the Authors need to provide more details overall. Why segment EEG into consecutive epochs of 120 s? How exactly was the variance threshold derived? Also, I did not understand why should be there any order associated with myogenic and neurogenic components ("We assumed that the last myogenic component and first neurogenic component (…)").
--We agree with your comment and apologize for the lack of clarity. We now specify that (pg.4): A 120 second trial length was chosen to optimize processing time.
In addition, the methods have been modified as follows (pg. 4): "We assumed that the last myogenic component and first neurogenic component can be differentiated on the basis of the inverse weight matrix, which provides the spatial distribution of each component, and identifying the independent component that account for the most variance with a focal spatial topography defined on the basis of exceeding a normalized threshold of two standard deviations in at least one electrode of the inverse weight matrix. This threshold was chosen on the basis of visual inspection of the EEG in the experimental dataset and resulting independent components."
10. (page 3, Implementation, second paragraph) I understand that one of the expected features of ICs reflecting muscle artifacts is having a focal spatial topography; however, bad channels are also reflected in ICs exhibiting this feature. Thus, I have severe concerns about false positives when using this criterion, as other myogenic-unrelated ICs are probably being selected as well, which may hinder a true assessment of the impact of muscle artifact correction.
--We agree with your concerns; however, in the algorithm we already excluded bad channels using the procedure described with reference to comments #7 and #8.
11. (page 3, Implementation, third paragraph) What does reconstitute mean in this context?
--We now specify in implementation (pg.4) that: the resulting low pass filtered EEG was reconstituted by addition of the waveforms with the high pass (>16 Hz) filtered EEG
12. (page 4, Statistical analysis) Since the performance of AR1 and AR2 is being assessed by 4 different performance measures obtained from 26 EEG specialists, it would be more accurate to use a 2-way repeated measures ANOVA (or its non-parametric equivalent, in the case of the samples not being normally distributed), followed by multiple comparison testing if necessary.
--We appreciate this helpful feedback. Dr. David Groppe, the other reviewer of the manuscript, suggested that we use paired t-tests and provide the t-value in order to convey effect size to the reader. We have followed his recommendations. Including both 2-way repeated measures ANOVA and paired t-tests would confuse the reader.
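The artifact-epoch detection quoted in the response to comments #7 and #8 (normalize the amplitude envelope, smooth it with a moving average, then take the longest stretch above one standard deviation) can be sketched as below. This is only an illustration under stated assumptions: it starts from an already computed instantaneous-amplitude series (the band-pass filter and Hilbert transform are omitted), it reads "normalized" as a z-score, and the function names and the `window` parameter are hypothetical rather than taken from the AR2 code.

```python
from statistics import mean, pstdev


def moving_average(x, window):
    """Centered moving average; edge samples use the neighbours available."""
    half = window // 2
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def longest_epoch_above(envelope, window=5, n_sd=1.0):
    """Return (start, stop) sample indices, stop exclusive, of the longest run
    in which the smoothed z-scored envelope exceeds n_sd; None if no run exists.
    """
    mu, sd = mean(envelope), pstdev(envelope)
    if sd == 0:
        return None  # flat envelope: nothing can exceed the threshold
    # Assumption: "normalized" is interpreted as z-scoring the envelope.
    z = [(v - mu) / sd for v in envelope]
    smoothed = moving_average(z, window)
    best, start = None, None
    # A trailing sentinel closes a run that extends to the last sample.
    for i, v in enumerate(smoothed + [float("-inf")]):
        if v > n_sd and start is None:
            start = i
        elif v <= n_sd and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


# Synthetic envelope with a long burst and a short burst (illustrative only).
envelope = [0.0] * 20 + [10.0] * 10 + [0.0] * 20 + [10.0] * 3 + [0.0] * 20
print(longest_epoch_above(envelope, window=3))
```

On real recordings the envelope would be computed per electrode from the band-pass filtered EEG before this step.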

Results:
13. The first three figures have very poor quality. In particular, it is nearly impossible to follow the overall (quite detailed) description of Figure 2 (and it is panel A on the top, left hand-side, and not A ). Also, the three panels in Figure 3 should be overlaid to facilitate the direct comparison between the two algorithms.
--a) We have made grammatical changes to figure 2, and corrected the figure A1 vs. A2 labeling. We apologize for this oversight. b) We attempted to overlay the panels of figure 3 but the result was confusing and not visually appealing. Therefore, we cannot provide this suggested change.
14. Although AR2 outperforms AR1 for most of the performance measures, the results are still poor, making me wonder if either of these methods is suitable for EEG muscle artifact correction.
--We agree and point out in the discussion (pg. 13) that the readers were not confident in their interpretations using either AR1 or AR2.

Discussion:
15. (page 12, first paragraph) What do the Authors mean with "One concern about AR1 and AR2 relates to the lack of understanding of the waveform alteration"?
--This sentence has been modified to provide more clarity (pg.13): "One concern about AR1 and AR2 relates to the uncertainty that myogenic activity was fully removed, and neurogenic components were unaffected during waveform alteration."
16. (page 12, fourth paragraph) "(…) (1) reliably produce signals that are, exclusively or mainly, EEG or MEG (…)". Please clarify and elaborate on this claim.
--We agree with your comment that this sentence is unclear. This paragraph has been modified in the revision and now reads as follows: "One explanation for AR2's ability to isolate myogenic from neurogenic activity may be related to the respective dipole generators of each. ICA produces independent components that may resemble single equivalent dipoles. Presumably, networks of myocytes exhibit shorter distance connectivity than networks of neurons that produce beta and gamma oscillations, and thus the two generators can be distinguished on the basis of the focality of the independent components topography."
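The focality criterion quoted in the response to comment #9 (a normalized weight exceeding two standard deviations in at least one electrode) can be illustrated with a short sketch. It assumes that normalization means z-scoring the absolute weights of one component across electrodes and that components are already ordered by descending variance; both are assumptions made for illustration, and the function names and toy matrix are hypothetical rather than part of the AR2 code.

```python
from statistics import mean, pstdev


def is_focal(column, n_sd=2.0):
    """True if any electrode's |weight| lies more than n_sd standard
    deviations above the mean |weight| of this component."""
    mags = [abs(w) for w in column]
    mu, sd = mean(mags), pstdev(mags)
    if sd == 0:
        return False  # perfectly uniform topography cannot be focal
    return any((m - mu) / sd > n_sd for m in mags)


def first_focal_component(mixing, n_sd=2.0):
    """mixing: list of rows (electrodes), each a list of component weights.
    Columns are assumed to be sorted by descending variance accounted for.
    Returns the index of the first focal component, or None."""
    n_components = len(mixing[0])
    for k in range(n_components):
        column = [row[k] for row in mixing]
        if is_focal(column, n_sd):
            return k
    return None


# Toy 8-electrode, 3-component matrix (illustrative values only):
# component 0 is broadly distributed, component 1 loads on a single electrode.
broad = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.0]
focal = [0.1] * 7 + [5.0]
mixing = [[broad[e], focal[e], broad[e]] for e in range(8)]
print(first_focal_component(mixing))
```

As reviewer comment #10 notes, a bad channel would also pass this test, which is why the sketch presumes such channels were excluded beforehand.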