Combining complexity measures of EEG data: multiplying measures reveal previously hidden information

Many studies have noted significant differences among human electroencephalograph (EEG) results when participants or patients are exposed to different stimuli, undertaking different tasks, or being affected by conditions such as epilepsy or Alzheimer's disease. Such studies often use only one or two measures of complexity and do not regularly justify their choice of measure beyond the fact that it has been used in previous studies. If more measures were added to such studies, however, more complete information might be found about these reported differences. Such information might be useful in confirming the existence or extent of such differences, or in understanding their physiological bases. In this study we analysed publically-available EEG data using a range of complexity measures to determine how well the measures correlated with one another. The complexity measures did not all significantly correlate, suggesting that different measures were measuring unique features of the EEG signals and thus revealing information which other measures were unable to detect. Therefore, the results from this analysis suggests that combinations of complexity measures reveal unique information which is in addition to the information captured by other measures of complexity in EEG data. For this reason, researchers using individual complexity measures for EEG data should consider using combinations of measures to more completely account for any differences they observe and to ensure the robustness of any relationships identified.

This article is included in the Neuroinformatics channel. Electroencephalography (EEG) is a common, relatively non-invasive research and diagnostic tool. Its one-dimensional signals from localised peripheral regions on the head make it attractive for its simplistic fidelity and has allowed high clinical and basic research throughput. When it comes to interpreting EEG data, investigators have a wide range of analytical tools at their disposal (Dauwels et al., 2010;Delorme & Makeig, 2004) and in recent years have explored a number of novel relationships between measures of complexity (Cao & Slobounov, 2011;Dauwels et al., 2011;Jing et al., 2014;Sitt et al., 2014;Susmáková & Krakovská, 2008;Weiss et al., 2011). Studies which have included complexity measures, however, do not regularly include more than one or two such measures. For example, Dauwels et al. (2011) include the Lempel-Ziv (LZ) complexity measure (Lempel & Ziv, 1976) -an algorithmicbased measure -and regularity measures, but ignore potential chaotic and fractal measures. This is not to suggest that the LZ complexity measure or that regularity measures are meaningless, nor that chaotic and fractal measures are more or less important than other measures of complexity, but that all may be measuring different features. Thus, for a more complete and robust picture of any relationships found for one complexity measure in EEG data, it might be useful for investigators to include other measures in their analyses.
This study therefore aims to determine whether different measures of complexity of EEG signals correlate, and (if so) to what degree.
To do this, a small battery of complexity measures were computed for publicly-available normative data and subsequently analysed for correlations. If some measures were found not to significantly correlate or correlate fully, this would suggest that these measures are detecting unique information which might otherwise have remained hidden to investigators who were computing only a single complexity measure from their data.

Methods
One thousand, one hundred EEG recordings of 1-second duration from 13 healthy control subjects undergoing an object recognition psychophysics task were obtained from a publicly-available database created by Begleiter (1996) of the Neurodynamics Laboratory, State University of New York Health Center, Brooklyn, United States. The control subjects were selected so as to avoid disease-specific influences. While our sample size was limited by the database, prior studies which used this database reached significance (thus, independent power calculations were not performed). Detailed demographic, subject, recording, and task information can be found in the original study by Zhang et al. (1995).
The following complexity measures were calculated in MATLAB for each recording: LZ algorithmic complexity (Lempel & Ziv, 1976), fractal dimension estimation (FD) (Higuchi, 1988), permutation entropy (PE) (Bandt & Pompe, 2002), Wiener entropy (WE) (Wiener, 1954), and spectral structure variability (SSV) (Singh, 2011). These measures were chosen on the basis of their broad representation of different conceptions of 'complexity', including informational theoretic, chaotic/fractal, and computational informatic approaches; details of how these measures are calculated and what they measure are well-described by their respective original proposers (Bandt & Pompe, 2002;Higuchi, 1988;Lempel & Ziv, 1976;Singh, 2011;Wiener, 1954) and so will not be repeated here (see Data Availability for code details). Many more measures exist than these, however as the principle aim of this paper was to determine if differences exist at all, any differences detected in this small cross-section of measures would sufficiently illustrate this. Results from the complexity measures were analysed by linear regression and significance (considered as p<0.05) for relationships between pairs of measures was calculated using Pearson product-moment correlation coefficients. For relationships which appeared to have non-linear components when viewing its scatterplot, binomial regression was attempted. Graphs and statistics were generated using MATLAB R2012a (7.14.0.739) and Microsoft Excel 2007.

Results
Of the ten pairs of measures, eight pairs exhibited highly significant (p<0.0001) correlations while two pairs -(i) PE and FD, (ii) WE and LZ -did not significantly correlate (Table 1 and Table 2). High degrees of spread were noted among all correlations.
These relationships were visualised using scatter plots (Figure 1 and Figure 2) to help determine if any of these relationships may be non-linear. Two such relationships -(i) LZ and FD, (ii) SSV and FD -appeared to follow a binomial trend (Figure 3), and binomial regression improved these relationships greatly.    The following data are the results from MATLAB functions which calculated complexity measures for each EEG recording. The following data are the results from MATLAB functions which calculated complexity measures for each EEG recording. ID = identification code as per Begleiter (1996); LZ = Lempel-Ziv algorithmic complexity; FD = fractal dimension estimate (Higuchi method); PE = permutation entropy; WE = Wiener entropy (also known as spectral flatness) (Burns & Rajan, 2015).

Discussion and conclusions
Some -but not all -measures of complexity of EEG signals correlate, and to varying degrees of significance, e.g. we found no significant relationship between PE and FD but did find a significant relation between PE and LZ. To the best of our knowledge, this study represents the first report of such complexity measure differences in EEG signals. Of the many complexity measures available to researchers investigating EEG data, overreliance or overconfidence in any single measure therefore seems misplaced. As research groups who have attempted to classify or predict sleep stages or conscious states from EEG data have implicitly noted (Susmáková & Krakovská, 2008;Sitt et al., 2014;Weiss et al., 2011), no individual measure can reliably predict all possibly relevant physiology. Instead, combinations of measures are needed. In the same way, no individual measurement of complexity can reliably predict all possibly relevant complexity.
In part, the results from this short study reflect on a more generalised ambiguity of the concept of 'complexity'. Who is to say, after all, that more is revealed about 'complexity' by FD than LZ? It seems that it cannot be said that either elucidate more or less about 'complexity', since both ultimately treat it in a different way on even a conceptual basis. This further reiterates the primary finding of the present study: by multiplying measures we can reveal information which was previously hidden or unknown to us. However, there are two caveats to this: (1) not all information may be physiologically or otherwise relevant all of the time (or ever); and (2) different datasets may, due to their differences in nature, show different levels of covariance between complexity measures.
It would be interesting for future studies to analyse previouslynoted complexity differences -e.g., between patients with and without Alzheimer's disease (Dauwels et al., 2010) -to determine if these differences were measuring the same difference. Our results suggests they may not be. And if this is the case, more might be gleaned from the available data if more measures were applied in combination. It could even be possible that there exists entirely separate complexity dimensions, along which patients progress at different rates. Such information could therefore contain even more physiological, clinical, or other significance than previously thought.

Data availability
A copy of MATLAB functions used in this study has been uploaded to GitHub and can be accessed here: https://github.com/tfburns/ MATLAB-functions-for-complexity-measures-of-one-dimensional-signals.
Results from these functions for the EEG data used can be found in the F1000Research repository (see below).
Author contributions TB conceived, wrote, and performed all analyses pertaining to the manuscript. RR assisted in the development of the conceptual and methodological components of the analyses. All authors have read and agreed to the final content of the manuscript.