Heatmaps and consensus clustering for ego network exploration

Philippe Boileau; Lisa Kakinami; Tracie Barnett; Mélanie Henderson; Lea Popovic

doi:10.12688/f1000research.108964.1


Method Article


[version 1; peer review: 2 approved with reservations]

Philippe Boileau¹,², Lisa Kakinami²,³, Tracie Barnett⁴,⁵, Mélanie Henderson⁴,⁶, Lea Popovic²


PUBLISHED 11 Jul 2022

Author details

¹ Graduate Group in Biostatistics, School of Public Health, University of California, Berkeley, California, 94704, USA
² Department of Mathematics and Statistics, Concordia University, Montreal, Quebec, H3G 1M8, Canada
³ PERFORM Centre, Concordia University, Montreal, Quebec, H4B 1R6, Canada
⁴ Research Centre of CHU Sainte Justine, Université de Montréal, Montreal, Quebec, H3T 1C5, Canada
⁵ Department of Family Medicine, McGill University, Montreal, Quebec, H3S 1Z1, Canada
⁶ Department of Pediatrics, Université de Montréal, Montreal, Quebec, H3T 1C5, Canada

Philippe Boileau
Roles: Conceptualization, Formal Analysis, Methodology, Writing – Original Draft Preparation

Lisa Kakinami
Roles: Conceptualization, Methodology, Supervision, Writing – Review & Editing

Tracie Barnett
Roles: Data Curation, Funding Acquisition, Project Administration, Writing – Review & Editing

Mélanie Henderson
Roles: Data Curation, Funding Acquisition, Project Administration, Writing – Review & Editing

Lea Popovic
Roles: Conceptualization, Methodology, Supervision, Writing – Review & Editing


Abstract

Background: Researchers need visualization methods (using statistical and interactive techniques) to efficiently perform quality assessments and glean insights from their data. Data on networks can particularly benefit from more advanced techniques since typical visualization methods, such as node-link diagrams, can be difficult to interpret. We use heatmaps and consensus clustering on network data and show they can be combined to easily and efficiently explore nonparametric relationships among the variables and networks that comprise an ego network data set.
Methods: We used ego network data from the Québec Adipose and Lifestyle Investigation in Youth (QUALITY) cohort to evaluate this method. The data consist of 35 networks centered on individuals (egos), each containing a maximum of 10 nodes (alters). These networks are described through 41 variables: 11 describing the ego (e.g. fat mass percentage), 18 describing the alters (e.g. frequency of physical activity) and 12 describing the network structure (e.g. degree).
Results: Four stable clusters were detected. Cluster one consisted of variables relating to the interconnectivity of the ego networks and the locations of interaction, cluster two consisted of the ego’s age, cluster three contained lifestyle variables and obesity outcomes, and cluster four comprised variables measuring alter importance and diet.
Conclusions: This exploratory method using heatmaps and consensus clustering on network data identified several important associations among variables describing the alters’ lifestyle habits and the egos’ obesity outcomes. Their relevance has been identified by studies on the effect of social networks on childhood obesity.

Keywords

exploratory data analysis, visualization, heatmap, consensus clustering, networks, obesity

Corresponding author: Lea Popovic

Competing interests: No competing interests were disclosed.

Grant information: The QUALITY cohort is funded by the Canadian Institutes of Health Research (#OHF-69442, #NMD-94067, #MOP-97853, #MOP-119512), the Heart and Stroke Foundation of Canada (#PG-040291) and the Fonds de la Recherche du Québec - Santé. Mélanie Henderson holds a Diabetes Junior Investigator Award from the Canadian Society of Endocrinology and Metabolism - AstraZeneca and a Fonds de Recherche du Québec - Santé Junior 2 salary award, and Lisa Kakinami holds a Junior 1 salary award from the latter institution. PB was supported by a scholarship from the Institut des sciences mathématiques.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2022 Boileau P et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Boileau P, Kakinami L, Barnett T et al. Heatmaps and consensus clustering for ego network exploration [version 1; peer review: 2 approved with reservations]. F1000Research 2022, 11:771 (https://doi.org/10.12688/f1000research.108964.1) First published: 11 Jul 2022, 11:771 (https://doi.org/10.12688/f1000research.108964.1) Latest published: 11 Jul 2022, 11:771 (https://doi.org/10.12688/f1000research.108964.1)

Introduction

Visualization methods relying on a combination of statistical and interactive techniques are becoming necessary for researchers to efficiently perform quality assessments and glean insights from their data.¹ Networks are data structures that particularly benefit from such advancements. A network consists of a set of nodes, also called vertices, that are connected to one another by edges. Networks allow scientists to model and investigate the structural complexities of the systems they are researching, offering more insight than analyzing the systems’ individual components.² They have been used to model and study various phenomena, from Internet traffic to food webs.

Network data are commonly visualized by node-link diagrams (Figure 1), which consist of plotting the nodes and edges. Variables pertaining to the nodes and edges are also often included in the node-link diagram; the node’s colour may represent a categorical variable, or the thickness of an edge may capture an aspect of the relationship between connected nodes. While this makes it easy to discern some topological qualities of the network,³ such as the number of edges a node shares with the other nodes (the node’s degree) or the interconnectedness of the nodes (the network’s density), this visualization’s utility is limited by the size of the network and the number of variables used to describe its various parts.

Figure 1. Two node-link diagram representations of a randomly generated network.

The visualization to the left helps observers gain a sense of the structure of the network by emphasizing the most central node. On the other hand, the depiction on the right permits a quick assessment of the network’s connectedness.

As the number of vertices, edges and variables characterizing each increases, the visualization becomes cluttered and difficult to interpret. These issues are compounded when attempting to perform exploratory data analysis (EDA) on network data with multiple components, i.e. a network in which there are groups of nodes that do not share any edges, or many disjoint ego networks, such as personal social networks. Ego networks consist of a central node (the ego) that is implicitly connected to all other nodes in the network (the alters), which can share edges among themselves. Due to the constraints on the size of node-link diagrams and on the amount of information that they can present, comparing many ego networks at once is difficult.

An alternative representation technique that may be more appropriate for such network data is the heatmap. A heatmap is a graphical representation of a two-dimensional matrix⁴ and is a versatile tool for illustrating multivariate data, as demonstrated with the mtcars dataset (https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html) in Figure 2. Since heatmaps can display large quantities of information in a single figure, they are an ideal tool for presenting multiple ego networks and their variables. Heatmaps rely on two key elements to convey information clearly and efficiently: hierarchical clustering and an appropriate color palette.⁴ These two elements maximize the pattern recognition potential within the visualization. Additionally, when presented in the appropriate medium (e.g. in a dynamic report), heatmaps can be turned into interactive visualizations using software like heatmaply.⁵ The interactivity of the heatmap can improve its legibility, and therefore its use as an EDA tool.

Figure 2. A heatmap of select cars from 1974 and their various characteristics, from the mtcars R data set.

Notice the cluster of compact cars in the top left corner, clearly grouped together by their relatively small engines and weight (top left, dark green-blue rectangles), and by their superior efficiency (top right, light green and yellow rectangles). The most performant cars are grouped together near the center of the y-axis, clustered together due to their larger engines and reduced efficiency.

Although heatmaps offer an efficient way to depict large quantities of data,⁴ it can be difficult to identify the meaningful patterns in the visualization. Heatmaps’ pattern recognition potential can therefore be augmented by using methods that identify patterns that remain stable under data perturbations. Consensus clustering⁶ is one such nonparametric method for assessing the stability of patterns identified in a heatmap. More information on consensus clustering is provided in the methods section.

The objective of this analysis was to determine whether the combination of heatmaps and consensus clustering is an effective tool for representing ego network data. Heatmaps are particularly convenient for depicting high-dimensional data, though, to the best of our knowledge, their application in this setting has yet to be studied. Data from a pilot study investigating the influence of adolescents’ social networks on their obesity outcomes were used to evaluate this methodology’s performance.

Methods

Exploratory method

This methodology relies on the application of agglomerative hierarchical clustering (referred to throughout the paper as hierarchical clustering), heatmaps and consensus clustering to ego network data. The following paragraphs provide details on their individual implementations, and how they are used in tandem to produce a novel EDA method for this data structure.

Hierarchical clustering is an unsupervised learning method that aims to find clusters of similar items in a data set, where items can be either variables or observations. Numerous measures exist to quantify the similarity among items, such as Euclidean distance, correlation or Manhattan distance. At the start of the procedure, each item forms its own cluster. These clusters are then merged with others based on their similarities, forming new clusters.⁷ Between-cluster similarity can be measured by various methods, though the most popular are the single, average and complete linkage methods. For information on these linkage methods, see James et al.⁷ With each iteration of the algorithm, the number of elements in each cluster increases, the number of individual clusters decreases and, generally, the average similarity among the items in each cluster decreases. The process ends when only a single cluster remains. The result of this procedure is often depicted by a tree, known as a dendrogram (Figures 2 and 3), illustrated on the axes of the heatmaps. The dendrogram can be used to infer the cluster membership of each item by “cutting” the tree at various heights, where a lower height corresponds to a higher level of similarity. Cutting the tree at higher levels of similarity will produce granular clusters (i.e. many small groups of similar items), whereas lower levels of similarity will produce larger clusters composed of more dissimilar items. Thus, the dendrogram can be cut at the lowest height that produces an a priori specified number of clusters. Selecting the appropriate number of clusters in the data is a challenging task, though, as we will see later, consensus clustering can help in making a reasonable choice. As is customary with methods relying on some notion of distance, the data subjected to hierarchical clustering are often normalized.
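The agglomerative procedure described above can be illustrated with a minimal sketch. It is not the implementation used by neatmaps or heatmaply; it assumes Euclidean distance and average linkage, stops once a requested number of clusters remains rather than building the full dendrogram, and uses made-up two-dimensional items. The function name average_linkage is hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)


def average_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    # At the start, every item forms its own cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise distance between the members of clusters a and b.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Toy data (illustrative only): two tight groups and one distant item.
items = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
three = average_linkage(items, 3)
two = average_linkage(items, 2)
print(three)
print(two)
```

Requesting fewer clusters corresponds to cutting the dendrogram at a greater height: with three clusters the distant item stays on its own, and with two clusters the two tight groups are merged first because they are closer to each other than either is to the distant item.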

Figure 3. Heatmap of the QUALITY ego network data.

An aggregation step must take place prior to the application of hierarchical clustering due to the structure of ego network data. Ego networks can be described by variables pertaining to the ego, the alters, the edges between alters and the network itself, such as its structural characteristics. Unfortunately, there is no obvious way to apply hierarchical clustering to multilevel data, where the higher level comprises the ego and network variables, and the lower level is composed of the alter and edge variables. To transform the data into a format compatible with hierarchical clustering, the alter and edge variables within each network are summarized by some measure of center. The choice of statistic, such as the mean, median or mode, will depend on the characteristics of the variables being summarized. Once this step is accomplished, the transformed ego network data are ready to undergo hierarchical clustering; each observation consists of an ego network described by variables relating to the ego, the network and measures of center of the alter and edge variables.
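As a rough illustration of this aggregation step, the sketch below collapses alter-level records into one row per ego network using the mean. The field names (ego_id, alter_age, alter_activity, ego_fat_mass_pct) and all values are hypothetical and are not taken from the QUALITY data; a median or mode could be substituted where the mean is not appropriate.

```python
from statistics import mean


def summarize_alters(alters, variables):
    """Collapse alter-level rows into one mean per variable per ego network."""
    by_ego = {}
    for row in alters:
        by_ego.setdefault(row["ego_id"], []).append(row)
    return {
        ego_id: {f"mean_{v}": mean(r[v] for r in rows) for v in variables}
        for ego_id, rows in by_ego.items()
    }


# Hypothetical alter-level data: one row per alter, keyed by its ego.
alters = [
    {"ego_id": "A", "alter_age": 14, "alter_activity": 3},
    {"ego_id": "A", "alter_age": 16, "alter_activity": 5},
    {"ego_id": "B", "alter_age": 40, "alter_activity": 1},
]
# Hypothetical ego-level data: one row per ego network.
egos = {"A": {"ego_fat_mass_pct": 28.5}, "B": {"ego_fat_mass_pct": 31.0}}

summary = summarize_alters(alters, ["alter_age", "alter_activity"])
# One flat row per ego network, ready for normalization and clustering.
wide = {ego_id: {**egos[ego_id], **summary[ego_id]} for ego_id in egos}
print(wide["A"])
```

Each resulting row mixes ego-level values with the summarized alter values, which is the single-level format that hierarchical clustering requires.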

The dendrograms produced by subjecting the transformed ego network data to hierarchical clustering are then used to create the heatmap of the data. The heatmap displays networks across rows and variables across columns. The cells in the heatmap represent normalized variable values and should be colored using a divergent color palette to promote pattern recognition.⁴

Consensus clustering is then used to identify the number of clusters among the transformed network data’s variables and to quantify the stability of the clusters nonparametrically.⁶ Consensus clustering repeatedly applies multiple user-defined clustering methods to random subsets of the data and evaluates the frequency with which variables are clustered together over repeated samples. Given a set of n items subjected to some clustering method, the consensus matrix, X, is an n×n matrix where X_{i,j} corresponds to the fraction of iterations of the algorithm for which items i and j were members of the same cluster. A value near one signifies that items i and j are often clustered together, whereas a value near zero implies that they are rarely grouped together. The resulting consensus matrix is then summarized using the cluster consensus and item consensus statistics, which measure within-cluster stability and item-wise stability, respectively.⁶ Consensus clustering thus identifies stable clusters in the data, further increasing the interpretability of the heatmaps by pinpointing the more meaningful patterns in the visualization. Additionally, Monti et al. use the empirical cumulative distribution function (ECDF) of the entries in the consensus matrix to identify the appropriate number of stable clusters in the data.⁶ For example, if hierarchical clustering is used to group the items, consensus clustering can be repeatedly applied such that each iteration of hierarchical clustering specifies a different number of clusters. The ECDFs produced by applying consensus clustering for a range of the number of clusters can then be compared. When the items in the data are perfectly clustered under repetition, i.e. the consensus matrix entries consist of zeros and ones, the ECDF of the consensus clustering is a step function with a single step between zero and one.⁶ As the clustering becomes less exact, the step transforms into a smooth, monotonically non-decreasing line.⁶ Sharp changes in the ECDF curves of consecutive consensus clustering procedures can help distinguish the number of stable clusters in the data.⁶
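The quantities defined above can be sketched in a few lines, taking the cluster labels from each resampling iteration as given. This is an illustrative sketch rather than the ConsensusClusterPlus implementation: the function names are hypothetical, the labels are made up, every item is assumed to be present in every iteration, and restricting the ECDF to pairs with i < j is a choice made here so that each pair is counted once and the diagonal is ignored.

```python
from itertools import combinations


def consensus_matrix(label_runs):
    """Fraction of runs in which each pair of items shares a cluster label."""
    n = len(label_runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in label_runs:
        for i in range(n):
            for j in range(n):
                counts[i][j] += labels[i] == labels[j]
    return [[c / len(label_runs) for c in row] for row in counts]


def ecdf(X):
    """ECDF of the consensus matrix entries above the diagonal."""
    values = [X[i][j] for i, j in combinations(range(len(X)), 2)]
    return lambda c: sum(v <= c for v in values) / len(values)


def cluster_consensus(X, members):
    """Mean consensus over all pairs in a cluster (None for a singleton)."""
    pairs = list(combinations(members, 2))
    return sum(X[i][j] for i, j in pairs) / len(pairs) if pairs else None


def item_consensus(X, item, members):
    """Mean consensus between one item and the other members of a cluster."""
    others = [j for j in members if j != item]
    return sum(X[item][j] for j in others) / len(others) if others else None


# Hypothetical labels from four resampling iterations over four items.
runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 1, 1]]
X = consensus_matrix(runs)
F = ecdf(X)
print(X[0][1], X[2][3], X[0][2])     # pairwise consensus values
print(cluster_consensus(X, [2, 3]))  # stability of the cluster {2, 3}
print(item_consensus(X, 2, [0, 1]))  # item 2 against the cluster {0, 1}
print(F(0.5))                        # share of pairs with consensus <= 0.5
```

Because the cluster consensus averages over pairs of items, it is undefined for a cluster with a single member; the sketch returns None in that case.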

Consensus clustering is therefore applied to the variables of the transformed ego network data in order to identify the meaningful associations depicted in the heatmap, and to quantify the strengths of these relationships. Hierarchical clustering is used to cluster the random subsets of the data in each iteration. Depending on the goals of the EDA, consensus clustering may also be applied to the ego networks to identify clusters of similar networks.
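The resampling scheme in this paragraph can be sketched as follows: random subsets of the ego networks (rows) are drawn, and the variables (columns) of each subset are clustered hierarchically. This is a simplified illustration under stated assumptions, not the neatmaps or ConsensusClusterPlus code; the function names, the default number of repetitions, the seed and the toy data are all hypothetical, and Euclidean distance with average linkage is assumed.

```python
import random
from itertools import combinations
from math import dist


def cluster_columns(rows, k):
    """Average-linkage clustering of the columns of `rows`; one label per column."""
    cols = list(zip(*rows))
    clusters = [[j] for j in range(len(cols))]

    def link(a, b):
        # Average Euclidean distance between the columns in clusters a and b.
        total = sum(dist(cols[i], cols[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    labels = [0] * len(cols)
    for label, members in enumerate(clusters):
        for j in members:
            labels[j] = label
    return labels


def resampled_labels(rows, k, reps=100, fraction=0.8, seed=0):
    """Cluster the variables on `reps` random subsets of the networks."""
    rng = random.Random(seed)
    size = round(fraction * len(rows))
    return [cluster_columns(rng.sample(rows, size), k) for _ in range(reps)]


# Toy data: 10 networks (rows) by 4 variables (columns). Columns 0 and 1
# move together, as do columns 2 and 3, and the two pairs are far apart.
xs = (0.0, 0.1, 0.2, 0.8, 0.9, 1.0, 0.05, 0.95, 0.15, 0.85)
rows = [(x, x + 0.01, 1 - x, 0.99 - x) for x in xs]
runs = resampled_labels(rows, k=2)
stable = all(r[0] == r[1] and r[2] == r[3] and r[0] != r[2] for r in runs)
print(len(runs), stable)
```

The function returns one label vector per iteration. Label values are arbitrary across iterations; only whether two variables share a label within an iteration matters when co-clustering frequencies are tallied.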

Software implementation

An open-source R implementation⁸ (R Project for Statistical Computing, RRID:SCR_001905) of this methodology was developed and is available in the neatmaps package (v2.1.1).⁹ This software, along with relevant documentation and examples, is available on CRAN and on GitHub at github.com/PhilBoileau/neatmaps.¹⁰ This package makes heavy use of two packages: heatmaply (v1.3.0)⁵ was used to produce the heatmaps, and ConsensusClusterPlus (v1.36.0)¹¹ was used to implement consensus clustering.

Case study

In the realm of public health, social networks have been used to model the spread of non-infectious diseases like obesity.¹²–¹⁴ However, as the obesity epidemic currently affects over a third of the world’s population,¹⁵ the development of novel methods to better understand and potentially mitigate its spread is needed. While targeting multiple behaviours like diet and physical activity to reduce body weight is recognized as an effective strategy,¹⁶ their benefits may be further augmented through social network-based programs. Though evidence suggests that these interventions may positively impact obesity outcomes, further investigation is required.¹⁴ Thus, we applied our novel EDA approach to an ego network data set collected by a pilot study exploring the relationship between adolescent social networks and obesity outcomes.

Data source

Data were obtained from a pilot study on the influence of adolescents’ social networks on health outcomes (n = 46) from the Quebec Adipose and Lifestyle Investigation in Youth (QUALITY) cohort (n = 630). QUALITY is an ongoing longitudinal study which investigates the natural history of obesity using a sample of at-risk Caucasian youths in Quebec.¹⁷ At-risk youth were defined as children with at least one overweight parent (i.e. body mass index (BMI) over 30 kg/m² or waist circumference over 102 cm for men and 88 cm for women, based on self-reported height, weight and waist circumference)¹⁷ at the start of the study. The pilot aimed to evaluate the data collection processes and to identify patterns that could lead to new research questions for the full-scale study. A complete case analysis of the pilot data was performed (n = 35 ego networks). QUALITY obtained ethics approval (#MP-21-2005-79, 2040) from the Centre Hospitalier Universitaire Sainte-Justine. Parents signed consent forms and children provided assent. This secondary data analysis was approved by the Concordia Research Ethics Board (#300116369).

Measures

Each of the pilot study’s participants (egos) was asked to list up to ten people (alters) with whom they felt comfortable discussing important personal matters in the past year. The egos then reported their alters’ demographic characteristics (age, sex), location of alter’s residence with respect to the ego’s, perceived body type, health behaviours (frequency of physical activity, frequency of eating healthy (e.g. avoiding junk food), frequency of undertaking a diet for weight loss, frequency of Internet use), and support (frequency of encouraging ego to be physically active and frequency of performing at least 30 minutes of physical activity with the alter). Egos also reported on relationship characteristics (duration, closeness, importance, frequency of contact, types of interaction [i.e. face to face, phone, email, SMS, social media, video call and other]) and the location of interactions (home, work, school, hobby, media and other). The mean of each of these alter variables was computed within each ego network. Each participant was also asked to answer questions pertaining to their own frequency of physical activity, of healthy eating, of dieting for weight loss, and of Internet use using questionnaires published in the literature¹⁸,¹⁹ as described in our previous work.²⁰

Other ego data included height and weight (measured via stadiometer and electronic weight scale, respectively),¹⁷ fat mass percentage (measured using dual-energy absorptiometry) and body mass index z-scores in accordance with the WHO growth curves accounting for age and sex.²¹

Network measures

Each participant (ego) has an associated network consisting of alters (nodes) based on ego-reported friendship or family ties (edges). Based on these ties, certain topological characteristics of each ego network were computed. These metrics were the ego degree, mean alter degree, density, constraint,²² hierarchy,²³ effective size and efficiency.²⁴ Additionally, the networks’ homophily indices²⁵ were calculated for each of the following variables: age, gender, perceived body type and frequency of physical activity, of eating healthy, dieting for weight loss and of Internet use.
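A few of these structural measures can be sketched for a single ego network. The sketch rests on assumptions that the text does not confirm: ties are undirected and unweighted, the ego’s own ties to its alters are left implicit and excluded from the counts, and effective size is computed with the simplified formula for binary ego networks (number of alters minus the mean alter degree among alters). The function name and the example network are hypothetical, and the sketch may not match how the measures were computed in this study.

```python
def ego_network_measures(n_alters, alter_ties):
    """Structural summaries of one ego network from its alter-alter ties.

    alter_ties: iterable of unordered pairs of alter indices; the ego's
    ties to every alter are implicit and are not counted here.
    """
    if n_alters == 0:
        raise ValueError("an ego network needs at least one alter")
    ties = {frozenset(t) for t in alter_ties}  # drop duplicate pairs
    t = len(ties)
    possible = n_alters * (n_alters - 1) / 2
    mean_alter_degree = 2 * t / n_alters  # degree among alters only
    effective_size = n_alters - mean_alter_degree
    return {
        "ego_degree": n_alters,
        "mean_alter_degree": mean_alter_degree,
        "density": t / possible if possible else 0.0,
        "effective_size": effective_size,
        "efficiency": effective_size / n_alters,
    }


# Hypothetical network: four alters, with ties 0-1 and 1-2 among them.
measures = ego_network_measures(4, [(0, 1), (1, 2)])
print(measures)
```

In this example the ego has four alters, two of the six possible alter-alter ties are present, and the ego’s effective size is three because, on average, each alter duplicates one of the ego’s other contacts.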

Results

The EDA method was applied to the QUALITY ego network data using Euclidean distance as a similarity metric and the average linkage method for between-cluster similarity of the network clusters and the variable clusters. The data were normalized by rescaling each variable to have a range between 0 and 1. For variable vector v of length n with values v_i for i = 1, …, n, each value v_i was standardized as follows:

v_i^{\text{new}} = \frac{v_i - \min(v)}{\max(v) - \min(v)}

Thus, hierarchically clustering the variables produces clusters that contain positively linearly-associated variables.²⁶ The consensus clustering step of the method was performed with predefined numbers of clusters ranging from 2 to 10. For each cluster count, 1000 repetitions of the clustering algorithm were performed, each on a random subset of 80% of the ego networks, as recommended in the documentation of ConsensusClusterPlus (v1.36.0).¹¹
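The rescaling formula above can be written as a short function. The handling of a constant variable, for which the formula is undefined because the denominator is zero, is an assumption made for this sketch and is not described in the text; the function name and input values are illustrative.

```python
def min_max(values):
    """Rescale a variable so that its values range from 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant variable carries no information, so every
        # value is mapped to 0.0 instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max([2, 4, 6, 10])
print(scaled)
```

The smallest value maps to 0, the largest to 1, and the ordering and relative spacing of the remaining values are preserved.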

The heatmap of the QUALITY ego network data is shown in Figure 3, efficiently visualizing the 35 ego networks and their 41 variables. Figure 4 illustrates the consensus matrices for the consensus clustering iterations with three, four and five clusters. The ECDFs produced by the consensus clustering are illustrated in Figure 5. The clusters, their contents (along with each item’s item consensus statistic) and the cluster consensus statistics are also presented (Table 1).

Figure 4. The consensus matrices of consensus clustering with three, four and five clusters.

This evidence suggests that four relatively stable clusters are identified in the data.

Figure 5. The ECDFs of the consensus matrices.

There is a clear change in the distributions of the consensus matrices for four and five clusters, suggesting that there are four clusters among the ego network data variables.

Table 1. Cluster contents, four clusters.

Cluster	Cluster consensus	Variables and item consensus
1	0.708	Degree (0.773), mean alter degree (0.773), mean multiplexity (0.784), proportion of alters living outside ego’s neighbourhood (0.518), proportion of interactions of ego with alters at school (0.785), proportion of interactions of ego with alters during hobby time (0.784), sex homophily (0.702), age homophily (0.785), frequency of internet use homophily (0.472)
2	Not applicable	Age (Not applicable)
3	0.873	Ego’s frequency of physical activity (0.897), ego’s frequency of internet use (0.845), ego’s frequency of healthy eating (0.916), ego’s WHO BMI z-score (0.902), ego’s fat mass % (0.902), proportion of alters that are relatives (0.916), proportion of male alters (0.683), proportion of alters living in ego’s home (0.915), proportion of alters living in ego’s neighbourhood (0.864), mean alter age (0.916), mean perceived alter body shape (0.881), mean frequency that ego is physically active with alters (0.877), mean frequency that alters encourage ego to be physically active (0.9), mean frequency that alters are physically active (0.901), mean frequency of alter internet use (0.89), mean frequency that alters eat healthily (0.9), perceived body shape homophily (0.914), frequency of physical activity homophily (0.906), frequency of healthy eating homophily (0.916), frequency of dieting for weight loss homophily (0.695), effective size (0.845), efficiency (0.845), constraint (0.912), hierarchy (0.845), number of components (0.845)
4	0.988	Ego’s frequency of dieting for weight loss (0.963), mean importance of alters to ego (0.993), mean closeness of alters to ego (0.993), proportion of interactions of ego with alters at home (0.993), mean frequency of alters dieting for weight loss (0.993), density (0.993)

The ECDFs and the consensus matrix suggest that the data contains four stable clusters. Cluster one consists of variables relating to the interconnectivity of the ego networks and the locations of interaction, cluster two consists of the ego’s age, cluster three contains lifestyle variables and obesity outcomes and cluster four is comprised of variables measuring alter importance and diet.

Of the four clusters identified during consensus clustering, cluster four is the most stable with a cluster consensus of m(4) = 0.988 (Table 1). This cluster corresponds to the six rightmost columns in the heatmap (Figure 3). This grouping suggests a positive linear relationship between the dieting habits of the egos and those of their alters, and the mean strength of the relationships in the egos’ networks. However, upon closer inspection of the heatmap, this result may be due to the lack of variability in the distributions of the variables composing the cluster.

Cluster three is also a stable cluster given its cluster consensus value of m(3) = 0.873 (Table 1). This cluster is positioned to the left of the solid blue streak in the center of the heatmap and provides evidence of a positive association between certain lifestyle behaviours and the obesity measures for both the egos and their alters. These results indicate that the frequency of physical activity of the egos and the alters, the frequency with which the alters encourage the ego to be physically active, the homophily of frequency of physical activity, the egos’ and alters’ frequency of healthy eating and the homophily of frequency of healthy eating in the network are potentially related to the egos’ adiposity measures such as BMI z-score, fat mass percentage and perceived body type.

Cluster two consists solely of the ego’s age since it was not found to be strongly associated with any of the other variables. This result is unsurprising given that all egos are approximately the same age. Such a homogeneous variable does not provide any meaningful information, and consensus clustering successfully recognized this.

Although clusters two, three and four are stable, the first cluster identified by consensus clustering is not. Cluster one has a moderate cluster consensus statistic (m(1) = 0.708), implying that its variables are less strongly associated. This cluster is positioned in between the solid blue and red columns on the right side of the heatmap (Figure 3).

Lastly, streaks of a single solid color (blue or red) in Figure 3 indicate that variables comprising these columns exhibit little variability. These variables are, in blue, the ego’s frequency of Internet use for entertainment, number of components, hierarchy, effective size, and, in red, density, proportion of ego interaction with alters at home, and the ego’s dieting frequency.

Discussion

The heatmap allows analysts to quickly identify potential associations among networks and variables, study the distributions of the variables and assess the quality of the data.

The results of the consensus clustering augment the interpretation of the heatmap by pinpointing the most meaningful clusters in the data and quantifying the associations among the clusters’ variables. The third and fourth clusters identified by this method capture previously studied relationships among the obesity outcomes and lifestyle behaviours of the egos and alters, and the ego network structure. De la Haye et al. previously observed an association between dietary intake and friendship ties among males in school-based social networks.²⁷ Similarly, the fourth cluster identified via consensus clustering revealed an association between friendship ties and dieting habits. The third cluster is consistent with the literature on childhood obesity outcomes and lifestyle behaviours within social networks.²²–²⁵ Additionally, this method recognized that ego age could not be meaningfully grouped with any other variable, which would not be apparent if the analysis relied solely on hierarchical clustering. The heatmap also permitted the efficient identification of variables which exhibit little variation, indicating to the study investigators that questions associated with these variables should be modified to capture more discerning information on the measures of interest.

Although these results are encouraging, there are some limitations associated with using heatmaps and consensus clustering to explore ego network data. First, this EDA process requires that the data have no missing values, a rare occurrence in empirical research. Although judicious pre-processing and data imputation can remedy the situation, further work must be done to assess the sensitivity of this technique to missingness. Second, though consensus clustering can evaluate the quality of the hierarchical clustering, its results are data dependent and should only be taken seriously when the data are a representative sample of the population of interest. Third, the data were scaled such that the variables were hierarchically clustered based on the strengths of their linear associations. Other data normalization methods could be used so that the hierarchical clustering targets the non-linear relationships among variables.

Conclusions

We demonstrate that, when applied to ego network data, the combination of heatmaps and consensus clustering successfully identified a number of important relationships that are consistent with the literature on social networks and childhood obesity. These results may motivate further research in this field. This was accomplished without the need for substantial expertise in network analysis software; only a few functions from the neatmaps R package (v2.1.1) were required to perform this analysis. Replication in other ego network data sets is warranted in order to further validate this methodology.

Data availability

Underlying data

These data were collected by the QUALITY research team, in a small subset of the cohort (35 participants out of the original 630 participants). Data are not provided in an online repository due to ethical considerations regarding confidentiality/privacy concerns for study participants. Data can be made available upon reasonable request sent to the manuscript authors or the QUALITY research team (www.etudequalitystudy.ca).

Software availability

Software available from: https://github.com/PhilBoileau/neatmaps, https://CRAN.R-project.org/package=neatmaps

Source code available from: https://github.com/PhilBoileau/neatmaps

Archived source code at time of publication: https://doi.org/10.5281/zenodo.6450386¹⁰

License: MIT

Authors’ contributions

PB, LP and LK conceived the method. TB and MH furnished the data. PB implemented the method, processed the data and performed analyses. LP, LK and TB supervised the work. PB wrote the manuscript with input from all authors. All authors provided critical feedback on the method, analyses and manuscript. All authors read the final version of the manuscript and approved it.

Acknowledgements

Dr. Marie Lambert (July 1952 – February 2012), a pediatric geneticist and researcher, initiated the QUALITY cohort. Her leadership and devotion to QUALITY will always be remembered and appreciated. Finally, we are grateful to all the families participating in the QUALITY cohort.

Portions of this research were presented at the Quebec Society for lipid, nutrition and metabolism scientific meeting (Magog-Orford, Quebec, February 7–9, 2018), the 5th annual PERFORM Centre conference (Montreal, Quebec, May 17, 2018) and the Canadian Statistics Student Conference (Montreal, Quebec, June 2, 2018). Thank you to all attendees who provided valuable feedback.

References

1. Perer A, Shneiderman B: Integrating Statistics and Visualization for Exploratory Power: From Long-Term Case Studies to Design Guidelines. IEEE Comput. Graph. Appl. 2009; 29(3): 39–51.
2. Newman M: Networks: An Introduction. Oxford University Press; 2010.
3. Gehlenborg N, Wong B: Networks. Nat. Methods. 2012; 9(2): 115.
4. Gehlenborg N, Wong B: Heat maps. Nat. Methods. 2012; 9(3): 213.
5. Galili T, O’Callaghan A, Sidi J, et al.: heatmaply: an R package for creating interactive cluster heatmaps for online publishing. Bioinformatics. 2018; 34(9): 1600–1602.
6. Monti S, Tamayo P, Mesirov J, et al.: Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Mach. Learn. 2003; 52(1): 91–118.
7. James G, Witten D, Hastie T, et al.: An Introduction to Statistical Learning: With Applications in R. 2013.
8. R: The R Project for Statistical Computing. Accessed February 25, 2022.
9. Boileau P: Neatmaps: Heatmaps for Multiple Network Data. R Package v2.1.1. 2019.
10. Boileau P: PhilBoileau/Neatmaps: F1000 Submission. Zenodo. 2022.
11. Wilkerson MD, Hayes DN: ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics. 2010; 26(12): 1572–1573.
12. Demongeot J, Hansen O, Taramasco C: Discrete dynamics of contagious social diseases: Example of obesity. Virulence. 2016; 7(2): 129–140.
13. Christakis NA, Fowler JH: The Spread of Obesity in a Large Social Network over 32 Years. N. Engl. J. Med. 2007; 357(4): 370–379.
14. Hill AL, Rand DG, Nowak MA, et al.: Infectious Disease Modeling of Social Contagion in Networks. PLoS Comput. Biol. 2010; 6(11): e1000968.
15. Hruby A, Hu FB: The Epidemiology of Obesity: A Big Picture. PharmacoEconomics. 2015; 33(7): 673–689.
16. August GP, Caprio S, Fennoy I, et al.: Prevention and Treatment of Pediatric Obesity: An Endocrine Society Clinical Practice Guideline Based on Expert Opinion. J. Clin. Endocrinol. Metab. 2008; 93(12): 4576–4599.
17. Lambert M, Van Hulst A, O’Loughlin J, et al.: Cohort profile: the Quebec adipose and lifestyle investigation in youth cohort. Int. J. Epidemiol. 2012; 41(6): 1533–1544.
18. Matzat U, Snijders CCP: The online measurement of ego centered online social networks. Welker M, Wenzel O, editors. Online-Forschung 2007: Grundlagen Und Fallstudien. Neue Schriften zur Online-Forschung: Herbert von Halem Verlag; 2007; 274–294.
19. Brunet J, Sabiston CM, O’Loughlin J, et al.: Perceived parental social support and moderate-to-vigorous physical activity in children at risk of obesity. Res. Q. Exerc. Sport. 2014; 85(2): 198–207.
20. Ybarra M, Barnett TA, Yu J, et al.: Personal Social Networks and Adiposity in Adolescents: A Feasibility Study. Child Obes. 2021; 17(8): 542–550.
21. de Onis M, Onyango AW, Borghi E, et al.: Development of a WHO growth reference for school-aged children and adolescents. Bull. World Health Organ. 2007; 85(9): 660–667.
22. Burt R: Structural Holes and Good Ideas. Am. J. Sociol. 2004; 110: 349–399.
23. Moody J, White DR: Structural Cohesion and Embeddedness: A Hierarchical Concept of Social Groups. Am. Sociol. Rev. 2003; 68(1): 103–127.
24. Latora V, Marchiori M: Efficient Behavior of Small-World Networks. Phys. Rev. Lett. 2001; 87(19): 198701.
25. McPherson M, Smith-Lovin L, Cook JM: Birds of a Feather: Homophily in Social Networks. Annu. Rev. Sociol. 2001; 27(1): 415–444.
26. Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, NY: Springer.
27. de la Haye K, Robins G, Mohr P, et al.: Obesity-related behaviors in adolescent friendship networks. Soc. Netw. 2010; 32(3): 161–167.


Competing interests

No competing interests were disclosed.

Grant information

The QUALITY cohort is funded by the Canadian Institutes of Health Research (#OHF-69442, #NMD-94067, #MOP-97853, #MOP-119512), the Heart and Stroke Foundation of Canada (#PG-040291) and the Fonds de Recherche du Québec - Santé. Mélanie Henderson holds a Diabetes Junior Investigator Award from the Canadian Society of Endocrinology and Metabolism - AstraZeneca and a Fonds de Recherche du Québec - Santé Junior 2 salary award, and Lisa Kakinami holds a Junior 1 salary award from the latter institution. PB was supported by a scholarship from the Institut des sciences mathématiques.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Published: 11 Jul 2022, 11:771

https://doi.org/10.12688/f1000research.108964.1

Copyright

© 2022 Boileau P et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

how to cite this article

Boileau P, Kakinami L, Barnett T et al. Heatmaps and consensus clustering for ego network exploration [version 1; peer review: 2 approved with reservations]. F1000Research 2022, 11:771 (https://doi.org/10.12688/f1000research.108964.1)

Open Peer Review

Reviewer Report 15 Feb 2024

Jaya Sreevalsan-Nair, International Institute of Information Technology, Bangalore, India

Approved with Reservations

https://doi.org/10.5256/f1000research.120409.r240771

Summary:
This study determines the completeness of visual inferences from an ego network visualization using a combination of heatmaps and consensus clustering.

Comments:
1. While I understand the need for evaluating the combination of visualization and analysis for the proposed study, it would be incorrect to say “Heatmaps are particularly convenient for depicting high-dimensional data ... their application in this setting has yet to be studied.” The reason why I say it is incorrect is that while the original data source is an ego network, the slice of data used in the heatmap is still a high-dimensional dataset. Composite visualizations combining views have also been formalized [Javed and Elmqvist 2012]. Hence, while this particular dataset may have some unique characteristics to test the combination of visualizations, the choice of visualizations itself is not unique or novel.

Javed, W. and Elmqvist, N., 2012, February. Exploring the design space of composite visualization. In 2012 IEEE Pacific Visualization Symposium (pp. 1-8). IEEE.

2. The paper is a bit difficult to read because of three points –

(a) The consensus clustering is used for symmetric matrices (square matrices) in Monti et al. (and in general), however here, it is used for a rectangular matrix. Usually, for rectangular matrices in bioinformatics applications, biclustering is used.

(b) The ego network of the QUALITY cohort is not explained initially, but when it is explained in the case study, we realize that the data is just high-dimensional data of the ego networks. This work has nothing to do with the ego network as such. The title “ego network exploration” is misleading and perhaps would be better with “comparison of ego networks”. Figure 1 is thus misleading.

(c) What is the transformation of the ego network? Is it the reordering of rows and columns? This needs to be explicitly mentioned.

3. I am assuming that Fig.3 is showing a subset of the 630x46 heatmap by choosing 35x41. Can this be clarified? Otherwise, it sounds like Fig.4 and later depicts a 35x41 dataset. How was the subset chosen?

4. Following up on #2(a), it is still not clear how clusters are computed. Typically Euclidean distance is computed between objects/data items using their feature vectors and this distance metric is used for clustering. Maybe it will help if the actual procedure is written out for the case study. Are the matrices in Fig.4 630x630 or 630x46? It appears to be the former, but the reader should not have to speculate.
After the speculation, the “Data Availability” section says it is of 35 participants. If that’s the case then, 35x46 is not a large dataset, which was the premise of this work. Please discuss these aspects.

5. Is the ordering of the rows and columns the same for the 3 different outcomes in Fig 4?

6. What is the resampling strategy used for consensus clustering, following Monti et al.?

Overall, the writing of the paper can be improved by straightforward descriptions – e.g. (a) There are two heatmaps used here – a rectangular one of item x variable, and a square matrix of item x item (for consensus clustering). (b) The clustering of ego networks essentially means clustering of the participants of the cohort based on their ego networks. Just stating that simplifies the understanding of this paper as clustering is typically done for generating levels of detail in visualization.
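To make the square item x item matrix concrete, the following is a minimal Python sketch of the resampling scheme described by Monti et al. It is an illustration only, not the authors' procedure or the neatmaps or ConsensusClusterPlus API: the function names, the single-linkage base clustering, the Euclidean distance, the 80% subsampling fraction and the toy data are all assumptions chosen for brevity.

```python
import math
import random
from itertools import combinations


def single_linkage_labels(points, k):
    """Cut a single-linkage hierarchy into k clusters (assumed base clusterer)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the closest pairs first until only k clusters remain.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    n_clusters = len(points)
    for _, i, j in edges:
        if n_clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            n_clusters -= 1
    return [find(i) for i in range(len(points))]


def consensus_matrix(data, k, n_iter=50, frac=0.8, seed=0):
    """Fraction of resamples in which items i and j share a cluster,
    among the resamples that contain both items."""
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]  # times i and j were co-clustered
    sampled = [[0] * n for _ in range(n)]   # times i and j were both drawn
    size = max(k, round(frac * n))
    for _ in range(n_iter):
        idx = rng.sample(range(n), size)    # subsample items without replacement
        labels = single_linkage_labels([data[i] for i in idx], k)
        for a, b in combinations(range(size), 2):
            i, j = idx[a], idx[b]
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[a] == labels[b]:
                together[i][j] += 1
                together[j][i] += 1
    # Pairs never drawn together are set to 0.0 by convention in this sketch.
    return [
        [1.0 if i == j
         else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical data: 10 ego networks (rows) described by 2 normalized variables.
networks = [
    (0.10, 0.20), (0.12, 0.18), (0.08, 0.22), (0.11, 0.25), (0.09, 0.19),
    (0.90, 0.80), (0.88, 0.85), (0.92, 0.79), (0.91, 0.83), (0.87, 0.81),
]
consensus = consensus_matrix(networks, k=2)
```

In this sketch the rectangular input has one row per ego network and one column per variable, while the output is square, with one row and one column per ego network. Entries near 1 mark pairs of networks that are almost always clustered together across resamples, and entries near 0 mark pairs that are almost never clustered together.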

[I haven’t tested the code. I hope the issues pointed out by the previous reviewer are resolved.]

Minor suggestions:
1. Please mention “dendrogram” in the caption of Fig.2. Please also say what the color bar stands for, in the caption (i.e. min-max normalized value). From the caption, it is not clear which variable suggests “smallness” and performance (e.g. hp could also imply performance).
2. In “Data Source”, please avoid calling #subjects (or #ego-networks) and #variables, both as variable n.

Is the rationale for developing the new method (or application) clearly explained? Partly

Is the description of the method technically sound? Partly

Are sufficient details provided to allow replication of the method development and its use by others? Yes

If any results are presented, are all the source data underlying the results available to ensure full reproducibility? No source data required

Are the conclusions about the method and its performance adequately supported by the findings presented in the article? Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Data Visualization, Visual Analytics, Population Count Data Analysis, Geospatial Data Analysis, Analysis of Visualizations, Machine Learning/Deep Learning.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report 29 Sep 2022

Anjalika Nande, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, USA

Approved with Reservations

https://doi.org/10.5256/f1000research.120409.r150275

In this study, the authors explore whether heatmaps (produced using hierarchical clustering of variables) and consensus clustering can be used in tandem to better visualize and understand ego network data. They test this method on data from the QUALITY cohort that aims to study the influence of an adolescent’s social network on their obesity outcomes. Their method led to 4 stable clusters and managed to successfully identify important associations between variables that have been previously identified in the literature. Their methodology is available as an R package. Overall, the exploratory data analysis method studied here seems to be a useful way to understand ego centered network data and I only have a few comments that may help to strengthen the manuscript.

Major comments:

I think this manuscript will benefit overall if more details are provided regarding the methods used to calculate the different alter and ego variables and the network measures. This will help the reader better interpret the results, and make the study more reproducible.

More information is needed about the alter and ego variables and the transformations that were used to get them in a format compatible with hierarchical clustering. Specifically, a brief description of each of the variables, and the methods used to obtain their summary statistic (in this case a measure of center) would be helpful. If possible, it might also be useful to include the general range of the variables prior to normalization. I think having a table in the Supplement with this information would suffice.
It would help if the normalized distributions of these variables were provided somewhere (maybe in the Supplement) to enable a better interpretation of the heatmap and clusters, and to provide more context to the statement in the second paragraph on page 8 : ‘However, upon closer inspection of the heatmap, this result may be due to the lack of variability in the distributions of the variables composing the cluster’. Related to this, at the beginning of the Results section I’d suggest adding a few words like: ‘For variable vector v of length n (where n is the number of ego networks)…’ to make it clearer that each variable is normalized between 0 and 1 across all the ego networks.
In the ‘Network measures’ paragraph on page 7, the authors mention the topological characteristics of the network that were computed. Along with the appropriate citations for these metrics, it would be helpful to also have a brief description of how they were calculated. This doesn’t have to be in the Main text; having it in a Supplementary Methods section also works.
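The per-variable scaling discussed in the second comment above can be illustrated with a short Python sketch. It assumes min-max scaling, in which each value has the variable's minimum subtracted and is then divided by the variable's range, so that every variable lies between 0 and 1 across the n ego networks. The function name and the example values are hypothetical, and the handling of constant variables is a convention of this sketch rather than something stated in the article.

```python
def min_max_normalize(values):
    """Scale one variable's values to [0, 1] across all ego networks."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has no spread; map it to 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values of one variable (for example, network size)
# measured on n = 5 ego networks.
network_size = [4, 10, 7, 4, 16]
normalized = min_max_normalize(network_size)
print(normalized)  # [0.0, 0.5, 0.25, 0.0, 1.0]
```

Because the scaling is applied one variable at a time, a variable with little spread before normalization can still span the full 0 to 1 range afterwards, which is one reason the pre-normalization ranges requested above would help in reading the heatmap.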

Minor comments:

When I first tried to install the R code from GitHub following the instructions in the README file, install.packages('neatmaps') gave a warning that the dependency 'ConsensusClusterPlus' was not available, so I couldn’t load the package afterwards. Since ConsensusClusterPlus is a separate package that also needs to be installed for neatmaps to work, I would suggest adding this to the README.
It would help if the figure captions are more descriptive overall.
1. Figure 3: Include what each axis corresponds to and briefly describe what the colors mean.
2. Figure 4: Briefly explain what the colors mean and provide an explanation that supports the line ‘This evidence suggests that four relatively stable clusters are identified in the data’. Currently just looking at the figure and the caption isn’t enough to understand this statement.
3. Figure 5: Describe what the subplot on the right-hand side is showing, currently the caption only mentions the one on the left.
Line 9 page 6 ‘Depending on the goals of the EDA, consensus clustering may also be applied to the ego networks to identify clusters of similar networks’: I found this line confusing, can the authors clarify what they mean by identifying clusters of similar networks?

Is the rationale for developing the new method (or application) clearly explained? Yes

Is the description of the method technically sound? Yes

Are sufficient details provided to allow replication of the method development and its use by others? Partly

If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Partly

Are the conclusions about the method and its performance adequately supported by the findings presented in the article? Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: infectious disease dynamics, drug resistance, networks, evolutionary dynamics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.


[1] 1. Perer A, Shneiderman B: Integrating Statistics and Visualization for Exploratory Power: From Long-Term Case Studies to Design Guidelines. IEEE Comput. Graph. Appl. 2009; 29(3): 39–51. PubMed Abstract | Publisher Full Text

[2] 2. Newman M: Networks: An Introduction. Oxford University Press;2010.

[3] 3. Gehlenborg N, Wong B: Networks. Nat. Methods. 2012; 9(2): 115–115. Publisher Full Text

[4] 4. Gehlenborg N, Wong B: Heat maps. Nat. Methods. 2012; 9(3): 213–213. Publisher Full Text

[5] 5. Galili T, O’Callaghan A, Sidi J, et al.: heatmaply: an R package for creating interactive cluster heatmaps for online publishing. Bioinformatics. 2018; 34(9): 1600–1602. PubMed Abstract | Publisher Full Text

[6] 6. Monti S, Tamayo P, Mesirov J, et al.: Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Mach. Learn. 2003; 52(1): 91–118. Publisher Full Text

[7] 7. James G, Witten D, Hastie T, et al.: An Introduction to Statistical Learning: With Applications in R.2013.

[8] 8. R: The R Project for Statistical Computing:Accessed February 25, 2022.Reference Source

[9] 9. Boileau P: Neatmaps: Heatmaps for Multiple Network Data. R Package v2.1.1.2019.

[10] 10. Boileau P: PhilBoileau/Neatmaps: F1000 Submission. Zenodo. 2022. Publisher Full Text

[11] 11. Wilkerson MD, Hayes DN: ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinforma Oxf Engl. 2010; 26(12): 1572–1573. PubMed Abstract | Publisher Full Text

[12] 12. Demongeot J, Hansen O, Taramasco C: Discrete dynamics of contagious social diseases: Example of obesity. Virulence. 2016; 7(2): 129–140. PubMed Abstract | Publisher Full Text

[13] 13. Christakis NA, Fowler JH: The Spread of Obesity in a Large Social Network over 32 Years. N. Engl. J. Med. 2007; 357(4): 370–379. Publisher Full Text

[14] 14. Hill AL, Rand DG, Nowak MA, et al.: Infectious Disease Modeling of Social Contagion in Networks. PLoS Comput. Biol. 2010; 6(11): e1000968. PubMed Abstract | Publisher Full Text

[15] 15. Hruby A, Hu FB: The Epidemiology of Obesity: A Big Picture. PharmacoEconomics. 2015; 33(7): 673–689. PubMed Abstract | Publisher Full Text

[16] 16. August GP, Caprio S, Fennoy I, et al.: Prevention and Treatment of Pediatric Obesity: An Endocrine Society Clinical Practice Guideline Based on Expert Opinion. J. Clin. Endocrinol. Metab. 2008; 93(12): 4576–4599. PubMed Abstract | Publisher Full Text

[17] 17. Lambert M, Van Hulst A, O’Loughlin J, et al.: Cohort profile: the Quebec adipose and lifestyle investigation in youth cohort. Int. J. Epidemiol. 2012; 41(6): 1533–1544. PubMed Abstract | Publisher Full Text

[18] Matzat U, Snijders CCP: The online measurement of ego centered online social networks. Welker M, Wenzel O, editors. Online-Forschung 2007: Grundlagen Und Fallstudien. Neue Schriften zur Online-Forschung: Herbert von Halem Verlag; 2007; 274–294.

[19] Brunet J, Sabiston CM, O’Loughlin J, et al.: Perceived parental social support and moderate-to-vigorous physical activity in children at risk of obesity. Res. Q. Exerc. Sport. 2014; 85(2): 198–207.

[20] Ybarra M, Barnett TA, Yu J, et al.: Personal Social Networks and Adiposity in Adolescents: A Feasibility Study. Child Obes. Print. 2021; 17(8): 542–550.

[21] de Onis M, Onyango AW, Borghi E, et al.: Development of a WHO growth reference for school-aged children and adolescents. Bull. World Health Organ. 2007; 85(9): 660–667.

[22] Burt R: Structural Holes and Good Ideas. Am. J. Sociol. 2004; 110: 349–399.

[23] Moody J, White DR: Structural Cohesion and Embeddedness: A Hierarchical Concept of Social Groups. Am. Sociol. Rev. 2003; 68(1): 103–127.

[24] Latora V, Marchiori M: Efficient Behavior of Small-World Networks. Phys. Rev. Lett. 2001; 87(19): 198701.

[25] McPherson M, Smith-Lovin L, Cook JM: Birds of a Feather: Homophily in Social Networks. Annu. Rev. Sociol. 2001; 27(1): 415–444.

[26] Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, NY: Springer.

[27] de la Haye K, Robins G, Mohr P, et al.: Obesity-related behaviors in adolescent friendship networks. Soc. Netw. 2010; 32(3): 161–167.
