Research Article

A curated collection of transcriptome datasets to study the transcriptional response in blood and nasal samples following viral respiratory inoculation and vaccination

[version 1; peer review: awaiting peer review]
PUBLISHED 13 May 2025

Abstract

Background

Our understanding of the human immune system’s response to viral respiratory tract infections (VRTIs) and vaccines, including the molecular mechanisms and correlates of protection, remains incomplete. Extensive transcriptomic data from inoculation and vaccination studies have been deposited in publicly available databases. However, these studies are often separate and difficult to locate.

Methods

To bridge this gap, we systematically searched and reviewed publicly available datasets from NCBI Gene Expression Omnibus (GEO), Archive of Functional Genomics Data (ArrayExpress), Immunology Database and Analysis Portal (ImmPort), and Google using targeted queries. Inoculation study queries included terms related to humans, blood, PBMCs, nasal, respiratory challenges, and respiratory viral inoculations, while vaccination study queries focused on humans, blood, PBMCs, transcriptomes, and respiratory viral vaccines or vaccinations.

Results

This collection includes 18 datasets from inoculation with 5 respiratory viruses: H1N1, H3N2, RSV, HRV, and SARS-CoV-2 (14 from blood and 4 from nasal swabs) with 429 participants (ages 18 to 55 years), and 37 datasets from vaccination against influenza and COVID-19 with 2,084 participants (ages 0.5 to over 89 years). Sampling time points range from 14 days before to 28 days after inoculation (1 to 20 time points) and from 28 days before to 360 days after vaccination (1 to 13 time points).

Conclusion

We provide a curated compendium of gene expression datasets from public repositories for researchers to reanalyze transcriptomes from human whole blood, peripheral blood mononuclear cells (PBMCs), and nasal swab samples. This will facilitate studies of transcriptional responses to respiratory viral inoculation or vaccination.

Keywords

Transcriptomics, Bioinformatics, Vaccination, Inoculation, Respiratory viral infection, Influenza viruses, COVID, Respiratory syncytial viruses (RSV), Human rhinoviruses (HRV), Whole Blood, PBMC.

Introduction

Viral respiratory tract infections (VRTIs) are widespread globally, causing significant illnesses and hospitalizations each year.1 The importance of studying VRTIs has grown in light of the global COVID-19 pandemic.2 VRTIs involve a range of viruses, including respiratory syncytial viruses (RSV), human rhinoviruses (HRV), and influenza and coronaviruses.3 Most VRTIs lack both effective antiviral therapies and approved vaccines.4

In-depth investigation of human immune responses to inoculation and vaccination is critically needed. To facilitate further research, we have compiled a curated dataset collection that includes transcriptomic data from the blood and nasal swabs of volunteers who underwent inoculation or vaccination, specifically for influenza, COVID-19, RSV, and HRV.

Our curated dataset collection is divided into two categories: inoculation studies and vaccination studies. The inoculation collection includes 18 datasets with 429 participants, while the vaccination collection comprises 37 datasets with 2,084 participants. All datasets were sourced from GEO, ImmPort Shared Data, and the ArrayExpress Collection at EMBL-EBI.5-7 These datasets have been organized into a publicly accessible format for download and further analysis.

Methods

The datasets were gathered using specific search queries across GEO, ArrayExpress, ImmPort, and Google search. The queries for inoculation studies included terms related to humans, blood, PBMCs, nasal, respiratory challenges, and viral inoculations. The vaccination study queries followed similar criteria, focusing on humans, blood, PBMCs, transcriptomes, and vaccines or vaccinations. The search queries used for GEO, ArrayExpress, and Google search to find inoculation studies are as follows:

GEO and ArrayExpress Inoculation Search Query

(“humans”[MeSH Terms] OR “Homo sapiens”[Organism] OR Homo sapiens [All Fields]) AND ((“blood”[Subheading] OR “blood”[MeSH Terms] OR blood [All Fields]) OR PBMC [All Fields] OR PBMCs [All Fields]) AND respiratory [All Fields] AND (challenge [All Fields] OR “experimentally infected”[All Fields] OR inoculated [All Fields])
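As a rough illustration of how the GEO query above might be submitted programmatically, the sketch below builds a request URL for the NCBI E-utilities ESearch endpoint against the GEO DataSets database (`gds`). The helper name `build_esearch_url` and the `retmax` value are assumptions made for illustration; the source describes only the query text, not how it was executed, and ArrayExpress would need its own search interface.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint; "gds" is the GEO DataSets database.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Inoculation query from the text, written with straight quotes.
INOCULATION_QUERY = (
    '("humans"[MeSH Terms] OR "Homo sapiens"[Organism] OR Homo sapiens[All Fields]) '
    'AND (("blood"[Subheading] OR "blood"[MeSH Terms] OR blood[All Fields]) '
    'OR PBMC[All Fields] OR PBMCs[All Fields]) '
    'AND respiratory[All Fields] '
    'AND (challenge[All Fields] OR "experimentally infected"[All Fields] '
    'OR inoculated[All Fields])'
)


def build_esearch_url(term, db="gds", retmax=200):
    """Return an ESearch URL for `term` (hypothetical helper, not from the paper)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_esearch_url(INOCULATION_QUERY)
print(url)
```

Fetching this URL (for example with `urllib.request.urlopen`) would return a JSON document listing matching record identifiers; hits would still need the manual review described in the Results.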

Google Inoculation Search Query

Homo sapiens AND (blood OR PBMC OR PBMCs) AND transcriptome AND respiratory AND (inoculation OR challenge OR “experimentally infected” OR inoculated)

The search queries used for GEO, ArrayExpress, and Google to find vaccination studies are as follows:

GEO and ArrayExpress Vaccine Search Query

(“humans”[MeSH Terms] OR “Homo sapiens”[Organism] OR Homo sapiens [All Fields]) AND ((“blood”[Subheading] OR “blood”[MeSH Terms] OR blood [All Fields]) OR PBMC [All Fields] OR PBMCs [All Fields]) AND respiratory [All Fields] AND ((“vaccination”[MeSH Terms] OR inoculation [All Fields]) OR (“vaccines”[MeSH Terms] OR vaccine [All Fields]) OR (“vaccines”[MeSH Terms] OR vaccines [All Fields]) OR (“vaccination”[MeSH Terms] OR vaccination [All Fields]))

Google Vaccine Search Query

Homo sapiens AND (blood OR PBMC OR PBMCs) AND transcriptome AND respiratory AND (vaccine OR vaccines OR vaccination)

For ImmPort, vaccine studies were located using the following filters:

  • Species: Homo sapiens

  • Research Focus: Vaccine Response

  • Condition or Disease: COVID-19 AND Influenza

Results

The majority of datasets obtained were derived from human blood, PBMCs, and nasal swabs, generated on Illumina or Affymetrix microarray platforms or by RNA sequencing. Every dataset identified through this search was manually curated. This involved reviewing the dataset descriptions, examining the study designs, and reading the related original articles on PubMed. Ultimately, we included only studies that involved human whole blood, PBMCs, or nasal swabs linked to VRTI inoculation or vaccination. According to these criteria, we retained 18 inoculation datasets and 37 vaccine-related datasets.
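The inclusion criteria stated above can be written down as a simple filter. This is only a sketch of the stated rules: the curation itself was done manually, and the `Candidate` record, its field names, and the example entries are hypothetical.

```python
from dataclasses import dataclass

# Sample sources and interventions accepted by the stated inclusion criteria.
ALLOWED_SAMPLES = {"whole blood", "pbmc", "nasal swab"}
ALLOWED_INTERVENTIONS = {"inoculation", "vaccination"}


@dataclass
class Candidate:
    """Hypothetical metadata record for one search hit."""
    accession: str
    organism: str
    sample_type: str
    intervention: str


def meets_criteria(c):
    """True if the record is human, from an allowed sample source, and
    linked to VRTI inoculation or vaccination."""
    return (
        c.organism.lower() == "homo sapiens"
        and c.sample_type.lower() in ALLOWED_SAMPLES
        and c.intervention.lower() in ALLOWED_INTERVENTIONS
    )


# Made-up examples, not real accessions.
candidates = [
    Candidate("EXAMPLE-1", "Homo sapiens", "PBMC", "vaccination"),
    Candidate("EXAMPLE-2", "Mus musculus", "whole blood", "inoculation"),
    Candidate("EXAMPLE-3", "Homo sapiens", "lung tissue", "inoculation"),
]
retained = [c.accession for c in candidates if meets_criteria(c)]
print(retained)
```

In practice the fields would have to be read from dataset descriptions and the linked articles, which is why a manual pass remains necessary.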

The data selection process for inoculation is summarized in Figure 1a. The GEO and ArrayExpress inoculation query returned 108 studies with Homo sapiens as the primary organism, and the Google search returned 92 inoculation studies. The data selection process for vaccination is summarized in Figure 1b. The GEO and ArrayExpress vaccine query identified 82 studies with Homo sapiens as the primary organism, the Google search returned 89 vaccination studies, and the ImmPort search yielded 99 vaccination studies.

Figure 1. Flowchart illustrating construction of inoculation (a) and vaccination (b) dataset compendium.

Figure 2 illustrates the distribution of participants over time in 18 inoculation studies, involving 429 participants aged 18 to 55 years, across five respiratory viruses: H1N1, H3N2, RSV, HRV, and SARS-CoV-2. There is a notable peak on day 0. Sampling time points range from 14 days before to 28 days after inoculation, spanning 1 to 20 time points. Notably, 56 participants from four studies had both blood and nasal samples, and they were counted separately. Subsequent time points show smaller but consistent sample collections, with varying contributions from different pathogens. H1N1, H3N2, and HRV are the most frequently sampled pathogens, while SARS-CoV-2 and RSV contribute fewer samples to the dataset.

Figure 2. A collection of datasets covering transcriptional responses to inoculation over time across varying viruses.

Histogram of the time points pre- and post-inoculation available in our compendium. Each virus is indicated by a different color. The height of the bars represents the number of participants with available gene expression data.

Figure 3 summarizes the distribution of 2,084 participants, aged 0.5 to over 89 years, over time in 37 vaccination studies covering COVID-19 and influenza vaccines and their types. The peak occurred on day 0, with sampling concentrated in the first week. Sampling time points range from 28 days before to 360 days after vaccination, spanning 1 to 13 time points.

Figure 3. A collection of datasets covering transcriptional responses to vaccination over time across COVID-19 and influenza.

Histogram of the time points pre- and post-vaccination available in our compendium. Each vaccine and vaccine type is indicated by a different color. The height of the bars represents the number of participants with available gene expression data.

The complete list of inoculation datasets included in our collection can be found in Table 1.

Table 1. Complete list of inoculation datasets.

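Pathogen | Sample type | Sample size | Profiling platform | Time point (days) | Accession number | Citation
HRV | Peripheral blood | 50 | Microarray (Affymetrix) | -0.88, 0, 0.21, 0.5, 0.88, 1.21, 1.5, 1.88, 2.21, 2.5, 2.88, 3.21, 3.5, 3.88, 4.21, 4.5, 4.92, 5.21, 5.92, 6.92 | GSE73072 | 8
RSV | Peripheral blood | 20 | Microarray (Affymetrix) | -0.88, 0, 0.21, 0.5, 0.88, 1.21, 1.5, 1.88, 2.21, 2.5, 2.88, 3.21, 3.5, 3.88, 4.21, 4.5, 4.92, 5.21, 5.92, 6.92 | GSE73072 | 8
H3N2 | Peripheral blood | 38 | Microarray (Affymetrix) | -0.88, 0, 0.21, 0.5, 0.88, 1.21, 1.5, 1.88, 2.21, 2.5, 2.88, 3.21, 3.5, 3.88, 4.21, 4.5, 4.92, 5.21, 5.92, 6.92 | GSE73072 | 8
H1N1 | Peripheral blood | 43 | Microarray (Affymetrix) | -0.88, 0, 0.21, 0.5, 0.88, 1.21, 1.5, 1.88, 2.21, 2.5, 2.88, 3.21, 3.5, 3.88, 4.21, 4.5, 4.92, 5.21, 5.92, 6.92 | GSE73072 | 8
HRV | Peripheral blood | 20 | Microarray (Affymetrix) | 0 | GSE17156 | 9
RSV | Peripheral blood | 20 | Microarray (Affymetrix) | 0 | GSE17156 | 9
H3N2 | Peripheral blood | 17 | Microarray (Affymetrix) | 0 | GSE17156 | 9
H3N2 | Peripheral blood | 17 | Microarray (Affymetrix) | 0, 0.21, 0.50, 0.88, 1.21, 1.50, 1.88, 2.21, 2.50, 2.88, 3.21, 3.50, 3.88, 4.21, 4.50 | GSE30550 | 10
H3N2 | Whole blood | 11 | Microarray (Illumina) | 0, 0.50, 1, 2 | GSE61754 | 11
SARS-CoV-2 | Blood and nose swabs | 36 | RNA-seq | Blood: 0, 0.25, 14, 28; nose swabs: 0, 1, 3, 5, 7, 10, 14 | E-MTAB-12993 | 12
H3N2 | Blood and nose swabs | 20 | RNA-seq | Blood: 0, 1, 2, 3, 7, 10, 14, 28; nose swabs: 0, 1, 2, 3, 7, 14 | EGAD50000000956 |
HRV | Nose swabs | 17 | Microarray (Affymetrix) | -14, 0.33, 2.00 | GSE11348 | 13
H1N1 | Whole blood | 21 | Microarray (Illumina) | 0, 1, 2, 3, 4 | GSE90732 | 14
RSV | Nose swabs | 58 | RNA-seq | 0, 3 | GSE155237 | 15
H1N1 | PBMC | 24 | Microarray (Affymetrix) | 0, 0.21, 0.50, 0.90, 1.21, 1.50, 1.90, 2.21, 2.50, 2.90, 3.21, 3.50, 3.90, 4.21, 4.50 | GSE52428 | 16
H3N2 | PBMC | 17 | Microarray (Affymetrix) | 0, 0.21, 0.50, 0.90, 1.21, 1.50, 1.90, 2.21, 2.50, 2.90, 3.21, 3.50, 3.90, 4.21, 4.50 | GSE52428 | 16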

We have collected transcriptomes from 18 cohorts across 10 inoculation studies, totaling 429 participants. This collection covers inoculations with five respiratory viruses: H1N1, H3N2, RSV, HRV, and SARS-CoV-2. Sampling spans from 14 days before to 28 days after inoculation. The sample sources include blood (n=14) and nasal swabs (n=4). The transcriptomic data types include microarrays from Illumina (n=2) and Affymetrix (n=11), and bulk RNA-seq (n=5) (Figure 4).

Figure 4. Distribution of inoculation datasets across different profiling platforms.
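The counts quoted for the inoculation collection can be cross-checked against Table 1. The sketch below transcribes the table into tuples and tallies it under one assumption that is ours, not stated in the source: rows listed as "Blood and nose swabs" count as two datasets (one per sample source) while their participants are counted once.

```python
from collections import Counter

# (pathogen, sample sources, participants, platform, accession), from Table 1.
TABLE1 = [
    ("HRV", ("blood",), 50, "Affymetrix", "GSE73072"),
    ("RSV", ("blood",), 20, "Affymetrix", "GSE73072"),
    ("H3N2", ("blood",), 38, "Affymetrix", "GSE73072"),
    ("H1N1", ("blood",), 43, "Affymetrix", "GSE73072"),
    ("HRV", ("blood",), 20, "Affymetrix", "GSE17156"),
    ("RSV", ("blood",), 20, "Affymetrix", "GSE17156"),
    ("H3N2", ("blood",), 17, "Affymetrix", "GSE17156"),
    ("H3N2", ("blood",), 17, "Affymetrix", "GSE30550"),
    ("H3N2", ("blood",), 11, "Illumina", "GSE61754"),
    ("SARS-CoV-2", ("blood", "nasal"), 36, "RNA-seq", "E-MTAB-12993"),
    ("H3N2", ("blood", "nasal"), 20, "RNA-seq", "EGAD50000000956"),
    ("HRV", ("nasal",), 17, "Affymetrix", "GSE11348"),
    ("H1N1", ("blood",), 21, "Illumina", "GSE90732"),
    ("RSV", ("nasal",), 58, "RNA-seq", "GSE155237"),
    ("H1N1", ("blood",), 24, "Affymetrix", "GSE52428"),
    ("H3N2", ("blood",), 17, "Affymetrix", "GSE52428"),
]

by_source = Counter()
by_platform = Counter()
for _pathogen, sources, _n, platform, _acc in TABLE1:
    # Assumption: each sample source in a row is a separate dataset.
    for source in sources:
        by_source[source] += 1
        by_platform[platform] += 1

datasets = sum(by_source.values())
participants = sum(row[2] for row in TABLE1)
studies = len({row[4] for row in TABLE1})

print(datasets, studies, participants)
print(dict(by_source), dict(by_platform))
```

Under that assumption the tally gives 18 datasets from 10 accessions and 429 participants, with 14 blood and 4 nasal datasets and a platform split of 11 Affymetrix, 2 Illumina, and 5 RNA-seq, matching the figures in the text.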

The complete list of vaccination datasets included in our collection can be found in Table 2.

Table 2. Complete list of vaccine datasets.

Pathogen | Vaccine type | Sample type | Sample size | Profiling platform | Time point (days) | Accession number | Citation
Influenza | TIV | PBMC | 172 | Microarray (Illumina) | 0, 2, 7, 28 | GSE107990 | 17
Influenza | LAIV | Whole blood | 20 | Microarray (Illumina) | 0, 1, 7, 30 | GSE52005 | 18
Influenza | TIV | Whole blood | 17 | Microarray (Illumina) | 0, 1, 7, 30 | GSE52005 | 18
Influenza | LAIV | PBMC | 28 | Microarray (Affymetrix) | 0, 3, 7 | GSE29619 | 19
Influenza | TIV | PBMC | 28 | Microarray (Affymetrix) | 0, 3, 7 | GSE29619 | 19
Influenza | LAIV and inactivated | Blood | 44 | RNA-seq | 0, 3, 7, 29, 85, 92, 113 | GSE217770 | 20
Influenza | Inactivated | PBMC | 50 | RNA-seq | 0, 1, 3, 7, 21, 22, 24, 28 | GSE102012 | 21
Influenza | LAIV | Nasal epithelium | 40 | RNA-seq | 0, 3 | GSE230494 | 22
Influenza | LAIV | Nasal cells | 55 | RNA-seq | 0, 5, 12 | GSE117580 | 23
Influenza | TIV | Nasal cells | 62 | RNA-seq | 0, 5, 12 | GSE117580 | 23
Influenza | Viral vector | Whole blood | 11 | Microarray (Illumina) | 0, 0.5, 1, 2 | GSE61754 | 11
Influenza | Inactivated | PBMC | 14 | RNA-seq | -7, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 21 | GSE45764 | 24
Influenza | Inactivated | Whole blood | 18 | Microarray (Illumina) | -7, 0, 1, 3, 7, 10, 14, 21, 28 | GSE30101/GSE48762 | 25
Influenza | Inactivated | PBMC | 212 | Microarray (Affymetrix) | 0, 3, 7, 14 | GSE74817 | 26
Influenza | Inactivated | Whole blood | 247 | Microarray (Illumina) | 0, 1, 3, 14 | GSE48024 |
Influenza | Inactivated | Whole blood | 91 | Microarray (Illumina) | 0 | GSE41080 |
Influenza | Inactivated | PBMC | 5 | RNA-seq | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | GSE45735 |
Influenza | Inactivated | PBMC | 60 | Microarray (Illumina) | 0, 2, 4, 7, 28 | GSE59743/GSE95584 |
Influenza | Inactivated | PBMC | 27 | Microarray (Affymetrix) | 0, 3, 7 | GSE29617/GSE29614 |
Influenza | Inactivated | PBMC | 64 | Microarray (Illumina) | 0, 2, 4, 7, 28 | GSE59654 |
Influenza | Inactivated | Whole blood | 51 | Microarray (Illumina) | 0, 2, 7, 28 | GSE101709 |
Influenza | Inactivated | PBMC | 42 | Microarray (Illumina) | 0, 4, 7, 28 | GSE59635 |
Influenza | Inactivated | Whole blood | 44 | Microarray (Illumina) | 0, 2, 7, 28 | GSE101710 |
Influenza | Inactivated | PBMC | 63 | Microarray (Affymetrix) | -7, 0, 1, 7, 70 | GSE47353 |
Influenza | LAIV | PBMC | 28 | Microarray (Affymetrix) | 0, 3, 7 | GSE29615 |
Influenza | Inactivated | Whole blood | 123 | Microarray (Affymetrix) | -7, 0, 1, 3, 7, 10, 14, 21, 28 | SDY311/SDY312/SDY314/SDY315/SDY112 | 27
Influenza | TIV | Whole blood | 34 | Microarray (Illumina) | 0, 7, 14 | SDY272/SDY648/SDY739/SDY819/SDY622 | 28
Influenza | TIV | Whole blood | 65 | RNA-seq | 0, 2, 7, 28 | SDY1393 | 29
Influenza | TIV | Whole blood | 6 | RNA-seq | 1, 3, 60 | SDY300* |
COVID | mRNA | PBMC | 16 | scRNA | 0, 7, 21, 28 | GSE247917 | 30
COVID | mRNA | Blood | 23 | RNA-seq | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | GSE190001 | 31
COVID | mRNA | PBMC | 214 | RNA-seq | 0, 22, 90, 180, 360 | GSE220682 | 32
COVID | mRNA | Blood | 6 | scRNA | 0, 1, 2, 7, 21, 22, 28, 42 | GSE171964 | 33
COVID | mRNA | Blood | 56 | RNA-seq | 0, 1, 7, 21, 22, 28 | GSE169159 |
COVID | mRNA | PBMC | 8 | scRNA | 28, 35, 60, 110, 201 | GSE195673 | 34
COVID | mRNA | PBMC | 4 | scRNA | 0, 7, 30, 90 | GSE210229 | 35
COVID | Viral vector | Whole blood | 36 | RNA-seq | 0, 50, 57 | GSE228842 | 36

* This data is available at ImmPort (https://immport.org/shared/home) under study accession SDY300: Healthy Human DC and monocyte subsets transcriptional regulations in response to Fluzone 2010-2011 and pneumococcal vaccinations.
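Because the tables mix accession styles from several repositories, a small helper can route each accession to the repository that hosts it. This is a sketch: the prefix rules are our reading of the accession formats in Tables 1 and 2 (the `EGAD` prefix is taken to denote a European Genome-phenome Archive dataset, which the text does not name), and the function names are hypothetical.

```python
import re

# Accession prefix patterns, inferred from the formats seen in Tables 1 and 2.
PATTERNS = [
    (re.compile(r"^GSE\d+$"), "GEO"),
    (re.compile(r"^E-[A-Z]{4}-\d+$"), "ArrayExpress"),
    (re.compile(r"^SDY\d+$"), "ImmPort"),
    (re.compile(r"^EGAD\d+$"), "EGA"),  # assumption: EGA dataset accession
]


def split_accessions(cell):
    """Split a table cell such as 'GSE30101/GSE48762' or 'SDY300*'."""
    return [part.strip().rstrip("*") for part in cell.split("/") if part.strip()]


def repository_for(accession):
    """Return the repository name for one accession, or 'unknown'."""
    for pattern, name in PATTERNS:
        if pattern.match(accession):
            return name
    return "unknown"


def geo_url(accession):
    """Landing page for a GEO series accession."""
    return f"https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={accession}"


for cell in ["GSE30101/GSE48762", "E-MTAB-12993", "SDY300*", "EGAD50000000956"]:
    for acc in split_accessions(cell):
        print(acc, repository_for(acc))
```

Grouping accessions by repository first makes it easier to apply each repository's own download route; datasets held under controlled access may also require an application before download.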

We have gathered transcriptomes from 37 cohorts across 22 vaccine studies, totaling 2,084 participants. This collection focuses on influenza and COVID-19 vaccines. The time span ranges from -28 to 360 days relative to vaccination. The sample sources include blood (n=34) and nasal samples (n=3). The transcriptomic data types include microarrays from Illumina (n=13) and Affymetrix (n=7), bulk RNA-seq (n=13), and scRNA-seq (n=4) (Figure 5). Vaccine types: TIV (n=22) and LAIV (n=6) were used for influenza vaccination, mRNA vaccines (n=7) were used for COVID-19 vaccination, and adenovirus-vectored vaccines (n=2) were used for influenza and COVID-19, one dataset each.

Figure 5. Distribution of vaccine datasets across different profiling platforms.

Ethics and consent

Ethical approval and consent were not required.

how to cite this article
Lan Z, Zhang T, Zhang Y et al. A curated collection of transcriptome datasets to study the transcriptional response in blood and nasal samples following viral respiratory inoculation and vaccination [version 1; peer review: awaiting peer review]. F1000Research 2025, 14:493 (https://doi.org/10.12688/f1000research.162267.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.