Expected immune recognition of COVID-19 virus by memory from earlier infections with common coronaviruses in a large part of the world population

SARS-CoV-2 is the coronavirus agent of the COVID-19 pandemic causing high mortalities. In contrast, the widely spread human coronaviruses OC43, HKU1, 229E, and NL63 tend to cause only mild symptoms. The present study shows, by in silico analysis, that these common human viruses are expected to induce immune memory against SARS-CoV-2 by sharing protein fragments (antigen epitopes) for presentation to the immune system by MHC class I. A list of such epitopes is provided. The number of these epitopes and the prevalence of the common coronaviruses suggest that a large part of the world population has some degree of specific immunity against SARS-CoV-2 already, even without having been infected by that virus. For inducing protection, booster vaccinations enhancing existing immunity are less demanding than primary vaccinations against new antigens. Therefore, for the discussion on vaccination strategies against COVID-19, the available immune memory against related viruses should be part of the consideration.


SARS-CoV-2 and other human coronaviruses
From the end of 2019, the world experienced the coronavirus disease 2019 (COVID-19) pandemic caused by the emerging severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2; aka 2019 novel coronavirus or 2019-nCoV). SARS-CoV-2 shares ~80% nucleotide identity with SARS-CoV-1 (aka SARS-CoV), the causative agent of the SARS epidemic of 2002, and is even more similar to some coronaviruses in bats (Andersen et al., 2020; Ceraolo & Giorgi, 2020; Wu et al., 2020; Zhou et al., 2020). Coronaviruses are membrane-enveloped positive-strand RNA viruses with, for an RNA virus, a large genome of ~30 kb. That genome encodes several structural components of the virion, including the nucleocapsid protein N and the membrane proteins S (spike), M, and E, plus a number of nonstructural proteins involved in RNA replication and other, partly unknown, functions (Weiss & Navas-Martin, 2005). The coronaviruses infecting humans belong to the serological/phylogenetic clades group I (alphacoronaviruses) and group II (betacoronaviruses); group I includes HCoV-229E (human coronavirus 229E) and HCoV-NL63, while group II includes SARS-CoV-1, SARS-CoV-2, Middle East respiratory syndrome coronavirus (MERS-CoV), HCoV-OC43, and HCoV-HKU1. The viruses SARS-CoV-1 and MERS-CoV, on average, cause the most severe symptoms, and their outbreaks were successfully monitored and halted. At the other end of the spectrum, the viruses HCoV-229E, HCoV-NL63, HCoV-OC43, and HCoV-HKU1 tend to cause only mild symptoms and are very common.

Prevalence and associated disease of the common human coronaviruses 229E, NL63, OC43, and HKU1
The Centers for Disease Control and Prevention (CDC; https://www.cdc.gov/coronavirus/general-information.html) states: "Common human coronaviruses, including types 229E, NL63, OC43, and HKU1, usually cause mild to moderate upper-respiratory tract illnesses, like the common cold. Most people get infected with one or more of these viruses at some point in their lives." The same agency lists the common symptoms caused by these viruses as runny nose, sore throat, headache, fever, cough, and general feeling of being unwell, but also explains that they occasionally cause lower-respiratory tract illnesses, such as pneumonia or bronchitis. The viruses 229E and OC43 have been known since the 1960s (reviewed in Kahn & McIntosh, 2005), but NL63 (van der Hoek et al., 2004) and HKU1 (Woo et al., 2005) were only (conclusively) identified following the rise in interest in coronaviruses in the wake of the SARS epidemic. These common coronaviruses are believed to be the second most common cause of the common cold (Mäkelä et al., 1998). In the U.S.A., a 3-year RT-PCR surveillance of respiratory samples of patients revealed that the four viruses 229E, NL63, OC43, and HKU1 were present at levels varying by season and region, with all individual viruses peaking at >3% prevalence in each investigated region (Midwest, Northeast, South, West); co-infection with other coronaviruses was found in only ~2% of infected cases, but co-infection with another respiratory virus was found in a substantial ~30% of infected cases (Killerby et al., 2018). This pattern was reminiscent of findings in the United Kingdom (Gaunt et al., 2010) and Japan (Matoba et al., 2015). Serological investigations in countries as diverse as the U.S.A. (Bradburne & Somerset, 1972; Dijkman et al., 2012), China (Zhou et al., 2013), and Qatar (Al Kahlout et al., 2019) found that most healthy blood donors had antibodies against coronaviruses, supporting that these viruses are indeed widespread.
Since immune memory protection can be induced by related pathogens, as exemplified by the eradication of human smallpox virus (Variola) by immunization with a related "cowpox" virus (Vaccinia) (Plotkin & Plotkin, 2018), it is interesting to consider whether common human coronavirus infections may have induced some level of protection against SARS-CoV-2.

The possibility of matching linear epitopes between SARS-CoV-2 and the common human coronaviruses that may stimulate the immune system through MHC class I presentation
The major arms of immune memory concern antibody secretion by B cells, killing of infected cells by CD8+ T cells, and helper/regulatory immune activities (e.g. cytokine secretion) by CD4+ T cells. For a murine coronavirus infection in mouse, both antibody responses and cell-mediated cytotoxicity were needed to efficiently control the virus (reviewed by Weiss & Navas-Martin, 2005). In SARS-CoV-1-infected patients, B cell as well as T cell responses were observed (Li et al., 2008), and, in animal models of SARS, B cell responses (Bisht et al., 2005) as well as CD4+ and CD8+ T cell responses (Channappanavar et al., 2014; Liu et al., 2017; Zhao et al., 2010; Zhao et al., 2016) were shown to have protective value. Notably, in most individuals who had recovered from SARS, SARS-CoV-1-specific memory CD8+ T cells persisted for up to 6 years after SARS-CoV-1 infection whereas memory B cells and antivirus antibodies generally became undetectable (Tang et al., 2011).
Based on theoretical considerations alone, it is difficult to predict effective B cell memory across different virus species (Qiu et al., 2020), which makes it a poor topic for our present study, which is based on sequence comparisons. We just note that a recent study concluded that sera from people who had likely been infected with the common human coronaviruses 229E, NL63, OC43, and/or HKU1 possessed no or negligible cross-reactivity with SARS-CoV-2 virus S protein (Amanat et al., 2020) and thus probably possess no neutralizing antibodies.
There may be some recognition of SARS-CoV-2 epitopes by CD4+ T cell memory derived from previous infections with common human coronaviruses. However, as discussed in the Results and Discussion section, the very limited lengths of identical sequence stretches between the viruses make theoretical predictions of such epitopes difficult, and therefore the current study only concentrates on potential CD8+ T cell memory.

Amendments from Version 1
Based on requests of the reviewers, we now have added more background information and references to the article. In Table 1, we added software predictions for HLA-C binding. In the Notifications section, in line with the comments by the reviewers, we additionally included some new information that appeared after the initial submission of our article.
Any further responses from the reviewers can be found at the end of the article.

For inducing CD8+ T cell memory, the core requirement is merely that an identical short peptide is presented by major histocompatibility complex (MHC) class I (MHC-I) molecules. MHC-I molecules present peptide fragments from intracellular proteins, thus also from viral proteins, at the cell surface for screening by CD8+ cytotoxic T cells (Neefjes et al., 2011). CD8+ T cells recognize the combination of MHC-I molecule with peptide by T cell receptors (TCR) that are unique per T cell clone, and if stimulated these clones can proliferate, kill the presenting (virus-infected) cell, and produce memory cells. MHC-I molecules are polymorphic in that they are represented by many diverse allelic forms that differ between human populations and individuals (Robinson et al., 2020), and mostly bind peptides of 9 amino acids (aa) length in their binding groove which is closed at either end (Bjorkman et al., 1987; Rammensee et al., 1995; Schellens et al., 2015).
In the present study, we analyzed whether there are linear 9 aa epitopes that are identical between proteins encoded by SARS-CoV-2 and one or more of the common human coronaviruses. We indeed found many such epitopes and, by using prediction software, found that some are expected to bind well to certain MHC-I alleles. We therefore expect that common human coronaviruses can induce some level of CD8+ T cell-mediated immune memory recognizing SARS-CoV-2, and consider the possibility of enhancing that immune memory by vaccination.

Methods
Proteins encoded by a reported genomic sequence for SARS-CoV-2 (GenBank MN908947; Wu et al., 2020) were compared with those for HCoV-OC43 (NC_005147; Vijgen et al., 2005), HCoV-HKU1 (NC_006577; Woo et al., 2005), HCoV-229E (NC_002645; Thiel et al., 2001), and HCoV-NL63 (NC_005831; van der Hoek et al., 2004) by performing BLAST homology searches at the NCBI database (https://blast.ncbi.nlm.nih.gov/Blast.cgi) and by making multiple sequence alignments using CLUSTALW software (https://www.genome.jp/tools-bin/clustalw); continuous stretches of 9 aa identical between SARS-CoV-2 and one of the other viruses were identified manually. All these shared 9 aa epitopes were screened by ANN 4.0 software at IEDB Analysis Resource (http://tools.immuneepitope.org/mhci/) for prediction of their affinity to a set of representative human MHC-I alleles.
Table 1 lists the 9 aa epitopes that are identical between proteins encoded by SARS-CoV-2 and one or more of the common human coronaviruses. Many identical >9 aa stretches were found with the ORF1ab-encoded polyprotein, one such identical stretch (of 12 aa) was found with the N protein of the other two group II coronaviruses HCoV-OC43 and HCoV-HKU1, and no such stretches were found when comparing with any of the other gene products. ORF1ab-derived mature proteins with such stretches, expected from cleavage of the polyprotein precursor (Wu et al., 2020), were the transmembrane protein nonstructural protein 4 (NSP4), 3C-like cysteine protease NSP5, RNA binding protein NSP9, RNA dependent RNA polymerase NSP12, helicase NSP13, 3'-to-5' exonuclease NSP14, nidoviral endoribonuclease specific for U NSP15, and S-adenosylmethionine-dependent ribose 2'-O-methyltransferase NSP16 (Table 1). Sequence alignment figures of the ORF1ab and N proteins are shown in Extended data (Dijkstra, 2020a) with highlighting of the interesting epitopes.
It is of note that the S protein, which is the prime candidate for inducing neutralizing antibodies (Cohen, 2020), is poorly suitable for inducing an MHC-I-restricted immune memory across the investigated viral species as between S protein of SARS-CoV-2 and S proteins of the common human coronaviruses there are no 9 aa matches, and, among the virus isolates compared in this study, only a single 8 aa match (DRLITGRL with HCoV-NL63 and -229E) (not shown).
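The search for identical 9 aa stretches described above can be illustrated with a short sketch. This is not the procedure used in the study, where the stretches were identified manually from BLAST and CLUSTALW alignments; it is an alignment-free exact-match scan offered only as an illustration. The function names and the two toy sequences are hypothetical and do not represent viral proteins.

```python
from typing import List, Tuple


def shared_kmers(query: str, other: str, k: int = 9) -> List[Tuple[int, str]]:
    """Return (1-based start in query, peptide) for every k-residue window
    of `query` that occurs identically anywhere in `other`."""
    other_kmers = {other[j:j + k] for j in range(len(other) - k + 1)}
    return [(i + 1, query[i:i + k])
            for i in range(len(query) - k + 1)
            if query[i:i + k] in other_kmers]


def merge_stretches(hits: List[Tuple[int, str]], other: str,
                    k: int = 9) -> List[Tuple[int, str]]:
    """Merge hits that start at consecutive query positions into longer
    identical stretches, provided the merged stretch is also contiguous
    in `other`."""
    merged: List[Tuple[int, str]] = []
    for start, pep in hits:
        if merged:
            prev_start, prev_pep = merged[-1]
            # The next k-mer of a stretch of length L starts at prev_start + L - k + 1.
            is_next = start == prev_start + len(prev_pep) - k + 1
            if is_next and (prev_pep + pep[-1]) in other:
                merged[-1] = (prev_start, prev_pep + pep[-1])
                continue
        merged.append((start, pep))
    return merged


# Toy sequences (hypothetical, for illustration only).
QUERY = "MKTAYIAKQRQISFVKSHFSRQ"
OTHER = "GGTAYIAKQRQISFVAAA"

hits = shared_kmers(QUERY, OTHER, k=9)
stretches = merge_stretches(hits, OTHER, k=9)
```

Calling `shared_kmers` with `k=8` would report shorter matches in the same way, such as the single 8 aa match noted for the S proteins. Each reported 9 aa peptide would still need a separate MHC-I affinity prediction, which this sketch does not attempt.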

Results and discussion
Table 1 (for Excel format see Extended data) shows that there are >200 linear epitopes of 9 aa that are identical between SARS-CoV-2 and at least one of the common human coronaviruses, most of them shared with OC43 and HKU1 which, like SARS-CoV-2, belong to the group II coronaviruses. In a simplified model, if people have been exposed to many of these epitopes through common HCoV infections, this roughly equals immunization with a small intracellular protein under natural viral infection conditions. Live virus is commonly considered the gold standard for inducing strong immunity, unless the virus has some tricks up its sleeve to manipulate the immune system, which for common human coronaviruses is not well investigated. Even so, a research grant proposal suggesting such immunization as a vaccination strategy would probably fail. Reviewers of such a proposal would rightly point out that the strategy would not induce neutralizing antibodies, which for combating some viral infections can be very important, and that for inducing MHC-I-restricted cell-mediated cytotoxicity memory, ideally, a much larger protein or more proteins should be taken. Those reviewers would conclude that, for such a small intracellular protein, induction of strong immune memory would depend too much on the MHC alleles of the immunized person and would need too much luck in regard to immunogenicity. Nevertheless, those reviewers would probably also agree that in most persons thus vaccinated some (small) level of immune memory protection would be established, even with such a small non-surface protein (e.g. Polakos et al., 2001; Wasmoen et al., 1995; Zhao et al., 2005).
Although this is obviously not the ideal way to induce strong protective immunity across a population (see the spread of COVID-19), the induced immune memory, together with other factors such as health and the number of encountered viruses (the strength of the viral challenge), could make a difference for whether a person gets sick; at the population scale, it may thus somewhat reduce the virus reproduction number. Importantly, by stimulating this HCoV-derived MHC-I-restricted immune memory by vaccination (see below), it may become a more significant helper in fighting COVID-19.
Li et al. (2008) found that a SARS-CoV-1 15 aa peptide sequence (their "Replicase 4701-4715" peptide) encompassing the SARS-CoV-2/HCoV-shared ORF1ab4725 and ORF1ab4726 epitopes that are predicted to bind well to the MHC-I alleles HLA-A*0201 and HLA-B*3901 (see our Table 1) was associated with a CD8+ T cell response against SARS-CoV-1 in humans. However, Li et al. (2008) also found such a CD8+ T cell response associated with a SARS-CoV-1 15 aa peptide (their "Nucleocapsid 106-120" peptide) encompassing the SARS-CoV-2/HCoV-shared N 106, N 107, N 108, and N 109 epitopes for which our analyses did not predict MHC-I binding (see our Table 1).
The MHC-I binding affinity is considered the most selective in determining which peptides are presented, but also steps in the peptide processing and loading pathways may play selective roles which are difficult to capture in prediction software (Nielsen et al., 2005). We argue that, if such steps would be selective for presentation, in most cases they would probably not differentiate between the 9 aa epitope in the SARS-CoV-2 context versus the respective HCoV context, since most of those epitopes are within stretches that also show many similarities in the neighboring residues (Extended data).
Not all stable complexes of MHC-I with non-self peptides elicit a strong immune response, but "immunogenicity" features are hard to predict with meaningful reliability by in silico analysis (Calis et al., 2013), and in the present study we refrain from such predictions. Table 1 should, foremost, be understood as evidence of principle and a list of promising peptides, whereas only future experiments can prove MHC-I-mediated immune memory involving these or other peptides.
In regard to SARS-CoV-2 recognition, the common human coronaviruses may also induce some MHC-II-mediated immune memory by CD4+ helper T cells (as an example for shared epitope use by different coronaviruses see Zhao et al., 2016). CD4+ helper T cells can help stimulate cells involved in antibody or cell-mediated cytotoxic immune responses (Neefjes et al., 2011). However, for this topic, in the present article, we have refrained from detailed (software) predictions because comparison of MHC-II epitopes across different viruses is harder than for MHC-I epitopes. Namely, although the core of MHC-II bound peptides is also only 9 aa, the surrounding amino acids are also part of the bound peptide that tends to be 12-25 aa (Brown et al., 1993; Rammensee et al., 1995; Stern & Wiley, 1994) and can affect how the peptide interacts with the receptors on the CD4+ helper T cells (Arnold et al., 2002).

Vaccination potential
Immune memory means that a secondary immune response, upon renewed encounter with the same pathogen, is faster and stronger than the primary immune response during the first encounter with the pathogen. This is based on expansion of specific B and T cell clones, which specifically recognize pathogen(-derived) epitopes, with some of those cells becoming memory cells (Paul, 2013). This principle also means that for a booster vaccination/immunization the requirements for efficiently inducing an immune response are lower than for a first vaccination/immunization (e.g. Du et al., 2008; Goding, 1996; Schulze et al., 2008). Especially in elderly people, who have a decreased ability to mount adaptive immune responses against new antigens, vaccination that stimulates an immune memory response may be beneficial (Kaml et al., 2006; Reber et al., 2012; Wagner & Weinberger, 2020). As discussed above, people's past infections with common coronaviruses probably did not induce a B cell memory for making antibodies that can neutralize SARS-CoV-2. However, as the current study shows by analysis of linear 9 aa epitopes, these common human coronaviruses are expected to induce CD8+ T cells that may potentially kill SARS-CoV-2-infected cells and so can help eradicate the virus. There are several possible ways to exploit this probable immune memory. For example, if using RNA for immunization (Cohen, 2020), it may be best to also include SARS-CoV-2 genes that encode MHC-I epitopes that match those of the common coronaviruses. Alternatively, delivery of these epitopes to the MHC-I presentation system may be tried by peptide or protein based vaccines (e.g. Kohyama et al., 2009; Slingluff, 2011; van Montfoort et al., 2014; Yadav et al., 2014), possibly in combination with some of the strategies that are currently being explored for non-specific stimulation of the immune system against COVID-19 (Kupferschmidt & Cohen, 2020).
Protein(-coding) vaccines, for example encompassing a large part of the SARS-CoV-2 ORF1ab product, would have an advantage over peptide vaccines by including multiple possible MHC-I and also MHC-II epitopes, and would be less dependent on MHC-allele matching and the quality of software predictions. Naturally, as for any new vaccine strategy, it should be carefully assessed whether the benefits of the induced type of immunity outweigh the potential deleterious health effects caused by, for example, an increased inflammation response (Cohen, 2020; Weingartl et al., 2004). Another fundamental concern is the maximum level of protection that can be generated by vaccination against coronavirus infections in humans, considering that infection of volunteers with HCoV-229E live virus gave only partial protection upon infection with the same virus one year later (Callow et al., 1990). Additional questions specifically related to the contents of our study are whether the history of previous, especially recent, infections with common coronaviruses, or people's MHC alleles, affect people's resistance to SARS-CoV-2. Most definitely, if discussing possible strategies for vaccination against SARS-CoV-2, pre-existing MHC-I-based immunity derived from previous infections with common coronaviruses should be part of the consideration.

Notifications
Although we were not aware of this at the time of writing, a recent paper appeared with overlapping contents (Nguyen et al., 2020). The Nguyen et al. study was more complete on SARS-CoV-2 MHC epitope predictions and made an association with global MHC allele distributions. The advantage of our study is a more concentrated focus on the MHC-I mediated memory expected from previous coronavirus infections, and the vaccination potential deriving from that memory.
After we had submitted our study, two studies reported in vitro responses of T cells against SARS-CoV-2 peptides, which might represent memory from previous infections with common coronaviruses (Braun et al., 2020; Grifoni et al., 2020). However, both studies only used peptide mixes without identifying the responsible peptide, and at least several of the observed responses necessitated the allowance of peptide ligand sequence mismatches for T cell receptor to MHC/peptide binding (T cell cross-reactivity). Negative control donors, who with certainty had never been infected with common coronaviruses, were not available for the experiments, and conclusions that the observed responses were from T cell memory from previous coronavirus infections, and have in vivo relevance, should be considered only cautiously. Discussion of this topic is important because the two studies concluded a potential of the common coronavirus S proteins to induce CD4+ T cell memory (Braun et al., 2020; Grifoni et al., 2020) and CD8+ T cell memory (Grifoni et al., 2020), whereas these proteins do not share 9 aa identical stretches with SARS-CoV-2 (see our article and Supplementary Fig. 1 in Braun et al., 2020), and would arguably necessitate the allowance of peptide sequence mismatches (T cell cross-reactivity) for inducing an efficient MHC-mediated T cell response. As we pointed out in our article, although SARS-CoV-2 S protein is the prime vaccine component candidate for inducing neutralizing antibodies, for a more realistic chance to efficiently boost existing T cell memory it probably would be better to additionally include other SARS-CoV-2 proteins that do share identical MHC epitopes with common coronaviruses.
Regarding the potential of existing CD8+ T cell memory cells to help fight COVID-19 disease, a recent observation by Liao et al. (2020) might be interesting. Their study suggests that in COVID-19 patients with pneumonia, ZNF683+ CD8+ T cell clonal expansion may protect the patient from more severe disease.

I think the Table could be made quite a lot smaller and thus more valuable to the reader. The source proteins could be indicated as an abbreviation provided in the legend as could the various seasonal strains. The boxes could then be quite small, and either be positive or negative. In any case, an effort should be made to condense this table.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Immunology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Author Response 08 Jul 2020
Johannes M. Dijkstra, Fujita Health University, Toyoake, Japan

Dear Dr. Sant,
Thank you for reviewing our article. We highly appreciate your comment that our article is timely and well balanced. Especially the latter is important, since we did not want to make a general audience too enthusiastic, while we simultaneously wanted to stress that there is a chance that this MHC-I-mediated immunity might (possibly after boosting by vaccines) give real protection.
You are correct that immune memory by CD4 cells is not only relevant to their control of B cell and CD8 T cell responses, and we have rewritten the sentence and paragraph on the "major arms of immune memory." As you state, our paper focusses on CD8 T cell memory indeed, because that is the memory response that can be predicted most reliably by sequence analysis alone. In the notification section we now shortly discuss two papers that appeared after we submitted our paper, and which claim to have found SARS-CoV-2 recognizing CD4 and CD8 T cell memory derived from previous common coronavirus infections.
As requested, we have now changed "software owners" to "software designers."
As requested, we now have added more references of an enhanced immune reaction after a second (booster) immunization. However, we do not feel that for our type of paper it is necessary to discuss the mechanisms of memory T cells besides just mentioning their involvement.
As for the words "expected" versus "potential." We feel that we used the word "expected" correctly. Table 1, where all the peptides are listed, carries neither of these two words and has a very neutral title. In the text, some peptides are referred to as "expected" to bind a particular MHC molecule, a term clearly relating to the indicated software and literature. Given the large number of potential MHC-I epitopes shared between the viruses, we "expect" previous common coronavirus infections to have induced some CD8+ T cell immune memory that recognizes SARS-CoV-2; this claim is not about the protective value of this memory, and we feel, therefore, that the word "expect" is within reasonability.
As for the Table format. The format was chosen by the journal editorial team, and we can see that for some uses it has advantages. However, we understand your concern, and now have added an Excel format variant of the Table to the supplement section so that readers can more easily view and interact with the data.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: T cell viral immunology
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Author Response 08 Jul 2020
Johannes M. Dijkstra, Fujita Health University, Toyoake, Japan

Dear Dr. Gil and Dr. Selin,
Thank you for your kindness in reviewing our article. We are glad that you find our study very useful, and that you agree with the conclusions. Coming from experts like you, this is very reassuring.
We have now added a bit more information on the Nguyen et al. study in the Notification section. The advantage of the Nguyen et al. study was that they were more complete on SARS-CoV-2 MHC epitope predictions and made an association with MHC allele distributions. The advantage of our paper is a more concentrated focus on the memory expected from previous coronavirus infections, and the vaccination potential deriving from that memory. They were first, which we acknowledged, although we wrote our paper independently of their article, and during the submission process of our article theirs was not an indexed publication yet. Presumably, this type of overlap will happen a lot with a topic as intensively studied as coronavirus, and we feel we took a reasonable approach for dealing with their study. From your reviews, we understand that you and the other reviewer, Dr. Sant, find this within acceptability.
Thank you for referring to the interesting YLRKHFSMMIL stretch which is identical between HCoV-NL63 and SARS-CoV-2, as indeed it harbors predicted binding epitopes for several MHC-I supertypes. However, we prefer not to discuss this in text form, because there are many uncertainties (e.g., about recent HCoV-NL63 distributions in the world population) and a textual discussion may not add clarity to the table presentation.
Based on your advice, we now have added HLA-C predictions to Table 1.
Likewise, we now have added a more extensive summary of previous reports on T cell memory after coronavirus infections.
Apart from addressing the reviewers' comments, we corrected a mistake and now informed the readers that there is a single 8 aa match between compared S proteins.
Apart from addressing the reviewers' comments, we now also added the information that the study by Callow et al. (1990) on HCoV-229E concluded imperfect immune memory protection even by live virus infection one year before challenge.
Again, we would like to thank you for reviewing, as we are aware of the time and effort that it takes. We are very happy that our article is appreciated, since it deals with such an important topic.

In this comment I would just like to state that we submitted this article to F1000Research on April 15 (Japanese time). F1000Research only lists the publication date (April 23), which was after the journal edited our manuscript.