Gaidai reliability method for long-term coronavirus modelling

Background: Novel coronavirus disease has recently been a concern for public health worldwide. To determine the probability of a given epidemic rate at any time in any region of interest, one needs an efficient bio-system reliability approach that is particularly suitable for multi-regional environmental and health systems observed over a sufficient period of time, and that yields a reliable long-term forecast of the novel coronavirus infection rate. Traditional statistical methods dealing with temporal observations of multi-regional processes lack the multi-dimensionality advantage that the suggested methodology offers, namely dealing efficiently with multiple regions at the same time and accounting for cross-correlations between different regional observations.

Methods: A modern multi-dimensional statistical method, able to deal with territorial mapping, was applied directly to raw clinical data. A novel reliability method based on statistical extreme value theory has been suggested to deal with the challenging epidemic forecast. The authors used MATLAB optimization software.

Results: This paper describes a novel bio-system reliability approach, particularly suitable for multi-country environmental and health systems observed over a sufficient period of time, resulting in a reliable long-term forecast of the extreme novel coronavirus death rate probability. Namely, accurate maximum recorded patient numbers are predicted for the years to come for the analyzed provinces.

Conclusions: The suggested method performed well, supplying not only an estimate but also a 95% confidence interval. Note that the suggested methodology is not limited to any specific epidemic or any specific terrain; it is truly general. The only assumption and limitation is bio-system stationarity; otherwise, trend analysis should be performed first. The suggested methodology can be used in various public health applications, based on their clinical survey data.



Introduction
[14][15][16][17][18][19] The latter is usually due to the fact that dynamic biological and environmental systems possess a high number of degrees of freedom (in other words, dimensional components), as well as a dependency on location, making the bio-system of interest spatio-temporal. [22][23] For COVID-19, however, the available observations are limited, as they only exist from the beginning of the year 2020 up to now [20]. Motivated by the latter argument, the authors have introduced a novel reliability method for biological and health systems to predict and manage epidemic outbreaks more accurately; this study focused on the COVID-19 epidemic in northern China, with emphasis on cross-correlations between different provinces within the same climatic zone.

[34][35][36][37][38][39][40] For example, in Ref. 13, the authors employed EVT to determine the fitness effect using a Beta-Burr distribution, while in Ref. 14 the author discusses a bivariate logistic regression model, which was then used to assess multiple sclerosis patients with walking disabilities and in a cognitive experiment for visual recognition. Finally, a further paper of relevance used EVT to estimate the probability of an influenza outbreak in China; its author demonstrated the forecasting potential of the approach amid the epidemic. [47][48][49][50] In this paper, an epidemic outbreak is viewed as an unexpected incident that may occur in any province of a given country at any time; spatial spread is therefore accounted for. [53][54][55][56][57][58] Biological systems are subjected to ergodic environmental influences. [61][62][63][64][65]

The incidence data of COVID-19 in all provinces of the People's Republic of China (PRC) from February 2020 until the end of 2022 were retrieved from the official public PRC health website; for simplicity, only northern provinces were selected for this study. As this dataset is organized by province (more than 30 provinces in China), the biological system under consideration can be regarded as a multi-degree of freedom (MDOF) dynamic system with highly inter-correlated regional components/dimensions. [68][69][70][71][72] Note that while this study aims at reducing the risk of future epidemic outbreaks by predicting them, it is solely focused on daily registered patient numbers and not on the symptoms themselves. For long-lasting COVID-19 symptoms, the so-called "long COVID", its risk factors, and whether a protracted course can be predicted early in the disease, see Ref. 18; for mortality research, see Ref. 1.

Gaidai method
This section presents the theoretical details of the novel Gaidai bio-reliability method. For the numerical part, the authors used the commercial software MATLAB (Mathworks, V 8.6), namely its optimization routines; otherwise, the authors used the extrapolation code available from modified Weibull. Only the code available from modified Weibull was used to complete all sections of the methods, both the numerical part and the final extrapolation.
REVISED Amendments from Version 2
The authors have addressed language and formatting concerns. Equations have been improved and references extended.
Any further responses from the reviewers can be found at the end of the article.

The novel Gaidai bio-reliability method introduces an MDOF (multi-degree of freedom) health bio-system vector process $\vec{R}(t) = (X(t), Y(t), Z(t), \ldots)$ that was measured over a sufficiently long (representative) period of time $(0,T)$. The unidimensional bio-system components' global maxima over the entire time span $(0,T)$ are denoted as $X_T^{\max} = \max_{0 \le t \le T} X(t)$, $Y_T^{\max} = \max_{0 \le t \le T} Y(t)$, $Z_T^{\max} = \max_{0 \le t \le T} Z(t)$, and so on. Let $X_1, \ldots, X_{N_X}$ be the temporally consecutive local maxima of the process $X(t)$, recorded at discrete time instants $t_1^X < \cdots < t_{N_X}^X$ that are monotonously increasing in $(0,T)$. A similar definition follows for the other MDOF bio-system components $Y(t), Z(t), \ldots$ with $Y_1, \ldots, Y_{N_Y}$; $Z_1, \ldots, Z_{N_Z}$ and so on. For simplicity, all bio-system components, and therefore their maxima, are assumed to be non-negative.
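The target is to estimate the system failure probability, in other words the probability of exceedance

$$1 - P = \operatorname{Prob}\left(X_T^{\max} > \eta_X \,\cup\, Y_T^{\max} > \eta_Y \,\cup\, Z_T^{\max} > \eta_Z \,\cup\, \ldots\right), \qquad P = \iiint_{(0,0,0,\ldots)}^{(\eta_X,\eta_Y,\eta_Z,\ldots)} p_{X_T^{\max},Y_T^{\max},Z_T^{\max},\ldots}(x, y, z, \ldots)\, dx\, dy\, dz \ldots \tag{1}$$

where $P$ is the probability of non-exceedance for the critical values of the response components $\eta_X, \eta_Y, \eta_Z, \ldots$; $\cup$ denotes the logical union operation «or»; and $p_{X_T^{\max},Y_T^{\max},Z_T^{\max},\ldots}$ is the joint probability density function (PDF) of the global maxima over the entire period $(0,T)$. However, it is not feasible to estimate the latter joint probability distribution directly, due to its high dimensionality and the limitations of the available dataset.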
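More specifically, at the moment when either $X(t)$ exceeds $\eta_X$, or $Y(t)$ exceeds $\eta_Y$, or $Z(t)$ exceeds $\eta_Z$, and so on, the system is regarded as immediately failed. The fixed failure levels $\eta_X, \eta_Y, \eta_Z, \ldots$ are, of course, individual for each one-dimensional system component. Note that $X_T^{\max} = \max\{X_j;\ j = 1, \ldots, N_X\}$, $Y_T^{\max} = \max\{Y_j;\ j = 1, \ldots, N_Y\}$, and so on.

Now, the time instants of the local maxima of all bio-system components are sorted in monotonously non-decreasing order into one single merged time vector $t_1 \le \cdots \le t_N$. In this case, $t_j$ represents a local maximum of one of the MDOF response components, either $X(t)$ or $Y(t)$, or $Z(t)$ and so on. That means that, given the bio-system time record, one just needs to continuously and simultaneously screen for unidimensional response component local maxima and record any exceedance of the MDOF limit vector $(\eta_X, \eta_Y, \eta_Z, \ldots)$ in any of its components $X, Y, Z, \ldots$ In order to unify all measured time series $X, Y, Z, \ldots$, the following scaling was performed:

$$X \to \frac{X}{\eta_X}, \quad Y \to \frac{Y}{\eta_Y}, \quad Z \to \frac{Z}{\eta_Z}, \ \ldots$$

making all bio-system components non-dimensional and giving them the same failure limit, equal to $\lambda = 1$. The local maxima of the unidimensional bio-system components are merged into a single temporally non-decreasing system vector $\vec{R} = (R_1, R_2, \ldots, R_N)$ following the merged time vector $t_1 \le \cdots \le t_N$, see Figure 1. That is to say, each local maximum $R_j$ is, in fact, an actually encountered local maximum of a bio-system component, corresponding to either $X(t)$ or $Y(t)$, or $Z(t)$ and so on. Finally, the unified limit/hazard bio-system vector $(\eta_1, \ldots, \eta_N)$ is introduced, with each component $\eta_j$ being either $\eta_X$, $\eta_Y$ or $\eta_Z$ and so on, depending on which of $X(t), Y(t), Z(t), \ldots$ corresponds to the current local maximum with the running index $j$. In case of simultaneous occurrence of maxima of various bio-system components at the same time instant $t_j$, the maximum of those limits is taken, for example $\eta_j = \max\{\eta_X, \eta_Y, \eta_Z, \ldots\}$.

Now, the scaling parameter $0 < \lambda \le 1$ is introduced to artificially and simultaneously decrease the limit values for all bio-system components, namely the new MDOF limit vector $(\eta_X^\lambda, \eta_Y^\lambda, \eta_Z^\lambda, \ldots)$ with $\eta_X^\lambda \equiv \lambda\,\eta_X$, $\eta_Y^\lambda \equiv \lambda\,\eta_Y$, $\eta_Z^\lambda \equiv \lambda\,\eta_Z$ and so on. The unified limit vector $(\eta_1^\lambda, \ldots, \eta_N^\lambda)$ is introduced accordingly, with each component $\eta_j^\lambda$ being either $\eta_X^\lambda$, $\eta_Y^\lambda$ or $\eta_Z^\lambda$ and so on. The latter automatically defines the probability $P(\lambda)$ as a function of $\lambda$; note that $P \equiv P(1)$ from Equation (1). The non-exceedance probability $P(\lambda)$ can be introduced as follows:

$$\begin{aligned} P(\lambda) &= \operatorname{Prob}\{R_N \le \eta_N^\lambda, \ldots, R_1 \le \eta_1^\lambda\} \\ &= \operatorname{Prob}\{R_N \le \eta_N^\lambda \mid R_{N-1} \le \eta_{N-1}^\lambda, \ldots, R_1 \le \eta_1^\lambda\} \cdot \operatorname{Prob}\{R_{N-1} \le \eta_{N-1}^\lambda, \ldots, R_1 \le \eta_1^\lambda\} \\ &= \prod_{j=2}^{N} \operatorname{Prob}\{R_j \le \eta_j^\lambda \mid R_{j-1} \le \eta_{j-1}^\lambda, \ldots, R_1 \le \eta_1^\lambda\} \cdot \operatorname{Prob}(R_1 \le \eta_1^\lambda) \end{aligned} \tag{2}$$

Next, a cascade of approximations based on conditioning is briefly outlined. In practice, [32][33][34][35][36] the dependence between neighboring $R_j$ is not negligible; thus, the following one-step memory approximation is introduced:

$$\operatorname{Prob}\{R_j \le \eta_j^\lambda \mid R_{j-1} \le \eta_{j-1}^\lambda, \ldots, R_1 \le \eta_1^\lambda\} \approx \operatorname{Prob}\{R_j \le \eta_j^\lambda \mid R_{j-1} \le \eta_{j-1}^\lambda\} \tag{3}$$

for $2 \le j \le N$ (conditioning level $k = 2$). The approximation introduced by Equation (3) may be further expressed as

$$\operatorname{Prob}\{R_j \le \eta_j^\lambda \mid R_{j-1} \le \eta_{j-1}^\lambda, \ldots, R_1 \le \eta_1^\lambda\} \approx \operatorname{Prob}\{R_j \le \eta_j^\lambda \mid R_{j-1} \le \eta_{j-1}^\lambda, R_{j-2} \le \eta_{j-2}^\lambda\} \tag{4}$$

where $3 \le j \le N$ (conditioning level $k = 3$), and so on. The idea is to monitor each independent failure that happened locally first in time, thus avoiding cascading local inter-correlated exceedances.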
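Equation (4) exhibits subsequent refinements with respect to the statistical independence assumption. These approximations capture the statistical dependence between neighboring maxima with increasing accuracy. Since the original MDOF process $\vec{R}(t)$ has been assumed ergodic and thus stationary, the probability $p_k(\lambda) := \operatorname{Prob}\{R_j > \eta_j^\lambda \mid R_{j-1} \le \eta_{j-1}^\lambda, \ldots, R_{j-k+1} \le \eta_{j-k+1}^\lambda\}$ for $j \ge k$ is independent of $j$, but dependent on the conditioning level $k$. Thus, the non-exceedance probability may be approximated as in the average conditional exceedance rate method (see Ref. 28 for more details on exceedance probability):

$$P_k(\lambda) \approx \exp\left(-N\, p_k(\lambda)\right), \quad k \ge 1. \tag{5}$$

Note that Equation (5) follows from the product form of $P(\lambda)$ by neglecting $\operatorname{Prob}(R_1 \le \eta_1^\lambda) \approx 1$, as the design failure probability must be of small order $o(1)$, with $N \gg k$. Equation (5) is analogous to the well-known mean up-crossing rate equation for the probability of exceedance of a stochastic process [28]. There is convergence with respect to $k$, called here the conditioning level:

$$P = \lim_{k \to \infty} P_k(1); \quad p(\lambda) = \lim_{k \to \infty} p_k(\lambda). \tag{6}$$

Note that Equation (5) for $k = 1$ is equivalent to the well-known non-exceedance probability relationship with the mean up-crossing rate function

$$P(\lambda) \approx \exp\left(-\nu^+(\lambda)\, T\right); \quad \nu^+(\lambda) = \int_0^\infty \zeta\, p_{R\dot{R}}(\lambda, \zeta)\, d\zeta \tag{7}$$

where $\nu^+(\lambda)$ denotes the mean up-crossing rate of the bio-system non-dimensional level $\lambda$ for the above assembled non-dimensional vector $\vec{R}(t)$.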
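As an illustration of the conditioning idea, the following minimal Python sketch estimates $p_k(\lambda)$ empirically from an already scaled vector of local maxima and then evaluates the approximation of Equation (5). It is not the authors' MATLAB implementation: the function names, the simple counting estimator and the toy data are assumptions made here for illustration only.

```python
import math


def conditional_exceedance_rate(r, lam, k):
    """Empirical estimate of p_k(lam) for a scaled sequence r of local maxima.

    Counts the indices j whose k-1 predecessors all stay at or below lam,
    and returns the fraction of those indices where r[j] itself exceeds lam.
    Assumes every component has been scaled so that its failure limit is 1.
    """
    if k < 1 or k > len(r):
        raise ValueError("conditioning level k must satisfy 1 <= k <= len(r)")
    conditioned = 0
    exceeded = 0
    for j in range(k - 1, len(r)):
        # For k = 1 the range below is empty, so every index is counted.
        if all(r[i] <= lam for i in range(j - k + 1, j)):
            conditioned += 1
            if r[j] > lam:
                exceeded += 1
    return exceeded / conditioned if conditioned else 0.0


def non_exceedance_probability(r, lam, k):
    """Approximation P_k(lam) ~ exp(-N * p_k(lam)), with N = len(r)."""
    return math.exp(-len(r) * conditional_exceedance_rate(r, lam, k))


# Hypothetical scaled local maxima, for illustration only.
r = [0.2, 0.6, 0.7, 0.3, 0.8, 0.1]
for k in (1, 2, 3):
    print(k, conditional_exceedance_rate(r, 0.5, k))
```

In practice one would compare the estimates for increasing $k$ and keep the smallest conditioning level beyond which they no longer change appreciably.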
The mean up-crossing rate is given by Rice's formula in Equation (7), with $p_{R\dot{R}}$ being the joint probability density of $(R, \dot{R})$, where $\dot{R}$ is the time derivative $R'(t)$. Equation (7) relies on the Poisson assumption, that is, up-crossing events of high $\lambda$ levels (in this paper, $\lambda \ge 1$) can be assumed to be independent. The latter may not be the case for narrowband bio-system components and higher-level dynamical systems that exhibit cascading failures in different dimensions, subsequent in time, caused by intrinsic inter-dependency between extreme events, manifesting itself in the appearance of highly correlated local maxima clusters within the assembled vector $\vec{R} = (R_1, R_2, \ldots, R_N)$.

In the above, the stationarity assumption has been used. However, the proposed methodology can also treat the nonstationary case. For the nonstationary case, consider a scatter diagram of $m = 1, \ldots, M$ seasonal epidemic conditions, where each short-term seasonal state has the probability $q_m$, so that $\sum_{m=1}^{M} q_m = 1$. Next, let one introduce the long-term equation

$$p_k(\lambda) \equiv \sum_{m=1}^{M} p_k(\lambda, m)\, q_m \tag{8}$$

with $p_k(\lambda, m)$ being the same function as in Equation (6), but corresponding to a specific short-term seasonal epidemic state with the number $m$.
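The long-term weighting over seasonal states can be sketched in a few lines of Python. The function name and the numerical values below are hypothetical and serve only to show how the short-term rates $p_k(\lambda, m)$ would be combined with the state probabilities $q_m$.

```python
def long_term_rate(short_term_rates, state_probabilities):
    """Combine short-term rates p_k(lam, m) into a long-term p_k(lam).

    short_term_rates[m] is the rate in seasonal state m and
    state_probabilities[m] is the probability q_m of that state.
    """
    if len(short_term_rates) != len(state_probabilities):
        raise ValueError("one rate is required per seasonal state")
    if abs(sum(state_probabilities) - 1.0) > 1e-9:
        raise ValueError("state probabilities q_m must sum to 1")
    return sum(p * q for p, q in zip(short_term_rates, state_probabilities))


# Hypothetical example: a winter state (q = 0.25) and an off-season state (q = 0.75).
print(long_term_rate([0.02, 0.005], [0.25, 0.75]))
```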
Note that the accuracy of the suggested approach for a large variety of one-dimensional dynamic systems was successfully verified by the authors in previous years [28,29].

Implementing modified Weibull extrapolation method
The functions $p_k(\lambda)$ introduced by Equation (5) are regular in the tail, specifically for values of $\lambda$ approaching and exceeding 1 [17]. More precisely, for $\lambda \ge \lambda_0$, the distribution tail behaves similarly to $\exp\{-(a\lambda + b)^c + d\}$, with $a, b, c, d$ being suitably fitted constants for a suitable tail cut-on value $\lambda_0$. Therefore, one can write

$$p_k(\lambda) \approx \exp\left\{-(a_k \lambda + b_k)^{c_k} + d_k\right\}, \quad \lambda \ge \lambda_0. \tag{9}$$

Next, by plotting $\ln\left|\ln p_k(\lambda) - d_k\right|$ versus $\ln(a_k \lambda + b_k)$, often nearly perfectly linear tail behavior is observed. Optimal values of the parameters $a_k, b_k, c_k, d_k$ may also be determined by applying sequential quadratic programming (SQP) methods, incorporated in the NAG Numerical Library [30].

The methods described above have been applied as described in the methods section. The authors used the commercial tool MATLAB (Mathworks, V 8.6) (RRID:SCR_001622) as a basis for their numerical purposes. For more specific author-developed code routines, related to the extrapolation method of Equation (9), see modified Weibull. Note that modified Weibull is a repository containing not only the code but also a user manual, examples and references. In this study, only the extrapolation part of modified Weibull was used. In other words, the current study presents a novel theoretical methodology, but uses the modified Weibull software previously developed by some of the authors [31].
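To make the tail model of Equation (9) concrete, the sketch below fits the four constants to a set of $(\lambda, p_k(\lambda))$ pairs. It deliberately swaps the SQP optimization mentioned above for a much simpler scheme: writing $(a\lambda + b)^c = A(\lambda + \beta)^c$ with $A = a^c$ and $\beta = b/a$, the relation $\ln(d - \ln p) = \ln A + c \ln(\lambda + \beta)$ is linear for fixed $(\beta, d)$, so a coarse grid over $(\beta, d)$ combined with ordinary least squares suffices. The function names, the grids and the synthetic data are assumptions for illustration, not the modified Weibull code.

```python
import math


def tail_model(lam, a, b, c, d):
    """Tail model p(lam) = exp(-(a*lam + b)**c + d)."""
    return math.exp(-((a * lam + b) ** c) + d)


def fit_tail(lams, probs, betas, ds):
    """Grid search over (beta, d) with a linear least-squares fit for (c, ln A).

    Returns (a, b, c, d) minimising the squared error of
    ln(d - ln p) = ln A + c * ln(lam + beta), where A = a**c and beta = b / a.
    """
    best = None
    n = len(lams)
    for beta in betas:
        for d in ds:
            try:
                xs = [math.log(lam + beta) for lam in lams]
                ys = [math.log(d - math.log(p)) for p in probs]
            except ValueError:
                continue  # logarithm of a non-positive number: skip this pair
            mx, my = sum(xs) / n, sum(ys) / n
            sxx = sum((x - mx) ** 2 for x in xs)
            if sxx == 0.0:
                continue
            c = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
            if c == 0.0:
                continue
            ln_big_a = my - c * mx
            sse = sum((ln_big_a + c * x - y) ** 2 for x, y in zip(xs, ys))
            if best is None or sse < best[0]:
                best = (sse, c, ln_big_a, beta, d)
    if best is None:
        raise ValueError("no admissible (beta, d) pair on the grid")
    _, c, ln_big_a, beta, d = best
    a = math.exp(ln_big_a / c)
    return a, a * beta, c, d


# Synthetic tail generated from known constants, for illustration only.
lams = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
probs = [tail_model(lam, 2.0, 0.5, 1.5, -1.0) for lam in lams]
a, b, c, d = fit_tail(lams, probs, betas=[0.0, 0.25, 0.5], ds=[-2.0, -1.5, -1.0, -0.5, 0.0])
print(a, b, c, d, tail_model(1.0, a, b, c, d))
```

The last line evaluates the fitted model at $\lambda = 1$, which is the kind of extrapolation beyond the observed range that the method relies on; a real application would use a finer grid or a proper optimizer and would also quantify the uncertainty of the fit.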

Ethical consideration
The authors confirm that all methods were performed in accordance with the relevant guidelines and regulations, according to the Declaration of Helsinki.

Use case
The methods described in this paper are novel and state of the art. Prediction of influenza-type epidemics has long been the focus of attention in mathematical biology and epidemiology. It is known that public health dynamics is a seasonally and spatially varying dynamic system that is always challenging to analyse. This section illustrates the efficiency of the above-described methodology by applying the new method to real-life COVID-19 data sets, presented as time series of new daily recorded infected patients spread over different regions.
COVID-19 and influenza are contagious diseases with high transmissibility and ability to spread. Seasonal influenza epidemics caused by influenza A and B viruses typically occur annually during winter, presenting a burden on worldwide public health and resulting in around 3-5 million cases of severe illness and 250,000-500,000 deaths worldwide each year, according to the World Health Organization (WHO) [20]. This section presents a real-life application of the above-described method. The statistical data in the present section are taken from the official website of the National Health Commission of the People's Republic of China. The website provides the number of newly diagnosed cases every day in 13 administrative regions in northern China from 22 January 2020 to the end of 2022. [41] The failure limits $\eta_X, \eta_Y, \eta_Z, \ldots$, or in other words the epidemic thresholds, are not set values and must be decided. The simplest choice would be for different countries to set failure limits equal to a percentage of the corresponding country population, making $X, Y, Z, \ldots$ equal to the percentage of daily infected per country. In this study, however, twice the maximum of daily infected per region has been chosen as the failure limit. Note that failure limits may be chosen differently for different dynamic bio-systems. Although the latter choice obviously introduces a bias (accumulation point) at $\lambda = 0.5$ if the number of regions is large, in this study the number of regions is not that large (below 20 national regions), and a proper extrapolation technique may easily circumvent the above-mentioned accumulation point bias.
Next, all local maxima from the measured regional time series were merged into one single time series by keeping them in non-decreasing time order, $\vec{R} = (R_1, R_2, \ldots, R_N)$, with the whole vector $\vec{R}$ being sorted according to the non-decreasing times of occurrence of these local maxima.
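The scaling and merging steps can be sketched as follows, assuming failure limits equal to twice the observed regional maximum as described above. The helper names, the simple definition of a local maximum and the two toy series are hypothetical; simultaneous maxima are simply kept side by side in the merged sequence, which is one possible reading of the non-decreasing time ordering.

```python
def local_maxima(series):
    """Return (time_index, value) pairs of interior local maxima of a series."""
    return [
        (i, series[i])
        for i in range(1, len(series) - 1)
        if series[i - 1] < series[i] >= series[i + 1]
    ]


def failure_limits(regions):
    """Failure limit per region, chosen here as twice the observed maximum."""
    return {name: 2 * max(series) for name, series in regions.items()}


def merge_scaled_maxima(regions, limits):
    """Scale each region's local maxima by its limit and merge them by time.

    Returns a list of (time_index, R_j) pairs in non-decreasing time order.
    """
    merged = []
    for name, series in regions.items():
        for t, value in local_maxima(series):
            merged.append((t, value / limits[name]))
    # sorted() is stable, so simultaneous maxima keep their regional order.
    return sorted(merged, key=lambda entry: entry[0])


# Hypothetical daily case counts for two regions, for illustration only.
regions = {"A": [1, 5, 2, 8, 3], "B": [0, 2, 6, 1, 4, 2]}
merged = merge_scaled_maxima(regions, failure_limits(regions))
print(merged)
```

With this choice of limits, the largest scaled maximum of every region equals 0.5, which is the accumulation point at $\lambda = 0.5$ mentioned in the text.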
Figure 2 presents the number of new daily recorded patients plotted as a time-space 2D surface using MATLAB. Figure 3 presents the number of new daily recorded patients as a 13D vector $\vec{R}$, consisting of assembled regional new daily patient numbers. Note that the vector $\vec{R}$ does not have a physical meaning on its own, as it is assembled from different regional components with different epidemic backgrounds. The index $j$ is just a running index of local maxima encountered in a non-decreasing time sequence. Figure 4 presents the extrapolation according to Equation (9) towards an epidemic outbreak with a 100-year return period, indicated by the horizontal dotted line, and somewhat beyond; a cut-on value of $\lambda = 0.1$ was used. Dotted lines indicate the extrapolated 95% confidence interval according to Equation (10). According to Equation (5), $p(\lambda)$ is directly related to the target failure probability $1 - P$ from Equation (1). Therefore, in agreement with Equation (5), the system failure probability $1 - P \approx 1 - P_k(1)$ can be estimated. Note that in Equation (5), $N$ corresponds to the total number of local maxima in the unified bio-system vector $\vec{R}$. The conditioning parameter $k = 3$ was found to be sufficient due to the occurrence of convergence with respect to $k$, see Equation (6). Figure 4 exhibits a reasonably narrow 95% CI, which is an advantage of the proposed method.
Note that, while being novel, the above-described methodology has the clear advantage of utilizing the available measured data set quite efficiently, due to its ability to treat health system multi-dimensionality and to perform accurate extrapolation based on a quite limited data set. Note that the predicted non-dimensional $\lambda$ level, indicated by the star in Figure 4, represents the probability of an epidemic outbreak at any northern province in China in the years to come.

Discussion
Traditional health bio-system reliability methods dealing with observed/measured time series do not have the ability to deal efficiently with high dimensionality and cross-correlation between different bio-system components. The main advantage of the introduced methodology is its ability to study the reliability of high-dimensional non-linear dynamic systems.
Despite its simplicity, the present study offers a novel multidimensional modelling strategy and a methodological avenue for forecasting an epidemic during its course, provided it is assumed to be stationary in time. Proper setting of epidemiological alarm limits (failure limits) per province has been discussed, see Section Use case. This paper studied recorded COVID-19 patient numbers from thirteen different northern Chinese provinces, constituting an example of a thirteen-dimensional (13D) dynamic biological system observed in 2020-2022. The novel reliability method was applied to new daily patient numbers as a multidimensional system in real time.
The main conclusion is that if the public health system under the local environmental and epidemiologic conditions in northern China is well managed, the predicted 100-year return period risk level $\lambda$ of an epidemic outbreak is reasonably low. However, an ultra-low risk of a future epidemic outbreak remains in the chosen country of interest, at least over a 100-year horizon.
This study outlines a general-purpose, robust and straightforward multidimensional bio-system reliability method. The method introduced in this study has previously been successfully validated by application to a wide range of engineering models [11,12], but only for one- and two-dimensional bio-system components; in general, very accurate predictions were obtained. Both measured and simulated bio-system component time series can be analysed using the proposed method. It is shown that the method produced an acceptable 95% confidence interval, see Figure 4. Thus, the suggested methodology may be used as a tool in various reliability studies of non-linear dynamic biological systems. The presented COVID-19 example does not limit the potential areas of applicability of the new method by any means.

Eduard Campillo-Funollet
Mathematics and Statistics, Lancaster University, Lancaster, England, UK
The manuscript presents the application of an extreme-value methodology to the analysis of epidemiological data. The methodology has been applied in the past to other types of data. The manuscript illustrates the methodology using data from the COVID-19 pandemic in China.
I have several reservations about the manuscript, in particular regarding the technical background and the conclusions.

Major concerns:
- The claim of a "fully space-time dynamic biosystem" (Introduction, paragraph beginning with "Biological systems...") is not justified. If anything, the proposed method does not account for the dynamics of the disease, and in fact it relies on the assumption of stationary data.
- Similarly, the assumption of the process being stationary is a strong assumption in the context of an evolving epidemic. It is mentioned briefly but its implications should be discussed.
- The description of the methods mixes using three components with using many. See for example the paragraph before equation 2. The presentation would be clearer if the notation was adapted to use a generic number of components n, instead of using three and saying "etcetera", for example in "X,Y,Z,...". This also affects the rest of the equations in the presentation, for example \eta_j is not clearly defined in (3).
- The assumption of an exponential tail ("exceedance probability", eq. 6) should be clearly justified in the paper, rather than citing an application paper.
- The 100-year return level in Fig 4, which summarises very well the applicability of the method, should be discussed in detail. The data is from a two year period during the COVID-19 pandemic, and the method essentially captures the peaks of new cases during that time. However, the changes in the population (e.g. immunity or vaccinations) make the past data not representative of the current or future situations, and from an epidemiological perspective, make the 100-year return level difficult to apply in practice. Could the methodology be applied in shorter time frames, maybe even allowing for some validation of the method?

Other minor comments:
- Several references are not relevant to the paper. For example, 4-11 are cited as related to epidemics, and in fact they are about renewable energy, wind turbines and other topics. Similarly with 14, 15.
- Several citation styles are used in the paper (see for example the first paragraph of the introduction).

Reviewer Expertise: Epidemiology, statistics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Department of Physical Sciences, Independent University, Bangladesh, Dhaka, Dhaka Division, Bangladesh
The article presents a new method for long-term coronavirus forecasting using statistical extreme value theory and MATLAB optimization. The rationale for this method is clearly explained, addressing the limitations of existing models in varying epidemic scenarios. While the description of the method is technically sound, its reproducibility is partly limited due to insufficient detailed procedural and algorithmic information. Additionally, the full reproducibility of results is constrained as the underlying source data is not entirely available. The conclusions drawn about the method's performance are well-supported by the findings. To enhance the scientific soundness and utility of the article, the authors should consider:
1. Providing more detailed procedural and algorithmic descriptions, enabling others to replicate the method more accurately.
2. Including or making accessible the complete source data used in their study, allowing for full validation and replication of results.
Addressing these points will significantly improve the paper's comprehensiveness and its applicability in different research contexts.

Is the rationale for developing the new method (or application) clearly explained? Yes
Is the description of the method technically sound? Yes

○ "Moreover, a specific non-dimensional factor λ is introduced to predict the latter epidemic risk at any time and any place." (What exactly is the biological meaning of this parameter? This seems to be a critical component of the study and really needs a lot of detail when it is introduced.)
○ Many of the equations are not well explained, e.g. Eq 2 third equality has a typo ' '; similarly, equation 5 needs to be explained further.
○ In the discussion section there is a mention of 'both countries'; it is not clear which country besides China is being referred to.
These are just some examples of unclear text, hence my recommendation below.
Recommendation: The aims and results are not well explained. My recommendation is that the authors may want to extensively edit the paper, making their goals, method and results clearer and the paper more readable.

Is the rationale for developing the new method (or application) clearly explained? No
Is the description of the method technically sound? Partly

Are sufficient details provided to allow replication of the method development and its use by others? No
If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article? No
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Mathematical Biology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.
this seems to be a critical component of the study and really needs a lot of detail when it is introduced.)
Answer: Changed to: "introduced to unify various bio-system components having different failure limits". As seen later in the text, λ is just a non-dimensional parameter used to indicate system failure, namely when λ reaches 1.
Many of the equations are not well explained, e.g. Eq 2 third equality has a typo ''; similarly, equation 5 needs to be explained further.
In the discussion section there is a mention of 'both countries'; it is not clear which country besides China is being referred to.
Answer: Thanks a lot for your valuable comments; this has now been changed to a "chosen country of interest".
Recommendation: The aims and results are not well explained. My recommendation is that the authors may want to extensively edit the paper, making their goals, method and results clearer and the paper more readable.
non-linear, dynamic biological systems reliability studies.
Oleg and his colleagues come up with the MDOF (multiple degrees of freedom) health response vector process, and they do this in a new way. In addition, a novel statistical approach, a generalized extreme value (GEV) method that was directly applied to unprocessed clinical data and is capable of dealing with territorial mapping, was used as the foundation for model construction. This part of the study gives future researchers a new idea about how they might try to predict how an outbreak will change over time.
This paper is, in my opinion, an excellent contribution to the literature.

Is the rationale for developing the new method (or application) clearly explained? Yes
Is the description of the method technically sound? Yes

Are sufficient details provided to allow replication of the method development and its use by others? Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions about the method and its performance adequately supported by the findings presented in the article? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Mathematical and statistical modeling of infectious disease
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Illustration of how two exemplary processes X and Y are combined into the new synthetic vector $\vec{R}$. The red ellipse highlights a case of simultaneous maxima for different components.

Figure 2. Number of new daily recorded patients plotted as a 2D time-space surface; data according to http://www.nhc.gov.cn.

Figure 3. Number of new daily recorded patients as a 13D vector $\vec{R}$.

Figure 4. 100-year return level (horizontal dotted line) extrapolation of $p_k(\lambda)$ towards the critical (bio-system failure) level (indicated by star) and beyond. The extrapolated 95% CI is indicated by dotted lines, using the modified Weibull extrapolation technique.
Is the rationale for developing the new method (or application) clearly explained? Partly
Is the description of the method technically sound? Partly
Are sufficient details provided to allow replication of the method development and its use by others? Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article? No
Competing Interests: No competing interests were disclosed.

Are sufficient details provided to allow replication of the method development and its use by others? Partly
If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions about the method and its performance adequately supported by the findings presented in the article? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Operations Research, Machine Learning, Deep Learning, Optimization, Time Series Forecasting
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Is the rationale for developing the new method (or application) clearly explained? Yes
Is the description of the method technically sound? Yes
Are sufficient details provided to allow replication of the method development and its use by others? Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Mathematical Biology

governing dynamic biological systems, spread over extensive terrain." (Hard to understand)
"For example, Gumbel used EVT to estimate the demographic of various populations in." (Incomplete sentence)