Predictive modelling of arsenate (As(V)) adsorption onto surface-engineered magnetite nanoparticles [version 1; peer review: 1 approved]

Background: Since adsorption is a complex process, numerous models and theories have been devised to gain a general understanding of its underlying mechanisms. The interaction between adsorbates and adsorbents can be identified by modelling the adsorption data with different adsorption isotherms as well as kinetic models. Many studies also focus on developing predictive modelling techniques to facilitate accurate prediction of future adsorption trends. Methods: In this study, a predictive model was developed based on a multiple linear regression technique using existing data of As(V) adsorption onto several coated and uncoated magnetite samples. To understand the mechanisms and interactions involved, the data were first modelled using either the Temkin or the Freundlich linear isotherm. The predicted value is a single data point extension from the training data set. Subsequently, the predicted outcome and the experimental values were compared using multiple error functions to assess the predictive model’s performance. Results: Certain values were also compared with those obtained from the literature, and the results were found to have low error margins. Conclusion: To further gauge the effectiveness of the proposed model in accurately predicting future adsorption trends, it should be tested on different adsorbent and adsorbate combinations.


Introduction
Heavy metal ions are toxic and dangerous to living organisms, depending on the concentration and duration of exposure. Among the most toxic heavy metals are chromium (Cr), nickel (Ni), copper (Cu), zinc (Zn), cadmium (Cd), lead (Pb), mercury (Hg) and arsenic (As) 1 . The metalloid As has been regarded as the highest priority pollutant to be controlled in China since 2009, together with four other heavy metal ions 2 . Heavy metals come from two major sources, namely natural and anthropogenic activities. Natural sources include volcanic activity, soil erosion, and eroded rocks and minerals, whilst anthropogenic sources include landfills, sewage, agricultural activities, industrial pollutants and mining 3 . The increased discharge of heavy metals into the environment may be attributed to global industrialisation and increased urbanisation 1 . In addition, the metalloid As is a ubiquitous element that exists naturally in the environment, resulting in enhanced exposure that could prove fatal if not controlled. A recent spate of toxic waste dumping incidents in rivers in Peninsular Malaysia necessitates the development of more effective water remediation methods. Adsorption is one of the numerous solutions that have gained popularity lately owing to its effectiveness and low cost, especially adsorption using nanoparticles (NPs) as adsorbents. Extensive research is being carried out to improve the adsorption capacity of the wide range of NPs available. However, the adsorption mechanism of NPs is still not fully understood, and isotherm and kinetic equations are therefore continuously being formulated 4 . The ability to predict certain outcomes, especially the behaviour of the particles involved in adsorption or desorption, would be beneficial for the effective design of experiments.
An accurate prediction will bring forth a great breakthrough not only for the understanding of the underlying mechanisms but for future experiments as well.
In this project, existing data for arsenate (As(V)) adsorption onto uncoated, humic acid-coated and ceria-coated magnetite samples were first modelled with adsorption isotherms. For the predictive model, multiple linear regression was implemented to predict the equilibrium adsorption capacity, q_e, of each sample. The predicted results were compared to the experimental q_e results, and the model's accuracy was thus ascertained.

Literature review
In general, the first step involved in predicting an adsorption process is to emulate the process itself. An example is the study by Mandal et al. 5 , in which a hybrid material was synthesised to run an actual As adsorption experiment. The optimum parameters such as adsorbent dose, pH and initial concentration were varied for the predictive model's design.
A predictive model consists of a data framework that is correlated to the experimental variables. In the same study by Mandal et al., the evolutionary Genetic Programming (GP) and Least Squares Support Vector Machine (LS-SVM) models were utilised. The optimal physical parameters were supplied to these models for computation. The result was a model that could emulate the results of the actual adsorption process with a certain degree of confidence.
Another study by Onur et al. 6 implemented Quantitative Structure-Activity Relationship (QSAR) and Linear Solvation Energy Relationship (LSER) techniques to develop a predictive model for the adsorption of aromatic contaminants onto multi-walled carbon nanotubes. In the study, 29 aromatic adsorption datasets obtained from literature and experiments were used in developing the predictive model. A few variables such as the molecular connectivity indices and solvatochromic descriptors were included for both the QSAR and LSER models, thus increasing the accuracy of predicting future trends. Besides that, Sisi et al. 7 predicted the adsorption of As(V) onto iron-coated sand using a surface complexation model. The input parameters were the experimental maximum adsorption capacities as well as results from extended X-ray absorption fine structure spectroscopy (EXAFS) as a function of pH and adsorbent dose. The goal was to derive a model that could predict and determine the adsorption performance over a pH range of 5-8. This indicates that a predictive model can be implemented to determine other factors of adsorption as well. Meanwhile, a study by Fukushi et al. 8 implemented the extended triple layer model (ETLM), which is also a form of the surface complexation model. The model was used to predict As(V) adsorption onto oxides with the aid of in situ spectroscopic evidence as well as molecular theories and calculations. Based on the various studies reviewed, it can be surmised that multiple techniques could be applied in predicting the outcomes of an adsorption process. The understanding of the adsorption process as well as the availability of data parameters are the deciding factors in selecting and implementing suitable predictive model(s).

Methods
The workflow of the predictive model, from providing the input to acquiring the output, is shown in Figure 1. After determining the model that best fits the experimental data obtained from our previous experiments involving the different types of magnetite samples 9,10 , that model's equation was used to provide the linear relationship for determining the equilibrium adsorption capacity, q_e. Aside from the model's parameters, the other inputs were the raw data (the experimental q_e), linearised by the chosen model, and the values of physical parameters such as concentration, temperature and adsorbent size.

Input parameters
The adsorption data of the respective particles were chosen as inputs to the predictive model: humic acid-coated magnetite NPs (modelled with the Temkin isotherm), and uncoated magnetite and ceria-coated magnetite particles (both modelled with the Freundlich isotherm). The chosen isotherm parameters, such as C_e (equilibrium concentration), b (Temkin constant), 1/n (intensity of adsorption) and K_F (Freundlich constant), together with the physical parameters, are summarised in Table 1. The Y-axis parameter values, which are forms of q_e (dependent on the chosen isotherm model), were predicted and compared to the experimental Y-axis parameter values.
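For reference, the commonly used linearised forms of the two isotherms named above are sketched below. The symbols A_T (Temkin equilibrium binding constant), R (universal gas constant) and T (absolute temperature) do not appear in the text and are assumptions taken from the standard textbook forms; the exact parameterisation used for Table 1 may differ (for example, a base-10 logarithm in the Freundlich form).

```latex
% Freundlich isotherm, linearised (natural-log form assumed)
\ln q_e = \ln K_F + \frac{1}{n}\,\ln C_e

% Temkin isotherm, linearised (b is the Temkin constant; A_T, R, T are assumed symbols)
q_e = \frac{RT}{b}\,\ln A_T + \frac{RT}{b}\,\ln C_e
```

In both forms the left-hand side is the Y-axis parameter (ln q_e for Freundlich, q_e for Temkin) and ln C_e is the X-axis variable, so a straight-line fit yields the isotherm constants from the slope and intercept.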

Multiple regression model
Linear predictive analysis was implemented using the software Python 3.0 (Python Programming Language, RRID: SCR_008394) to code the model's framework. Below are some key points of the linear regression model:
• Sample size of four including the starting point.
• The first three rows of data were the training data set while the last one was the test set.
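The train/test split described above can be sketched as follows. This is a minimal illustration and not the authors' code: the two predictor columns, the numerical values and the helper names `fit_linear` and `predict_linear` are hypothetical, and NumPy's least-squares solver stands in for whatever routine the original script used.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones for the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Evaluate the fitted linear model on new rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

# Hypothetical, exactly linear data: four rows, two predictors per row.
X = np.array([[1.0, 1.0],
              [2.0, 4.0],
              [3.0, 9.0],
              [4.0, 16.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# First three rows train the model; the last row is the single test point.
X_train, y_train = X[:3], y[:3]
X_test, y_test = X[3:], y[3:]

coef = fit_linear(X_train, y_train)
y_pred = predict_linear(coef, X_test)
abs_error = np.abs(y_pred - y_test)

print("coefficients:", coef)
print("predicted:", y_pred, "actual:", y_test, "absolute error:", abs_error)
```

With only three training rows, at most three coefficients (an intercept plus two predictors) can be determined uniquely, which is why the sketch uses two predictor columns. Because the synthetic data are exactly linear, the absolute error on the test row is essentially zero.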

Validation of model performance
Results
Table 2, Table 3 and Table 4 depict the predicted results, which are based on the actual data sets provided as input. The overall absolute error is particularly low. However, this may be because linearity is the main component of the model framework, where almost zero error is possible so long as the provided input also gives a good data fit. In short, the data simply stretch out linearly, and values can be predicted for every additional 20 minutes beyond the modelled data's duration. In addition, the linearity extends indefinitely, which neglects the fact that there is an equilibrium adsorption capacity. Further analysis based on the results from Tables 2-4 is tabulated and summarised in Table 5.
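The caveat about unbounded linear extrapolation can be illustrated with a toy example. The saturating uptake curve, its parameters `q_max` and `half_time`, and the time points are all hypothetical and are not taken from the study's data; the sketch only shows how a straight line fitted to early points eventually exceeds a finite equilibrium capacity.

```python
import numpy as np

def saturating_uptake(t, q_max=10.0, half_time=30.0):
    """Hypothetical uptake curve that levels off at q_max as t grows."""
    return q_max * t / (t + half_time)

# Three early "training" points, 20 minutes apart (illustrative values only).
t_train = np.array([20.0, 40.0, 60.0])
q_train = saturating_uptake(t_train)

# Straight-line fit to the training points.
slope, intercept = np.polyfit(t_train, q_train, 1)

def linear_forecast(t):
    """Extrapolate the fitted straight line to time t."""
    return slope * t + intercept

for t in (80.0, 200.0, 1000.0):
    print(f"t = {t:6.0f} min  linear = {linear_forecast(t):7.2f}  "
          f"saturating = {saturating_uptake(t):5.2f}")
```

One step beyond the training range the two curves stay close, but at long times the linear forecast keeps rising while the saturating curve remains below its capacity, which is why the predictions are best restricted to a single data point extension.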
From Table 2, it can be observed that the lowest absolute error, 0.000805, was obtained for the 2238 nm particle, while the largest error, 0.087876, was for the 3203 nm particle. It can also be observed that, at a constant humic acid concentration, the predicted value increased with increasing particle size and surpassed the actual value.
Based on the tabulated results in Table 3, the lowest absolute error was for the 998.3 nm particle under a sonication temperature of 70°C, while the largest was for the 782 nm particle at On the other hand, the absolute error was the highest at the same temperature for the 998.3 nm particle. In addition, at 70°C, as the particle size increased, the absolute error values decreased. From Table 4, the lowest and the highest values of absolute error are observed for the 617 nm particle at 30°C and 1259 nm particle at 70°C, respectively.
The validated results are shown in Table 6. The RMSE values obtained from various predictive models in the literature, against which our predictive model's accuracy is compared, are shown in Table 7.
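The two error measures named in this section, absolute error and root-mean-square error (RMSE), can be computed as in the sketch below. The function names and the sample numbers are hypothetical and are not values from Tables 2-7; any other error functions used in the study are not listed in the text and are therefore not shown.

```python
import numpy as np

def absolute_error(actual, predicted):
    """Element-wise |actual - predicted|."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(actual - predicted)

def rmse(actual, predicted):
    """Root-mean-square error over all data points."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Illustrative values only.
actual = [2.0, 4.0, 6.0]
predicted = [2.5, 3.5, 6.0]

print("absolute errors:", absolute_error(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

Absolute error is reported per prediction, whereas RMSE condenses a set of predictions into one figure, which is what makes it suitable for comparison with literature values.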

Discussion
Compared to other studies 5,11-13 shown in Table 7, our predictive model has lower error percentages, which may imply a higher prediction accuracy. However, this is only because our model is based solely on linearity, given very few input data sets, with no iterations and no complex theories derived. The isotherm model's linear regression was simply reused and transformed into a multiple linear regression by adding a few X-axis parameters (including the chosen isotherm model's parameters) to predict the q_e value. By contrast, the results of the other methodologies outlined in Table 7 5,11-13 were simulated and obtained from a more complex framework of codes, with many data sets as well as multiple repetitions of trial and error. Nonetheless, the framework of the predictive technique developed here could act as a foundation for the formulation of a more complex model with more input datasets to improve its reliability.
To explain further, since the linear isotherm model chosen for each sample provided the best fit to the adsorption data, the assumptions and theories of the model would be applicable to our samples as well. Hence, the existing model's equation can be used as a basis to predict future adsorption trends under different experimental variables once the preliminary adsorption data have been modelled. This is crucial, especially as the linearity of the modelled data is the input to the predictive model analysis. This would reduce the need to derive or formulate new complex theories to emulate the adsorption process. The simplicity and straightforward nature of the predictive model developed herein make it attractive for further development and testing with the adsorption data of numerous types of samples under different experimental parameters, for a more effective design of experiments.

Conclusions
Many predictive modelling techniques have been researched to emulate the actual adsorption process. In this project, a simple and straightforward multiple linear regression method was implemented. Satisfactory results were obtained with small error values. Despite lacking theoretical derivations, choosing an existing model may prove to be sufficient to predict the q_e values as long as sufficient experimental data are available and the best-fit model is decided beforehand.

Software availability
Source code available from: https://github.com/Nishadevaraj/Phyton-source-code-Manuscript-73260-.git 14
Archived source code at time of publication:
License: Apache-2.0

This work has enough content to be published in the journal and presents well-argued, well-written and well-organized information.
○ The organization of the article is satisfactory.
○ The paper's title is brief and reflects the main theme of the paper.
○ The abstract is sufficiently informative. It is completely self-explanatory, briefly presents the topic, states the scope of the experiments, indicates significant data, and points out major findings and conclusions.
○ The keywords are suitable so the article can be found in the current registers or indexes.
○ The introduction shows how this work builds on previous work on the subject. The introduction clearly states the problem being investigated. The purpose and objectives of the manuscript are adequate and appropriate in view of the subject matter. The authors highlighted the novelty of the paper.
○ The authors accurately explain how the model was developed.
○ The input parameters, the key points of the linear regression model and the multiple error functions determined are specified. There is sufficient information to allow replication of the research.
○ In the Results and Discussion part, the authors present and interpret the results of the [...]
[...] expertise to confirm that it is of an acceptable scientific standard.