Keywords
Perovskite solar cell, SCAPS-1D, dataset, photovoltaic technology, numerical simulation, machine learning.
This paper presents a synthetic dataset for studying the performance of perovskite solar cells (PSC) simulated with the tool SCAPS-1D. The dataset consists of 18,570 simulated devices generated from four baseline device architectures, together with their respective photovoltaic performance values. The dataset was generated through numerical simulations, and the electrical performance of each device was evaluated from current density-voltage (J-V) curves under fixed working conditions: standard illumination, constant temperature, and a fixed maximum applied voltage. The dataset can be used to train different machine learning (ML) models using supervised methods or unsupervised techniques such as clustering or dimensionality reduction, which facilitate the identification of patterns or relationships between parameters. Thus, it can be useful in reverse design strategies to determine optimal configurations based on defined objectives. This work contributes to the development of PSC by providing a broad dataset for further analysis and optimization.
Perovskite solar cells (PSC) have emerged as one of the most promising technologies in the field of photovoltaic energy because of their high absorption coefficient, low manufacturing costs and great versatility in device design.1–3 However, the optimization of these solar cells involves a complex interaction between the optical, electrical, and structural properties of multiple functional layers.4–6 The experimental exploration of this design space is expensive and time-consuming, which has driven the increasing use of computational simulations as a complementary tool to understand and predict the performance of these devices.7–9 Multiple configurations have already been studied under standardized and comparable conditions using different simulation tools,10–18 which are essential to validate the implementation of new numerical analyses.
The use of the software SCAPS-1D (Solar Cell Capacitance Simulator) has become popular because it is freely available and versatile for modeling thin-film heterojunction solar cells.19 SCAPS-1D solves the Poisson and continuity equations to calculate the photovoltaic performance, considering charge generation, recombination mechanisms, and transport through multilayer structures.20 SCAPS-1D results allow us to understand how the photoelectric properties of a PSC affect its performance,21 making it a useful tool for designing PSC. Additionally, the integration of machine learning (ML) with device simulations has been proposed, showing promise for accelerating materials development and device optimization.22–25 For example, Odabaşı and Yıldırım used data mining and decision trees to analyze the impact of deposition methods on cell efficiency, concluding that the quality of the data input into the algorithm is key to obtaining accurate results.26 In 2022, Yan et al. applied supervised models such as XGBoost and random forest with GridSearchCV to predict bandgap, Jsc, and Voc values, obtaining a 2% margin of error compared to experimental measurements.27 In 2023, another study combined SCAPS-1D with ML tools (RandomSearchCV and GridSearchCV) to predict the best material combination for perovskite and HTL, achieving an efficiency of 23.9% with 75% accuracy.24 Also in 2023, Lu et al. used supervised and unsupervised algorithms to predict cell performance based on experimental data, concluding that A cations increase the device’s energy efficiency.28 More recently, in 2024, Shrivastav et al. integrated SCAPS-1D with ML models to analyze six cesium-based perovskite materials. They used 2160 simulations varying thickness, doping, and defect density, achieving a maximum efficiency of 14% with CsPbI₃ and a coefficient of determination R² of 99.99% with XGBoost.
Additionally, they used SHAP analysis and revealed that the absorber layer material and its thickness are the most influential factors on efficiency.7 In this context, the development of synthetic databases acquires strategic relevance. These databases not only allow the systematization of knowledge about the relationships between material parameters and photovoltaic performance but also allow the training of ML models,25,27,29,30 the application of optimization techniques,31 and the reverse design of solar cells.28
This work presents a structured database composed of 18,570 simulations of PSC generated with SCAPS-1D, a simulation tool freely provided for academic use by the University of Gent in Belgium.32 Four structures were analyzed, considering systematic variations of the active material and geometric parameters of the cell; the materials most widely used in the literature were included to ensure the practical relevance of the dataset. Each entry in the dataset includes the input parameters that describe the optical, electrical, and physical properties of the solar cell, as well as the electrical performance results in terms of open circuit voltage (Voc), short circuit current density (Jsc), fill factor (FF) and power conversion efficiency (PCE). This database represents a valuable tool for the scientific community, allowing researchers to evaluate the individual or combined impact of key parameters on device efficiency and to train ML models that predict performance and can serve as surrogate models. These models can facilitate the integration and fusion of domain knowledge into more complex machine learning models that include synthesis conditions for solar cells. They would also allow the application of multi-objective optimization techniques to improve solar cell efficiency. In this way, this work aims to contribute to the accelerated advancement of the design of photovoltaic devices through reproducible and accessible computational approaches.
The structure shown in Figure 1 corresponds to a nip-type PSC33 with five layers, which is the configuration studied in this work. The first and last layers are the electrical contacts, while the internal layers are responsible for the device’s energy conversion. For the top contact, fluorine-doped tin oxide (FTO) was used since it is well suited to function as a transparent electrode. Titanium oxide (TiO₂) and tin oxide (SnO₂) were used for the electron transport layer (ETL) due to their electronic properties and proven use in the scientific literature.17,34–38 For the perovskite absorber layer, methylammonium lead iodide (MAPbI₃), methylammonium tin iodide (MASnI₃) and formamidinium lead iodide (FAPbI₃) were used because these materials have the highest reported energy efficiencies and have been extensively studied by the scientific community.10,25,39–41 For the hole transport layer (HTL), Spiro-OMeTAD and copper(I) thiocyanate (CuSCN) were used because configurations with these materials have demonstrated remarkable performance in hole mobility and effective energy alignment.41–43 Finally, the last layer is generally made of gold (Au) since it has high electrical conductivity.43 These materials were chosen for their optoelectronic properties, energy compatibility, and the high performance demonstrated in experimental and simulated studies available in the literature.11,20,21,35,41,43–47 To convert solar energy into electrical energy, the PSC absorbs photons from solar radiation in the perovskite layer, generating electron-hole pairs. These pairs are then separated: the ETL extracts and transports the electrons, while the HTL collects the holes. The top electrode, usually made of a transparent conductive oxide like FTO, allows the entry of light and the collection of carriers, while the bottom metallic electrode completes the circuit, allowing the flow of external current under load conditions.
SCAPS-1D analyzes the electrical response of a PSC by solving a coupled set of differential equations that include the Poisson equation (1), the continuity equations for electrons (2) and holes (3), and the performance metrics equations (4)-(7).
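For orientation, the standard textbook forms of these relations can be sketched as follows. The notation is an assumption and may differ from the symbols and numbering of equations (1)-(7): ψ is the electrostatic potential, ε the permittivity, q the elementary charge, n and p the electron and hole densities, N_D⁺ and N_A⁻ the ionized donor and acceptor densities, n_t and p_t the trapped carrier densities, J_n and J_p the current densities, G the generation rate, R_n and R_p the recombination rates, (V_mp, J_mp) the maximum power point, and P_in the incident power density.

```latex
% Poisson equation (one-dimensional form)
\frac{d}{dx}\left(\varepsilon(x)\,\frac{d\psi}{dx}\right)
  = -q\left[\,p(x) - n(x) + N_D^{+}(x) - N_A^{-}(x) + p_t(x) - n_t(x)\right]

% Continuity equations for electrons and holes
\frac{\partial n}{\partial t} = \frac{1}{q}\,\frac{\partial J_n}{\partial x} + G - R_n
\qquad
\frac{\partial p}{\partial t} = -\frac{1}{q}\,\frac{\partial J_p}{\partial x} + G - R_p

% Fill factor and power conversion efficiency
FF = \frac{V_{mp}\,J_{mp}}{V_{oc}\,J_{sc}}
\qquad
PCE = \frac{V_{oc}\,J_{sc}\,FF}{P_{in}}
```

In steady state the time derivatives vanish, so each continuity equation balances the divergence of the current against net generation and recombination.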
The dataset was generated through numerical simulation using the freely available software SCAPS-1D version 3.3.09. The electrical performance of each device was evaluated from J-V curves under the standard working conditions of AM1.5G illumination (1000 W/m²), a temperature of 300 K and a maximum applied voltage of 1.2 V.48 From these curves, the main electrical parameters that characterize the performance of the system were determined, including the Voc, Jsc, FF and PCE. These parameters were obtained directly from the software after simulating the optoelectronic behavior of the device, allowing a precise evaluation of the expected performance and a comparative analysis in terms of efficiency, stability, and robustness against variations in the materials, properties or thickness of the studied layers.
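As an illustration of how these four metrics follow from a J-V curve, here is a minimal sketch in Python. It does not reproduce the SCAPS-1D implementation: the function name `jv_metrics` is hypothetical, and the test curve comes from an ideal single-diode model with arbitrarily chosen photocurrent and saturation current, evaluated at 300 K and swept up to 1.2 V with an incident power of 100 mW/cm² (1000 W/m²).

```python
import math

def jv_metrics(voltages, currents, p_in=100.0):
    """Return (Voc [V], Jsc [mA/cm2], FF [%], PCE [%]) from a J-V sweep.

    voltages: ascending values in V, starting at 0 V.
    currents: delivered current density in mA/cm2 (positive below Voc).
    p_in: incident power density in mW/cm2 (100 mW/cm2 = 1000 W/m2).
    """
    jsc = currents[0]  # current density at V = 0
    voc = None
    points = list(zip(voltages, currents))
    for (v0, j0), (v1, j1) in zip(points, points[1:]):
        if j0 > 0 >= j1:  # sign change: interpolate the J = 0 crossing
            voc = v0 + (v1 - v0) * j0 / (j0 - j1)
            break
    if voc is None:
        raise ValueError("J-V sweep does not cross J = 0")
    p_max = max(v * j for v, j in points)  # maximum power point, mW/cm2
    ff = 100.0 * p_max / (voc * jsc)
    pce = 100.0 * p_max / p_in
    return voc, jsc, ff, pce

# Assumed ideal-diode test curve (illustrative values, not taken from the dataset).
K_B, Q, T = 1.380649e-23, 1.602176634e-19, 300.0
V_T = K_B * T / Q            # thermal voltage at 300 K, about 0.02585 V
J_PH, J_0 = 22.0, 1e-12      # photocurrent and saturation current, mA/cm2
voltages = [i * 0.001 for i in range(1201)]  # 0 V to 1.2 V
currents = [J_PH - J_0 * (math.exp(v / V_T) - 1.0) for v in voltages]

voc, jsc, ff, pce = jv_metrics(voltages, currents)
print(f"Voc={voc:.3f} V, Jsc={jsc:.1f} mA/cm2, FF={ff:.1f} %, PCE={pce:.1f} %")
```

The sketch takes the maximum power from the sampled points only, so the voltage step limits the accuracy of FF and PCE.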
The parameters to vary were selected based on their direct influence on the physical and electrical performance of the device. The thickness of the perovskite layer (T_PVK) must be adjusted to absorb as many photons as possible, maximizing Jsc without exceeding the carrier diffusion length, since excessive thicknesses increase recombination losses and degrade Voc and FF.49 Similarly, the thicknesses of the transport layers (T_ETL and T_HTL) must be optimized to ensure efficient electron and hole transport with low recombination and series resistance.50,51 Additionally, the properties of the perovskite have a direct influence on the performance of the cell: the bandgap (EG_PVK) establishes the balance between the current density and voltage, based on the Shockley-Queisser limit52; the dielectric permittivity (ER_PVK) influences exciton dissociation53,54; the acceptor density (NA_PVK) shapes the internal electric field, essential for charge separation and a high Voc55; and the defect density (NT_PVK) represents the main pathway for non-radiative recombination losses and limits the carrier lifetime.56 The variation ranges are specified in Table 1. Parameter combinations leading to convergence errors in the software were discarded, as these typically arose from physically unrealistic combinations that disrupted the solution of Poisson’s equation, the continuity equations, or the boundary conditions.
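The sweep-and-discard procedure described above can be sketched as a full-factorial loop. Everything numeric here is a placeholder: the sweep values are hypothetical (the actual ranges are those of Table 1), and `run_simulation` is a stand-in for a SCAPS-1D run that returns `None` under an arbitrary rule to mimic a convergence failure.

```python
from itertools import product

# Hypothetical sweep values for illustration only; see Table 1 for the real ranges.
sweep = {
    "T_ETL": [0.05, 0.10],
    "T_PVK": [0.3, 0.5, 0.7],
    "EG_PVK": [1.5, 1.6],
    "NT_PVK": [1e14, 1e16],
}

def run_simulation(params):
    """Stand-in for one simulator run; returns None to mimic non-convergence."""
    # Arbitrary failure rule, chosen only so that some combinations are dropped.
    if params["T_PVK"] >= 0.7 and params["NT_PVK"] >= 1e16:
        return None
    return {"PCE": 0.0}  # dummy result; a real run would return Voc, Jsc, FF, PCE

def build_dataset(sweep, simulate):
    """Evaluate every combination of sweep values and keep the converged ones."""
    names = list(sweep)
    records = []
    for values in product(*(sweep[name] for name in names)):
        params = dict(zip(names, values))
        result = simulate(params)
        if result is None:  # discard combinations that failed to converge
            continue
        records.append({**params, **result})
    return records

records = build_dataset(sweep, run_simulation)
print(len(records), "of", 2 * 3 * 2 * 2, "combinations kept")
```

Because the discarded combinations are not random, the retained records cover the design space slightly unevenly, which is worth keeping in mind when fitting models to them.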
The accuracy of the simulation methodology was validated by reproducing the results reported in research articles, as shown in Table 2. For this purpose, over 50 recently published scientific articles were collected that included most of the parameters needed to simulate a nip-type PSC in SCAPS-1D and also reported the values of Voc, Jsc, FF, and PCE. After a systematic review, to avoid unrealistic results, articles reporting energy efficiencies above the Shockley-Queisser limit57 for cells based on MAPbI₃, MASnI₃ and FAPbI₃ were excluded, as, according to experimental validations,58 the maximum efficiency achieved for these materials does not exceed 22.2%, 14.35% and 24.66%, respectively. It is important to note that simulations of PSC do not incorporate variable physical factors (surface roughness, grain size, and orientation), chemical factors (temperature and drying time, solvent and antisolvent engineering, and additives) or environmental factors (temperature variations, cloud cover, irradiance, etc.), all of which can affect the actual performance of the cells. Therefore, the values obtained through simulation are expected to be higher than those observed experimentally while remaining consistent with realistic values.34,58
Reference | Reported Voc (V) | Reported Jsc (mA/cm²) | Reported FF (%) | Reported PCE (%) | Simulated Voc (V) | Simulated Jsc (mA/cm²) | Simulated FF (%) | Simulated PCE (%)
---|---|---|---|---|---|---|---|---
59 | 1.04 | 30.5 | 82.69 | 26.95 | 1.02 | 29.8 | 78.28 | 26.12
60 | 0.98 | 18.6 | 82.50 | 13.40 | 0.96 | 18.2 | 79.97 | 13.07
61 | 1.02 | 22.7 | 62.67 | 21.42 | 1.01 | 22.4 | 65.92 | 21.26
62 | 0.91 | 24.1 | 54.19 | 16.08 | 0.93 | 24.3 | 78.98 | 16.49
63 | 0.87 | 24.9 | 85.80 | 15.50 | 0.85 | 24.4 | 81.26 | 15.14
A comparison of values obtained through simulation and values reported in the literature for Voc, Jsc, FF, and PCE is shown in Table 2. There are slight variations in the obtained values compared to the reported ones, which can be attributed to multiple causes, as very few authors report in detail all the simulation conditions or all the parameters used. Some studies include models for recombination, absorption or defects without specifying numerical values (defect density, defect type, or recombination coefficients); therefore, the exact replication of the simulation conditions is limited. This is crucial for a simulation since it considers recombination phenomena, losses due to defects of the layers or interfaces derived from manufacturing processes or impurities in the materials that compose the cell, which can significantly affect the performance of the system. Although the variations in the reported parameters prevent an identical replication, the obtained results show consistency with the published data, supporting the robustness of the methodology.
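One simple way to quantify these variations is the relative deviation of each simulated value from the reported one. The sketch below applies it to the PCE column of Table 2; the helper name `relative_deviation` is hypothetical, and the same calculation applies to Voc, Jsc, and FF.

```python
# PCE values (%) from Table 2: reference number -> (reported, simulated)
pce_table = {
    59: (26.95, 26.12),
    60: (13.40, 13.07),
    61: (21.42, 21.26),
    62: (16.08, 16.49),
    63: (15.50, 15.14),
}

def relative_deviation(reported, simulated):
    """Signed deviation of the simulated value from the reported one, in percent."""
    return 100.0 * (simulated - reported) / reported

deviations = {ref: relative_deviation(*pair) for ref, pair in pce_table.items()}
worst = max(abs(d) for d in deviations.values())

for ref, dev in deviations.items():
    print(f"ref {ref}: {dev:+.2f} %")
print(f"largest absolute deviation: {worst:.2f} %")
```

For the PCE values in Table 2, every deviation stays within about 3% of the reported value, with reference 59 showing the largest gap.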
The dataset comprises 18,570 simulated PSC generated from four device architectures: TiO₂/MAPbI₃/CuSCN,60 TiO₂/MASnI₃/Spiro-OMeTAD,59 SnO₂/FAPbI₃/Spiro-OMeTAD,64 and TiO₂/MAPbI₃/Spiro-OMeTAD.65 The performance results of the cells were obtained by using the SCAPS-1D option “Batch set-up”, which allows a parametric study of PSC to be carried out over specific value ranges and returns the results associated with all the combinations; the ranges specified in Table 1 were used, and only combinations that produced convergence errors were discarded. For each record, the dataset includes nineteen PSC features (which can be taken as inputs or “X” values in an ML application) and four associated results, namely Voc, Jsc, FF, and PCE (which can be used as outputs or “y” values). The columns are named and described as follows:
Material (M):
• Column A: Material of the ETL layer (M_ETL).
• Column B: Material of the perovskite absorber layer (M_PVK).
• Column C: Material of the HTL layer (M_HTL).
Ranging parameters:
The next columns correspond to parameters that were varied as shown in Table 1:
• Column D: Thickness of ETL layer (T_ETL).
• Column I: Thickness of absorber layer (T_PVK).
• Column J: Bandgap of absorber layer in eV (EG_PVK).
• Column K: Dielectric permittivity of absorber layer (ER_PVK).
• Column L: Shallow acceptor density of absorber layer (NA_PVK).
• Column N: Defect density of absorber layer (NT_PVK).
• Column O: Thickness of HTL layer (T_HTL).
Constant value parameters:
The next columns have constant values for the specific parameters. It is important to include them to validate the results presented in this work.
• Column E: Bandgap of ETL layer in eV.
• Column F: Dielectric permittivity value of ETL layer.
• Column G: Shallow donor density of ETL layer.
• Column H: Defect density value of ETL layer.
• Column M: Shallow donor density value of absorber layer.
• Column P: Bandgap of HTL layer in eV.
• Column Q: Dielectric permittivity of HTL layer.
• Column R: Shallow donor density of HTL layer.
• Column S: Defect density of HTL layer.
Performance metrics:
• Column T: Open circuit voltage in V (Voc).
• Column U: Short circuit current density in mA/cm² (Jsc).
• Column V: Fill factor in percentage (FF).
• Column W: Power conversion efficiency in percentage (PCE).
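The column layout above can be turned into a feature/target split, as in the following minimal sketch. It works only with spreadsheet column letters, so it assumes nothing about the header names in the file; the helper `split_row` and the example row values are hypothetical.

```python
from string import ascii_uppercase

FEATURE_LETTERS = ascii_uppercase[:19]    # columns A..S: nineteen device features
TARGET_LETTERS = ascii_uppercase[19:23]   # columns T..W: Voc, Jsc, FF, PCE
MATERIAL_LETTERS = set("ABC")             # M_ETL, M_PVK, M_HTL
VARIED_LETTERS = set("DIJKLNO")           # parameters swept in the batch runs
CONSTANT_LETTERS = sorted(set(FEATURE_LETTERS) - MATERIAL_LETTERS - VARIED_LETTERS)

def split_row(row):
    """Split one 23-value record into (features, targets) keyed by column letter."""
    if len(row) != 23:
        raise ValueError("expected 23 values (columns A to W)")
    cells = dict(zip(ascii_uppercase, row))
    features = {letter: cells[letter] for letter in FEATURE_LETTERS}
    targets = {letter: cells[letter] for letter in TARGET_LETTERS}
    return features, targets

# Hypothetical record: three material labels, sixteen numeric features, four metrics.
example_row = ["TiO2", "MAPbI3", "Spiro-OMeTAD"] + [1.0] * 16 + [1.0, 20.0, 75.0, 15.0]
X, y = split_row(example_row)
print(len(X), "features;", "targets in columns", "".join(y))
print("constant-value columns:", "".join(CONSTANT_LETTERS))
```

Since the constant-value columns do not change within an architecture, a model trained on a single architecture gains nothing from them, although they still help distinguish architectures when all records are pooled.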
Basic descriptive statistics were computed for the dataset, generating the data distribution for the performance metrics (Voc, Jsc, FF, and PCE). Figure 2 a) presents the data distribution for Voc, which shows a main peak in a multimodal distribution with a mean of 0.98 V and a median of 1.00 V, with a standard deviation of 0.14 V and an interquartile range (IQR) of 0.90 V–1.10 V. A smaller number of parametric combinations are observed for values below 0.7 V, which can be attributed to increased recombination due to the geometric configuration of the device or improper band alignment.57,66 The data distribution of Jsc, presented in Figure 2 b), is multimodal with four marked peaks around 16, 21, 27, and 34 mA/cm², with a mean of 20.7 mA/cm², a median of 20.2 mA/cm², and a standard deviation of 9.4 mA/cm². The peaks suggest subsets defined by discrete thicknesses of the perovskite or by steps in the optical absorption imposed during the parametric sweep. Physically, current densities below 10 mA/cm² are associated with thin films or large bandgaps, while values above 30 mA/cm² are associated with sufficiently thick layers with low defect density, where light absorption and carrier collection are maximized.67,68 Figure 2 c) presents the data distribution of the FF, where a noticeable peak is observed at values close to 79%, with a mean of 65.2%, a median of 69.2%, and a standard deviation of 15.8%.
This demonstrates a high dispersion in the data, because a considerable share of the records lies at values below 60%, which significantly affects the FF distribution for the dataset and shows that some configurations can generate losses, either due to recombination or cell defects.69 Finally, Figure 2 d) presents the data distribution of the PCE, which shows a peak around 9% with a mean of 12.8%, a median of 12.1%, and a standard deviation of 6.26%, exhibiting a broad and slightly bimodal shape: one cluster between 5% and 15% associated with devices with one or two suboptimal parameters (e.g., moderate Jsc and acceptable FF) and another peak between 18% and 23% that is associated with the Voc and FF values. The decreasing trend toward 25% reflects the limit imposed by maximum absorption and residual non-radiative losses, consistent with the Shockley-Queisser model.57,70 In summary, the dispersion demonstrates how the joint variation of thickness, bandgap, and defects controls the efficiency, reproducing the range of values reported experimentally.
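The summary statistics quoted above (mean, median, standard deviation, and IQR) can be computed for any metric column with the Python standard library, as sketched below. The sample values are hypothetical and far fewer than the dataset's records; the choice of the sample standard deviation and the "inclusive" quantile method is an assumption, since the text does not state which conventions were used.

```python
import statistics

def describe(values):
    """Return mean, median, sample standard deviation and IQR bounds of a sequence."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),
        "iqr": (q1, q3),
    }

# Hypothetical Voc sample (V) with a low-voltage tail, for illustration only.
sample_voc = [0.62, 0.88, 0.95, 1.00, 1.02, 1.05, 1.10, 1.12]
stats = describe(sample_voc)
print({key: stats[key] for key in ("mean", "median", "std")}, stats["iqr"])
```

In this toy sample the low-voltage tail pulls the mean below the median, the same direction of skew as reported for Voc (0.98 V versus 1.00 V) and FF (65.2% versus 69.2%).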
The dataset is stored on OSF, an open-source platform for managing and sharing research data. The project, titled “Synthetic dataset to study the performance of perovskite solar cell simulations” (DOI: 10.17605/OSF.IO/ZX4AJ), includes the file “Synthetic dataset to study the performance of perovskite solar cell simulations.xlsx”, which contains all simulated device configurations and their corresponding performance metrics.
This dataset can be used to analyze, design, and optimize PSC, as it contains a considerable number of simulations (18,570) with their respective photovoltaic performance values, which is useful for studying the relationships between design parameters and electrical performance. Because of its composition and the variety of parameters that describe the device, the dataset can be used to train different ML models using supervised methods such as random forest, gradient boosting, support vector machines (SVM), and deep neural networks to predict the performance metrics Voc, Jsc, FF, and PCE, or unsupervised techniques like clustering or dimensionality reduction (PCA, t-SNE) that allow discovering patterns or relationships between parameters. Additionally, multi-objective optimization techniques can be implemented, such as genetic algorithms, Bayesian methods, or particle swarm methods.
On the other hand, as each input represents a set of specific parameters along with their performance results, researchers can identify optimal regions of the design space to maximize efficiency, minimize recombination losses, or reduce the use of high-cost materials. Thus, it can be useful in reverse design strategies to determine optimal configurations based on defined objectives.
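In its simplest form, such a reverse design query is a constrained lookup: keep the records that meet a target, then pick the one that minimizes a cost. The sketch below does this on a handful of hypothetical records. The function `inverse_design` and all values are illustrative; `T_PVK` and `PCE` follow the dataset's naming, with absorber thickness used as a crude proxy for material use.

```python
def inverse_design(records, min_pce, minimize="T_PVK"):
    """Return the record meeting the PCE target with the smallest cost value."""
    feasible = [r for r in records if r["PCE"] >= min_pce]
    if not feasible:
        return None  # no configuration in the records meets the target
    return min(feasible, key=lambda r: r[minimize])

# Hypothetical records for illustration only (not taken from the dataset).
records = [
    {"T_PVK": 0.3, "NT_PVK": 1e14, "PCE": 17.5},
    {"T_PVK": 0.5, "NT_PVK": 1e14, "PCE": 20.1},
    {"T_PVK": 0.7, "NT_PVK": 1e14, "PCE": 21.0},
    {"T_PVK": 0.4, "NT_PVK": 1e15, "PCE": 19.2},
]

best = inverse_design(records, min_pce=19.0)
print(best)
```

A lookup like this can only return configurations already present in the records; a trained surrogate model combined with an optimizer would be needed to propose configurations between the sampled points.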
Additionally, it is important to mention that this database was generated for planar PSC with a single absorbing layer, a limitation that should be considered when using it. Moreover, all the simulations were generated under constant and one-dimensional conditions, so three-dimensional effects, long-term degradation, and real environmental conditions (temperature, humidity, material degradation, etc.) are not captured. Although a cross-validation was conducted with data reported in the literature, there may be discrepancies attributable to the lack of detail in the parameters reported by some authors, which prevents an exact replication of the published results. Finally, some parametric combinations were discarded due to numerical convergence failures, which may slightly bias the exploration of the design space.
Y.V.-G.: conceptualization, methodology, validation, investigation, data curation, and writing—original draft preparation, E.G.-V.: conceptualization, methodology, validation, formal analysis, investigation, data curation, and writing—original draft preparation, visualization, and supervision. A.S.-S.: conceptualization, formal analysis, investigation, data curation, and writing—review and editing, project administration, supervision, and funding acquisition. N.G.-C.: conceptualization, methodology, formal analysis, investigation, resources, writing—review and editing, supervision, project administration, and funding acquisition. All authors have read and agreed to the published version of the manuscript.
Open Science Framework (OSF). Synthetic dataset to study the performance of perovskite solar cell simulations. DOI: 10.17605/OSF.IO/ZX4AJ.71
This project contains the following underlying data:
Synthetic dataset to study the performance of perovskite solar cell simulations.xlsx. All simulated device configurations and their corresponding photovoltaic performance metrics (Voc, Jsc, FF, PCE) were generated with SCAPS-1D under the conditions described in the methods. Data are available under the terms of the CC-By Attribution 4.0 International license.
The authors are grateful to Marc Burgelman and his colleagues at the University of Gent, Belgium, for providing the SCAPS-1D simulator and gratefully acknowledge Prof. Monica Botero Londoño (School of Electrical, Electronic and Telecommunications Engineering, Universidad Industrial de Santander) for her valuable guidance in defining the set of parameters explored in this work.