GoodVibes: automated thermochemistry for heterogeneous computational chemistry data [version 1; peer review: 2 approved with reservations]

GoodVibes is an open-source Python toolkit for processing the results of quantum chemical calculations. Thermochemical data are not simply parsed, but evaluated by evaluation of translational, rotational, vibrational and electronic partition functions. Changes in concentration, pressure, and temperature can be applied, and deficiencies in the rigid rotor harmonic oscillator treatment can be corrected. Vibrational scaling factors can also be applied by automatic detection of the level of theory and basis set. Absolute and relative thermochemical values are output to text and graphical plots in seconds. GoodVibes provides a transparent and reproducible way to process raw computational data into publication-quality tables and figures without the use of spreadsheets.


Introduction
Quantum chemistry software packages implement various levels of theory and basis sets, so-called model chemistries, that can be used to optimize molecular geometries and compute ground state vibrational modes. Statistical mechanical expressions using these vibrational frequencies, along with other contributing terms to the partition function, are used to obtain the enthalpy, entropy and Gibbs energy values required to understand, validate or predict experimental observations. GoodVibes was developed to address several challenges faced by practitioners of computational thermochemistry.
Firstly, the rigid-rotor harmonic oscillator (RRHO) model is routinely used to obtain vibrational entropic contributions and is the default for most electronic structure packages. However, the harmonic approximation fails to accurately describe low frequency modes and alternative models may be more appropriate [1][2][3] . Corrections to the RRHO model may be theoretically desirable, but the absence of practical and accessible tools to implement such corrections has limited their widespread adoption. Here we present and detail the use of the program GoodVibes 4 , a Python-based project used for obtaining thermochemical values while applying corrections using quasi-harmonic approximations, alongside other corrections relevant to the overestimation of the zero-point energy, multi-conformer ensembles, and standard concentrations.
Secondly, computational chemistry projects often combine results from different software. For example, geometries may be optimized with one program and the energies evaluated with another. Complex spreadsheets are frequently used to process these results and to prepare figures and manuscripts. However, errors in spreadsheets are commonplace and may be hard to detect. For this reason, GoodVibes allows the combination of data from multiple program outputs as part of the same project. Our group has utilized this program in informatics and organic mechanistic studies 5 , however, GoodVibes is not limited to a specific area of chemistry and has been used in published studies by more than 30 different research groups. As an example, previous studies in organic catalysis and mechanism, photocatalysis, and inorganic structure characterization have made use of this program package to process and make corrections to thermochemical data 6 . Figure 1 details an overview of the GoodVibes workflow from input data to the various outputs. Input options supplied via the command line enable chemists with basic experience with Python to process a large number of computational output files and to generate publication-ready data. These data (e.g. figures, tables) can be quickly reproduced by any other user with access to the raw data and the GoodVibes code. With recent and coming changes to standards of supplying full computational outputs through open repositories (such as Zenodo and ioChem-BD 7 ) alongside publications, GoodVibes allows for complete transparency in how reported thermochemical values were obtained.

Methods
By recomputing translational, rotational, vibrational and electronic partition functions from the data generated by quantum chemistry programs (such as vibrational frequencies, molecular mass, etc.), thermochemistry can be calculated in GoodVibes at any specified temperature or concentration/pressure. By default, these are set to 298.15 K and 1 atmosphere. A notable automated correction to the RRHO model is applied to low frequency vibrational modes. These low frequency modes (typically less than 100 cm -1 ) are not well approximated as harmonic and their entropy contributions tend to be overestimated. Methods to avoid this include nontrivial hinderedrotor calculations and computationally expensive anharmonic calculations, both of which become more infeasible Figure 1. The GoodVibes workflow. Computational chemistry output data files are parsed, and thermochemistry data is generated based on a series of user-defined physical assumptions and reaction conditions. The program outputs absolute and relative values to text and reaction profiles as plots. The execution time is seconds. with higher atom counts 1 . Both Cramer/Truhlar 2 and Grimme 3 have proposed simple, more widely adopted corrections, so-called quasi-harmonic approximations, formulated specifically to obtain vibrational entropies for these low frequency modes. GoodVibes will also automatically apply empirical scaling factors to computed frequencies. These corrections arise from the tendency of electronic structure calculations (e.g. with density functional theory) to overestimate vibrational frequencies relative to experiment, and hence zero-point energies. Linear scaling factors have been collated for a number of functional and basis set combinations for sets of small organic molecules. GoodVibes accesses the scaling factors compiled by the Truhlar group across several studies 8 , automatically detects the level of theory and basis set and scales the frequencies if there is a match.
Single point energy calculations performed at more expensive levels of theory and with larger basis sets are commonly used in combination with the thermal corrections obtained from separate calculations, often using different software packages 9 . A multitude of output files from different program packages along with correction applied by GoodVibes can be combined to construct a potential energy surface by using an easily interpretable YAML file that defines the elementary steps of the reaction, file definitions and formatting options.

Implementation
GoodVibes is a module implemented in Python, currently supported by 2.6-7 and all 3.x versions. The module requires the NumPy library (version 1.14.2 or greater), with optional importing of Matplotlib (version 2.2.4 or greater) to graph potential energy surfaces. To test the accuracy and upkeep of this coding package, a series of tests have been implemented in TravisCI, checking functionality and accuracy of the code when the master branch is updated on GitHub against target Python versions on Linux, macOS and Windows operating systems.

Operation
GoodVibes is compatible with Windows 10, Linux and macOS operating systems. This software is appropriate for use with output files produced by a wide range of calculation types, including density functional theory, wave function theory, molecular mechanics, COSMO-RS solvation calculations, and semi-empirical methods. Current supported programs include Gaussian 09 10 , Gaussian 16 11 , ORCA 4 12 single point energy calculation files, and COSMOtherm 13 COSMO-RS solvation free energy output files. GoodVibes provides an output file with tabulated thermochemical data, optionally exported as a CSV file. Cartesian coordinates of processed files can also optionally be exported in an XYZ file. Plots of energy profiles are optionally generated.

Use case
In this section we show how GoodVibes can be used with a variety of input options and files to transform heterogeneous computational chemistry data into human-readable tabulated values and figures. In this example, 50 calculation output files (25 Gaussian geometry optimizations and vibrational frequency calculations with 25 corresponding ORCA single point calculations) are used to create the data in Table 1 and graphed in Figure 2. All raw data was taken from a 2018 study 14 , and is freely accessible through a Zenodo repository 15 . Each point in Figure 2 represents a unique conformer's Gibbs energy. The Boltzmann-weighted values are shown as dashes connected by the curved profiles. Optimizations were done with ωB97X-D/6-31+G(d) 16 implemented in Gaussian using an "ultrafine" pruned (99,590) integration grid. Considering solvent effects of the reaction, calculations were run with the SMD solvation model 17 using ethanol, and the concentration of each substance was set to 1.0 M for further thermochemical calculation. GoodVibes allows for solvent media corrections to entropy based on select solvent standard state concentration 18 , which was applied to ethanol in this case.
Thermochemistry is evaluated at a temperature of 80°C (353.15 K) in accordance with experimental conditions (the temperature assumed in the original calculations does not influence the results from GoodVibes). These values are corrected using the quasi-harmonic approximation proposed by Grimme. One conformer of intermediate II in the 'Py' pathway has a small imaginary frequency of 2.9 cm -1 , which was "inverted" to a real value 2.9 cm -1 , as done in previous works to small imaginary frequencies, typically under >i50 cm -114,19 .
Separate single point energies are extracted from ORCA calculations performed at a coupled cluster level of theory with a (DZ/TZ) basis set extrapolation (DLPNO-CCSD(T), cc-pVDZ/cc-pVTZ) 20 , then used for calculations and added to the GoodVibes output for comparison. A potential energy surface is constructed by using Boltzmann weighted averaging of all conformers at each step in the pathway. A multi-structural correction is then applied to the resulting Gibbs free energy based on the number and energy of distinguishable conformers present for each species 21 . The Gibbs energy profile is constructed from options specified in the YAML file containing the reaction pathway steps, file definitions and plot formatting options. All of the output files, correction and formatting  options are supplied to GoodVibes to output tabulated data and a graph of the reaction pathway from a single command: python -m goodvibes *.log --spc DLPNO --pes science.yaml --graph science.yaml -t 353.15 --imag --invertifreq -5 --media ethanol -c 1 Additional usage examples are described at GoodVibes GitHub repository, where several features have been added in response to requests from the community of users.

Conclusion
GoodVibes is a Python-based tool that calculates thermochemical data from quantum mechanical calculations in a transparent and reproducible way. GoodVibes may be employed with any type of chemical structure, including organic and inorganic molecules of varying sizes as well as with single point calculations performed by differing programs. Additionally, GoodVibes contains many additional automated features that are designed to save time for researchers, allowing for the calculation of thermochemical data at any temperature or concentration, incorporating valuable and overlooked corrections to the RRHO model through quasi-harmonic and vibrational scaling factor corrections and construction of potential energy surfaces with applied corrections accounting for the accessibility of multiple conformations. For projects involving the analysis of a large number of computational chemistry output files, GoodVibes helps to prevent human errors associated with spreadsheets, and can be used to reproduce any table or figure from the raw data.
This project contains data referenced in the use case.   ○ "a Python-based project" -> "a Python-based program/tool" (page 3/9) ○ "standard concentrations" -> "standard state concentrations" (page 3/9) ○ Addition: After reading the review by Dr. Pollice, I agree with him that the handling of the moment of inertia for Grimme's method is not correct. There is a reference implementation of his method which can be found in the xtb Github repository: https://github.com/grimme-lab/xtb. The "subroutine axis2" handles the calculation of the average moment of inertia. After correcting for this in the code, I think the authors should do an analysis of how large the error is for using the erroneous moment of inertia, e.g., as a function of molecular size as Dr. Pollice points out.

Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes Competing Interests: No competing interests were disclosed.
Reviewer Expertise: physical organic chemistry, reaction modelling, machine learning I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
conformer used and the masses of all the atoms. In my opinion, this issue needs to be addressed properly by implementing the calculation of the average moment of inertia of a conformer based on its Cartesian coordinates in GoodVibes. This is particularly important for large molecules with hundreds of atoms because their average moment of inertia will deviate significantly from the used global parameter.
After close inspection of the code, I believe that, currently, GoodVibes applies the solvent correction as obtained from the output of COSMOtherm as is without any adjustments. However, it has been pointed out recently that this is (unfortunately) a common mistake as COSMOtherm uses the mole fraction reference state for pure solvent rather than the concentration reference state. Consequently, this leads to inconsistencies in the standard states used and to a systematic offset in predicted Gibbs free energies of solvation. The details of this problem and how to correct for it are described in the literature. 1 for generating high-resolution illustrations.
The benefits of publishing with F1000Research: Your article is published within days, with no editorial bias • You can publish traditional articles, null/negative results, case reports, data notes and more • The peer review process is transparent and collaborative • Your article is indexed in PubMed after passing peer review • Dedicated customer support at every stage • For pre-submission enquiries, contact research@f1000.com