The PubChemQC Project
Copyright © 2013-2023 NAKATA Maho, MAEDA Toshiyuki, SHIMAZAKI Tomomi, HASHIMOTO Masatomo
NEWS
2023-8: The paper for the PubChemQC B3LYP/6-31G*//PM6 datasets (86 million mols.) has been published.
2023-5-31: The PubChemQC B3LYP/6-31G*//PM6 datasets (86 million mols.) have been released.
2022-8-4: pubchemqc.riken.jp has been temporarily moved to https://nakatamaho.riken.jp/pubchemqc.riken.jp/
2022-3-23: pubchemqc_jcim2017_jsons.10150017a15274edd1e5ed06ad5831de.tar.gz: JSON files for PubChemQC JCIM 2017. They include all the properties, but not the basis set information or the MO-AO matrix.
2021-8-20: PubChemQC PM6 ver.2.0.0 has been released. From this version on, Docker images that serve the databases through PostgREST are available, and you can use them to query molecules.
2021-7-20: We uploaded a Docker image of the PubChemQC (JCIM 2017 B3LYP) database. You can download it from here.
2020-10: Our new paper has been published: PubChemQC PM6: Data Sets of 221 Million Molecules with Optimized Molecular Geometries and Electronic Properties
2020-09-03, 2020-09-07: The whole B3LYP dataset is now available on Google Drive (2.0 TB), split into CID ranges of 25k. It includes GAMESS input and output files. We strongly suggest using rclone to download the files.
2020-08: The PubChemQC PM6 datasets are now available. These data contain PM6-optimized structures for the neutral, spin-flipped, cation, and anion states of molecules in PubChem.
Datasets download
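Because the B3LYP dataset is split by CID in ranges of 25k, it can help to work out which range holds a given molecule before downloading. The following is a minimal sketch only: the function name is hypothetical, and the assumption that the ranges are 1-based (1 to 25000, 25001 to 50000, and so on) is not confirmed here, so check it against the actual directory listing.

```python
def cid_range(cid, width=25_000):
    """Return the inclusive (first, last) CID of the block that holds `cid`.

    Assumption: blocks are 1-based, i.e. 1..25000, 25001..50000, and so on.
    The real archive layout may be aligned differently.
    """
    if cid < 1:
        raise ValueError("PubChem CIDs are positive integers")
    first = ((cid - 1) // width) * width + 1
    return first, first + width - 1


if __name__ == "__main__":
    # CID 2244 is aspirin in PubChem.
    print(cid_range(2244))
```

Only the range that contains the CIDs of interest then needs to be fetched, for example with rclone.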
Molecule Query
Please use The Public Computational Chemistry Database Project, our web interface to the PubChemQC dataset.
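For local use, the PM6 ver.2.0.0 Docker images mentioned in NEWS expose the database through PostgREST, which maps HTTP query parameters such as `column=eq.value` and `select=col1,col2` onto table filters. The sketch below only builds such a URL. The base URL, the table name `molecules`, and the column names are hypothetical placeholders, not the schema shipped in the images.

```python
from urllib.parse import urlencode


def postgrest_url(base_url, table, cid, columns=("cid",)):
    """Build a PostgREST filter URL selecting one row by CID.

    `table` and `columns` are hypothetical; look up the real names in the
    schema of the Docker image you run.
    """
    query = urlencode(
        {"cid": f"eq.{cid}", "select": ",".join(columns)},
        safe=",",  # keep the column list readable instead of %2C
    )
    return f"{base_url.rstrip('/')}/{table}?{query}"


if __name__ == "__main__":
    # Port 3000 is the PostgREST default; your container may map another one.
    print(postgrest_url("http://localhost:3000", "molecules", 2244,
                        ("cid", "smiles")))
```

The resulting URL can be fetched with any HTTP client; PostgREST answers with JSON rows.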
Mission
- We provide quantum chemical results for molecules of the PubChem Project: input, output, and mol files. The molecular geometries in the input files and the mol files are already optimized; you may still want to verify them against the output files or by recalculation.
Citation
- PubChemQC B3LYP/6-31G*//PM6 Data Set: The Electronic Structures of 86 Million Molecules Using B3LYP/6-31G* Calculations, J. Chem. Inf. Model. 2023, 63, 18, 5734-5754
- PubChemQC PM6: Data Sets of 221 Million Molecules with Optimized Molecular Geometries and Electronic Properties, J. Chem. Inf. Model. 2020, DOI: 10.1021/acs.jcim.0c00740
- Maho Nakata and Tomomi Shimazaki, "PubChemQC Project: a Large-Scale First-Principles Electronic Structure Database for Data-driven Chemistry", J. Chem. Inf. Model., 2017, 57 (6), pp 1300-1308.
- Nakata Maho, "The PubChemQC project: A large chemical database from the first principle calculations", AIP Conf. Proc. 1702, 090058 (2016).
Calculation conditions
- Density Functional Theory and B3LYP (Becke + Slater + HF exchange, with LYP + VWN1RPA e- gas formula) functional.
- 6-31G(d) (aka 6-31G*) basis set.
- Geometry optimization
- Excited state calculation by time-dependent density functional theory with the 6-31+G(d) basis set, using the geometry obtained above.
- GAMESS (Linux version)
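The conditions above can be sketched as a small GAMESS input generator. This is an illustration under stated assumptions, not the project's own script: the helper name is hypothetical, and the keyword choices (`DFTTYP=B3LYPV1R` for B3LYP with the VWN1RPA formula, `GBASIS=N31 NGAUSS=6 NDFUNC=1` for 6-31G(d), `DIFFSP=.TRUE.` for the added diffuse functions, `NSTATE=10` for ten excited states) are my reading of the GAMESS input documentation. The distributed input files are the authoritative reference.

```python
def gamess_input(title, atoms, functional="B3LYPV1R", tddft=False, nstate=10):
    """Return GAMESS input text for a singlet, neutral molecule.

    atoms: iterable of (symbol, nuclear_charge, x, y, z), coordinates in angstrom.
    tddft=False -> B3LYP/6-31G(d) geometry optimization.
    tddft=True  -> single-point TD-DFT with 6-31+G(d) at the given geometry.
    All keyword choices are assumptions; compare with the project's inputs.
    """
    run_type = "ENERGY" if tddft else "OPTIMIZE"
    contrl = f" $CONTRL SCFTYP=RHF RUNTYP={run_type} DFTTYP={functional}"
    basis = " $BASIS GBASIS=N31 NGAUSS=6 NDFUNC=1"
    if tddft:
        contrl += " TDDFT=EXCITE"
        basis += " DIFFSP=.TRUE."  # diffuse sp shell: 6-31+G(d)

    lines = [contrl + " $END"]
    if tddft:
        lines.append(f" $TDDFT NSTATE={nstate} $END")
    lines.append(basis + " $END")
    lines += [" $DATA", title, "C1"]  # C1: no point-group symmetry assumed
    for symbol, charge, x, y, z in atoms:
        lines.append(f"{symbol:<2} {charge:4.1f} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append(" $END")
    return "\n".join(lines) + "\n"


# Illustrative water geometry, not taken from the dataset.
WATER = [
    ("O", 8.0, 0.0, 0.000, 0.117),
    ("H", 1.0, 0.0, 0.757, -0.469),
    ("H", 1.0, 0.0, -0.757, -0.469),
]

if __name__ == "__main__":
    print(gamess_input("water B3LYP/6-31G(d) optimization", WATER))
    print(gamess_input("water TD-DFT 6-31+G(d)", WATER, tddft=True))
```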
How we did it
- Get molecular information from the PubChem Project
- Extract the canonical SMILES description of each molecule with OpenBABEL.
- Generate initial 3D geometries with OpenBABEL using "--gen3d -h".
- Perform a PM3 geometry optimization calculation with GAMESS.
- Extract the PM3 optimized geometry and perform a Hartree-Fock STO-6G geometry optimization calculation with GAMESS.
- Extract the Hartree-Fock STO-6G optimized geometry and perform a DFT B3LYP geometry optimization calculation with Firefly.
- Extract the DFT B3LYP optimized geometry and perform the DFT B3LYP geometry optimization again with GAMESS, using a tighter threshold.
- Extract the DFT B3LYP geometry and perform the DFT B3LYP geometry optimization calculation once more to be sure, and to reduce confusion about the input files we provide.
- Perform a TD-DFT calculation with 6-31+G(d) at the geometry optimized with B3LYP/6-31G(d), and obtain the 10 lowest excited states (we added diffuse functions, as excited states may involve diffuse orbitals).
- Upload TD-DFT/DFT B3LYP input/output files.
- Upload the mol files, too. This is for GaussView (and/or Gaussian) users and other quantum chemical program package users. In any case, you can use OpenBABEL to convert to the formats which your favorite program packages employ.
- Upload results daily.
- Almost everything has been automated ;-)
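The two OpenBABEL steps in the list above (3D generation from SMILES, and conversion of the mol files to other formats) can be sketched as command builders around the `obabel` command-line tool. This is a minimal sketch, not the project's automation: the function names and file names are hypothetical, and the code only assembles the argument lists, so it runs even where OpenBABEL is not installed.

```python
import shlex


def gen3d_command(smiles, out_mol):
    """obabel call that builds a 3D structure with hydrogens from a SMILES string.

    "-:" passes the SMILES on the command line, "--gen3d" generates 3D
    coordinates, and "-h" adds hydrogens, as in the procedure above.
    """
    return ["obabel", f"-:{smiles}", "-O", out_mol, "--gen3d", "-h"]


def convert_command(in_path, out_path):
    """obabel call that converts between formats inferred from file extensions."""
    return ["obabel", in_path, "-O", out_path]


if __name__ == "__main__":
    # Hypothetical file names, used for illustration only.
    for argv in (gen3d_command("CCO", "ethanol.mol"),
                 convert_command("ethanol.mol", "ethanol.xyz")):
        print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` on a machine where `obabel` is on the PATH.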
Limitation
- The results are provided on an as-is basis. We cannot guarantee that they are correct.
- The PubChem Project provides over 75,000,000 molecules, which is far beyond our calculation capacity (this is our challenge!).
- The molecular weight should be less than 500.
- Only for the singlet and neutral molecules.
- Only for molecules which contain H, He, Li, Be, B, C, N, O, F, Ne, Na, Mg, Al, Si, P, S, Cl, Ar, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn (limitation of the PM3 method and of the 6-31G(d) and STO-6G basis sets).
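The element and molecular-weight limits above can be checked with a short filter. This is a sketch under assumptions: it reads the element list as covering atomic numbers 1 to 30 (H through Zn), uses rounded standard atomic weights, and accepts only simple formulas such as `C6H6` (no brackets, charges, or isotopes). The function names are hypothetical, and the singlet and neutral conditions are not checked.

```python
import re

# Rounded standard atomic weights for H (Z=1) through Zn (Z=30).
ATOMIC_WEIGHTS = {
    "H": 1.008, "He": 4.0026, "Li": 6.94, "Be": 9.0122, "B": 10.81,
    "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998, "Ne": 20.180,
    "Na": 22.990, "Mg": 24.305, "Al": 26.982, "Si": 28.085, "P": 30.974,
    "S": 32.06, "Cl": 35.45, "Ar": 39.948, "K": 39.098, "Ca": 40.078,
    "Sc": 44.956, "Ti": 47.867, "V": 50.942, "Cr": 51.996, "Mn": 54.938,
    "Fe": 55.845, "Co": 58.933, "Ni": 58.693, "Cu": 63.546, "Zn": 65.38,
}
MAX_MOLECULAR_WEIGHT = 500.0

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Parse a flat formula like 'C6H5Br' into {'C': 6, 'H': 5, 'Br': 1}."""
    counts = {}
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:  # unparsed characters between tokens
            raise ValueError(f"cannot parse formula: {formula!r}")
        symbol, number = match.groups()
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
        pos = match.end()
    if pos != len(formula) or not counts:
        raise ValueError(f"cannot parse formula: {formula!r}")
    return counts


def in_scope(formula):
    """True if every element is supported and the weight is below 500."""
    counts = parse_formula(formula)
    if any(symbol not in ATOMIC_WEIGHTS for symbol in counts):
        return False
    weight = sum(ATOMIC_WEIGHTS[s] * n for s, n in counts.items())
    return weight < MAX_MOLECULAR_WEIGHT


if __name__ == "__main__":
    for formula in ("C6H6", "C6H5Br", "C40H82"):
        print(formula, in_scope(formula))
```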
How to participate?
- As a matter of policy, you must agree to your results being uploaded to this project as well as to the PubChem Project.
- You need machine(s) to run GAMESS.
- Then ask me ;-)
History
Acknowledgment
- Horikosi, Masashi (horikoshi.masashi@intel.com)
References
- GAMESS: "General Atomic and Molecular Electronic Structure System" M.W.Schmidt, K.K.Baldridge, J.A.Boatz, S.T.Elbert, M.S.Gordon, J.H.Jensen, S.Koseki, N.Matsunaga, K.A.Nguyen, S.Su, T.L.Windus, M.Dupuis, J.A.Montgomery J. Comput. Chem., 14, 1347-1363(1993), "Advances in electronic structure theory: GAMESS a decade later" M.S.Gordon, M.W.Schmidt pp. 1167-1189, in "Theory and Applications of Computational Chemistry: the first forty years" C.E.Dykstra, G.Frenking, K.S.Kim, G.E.Scuseria (editors), Elsevier, Amsterdam, 2005.
- OpenBabel: N M O'Boyle, M Banck, C A James, C Morley, T Vandermeersch, and G R Hutchison. "Open Babel: An open chemical toolbox." J. Cheminf. (2011), 3, 33. DOI:10.1186/1758-2946-3-33, The Open Babel Package, version 2.3.1 http://openbabel.org (accessed Oct 2011)
- The PubChem Project: Bolton E, Wang Y, Thiessen PA, Bryant SH. PubChem: Integrated Platform of Small Molecules and Biological Activities. Chapter 12 IN Annual Reports in Computational Chemistry, Volume 4, American Chemical Society, Washington, DC, 2008 Apr.
- Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/
- Alex A. Granovsky, Firefly version 8.0, http://classic.chem.msu.su/gran/firefly/index.html
Developers
- NAKATA Maho
- MAEDA Toshiyuki
- SHIMAZAKI Tomomi
- HASHIMOTO Masatomo
maho.nakata@gmail.com