The PubChemQC PM6 datasets

License and Copyright

Copyright © 2019, 2020, 2021 NAKATA Maho, MAEDA Toshiyuki, SHIMAZAKI Tomomi, HASHIMOTO Masatomo

Creative Commons License
The PubChemQC PM6 datasets are licensed under a Creative Commons Attribution 4.0 International License.

News

2021-08-20: PubChemQC PM6 ver2.0.0 is available. We added experimental databases that run under Docker Compose, the smaller subsets CHON300noSalt and CHNOPSFCl300noSalt, and the raw Gaussian output files. With the databases, you can query molecules easily.

Downloads

How to use docker databases

Please refer to this page.

Older versions

The PubChemQC PM6 dataset (ver.1.0.3.3) can be downloaded from here.
The PubChemQC PM6 dataset (ver.1.0.3.2) can be downloaded from here.
The PubChemQC PM6 dataset (ver.1.0.3.1) can be downloaded from here.
The PubChemQC PM6 dataset (ver.1.0.3) can be downloaded from here.
The PubChemQC PM6 dataset (ver.1.0.0) can be downloaded from here.

History

2020-10-26: PubChemQC PM6: Data Sets of 221 Million Molecules with Optimized Molecular Geometries and Electronic Properties is now published.

2020-09-08: Updated to 1.0.3.3. Salts are now included in the CHNOPSFClNaKMgCa datasets.

2020-08-19: An essential part of the job scripts has been uploaded. These scripts are provided for reference only.

2020-06-24: Updated to 1.0.3.2. The sub-datasets were rebuilt to use mnemonic names such as CHON and CHNOPS. No changes were made except to sub-dataset 4: we added Mg to it so that it covers the most common elements of the human body except for fluorine.

2020-06-21: Sub-datasets were added. The full set is unchanged; only the sub-datasets are new.
(1) C, H, N and O elements, molecular weight less than 500, no salt.
(2) C, H, N, O, S and P elements, molecular weight less than 500, no salt.
(3) C, H, N, F, Cl, O, S and P elements, molecular weight less than 500, no salt.
(4) C, H, N, F, Cl, O, S, P, K, Na and Ca elements, molecular weight less than 500.
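
The selection criteria in the 2020-06-21 entry can be read as a simple filter on element set, molecular weight, and salt status. The following is a minimal sketch of that reading, not the scripts used to build the datasets. The function name `matching_subsets` and the `is_salt` flag are hypothetical, and the sketch assumes that sub-dataset 4 has no salt restriction because the entry does not state one.

```python
# Illustrative only: criteria transcribed from the 2020-06-21 history entry.
# The names below are hypothetical and not part of the PubChemQC tooling.
SUBSETS = {
    1: frozenset({"C", "H", "N", "O"}),
    2: frozenset({"C", "H", "N", "O", "S", "P"}),
    3: frozenset({"C", "H", "N", "F", "Cl", "O", "S", "P"}),
    4: frozenset({"C", "H", "N", "F", "Cl", "O", "S", "P", "K", "Na", "Ca"}),
}
MAX_WEIGHT = 500.0  # "molecular weight less than 500"


def matching_subsets(elements, molecular_weight, is_salt):
    """Return the sub-dataset numbers whose criteria the molecule satisfies."""
    if molecular_weight >= MAX_WEIGHT:
        return []
    result = []
    for number, allowed in SUBSETS.items():
        # Every element in the molecule must be in the allowed set.
        if not set(elements) <= allowed:
            continue
        # Assumption: only sub-datasets 1 to 3 exclude salts,
        # since the entry lists "no salt" for those three only.
        if is_salt and number != 4:
            continue
        result.append(number)
    return result


if __name__ == "__main__":
    # Ethanol-like input: C, H, O only, light, not a salt.
    print(matching_subsets({"C", "H", "O"}, 46.07, is_salt=False))
    # Sodium acetate-like input: contains Na and is a salt.
    print(matching_subsets({"C", "H", "O", "Na"}, 82.03, is_salt=True))
```

Because the element sets are nested, a molecule that qualifies for sub-dataset 1 also qualifies for 2, 3 and 4 under this reading.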

2019-05-29: Ver.1.0.3 is released

2019-02-28: Ver.1.0 is released

Reference

(published version) PubChemQC PM6: Data Sets of 221 Million Molecules with Optimized Molecular Geometries and Electronic Properties
(arXiv version) PubChemQC PM6: A dataset of 221 million molecules with optimized molecular geometries and electronic properties
Nakata Maho