The following is a brief tutorial on how to use the PubChemQC PM6 databases. 2021/8/20, T. Maeda

License

Creative Commons License
The PubChemQC PM6 datasets are licensed under a Creative Commons Attribution 4.0 International License.

Prepare the database.

Downlaod the database

You can choose following databases. Smaller size is better for the first time.

Install the prerequisites, if necessary.

Unpack the compressed archive.

Because the database file is packed as tar.gz, we need to unpack it first, as follows:

% tar xvzf (the name of the database)-postgrest-docker-compose.tar.gz

Launch and shutdown the database

Because the database is made as a Docker container with Docker Compose, it can be operated by docker-compose, as follows:

Launch the database

% cd (the directory extracted from the compressed archive)
% docker-compose up -d --build

Shutdown the database

% cd (the directory extracted from the compressed archive)
% docker-compose down

In the initial state, the superuser of the database has the name pgrest and its password is configured as the variable DB_PASSWORD in .env file. Therefore, it is strongly recommended to change the password from the viewpoint of security, as follows:

  1. Launch the database, if it is not running.

  2. Launch psql (the interactive terminal of PostgreSQL) and connect to the database (db) as the superuser (pgrest), as follows.
% docker exec -it $(docker-compose ps -q db) psql -U pgrest db

The above command will print the following outputs:

psql (13.2 (Debian 13.2-1.pgdg100+1))
Type "help" for help.

db=# 
  1. Change the password Then, run the command \password as follows:
db=# \password

The command requires the new password to be set (twice), as follows:

Enter new password: 
Enter it again: 
db=# 
  1. Exit psql
db=# \q
  1. Shutdown the database.
% docker-compose down
  1. Modify .env file to bind DB_PASSWORD to the new password.

  2. Launch the database, if necessary.
% docker-compose up -d --build

Query the database

In order to make a query to the database, we first need to know the name of the table in the database, and then able to make a query with the name, as follows:

  1. Obtain the name of the table

The following command will show the name of the table in the database:

% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "\dt"
  1. Query the table

There are two methods for querying the database: psql and PostgREST:

Method 1: psql (the interactive terminal of PostgreSQL)

For example, the following command show the row of the molecule whose PubChem CID is 962 in the table pm6opt_chnops500nosalt:

% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "SELECT * FROM pm6opt_chnops500nosalt WHERE cid = 962"

The following command show the rows of the molecules whose PubChem CIDs are between 950 and 1000:

% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "SELECT * FROM pm6opt_chnops500nosalt WHERE cid BETWEEN 950 AND 1000"

The following command show only CIDs, states, and HOMO-LUMO gaps of the molecules whose PubChem CIDs are between 950 and 1000:

% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "SELECT cid,state,data->'pubchem'->'PM6'->'properties'->'energy'->'alpha'->'gap' FROM pm6opt_chnops500nosalt WHERE cid BETWEEN 950 AND 1000"

Further information on querying the database with SQL is available from, for example, the following links to the manual of PostgreSQL: * https://www.postgresql.org/docs/current/tutorial-select.html * https://www.postgresql.org/docs/current/queries.html

Method 2: PostgREST (a RESTful API frontend server for PostgreSQL databases)

PostgREST is an HTTP server that talks to a PostgreSQL database and provides RESTful API to users for accessing the database. That is, we are able to query the database via PostgREST with any HTTP client. PostgREST is automatically launched when the database is launched with docker-compose up -d --build because the configuration file of docker-compose (docker-compose.yml) contains the settings for launching PostgREST on port 3000.

For example, the following command show the entry of the molecule whose PubChem CID is 962 in the table pm6opt_chnops500nosalt:

% curl 'http://localhost:3000/pm6opt_chnops500nosalt?cid=eq.962'

The following command show the entries of the molecules whose PubChem CIDs are between 950 and 1000:

% curl 'http://localhost:3000/pm6opt_chnops500nosalt?and=(cid.gte.950,cid.lte.1000)'

The following command show only CIDs, states, and HOMO-LUMO gaps of the molecules whose PubChem CIDs are between 950 and 1000:

% curl 'http://localhost:3000/pm6opt_chnops500nosalt?select=cid,state,data->pubchem->PM6->properties->energy->alpha->gap&and=(cid.gte.950,cid.lte.1000)'

Further information on querying the database with PostgREST is available from the following link to the manual of PostgREST: * https://postgrest.org/en/stable/api.html

Please note that querying the table may take significant amounts of time. Indexing the table (explained below) will help to improve query performance.

For example, the following command creates an index on the HOMO-LUMO gaps in the table pm6opt_chnops500nosalt:

% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "CREATE INDEX alpha_gap_index ON pm6opt_chnops500nosalt (((data -> 'pubchem'-> 'PM6' -> 'properties' -> 'energy' -> 'alpha' -> 'gap') :: NUMERIC)); "

With the above-created index, the following query, which shows the molecules whose HOMO-LUMO gaps are between 15.5 and 16.0, will take only a few seconds (it will take a lot of time without the index):

% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "SELECT cid,state,data->'pubchem'->'PM6'->'properties'->'energy'->'alpha'->'gap' FROM pm6opt_chnops500nosalt WHERE (data -> 'pubchem'-> 'PM6' -> 'properties' -> 'energy' -> 'alpha' -> 'gap') :: NUMERIC BETWEEN 15.5 and 16.0"

Please be careful that creating indexes may take a lot of time and an extra disk space.

Further information on creating indexes is available from the following link to the manual of PostgreSQL: * https://www.postgresql.org/docs/current/indexes.html


maho.nakata@gmail.com