The experimental PostgreSQL database of PubChemQC B3LYP (JCIM2017)

This directory contains the experimental PostgreSQL database that hold all the JSON files in the datasets. The following is a brief tutorial on how to use the database. 2021/7/20, T. Maeda

License

Creative Commons License
The PubChemQC B3LYP (JCIM2017) datasets are licensed under a Creative Commons Attribution 4.0 International License.

Install the prerequisites, if necessary.

Unpack the compressed archive.

Because the database file is packed as tar.gz, we need to unpack it first, as follows:

% tar xvzf (the name of the database)-postgrest-docker-compose.tar.gz

Launch and shutdown the database

Because the database is made as a Docker container with Docker Compose, it can be operated by docker-compose, as follows:

Launch the database

% cd (the directory extracted from the compressed archive)
% docker-compose up -d --build

Shutdown the database

% cd (the directory extracted from the compressed archive)
% docker-compose down

In the initial state, the superuser of the database has the name pgrest and its password is configured as the variable DB_PASSWORD in .env file. Therefore, it is strongly recommended to change the password from the viewpoint of security, as follows:

  1. Launch the database, if it is not running.

  2. Launch psql (the interactive terminal of PostgreSQL) and connect to the database (db) as the superuser (pgrest), as follows.
% docker exec -it $(docker-compose ps -q db) psql -U pgrest db

The above command will print the following outputs:

psql (13.2 (Debian 13.2-1.pgdg100+1))
Type "help" for help.

db=# 
  1. Change the password Then, run the command \password as follows:
db=# \password

The command requires the new password to be set (twice), as follows:

Enter new password: 
Enter it again: 
db=# 
  1. Exit psql
db=# \q
  1. Shutdown the database.
% docker-compose down
  1. Modify .env file to bind DB_PASSWORD to the new password.

  2. Launch the database, if necessary.
% docker-compose up -d --build

Query the database

In order to make a query to the database, we first need to know the name of the table in the database, and then able to make a query with the name, as follows:

  1. Obtain the name of the table

The following command will show the name of the table in the database:

% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "\dt"
  1. Query the table

There are two methods for querying the database: psql and PostgREST:

Method 1: psql (the interactive terminal of PostgreSQL)

For example, the following command show the row of the molecule whose PubChem CID is 962 in the table b3lyp_2017:

% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "SELECT * FROM b3lyp_2017 WHERE cid = 962"

The following command show the rows of the molecules whose PubChem CIDs are between 950 and 1000:

% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "SELECT * FROM b3lyp_2017 WHERE cid BETWEEN 950 AND 1000"

The following command show only CIDs and HOMO-LUMO gaps of the molecules whose PubChem CIDs are between 950 and 1000:

% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "SELECT cid,data->'properties'->'energy'->'alpha'->'gap' FROM b3lyp_2017 WHERE cid BETWEEN 950 AND 1000"

Further information on querying the database with SQL is available from, for example, the following links to the manual of PostgreSQL: * https://www.postgresql.org/docs/current/tutorial-select.html * https://www.postgresql.org/docs/current/queries.html

Method 2: PostgREST (a RESTful API frontend server for PostgreSQL databases)

PostgREST is an HTTP server that talks to a PostgreSQL database and provides RESTful API to users for accessing the database. That is, we are able to query the database via PostgREST with any HTTP client. PostgREST is automatically launched when the database is launched with docker-compose up -d --build because the configuration file of docker-compose (docker-compose.yml) contains the settings for launching PostgREST on port 3000.

For example, the following command show the entry of the molecule whose PubChem CID is 962 in the table b3lyp_2017:

% curl 'http://localhost:3000/b3lyp_2017?cid=eq.962'

The following command show the entries of the molecules whose PubChem CIDs are between 950 and 1000:

% curl 'http://localhost:3000/b3lyp_2017?and=(cid.gte.950,cid.lte.1000)'

The following command show only CIDs and HOMO-LUMO gaps of the molecules whose PubChem CIDs are between 950 and 1000:

% curl 'http://localhost:3000/b3lyp_2017?select=cid,data->properties->energy->alpha->gap&and=(cid.gte.950,cid.lte.1000)'

Further information on querying the database with PostgREST is available from the following link to the manual of PostgREST: * https://postgrest.org/en/stable/api.html

Please note that querying the table may take significant amounts of time. Indexing the table (explained below) will help to improve query performance. This experimental PostgreSQL database for PubChemQC (JCIM2017) includes indexes for ‘properties’ -> ‘energy’ -> ‘alpha’ -> ‘gap’, ‘properties’ -> ‘energy’ -> ‘alpha’ -> ‘homo’, ‘properties’ -> ‘energy’ -> ‘alpha’ -> ‘lumo’, ‘properties’ -> ‘energy’ -> ‘beta’ -> ‘gap’, ‘properties’ -> ‘energy’ -> ‘beta’ -> ‘homo’, ‘properties’ -> ‘energy’ -> ‘beta’ -> ‘lumo’, and ‘properties’ -> ‘total dipole moment’.

For example, the following command was used to create an index on the HOMO-LUMO gaps (alpha) in the table b3lyp_2017:

% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "CREATE INDEX alpha_gap_index ON b3lyp_2017 (((data -> 'properties' -> 'energy' -> 'alpha' -> 'gap') :: NUMERIC)); "

With the above-created index, the following query, which shows the molecules whose HOMO-LUMO gaps are between 9.7 and 9.8, will take only a few seconds (it will take a lot of time without the index):

% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "SELECT cid,data->'properties'->'energy'->'alpha'->'gap' FROM b3lyp_2017 WHERE (data -> 'properties' -> 'energy' -> 'alpha' -> 'gap') :: NUMERIC BETWEEN 9.7 and 9.8"

Please be careful that creating indexes may take a lot of time and an extra disk space.

Further information on creating indexes is available from the following link to the manual of PostgreSQL: * https://www.postgresql.org/docs/current/indexes.html

This work is done by T. Maeda and M. Nakata.
maho.nakata@gmail.com