This directory contains the experimental PostgreSQL database that hold all the JSON files in the datasets. The following is a brief tutorial on how to use the database. 2021/7/20, T. Maeda
Because the database file is packed as tar.gz
, we need to unpack it first, as follows:
% tar xvzf (the name of the database)-postgrest-docker-compose.tar.gz
Because the database is made as a Docker container with Docker Compose, it can be operated by docker-compose
, as follows:
% cd (the directory extracted from the compressed archive)
% docker-compose up -d --build
% cd (the directory extracted from the compressed archive)
% docker-compose down
In the initial state, the superuser of the database has the name pgrest
and its password is configured as the variable DB_PASSWORD
in .env
file. Therefore, it is strongly recommended to change the password from the viewpoint of security, as follows:
Launch the database, if it is not running.
psql
(the interactive terminal of PostgreSQL) and connect to the database (db
) as the superuser (pgrest
), as follows.% docker exec -it $(docker-compose ps -q db) psql -U pgrest db
The above command will print the following outputs:
psql (13.2 (Debian 13.2-1.pgdg100+1))
Type "help" for help.
db=#
\password
as follows:db=# \password
The command requires the new password to be set (twice), as follows:
Enter new password:
Enter it again:
db=#
psql
db=# \q
% docker-compose down
Modify .env
file to bind DB_PASSWORD
to the new password.
% docker-compose up -d --build
In order to make a query to the database, we first need to know the name of the table in the database, and then able to make a query with the name, as follows:
The following command will show the name of the table in the database:
% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "\dt"
There are two methods for querying the database: psql
and PostgREST:
psql
(the interactive terminal of PostgreSQL)For example, the following command show the row of the molecule whose PubChem CID is 962 in the table b3lyp_2017
:
% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "SELECT * FROM b3lyp_2017 WHERE cid = 962"
The following command show the rows of the molecules whose PubChem CIDs are between 950 and 1000:
% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "SELECT * FROM b3lyp_2017 WHERE cid BETWEEN 950 AND 1000"
The following command show only CIDs and HOMO-LUMO gaps of the molecules whose PubChem CIDs are between 950 and 1000:
% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "SELECT cid,data->'properties'->'energy'->'alpha'->'gap' FROM b3lyp_2017 WHERE cid BETWEEN 950 AND 1000"
Further information on querying the database with SQL is available from, for example, the following links to the manual of PostgreSQL: * https://www.postgresql.org/docs/current/tutorial-select.html * https://www.postgresql.org/docs/current/queries.html
PostgREST is an HTTP server that talks to a PostgreSQL database and provides RESTful API to users for accessing the database. That is, we are able to query the database via PostgREST with any HTTP client. PostgREST is automatically launched when the database is launched with docker-compose up -d --build
because the configuration file of docker-compose
(docker-compose.yml
) contains the settings for launching PostgREST on port 3000.
For example, the following command show the entry of the molecule whose PubChem CID is 962 in the table b3lyp_2017
:
% curl 'http://localhost:3000/b3lyp_2017?cid=eq.962'
The following command show the entries of the molecules whose PubChem CIDs are between 950 and 1000:
% curl 'http://localhost:3000/b3lyp_2017?and=(cid.gte.950,cid.lte.1000)'
The following command show only CIDs and HOMO-LUMO gaps of the molecules whose PubChem CIDs are between 950 and 1000:
% curl 'http://localhost:3000/b3lyp_2017?select=cid,data->properties->energy->alpha->gap&and=(cid.gte.950,cid.lte.1000)'
Further information on querying the database with PostgREST is available from the following link to the manual of PostgREST: * https://postgrest.org/en/stable/api.html
Please note that querying the table may take significant amounts of time. Indexing the table (explained below) will help to improve query performance. This experimental PostgreSQL database for PubChemQC (JCIM2017) includes indexes for ‘properties’ -> ‘energy’ -> ‘alpha’ -> ‘gap’, ‘properties’ -> ‘energy’ -> ‘alpha’ -> ‘homo’, ‘properties’ -> ‘energy’ -> ‘alpha’ -> ‘lumo’, ‘properties’ -> ‘energy’ -> ‘beta’ -> ‘gap’, ‘properties’ -> ‘energy’ -> ‘beta’ -> ‘homo’, ‘properties’ -> ‘energy’ -> ‘beta’ -> ‘lumo’, and ‘properties’ -> ‘total dipole moment’.
For example, the following command was used to create an index on the HOMO-LUMO gaps (alpha) in the table b3lyp_2017
:
% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "CREATE INDEX alpha_gap_index ON b3lyp_2017 (((data -> 'properties' -> 'energy' -> 'alpha' -> 'gap') :: NUMERIC)); "
With the above-created index, the following query, which shows the molecules whose HOMO-LUMO gaps are between 9.7 and 9.8, will take only a few seconds (it will take a lot of time without the index):
% docker exec -i $(docker-compose ps -q db) psql -U pgrest db -c "SELECT cid,data->'properties'->'energy'->'alpha'->'gap' FROM b3lyp_2017 WHERE (data -> 'properties' -> 'energy' -> 'alpha' -> 'gap') :: NUMERIC BETWEEN 9.7 and 9.8"
Please be careful that creating indexes may take a lot of time and an extra disk space.
Further information on creating indexes is available from the following link to the manual of PostgreSQL: * https://www.postgresql.org/docs/current/indexes.html
This work is done by T. Maeda and M. Nakata.