1996 Volume 11 Issue 3 Pages 440-450
We have developed PACADE, a deductive database system for analyzing the three-dimensional and secondary structures of proteins. PACADE consists of an inference system and a relational database that stores data on primary, secondary and tertiary structure converted from Brookhaven's Protein Data Bank (PDB). In addition to the usual searches for protein substructures, PACADE provides a similarity search function based on similarity among substructures. When a specific substructure is given as a closed query, the system returns the set of substructures similar to it. This operation yields direct similarity relationships, so a closure of indirect similarity relationships can be computed by repeating it. The resulting closure is a cluster of substructures that contains the substructure given in the query and whose members are linked to one another by indirect similarity relationships. Comparing this clustering with a clustering of proteins based on biological function may lead to the discovery of structures related to the biological functions of proteins. We show here that, by iterating a similarity search with a closed query, PACADE can automatically compute a closure of indirectly similar structures that includes the one in the query. The iterative similarity search algorithm computes such a closure quickly because: 1) It is not a combinatorial computation. 2) It performs differential evaluation analogous to the semi-naive evaluation used in the bottom-up evaluator of a deductive database system. 3) It employs the Magic Sets transformation, which exploits the constant bindings in a closed query. Finally, we report the results of experiments in which a closure was computed from the secondary structure data stored in PACADE. The elapsed time for computing a closure increased linearly with the size of the search space, which is a desirable property given that PDB data are growing exponentially.
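
The iterative similarity search described in the abstract can be sketched as below. This is only an illustration under stated assumptions, not PACADE's actual implementation: the names `similarity_closure`, `similar_to`, `DIRECT` and `lookup` are hypothetical, and the toy similarity table stands in for the database's similarity search. The sketch shows the differential step, in which each round queries only the substructures found in the previous round, and it starts from the single constant given in the closed query, which is the spirit of restricting the search to facts relevant to that binding.

```python
from typing import Callable, Dict, Hashable, Iterable, Set

def similarity_closure(
    seed: Hashable,
    similar_to: Callable[[Hashable], Iterable[Hashable]],
) -> Set[Hashable]:
    """Return every substructure reachable from `seed` via direct similarity."""
    closure = {seed}   # everything found so far
    delta = {seed}     # only what was newly found in the last round
    while delta:
        new: Set[Hashable] = set()
        for s in delta:                 # differential step: query new items only
            for t in similar_to(s):
                if t not in closure:
                    new.add(t)
        closure |= new
        delta = new                     # stop when a round finds nothing new
    return closure

# Hypothetical direct-similarity table (assumption, not PDB data).
DIRECT: Dict[str, Set[str]] = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b"},
    "d": {"e"},
    "e": {"d"},
}

calls = []  # records which substructures were actually queried

def lookup(s: str) -> Set[str]:
    """Stand-in for one similarity search with a closed query on `s`."""
    calls.append(s)
    return DIRECT.get(s, set())

cluster = similarity_closure("a", lookup)
```

In this toy run the cluster for `"a"` contains `"a"`, `"b"` and `"c"`, while `"d"` and `"e"` are never queried. Each member of the cluster is queried exactly once, so the number of similarity searches grows with the size of the cluster rather than with the number of pairs of substructures.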