Showing posts with label trombine. Show all posts

Thursday, June 5, 2008

Docking validation study: a classic example, thrombin

The figure on the left shows a docking study of more than 200 molecules with known activity against thrombin. The protein is a well-known target ....

We extracted the binding data from the BindingDB database and docked all the molecules onto a single 3D structure (2cn0 from the PDB), one of a few available.

The figure summarizes the results graphically. The calculated and measured activities are well correlated: strong binders are indeed identified as strong binders (bottom-left part of the graph). The accuracy of the predictions is quite good (see our discussion on the quality of the biological data here and here).

The results of the calculations can be conveniently summarized as a confusion matrix. Normally, a first screen of novel compounds is performed at a fixed concentration to separate active from inactive compounds. Let's take a standard activity of 1 μM (about -35 kJ/mol) as the separation cut-off. The confusion matrix then has the following elements:
  • Experimentally active, predicted active: 29 molecules (true positives)
  • Experimentally inactive, predicted active: 15 molecules (false positives)
  • Experimentally active, predicted inactive: 8 molecules (false negatives)
  • Experimentally inactive, predicted inactive: 156 molecules (true negatives)
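
The four counts above can be turned into the usual screening statistics. The short Python sketch below is only an illustration, not the code used in the study: the function names are ours, and the temperature of 298.15 K used to convert the 1 μM cut-off into a free energy is an assumption (it gives roughly -34 kJ/mol, in line with the approximate value quoted above).

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def kd_to_dg(kd_molar, temperature=298.15):
    """Binding free energy (kJ/mol) for a dissociation constant in mol/L.

    Uses dG = RT ln(Kd); the default temperature is an assumption.
    """
    return R_KJ * temperature * math.log(kd_molar)


def screen_metrics(tp, fp, fn, tn):
    """Standard statistics derived from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "n": total,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # fraction of true actives recovered
        "specificity": tn / (tn + fp),   # fraction of inactives rejected
        "precision": tp / (tp + fp),     # fraction of predicted actives that are real
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }


# Cut-off used in the post: 1 microM.
cutoff_dg = kd_to_dg(1e-6)

# Counts taken from the list above.
metrics = screen_metrics(tp=29, fp=15, fn=8, tn=156)

print(f"cut-off: {cutoff_dg:.1f} kJ/mol")
for name, value in metrics.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

With these counts, about 78% of the experimentally active molecules are recovered (29 of 37), and about two thirds of the molecules predicted active are truly active (29 of 44).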

Tuesday, December 18, 2007

How good are biological data - II: Thrombin, GSK, GPCR

Many binding affinity prediction methods, such as scoring functions and QSAR models, rely on the availability of accurate binding constants. The figure on the left is the output of our SDF parser applied to thrombin (blue) and GSK (yellow) binding data from the BindingDB database. The parser is written in Python and uses Pybel to extract unique molecules from a given multi-molecule SDF file.
The parser not only finds identical compounds (in the Tanimoto-similarity sense), but also prints the binding constants from the SDF records. The graph shows the correlation between the negative logarithms of the binding constants reported for the same molecule in different entries (sources).
The result is in fact rather striking (blue points): the discrepancies ("errors") are quite large and are especially pronounced for good (or rather, very good) binders.
The yellow points show the result of the same script applied to the GSK kinase activity data. Although the total number of molecules in BindingDB is much larger, almost all of them are unique. The disagreement between sources is smaller than for thrombin.
The figure on the right is the visualized script output for a GPCR (5-HT2B) from the PDSP Ki database. The situation is roughly the same: the accuracy of a typical biological experiment reported in the literature is roughly one pKd unit.
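
The comparison step described above (find entries for the same molecule, then pair up the values reported by different sources) can be sketched in plain Python. This is not the original parser: fingerprints are represented here as sets of "on" bit positions, the records are made-up toy values rather than BindingDB data, and all function and field names are hypothetical. In the real script the fingerprints and binding constants would come from the SDF records read with Pybel. Note also that identical fingerprints (Tanimoto = 1) do not strictly guarantee identical molecules.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = bits_a | bits_b
    if not union:
        return 0.0
    return len(bits_a & bits_b) / len(union)


def group_duplicates(records, threshold=1.0):
    """Group records whose fingerprint matches a group representative."""
    groups = []  # list of (representative bits, member records)
    for rec in records:
        for rep_bits, members in groups:
            if tanimoto(rec["bits"], rep_bits) >= threshold:
                members.append(rec)
                break
        else:
            groups.append((rec["bits"], [rec]))
    return [members for _, members in groups]


def pk_pairs(groups):
    """Pairs of pK values reported for the same molecule by different sources."""
    pairs = []
    for members in groups:
        for a, b in combinations(members, 2):
            if a["source"] != b["source"]:
                pairs.append((a["pk"], b["pk"]))
    return pairs


# Made-up toy records for illustration only (NOT real binding data).
records = [
    {"name": "mol_A", "source": "ref_1", "bits": frozenset({1, 5, 9}), "pk": 8.9},
    {"name": "mol_A", "source": "ref_2", "bits": frozenset({1, 5, 9}), "pk": 7.8},
    {"name": "mol_B", "source": "ref_1", "bits": frozenset({2, 5, 7}), "pk": 6.1},
    {"name": "mol_B", "source": "ref_3", "bits": frozenset({2, 5, 7}), "pk": 6.4},
    {"name": "mol_C", "source": "ref_2", "bits": frozenset({3, 4}), "pk": 5.0},
]

groups = group_duplicates(records)
pairs = pk_pairs(groups)
mean_abs_diff = sum(abs(x - y) for x, y in pairs) / len(pairs)

print(f"{len(groups)} unique molecules, {len(pairs)} cross-source pairs")
print(f"mean |delta pK| between sources: {mean_abs_diff:.2f}")
```

Plotting the pairs against each other gives the kind of scatter shown in the figures; the mean absolute difference is one simple way to put a number on the spread between sources.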

This result, together with the previously reported correlation for the hERG ion channel, should be kept in mind whenever the results of binding affinity calculations are compared with experimental data.