Tuesday, December 18, 2007

How good are biological data - II: Thrombin, GSK, GPCR

Many binding-affinity prediction methods, such as scoring functions and QSAR models, rely on the availability of accurate binding constants. The figure on the left shows the result of our SDF-file parser applied to thrombin (blue) and GSK (yellow) binding data from the BindingDB database. The parser is written in Python and uses pybel to extract unique molecules from a given multi-molecule SDF file.
Besides finding identical compounds (in the Tanimoto-similarity sense), the parser also prints the binding constants stored in the SDF records. The graph shows how well the reported negative logarithms of the binding constants (pK = -log10 K) for the same molecule agree across different entries (sources).
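The parser itself is not shown in the post, so the following is only a minimal sketch of the grouping step under stated assumptions. Fingerprints are modeled as plain Python sets of on-bit indices, and each record is assumed to carry a single binding constant in nM; the record layout and the helper names (`tanimoto`, `p_value`, `group_identical`, `paired_affinities`) are hypothetical. With pybel, such sets could presumably be filled from `mol.calcfp().bits` for molecules read with `pybel.readfile("sdf", path)`, and the constant from an SD tag in `mol.data`, whose name depends on the source file.

```python
import math
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def p_value(k_nm):
    """Convert a binding constant in nM to -log10(K / M), i.e. pKi or pKd."""
    return 9.0 - math.log10(k_nm)


def group_identical(records, threshold=1.0):
    """Greedily group records whose fingerprint reaches the Tanimoto threshold
    against the first member (representative) of an existing group."""
    groups = []
    for rec in records:
        for group in groups:
            if tanimoto(rec["fp"], group[0]["fp"]) >= threshold:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


def paired_affinities(records):
    """Return (pK_a, pK_b) for every pair of entries describing the same compound."""
    pairs = []
    for group in group_identical(records):
        for a, b in combinations(group, 2):
            pairs.append((p_value(a["k_nm"]), p_value(b["k_nm"])))
    return pairs


# Hypothetical toy records, not real BindingDB data.
toy = [
    {"id": "entry1", "fp": {1, 5, 9}, "k_nm": 10.0},
    {"id": "entry2", "fp": {1, 5, 9}, "k_nm": 100.0},
    {"id": "entry3", "fp": {2, 7}, "k_nm": 1.0},
]
print(paired_affinities(toy))  # [(8.0, 7.0)]
```

Each pair is one point on a correlation plot like the ones described here. Two caveats: a Tanimoto coefficient of 1.0 between fingerprints does not strictly guarantee identical structures, since different molecules can share a fingerprint, and real SD tags may hold qualified values such as ">10000" that would need parsing before conversion.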
The result (blue points) is quite striking: the discrepancies, or "errors", are large and are especially pronounced for good, or rather very good, binders.
The yellow points show the output of the same script for GSK kinase activity data. Although BindingDB contains many more GSK compounds than thrombin compounds, almost all of them are unique, so few appear in more than one entry. The discrepancies between sources are also smaller than for thrombin.
The figure on the right visualizes the script output for a GPCR (the 5-HT2B receptor) from the PDSP Ki database. The picture is much the same: the accuracy of a typical biological experiment reported in the literature is roughly one pKd unit.
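One way to put a number on "roughly one pKd unit" is to summarize the differences between paired measurements of the same compound. The sketch below is an illustration, not the author's analysis: the function name `discrepancy_stats` is hypothetical and the example pairs are made-up values, not data from BindingDB or PDSP.

```python
import math


def discrepancy_stats(pairs):
    """Summarize disagreement between paired pK measurements of the same compound."""
    diffs = [a - b for a, b in pairs]
    n = len(diffs)
    # RMSD and mean absolute difference are symmetric in a and b,
    # so the arbitrary order of entries within a pair does not matter.
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)
    mean_abs = sum(abs(d) for d in diffs) / n
    within_one = sum(1 for d in diffs if abs(d) <= 1.0) / n
    return {"n": n, "rmsd": rmsd, "mean_abs": mean_abs, "within_one_unit": within_one}


# Made-up (pK_a, pK_b) pairs for illustration only.
example = [(8.0, 7.0), (6.5, 6.5), (9.0, 7.0), (5.0, 5.5)]
print(discrepancy_stats(example))
```

A signed mean difference is deliberately left out: the two entries in a pair have no natural order, so its sign would carry no information.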

This result, together with the previously reported correlation for the hERG ion channel, should serve as a reference point whenever calculated binding affinities are compared with experimental data.
