Friday, December 7, 2007

How good are biological experiments? HERG binding data analysis


A correlation between predicted and experimentally measured values of biological activity is a natural measure of model quality. For instance, QUANTUM docking software calculates binding free energies, which are directly comparable with the experimental values of -pKd (the logarithm of the binding constant, Kd). The root mean squared error between the measured and the calculated quantities is a quantitative measure of the software's performance.
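As a minimal sketch of these two measures, the correlation and the root mean squared error can be computed as below. The function names and the pKd values are made up for illustration; they are not QUANTUM output and do not come from any dataset.

```python
import math

def rmse(predicted, measured):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))

def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / float(n)
    mean_m = sum(measured) / float(n)
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)

# Made-up pKd values, for illustration only:
# every prediction is offset from the measurement by half a unit.
predicted_pkd = [5.0, 6.0, 7.0, 8.0]
measured_pkd = [5.5, 6.5, 7.5, 8.5]

print(rmse(predicted_pkd, measured_pkd))
print(pearson_r(predicted_pkd, measured_pkd))
```

In this toy case the correlation is perfect while the error is half a pKd unit, which is why it helps to look at both numbers together.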
Whatever correlation is presented to support the validity of a model, another important issue is the quality of the experimental data itself. The reported values of binding constants (or activities) often vary because of different measurement strategies, experimental errors or interpretation uncertainties. To visualize the situation we investigated a few datasets for HERG binding taken from the QSAR World website.
The downloaded files were saved in a source folder and processed with the following simple Python script (thanks to Open Babel):
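import os
from pybel import readfile  # with Open Babel 3.x: from openbabel.pybel import readfile

files = os.listdir('source/')
molecules = []  # unique molecules collected so far
for file in files:
    molfile = readfile("sdf", 'source/' + file)
    for mol in molfile:
        molfp = mol.calcfp()
        present = False
        for savedmol in molecules:
            savedmolfp = savedmol.calcfp()
            # "|" on two pybel fingerprints gives their Tanimoto coefficient
            if (molfp | savedmolfp) == 1:
                present = True
                # report the duplicate pair together with the SD data fields
                print("%s %s" % (mol.data, savedmol.data))
        if not present:
            molecules.append(mol)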
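A Tanimoto coefficient of 1 only says that the two fingerprints are identical, which does not guarantee that the structures are. A stricter test could also compare InChI strings. The sketch below is not part of the script above; the helper names are hypothetical, and it assumes pybel Molecule objects and an Open Babel build with InChI support.

```python
def inchi_of(mol):
    """Return the InChI string of a pybel Molecule.

    Assumes the Open Babel build supports the "inchi" output format.
    """
    return mol.write("inchi").strip()

def same_structure(mol_a, mol_b):
    """Stricter identity test: identical fingerprints AND identical InChI."""
    # Cheap pre-filter: the Tanimoto coefficient must be exactly 1
    if (mol_a.calcfp() | mol_b.calcfp()) != 1:
        return False
    # Fingerprints can coincide for different structures, so confirm with InChI
    return inchi_of(mol_a) == inchi_of(mol_b)
```

In the loop above, `same_structure(mol, savedmol)` would then take the place of the bare fingerprint comparison.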

The results were analyzed in a spreadsheet program and are represented on the graph above. A lot of molecules occur multiple times in the datasets. While in many cases the activities coincide to within 0.01 (which most probably indicates citation from a single source), the remaining values, though correlated with each other, differ by roughly a single pKd unit.
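The spreadsheet step can also be sketched in Python. This is not the analysis actually used for the graph. It is only an illustration of the idea: group the repeated measurements by molecule and look at the spread within each group. The function name, the identifiers and the pKd values are made up.

```python
from collections import defaultdict

def activity_spread(records, tolerance=0.01):
    """Group (molecule_id, pKd) pairs and report the spread per molecule.

    Returns {molecule_id: (spread, probably_same_source)} for molecules that
    occur more than once; spread is max minus min of the reported values.
    """
    by_molecule = defaultdict(list)
    for molecule_id, pkd in records:
        by_molecule[molecule_id].append(pkd)

    result = {}
    for molecule_id, values in by_molecule.items():
        if len(values) < 2:
            continue  # a single measurement says nothing about reproducibility
        spread = max(values) - min(values)
        result[molecule_id] = (spread, spread <= tolerance)
    return result

# Made-up identifiers and values, for illustration only
records = [
    ("mol_a", 6.25), ("mol_a", 6.25),  # identical values: likely one source
    ("mol_b", 5.0), ("mol_b", 6.0),    # independent measurements that differ
    ("mol_c", 7.5),                    # measured only once, so skipped
]
summary = activity_spread(records)
for molecule_id in sorted(summary):
    print(molecule_id, summary[molecule_id])
```

Splitting the groups by the 0.01 tolerance separates values that were probably copied from one source from those that reflect independent measurements.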



2 comments:

Noel O'Boyle said...

Great to see you using pybel. Just remember that identity in terms of fingerprint does not guarantee that it's the same molecule. You might also want to calculate the InChI and ensure they are the same.

Peter Fedichev (Quantum CTO) said...

Thank you, there are indeed quite a few cases where fingerprint identity does not translate into actual structural identity. We caught a whole library of such cases with Open Babel when making http://leadfinding.com.

Thank you for your great blog as well!