Drug Discovery with Quantum Pharmaceuticals: IC50

Showing posts with label IC50. Show all posts

Friday, January 23, 2009

Solvation energy of a large atom cluster: continuous solvation energy test - II

As it has been already stated here, binding energy calculation of a small molecule to a large protein poses a difficult problem: a method for molecular electrostatic energy calculation should work well both for the protein ligand complex, the protein and the ligand at infinite separation. The protein and the complex are large molecules, whereas the ligand is, by definition, small.

Not every computational approach for the solvation energy calculation is fit for the job though. To elucidate the nature of the problems at hand we performed the following model calculation:
- we prepared a spherical "protein" of a large (but realistic) radius
- we placed a single-atom ligand with a charge at a given distance from the "protein" center (see the Figure)
- we calculated the solvation energy of the system as a function of the ligand distance both when the protein is neutral and charged (in the latter case the protein charge is opposite to that of the "ligand")

We used four different methods for the electrostatic contribution to the solvation energy calculation: Poisson equation solver (in its surface electrostatic incarnation, blue) QUANTUM MGB (cyan) and the two "classic" GB methods, based on the Coulomb approximation: HCT (magenta) and AGBNP (yellow).

The first Figure, corresponding to an overall electrically neutral cluster, shows absolute deficiency of HCT approach deep enough inside the "protein". The problem is caused by unrealistic assumptions with regard to the overlap integrals calculations is occurs pretty frequently in realistic proteins. AGBNP method represents one of the latest GB approaches and is practically free of these difficulties. However, AGBNP is based on Coulomb approximation and thus fails to recover correct behavior of the solvation energy close to the "protein" boundary: AGBNP energy is off by a large number from both QMGB and the exact solution. QMGB and Poisson solutions agree very well everywhere!

The last Figure shows the same calculation for a charged model "protein-ligand" complex. Once again, HCT fails entirely, AGBNP does not work properly at the "protein" boundary and both Poisson solver and QMGB agree very well, though QMGB is about one order of magnitude faster than the Poisson solver!

Practically all this means that QMGB represents a fast and accurate approximation to the Poisson equation solution. QMGB approach does not rely on Coulomb approximation and is shown to work both for small molecules and large molecular clusters involving molecules of very different sizes. Therefore, with QMGB one can find a single transferable set of GB parameters capable of describing large and small molecules on the same footing.

Tuesday, January 13, 2009

Self-consistent solvation energy contribution calculation for protein-ligand complexes

Solvation energy is a major contribution to a ligand binding energy and is the interaction pretty much responsible for binding selectivity. Actual calculation of the solvation energy requires a method valid both for small molecule ligands and large proteins (and protein-ligand complexes).

Calculation of the electrostatic contributions for the binding energies in a continuous solvation energy approach imposes different problems for large and small molecules. Normally people use some kind of Generalized Born (GB) approximation. The latter is only exact for a charge in the center of a spherical cavity and thus can only be valid for a small molecule with most of the charge located within a few atoms.

If the molecule of interest is large, most of the charges are close to the molecular surface, instead. GB approximation in its most commonly accepted form fails next to a molecular surface: the Born radius is missed by a factor of 2. This means that there can be no "classic" GB model working good both for small and large molecules!

Binding affinity calculation requires calculation of differences between the energy of the complex (a large molecule) and the energies of the protein (another large molecule) and the ligand (a small molecule) at infinite separation.

If a GB model is made working by careful adjusting of "bare" Born radii to fit experimental IC50 of complexes, a good sanity check would require reproduction of experimentally known solvation energies of small molecules and ions (and the other way around). The two graphs in this post show, that this is indeed possible. A relatively large error in the small molecules solvation energies shows that although the resulting model is reasonable, the obtained GB parameters are only quantitatively transferable between large and small molecules.

Wednesday, October 8, 2008

Making a good water model: Molecules do conformationally change when cross from a gas to water solution

Solvation energy calculation is absolutely crucial for a successful binding free energy (IC50) determination. Quantum Pharmaceuticals develops aqueous solvation models and tests them against available experimental data to validate the theoretical approaches.

The graph on the left represents two types of solvation energy calculations compared with experiments. The first series (small circles) are the energy differences on solvation for a set of molecules without conformational changes taken into account. The second set (large squares) is obtained after a single optimization run.

The correlation with the experiment clearly improves after conformational changes calculations. Apparently this does not only mean that the model is good, it also means that the molecules do change structure when inserted into water from the gas phase.

Thursday, September 25, 2008

What's an ultimate value of reversible drug binding constant?

Traditional opinion is that a good drug must have a high value of the absolute meaning of the binding energy with target protein in order to prevent the thermal dissociation of the drug-protein complex. In this case an essential deformation of protein arises, which has to be taken into account in developing different models of protein-small molecule and protein-protein interaction, and computing affinity constants in drug discovery in-silico methods. The effect of essential perturbation of protein molecule is ignored in standard computational methods of drug design that can contribute a large mistake to results of calculation, to binding energy, for example.
To demonstrate the existence of the ultimate value of the binding energy two models are considered: macroscopic and microscopic, both giving the same conclusions: the critical value of absolute meaning of binding energy is 50-100kJ/M. If the binding energy exceeds this value, then drug essentially perturbs protein configuration. In a microscopic picture this perturbation is a sequence of irreversible conformational transitions in protein body. In a macroscopic one it is an inelastic deformation of a protein substance. Our estimation agrees with the experimental value (50 kJ /M) of the ultimate energy that can be stored in a protein molecule without its destruction.
The existence of the critical value of binding energy should be accounted in structure based drug design methods where protein molecule is considered in an elastic deformation approximation.

Reference: accepted in Russian Journal of Biophysics, 2008

Monday, August 18, 2008

Ferro-electric phase transition in a polar liquid and the nature of lambda-transition in supercooled water

Water is a major and all-important example of a strongly interacting polar liquid. Dielectric properties of water surrounding nano-scale objects pose a fundamentally important problem in physics, chemistry, structural biology and in silica drug design. The issue of temperature dependence of dielectric constant, the role of fluctuations and a possibility of a ferro-electric phase transition in a polar liquid is fairly old . It attracted a new attention when a new phase transition (so called lambda-transition) was observed in supercooled water at critical temperatures between T_{c}=228K and T_{c}=231.4K . Isothermal compressibility, density, diffusion coefficient, viscosity and static dielectric constant \epsilon and other quantities diverge as T_{c} is approached, which is signature of a second order phase transition. The singularity of \epsilon is a feature of a ferro-electric transition . However, given a complexity of interactions between water molecules, the physical picture behind this phenomenon is not entirely clear . In the phase transition is explained as a formation of a rigid network of hydrogen bonds. On the other hand a ferro-electric hypothesis was also proposed and supported by molecular-dynamics simulations (MD). For example, a ferro-electric liquid phase was observed in a model of the so called ``soft spheres'' with static dipole moments . In fact, the existence of a ferro-electric phase appears to be model independent: domains where formed both in MD calculations with hard spheres with point dipoles and with soft spheres with extended dipoles .

In the our latest publication, Ferro-electric phase transition in a polar liquid and the nature of lambda-transition in supercooled water, we develop two related approaches to calculate free energy of a polar liquid. We show that long range nature of dipole interactions between the molecules leads to para-electric state instability at sufficiently low temperatures and to a second-order phase transition. We establish the transition temperature, T_{c}, both within mean field and ring diagrams approximation and demonstrate that the ferro-electric transition is a sound physical explanation behind the experimentally observed \lambda-transition in supercooled water. Finally we discuss dielectric properties, the role of fluctuations and establish connections with earlier phenomenological models of polar liquids.

Reference: arXiv:0808.0991 [ps, pdf, other]

Title: Ferro-electric phase transition in a polar liquid and the nature of \lambda-transition in supercooled water

Authors: P.O. Fedichev, L.I. Menshikov

Comments: 4 pages, 1 eps figure

Subjects: Statistical Mechanics (cond-mat.stat-mech); Soft Condensed Matter (cond-mat.soft)

Friday, June 6, 2008

Docking validation study: PDK1-kinase (oncology)

Following the classic thrombine study, we catch up with a more important target: PDK1 kinase.

Pyruvate dehydrogenase kinase, isozyme 1, also known as PDK1, is a human gene.It codes for an isozyme of pyruvate dehydrogenase kinase (PDK).Pyruvate dehydrogenase (PDH) is a part of a mitochondrial multienzyme complex that catalyzes the oxidative decarboxylation of pyruvate and is one of the major enzymes responsible for the regulation of homeostasis of carbohydrate fuels in mammals. The enzymatic activity is regulated by a phosphorylation/dephosphorylation cycle. Phosphorylation of PDH by a specific pyruvate dehydrogenase kinase (PDK) results in inactivation.

There are no as much known inhibitors as for thrombine. BindingDB gives a few more than 70 compounds with measured binding affinities, all relatively strong binders, many of them similar to each other. We run our QUANTUM software to perform docking and the affinity calculations. The results are represented on the graph and demonstrate a solid correlation. In fact the correlation shows QUANTUM's ability to identify strong binders and distinguish between similar compounds (selectivity).

Thursday, June 5, 2008

Docking validation study: classic example, thrombine

The Figure on the left represents a docking study of more than 200 molecules with known activity on thrombin. The protein is a well known target ....

We have extracted the binding data from the BindingDB database and docked all the molecules onto a single (of a few available) 3D structure (2cn0 from the pdb databank).

The figure represents graphically the results of the research. The calculated and the measured activities are well correlated. Strong binders are indeed identified as strong binders (left bottom part of the graph). The accuracy of the predictions is quite good (see our discussion on the quality of the biological data here and here).

The results of the calculations can be conveniently summarized in terms of confidentiality matrix. Normally a first screen of novel compounds is performed at a certain concentration to distinguish between the active and non-active compounds. Let's take a standard, 1muM (~-35kJ/M) activity, as a separation cut-off. Then the confidence matrix has the following elements:

Experimentally active, Predicted active: 29 molecules
Experimentally n-active, Predicted active: 15 molecules (false positives)
Experimentally active, Predicted n-active: 8 molecules (false negatives)
Experimentally n-active, Predicted n-active: 156 molecules

Monday, May 26, 2008

Protein Flexibility and False Positives detection.

Standard hit identification procedure with QUANTUM software implies screening of a large compound library against a given protein target. An example of such procedure for a small set of compounds with known activities is discussed in another blog entry.

Let us show first that a calculation with flexible protein gives a reasonable prediction of the binding free energy. To do that we selected a set of ~200 protein - ligand complexes from the BindingDB database. The protein-ligand pairs were selected mainly so that the complex is small and therefore the whole calculation is fast. The results are represented on the Figure. The horizontal and the vertical axis represent the calculated and the experimental value of the binding free energy calculated from the complexed positions of the ligand within the protein.

The correlation is clearly there and in a few days I will show that the calculated values demonstrate not only the accuracy, but also a good selectivity.

The other Figure represents the correlation be

tween the results of rigid receptor fast docking procedure (horizontal axis) and the fully flexible binding free energies (vertical axis). Although the rigid protein force field has a decent correlation, it fails to recognize electrostatic clashes and thus leads to a fairly large amount of false positives among the predicted ligands. Only about 10% of all the ligands, all originally predicted in the muM range survives as binders. The trend is also clear: all the binding energy values increase (fully flexible force field gives less binders than the rigid calculation would suggest).

Wednesday, May 21, 2008

Computer aided drug design video from Quantum Pharma

Molecular modelling software of Quantum Pharmaceuticals is used to dock small molecule to active site of target protein. The molecular docking on flexible protein is explored. The Quantum docking software is available for free use at LeadFinding.com, the online hit-to-lead optimization service to filter and profile chemical compounds in chemical database of ChemDiv - organic chemistry supplier.

Friday, January 18, 2008

q-hERG: QUANTUM's innovative approach to hERG binding calculations is finally released

QUANTUM hERG (q-hERG) screening assays is a unique and innovative computational approach, which allows you to predict from a molecule structures of compounds their inhibition constants (IC50) for hERG channels.

q-hEARG features:

Output is pIC50 values (-logIC50) for the molecules. The accuracy of prediction is 1.1 pIC50 units;
No training sets or QSAR methods applied;
hERG inhibition prediction is made by docking of compound on Quantum Pharmaceuticals’ Proprietary Flexible 3D structure of hERG;
Docking is based on quantum and molecular physics (see Quantum Science Core for an overview);
Average correlation has RMSD=1.18 pIC50 unit, and correlation coefficient = 0.82;
Easy to use user interface, no special hardware requirements, both Linux/Windows supported;
You can also request services based on QUANTUM hERG Screening Assays.

q-hERG is an independent software module, sharing the user interface and basic usage concepts with our q-ADME: ADME/PK properties prediction software, q-Mol: physico-chemical properties calculator, and q-Tox: toxicological profiling software. More information, including q-hERG product booklet can be obtained from the Quantum Pharmaceuticals products site.

Obtaining Q-Albumin software:

Please review your licensing options, add Q-Albumin: QUANTUM Albumin Binding Prediction Software to your shopping card and checkout to get the download links.

And continue to CHECK OUT

Friday, January 11, 2008

HERG binding prediction quality: q-HERG model vs. experiments

Quantum Pharmaceuticals has recently completed development of its in-house HERG-protein binding model. Since there is no 3D structure of HERG-protein available, the calculations envolved a number of fits and model assumptions.

To see whether our data is notoverfitted, we compared the errors inour calculations with experimentaluncertanty of binding affinities for the same set of molecules. The graph on the left shows two sets of points: q-HERG model vs. experiment (red squares) and pIC50 values for the same molecules taken from different sources (green triangles, see our How good are biological experiments? HERG binding data analysis post for more details).

The two distributions are roughly of the same width, which, in a way, provides a sanity check for our HERG model.

Tuesday, December 18, 2007

How good are biological data - II: Trombine, GSK, GPCR

Many bindign affinity prediction methods, such as scores and QSAR models, rely on availability of accurate information on binding constants. The figure on the left is a result of our sdf-file parser applied to trombine (blue) and GSK (yellow) binding data from BindingDB database. The parser is written with python and uses pybel to extract unique molecules from a given multimolecular sdf.
The parser not only finds identical (in Tanimoto-similarity sense) compounds, but also prints the binding constants from the sdf records. The graph shows the correlation of the reported inverse log(binding constan

ts) for the same molecules from different entries (sources).
The result is in fact fairly impressive (the blue points): the discrepancies-"errors" are quite large and are especially profound for good (or better say very good) binders.
The yellow points represent the result of the same script over GSK-kinase activity data. Although the total number of molecules in BindDB is much larger, almost all of them are unique. The difference between different sources is not as much as for trombine.
The Figure on the right is the visualized script output for GPCR(5-HT2B) from PDSP Ki database. The situation is roughly the same: the accuracy of a typical biological experiment reported in a literature amounts roughly to a single unit of pKd.

This and previously reported correlation for HERG ion channel should serve as an example when the results of binding affinity calculations are compared to experimental data.

Friday, December 7, 2007

How good are biological experiments? HERG binding data analysis

A correlation between predicted and expermentally measured values of biological activity is a natural measure of a model quality. For instance, QUANTUM docking software calculates binding free energies, which are directly comparable with experimental values of -p(binding constant, Kd). Root mean squared error between the measured and the calculated quantities is the quantitative measure of the software performance.
Whatever the correlation is presented to prove the validity of a model, another important issue is the quality of the experimental data itself. The reported values for binding constants (or activities) often vary because of different measurement strategies, experimental errors or interpretation uncertanties. To visualize the situation we investigated a few datasets for HERG binding taken from QSAR World website.
The downloaded files were saved in source folder and processed with the following simple python script (thanks to openbabel):

files = os.listdir('source/')
molecules = []
for file in files:
molfile = readfile("sdf",'source/'+file)
for mol in molfile:
molfp = mol.calcfp()
present = 0
for savedmol in molecules:
savedmolfp = savedmol.calcfp()
if (molfp | savedmolfp == 1):
present = 1
print mol.data, savedmol.data
if (not present):
molecules.append(mol)

The results where analyzed in a spreadsheet program and represented on the graph above. A lot of molecules occur multiple times in the datasets. While in many of the cases the activities coinside up to 0.01 (which most probably indicates citing from a single source), the remaining values thouch correlated with each other, differ by roughly a single pKd unit.

Monday, September 3, 2007

QUANTUM and albumin binding calculations: the role of protein flexibility

Drug distribution within the body is determined mainly by free (unbound) concentration of drug in circulating plasma. The unbound fraction, in turn, depends on drug absorption by plasma proteins. Human Serum Albumin (HSA) is the most abundant blood plasma protein and is produced in the liver.

Binding of a compound to HSA results in an increased solubility in plasma, decreased toxicity, and /or protection against oxydation of the bound ligand. Binding can also have a significant impact on the pharmacokinetics of drugs, e.g. prolonging in vivo half life of the therapeutic agent. However too strong binding prevents drug release in tissues. That is why HSA binding information is one of the key characteristics of a compound determining its ADME properties. Successful in silico calculation of HSA binding could provide a structural basis of drug derivatives with altered HSA-binding properties.

HSA has at least two main drug binding sites characterized as Sudlow site I and Sudlow site II, which bind a number of drugs selectively. Site I, also known as the warfarin binding site, is formed by a pocket in subdomain IIA of HSA. Site II is located in subdomain IIIA and is known as the benzodiazepine binding site. Ibuprofen and diazepam are selectively bind to site II. Multiple active sites make HSA a complicated target for structure-based modeling.

The ligands with known HSA binding affinities were taken from and prepared with a set of built in QUANTUM molecular preparation and processing tools. Acenocoumarol, Acetylsalicilic_acid, Azapropazone, Benzylpenicillin, Benzylthiouracil, Bilirubin, Canrenoate, Carbamazepine, Carbenicillin, Chlorpropamide, Diphenylhydantoin, Furosemide, Indomethacin, Methyl_p-hydroxybenzoate, N-acetyl-L-tryptophan, Oxyphenbutazone, Phenobarbital, phenyl_salicylate, Phenylbutazone, Piretanide, Propyl_p-hydroxybenzoate, Quercetin, Salicylate, Sodium_benzoate, Spironolactone, Sulfadimethoxine, Sulfamethizole, Sulfathiazole, sulfisoxazole, Tenoxicam, Tolbutamide, Warfarin, Carprofen, Chlofibrate, Iopanoate, L-tryptophan were docked on to both of the sites. Three different structures – 2BXH, 2BXF and 2BXG were taken for docking. 2BXH was used for docking to the first binding site, 2BXF and 2BXG have different structures of the binding site II and we decided to use both structures for docking. Docking grid 20x20x20A were centered around a central ligand atom in appropriate binding site.

The results of the docking runs are summarized on the Figure. Both binding sites show considerable flexibility, molecular dynamics and the binding free energy calculations with flexible protein (large red points) lead to remarkable improvement of the predicted values of HSA binding affinities with respect to experiment (smaller blue points represent the results of docking on a rigid protein model). Since QUANTUM model does not have training parameters the presented correlation proves QUANTUM abilities to predict HSA binding constants of druglike compounds to both of the active sites.

Tuesday, May 15, 2007

QUANTUM free energy vs. statistical scoring functions

QUANTUM employs quantum mechanics, thermodynamics, and an advanced continuous water model for solvation effects to calculate ligands binding affinities. This approach differs dramatically from scoring functions that are commonly used for binding affinity predictions. By including the entropy and aqueous electrostatics contributions in to the calculations directly, QUANTUM algorithms produce much more accurate and robust values of binding free energies.
Interaction of a ligand with a protein is characterized by the value of binding free energy. The free energy (F) is the thermodynamic quantity, that is directly related to experimentally measurable value of inhibition constant (IC50) and depends on electrostatic, quantum, aqueous solvation forces, as well as on statistical properties of interacting molecules. There are two major contributing quantities leading to non-additivity in F: 1) the electrostatic and solvation energy, and 2) the entropy.
Most of popular scores employ a reasonable approximation for short-range quantum interactions, but do not perform a detailed calculation of aqueous electrostatics and entropy. Both the solvation energy and the entropy are difficult to compute: so instead of exact computations, scoring function use an approximation. In this approximation, the contributions of non-additve properties are estimated as fractions of easier-to-calculate pairwise interactions: electrostatics and van der Waals forces.
Such approximation works to a some extent because vacuum electrostatics is nearly canceled by solvation energies; at the same time the enthalpy of binding is approximately compensated by entropy. Therefore, the calculations of solvation energies and entropy seemingly can be avoided by combining molecu

lar mechanics, van der Waals and electrostatic forces linearly with usually small numerical coefficients (of the order of 10%).
However, a potential energy surface given by such linear combinations of unrelated quantities with statistics-based coefficients is not necessarily related to the true interaction profile. That is why such a simple score fails frequently to reproduce unique binding modes and hence gives docking false negatives. In the same time, this approximation tends to overestimate the affinity of weak binders producing docking false positives. Thus, in spite of reasonable accuracy of such predictions, the selectivity of scoring function is low. This means that frequently scoring functions will not allow to identify really strong binder among the pool of similar weak binders. Moreover, affinities of weak binders may be overestimated.
QUANTUM software does not rely on approximate cancellations of important physical quantities. Instead, we employ our continuous water model to compute both the vacuum and aqueous electrostatic energies, use quantum mechanics to calculate the short range forces and thermodynamic sampling to obtain the value of the free energy (entropy). As the result we can not only observe the necessary subtractions of individual energy components, but also perform molecular modeling in a more realistic and physically justified potentials. Fig. 1 shows the results of a docking run on a single rigid protein structure for 220 different ligands. R.m.s. error in free energy is 2kJ/mol, the correlation coefficient is 0.7.
The selectivity improvement in our approach is illustrated by the following model calculation. First, we derived a simple linear model directly from our QUANTUM vacuum force field. The short range interactions where variationally adjusted (to allow for empirical hydrogen bonds). The van der

Waals “scaling factors” and “protein dielectric constant” were found by correlating the suggested score with experimental binding affinities for a number of known complexes. Both the linear score and complete QUANTUM force field were tested using 300 protein-ligand pairs and showed comparable accuracy.
Fig. 2 shows docking funnels (the energy difference between the conformers plotted as a function of r.m.s. from the known crystallographic position of a ligand). Both full QUANTUM free energies (red lines) and scoring function (blue lines) were used to calculate, in the case of QUANTUM, or estimate, in the case of linear score, the free energies for numerous binding modes (conformers). The conformers where generated by the QUANTUM 3.3 docking program. The resulting energies were averaged over the conformers with similar r.m.s. values and plotted on the same graph.
The model calculation shows that the QUANTUM force field has a much steeper docking funnel, i.e. is more sensitive to misplacements of a ligand, than a scoring function. Therefore QUANTUM complete force field can be used to distinguish similar binding modes and hence obtain much more accurate docking positions. As a result, QUANTUM shows dramatically lower ratio of false positives and false negatives as compared with a scoring function based method.
In fact, non-additive interactions (especially solvation effects) play a key role in molecular recognition of small molecules by proteins. QUANTUM software is practically the only accurate and highly sensitive method available to a broad audience of researchers, which is capable to realistically model the intermolecular interactions.

Thursday, April 19, 2007

Docking selectivity: Additive vs. non-additive force field

The free binding energy (F) of a small molecule and a protein is a non-additive complex function of individual interatomic interactions. There are two major contributing quantities leading to non-additivity in F: the electrostatic energy, and the entropy.

A common approach to molecular docking is to develop a simple (generally additive) model of intermolecular interactions and train it to reproduce experimental values of binding energy for a range of known inhibitors. The obvious advantage is the calculations speed.

Any more sophisticated approach takes requires more computational resources and may hardly lead to an immense improvement in calculations accuracy. The question thus is: do non-additive force fields have any advantages in docking situtations?

To investigate the issue the following experiment was performed:

A simple force Molecular Mechanics (MM) force field, containing a reasonable approximation for (distant-dependent media polarization) electrostatics, van-der-waals and hydrogen bonding, was developed.
Same van-der-waals and hydrogen bonds were paired with vacuum electrostatics and a (non-additive) water model to simulate solvation effects.

Both models were transformed into simple linear regressions (to avoid sophisticated thermodynamic integration) and trained to reproduce the same amount of experimental values.

Although the two models show roughly the same level of accuracy in predicting the binding energies, the selectivity is drustically different. The figure above shows the results of the energy calculations for a set of few hundreds decoys. Both graphs represent the relative binding energy (counted from the minimum position) vs. the r.m.s. deviation of the calculated ligand positions from the known native position. The lines are the energy offset values averaged over the conformers (decoys) with similar r.m.s. positions.

The conclusion?
The solvation energy is the major source of non-additivity in ligand binding. Though being one of the most complicated quantities to be acounted properly in a docking run, the solvation energy is one of the major mechanisms of molecular recognition

Friday, January 19, 2007

What makes a correct binding energy calculation?

A correct calculation of a ligand binding affinity requires proper account of a few large contributions at the same time. The final value of the binding free energy is the sum of multiple sign-alternating entries: electrostatic (ES), solvation (ESolv), exchange-van der Waals (vdW) and entropic (TdS) term.

Common wisdom claims that ES and ESolv nearly compensate each other (up to a few h-bond energies), the remaining (normally large) vdW contribution is nearly canceled by TdS term. In fact, there is another important anti-correlation: strongly burried ligands with large vdW contributions are normally both entropycally and electrostatically constrained: The deeper a ligand gets into a protein (low dielectric constant medium) from water (high dielectric costant medium), the more the dessolvation energy (ES+ESolv) is, the larger the entropy losses normally are.

To clarify this issue, a simple calculation with 32 pdb structures with knows binding affinities was performed (1AU0, 1AU3, 1AU4, 1BR5, 1BR6, 1C83, 1C84, 1C85, 1C86, 1C87, 1EFY, 1EJN, 1JIJ, 1JIK, 1MS6, 1N3W, 1NAX, 1NLI, 1PWY, 1Q6K, 1RRI, 1RRW, 1RRY, 1RS2, 1TOW, 1TSM, 1URG, 1V2M, 2CBR, 2CBS, 3CBS, 4ERK). The binding affinities were taken from the PDBBind database. For every structure both the complex, the ligand and the protein was initially optimized in QUANTUM force field, then a thermodynamic integration was used to establish the free binding energies. The correlation between the calculated and measured (reported in the literature) values is shown on the left Figure 1. The free binding energies are reported in kJ/mol.

First we check the sanity of our solvation energy calculations. The following graph represents the anti-correlation between the electrostatic energy ES and the solvation contribution ESolv (both polar and non-polar, no additional dielectric constant for the protein was used since the protein is fully flexible). The two values nearly cancel each other, as it should be the case.
The only physically meaningful combination is the dessolvation punishment ESolv+ES, which is plotted against the vdW contribution (see Figure 3).

Remarkably, the dessolvation energy anti-correlates with the vdW term, which is a good measure of the contact surface. In fact, many scoring functions are pair sums of relatively short range potentials and hence correlate well with the contact surface. Non-additive desolvation energy does not correlate with the contact terms and (together with entropy losses) provides a limitation on binding energy for large and deeply burried ligands. TdS depends on the details of interactions in a smooth logarithmic way, which means that the dessolvation punishment is a leading effect.

The whole Molecular Mechanics (MM) Hamiltonian was used to generate MM trajectories for thermodynamic integration for each of the complexes. The results of the calculations are represented on the Figure 1. Root mean squared error of the calculations 6.1 kJ/mol. The dessolvation energy is a big contribution and removes (or better to say improves over) the correlation between the binding free energy and contact surface.
Conclusion 1: Binding free energy of a ligand is not a contact surface (or anything close to that approximable with a sum of pairwise short-range potential)
Conclusion 2: Desolvation energy is a major limiting factor restricting the binding energy of large ligands.