Introduction
PAINS (pan-assay interference compounds) have been a hot topic recently. Some estimate that these compounds make up 5-12% of all commercial libraries [1]. Here I present the results of assessing the percentage of PAINS in several popular commercial libraries as well as in the ZINC All_now database. The libraries were filtered with SMARTS patterns prepared by Rajarshi Guha [2] and provided with the filter-it software [3]. For comparison, I have also included the ZINC database (a filtered and curated collection of ligands from commercial libraries), ChEMBL (v. 19; small molecules from the scientific literature) and SureChEMBL (structures from patents). There is also a StructuralAlerts filter, delivered by silicos-it and based on [4].
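
As a rough illustration of how per-family percentages could be tallied once a library has been run through the filters, here is a minimal Python sketch. The input format (a mapping from compound ID to the set of matched filter families), the function name `family_percentages` and the toy data are all assumptions made for illustration; they do not reflect filter-it's actual output or the script used for this post.

```python
from collections import Counter

def family_percentages(hits, n_compounds):
    """Return the percentage of compounds flagged by each PAINS family.

    hits: dict mapping compound ID -> set of matched families ("A", "B", "C").
          This input format is hypothetical, not filter-it's real output.
    n_compounds: total number of compounds in the library.
    """
    per_family = Counter()
    flagged = 0
    for families in hits.values():
        if families:
            flagged += 1
            per_family.update(set(families))
    pct = {fam: 100.0 * per_family[fam] / n_compounds for fam in ("A", "B", "C")}
    # "any" counts each flagged compound once, even if several families matched.
    pct["any"] = 100.0 * flagged / n_compounds
    return pct

# Toy library of 200 compounds, four of them flagged (made-up data).
toy_hits = {
    "cmpd1": {"A"},
    "cmpd2": {"A", "B"},
    "cmpd3": {"C"},
    "cmpd4": {"A"},
}
result = family_percentages(toy_hits, n_compounds=200)
print(result)
```

In this toy case the three family percentages sum to 2.5%, while only 2.0% of compounds are flagged at all, because one compound matches two families. Whether such overlaps occur in the libraries screened here is not something the post states.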
Results
How many PAINS are there?
To sum up, it's not so bad: the maximum percentage of PAINS is below 3%, and the average is 1.65% (1.82% for typical libraries).

| database | FilterFamily A | FilterFamily B | FilterFamily C | Total PAINS (A+B+C) | StructuralAlerts | 
|---|---|---|---|---|---|
| chembl_19 | 1.56% | 0.62% | 0.21% | 2.39% | 47.03% | 
| Enamine Advanced | 0.30% | 0.10% | 0.08% | 0.48% | 21.17% | 
| Enamine HTS | 0.65% | 0.10% | 0.13% | 0.89% | 26.75% | 
| LifeChemicals stock | 1.70% | 0.15% | 0.11% | 1.95% | 25.11% | 
| Maybridge Screening | 1.56% | 0.76% | 0.62% | 2.94% | 48.62% | 
| SureChEMBL | 0.02% | 0.00% | 0.00% | 0.02% | 0.65% | 
| Zelinsky HTS | 1.66% | 0.86% | 0.32% | 2.83% | 47.36% | 
| ZINC All_now | 1.20% | 0.31% | 0.21% | 1.73% | 29.53% | 
| Average - %PAINS | 1.08% | 0.36% | 0.21% | 1.65% | 30.78% | 
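
The summary figures can be rechecked from the table above with a few lines of Python. The values are copied from the "Total PAINS (A+B+C)" column. Treating "typical libraries" as the five vendor libraries (excluding ChEMBL, SureChEMBL and ZINC) is my assumption; it happens to reproduce the 1.82% figure, but the post does not define the term.

```python
# Total PAINS percentages copied from the table above.
total_pains = {
    "chembl_19": 2.39,
    "Enamine Advanced": 0.48,
    "Enamine HTS": 0.89,
    "LifeChemicals stock": 1.95,
    "Maybridge Screening": 2.94,
    "SureChEMBL": 0.02,
    "Zelinsky HTS": 2.83,
    "ZINC All_now": 1.73,
}

# Assumption: "typical libraries" means the five vendor screening libraries.
vendor_libraries = [
    "Enamine Advanced",
    "Enamine HTS",
    "LifeChemicals stock",
    "Maybridge Screening",
    "Zelinsky HTS",
]

def mean_percentage(names):
    """Unweighted mean of the total PAINS percentage over the given libraries."""
    return sum(total_pains[name] for name in names) / len(names)

mean_all = mean_percentage(list(total_pains))
mean_vendor = mean_percentage(vendor_libraries)
worst_library = max(total_pains, key=total_pains.get)

print(f"mean over all databases: {mean_all:.2f}%")
print(f"mean over vendor libraries: {mean_vendor:.2f}%")
print(f"maximum: {worst_library} ({total_pains[worst_library]:.2f}%)")
```

Note that these are unweighted means over databases, so a small library counts as much as a large one.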
What are those PAINS?
Here are the results for the top 20 alerts (across all screened libraries), each as a percentage of all alerts:
And here are the SMARTS rules of the top PAINS pollutants (do you recognize your hits here? ;):
| Rule | Count | Percent of all alerts | 
|---|---|---|
| azo_A(324) | 63934 | 15.65 | 
| ene_rhod_A(235) | 61727 | 15.11 | 
| anil_di_alk_D(198) | 44103 | 10.79 | 
| anil_di_alk_C(246) | 43936 | 10.75 | 
| imine_one_A(321) | 28330 | 6.93 | 
| ene_five_het_G(10) | 25333 | 6.20 | 
| anil_di_alk_B(251) | 21285 | 5.21 | 
| ene_five_het_B(90) | 14669 | 3.59 | 
| imine_one_isatin(189) | 13342 | 3.27 | 
| ene_five_hetA1(201A) | 13136 | 3.21 | 
| thio_ketone(43) | 8192 | 2.00 | 
| anil_alk_ene(51) | 7063 | 1.73 | 
| ene_one_hal(17) | 4949 | 1.21 | 
| thiophene_amino_Aa(45) | 4933 | 1.21 | 
| ene_five_het_C(85) | 4762 | 1.17 | 
| ene_one_ene_A(57) | 4447 | 1.09 | 
| imine_one_fives(89) | 4323 | 1.06 | 
| amino_acridine_A(46) | 4096 | 1.00 | 
| ene_five_het_D(46) | 3870 | 0.95 | 
| keto_keto_beta_A(68) | 3790 | 0.93 | 
| rhod_sat_A(33) | 2207 | 0.54 | 
| ene_cyano_A(19) | 2104 | 0.51 | 
| ene_five_one_A(55) | 1699 | 0.42 | 
| het_thio_66_one(8) | 1633 | 0.40 | 
| imidazole_A(19) | 1395 | 0.34 | 
| diazox_sulfon_A(36) | 1391 | 0.34 | 
| quinone_B(5) | 1266 | 0.31 | 
| keto_phenone_A(11) | 1262 | 0.31 | 
| acyl_het_A(9) | 1245 | 0.30 | 
| thiaz_ene_D(8) | 1219 | 0.30 | 
| keto_keto_gamma(5) | 1089 | 0.27 | 
| anil_di_alk_F(14) | 969 | 0.24 | 
| styrene_A(13) | 967 | 0.24 | 
| imine_imine_A(9) | 893 | 0.22 | 
| cyano_cyano_A(23) | 766 | 0.19 | 
| keto_keto_beta_B(12) | 644 | 0.16 | 
| het_6666_A(2) | 572 | 0.14 | 
| steroid_A(2) | 433 | 0.11 | 
| imine_one_sixes(27) | 344 | 0.08 | 
| ene_five_het_E(44) | 263 | 0.06 | 
| keto_phenone_B(1) | 241 | 0.06 | 
| het_65_C(6) | 216 | 0.05 | 
| styrene_B(8) | 186 | 0.05 | 
| het_5_A(7) | 133 | 0.03 | 
| imine_one_fives_B(9) | 128 | 0.03 | 
| het_thio_5_imine_A(1) | 114 | 0.03 | 
| ene_misc_A(5) | 108 | 0.03 | 
| het_pyridiniums_B(2) | 91 | 0.02 | 
| cyano_cyano_B(3) | 86 | 0.02 | 
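
To see how concentrated the alerts are, the percentages in the table above can be accumulated from the top down. This is a minimal sketch using only the first five rows, copied from the table; the variable names are mine.

```python
from itertools import accumulate

# (rule, count, percent of all alerts) for the five most frequent rules,
# copied from the table above.
top_rules = [
    ("azo_A", 63934, 15.65),
    ("ene_rhod_A", 61727, 15.11),
    ("anil_di_alk_D", 44103, 10.79),
    ("anil_di_alk_C", 43936, 10.75),
    ("imine_one_A", 28330, 6.93),
]

# Running total of the percent column, paired with the rule that reaches it.
running = accumulate(percent for _, _, percent in top_rules)
cumulative = [(rule, total) for (rule, _, _), total in zip(top_rules, running)]

for rule, total in cumulative:
    print(f"{rule:15s} {total:6.2f}%")
```

The four most frequent rules (azo_A, ene_rhod_A and two of the anil_di_alk rules) together account for just over half of all alerts.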
References
[1] http://cen.acs.org/articles/92/i35/Getting-Rid-Painful-Compounds.html, http://pipeline.corante.com/archives/2014/09/26/pains_go_mainstream.php
[2] http://blog.rguha.net/?p=850
[3] Unfortunately, no longer available on the web
[4] Brenk et al. (2008) ChemMedChem 3, 435-444



 




