
Tuesday, 21 October 2014

How many PAINS are there in commercial libraries?

Introduction

PAINS (pan-assay interference compounds) have been a hot topic recently. Some people estimate that these compounds make up 5-12% of all commercial libraries [1]. Here I present the results of assessing the percentage of PAINS in various popular commercial libraries, as well as in the ZINC All_now database.

The libraries were filtered with SMARTS patterns prepared by Rajarshi Guha [2] and provided with the filter-it software [3]. For comparison, I've also included the ZINC database (a filtered and curated collection of ligands from commercial libraries), ChEMBL (v. 19; small molecules from the scientific literature) and SureChEMBL (structures from patents). There is also the StructuralAlerts filter delivered by silicos-it and based on [4].


Results


How many PAINS are there?

To sum up, it's not so bad: the maximum percentage of PAINS is below 3%, and the average is 1.65% (1.82% for typical libraries).






| Database | FilterFamily A | FilterFamily B | FilterFamily C | Total PAINS (A+B+C) | StructuralAlerts |
|---|---|---|---|---|---|
| chembl_19 | 1.56% | 0.62% | 0.21% | 2.39% | 47.03% |
| Enamine Advanced | 0.30% | 0.10% | 0.08% | 0.48% | 21.17% |
| Enamine HTS | 0.65% | 0.10% | 0.13% | 0.89% | 26.75% |
| LifeChemicals stock | 1.70% | 0.15% | 0.11% | 1.95% | 25.11% |
| Maybridge Screening | 1.56% | 0.76% | 0.62% | 2.94% | 48.62% |
| SureChEMBL | 0.02% | 0.00% | 0.00% | 0.02% | 0.65% |
| Zelinsky HTS | 1.66% | 0.86% | 0.32% | 2.83% | 47.36% |
| ZINC All_now | 1.20% | 0.31% | 0.21% | 1.73% | 29.53% |
| Average | 1.08% | 0.36% | 0.21% | 1.65% | 30.78% |
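
The summary figures can be re-derived from the "Total PAINS" column. The sketch below is only a sanity check on the published numbers, not part of the original workflow. The post does not say which libraries count as "typical"; treating them as the five vendor screening collections (the `vendor_libraries` list, a name introduced here) is an assumption, although it does reproduce the 1.82% figure.

```python
# Total PAINS (A+B+C) percentages, copied from the table above.
total_pains = {
    "chembl_19": 2.39,
    "Enamine Advanced": 0.48,
    "Enamine HTS": 0.89,
    "LifeChemicals stock": 1.95,
    "Maybridge Screening": 2.94,
    "SureChEMBL": 0.02,
    "Zelinsky HTS": 2.83,
    "ZINC All_now": 1.73,
}

# Assumption: "typical libraries" = the five vendor screening collections,
# i.e. everything except ChEMBL, SureChEMBL and ZINC.
vendor_libraries = [
    "Enamine Advanced",
    "Enamine HTS",
    "LifeChemicals stock",
    "Maybridge Screening",
    "Zelinsky HTS",
]


def mean_percent(names):
    """Unweighted mean of the Total PAINS percentage over the given databases."""
    names = list(names)
    return round(sum(total_pains[n] for n in names) / len(names), 2)


overall = mean_percent(total_pains)        # all eight databases
vendors = mean_percent(vendor_libraries)   # vendor collections only
worst = max(total_pains, key=total_pains.get)

print(f"all databases: {overall}%")
print(f"vendor libraries: {vendors}%")
print(f"highest: {worst} ({total_pains[worst]}%)")
```

Note that these are unweighted means over databases, so a small library counts as much as a large one.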



What are those PAINS?


Here are the results for the top 20 alerts (across all screened libraries), with each alert's percentage of all alerts:


Here are the top PAINS pollutants by SMARTS rule (do you recognize your hits here? ;):





| Rule | Count | Percent |
|---|---|---|
| azo_A(324) | 63934 | 15.65 |
| ene_rhod_A(235) | 61727 | 15.11 |
| anil_di_alk_D(198) | 44103 | 10.79 |
| anil_di_alk_C(246) | 43936 | 10.75 |
| imine_one_A(321) | 28330 | 6.93 |
| ene_five_het_G(10) | 25333 | 6.20 |
| anil_di_alk_B(251) | 21285 | 5.21 |
| ene_five_het_B(90) | 14669 | 3.59 |
| imine_one_isatin(189) | 13342 | 3.27 |
| ene_five_hetA1(201A) | 13136 | 3.21 |
| thio_ketone(43) | 8192 | 2.00 |
| anil_alk_ene(51) | 7063 | 1.73 |
| ene_one_hal(17) | 4949 | 1.21 |
| thiophene_amino_Aa(45) | 4933 | 1.21 |
| ene_five_het_C(85) | 4762 | 1.17 |
| ene_one_ene_A(57) | 4447 | 1.09 |
| imine_one_fives(89) | 4323 | 1.06 |
| amino_acridine_A(46) | 4096 | 1.00 |
| ene_five_het_D(46) | 3870 | 0.95 |
| keto_keto_beta_A(68) | 3790 | 0.93 |
| rhod_sat_A(33) | 2207 | 0.54 |
| ene_cyano_A(19) | 2104 | 0.51 |
| ene_five_one_A(55) | 1699 | 0.42 |
| het_thio_66_one(8) | 1633 | 0.40 |
| imidazole_A(19) | 1395 | 0.34 |
| diazox_sulfon_A(36) | 1391 | 0.34 |
| quinone_B(5) | 1266 | 0.31 |
| keto_phenone_A(11) | 1262 | 0.31 |
| acyl_het_A(9) | 1245 | 0.30 |
| thiaz_ene_D(8) | 1219 | 0.30 |
| keto_keto_gamma(5) | 1089 | 0.27 |
| anil_di_alk_F(14) | 969 | 0.24 |
| styrene_A(13) | 967 | 0.24 |
| imine_imine_A(9) | 893 | 0.22 |
| cyano_cyano_A(23) | 766 | 0.19 |
| keto_keto_beta_B(12) | 644 | 0.16 |
| het_6666_A(2) | 572 | 0.14 |
| steroid_A(2) | 433 | 0.11 |
| imine_one_sixes(27) | 344 | 0.08 |
| ene_five_het_E(44) | 263 | 0.06 |
| keto_phenone_B(1) | 241 | 0.06 |
| het_65_C(6) | 216 | 0.05 |
| styrene_B(8) | 186 | 0.05 |
| het_5_A(7) | 133 | 0.03 |
| imine_one_fives_B(9) | 128 | 0.03 |
| het_thio_5_imine_A(1) | 114 | 0.03 |
| ene_misc_A(5) | 108 | 0.03 |
| het_pyridiniums_B(2) | 91 | 0.02 |
| cyano_cyano_B(3) | 86 | 0.02 |
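
To see how concentrated the alerts are, the published percentages can be summed cumulatively. The sketch below is only an illustration: it takes the "Percent" values for the seven most frequent rules as listed above rather than recomputing them from raw counts, and the helper name `cumulative_share` is introduced here.

```python
# (rule, percent of all alerts) for the seven most frequent rules,
# copied from the table above.
top_rules = [
    ("azo_A(324)", 15.65),
    ("ene_rhod_A(235)", 15.11),
    ("anil_di_alk_D(198)", 10.79),
    ("anil_di_alk_C(246)", 10.75),
    ("imine_one_A(321)", 6.93),
    ("ene_five_het_G(10)", 6.20),
    ("anil_di_alk_B(251)", 5.21),
]


def cumulative_share(rows):
    """Return (rule, running total of percent) for rows in the order given."""
    running = 0.0
    result = []
    for rule, percent in rows:
        running += percent
        result.append((rule, round(running, 2)))
    return result


shares = cumulative_share(top_rules)
for rule, share in shares:
    print(f"{rule:22s} {share:6.2f}%")
```

On these numbers, the first four rules already cover just over half of all alerts (52.30%), and the first seven cover about 70%.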



References


[1] http://cen.acs.org/articles/92/i35/Getting-Rid-Painful-Compounds.html, http://pipeline.corante.com/archives/2014/09/26/pains_go_mainstream.php
[2] http://blog.rguha.net/?p=850
[3] Unfortunately, no longer on the web
[4] Brenk et al. (2008) ChemMedChem 3, 435-444