Vortex script for Matching multiple SMARTS queries such as PAINS

One of the really neat features of the latest version of Vortex (builds after 29622) is the ability to script multiple sub-structure searches using SMARTS. There are many occasions when this sort of feature is useful, for example if you want to flag molecules that contain reactive functional groups, toxicophores, or PAINS functional groups that have been shown to interfere with a variety of screens. Alternatively, if you have a drug discovery project with multiple chemotypes, you might want to tag particular groups of compounds as belonging to a named series to aid analysis.

PAINS filter

A recent comment in Nature “Naivety about promiscuous, assay-duping molecules is polluting the literature and wasting resources” underlines the importance of ensuring that any hits from a bioassay are genuine ligands and not a non-selective artefact.

These molecules — pan-assay interference compounds, or PAINS — have defined structures, covering several classes of compound. But biologists and inexperienced chemists rarely recognize them. Instead, such compounds are reported as having promising activity against a wide variety of proteins. Time and research money are consequently wasted in attempts to optimize the activity of these compounds. Chemists make multiple analogues of apparent hits hoping to improve the ‘fit’ between protein and compound. Meanwhile, true hits with real potential are neglected….Most of all, academic drug discoverers must be more vigilant. Molecules that show the strongest activity in screening might not be the best starting points for drugs. PAINS hits should almost always be ignored. Even trained medicinal chemists have to be careful until they become experienced in screening. Take it from us: do not even start down these treacherous routes.

Jonathan B. Baell and Georgina A. Holloway published a very interesting paper on their analysis of frequent hitters from screening assays (DOI). In the supplementary information they provided the corresponding filters in Sybyl Line Notation (SLN) format. These were converted to SMARTS format for use with filter-it, but they can also be used to create a Vortex script that provides a PAINS filter, as nicely demonstrated by Dan Ormsby and Mike Hartshorn.

The first part of the script contains the series of SMARTS strings and their associated text labels. These are in a standard format (only a limited selection is shown below).
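As a rough illustration of what such a label/SMARTS table can look like in Python, here is a minimal sketch. It is not the format used in the actual script: the names `PATTERNS` and `parse_patterns` are hypothetical, and the SMARTS are generic textbook patterns rather than entries from the PAINS set.

```python
# Illustrative only: generic functional-group SMARTS, NOT taken from the
# PAINS filters or from the script described in this post.
PATTERNS = [
    ("acyl_halide", "[CX3](=[OX1])[F,Cl,Br,I]"),
    ("aldehyde", "[CX3H1](=O)[#6]"),
    ("isocyanate", "N=C=O"),
]


def parse_patterns(text):
    """Parse 'label<TAB>SMARTS' lines into a list of (label, smarts) pairs.

    Blank lines and lines starting with '#' are skipped. The tab-separated
    layout is an assumption made for this sketch.
    """
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        label, smarts = line.split("\t", 1)
        patterns.append((label.strip(), smarts.strip()))
    return patterns
```

Keeping each label next to its query means a new alert is a one-line addition, and the label is available for display while that query is running.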

The script then adds explicit Hs to all molecules before matching, because the PAINS SMARTS patterns contain explicit Hs (currently this is limited to molecules with <1000 heavy atoms).

The script also displays a progress box which, rather nicely, shows the name of the SMARTS string currently being used as the query.

One of the nice things is that Vortex uses multiple threads (Kmeans clustering and property calculation are threaded too). Java asks the OS how many processors are available. My six core MacPro appears to the OS as 12 logical cores, and the search drove CPU usage to around 1100%, in other words roughly 11 of the 12 fully busy.
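In Java the processor count comes from `Runtime.getRuntime().availableProcessors()`. How Vortex divides the work internally is not described here, so the following is only a sketch of the general idea in plain Python: ask for the number of logical processors, split the rows into one contiguous chunk per worker, and process the chunks on a thread pool. The function names are hypothetical. Note that in standard CPython the interpreter lock limits the speed-up for CPU-bound work, whereas threads on the JVM run truly in parallel.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count():
    """Number of logical processors reported by the OS (at least 1)."""
    return os.cpu_count() or 1


def split_rows(n_rows, n_workers):
    """Split range(n_rows) into at most n_workers contiguous chunks."""
    size, extra = divmod(n_rows, n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks take one additional row each.
        stop = start + size + (1 if i < extra else 0)
        if stop > start:
            chunks.append(range(start, stop))
        start = stop
    return chunks


def run_threaded(rows, func):
    """Apply func to every row, one chunk per worker, preserving row order."""
    chunks = split_rows(len(rows), worker_count())
    with ThreadPoolExecutor(max_workers=len(chunks) or 1) as pool:
        parts = pool.map(lambda chunk: [func(rows[i]) for i in chunk], chunks)
        return [result for part in parts for result in part]
```

Here `func` stands in for whatever is done per molecule, such as testing it against the current SMARTS query.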

The PAINS filter Vortex Script

Reactive Groups Filter

One of the important steps in building a screening collection is to remove molecules containing chemically reactive groups (unless you are looking for covalent modifiers). Most companies have their own set of functional groups they don’t want in the screening collection. I’ve compiled the list of groups in the script below over the years (usually as the result of finding a false positive in a screen). It would be trivial to add or remove groups.

The first part of the script contains the series of SMARTS strings and their associated text labels, which are in a standard format.

It is very straightforward to add new structural alerts by simply adding the appropriate query string. I would really recommend using the free online SMARTSviewer to check the queries.

The Remove Reactive Vortex Script

Organising into Structural Classes

When working on a drug discovery project with multiple chemotypes you often want to tag particular groups of compounds as belonging to a named structural class to aid analysis. An example is shown below which classifies structures into Indoles, Indazoles, Benzimidazoles etc. by simply replacing the SMARTS patterns with those defining each of the structural classes.
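The tagging logic itself is simple and can be sketched independently of any toolkit. In the minimal example below, `has_substructure` is a placeholder for whatever SMARTS matching call your toolkit provides (it is not a Vortex function), the names `SERIES` and `assign_series` are hypothetical, and the three SMARTS are simple illustrative ring patterns for the unsubstituted NH heterocycles. Giving each structure the label of the first class that matches is an assumption of this sketch; the actual script may record every match instead.

```python
# Illustrative class definitions: (series name, SMARTS). These simple
# patterns require a ring NH, so N-substituted analogues would not match.
SERIES = [
    ("Indole", "c1ccc2[nH]ccc2c1"),
    ("Indazole", "c1ccc2[nH]ncc2c1"),
    ("Benzimidazole", "c1ccc2[nH]cnc2c1"),
]


def assign_series(structures, classes, has_substructure, default="Other"):
    """Return one series label per structure; the first matching class wins.

    has_substructure(structure, smarts) -> bool is supplied by the caller.
    """
    labels = []
    for structure in structures:
        for name, smarts in classes:
            if has_substructure(structure, smarts):
                labels.append(name)
                break
        else:
            # No class matched this structure.
            labels.append(default)
    return labels
```

Because the first match wins, the order of the class list matters: put the most specific patterns first if one chemotype is a substructure of another.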

The scripts can be downloaded here

MatchPAINS


RemoveReactive

Page Updated 9 October 2014
