Finding Novel Antimicrobial Peptides with AMPLY

In this post Ben Thomas gives an overview of the rationale behind his tool AMPLY:

Rise of the superbugs
The rise of antibiotic-resistant strains of microbes (bacteria, parasites, viruses and fungi) is probably the leading threat facing humankind. Increasingly desperate warnings about the real-world implications of this growing resistance are now front-page news. The problem is multifaceted: antimicrobial resistance is not just about the loss of human life, but is inextricably intertwined with increased patient morbidity and massive economic consequences for global healthcare systems. There are two possible solutions: a socio-political/behavioural change or a technical/scientific response. Humankind has shown itself remarkably intransigent when faced with doom-laden prophecies that require behavioural modification to circumvent (see also climate change), so it is probably prudent to assume that a managed technical response is our best hope. But new antibiotics are unlikely to arise spontaneously. Mokyr highlights one of the issues with relying on the existing pharmaceutical industry to address the problem: “…few economies have ever left [decisions like these] entirely to the decentralized decision-making processes of competitive firms. The market test by itself is not always enough” (Mokyr, 1998).
Discovery of AMPs

Figure 1: A cationic, helical AMP (Taliecin-1)

The discovery of antimicrobial peptides (AMPs) dates back to 1939, when Dubos extracted an antimicrobial agent from a soil Bacillus strain. The designation has since been extended to cover a broad group: anionic antimicrobial proteins/peptides, host defence peptides, cationic amphipathic peptides and cationic AMPs. In contrast to acquired immune mechanisms, these endogenous peptides provide a fast and effective means of defence against pathogens as part of the innate immune response. Antimicrobial peptides are evolutionarily ancient weapons, and their ubiquity throughout the animal and plant kingdoms supports the hypothesis that they have played a key role in the successful evolution of complex multi-cellular organisms. Such is their diversity that they can be found in locations as disparate as the skin secretions of a frog and the defensive arsenal of a protozoan.

Dolby Bioinformatics

Figure 2: The Dolby certification logo

One specific feature of AMPs that makes them difficult to find is that they’re small (often fewer than 20 amino acids in length – tiny compared to typical proteins). In a typical ‘omic dataset containing potentially millions and millions of datapoints, isolating interesting AMPs for synthesis and testing is a challenging task. For inspiration we can look to the music industry. In the mid-20th century recordings were made on magnetic tape, and engineers wrestled with an ever-present low level of hissing noise in the background that threatened to drown out the music. Various ingenious solutions were devised to mitigate the persistent hiss: forms of “low-noise” tape which recorded more signal; running the tape at a higher speed; or using dynamic pre-emphasis during recording and a form of dynamic de-emphasis during playback. This latter approach became the backbone of the Dolby noise reduction system, which became all-pervasive in home audio equipment from the late 60s onwards. The audio engineer’s struggle to maximise signal-to-noise is the same core problem that faces computational biologists in the ongoing analysis of ‘omic “big data” in the search for tiny novel AMPs. There is music there, but at the moment the hiss is tremendous.
The detection of AMPs in metagenomic data is, however, a tantalising low-hanging fruit for computational biologists. Post-computational wet-lab work is relatively cheap: spot synthesis of peptides up to around 25 amino acids long is available from a wide array of third-party companies, with prices from as low as £2.50 per amino acid. A well-organised screening program can screen in excess of 100 peptides a day, per person, against a model bacterial organism to test for activity. As a potential workflow, the rapid assessment of multiple ‘omic datasets, identification of homologues of pattern-matched AMPs, rapid synthesis and screening, and a rush to publication would appear to provide a grant-friendly drug-discovery goldmine! But to tap this rich vein, the process of picking putative AMPs out of ‘omic data needs to be streamlined and its hit rate improved.
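To put a rough number on “relatively cheap”, here is a back-of-the-envelope sketch in Python. It uses only the floor price quoted above (£2.50 per amino acid); the function name and the assumed batch of 100 peptides of 20 residues each are illustrative, and real quotes will vary by supplier, purity and scale.

```python
def synthesis_cost_gbp(peptide_lengths, price_per_residue=2.50):
    """Total spot-synthesis cost in GBP for a batch of peptides.

    price_per_residue defaults to the lowest price quoted in the post;
    treat the result as a lower bound, not a quote.
    """
    return sum(length * price_per_residue for length in peptide_lengths)


# Assumed batch: one person-day of screening, 100 peptides of 20 residues each.
batch = [20] * 100
print(f"Batch of {len(batch)} peptides: £{synthesis_cost_gbp(batch):,.2f}")

# A single peptide at the 25-residue upper limit mentioned above.
print(f"One 25-residue peptide: £{synthesis_cost_gbp([25]):,.2f}")
```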
The AMPLY Pipeline
Finding small sequences (you’re interested in) that often look a lot like other small sequences (you’re not interested in) in datafiles that can contain potentially gigabytes of data is a trickier task than it first appears. Annotation in metagenomics is an art, and the determination of what’s real and what’s not often relies purely on defining mutually agreed thresholds. However, as the length of the aligned data being identified starts to shorten, a lot of the assumptions on PercentageID, BitScore and E-Value thresholds begin to fall away. It’s here we return to the Dolby signal-to-noise analogy – the “music” of the AMPs in metagenomic datasets is often drowned out by the sheer volume of background noise, and to find them we need to adopt a novel strategy of aggressive emphasis.
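One way to see why the usual thresholds break down for short sequences is the standard BLAST relationship between bit score and E-value, E = m × n × 2^(−S′), where m is the query length, n is the database size in residues and S′ is the bit score. The sketch below rearranges it to ask how many bits a hit needs to pass a given E-value cutoff. The database size and the cutoff are assumptions chosen for illustration, and the formula ignores the effective-length corrections that real BLAST searches apply.

```python
import math


def expected_hits(query_len, db_len, bit_score):
    """E-value for a hit with the given bit score: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def bit_score_needed(query_len, db_len, e_value):
    """Bit score a hit must reach for its E-value to equal e_value."""
    return math.log2(query_len * db_len / e_value)


DB_RESIDUES = 1_000_000_000  # assumed database size (residues), illustration only
E_CUTOFF = 1e-5              # assumed "mutually agreed" E-value threshold

for query_len in (20, 300):
    needed = bit_score_needed(query_len, DB_RESIDUES, E_CUTOFF)
    print(
        f"{query_len:>3} aa query: {needed:.1f} bits needed, "
        f"{needed / query_len:.2f} bits per residue"
    )
```

The total number of bits needed barely changes with query length, but a 20-residue peptide has far fewer aligned positions over which to earn them, so a cutoff that is comfortable for a typical protein can exclude even very good peptide matches.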
Designed by Ben Thomas in the CreeveyLab, AMPLY is a pipeline built to plug this gap between the ‘omic data and lab work. AMPLY provides a basis for sifting out AMPs suitable as synthesis candidates, and for highlighting potential regions for crude synthesis, by adopting a hyper-wide “balance of evidence” approach. AMPLY passes over the data with a series of detection methods, then gathers their combined results into a final tableau (known as the “bitpad”), where each potential AMP can be evaluated on the strength of hundreds of datapoints rather than just a couple of numeric values.
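As a toy illustration of the “balance of evidence” idea (and emphatically not AMPLY’s actual code), the Python sketch below runs a few independent checks over each candidate and lays the results out side by side, so a candidate is judged on the whole row rather than on one score. The three detectors, their cutoffs, the function names and the example sequences are all hypothetical and chosen only to keep the example short.

```python
# Hypothetical detectors: each returns True if the sequence shows one
# AMP-like property. Sequences are assumed to be non-empty, upper-case,
# one-letter amino acid strings.
def short_enough(seq):
    return len(seq) <= 25


def net_cationic(seq):
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return positive - negative > 0


def hydrophobic_rich(seq):
    hydrophobic = sum(seq.count(residue) for residue in "AILMFWV")
    return hydrophobic / len(seq) >= 0.3


DETECTORS = {
    "short": short_enough,
    "cationic": net_cationic,
    "hydrophobic": hydrophobic_rich,
}


def evidence_table(candidates):
    """Build one row per candidate with every detector's verdict and a tally."""
    rows = []
    for name, seq in candidates.items():
        flags = {label: check(seq) for label, check in DETECTORS.items()}
        rows.append({"name": name, **flags, "support": sum(flags.values())})
    # Best-supported candidates first.
    return sorted(rows, key=lambda row: row["support"], reverse=True)


# Made-up sequences, for illustration only.
candidates = {"pepB": "DEDEGSGSDE", "pepA": "KRWWKWIRW"}
for row in evidence_table(candidates):
    print(row)
```

A real tableau would carry far more columns than this, but the design choice is the same: keep every line of evidence visible instead of collapsing it into a single pass/fail threshold.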
Figure 3: The AMPLY workflow
To date, AMPLY has been used to find, characterise and synthesise thousands of novel AMPs. Among the AMPs discovered by AMPLY, many are highly active against MRSA (a key superbug) and offer encouraging potential treatment avenues for future development. While there is still much work to be done, results so far have been extremely promising: AMPLY has been used to find bioactive AMPs in datasets as diverse as the skin of Peruvian poison dart frogs and the testicles of a salamander. The only limitation in the AMPLY pipeline is the diversity of the stream of ‘omic data provided to it.
So, if you’re reading this blog and have interesting data and would like to be part of the drive to find new antimicrobials then get in touch for potential collaborations. We are always interested.
Contact Ben Thomas via Twitter @flwrs4algrnon

Mokyr, Joel. “The political economy of technological change.” Technological revolutions in Europe (1998): 39-64.

Introducing CowPI: A rumen microbiome focussed version of the PICRUSt functional inference software

We've just released our version of PICRUSt for the rumen microbiome. This software uses the 16S sequences from the Global Rumen Census and the nearly 500 published genomes from cultured rumen organisms currently available (most from the Hungate 1000 project) to allow functional inferences from 16S meta-taxonomic studies and can be found at

The paper has now been published in Frontiers in Microbiology here as part of a special research topic on "Metaomic Approaches to Study the Rumen Microbiome: Challenges and Innovation". We demonstrate how using data taken directly from the rumen allows a better prediction of the abundance of functional units (for example, see figure 1 from the paper, reproduced here).

FIGURE 1. Prediction accuracy on the Hess et al. (2011) dataset. (A) Metagenome compared with PICRUSt and (B) Metagenome compared with CowPI. Points represent the relative abundance of KOs in the observed (y-axis) and predicted (x-axis) dataset
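For readers who want to try this kind of observed-versus-predicted comparison on their own data, here is a minimal pure-Python sketch. It converts KO counts to relative abundances and scores the agreement with a Spearman rank correlation. The choice of Spearman is an assumption made for illustration and may not be the statistic used in the paper; the KO counts are invented.

```python
def relative_abundance(counts):
    """Convert raw KO counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {ko: n / total for ko, n in counts.items()}


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(observed, predicted):
    """Spearman correlation over the KOs present in both datasets."""
    shared_kos = sorted(set(observed) & set(predicted))
    x = [observed[ko] for ko in shared_kos]
    y = [predicted[ko] for ko in shared_kos]
    return pearson(average_ranks(x), average_ranks(y))


# Invented counts, for illustration only.
observed = relative_abundance({"K00001": 50, "K00002": 30, "K00003": 15, "K00004": 5})
predicted = relative_abundance({"K00001": 40, "K00002": 35, "K00003": 5, "K00004": 20})
print(f"Spearman correlation: {spearman(observed, predicted):.2f}")
```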

The data behind the tool can be downloaded at Zenodo under this DOI: DOI, but we have also implemented a workflow on the publicly accessible GALAXY implementation from Aberystwyth University here. This is designed to make functional predictions from rumen 16S metataxonomic data as simple as possible.

All users of this system will need to register and set up an account (walk-through tutorial here). Once set up, impatient users can access a rapid overview of using the system here, and a more detailed explanation of each step of the workflow can be found here.

We would welcome feedback from the community and would encourage other groups to implement external-facing versions for others to use, so please get in touch if you have any questions or suggestions.

Detecting microbial niches in Metagenomic data

Microbes colonising the surface of grass.
The image was taken using a Hitachi S-4700 FESEM scanning electron microscope by Alan Cookson at the IBERS Advanced Microscopy and Bio-Imaging Laboratory, Edward Llwyd building, Penglais.
In collaboration with the Agriculture and Food Development Authority in Ireland (Teagasc) we recently published a new way of identifying how different types of microbes can survive when competing for resources in the same environment.

The paper, by Dr Francesco Rubino, a former PhD student in the group, identifies what is known as ‘niche specialisation’ and is published in The ISME Journal, the Nature Publishing Group (NPG) ‘Multidisciplinary Journal of Microbial Ecology’.

Niche specialisation is the process by which, through natural selection, a species becomes better adapted to the specific characteristics of a particular habitat.
Organisms that have specialised in this way can be the principal drivers of important processes in the community and are therefore prime targets for researchers looking to engineer microbial communities to achieve desired outcomes.

It has long been thought that ecological principles developed for the study of large organisms should also be applicable to micro-organisms. While processes such as successional change and competition are known to occur in microbial communities, identifying signatures of niche specialisation remains a challenge.

Despite the large numbers of microbiome studies that have been generated from the microbial populations found in the gut, the soil, the sea and human skin, we still lack a clear understanding of the ecology of the micro-organisms that have an essential role to play in everything from human health to earth system processes.

We were looking to identify what resources different micro-organisms compete over when they are present in the same environment. Developing such an understanding is essential to meet many of the major challenges facing human society today, such as management of natural ecosystems and mitigation of climate change.

This study examined the signatures of niche specialisation between some of the most abundant organisms in the rumen microbiome of cattle, a major source of methane – the second most significant greenhouse gas in the UK.

We used a novel computational biology approach implemented in MGKit and based on evolutionary methods to identify the genes and functions that play an important role in maintaining niche specialisation.

The results identified the specific functions important for each organism within the microbial community to maintain its niche in the rumen of cattle, and these represent novel targets for engineering this microbiome for desirable outputs (such as reducing greenhouse gases).
This represents the first use of evolutionary approaches in this context and will open avenues of further research to both identify niche specialisation in any microbiome and to identify the organisms important for specific functions in any microbial community.

This work was funded by the Biotechnology and Biological Sciences Research Council (BBSRC), EU Seventh Framework Programme and Science Foundation Ireland.