Thursday, November 28, 2013

Standing on the Shoulders of Giants: A Brief Overview of Microbiome Studies and Why They are Important


The discovery and characterization of microbial species continues to be a long and painstaking process.  Scientists have spent decades (well... really centuries) carefully studying the distinct properties of different microbes, examining the evolutionary relationships between them, and organizing them into taxonomic groups based on various commonalities.  One of the many benefits of this research program, and especially of the DNA sequence characterization and organization of microbes, has been the resulting collections of reference databases, which are used in sequence-based microbiome studies.  In recent years there has been a lot of excitement growing around the use of these DNA sequence reference databases to characterize microbial communities.  While the spotlight has really been on microbial communities as a whole, the importance of discovering and characterizing individual microbial species has been somewhat overlooked.

Flowchart illustrating the synergistic relationship between studies focusing on
microbial community characterization and individual microbe characterization.
Studies involving characterization of individual microbes provide references
for microbial community studies, while microbial community studies provide
context for individual microbe characterization studies.  This relationship is
sometimes overlooked.

A recent paper in PNAS did a really nice job of highlighting the important synergy between molecular microbial community studies and studies focusing on individual microbe discovery [1].  Most importantly, this study highlights the benefits that microbiome studies will provide to microbe discovery studies, instead of the other way around (i.e. microbiome studies benefiting from microbe discovery studies).  In this post, I first want to briefly go over what microbiome studies are actually doing, just to make sure we are all on the same page.  Then I am going to talk about the interesting implications of the paper by Kang et al. [1].  For another brief overview of the paper by Kang et al., check out the PNAS commentary by Culley [2].


There are basically two types of molecular techniques, both of which make use of high-throughput sequencing platforms, that are used to study microbial communities (the microbiomes).  The first is the amplicon sequencing approach, where a region of a certain ubiquitous gene is PCR amplified (this amplified gene region is called the amplicon), sequenced, and compared to other sequences.  To illustrate this technique, we can think about bacterial community analysis using the 16S rRNA gene.

The 16S rRNA gene encodes the RNA component of the small ribosomal subunit, which is part of the protein synthesis machinery within the bacterial cell, and the gene is found in all bacteria.  The 16S rRNA gene contains both conserved and variable regions, with the conserved regions allowing for general PCR primer binding and the variable regions allowing for bacterial identification and community analyses (taxonomic and phylogenetic analyses).  Scientists use this gene to study bacterial communities by PCR amplifying certain regions of the gene, sequencing those PCR products (amplicons), and using those sequences in community analyses.
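To make the conserved-versus-variable idea concrete, here is a minimal Python sketch of an "in silico PCR".  Everything in it is an assumption for illustration: the primer sequences, the toy genes, and the `extract_amplicon` helper are made up, and real primers tolerate mismatches and bind opposite strands, which this sketch ignores.

```python
def extract_amplicon(gene, forward_site, reverse_site):
    """Return the region between two conserved primer sites, or None.

    Both sites are written as they appear on the same strand of `gene`,
    which keeps this toy example free of reverse complements.
    """
    start = gene.find(forward_site)
    if start == -1:
        return None  # forward primer site not present
    region_start = start + len(forward_site)
    end = gene.find(reverse_site, region_start)
    if end == -1:
        return None  # reverse primer site not present downstream
    return gene[region_start:end]


# Hypothetical conserved primer sites shared by every toy "species".
FORWARD_SITE = "ACGTAC"
REVERSE_SITE = "TTGACC"

# Made-up genes: identical conserved flanks, different variable cores.
toy_genes = {
    "species_A": "GG" + FORWARD_SITE + "AAGCTAGA" + REVERSE_SITE + "CC",
    "species_B": "GG" + FORWARD_SITE + "CCGGATCA" + REVERSE_SITE + "CC",
}

amplicons = {
    name: extract_amplicon(seq, FORWARD_SITE, REVERSE_SITE)
    for name, seq in toy_genes.items()
}

for name, amplicon in amplicons.items():
    print(name, amplicon)
```

The same pair of primer sites pulls a region out of both genes, but the extracted sequences differ, and it is that difference that lets the amplicons be told apart in a community analysis.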

Some populations, notably virus and bacteriophage populations, do not have ubiquitous genes that can be used in amplicon-based molecular techniques, so scientists have to use whole metagenome shotgun sequencing techniques.  Instead of focusing on a certain genetic region like amplicon-based techniques, this technique involves randomly sequencing fragments from all of the genomes within the population, reassembling them (based on overlapping sequence) into large sections of genomes, and using those genomic sections for community analyses.  The video below does a great job illustrating the basic concept behind whole metagenome shotgun sequencing.
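The fragment-and-reassemble idea can also be sketched in a few lines of Python.  This is a toy under loud assumptions: the genome is made up, the reads are cut at a fixed step rather than at random so the example is reproducible, and the greedy overlap merge stands in for real assemblers, which have to cope with sequencing errors, repeats, and many genomes mixed together.

```python
import random


def fragment(genome, read_length, step):
    """Cut overlapping reads from a genome at a fixed step (a simplification)."""
    return [genome[i:i + read_length]
            for i in range(0, len(genome) - read_length + 1, step)]


def overlap(a, b, min_overlap):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps; stop with several contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


TOY_GENOME = "ATGGCGTACCTGAAGTCTCA"  # made-up 20-base "genome"

reads = fragment(TOY_GENOME, read_length=8, step=4)
random.Random(0).shuffle(reads)  # the sequencer does not report reads in order
contigs = greedy_assemble(reads)
print(contigs)
```

With this small, repeat-free toy genome the shuffled reads merge back into a single contig.  In a real metagenome the result is usually many partial contigs rather than complete genomes.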

As you might imagine, using these techniques to identify bacteria, viruses, and other microbes depends heavily on reference databases for DNA sequence matching.  The problem here is that most organisms remain unknown and uncharacterized, especially in virus communities.  In fact, it is estimated that 99% of bacterial species have not even been cultured for characterization, and an even greater proportion of viruses (including bacteriophages) remain unknown.  Therefore, it will be important to continue research investigating undiscovered microbes and adding their DNA sequences to reference databases, thereby allowing for more informative microbial community studies with more complete reference databases.  However, with the increased deposition of various microbiomes, it seems that instead of species characterization only providing a foundation for microbiome studies, the microbiome studies will also provide a context for future microbe discoveries (see cartoon figure above).
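Here is a rough Python sketch of why a more complete reference database makes a community study more informative.  All of the sequences are made up, and the exact k-mer matching in `matched_fraction` is an assumed stand-in for the alignment-based searches that real studies use, which tolerate mismatches.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def matched_fraction(reads, references, k=6):
    """Fraction of reads sharing at least one k-mer with any reference."""
    index = set()
    for genome in references.values():
        index |= kmers(genome, k)
    hits = sum(1 for read in reads if kmers(read, k) & index)
    return hits / len(reads)


# Made-up toy genomes and metagenome reads.
known_phage = "ATGGCGTACCTGAAGTCTCA"
new_phage = "TTCAGGATCCGATTAGCCGT"
metagenome_reads = [
    "GCGTACCTGA",  # comes from known_phage
    "GGATCCGATT",  # comes from new_phage
    "CCGATTAGCC",  # comes from new_phage
    "GTGTGTGTGT",  # comes from neither reference
]

before = matched_fraction(metagenome_reads, {"known_phage": known_phage})
after = matched_fraction(
    metagenome_reads,
    {"known_phage": known_phage, "new_phage": new_phage},
)
print(f"matched before: {before:.0%}, after: {after:.0%}")
```

The reads themselves never change between the two calls.  Only the reference database grows, and reads that were previously "unknown" become identifiable, while anything still missing from the database stays unmatched.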


In their somewhat recent study, Kang et al. discovered and characterized the HMO-2011 bacteriophage, which infects bacteria of the SAR116 clade, one of the most abundant marine bacterial lineages.  At first, their study goes into the usual virus characterization, with high points including electron microscope images, a genome sequence overview, etc., and also goes over how the HMO-2011 phage interacts with its SAR116 host.  This is all really interesting of course, but the overall highlight of the paper was what happened when they searched for matches to the HMO-2011 phage genome in existing phage metagenome data sets.

As I alluded to above, virus metagenomes often contain a high proportion of unknown sequences (sometimes up to 99% of sequences are unknown), meaning they do not match any known virus, or even any known organism.  This clearly limits the utility of some virus metagenomic data sets.  To understand how their newly characterized phage fits into known virus communities, Kang et al. searched for matches to their new phage genome in existing phage metagenome data sets, and found something quite interesting.  The group's phage matched up to 30% more sequences, meaning that the addition of their single new phage genome to the reference database provided up to 30% more sequences with matches to known viruses (overview in figure below) [1,2].  Said another way, this study used existing marine virus metagenomes to give context to the phage discovery, and likewise added important information to the existing marine phage metagenomes.

Simplified overview of how the SAR bacteriophage allowed for 30% more
sequences to be identified.  Briefly, in previous studies, seawater samples
were collected from various marine environments, and phage genomes were
extracted, sequenced, and compared to reference databases.  The addition of
the SAR phage genomes into reference databases allowed for up to 30%
more of the sequences to be identified.  Taken from ref [2].

This discovery of SAR116 phage dominance in existing marine phage metagenomes is important for a few reasons.  The first reason, and the reason most directly related to the purpose of the paper, is simply that the HMO-2011-like phages are a dominant family of bacteriophages in marine virus communities.  This was expected because their hosts, bacteria from the SAR116 clade, are dominant in marine bacterial communities.  Although the dominance of HMO-2011 was expected, it had not yet been shown, which is one reason why this paper is valuable.

The second reason why this discovery is important is that it really supports the value of research that focuses on discovering novel microbes.  There are many microbes that have not yet been discovered or characterized, so this research must continue if we are to improve our understanding of microbial communities.  Microbiome studies have somewhat overshadowed single microbe studies, but this paper really shows how important the single microbe studies are.

Finally, this paper's discoveries are important because they shed some light on how much unknown data microbiome sequences contain, and on how previous microbiome studies will continue to be important when integrated into future analyses.  Much like how species discovery aids microbiome studies by providing extensive reference collections, microbiome studies will aid microbe discoveries by giving them context and allowing greater understanding of how they fit into the bigger picture.  Additionally, as more microbiomes are characterized, I predict we will see them used more often to give context to newly characterized microbes.


1.  Kang I, Oh HM, Kang D, Cho JC (2013). Genome of a SAR116 bacteriophage shows the prevalence of this phage type in the oceans. Proceedings of the National Academy of Sciences of the United States of America DOI: 10.1073/pnas.1219930110

2.  Culley AI (2013). Insight into the unknown marine virus majority. Proceedings of the National Academy of Sciences of the United States of America, 110 (30), 12166-7 PMID: 23842091
