EDAMAME 2014's Greatest Hits
I recently had the wonderful opportunity to attend the Explorations in Data Analyses for Metagenomic Advances in Microbial Ecology (EDAMAME) Workshop. The workshop was held at the Kellogg Biological Station in Michigan by Ashley Shade (@ashley17061), Tracy Teal (@tracykteal) and Josh Herr (@number_three) - who are all AWESOME. I thought I'd go through and highlight what I found to be my favorite and/or most useful parts of the workshop.
Before I jump right into the science, my favorite non-science related parts of EDAMAME included non-stop ice cream (lunch and dinner - with waffle cones!), meeting such a diverse crowd of microbe lovers, seeing Guardians of the Galaxy at the Alamo Drafthouse Cinema, enjoying some local Michigan beer at Bell's Brewery and hearing Jack Gilbert (@gilbertjacka) play guitar and sing one night at the campfire (and yes, there were s'mores!).
Anyway, pop this bad boy in and let's get started!
Track 1: O-T-U Child
For microbial ecology newbies, I think that EDAMAME's introductory lectures on alpha diversity and beta diversity are a very good resource. I found the section on ordination plots especially helpful because, even though I have taken introductory statistics classes, such plots were never discussed! I would also point the microbial ecology newcomer to EDAMAME's Introduction to Shell and Introduction to QIIME tutorials (Part 1, Part 2 and Part 3), as both are very well documented and every step in the tutorials is well described.
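For newcomers who want to see what those two terms boil down to, here is a minimal sketch (not taken from the EDAMAME materials) of one common alpha diversity metric, the Shannon index, and one common beta diversity metric, Bray-Curtis dissimilarity. The two samples and their OTU counts are made up for illustration.

```python
import math


def shannon_index(counts):
    """Alpha diversity: Shannon index H' = -sum(p_i * ln(p_i)) over observed OTUs."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # OTUs with zero counts contribute nothing to the sum, so skip them.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(sample_a, sample_b):
    """Beta diversity: Bray-Curtis dissimilarity between two OTU count vectors."""
    if len(sample_a) != len(sample_b):
        raise ValueError("both samples must list counts for the same OTUs")
    denominator = sum(sample_a) + sum(sample_b)
    if denominator == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(sample_a, sample_b)) / denominator


# Hypothetical OTU counts (same four OTUs, same order, in each sample).
even_sample = [10, 10, 10, 10]    # four equally abundant OTUs
dominated_sample = [40, 0, 0, 0]  # one OTU takes everything

print(shannon_index(even_sample))       # ln(4), the maximum for four OTUs
print(shannon_index(dominated_sample))  # zero: a single OTU has no diversity
print(bray_curtis(even_sample, dominated_sample))
```

Alpha diversity describes a single sample, while beta diversity compares samples to each other, which is why the first function takes one count vector and the second takes two.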
Some things to think about: Replication and Experimental Design. Something that was mentioned several times by different guest speakers (Pat Schloss (@PatSchloss) and Jim Cole, I think) was the idea of using a synthetic mock community each time you do a sequencing run to ascertain the error rate of that particular sequencing run. We also had several conversations about what replication means for microbial ecologists - if interested in what we discussed, go read: Replicate or lie.
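The mock community idea works because you already know what sequences went into the tube, so any difference between a read and its known reference is an error from that run. Here is a toy sketch of the arithmetic. It assumes each read has already been matched to its reference and is the same length (so substitutions only, no indels), which real tools do not get to assume. The sequences are invented.

```python
def per_base_error_rate(read_reference_pairs):
    """Fraction of mismatched bases across (read, known_reference) pairs.

    Assumes every read is already aligned to its reference and has the
    same length, so only substitutions are counted.
    """
    mismatches = 0
    total_bases = 0
    for read, reference in read_reference_pairs:
        if len(read) != len(reference):
            raise ValueError("this sketch only handles equal-length pairs")
        mismatches += sum(1 for r, t in zip(read, reference) if r != t)
        total_bases += len(reference)
    return mismatches / total_bases if total_bases else 0.0


# Hypothetical mock community reads paired with their known sequences.
pairs = [
    ("ACGTACGT", "ACGTACGT"),  # perfect read
    ("ACGTTCGT", "ACGTACGT"),  # one substitution at position 5
]
print(per_base_error_rate(pairs))  # 1 mismatch out of 16 bases
```

Running this for the mock community on every sequencing run gives you a number you can compare between runs.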
Track 2: Bioinformatics Killed My Computer
If you have previously tried to pick OTUs or check for chimeras in QIIME or mothur (if confused, please review resources discussed in Track 1) on your personal laptop or even a lab computer, you are probably familiar with long wait times, black screens and spinning wheels of death. Data sets are increasing in size at a much more rapid pace than the computational power included in standard laptops. My MacBook Pro is five years old (which means it's basically a dinosaur) and still kicking it with its 2.26 GHz Intel Core 2 Duo processor and its recently upgraded 8 GB of RAM. However, when it comes to some of my data analysis, its fan and slow response time make it sound like a small dying animal. So what is a researcher to do? Use a superior lab computer that has an i7 processor and 16 GB of RAM - YAY! But what happens when even that is not enough?
Solution Numero Uno: Use a service like Amazon Web Services where you pay to use Amazon's computational power to run your analyses. Don't know how to set up an Amazon instance? The lovely EDAMAME instructors have got you covered with this tutorial and also these follow-up tutorials on how to connect to your instance from a PC or a Mac/Linux machine. One thing that makes using Amazon Web Services nice is that there are community AMIs (Amazon Machine Images) you can use that have QIIME and/or mothur already installed.
Solution Numero Dos: Use your university's computing cluster (if available). In my experience, you log into most clusters the same way you log into your Amazon instance - using SSH. For those unfamiliar with SSH, here is a tutorial I found using the magic of Google for Mac/Linux computers. One problem you might run into with using a cluster is that the software you want to use might not be installed - in which case, if you are lucky, the computer gurus who manage your local cluster will install it and keep it up to date for you! However, if you are unlucky... they might say that they won't install it at all.
If using either of these solutions, I highly suggest looking into using either Screen or Tmux, which were introduced to us at EDAMAME (I am currently using Tmux). Screen and Tmux allow you to open multiple bash windows on the machine you are SSH-ed into. This means if you run a command from inside a bash window in Screen/Tmux and then detach from Screen/Tmux and exit your SSH session, your command will continue to run on the server or cluster you were logged into. You no longer have to worry about the internet cutting out or leaving your personal computer on when running commands on remote servers/clusters!
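To make the detach-and-walk-away workflow concrete, here is a small sketch that builds the Tmux command lines involved. It is written in Python only to keep the pieces labeled. On the server you would normally just type the resulting commands. The session name and the `run_analysis.sh` script are placeholders I made up.

```python
import shlex


def tmux_start_detached(session_name, command):
    """Command line that starts `command` in a new, already-detached Tmux session."""
    return ["tmux", "new-session", "-d", "-s", session_name, command]


def tmux_reattach(session_name):
    """Command line that reattaches to a running session to check on it later."""
    return ["tmux", "attach", "-t", session_name]


# Hypothetical session name and analysis script.
start = tmux_start_detached("otu_picking", "bash run_analysis.sh")
check = tmux_reattach("otu_picking")

# Print the commands as you would type them after SSH-ing into the server.
print(shlex.join(start))
print(shlex.join(check))
```

After running the first command you can log out of SSH. When you log back in, the second command drops you back into the session. (`shlex.join` needs Python 3.8 or newer.)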
Track 3: I Wanna Download Some Genomes
Have you ever wanted to download a genome from NCBI or MG-RAST but not wanted to hassle with the ever-changing website interfaces? Have you ever wanted to download hundreds of genomes but the thought of all that clicking has you wanting to run to your safe place? Me too! I found the tutorial by guest lecturer Adina Chuang Howe (@teeniedeenie) on how to do these things from the shell to be one of the most useful parts of EDAMAME. The tutorial comes with the necessary scripts that you can re-purpose to download your favorite genomes - these could potentially save you hours and hours of valuable research time (and most of your sanity).
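To give a flavor of what "no clicking" looks like, here is a rough sketch of my own (not the scripts from the tutorial) that fetches FASTA files through NCBI's E-utilities `efetch` endpoint. The function names, the output folder and the pause length are my choices. The accession in the usage note is just an example of the kind of ID you would put in your list.

```python
import time
import urllib.parse
import urllib.request
from pathlib import Path

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore"):
    """Build the E-utilities URL that returns one record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_BASE + "?" + urllib.parse.urlencode(params)


def download_genomes(accessions, out_dir="genomes", pause_seconds=0.5):
    """Download each accession to out_dir/<accession>.fasta, one request at a time.

    The pause keeps the request rate polite; NCBI asks users without an
    API key to stay under about three requests per second.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for accession in accessions:
        with urllib.request.urlopen(efetch_url(accession)) as response:
            (out_path / f"{accession}.fasta").write_bytes(response.read())
        time.sleep(pause_seconds)
```

Calling `download_genomes(["NC_000913.3"])` would save that record as `genomes/NC_000913.3.fasta`. A list of hundreds of accessions works the same way, just slower.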
Track 4: I Got A Database
Throughout EDAMAME, we were introduced to several different marker gene databases (16S, ITS). The default database in QIIME is the Greengenes database and until EDAMAME, I never realized just how many databases there were! I was especially surprised when, in Jim Cole's guest lecture, he showed us a slide with a Venn diagram comparing the sequences contained in the different fungal ITS databases and there was a clearly visible discrepancy between what each database contained. I also found his slides comparing the taxonomic accuracy of the different fungal ITS databases useful, as we have some seagrass fungal microbe data to analyze. I think the exposure to the different databases was helpful to my evolution as a microbial ecologist, as it made me start thinking about which database I'm using for my analyses and what biases these databases might introduce to my analysis. Some helpful questions to keep in mind when looking at different databases are laid out in this EDAMAME tutorial.
Track 5: Don't You (Forget To Visualize Me)
Humans are very visually oriented creatures and having visually appealing, statistically accurate and reproducible graphs can really help a presentation succeed. A strong knowledge of R (which I someday hope to have) can help you achieve such graphs. We walked through several different R tutorials relating to microbial ecology at EDAMAME which I thought were helpful when combined with EDAMAME's beta diversity and hypothesis testing lectures. These R tutorials were also specifically aimed at utilizing OTU tables which helped me think about how I might apply what I was learning to my own data analysis. We were also introduced to a vast array of data visualization and exploration tools (see this lecture).
Bonus Track: Hooked on Microbial Ecology (because if I hadn't been hooked on microbial ecology before EDAMAME, I sure would be now!)
EDAMAME Class of 2014 in our MoBio T-shirts!
All of the workshop materials for EDAMAME 2014 can be found on their website and there is also a Storify of all the live tweeting that went on throughout the EDAMAME workshop using the hashtag #edamame2014. Also, look out for a future guest post on MoBio's blog, The Culture Dish, about seagrass microbes!