This release changes the format of the "Results.txt" file to make it easier to parse .. it removes colon-delimited information fields from DicerCall and from the phasing results. It also adds an option to suppress phasing analysis altogether if the user so desires.
A minor change to the ShortStack.pl package in this release -- it now includes a third helper script, "miR_homologs.pl". miR_homologs.pl takes as input alignments of known mature miRNAs against a reference genome, and analyzes hairpin structure and miRNA/miRNA* pairing to determine if the locus appears to conform to the Meyers et al. criteria for MIRNA annotation (in the absence of expression data). It can be used in at least two ways:
1. Annotating a new genome based on miRNAs known from other species, in the absence of any small RNA expression.
2. Prefiltering entries from miRBase to a subset of loci whose predicted structures conform to accepted criteria.
Importantly, the results for miR_homologs.pl can be fed into ShortStack in --count mode, in order to compare MIRNA loci identified by homology to actual expression data.
I am pleased to present a new tool for analysis of small RNA sequencing data -- ShortStack. This program is designed for comprehensive annotation of small RNA-producing genes, including siRNA clusters, phased small RNAs, hairpin-derived small RNAs, and MIRNAs. Loci are discovered de novo and annotated with the statistics that are relevant for small RNAs .. size, polarity, repetitiveness, phasing, and underlying hairpins. In addition, ShortStack can be run in 'count' mode to quantify and annotate a set of loci provided a priori by the user. If you have smallRNA-seq data (and a reference genome to align it to), give the beta ShortStack version a try (and give me your comments at firstname.lastname@example.org).
Congratulations to Chenggang Liu on publishing this study in Plant Physiology. In this work, the results of a hyl1 suppressor screen are reported: several novel, dominant DCL1 alleles with mutations either in the helicase domain or in the RNaseIIIa domain. These alleles rescue hyl1 phenotypes but do not fix the microRNA processing accuracy defects typical of the hyl1 null mutant. In vitro, these alleles cleave pri-miRNAs with enhanced kinetics. In addition, a requirement for the DCL1 helicase domain for accurate processing of select pri-miRNAs is found.
Version 2.1 of 'trim_SOLiD_sRNA_cs-fastq.pl' has been released on the 'Tools' page of the website. This version adds the option, at the user's discretion, to keep or remove the 3' "hybrid" color left after initial adapter trimming. In addition, the defaults are now set to output trimmed reads in cs-fastq format, with the 3' hybrid colors removed. This enables accurate mapping in color-space by bowtie 0.12.7 PROVIDED the --col-keepends option is specified (see bowtie 0.12.7 manual). An in-depth discussion of the 3' hybrid color issue in adapter trimming of SOLiD small RNA reads can be found in the README for version 2.1.
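For readers unfamiliar with the hybrid color, here is a minimal Python sketch of the idea. It is not the code in trim_SOLiD_sRNA_cs-fastq.pl. It assumes that the "hybrid" color is the single color spanning the last nucleotide of the small RNA and the first nucleotide of the adapter, and it only finds exact, full-length adapter matches. The function names, the `keep_hybrid` parameter, and the adapter sequence used in the example are all made up for illustration.

```python
# Illustrative sketch only (hypothetical names, exact matching only).
# SOLiD colors are the XOR of the 2-bit codes of adjacent bases.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def dna_to_colors(dna):
    """Colors for each adjacent pair of bases in a DNA string."""
    return "".join(str(_CODE[a] ^ _CODE[b]) for a, b in zip(dna, dna[1:]))

def trim_adapter_cs(cs_read, adapter_dna, keep_hybrid=False):
    """Trim a 3' adapter from a read such as 'T0123233020'.

    Returns the trimmed color-space read, or None if no adapter is found.
    """
    primer, colors = cs_read[0], cs_read[1:]
    adapter_colors = dna_to_colors(adapter_dna)
    # Search from index 1 so that at least one color precedes the match.
    hit = colors.find(adapter_colors, 1)
    if hit == -1:
        return None
    # The color just before the match spans insert and adapter (the "hybrid").
    end = hit if keep_hybrid else hit - 1
    return primer + colors[:end]

# Insert TGAT followed by a made-up adapter CGCCTT, read from primer base T.
read = "T0123233020"
print(trim_adapter_cs(read, "CGCCTT"))                    # hybrid removed
print(trim_adapter_cs(read, "CGCCTT", keep_hybrid=True))  # hybrid kept
```

The hybrid color depends on the first adapter nucleotide, so it says nothing reliable about the small RNA once the adapter is gone. A real trimmer would also need to handle partial adapters at the 3' end of the read and color errors within the adapter.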
On the question of whether it is better to retain trimmed SOLiD small RNA reads in color-space prior to mapping, or to directly translate into DNA first: If there is no reference genome to which your data will be mapped, you are forced to directly translate to make sense of your data. However, if you are mapping the trimmed small RNAs to a reference, my analysis indicates that keeping the reads in color-space will improve the number of mappable reads, especially if you are trying to rescue lower-quality reads by allowing mismatches to the reference. However, it is critical that you understand how your mapping software decodes color-space reads, and that the trimming settings used are compatible.
The 'Tools' page has been updated to include some tools for SOLiD small RNA sequencing -- a 3' adapter trimmer and a generic formatting tool to combine the native SOLiD .csfasta and .qual files into a single .cs-fastq formatted file.
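To give a sense of what such a formatting step involves, here is a rough Python sketch. It is not the tool from the 'Tools' page. It assumes that both files list the reads in the same order, that '#' lines are comments, that qualities are written as Phred+33 characters with one value per color, and that negative quality values are clamped to 0. Conventions for cs-fastq differ between mappers, so check what yours expects. The function names are invented for this example.

```python
# Illustrative sketch only: pair .csfasta and .qual records into cs-fastq.
def read_records(lines):
    """Yield (name, body) pairs from FASTA-like lines, skipping '#' comments."""
    name, body = None, []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, " ".join(body)
            name, body = line[1:], []
        else:
            body.append(line)
    if name is not None:
        yield name, " ".join(body)

def to_csfastq(csfasta_lines, qual_lines, offset=33):
    """Yield one cs-fastq record (as a string) per read.

    Assumes both inputs hold the same reads in the same order; zip() stops
    silently at the shorter input, so unequal files are not detected here.
    """
    pairs = zip(read_records(csfasta_lines), read_records(qual_lines))
    for (name, seq), (qname, quals) in pairs:
        if name != qname:
            raise ValueError(f"read name mismatch: {name} vs {qname}")
        seq = seq.replace(" ", "")
        # Assumption: Phred+33 encoding; negative values are clamped to 0.
        qstring = "".join(chr(max(int(q), 0) + offset) for q in quals.split())
        yield f"@{name}\n{seq}\n+\n{qstring}\n"

csfasta = ["# comment line", ">1_1_1_F3", "T0123"]
qual = [">1_1_1_F3", "30 20 -1 5"]
for record in to_csfastq(csfasta, qual):
    print(record, end="")
```

With real files you would pass two open file handles instead of the small lists used here.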
Besides the tools themselves, check out the details of adapter trimming in colorspace excerpted from one of the README files. Hopefully this is helpful.
Note that our approach to adapter trimming of color-space small RNA sequencing reads differs from that taken by Marco and Griffiths-Jones. In our approach, we directly identify the 3' adapter and trim it before attempting to map; Marco and Griffiths-Jones in contrast use a ".. sequential trimming and mapping approach". My sense is that the Marco/Griffiths-Jones approach might result in imprecision in defining the 3' ends of the small RNAs -- since the exact size of small RNA matters a lot in many organisms, this could be a problem (although we really should directly compare the approaches -- to do at a later time) ... hence our approach of directly looking for the adapter.
The other issue with color-space small RNA data is the problematic nature of the first nucleotide .. both Marco and Griffiths-Jones and my document discuss the details of this issue. Essentially, we have been directly translating our trimmed reads into DNA-space before mapping to avoid the 1st nt issue. This causes a reduction in mappable reads, but guarantees a mapping result where both the biological 5' and 3' nucleotides of the original small RNA are accurately noted.
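Direct translation itself is simple, and a small Python sketch may make the trade-off clearer. It uses the standard SOLiD two-base encoding, where each color is the XOR of the 2-bit codes of adjacent bases. It assumes the read starts with a known primer base that is not part of the small RNA, and it treats everything after an uncalled color as unknown. The function name is invented for this example, and this is not the code from our trimming tool.

```python
# Illustrative sketch only: direct translation of color-space into DNA.
_BASES = "ACGT"  # index = 2-bit code (A=0, C=1, G=2, T=3)

def decode_colorspace(cs_read):
    """Translate e.g. 'T0123' into DNA, dropping the leading primer base."""
    primer, colors = cs_read[0].upper(), cs_read[1:]
    code = _BASES.index(primer)
    out = []
    for c in colors:
        if c not in "0123":
            # An uncalled color ('.') makes every later base undecodable.
            out.append("N" * (len(colors) - len(out)))
            break
        code ^= int(c)  # next base = previous base XOR color
        out.append(_BASES[code])
    return "".join(out)

print(decode_colorspace("T0123"))  # correct read
print(decode_colorspace("T0023"))  # same read with one wrong color
print(decode_colorspace("T01.3"))  # read with an uncalled color
```

Comparing the first two outputs shows why direct translation costs mappable reads: a single wrong color changes every base downstream of it, whereas a color-space mapper can treat it as one isolated mismatch.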
The SOLiD 3' trimming tool has options to output trimmed reads both in color-space and to translate the trimmed reads into DNA-space at the user's discretion.
Congratulations to Josh Puzey and his colleagues in the Kramer Lab at Harvard for their recent publication in PLoS ONE, to which the Axtell Lab contributed. A nice story on microRNA annotation in Poplar using much deeper sequencing datasets than had been previously used in this species.
A new page on our website has just been created -- 'Tools' -- which contains downloads of small pieces of software that may be of use in small RNA sequencing analysis (and perhaps in the future other applications as well). The first additions are a couple of simple tools for use in removing the 3' adapters from untrimmed Illumina-FASTQ small RNA sequencing files. Certainly not covering new ground here; there are several methods out there for this, and most labs have their own in-house methods. But, here is ours!
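For anyone who has not written one of these before, the core of a 3' adapter trimmer can be sketched in a few lines of Python. This is not the code of our tools. It assumes the adapter is found by an exact match to its first few nucleotides, and the function name, the `min_match` parameter, and the sequences in the example are placeholders chosen for illustration.

```python
# Illustrative sketch only: exact-match 3' adapter trimming of one read.
def trim_3p_adapter(seq, qual, adapter, min_match=8):
    """Return (trimmed_seq, trimmed_qual), or None if no adapter is found.

    Only the first `min_match` nt of the adapter are searched for, so reads
    that run into the adapter near their 3' end are still recognized.
    """
    idx = seq.find(adapter[:min_match])
    if idx == -1:
        return None
    return seq[:idx], qual[:idx]

# Placeholder read: a 20 nt insert followed by the start of the adapter.
adapter = "TCGTATGCCGTCTTCTGCTTG"
seq = "TGACAGAAGAGAGTGAGCAC" + adapter[:10]
qual = "I" * len(seq)
print(trim_3p_adapter(seq, qual, adapter))
```

A production trimmer would also want to tolerate sequencing errors in the adapter and to discard inserts that are too short to be useful.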
Since we've done a fair bit of small RNA sequencing using SOLiD, the next 'Tools' addition will likely be an adapter-trimming tool for SOLiD small RNA sequencing data. We have our own in-house program but it needs to be 'cleaned up' before being made available to the wider world .. - MJA
Just read a paper from Lei Li's lab at the University of Virginia which provides an analysis of the impact of alternative splicing on miRNA target sites in Arabidopsis. Very interesting data. Their analysis suggests that a little more than one in ten miRNA target sites in Arabidopsis are differentially present/absent in alternative splice forms. Even more curious, their analysis indicates that miRNA target sites are 'hotspots' for alternative splicing. There is a nice discussion of how this could be a useful mechanism to preserve a target site evolutionarily, but to eliminate miRNA regulation by alt. splicing in tissue or condition-specific situations. Of course, the caveat for most of these is that the alt. splicing is also affecting protein sequence, since most of these sites are in the ORFs .. so disentangling the significance of the alt. splicing events (to change protein function or miRNA regulation?) is a challenge.
Yang et al., 2012. The Plant Journal. Pubmed: 22247970 doi:10.1111/j.1365-313X.2011.04882.x
CleaveLand 3 has now been completed and is ready for use. It is substantially different from both CleaveLand 1 (described by Addo-Quaye et al. (2009) in Bioinformatics) and CleaveLand 2. It runs much faster, no longer uses extensive random queries, and has a sound statistical footing to assess the likelihood of a hit occurring by random chance, taking into account both the quality of the small RNA / mRNA alignment and the noise in the degradome library being analyzed.
Comments / complaints / bugs / and especially praise welcomed. -- mike a
-- THIS WEBSITE HAS BEEN REPLACED ... please go to http://sites.psu.edu/axtell for the current Axtell Lab Website! --
Michael Axtell, lab PI