Thursday, September 18, 2014

Microbiome Analysis: Average Sequence Lengths and Looping with xargs

Sometimes I want a quick way to get the average (here, the median) sequence length for a collection of sequences in a fasta or fastq file.  To address this, I wrote a couple of small Perl scripts, one for fasta files and one for fastq files, that calculate the median sequence length.  You can find these scripts on GitHub in my "Microbiome_sequence_analysis_toolkit" repository.  The nice thing about these scripts is that they print the median and the file name to standard output, which makes it easy to loop across many files and collect the results into a single summary file.
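To make the idea concrete, here is a minimal shell sketch of the same kind of calculation for a fasta file.  This is not the Perl script from the repository; the function name median_fasta_length and the tab-separated "median, file name" output layout are my own assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for the Perl script (the name and output layout are assumptions).
# Prints "<median length><TAB><file name>" for one fasta file.
median_fasta_length() {
    awk '
        # Header line: emit the length of the previous record, then reset the counter.
        /^>/ { if (seen) print len; len = 0; seen = 1; next }
        # Sequence line: strip whitespace and add its length to the current record.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        # End of file: emit the length of the last record.
        END { if (seen) print len }
    ' "$1" |
    sort -n |
    awk -v file="$1" '
        { a[NR] = $1 }
        END {
            # No records found: report NA rather than a misleading number.
            if (NR == 0) { print "NA\t" file; exit }
            # Odd count: middle value. Even count: mean of the two middle values.
            m = (NR % 2) ? a[(NR + 1) / 2] : (a[NR / 2] + a[NR / 2 + 1]) / 2
            print m "\t" file
        }
    '
}
```

Because each call prints a single line containing both the median and the file name, the output from many files can simply be appended to one summary file.  For example, if the function were saved in a hypothetical wrapper script median_fasta_length.sh that calls it on "$1", something like `printf '%s\n' *.fasta | xargs -n 1 sh median_fasta_length.sh > summary.txt` would run it once per file and collect the results.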