Comments on Prophage: Microbial Biomarker Discovery and How to Properly Format Your Data (Lefse)

2016-01-20T04:29:11.201-05:00

In disease research, a biomarker generally refers to a biochemical indicator that can be objectively measured and evaluated as a sign of a normal physiological process, a pathological process, or a response to a therapeutic intervention. Biomarker testing makes it possible to measure biological processes in the body, which is why biomarker research is so important.

2015-03-22T15:05:43.431-04:00

Just wanted to update the comment thread for other readers. We were able to fix the issues, and the script on GitHub has been updated to reflect those fixes. Everything is now working better than ever. :)

2015-03-19T15:43:40.033-04:00

Hi Nastassia,

No problem, glad to help! :) My first question is whether you included an output file as well, which needs a -o flag. Sorry about this not being in the README, but it will show up if you type:

python transform_data_for_lefse.py -h

My first guess from this error would be that the sample ID column names are different in the mapping file and the relative abundance file. These columns should have the same name.
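A quick way to test this guess is to look at the header row of each file. The sketch below is only an illustration, not part of transform_data_for_lefse.py: the helper names `header_names` and `has_column` and the sample file contents are hypothetical. It assumes the script looks columns up by their header name, which is what the KeyError: 'SampleID' reported in this thread suggests.

```python
import csv
import io


def header_names(handle):
    """Return the column names from the first row of a tab-delimited file."""
    return [name.strip() for name in next(csv.reader(handle, delimiter="\t"))]


def has_column(handle, name="SampleID"):
    """Report whether the header row contains a column called `name`."""
    return name in header_names(handle)


# Hypothetical file contents, for illustration only.
mapping = io.StringIO("#SampleID\tTreatment\nS1\tcontrol\nS2\tdrug\n")
abundance = io.StringIO("SampleID\tS1\tS2\nBacteroidetes\t0.6\t0.4\n")

print(has_column(mapping))    # False: the header cell is '#SampleID'
print(has_column(abundance))  # True
```

With real files, the same functions can be called on `open("your_file.txt", newline="")` instead of the in-memory examples. A leading `#` or a difference in capitalization is enough to make the two names differ.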

Would you mind sending me an example subset of your two input files (like the first five rows of the first five columns of each file) so that I can try troubleshooting it on my end? You can email them to me if you would like at ghanni@upenn.edu.

Thanks for your patience! :)

2015-03-19T15:11:45.769-04:00

Hi Greg,

Thanks for the quick response! I tried the following command, where the input file is a tab-delimited .txt output file from the summarize_taxa command and the mapping file has been edited to include only SampleID and Metadata columns:

python $PATH/transform_data_for_lefse.py -i otu_table_mc2_w_tax_L2.txt -m BZA3_map_for_Lefse.txt -o lefse/table_for_lefse.txt

When I tried this, I got the following error:
Traceback (most recent call last):
  File "$PATH/transform_data_for_lefse.py", line 49, in <module>
    map = f1_data[key]
KeyError: 'SampleID'

Maybe it doesn't know how to correlate the columns in the mapping file with the samples in the tab-delimited file? Not really sure. I'm afraid I don't really know how to code so I can't go through your script manually.

Thanks,
Nastassia

2015-03-19T11:12:04.378-04:00

Hi Nastassia,

Thanks for reading the blog and reaching out. It looks like the problem is some typos in the README. You are right that this should be relative abundance and not alpha diversity.

BIOM format cannot be used with this script. The input should be a tab-delimited relative abundance table from summarize_taxa. The table should have the sample IDs in the first row and the taxa in the first column, so I will have to fix the README.
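To make the expected layout concrete, here is a minimal sketch that reads a table of that shape. It is not taken from transform_data_for_lefse.py; the function name `read_abundance_table`, the `Taxon` header label, and the example values are assumptions made for illustration, and a real summarize_taxa file may label its first column differently.

```python
import csv
import io


def read_abundance_table(handle):
    """Parse a tab-delimited table: sample IDs in row 1, taxa in column 1."""
    rows = csv.reader(handle, delimiter="\t")
    header = next(rows)
    sample_ids = header[1:]  # the first cell only labels the taxon column
    table = {}
    for row in rows:
        if not row:  # skip blank lines
            continue
        taxon, values = row[0], row[1:]
        # Map each sample ID to that taxon's relative abundance.
        table[taxon] = dict(zip(sample_ids, map(float, values)))
    return sample_ids, table


# Hypothetical table, for illustration only.
example = io.StringIO(
    "Taxon\tS1\tS2\n"
    "Bacteroidetes\t0.6\t0.4\n"
    "Firmicutes\t0.4\t0.6\n"
)
sample_ids, table = read_abundance_table(example)
print(sample_ids)                 # ['S1', 'S2']
print(table["Firmicutes"]["S2"])  # 0.6
```

If a file parses cleanly this way and the sample IDs it returns match the ones in the mapping file, the layout is probably what the script expects.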

Have you tried running the script on the tab-delimited .txt file from summarize_taxa, as you mentioned, with the taxa in the first column and the sample IDs in the top row? Let me know if that works.

Thanks again for letting me know about these issues. I will fix the README, and will also include an example input file to make this clearer.

2015-03-17T18:05:11.994-04:00

Hi Geoffrey,

Thank you for this super helpful post, and for the code! I was wondering if you could clarify which QIIME output table this script uses as input. This post and the README file for your script both say to use an 'alpha diversity' table, but these tables generally don't have the relative abundance of OTUs per sample. I think maybe you mean an OTU table created by the pick_otus command, which has the relative abundance of OTUs per sample. These are in BIOM format, and I'm not sure if your script is designed for that. The other option is one of the .txt files created by the summarize_taxa command, but these have sample IDs in the top row and taxa identifiers in the first column, which contradicts the instructions in your README file.

Any clarification would be very helpful! I posted this here as a comment in case anyone else has similar questions.

Thanks,
Nastassia