Julia is a new language that first appeared in 2012 and has been gaining attention ever since. Its creators set out to build a fast, efficient language that is also relatively easy to use. Because people are talking about it more each day, and because I think it shows exceptional promise, I wanted to try it out for myself.
The Benchmarking
I was a little bummed when I saw that the benchmarks on the Julia homepage did not include Perl, my go-to language for much of the data munging associated with bioinformatics. Perl is also lightning fast for a scripting language, which makes it handy. I decided to familiarize myself with Julia by setting up some basic benchmarks.
To get a feel for Julia's speed, I decided to recreate a Perl script that I use to calculate the median length of the sequences in a fasta file. I downloaded Julia from the Julia website, installed it on my computer, and rewrote the Perl script in Julia. In total this took me about 1 to 1.5 hours, which highlights how easy Julia is to write. I had never used the language before, but it took no time at all before I was writing a decent Julia script; the syntax will feel familiar to any Python or R user.
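The original Perl and Julia scripts live in the JuliaPerlBenchmark repository and aren't reproduced here. As a rough illustration of the calculation itself, here is a minimal Python sketch, assuming standard FASTA input where each header line starts with `>` and a record's sequence may be wrapped over several lines. The function names `sequence_lengths` and `median_fasta_length` are hypothetical, not taken from the original scripts.

```python
import statistics
import sys
from typing import Iterable, List, Optional


def sequence_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each record in FASTA-formatted lines.

    Assumes headers start with '>' and that a record's sequence may be
    wrapped over several lines. Blank lines are ignored, as are any
    sequence lines that appear before the first header.
    """
    lengths: List[int] = []
    current: Optional[int] = None  # running length of the current record
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record (if any).
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    # Close the final record.
    if current is not None:
        lengths.append(current)
    return lengths


def median_fasta_length(path: str) -> float:
    """Median record length of the FASTA file at `path`."""
    with open(path) as handle:
        return statistics.median(sequence_lengths(handle))


if __name__ == "__main__":
    print(median_fasta_length(sys.argv[1]))
```

Running it as `python median_length.py example.fa` (a hypothetical filename) prints the median; with an even number of records, `statistics.median` averages the two middle lengths.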
Once I had the two scripts, I ran them on the same example fasta file and compared their execution times. I got the following results.
|Comparison of Perl and Julia speeds for calculating the median sequence length in increasingly large fasta files. Code is found here.|
So the Perl script clearly ran faster than the Julia script, and both increased in time at about the same rate as I added sequences. So what can we say from these results? I would conclude that although Julia is fast, it still can't beat Perl for parsing data and making quick calculations. Of course this comes with the caveat that I have very little experience writing in Julia and could have written it poorly (I did try to make it efficient to give it a fair chance though). I also only tested the two on relatively small files, and the results may be different for very large files. Regardless, I still think this is informative.
Check out the associated data and code on the JuliaPerlBenchmark GitHub page.
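The post doesn't say exactly how the execution times were measured. One plausible approach, sketched here in Python under that assumption, is to time each script as an external process. The script names `median.pl` and `median.jl` are hypothetical placeholders. Note that this kind of wall-clock measurement includes interpreter startup and any compilation, not just the time spent parsing the file.

```python
import subprocess
import sys
import time
from typing import Sequence


def time_command(cmd: Sequence[str], repeats: int = 3) -> float:
    """Run `cmd` `repeats` times and return the fastest wall-clock time in seconds.

    The measurement covers the whole process, so interpreter startup and any
    just-in-time compilation are included along with the actual work.
    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Usage (hypothetical): python time_scripts.py small.fa medium.fa large.fa
    for fasta in sys.argv[1:]:
        for label, cmd in [("perl", ["perl", "median.pl", fasta]),
                           ("julia", ["julia", "median.jl", fasta])]:
            print(label, fasta, f"{time_command(cmd):.3f} s")
```

Taking the fastest of several runs is one common way to reduce noise from other activity on the machine; the mean would also work but is more sensitive to outliers.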
- After spending some time with the Julia language, I really liked the familiarity of the syntax and data structures. Anybody with exposure to Python, R, or any similar high-level scripting/programming language will easily pick up Julia in about an hour or two.
- I like that Julia seems to be a bit of a hybrid between R and Python. It seems like it could be really good for bioinformatics by allowing easy data formatting, analysis, and presentation in one cohesive and fast language environment.
- Although it was a little slower than Perl for parsing sequencing data files, Julia is still a fast language and I think this will draw more and more bioinformaticians to use it.
- Finally, Julia allows for easy integration with C, which I think will help with future development.
|Benchmarking results provided on the Julia homepage.|
- Although I like Julia, there are certainly some problems that will prevent me from switching over right now. The biggest issue is that it simply does not have the support and infrastructure that a language like Python or R has. Julia is still up-and-coming, and its community is not at the same level as the R, Python, or Perl communities. I expect it will pick up in the coming years, but for now it just makes sense (for me) to work in the more developed communities of R, Python, and Perl.
- Although Julia is fast, it still can't beat my simple and fast Perl scripting. Until it beats Perl's performance in data formatting and management, I honestly won't have a strong incentive to move over to Julia-heavy scripting.
Final Thoughts
Julia is a promising and exciting new programming language that I think we will hear more about in the next few years. The community is small and there is less support compared to Python and R, but that could (and probably will) change over time. The general feeling I got was that Julia combines Python and R, offering the best of each in one language. That, in addition to its speed advantages over R and Python, could allow Julia to replace Python and R as major programming languages in the near future. I really do think it is reasonable to expect Julia to become the bioinformatics language of choice in the next ten to fifteen years. Ultimately, though, only time will tell.
Any thoughts, comments, or concerns? Any bugs in my code or errors in my interpretations? Let me know in the comments below. You are also always welcome to reach out on Twitter or by email. I always love to hear from Prophage readers.
Update
I have been getting incredible feedback on this blog post, and I wanted to update readers on what I have learned and how the data have improved. Thanks to readers in the comments below, as well as on the GitHub repository, we have addressed two issues with the benchmark.
- My script needed to be written more efficiently. Ismael rewrote it to run faster and also provided a solid explanation of the changes.
- As you can see in the comments, the problem with this test is that Julia spends time starting up and compiling the code. This startup cost is considerably greater for Julia than for Perl, and it is the main reason Perl appears to perform better. Given this, you might predict that Julia could outperform Perl on larger files, where the startup time becomes negligible. I quickly increased the size of my file to about 500MB (from 30MB) and reran the benchmark. Wouldn't you know it, Julia begins to outperform Perl at larger file sizes, which is awesome. The updated results are below.
|Updated comparison of Perl and Julia speeds for calculating the median sequence length in increasingly large fasta files, using larger files than the figure above. Code is found here.|
So what can we take away from this? It turns out that while Julia takes longer to start up, it is blazing fast and actually outperforms Perl on larger but still reasonable file sizes. With this new and more correct knowledge, I am happy to say that I am even more excited about Julia and think it has a place in bioinformatics. Speed is a big deal for me, so I can see incorporating Julia into my own work.
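One way to see why the ranking flips with file size is a simple linear model. This is an assumption for illustration, not something fitted to the measured data: each script's runtime is a fixed startup cost $s$ plus a per-unit cost $r$ times the input size $n$.

```latex
T_{\text{Perl}}(n) = s_P + r_P\, n, \qquad T_{\text{Julia}}(n) = s_J + r_J\, n

T_{\text{Julia}}(n) < T_{\text{Perl}}(n)
  \iff s_J - s_P < (r_P - r_J)\, n
  \iff n > n^{*} = \frac{s_J - s_P}{r_P - r_J} \qquad (\text{assuming } r_J < r_P)
```

If Julia's startup cost is larger ($s_J > s_P$) but its per-size cost is smaller ($r_J < r_P$), Perl wins below the crossover size $n^{*}$ and Julia wins above it, which is the pattern seen once the file grew from 30MB to about 500MB.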
Finally, I want to thank all of the readers who contributed to this blog post. I love that people were able to help make this little piece of data accurate and fair, and I feel like we all benefited from the improved results. Thank you so much, and please feel free to keep commenting.