Sunday, January 25, 2015

Coding Responsibly Part I: Version Control

GitHub is a great version control resource. Source
Thanks to a growing number of resources for learning to code, along with numerous other awesome educational efforts, programming is steadily becoming more popular and accessible. In previous posts, I have shared some resources I found helpful for learning to code, and I have even started offering workshops for my colleagues here at the University of Pennsylvania (find slides here). But what is the next step after getting the basics down? The answer is to make sure you don't just code, but that you learn to code responsibly.

This is the first post in a series I am devoting to responsible coding. The topic is important, and it is too much to pack into a single, digestible post, so I am going to spread it out. So without further ado, please enjoy part one, which is devoted to version control.

The Importance of Version Control

One of the most important practices in programming is version control. Version control is the process of tracking changes made to files (here, program code files) and archiving the old versions in case they need to be referenced. This way a user can always see what changes have been made to the code, and can always go back to see what it looked like on a previous date.

As an example of what this means, many people eventually adopt the practice of saving their code, Word documents, or other files under a new name each time they make changes. The result is a folder full of 'my_code_version1.pl', 'my_code_version2.pl', 'my_code_version3.pl', and so on (see example below for how this gets confusing). In theory the user can then consult old versions of a file if they changed something that they later want to add back (what a pain to have to do, though!). A folder like this is a very common sign that version control software is needed. With version control in place, the user can simply keep a single file (e.g. my_code.pl), with the changes and old versions archived in a separate location for future reference. It's a simple, clean, and easy-to-search option for tracking changes.
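To make the idea of "one working file plus a separate archive" concrete, here is a toy sketch in Python. It is only an illustration of the concept, not how GitHub or any real version control tool is implemented. The class name `ToyHistory`, its methods, and the file contents are all made up for this example, and it assumes snapshots are recorded in chronological order.

```python
import hashlib
from datetime import datetime


class ToyHistory:
    """Toy archive of one file's past versions (hypothetical, for illustration)."""

    def __init__(self):
        # Each entry is (timestamp, message, content), oldest first.
        self.snapshots = []

    def commit(self, content, message, when):
        """Archive a snapshot unless it is identical to the latest one."""
        if self.snapshots and self.snapshots[-1][2] == content:
            return None  # nothing changed, so nothing to record
        self.snapshots.append((when, message, content))
        # A short content hash gives each version a stable identifier.
        return hashlib.sha1(content.encode("utf-8")).hexdigest()[:7]

    def as_of(self, when):
        """Return the content as it stood at the given moment, or None."""
        found = None
        for timestamp, _message, content in self.snapshots:
            if timestamp <= when:
                found = content
        return found


# Made-up contents standing in for successive edits of my_code.pl.
history = ToyHistory()
history.commit("print 1;\n", "first draft", datetime(2013, 1, 10))
history.commit("print 2;\n", "change output", datetime(2014, 6, 2))

# What did the file look like in mid-2013?
old_version = history.as_of(datetime(2013, 7, 1))
print(old_version)
```

The working folder only ever needs the one file. Every earlier state lives in the archive with a date and a short message, which is what replaces the pile of numbered copies.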

An example of poor version controlling. Source
And really, how important is it to use version control when coding? As an anecdotal example, I once heard about a research scientist whose results relied heavily on some code that he had written. He recorded the results, but as time passed he continued to improve and add to his original code without version controlling it. A couple of years later, he needed to replicate his original findings, and was unable to do so with the current version of the code. Something important had changed over the years, and he needed to know what it was. If he had been version controlling his code, he could easily have looked up the version from the time in question (a few years earlier), but instead he was left with no idea of what the code looked like back then. This unfortunately set his research back significantly, and could have been avoided with version control software. So the moral of the story is simple: always version control your software. Even if you are just getting started, you should practice version control now, so that the habit is already in place as you get deeper into programming.
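Once old versions are archived, finding out "what changed" is a mechanical comparison. As a rough sketch, Python's standard `difflib` module can show the difference between two versions of a script. The two versions below are invented for illustration and have nothing to do with the scientist's actual code.

```python
import difflib

# Two hypothetical versions of the same analysis script, years apart.
old_version = [
    "threshold = 0.05\n",
    "hits = [p for p in pvalues if p < threshold]\n",
    "print(len(hits))\n",
]
new_version = [
    "threshold = 0.01\n",
    "hits = [p for p in pvalues if p < threshold]\n",
    "print(len(hits))\n",
]

# unified_diff yields header lines, then lines prefixed with
# "-" (removed), "+" (added), or " " (unchanged context).
diff = list(difflib.unified_diff(
    old_version, new_version,
    fromfile="my_code (2013)", tofile="my_code (2015)",
))

# Keep only the lines that actually changed, skipping the file headers.
changed = [
    line for line in diff
    if line[0] in "+-" and not line.startswith(("+++", "---"))
]

print("".join(diff))
```

In this made-up case the comparison points straight at the one line that differs, the kind of small change that is easy to forget and can quietly alter results.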

Getting Started with Version Control

So now you are convinced that you should start version controlling your code, but you still need to know how to get started. Fortunately, there are numerous good options out there. I would recommend getting started with GitHub, a free and easy-to-use online platform built around the Git version control system. GitHub is nice because it lets you version control your software in a social networking environment that fosters communication, collaboration, and education. Using GitHub will keep your code version controlled, and will also let you share your code and learn from others. I have found this to be a great environment and I really can't recommend it enough.

To get started with GitHub, simply go to their website (click here), make an account, and follow their instructions from there. They do a good job of guiding you through the process and teaching you through tutorials, so go ahead and get started. Additionally, if you are a student, you can get even more free benefits from GitHub. Check out this blog post for more information. And finally, check out the paper below for some further reading about responsible coding in research.





Works Cited & Further Reading




Perkel, J. (2011). Coding your way out of a problem. Nature Methods, 8(7), 541-543. DOI: 10.1038/nmeth.1631



