Guide to scientific coding

Programming languages
When people with no common language need to communicate, they develop something called a pidgin, a sort of simplified combination of their native languages. At its core, a programming language is a pidgin: a way for people (who don’t speak machine code) to communicate with computers (which only speak machine code). I’ve found that thinking about programming this way really helps when things get frustrating. It can be very difficult to communicate complex ideas in a language with no native speakers.

There are really three ways to measure how “good” a programming language is: performance, support, and readability. Performance deals with how quickly the code runs and how much memory it uses. Support deals with how well the language is documented and maintained, as well as the scope of things it can do. Readability is a measure of how easy it is to understand what the code will do just by reading it. There are hundreds of languages, each striking a different balance among these three, but one of the most common for scientific work is Python.

There are a few reasons why this is the case. First of all, Python was designed from square one with readability as the top priority. That means it’s easy to pick up, even if you’ve never coded before. For a scientist like you, it also means less time thinking about syntax and more time thinking about science! The second big reason to use Python is its ubiquity. MATLAB is great for math, and LabVIEW is great for device interfacing, but Python is great at both, and more! You can automate an entire experiment, check on it remotely through SSH, format the collected data into a compact file structure, then analyze that data in all sorts of ways, all without ever having to switch syntactic gears. Finally, Python (and nearly all of its add-on libraries) is completely free—free as in “free lunch,” but also free as in “free speech.” It belongs to a worldwide community of developers that is open to all.

That said, Python isn’t perfect. Its biggest drawbacks are in the performance department, speed in particular. But generally the hours gained in coding time are worth the seconds lost in run time.

How to code
Due to the large number of coding tutorials out there, it would be a waste of time for me to sit here and write my own. LearnPython.org is a great place to start if you’ve never programmed before. If you’re a little more experienced, the official Python tutorial is much more comprehensive. Even after you’ve mastered the basics, it’s probably a good idea to have the documentation for Python, NumPy, SciPy, and Matplotlib all bookmarked in your web browser, along with anything else you find yourself searching for a lot.

You can write code using any text editor, but there are certain apps (called integrated development environments or IDEs) that are designed to make it way easier. Python comes with its own basic IDE (called IDLE), but there are plenty of alternatives, such as PyCharm, Spyder, and Thonny. Each has its own benefits and drawbacks, so try a few before you settle on one.

Finally, physics majors are required to take CS 142, which covers the basics of computer programming in C++. Even though you probably won’t use C++ in the lab, it is really helpful to learn. Python hides a lot of details (such as data types and memory management) behind its simple syntax, and C++ leaves those details more exposed.

Object-oriented and modular programming
Object-oriented programming (OOP) is a way of organizing code to give the impression that objects within the script (rather than the script itself) are carrying out the desired procedures. Of course, this is technically a lie, since those “objects” really just represent more code, but it’s a useful lie because it makes the front-end code more intuitively readable. A closely related principle is that of modular code structure. Whenever you import a library, you’re using a module. Writing and importing your own modules is probably the single best way to keep your code organized, and it lets you reuse that code in other scripts.
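To make the “useful lie” of OOP a bit more concrete, here is a minimal sketch. The TranslationStage class and everything in it are hypothetical (it just tracks a number and doesn’t talk to any real hardware), but notice how the last few lines read as if the stage itself were doing the moving.

```python
class TranslationStage:
    """Hypothetical motorized stage; it only tracks a position in millimeters."""

    def __init__(self, travel_mm):
        self.travel_mm = travel_mm  # maximum allowed position
        self.position_mm = 0.0      # the stage starts at its home position

    def move_to(self, target_mm):
        """Move to an absolute position, refusing targets outside the travel range."""
        if not 0.0 <= target_mm <= self.travel_mm:
            raise ValueError(
                f"target {target_mm} mm is outside 0 to {self.travel_mm} mm"
            )
        self.position_mm = target_mm

    def move_by(self, step_mm):
        """Move relative to the current position, with the same range check."""
        self.move_to(self.position_mm + step_mm)


# Front-end code: reads like instructions given to an object.
stage = TranslationStage(travel_mm=25.0)
stage.move_to(10.0)
stage.move_by(2.5)
print(stage.position_mm)  # 12.5
```

The range check lives in one place (move_to), so every other piece of code that moves the stage gets it for free.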

I was going to include this in the last section, but it didn’t feel right. I really can’t emphasize enough how important it is. No serious programming project should be done with a single file. At the very least, you should group your code into sections, turn those sections into functions, and import those functions into another file to run. If this seems like a waste of time, imagine taking over a project from another student who just graduated, and this is the code you’re given:
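Picture something along these lines. This script is a made-up stand-in (the data values and variable names are invented for illustration), but it’s typical of the style:

```python
# Deliberately cryptic: one file, no functions, no comments, one-letter names.
import math
d = [0.0, 0.5, 1.0, 1.5, 2.0]
v = [0.1, 1.2, 1.9, 3.1, 4.0]
n = len(d)
sx = 0
sy = 0
sxx = 0
sxy = 0
for i in range(n):
    sx += d[i]
    sy += v[i]
    sxx += d[i] * d[i]
    sxy += d[i] * v[i]
m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - m * sx) / n
r = 0
for i in range(n):
    r += (v[i] - (m * d[i] + b)) ** 2
e = math.sqrt(r / (n - 2))
print(m, b, e)
```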

At first glance, what does this code actually do? Say you run it and it doesn’t give you the expected result. Where would you start to look for the bug? The unfortunate truth is that you would have to understand nearly every single line of code in that script before you could do anything more than simply run it. Furthermore, what if you want to write another script that does some (but not all) of the same things as this one? Putting all the code in the same file makes reusing code a tedious process.

Now imagine instead that you get an organized set of a dozen-or-so Python files. The primary file is called something like “main.py” or “run_experiment.py” and looks like this:
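Here is a minimal sketch of what such a file might contain. All of the function names are hypothetical, and so that the example runs on its own, the helpers are defined in the same listing; in a real project each one would sit in its own file and be imported at the top.

```python
# In a real project these helpers would live in separate modules
# (for example, hypothetical acquisition.py, analysis.py, and report.py).

def collect_measurements():
    """Stand-in for data acquisition: returns made-up (time, voltage) samples."""
    times = [0.0, 0.5, 1.0, 1.5, 2.0]
    voltages = [0.1, 1.2, 1.9, 3.1, 4.0]
    return times, voltages


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def format_report(slope, intercept):
    """Turn the fit results into a one-line summary."""
    return f"slope = {slope:.2f}, intercept = {intercept:.2f}"


def main():
    # The whole experiment, readable top to bottom.
    times, voltages = collect_measurements()
    slope, intercept = fit_line(times, voltages)
    print(format_report(slope, intercept))


if __name__ == "__main__":
    main()
```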

Even with no context, the purpose of this code is obvious at a glance. If you get an unexpected problem, the code is organized conceptually, which makes locating bugs easier. The code is compartmentalized, so you can do meaningful work on certain parts of the code without understanding every detail of every other part. And the code is modular, which makes it easy to reuse and adapt code to tackle new problems.

Git
If and when you start writing more complicated and collaborative code, you’ll want to use Git to keep it organized. At first glance, Git might just look like an overcomplicated Google Drive, but it’s much more than that. It’s a powerful version control system designed entirely for programmers.

Git can look complicated and unapproachable if you’ve never used it before, but I promise you it is worth learning. Start by downloading a Git client. It’s free, and comes with its own command-line interface as well as a GUI. Do some research to learn the basic commands (clone, stage, commit, pull, push, branch, merge) and you’ll be covered for nearly everything you need day to day. Also, many IDEs have built-in Git support, which makes managing code a lot easier.

GitHub has some really good learning resources for getting started.

Textbooks

 * Python for Science and Engineering (Halvorsen, 2019) (with accompanying video lectures)

Journal articles

 * Best practices for scientific computing (Wilson, 2014)
 * Good enough practices for scientific computing (Wilson, 2017)
 * Re-run, Repeat, Reproduce, Reuse, Replicate: Transforming Code into Scientific Contributions (Benureau, 2017)

YouTube

 * mCoding

Python tutorials

 * The official Python tutorial
 * Getting started with Python for science
 * LearnPython.org
 * TutorialsPoint.com