Python: Coding standards and reproducible data science

In today's self study day, you will familiarise yourself with two concepts related to programming, but not writing code yourself! The first concept is related to good coding practices, and the second ensures that the program you write can be easily executed by others.

Today's learning objectives

  • Explore PEP 8 - Python's in-house coding standard.
  • Understand why reproducible data science matters and investigate methods to ensure reproduciblity.

Coding Standards: PEP 8 Python Style guide

The Python Enhancement Proposal 8 or PEP 8 is a styling guide for Python code. Following a style guide such as the PEP 8 improves the readability and overall understanding of Python code. However, like any style guide, PEP 8 is not always applicable in every circumstance. The key is the use the style guide consistently when you can (and that comes with more coding experience). Please read the style guide in full (see here) and apply it your creative briefs from henceforth.

Reproducible data science

A key feature of a reproducible piece of code is that any person, at any given day, in any given computing machine should be able to run your analysis and arrive at the same conclusions as you have. In practice, this is easier said than done. When you share a script, you can't necessarily guarantee that the person receiving the script has all the same environmental components that you do – the same version of Python and the same version of libraries you use. This can result in the outcomes of your documented and scripted process turning out differently on a different machine.


Please watch the following video to understand more about why reproduciblity matters!

Assignment

  1. Please read the style guide in full (see here) and apply it your creative briefs from henceforth.
  2. Summarize the video on reproducible data science (500 words) and list 5 methods or good practices which you plan to pursue to ensure that your data science projects are reproducible.