Introduction¶
Why teach Python?¶
- In this first session, we will introduce Python.
- This course is about programming for data analysis and visualisation in research.
- It's not mainly about Python.
- But we have to use some language.
Why Python?¶
- Python has a readable syntax that makes it relatively quick to pick up.
- Python is popular in research, and has lots of libraries for science.
- Python interfaces well with faster languages.
- Python is free, so you'll never have a problem getting hold of it, wherever you go.
Why write programs for research?¶
- Not just labour saving.
- Scripted research can be tested and reproduced.
Sensible Input - Reasonable Output¶
Programs are a rigorous way of describing data analysis for other researchers, as well as for computers.
Computational research suffers from people assuming each other's data manipulation is correct. By sharing readable, reproducible and well-tested code, which makes all of the data processing steps used in an analysis explicit and checks that each of those steps behaves as expected, we enable other researchers to understand and assess the validity of those analysis steps for themselves. In a research code context the problem is generally not so much garbage in, garbage out, but sensible input, reasonable output: 'black-box' analysis pipelines that, given sensible-looking data inputs, produce reasonable-appearing but incorrect analyses as outputs.
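As a minimal sketch of the "sensible input, reasonable output" problem, the example below uses two invented functions (the names `standardise_columns_buggy`, `standardise_columns` and `columns_are_standardised` are hypothetical and not part of the course material). Both functions return plausible-looking numbers for the same made-up data, but only an explicit check reveals that one of them is wrong.

```python
import numpy as np


def standardise_columns_buggy(data):
    # Bug: uses the mean and standard deviation of the whole array,
    # rather than of each column separately
    return (data - data.mean()) / data.std()


def standardise_columns(data):
    # Intended behaviour: each column ends up with zero mean and unit standard deviation
    return (data - data.mean(axis=0)) / data.std(axis=0)


def columns_are_standardised(result):
    # A simple test of the property we expect the output to have
    return bool(
        np.allclose(result.mean(axis=0), 0) and np.allclose(result.std(axis=0), 1)
    )


# Made-up data with two columns on very different scales
data = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

buggy = standardise_columns_buggy(data)
fixed = standardise_columns(data)

print("Buggy version passes check:", columns_are_standardised(buggy))
print("Fixed version passes check:", columns_are_standardised(fixed))
```

Neither output looks obviously wrong when printed, which is why a shared, tested check is more trustworthy than inspecting the numbers by eye.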
Many kinds of Python¶
Python notebooks¶
A particularly easy way to get started using Python, and one particularly suited to the sort of exploratory work common in a research context, is using Jupyter notebooks.
In a notebook, you can easily mix code with discussion and commentary, and display the results outputted by code alongside the code itself, including graphs and other data visualisations.
For example, if we wish to plot a figure-eight curve (lemniscate), we can include the parametric equations $x = \sin(2\theta) / 2, y = \cos(\theta), \theta \in [0, 2\pi)$ which mathematically define the curve, as well as the corresponding Python code to plot the curve and the output of that code, all within the same notebook:
# Plot lemniscate curve
import numpy as np
import matplotlib.pyplot as plt
theta = np.linspace(0, 2 * np.pi, 100)
x = np.sin(2 * theta) / 2
y = np.cos(theta)
fig, ax = plt.subplots(figsize=(3, 6))
lines = ax.plot(x, y)
Notebook cells¶
Jupyter notebooks consist of a sequence of cells. Cells can be of two main types:
- Markdown cells: Cells containing descriptive text and discussion with rich-text formatting via the Markdown text markup language.
- Code cells: Cells containing Python code, which is displayed with syntax highlighting. The results returned by the computation performed when running the cell are displayed below the cell as the cell output, with Jupyter having a rich display system allowing embedding a range of different outputs including for example static images, videos and interactive widgets.
The document you are currently reading is a Jupyter notebook, and the text you are reading is a Markdown cell in the notebook. Below we see an example of a code cell.
print("This cell is a code cell")
Code cell inputs are numbered, with the cell output shown immediately below the input. Here the output is the text that we instruct the cell to print to the standard output stream. Cells will also display a representation of the value outputted by the last line in the cell, if any. For example:
print("This text will be displayed\n")
"This text will also be displayed\n"
There is a small difference in the formatting of the output here, with the print function displaying the text without quotation mark delimiters and with any escaped special characters (such as the "\n" newline character here) processed.
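The difference can be sketched outside a notebook using the built-in `repr` function, which gives a representation close to what a notebook shows for a string on the last line of a cell. The variable names here are invented for illustration.

```python
text = "First line\nSecond line"

# print processes the escaped newline and shows no quotation marks,
# so the text appears on two separate lines
print(text)

# repr keeps the quotation marks and shows the newline as the two characters \n,
# so the text appears on a single line
displayed = repr(text)
print(displayed)
```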
Markdown formatting¶
The Markdown language used in Markdown cells provides a simple way to add basic text formatting to the rendered output while aiming to retain the readability of the original Markdown source. For example, to achieve the following rendered output text
bold, italic, strikethrough, monospace
- Bullet
Quote
Link to search
we can use the following Markdown text
**bold**, *italic*, ~~strikethrough~~, `monospace`
* Bullet
> Quote
[Link to search](https://duckduckgo.com/)
For more information see this tutorial in the official Jupyter documentation.
Editing and running cells in the notebook¶
When working with the notebook, you can either be editing the content of a cell (termed edit mode), or outside the cells, navigating around the notebook (termed command mode).
- When in edit mode in a cell, press esc to leave it and change to command mode.
- When navigating between cells in command mode, press enter to change in to edit mode in the selected cell.
- When in command mode:
  - The currently selected cell will be shown by a blue highlight to the left of the cell.
  - Use the arrow keys ▲ and ▼ to navigate up and down between cells.
  - Press a to add a new cell above the currently selected cell.
  - Press b to add a new cell below the currently selected cell.
  - Press dd to delete the currently selected cell.
  - Press m to change a code cell to a Markdown cell.
  - Press y to change a Markdown cell to a code cell.
  - Press shift+l to toggle displaying line numbers on the currently selected cell.
  - Press shift+enter to run the code in a currently selected code cell and move to the next cell.
  - Press ctrl+enter to run the code in a currently selected code cell and keep the current cell selected.
  - Press ctrl+shift+c to access the command palette and search for useful actions in the notebook.
- When in edit mode:
  - Press tab to suggest completions of variable names and object attributes. (Try it!)
  - Press shift+tab when in the argument list of a function to display a pop-up showing documentation for the function.
Supplementary material: Learn more about Jupyter notebooks. Jupyter lab.
Python interpreters¶
An alternative to running Python code via a notebook interface is to run commands in a Python interpreter (also known as an interactive shell or read-eval-print loop (REPL)). This is similar in concept to interacting with your operating system via a command-line interface such as the bash or zsh shells in Linux and macOS or Command Prompt in Windows. A Python interpreter provides a prompt into which we can type Python code corresponding to commands we wish to execute; we then execute this code by hitting enter, with any output from the computation being displayed before returning to the prompt again.
Python libraries¶
A very common requirement in research (and all other!) programming is needing to reuse code in multiple different files. While it may seem that copying-and-pasting is an adequate solution to this problem, this should generally be avoided wherever possible; code which we wish to reuse should instead be factored out into libraries which we can import into other files to access its functionality.
Compared to copying and pasting code, writing and using libraries has the major advantage that if we spot a bug in the code we only need to fix it once in the underlying library, and the fixed code is straight away available everywhere the library is used, rather than having to separately implement the fix in each file where it is used. The same applies to, for example, adding new features to a piece of code. By creating libraries we also make it easier for other researchers to use our code.
While it is simple to use libraries within a notebook (and we have already seen examples of this when we imported the Python libraries NumPy and Matplotlib in the figure-eight plot example above), it is non-trivial to use code from one notebook in another without copying-and-pasting. To create Python libraries we therefore generally write the code into text files with a .py extension, which in Python terminology are called modules. The code in these files can then be used in notebooks (or other modules) using the Python import statement.
For example, the cell below creates a file draw_eight.py in the same directory as this notebook containing Python code defining a function (we will cover how to define and call functions later in the course) which creates a figure-eight plot and returns the figure object.
%%writefile draw_eight.py
# The above line tells the notebook to write the rest of the cell content to a file draw_eight.py
import numpy as np
import matplotlib.pyplot as plt
def make_figure():
    theta = np.linspace(0, 2 * np.pi, 100)
    fig, ax = plt.subplots(figsize=(3, 6))
    ax.plot(np.sin(2 * theta) / 2, np.cos(theta))
    return fig
Note, in a real example, we could edit the file on disk using a code editor (or IDE) rather than using %%writefile.
We can use this code in the notebook by importing the draw_eight module and then calling the make_figure function defined in the module.
# restart kernel on jupyterlite (browser) before running this
import draw_eight # Load the library
fig = draw_eight.make_figure()
Note, we can import our draw_eight module in this notebook only if the file is in our current working directory (i.e. the folder this notebook is in).
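The reason location matters is that Python only looks for modules in the directories listed in `sys.path`. The sketch below, assuming a standard CPython setup, writes a small module into a temporary directory and shows that the import only succeeds once that directory is on the search path. The module name `demo_search_path_module` and its `answer` function are invented for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a tiny (hypothetical) module into a directory Python does not search
    module_path = Path(tmp_dir, "demo_search_path_module.py")
    module_path.write_text("def answer():\n    return 42\n")

    # The import fails because the directory is not on the search path
    try:
        import demo_search_path_module
        found_before = True
    except ModuleNotFoundError:
        found_before = False

    # Adding the directory to sys.path makes the module importable
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    module = importlib.import_module("demo_search_path_module")
    result = module.answer()

    # Tidy up the search path again
    sys.path.remove(tmp_dir)

print("Importable before changing sys.path:", found_before)
print("Result after changing sys.path:", result)
```

Editing `sys.path` by hand like this is fragile and is shown only to explain the behaviour; it is not a recommended way of sharing code.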
To allow us to import our module from anywhere on our computer, or to allow other people to reuse it on their own computer, we can create a Python package. Later in this course we will cover how to import and use functionality from libraries, how to install third-party libraries and how to write your own libraries that can be shared and used by others.
Python scripts¶
While Jupyter notebooks are a great medium for learning how to use Python and for exploratory work, there are some drawbacks:
- They require JupyterLab (or a similar application) to be installed to run the notebook.
- It can be difficult to run notebooks non-interactively, for example when scheduling a job on a cluster such as those offered by UCL Research Computing.
- The flexibility of being able to run the code in cells in any order can also make it difficult to reason about how outputs were produced and can lead to non-reproducible analyses.
In some settings it can therefore be preferable to write Python scripts, that is files (typically with a .py extension) which contain Python code that completely describes a computational task to perform and that can be run by passing the name of the script file to the python program in a command-line environment. Optionally, scripts may also allow passing in arguments from the command line to control the execution of the script. As scripts are generally run from text-based terminals, non-text outputs such as images will generally be saved to files on disk.
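A minimal sketch of such a script is shown below. It uses the standard library `argparse` module to read command-line arguments; the option names `--num-points` and `--output`, and the function names, are assumptions made for this illustration rather than anything defined in the course.

```python
"""Hypothetical script computing points on the figure-eight curve."""
import argparse
import csv
import math


def parse_arguments(argv=None):
    # argv=None means the arguments are read from the command line
    parser = argparse.ArgumentParser(
        description="Compute points on a figure-eight curve."
    )
    parser.add_argument(
        "--num-points", type=int, default=100, help="number of points to compute"
    )
    parser.add_argument(
        "--output", default=None, help="CSV file to write the points to"
    )
    return parser.parse_args(argv)


def compute_points(num_points):
    # Same parametric equations as the notebook example: x = sin(2θ)/2, y = cos(θ)
    points = []
    for index in range(num_points):
        theta = 2 * math.pi * index / num_points
        points.append((math.sin(2 * theta) / 2, math.cos(theta)))
    return points


def main(argv=None):
    args = parse_arguments(argv)
    points = compute_points(args.num_points)
    if args.output is None:
        print(f"Computed {len(points)} points (pass --output to save them)")
    else:
        # Results are saved to a file on disk rather than displayed
        with open(args.output, "w", newline="") as output_file:
            writer = csv.writer(output_file)
            writer.writerow(["x", "y"])
            writer.writerows(points)
    return points


if __name__ == "__main__":
    main()
```

If this were saved as, say, figure_eight_points.py (a made-up file name), it could be run from a terminal with `python figure_eight_points.py --num-points 200 --output points.csv`.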
Python scripts are well suited to, for example, describing computationally demanding simulations or analyses to run as long jobs on a remote server or cluster, or tasks where the input and output is mainly at the file level, for instance batch processing a series of data files.
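Batch processing of data files might look something like the sketch below, which computes the mean of the numbers in every .txt file in a directory and writes a summary file. The file names, file format and function names are all invented for this illustration, and the demonstration data is created in a temporary directory so the example is self-contained.

```python
import tempfile
from pathlib import Path


def summarise_file(path):
    """Return the mean of the whitespace-separated numbers in a text file."""
    values = [float(token) for token in path.read_text().split()]
    return sum(values) / len(values)


def summarise_directory(input_dir, output_path):
    """Write one 'file name,mean' line for each .txt file found in input_dir."""
    results = {}
    # Sorting gives the same processing order on every run
    for data_path in sorted(Path(input_dir).glob("*.txt")):
        results[data_path.name] = summarise_file(data_path)
    lines = [f"{name},{mean}" for name, mean in results.items()]
    Path(output_path).write_text("\n".join(lines) + "\n")
    return results


# Demonstration on two small made-up data files
with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "run_a.txt").write_text("1.0 2.0 3.0\n")
    Path(tmp_dir, "run_b.txt").write_text("10.0 20.0\n")
    summary_path = Path(tmp_dir, "summary.csv")
    summary = summarise_directory(tmp_dir, summary_path)
    summary_text = summary_path.read_text()

print(summary_text)
```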
Python libraries/packages¶
A package is a collection of modules that can be installed on our computer and easily shared with others. We will learn how to create packages later on in this course.
There is a huge variety of available packages to do pretty much anything. For instance, try import antigravity or import this.
IDEs¶
IDEs are Integrated Development Environments, and an IDE is what we will be using in this course. We will be demonstrating with VS Code, but you could use whichever you like (e.g. Spyder, PyCharm, ...). We won't be using notebooks, except for these notes, which you can download and experiment with, as we will be learning how to build libraries, and these need to be composed of Python files rather than notebooks. When working with an IDE, you get access to a Python interpreter and can run scripts directly from the interface, as well as use tools like the debugger, test frameworks and git from within it.