Packaging¶
Once we've made a working program, we'd like to be able to share it with others.
A good cross-platform build tool is the most important thing: you can always have collaborators build from source.
Along with the content below, the Python ecosystem has a few fantastic guides on both simple and complex packaging. Some of the recommended ones are listed below -
Besides the guides, there are numerous "cookie" templates available for developers. These templates allow developers to generate an empty Python package following good practices and guidelines with a simple CLI command.
- Scientific Python cookie
- Open Science Labs' scicookie
- Domain/ecosystem specific cookies like pybamm-cookiecutter for battery modeling projects in Python exist too
Finally, the packaging community regularly organises PackagingCon to discuss the packaging ecosystems of multiple languages and operating systems at a single place.
Distribution tools¶
Distribution tools allow one to obtain a working copy of someone else's package. The package managers are usually CLI utilities that allow you to query inside a repository of existing packages.
Language specific package/library managers:
Platform specific package/library managers e.g.:
Every language has a repository or a central database of packages submitted by the developers.
- Language-specific repositories:
The difference between the package management tools and the package repositories is similar to the difference between Git and GitHub.
Laying out a project¶
When planning to package a project for distribution, defining a suitable project layout is essential. A typical scientific python compliant layout might look like this:
repository_name
|-- src
|   `-- package_name
|       |-- __init__.py  # optional; required for exporting things under package's namespace
|       |-- python_file.py
|       `-- another_python_file.py
|-- tests
|   |-- fixtures
|   |   `-- fixture_file.yaml
|   `-- test_python_file.py
|-- LICENSE.md
|-- CITATION.md
|-- README.md
`-- pyproject.toml
To achieve this for our greetings.py file from the previous session, we can use the commands shown below. We can start by making our directory structure. You can create many nested directories at once using the -p switch on mkdir.
%%bash
mkdir -p greetings_repo/src/greetings
mkdir -p greetings_repo/tests/fixtures
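The same nested layout can also be created from Python, which may be handy on systems without a POSIX shell. The following is a minimal sketch using pathlib; the helper name make_layout is illustrative and not part of the course material, and the demonstration runs in a temporary directory so that it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def make_layout(root):
    """Create the src/ and tests/ skeleton under root (hypothetical helper)."""
    root = Path(root)
    created = []
    for relative in ("src/greetings", "tests/fixtures"):
        directory = root / relative
        # parents=True mirrors `mkdir -p`; exist_ok=True makes reruns harmless
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created


# Try it out in a scratch directory that is deleted afterwards
with tempfile.TemporaryDirectory() as scratch:
    created = make_layout(scratch)
    layout_ok = all(path.is_dir() for path in created)
```

Calling make_layout("greetings_repo") instead should give the same directories as the two mkdir -p commands.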
Since we are going to be modifying the files bit by bit in this notebook, we are going to use the autoreload IPython magic so that we don't need to restart the kernel.
%load_ext autoreload
%autoreload 2
Using pyproject.toml¶
Since June 2020, Python's recommendation for creating a package is to specify package information in a pyproject.toml file.
Older projects used a setup.py or setup.cfg file instead, and in many ways the new pyproject.toml file mirrors this old format.
A lot of projects and packages have not yet switched over from setup.py to pyproject.toml, so don't be surprised to see a mixture of the two formats when you're looking at other people's packages.
For our greetings package, right now we are adding only the name of the package and its version number.
This information is included in the project section of our pyproject.toml file.
But we also need to tell users how to build the package from these specifications.
This information is specified in the build-system section of our toml file. Python packages are shipped on PyPI as "wheel" files, which are installable via pip and are generated by build backends. Wheels can be different for different platforms and different Python versions, and pip only resorts to installing a library through its source distribution (often referred to as an sdist) if it fails to find a compatible wheel file on PyPI. The Python ecosystem houses a number of build backends, each of them specializing in a different task.
Some common Python build backends:
- setuptools: allows building pure Python and Python + C/C++ projects
- hatch: allows building (and recommended for) pure Python projects
- flit: allows building pure Python projects with minimal extra configurations
- poetry: allows building pure Python projects (a full blown dependency and environment management system)
- scikit-build-core: allows building pure Python and Python + C/C++ projects (under active development)
- meson: allows building pure Python and Python + C/C++ projects (has a custom DSL)
- maturin: allows building Rust binary extensions
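Whichever backend builds it, a wheel records the Python version and platform it targets in its filename, which is how pip decides whether a wheel is compatible. As a simplified sketch, the function below splits a wheel filename into the fields defined by the wheel naming convention (distribution, version, optional build tag, Python tag, ABI tag, platform tag). The function name and both filenames are made up for illustration.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its main fields (simplified sketch)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Five fields normally, six when the optional build tag is present
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


# Illustrative filenames: a pure Python wheel and a compiled, platform-specific one
pure = parse_wheel_filename("greetings-0.1.0-py3-none-any.whl")
compiled = parse_wheel_filename("example-1.2.3-cp312-cp312-manylinux_2_17_x86_64.whl")
```

A pure Python wheel carries the tags py3, none and any, meaning it can be installed on any platform, whereas a compiled wheel names a specific interpreter and platform.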
In this case, we'll be using hatch to build our package, so we list its build backend, hatchling, in the requires field. Technically speaking, hatch is the front-end (a CLI utility) for the actual build backend, hatchling. hatchling is installed with hatch and can be specified as the build-backend in pyproject.toml.
Finally, we can set specific options for hatch using additional sections in pyproject.toml: in this case, we will tell hatch that it needs to find and include all of the files in our src/greetings folder.
We could have skipped adding the directory manually if our package had an __init__.py file (more on this below).
The best way to look at all the options of a build-backend is by going through its documentation.
%%writefile greetings_repo/pyproject.toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "Greetings"
version = "0.1.0"
[tool.hatch.build.targets.wheel]
packages = [
"src/greetings",
]
Some of the build backends allow users to automate the package's version using a version control system (VCS).
For instance, you might want to look into hatch-vcs to enable VCS versioning with hatch.
We can now install this "package" with pip (make sure hatch is installed):
%%bash
cd greetings_repo
pip install .
The package will then be available to use from anywhere on the system. But so far this package doesn't contain anything and there's nothing we can run! We need to add some files first.
To create a regular package, we used to need an __init__.py file in each subdirectory that we wanted to be able to import. Since Python 3.3 and the introduction of implicit namespace packages, this is no longer required.
The __init__.py files can contain any initialisation code you want to run when the (sub)module is imported.
For this example, we don't need to create the __init__.py files.
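The following sketch illustrates the point about implicit namespace packages: it writes a module into a directory that has no __init__.py and imports it anyway. The directory and module names (demo_namespace_pkg, hello) are invented for this demonstration, and everything happens in a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    # A package directory containing one module and no __init__.py
    package_dir = Path(scratch) / "demo_namespace_pkg"
    package_dir.mkdir()
    (package_dir / "hello.py").write_text("MESSAGE = 'imported without __init__.py'\n")

    sys.path.insert(0, scratch)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("demo_namespace_pkg.hello")
        message = module.MESSAGE
        # A namespace package has no file of its own to point at
        package_file = getattr(sys.modules["demo_namespace_pkg"], "__file__", None)
    finally:
        sys.path.remove(scratch)
```

The import succeeds even though the package itself has no file behind it, which is the situation our greetings directory is in.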
And we can copy the greet function from the previous section into the greeter.py file.
%%writefile greetings_repo/src/greetings/greeter.py
def greet(personal, family, title="", polite=False):
    greeting = "How do you do, " if polite else "Hey, "
    if title:
        greeting += f"{title} "
    greeting += f"{personal} {family}."
    return greeting
For the changes to take effect, we need to reinstall the library:
%%bash
cd greetings_repo
pip install .
And now we are able to import it and use it:
from greetings.greeter import greet
greet("Terry", "Gilliam")
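If the import fails, a quick check is whether the distribution was actually installed into the current environment. The sketch below uses the standard-library importlib.metadata module; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# After `pip install .` this should report the version from pyproject.toml;
# in an environment without the package it is simply None.
greetings_version = installed_version("Greetings")
```

Note that this looks up the distribution name declared in pyproject.toml (Greetings), which is not necessarily the same as the name you import (greetings).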
Convert the script to a module¶
Of course, there's more to do when taking code from a quick script and turning it into a proper module:
We need to add docstrings to our functions, so people can know how to use them.
%%writefile greetings_repo/src/greetings/greeter.py
def greet(personal, family, title="", polite=False):
    """Generate a greeting string for a person.

    Parameters
    ----------
    personal: str
        A given name, such as Will or Jean-Luc
    family: str
        A family name, such as Riker or Picard
    title: str
        An optional title, such as Captain or Reverend
    polite: bool
        True for a formal greeting, False for informal.

    Returns
    -------
    string
        An appropriate greeting

    Examples
    --------
    >>> from greetings.greeter import greet
    >>> greet("Terry", "Jones")
    'Hey, Terry Jones.'
    """
    greeting = "How do you do, " if polite else "Hey, "
    if title:
        greeting += f"{title} "
    greeting += f"{personal} {family}."
    return greeting
We can see the documentation using help.
help(greet)
The documentation string explains how to use the function; don't worry about the details for now, we'll consider them in the next section (notebook version).
Write an executable script¶
We can create an executable script, command.py, that uses our greeting functionality and the process function we created in the previous section.
Note how we are importing greet using relative imports, where .greeter means to look for a greeter module within the same directory.
%%writefile greetings_repo/src/greetings/command.py
from argparse import ArgumentParser

from .greeter import greet


def process():
    parser = ArgumentParser(description="Generate appropriate greetings")
    parser.add_argument('--title', '-t')
    parser.add_argument('--polite', '-p', action="store_true")
    parser.add_argument('personal')
    parser.add_argument('family')
    arguments = parser.parse_args()
    print(
        greet(
            arguments.personal,
            arguments.family,
            arguments.title,
            arguments.polite
        )
    )


if __name__ == "__main__":
    process()
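To experiment with the argument parsing without going through the terminal, parse_args can be given an explicit list of strings instead of reading the real command line. This is a sketch only: build_parser is a hypothetical helper that repeats the parser set-up from process so that the example is self-contained.

```python
from argparse import ArgumentParser


def build_parser():
    """Recreate the parser used in process() (hypothetical helper)."""
    parser = ArgumentParser(description="Generate appropriate greetings")
    parser.add_argument("--title", "-t")
    parser.add_argument("--polite", "-p", action="store_true")
    parser.add_argument("personal")
    parser.add_argument("family")
    return parser


# Equivalent to typing: greet --polite Terry Gilliam
arguments = build_parser().parse_args(["--polite", "Terry", "Gilliam"])
```

Here arguments.title is None because --title was not given; greet copes with that since it only checks whether the title is truthy.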
Specify entry point¶
An entry point allows us to create a command that executes part of our library. In this case, when we execute greet on the terminal, we will be calling the process function under greetings/command.py.
We can encode this into our package information by specifying the project.scripts field in our pyproject.toml file.
%%writefile greetings_repo/pyproject.toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "Greetings"
version = "0.1.0"
[project.scripts]
greet = "greetings.command:process"
[tool.hatch.build.targets.wheel]
packages = [
"src/greetings",
]
%%bash
cd greetings_repo
pip install .
And the scripts are now available as command line commands, so the following commands can now be run:
%%bash
greet --help
%%bash
greet Terry Gilliam
greet --polite Terry Gilliam
greet Terry Gilliam --title Cartoonist
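The value "greetings.command:process" follows a module:function pattern: the part before the colon is imported, and the part after it is looked up inside that module. The sketch below shows roughly how such a reference can be resolved by hand. It is an illustration rather than the code the installer actually generates, resolve_entry_point is a hypothetical name, and a standard-library target (json:dumps) stands in for our own package so that the example runs without installing anything.

```python
from importlib import import_module


def resolve_entry_point(reference):
    """Turn 'package.module:function' into the callable it names."""
    module_name, _, attribute_path = reference.partition(":")
    target = import_module(module_name)
    # The part after the colon may itself be dotted, e.g. "Class.method"
    for attribute in attribute_path.split("."):
        target = getattr(target, attribute)
    return target


# A standard-library stand-in for "greetings.command:process"
dumps = resolve_entry_point("json:dumps")
encoded = dumps({"greet": "greetings.command:process"})
```

With the package installed, resolve_entry_point("greetings.command:process")() would be expected to behave like running greet on the terminal.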
Specify dependencies¶
Let's give some life to our output using ASCII art.
%%writefile greetings_repo/src/greetings/command.py
from argparse import ArgumentParser

from art import art

from .greeter import greet


def process():
    parser = ArgumentParser(description="Generate appropriate greetings")
    parser.add_argument('--title', '-t')
    parser.add_argument('--polite', '-p', action="store_true")
    parser.add_argument('personal')
    parser.add_argument('family')
    arguments = parser.parse_args()
    message = greet(arguments.personal, arguments.family,
                    arguments.title, arguments.polite)
    print(art("cute face"), message)


if __name__ == "__main__":
    process()
We use the dependencies field of the project section in our pyproject.toml file to specify the packages we depend on.
We provide the names of the packages as a list of strings.
%%writefile greetings_repo/pyproject.toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "Greetings"
version = "0.1.0"
dependencies = [
"art",
]
[project.scripts]
greet = "greetings.command:process"
[tool.hatch.build.targets.wheel]
packages = [
"src/greetings",
]
When installing the package now, pip will also install the dependencies automatically.
%%bash
cd greetings_repo
pip install .
%%bash
greet Terry Gilliam
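If the command fails with an import error, one quick diagnostic is to check which of the modules it needs can actually be found in the current environment. The helper below is a hypothetical sketch using importlib.util.find_spec; it takes import names (here art, plus argparse from the standard library), which are not always identical to the distribution names listed under dependencies.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found."""
    return [name for name in module_names if find_spec(name) is None]


# An empty list means everything command.py imports is available
unavailable = missing_modules(["argparse", "art"])
```

An empty result suggests the dependencies were installed alongside the package; anything listed still needs installing.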
Installing from GitHub¶
We could now submit "greeter" to PyPI, so everyone could pip install it.
However, when using git, we don't even need to do that: we can install directly from any git URL:
pip install git+https://github.com/UCL-ARC-RSEing-with-Python/greeter
$ greet Lancelot the-Brave --title Sir
Hey, Sir Lancelot the-Brave.
There are a few additional text files that are important to add to a package: a readme file, a licence file and a citation file.
Write a readme file¶
The readme file might look like this:
%%writefile greetings_repo/README.md
# Greetings!
This is a very simple example package used as part of the UCL
[Research Software Engineering with Python](development.rc.ucl.ac.uk/training/engineering) course.
## Installation
```bash
pip install git+https://github.com/UCL-ARC-RSEing-with-Python/greeter
```
## Usage
Invoke the tool with `greet <FirstName> <Secondname>` or use it in your own code:
```python
from greetings import greeter
greeter.greet(user.name, user.lastname)
```
Write a license file¶
We will discuss licensing in more detail in a later section. For now, let's assume we want to release this package into the public domain:
%%writefile greetings_repo/LICENSE.md
(C) University College London 2014
This "greetings" example package is granted into the public domain.
Write a citation file¶
A citation file will inform our users how we would like to be cited when referring to our software:
%%writefile greetings_repo/CITATION.md
If you wish to refer to this course, please cite the URL
http://github-pages.ucl.ac.uk/rsd-engineeringcourse/
Portions of the material are taken from [Software Carpentry](http://software-carpentry.org/)
You may well want to formalise this using the codemeta.json standard or the citation file format.
Write some unit tests¶
We can now write some tests for our library.
Separating the script from the logical module made this possible.
%%writefile greetings_repo/tests/test_greeter.py
import os

import yaml

from greetings.greeter import greet


def test_greet():
    with open(
        os.path.join(
            os.path.dirname(__file__),
            'fixtures',
            'samples.yaml'
        )
    ) as fixtures_file:
        fixtures = yaml.safe_load(fixtures_file)
        for fixture in fixtures:
            answer = fixture.pop('answer')
            assert greet(**fixture) == answer
Add a fixtures file:
%%writefile greetings_repo/tests/fixtures/samples.yaml
- personal: Eric
  family: Idle
  answer: "Hey, Eric Idle."
- personal: Graham
  family: Chapman
  polite: True
  answer: "How do you do, Graahm Chapman."
- personal: Michael
  family: Palin
  title: CBE
  answer: "Hey, CBE Mike Palin."
We can now run pytest
%%bash --no-raise-error
cd greetings_repo
pytest
However, the test stops at the first failing assertion, so this hasn't told us that the third case is wrong too! A better approach is to parametrize the test file greetings_repo/tests/test_greeter.py as follows:
%%writefile greetings_repo/tests/test_greeter.py
import os

import pytest
import yaml

from greetings.greeter import greet


def read_fixture():
    with open(
        os.path.join(
            os.path.dirname(__file__),
            'fixtures',
            'samples.yaml'
        )
    ) as fixtures_file:
        fixtures = yaml.safe_load(fixtures_file)
    return fixtures


@pytest.mark.parametrize("fixture", read_fixture())
def test_greeter(fixture):
    answer = fixture.pop('answer')
    assert greet(**fixture) == answer
Now when we run pytest, we get one failure for each failing element in our fixture, so we know everything that fails.
%%bash --no-raise-error
cd greetings_repo
pytest
We can also make pytest check whether the docstrings are correct by adding the --doctest-modules flag. We run pytest --doctest-modules and obtain the following output:
%%bash --no-raise-error
cd greetings_repo
pytest --doctest-modules
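The examples that pytest collects here are ordinary doctests, so they can also be run directly with the standard-library doctest module. As a self-contained sketch, the block below repeats a shortened greet (with a second docstring example added purely for illustration) and runs its examples programmatically.

```python
import doctest


def greet(personal, family, title="", polite=False):
    """Generate a greeting string for a person.

    >>> greet("Terry", "Jones")
    'Hey, Terry Jones.'
    >>> greet("Graham", "Chapman", polite=True)
    'How do you do, Graham Chapman.'
    """
    greeting = "How do you do, " if polite else "Hey, "
    if title:
        greeting += f"{title} "
    greeting += f"{personal} {family}."
    return greeting


# Collect the examples from the docstring and execute them one by one
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(greet, name="greet", globs={"greet": greet}):
    runner.run(test)
results = runner.summarize(verbose=False)
```

The results object reports how many examples were attempted and how many failed. An example counts as failed whenever the printed representation differs from the expected text, even by a single quote character.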
Finally, we typically don't want to include the tests when we distribute our software to our users. We can also add pytest as an "optional" dependency for the developers of our package.
Additionally, we can make sure that our README and LICENSE are included in our package metadata by declaring them in the readme and license-files fields under the project section.
If you're using a particularly common or standard license, you can even provide the name of the license, rather than the file, and your package builder will take care of the rest!
%%writefile greetings_repo/pyproject.toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "Greetings"
version = "0.1.0"
readme = { file = "README.md", content-type = "text/markdown" }
license-files = { paths = ["LICENSE.md"] }
dependencies = [
"art",
"pyyaml",
]
[project.scripts]
greet = "greetings.command:process"
[project.optional-dependencies]
dev = ["pytest >= 6"]
[tool.hatch.build.targets.wheel]
packages = [
"src/greetings",
]
Developer Install¶
If you modify your source files now, it will appear as if the program doesn't change.
That's because pip install copies the files into the environment.
If you want to install a package but keep working on it, you can do:
pip install --editable .
or, its shorter version:
pip install -e .
To install the dev dependencies as well:
pip install -e ".[dev]"
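To check which kind of install is active, it can help to ask Python where it would load a module from. The helper below is a hypothetical sketch based on importlib.util.find_spec; the standard-library json package is used in the demonstration so that it runs even without greetings installed.

```python
from importlib.util import find_spec


def module_location(module_name):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is missing
        return None
    if spec is None:
        return None
    return spec.origin


json_location = module_location("json")
```

After an editable install, module_location("greetings.greeter") would be expected to point into greetings_repo/src, whereas after a regular install it should point into the environment's site-packages directory.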
Distributing compiled code¶
If you're working in C++ or Fortran, there is no language specific repository. You'll need to write platform installers for as many platforms as you want to support.
Typically:
- dpkg for apt-get on Ubuntu and Debian
- rpm for yum/dnf on Redhat and Fedora
- homebrew on OSX (possibly macports as well)
- An executable msi installer for Windows.
Homebrew¶
Homebrew installers are written in a Ruby DSL, and you can host them from your own webpage.
See an installer for the cppcourse example
If you're on OSX, do:
brew tap jamespjh/homebrew-reactor
brew install reactor