COMP0233: Research Software Engineering With Python

Publishing

NOTE: Bash and git commands are not yet fully supported on JupyterLite (because of its single thread/process restriction), so the cells below might fail in the browser (JupyterLite) version of this notebook.

We're still in our working directory:

In [1]:
import os
top_dir = os.getcwd()
git_dir = os.path.join(top_dir, 'learning_git')
working_dir = os.path.join(git_dir, 'git_example')
os.chdir(working_dir)
working_dir
Out[1]:
'/home/runner/work/rsd-engineeringcourse/rsd-engineeringcourse/ch00git/learning_git/git_example'

Sharing your work

So far, all our work has been on our own computer. But a big part of the point of version control is keeping your work safe on remote servers. Another part is making it easy to share your work with the world. In this example, we'll be using the "GitHub" cloud repository to store and publish our work.

If you have not done so already, you should create an account on GitHub: go to GitHub's website, fill in a username and password, and click on "sign up for GitHub".

Creating a repository

OK, let's create a repository to store our work. Hit "new repository" on the right of the GitHub home screen.

Fill in a short name and a description. Choose a "public" repository. Don't choose to initialize the repository with a README: that would create a repository with content, and we only want an empty placeholder to which we can upload what we've created locally.

Paying for GitHub

For this course, you should use public repositories in your personal account for your example work: it's good to share! GitHub is free for open source, but in general, charges a fee if you want to keep your work private.

In the future, you might want to keep your work on GitHub private.

Students can get free private repositories on GitHub, by going to GitHub Education and filling in a form (look for the Student Developer Pack).

UCL pays for private GitHub repositories for UCL research groups: you can find the service details on the Advanced Research Computing Centre's website.

Adding a new remote to your repository

Once you've created the repository, instructions will appear in the lower box on the screen, explaining how to add this new "remote" server to your repository. Mine say:

In [2]:
%%bash
git remote add origin git@github.com:UCL/github-example.git
In [3]:
%%bash
# You shouldn't need the extra `f` switch. We use it here to force the push and rewrite that repository.
# You should copy the instructions from YOUR repository.
git push -uf origin main
To github.com:UCL/github-example.git
 + b632828...bb375a0 main -> main (forced update)
branch 'main' set up to track 'origin/main'.

Remotes

The first command sets up the server as a new remote, called origin.

Git, unlike some earlier version control systems, is a "distributed" version control system, which means you can work with multiple remote servers.

Usually, commands that work with remotes allow you to specify the remote to use, but assume the origin remote if you don't.
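To make the idea of several named remotes concrete, here is a minimal sketch that drives git from Python in a throwaway repository. It assumes git is installed and on your PATH. The `git` helper function, the second remote name `backup`, and both URLs are made up for illustration; adding a remote only records the URL, so nothing is contacted over the network.

```python
import subprocess
import tempfile


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


with tempfile.TemporaryDirectory() as repo:
    git("init", cwd=repo)

    # Hypothetical URLs: `git remote add` just stores them in .git/config.
    git("remote", "add", "origin", "git@github.com:example-user/example.git", cwd=repo)
    git("remote", "add", "backup", "git@example.org:example-user/example.git", cwd=repo)

    # List the names of all configured remotes.
    remote_names = sorted(git("remote", cwd=repo).split())

    # Read back the URL stored for the remote called "origin".
    origin_url = git("config", "--get", "remote.origin.url", cwd=repo).strip()

print(remote_names)
print(origin_url)
```

Each remote is just a short name attached to a URL, which is why commands such as `git push` can take the name instead of the full address.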

Here, git push will push your whole history onto the server, and now you'll be able to see it on the internet! Refresh your web browser where the instructions were, and you'll see your repository!
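If you want to try a push without touching GitHub, one option is to let a local "bare" repository play the part of the server. The sketch below does that under stated assumptions: git is installed, the directory names `server.git` and `local` are invented for this example, and the `git` helper passes a throwaway identity so that it does not depend on your global configuration.

```python
import pathlib
import subprocess
import tempfile


def git(*args, cwd):
    """Run git in `cwd` with a throwaway identity and return its stdout."""
    cmd = [
        "git",
        "-c", "user.name=Example",
        "-c", "user.email=example@example.com",
        "-c", "commit.gpgsign=false",
        *args,
    ]
    return subprocess.run(
        cmd, cwd=cwd, check=True, capture_output=True, text=True
    ).stdout


with tempfile.TemporaryDirectory() as tmp:
    tmp = pathlib.Path(tmp)
    server = tmp / "server.git"  # stands in for the repository on GitHub
    local = tmp / "local"        # stands in for our working repository
    local.mkdir()

    git("init", "--bare", str(server), cwd=tmp)
    git("init", cwd=local)

    # Make one commit locally and make sure the branch is called main.
    (local / "index.md").write_text("Mountains in the UK\n")
    git("add", "index.md", cwd=local)
    git("commit", "-m", "First commit", cwd=local)
    git("branch", "-M", "main", cwd=local)

    # Register the stand-in server as origin, then push and set up tracking.
    git("remote", "add", "origin", str(server), cwd=local)
    git("push", "-u", "origin", "main", cwd=local)

    # After the push, both repositories point at the same commit.
    local_head = git("rev-parse", "main", cwd=local).strip()
    server_head = git("rev-parse", "main", cwd=server).strip()

    # The -u switch recorded origin/main as the upstream of main.
    upstream = git("rev-parse", "--abbrev-ref", "main@{upstream}", cwd=local).strip()

print(local_head == server_head)
print(upstream)
```

The upstream recorded by `-u` is what lets a later plain `git push` work without naming the remote or the branch.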

Let's add these commands to our diagram:

In [4]:
message="""
Working Directory -> Staging Area : git add
Staging Area -> Local Repository : git commit
Working Directory -> Local Repository : git commit -a
Staging Area -> Working Directory : git checkout
Local Repository -> Staging Area : git reset
Local Repository -> Working Directory: git reset --hard
Local Repository -> Remote Repository : git push
"""
from wsd import wsd
%matplotlib inline
wsd(message)
Out[4]:
[Sequence diagram of the git commands listed above, now including git push from the local repository to the remote repository]

Playing with GitHub

Take a few moments to click around and work your way through the GitHub interface. Try clicking on 'index.md' to see the content of the file: notice how the markdown renders prettily.

Click on "commits" near the top of the screen to see all the changes you've made. Click on the commit number to the right of a change to see what that change includes: removals are shown in red, and additions in green.

Working with multiple files

Some new content

So far, we've only worked with one file. Let's add another:

nano lakeland.md
In [5]:
%%writefile lakeland.md
Lakeland  
========   
  
Cumbria has some pretty hills, and lakes too.  
Writing lakeland.md
In [6]:
cat lakeland.md
Lakeland  
========   
  
Cumbria has some pretty hills, and lakes too.  

Git will not by default commit your new file

In [7]:
%%bash --no-raise-error
git commit -m "Try to add Lakeland"
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	__pycache__/
	lakeland.md
	wsd.py

nothing added to commit but untracked files present (use "git add" to track)

This didn't do anything, because we've not told git to track the new file yet.
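The same behaviour can be checked from Python in a scratch repository. This is only a sketch, assuming git is installed; the `git` helper and the `IDENTITY` list are invented for this example. It uses `git status --porcelain`, whose short format marks untracked files with `??` and newly staged files with `A`.

```python
import pathlib
import subprocess
import tempfile

# Throwaway identity so the sketch does not depend on your global git config.
IDENTITY = [
    "-c", "user.name=Example",
    "-c", "user.email=example@example.com",
    "-c", "commit.gpgsign=false",
]


def git(*args, cwd):
    """Run git in `cwd` without raising, returning the completed process."""
    return subprocess.run(
        ["git", *IDENTITY, *args], cwd=cwd, capture_output=True, text=True
    )


with tempfile.TemporaryDirectory() as tmp:
    repo = pathlib.Path(tmp)
    git("init", cwd=repo)
    (repo / "lakeland.md").write_text("Lakeland\n========\n")

    # The new file exists on disk, but git only reports it as untracked.
    untracked_status = git("status", "--porcelain", cwd=repo).stdout.strip()

    # Nothing is staged, so this commit is refused (non-zero exit code).
    failed_commit = git("commit", "-m", "Try to add Lakeland", cwd=repo)

    # Once the file is added, it shows up as staged and the commit succeeds.
    git("add", "lakeland.md", cwd=repo)
    staged_status = git("status", "--porcelain", cwd=repo).stdout.strip()
    good_commit = git("commit", "-m", "Add lakeland", cwd=repo)

print(untracked_status)
print(failed_commit.returncode)
print(staged_status)
print(good_commit.returncode)
```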

Tell git about the new file

In [8]:
%%bash
git add lakeland.md
git commit -m "Add lakeland"
[main 6b8a950] Add lakeland
 1 file changed, 4 insertions(+)
 create mode 100644 lakeland.md

OK, now we have committed the new file about Cumbria. Let's publish it to the origin repository.

In [9]:
%%bash
git push
To github.com:UCL/github-example.git
   bb375a0..6b8a950  main -> main

Visit GitHub, and notice this change is on your repository on the server. We could have said git push origin to specify the remote to use, but origin is the default.

Changing two files at once

What if we change both files?

In [10]:
%%writefile lakeland.md
Lakeland  
========   
  
Cumbria has some pretty hills, and lakes too

Mountains:
* Helvellyn
Overwriting lakeland.md
In [11]:
%%writefile index.md
Mountains and Lakes in the UK   
===================   
Engerland is not very mountainous.
But has some tall hills, and maybe a
mountain or two depending on your definition.
Overwriting index.md
In [12]:
%%bash
git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   index.md
	modified:   lakeland.md

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	__pycache__/
	wsd.py

no changes added to commit (use "git add" and/or "git commit -a")

These changes should really be separate commits. We can do this with careful use of git add, to stage first one commit, then the other.

In [13]:
%%bash
git add index.md
git commit -m "Include lakes in the scope"
[main 90eb38f] Include lakes in the scope
 1 file changed, 4 insertions(+), 5 deletions(-)

Because we "staged" only index.md, the changes to lakeland.md were not included in that commit.
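As a rough check of that claim, the sketch below repeats the pattern in a scratch repository: edit two files, stage only one, commit, and then ask git which files the commit touched and what is still modified. It assumes git is installed; the `git` helper, the file contents, and the "Starting point" commit are made up for illustration.

```python
import pathlib
import subprocess
import tempfile


def git(*args, cwd):
    """Run git in `cwd` with a throwaway identity and return its stdout."""
    cmd = [
        "git",
        "-c", "user.name=Example",
        "-c", "user.email=example@example.com",
        "-c", "commit.gpgsign=false",
        *args,
    ]
    return subprocess.run(
        cmd, cwd=cwd, check=True, capture_output=True, text=True
    ).stdout


with tempfile.TemporaryDirectory() as tmp:
    repo = pathlib.Path(tmp)
    git("init", cwd=repo)

    # Starting point: both files are already tracked.
    (repo / "index.md").write_text("Mountains in the UK\n")
    (repo / "lakeland.md").write_text("Lakeland\n")
    git("add", "index.md", "lakeland.md", cwd=repo)
    git("commit", "-m", "Starting point", cwd=repo)

    # Change both files in the working directory.
    (repo / "index.md").write_text("Mountains and Lakes in the UK\n")
    (repo / "lakeland.md").write_text("Lakeland\n\nMountains:\n* Helvellyn\n")

    # Stage and commit only index.md.
    git("add", "index.md", cwd=repo)
    git("commit", "-m", "Include lakes in the scope", cwd=repo)

    # Files touched by the commit we just made.
    files_in_commit = git(
        "diff-tree", "--no-commit-id", "--name-only", "-r", "HEAD", cwd=repo
    ).split()

    # Changes that are still waiting in the working directory.
    still_modified = git("status", "--porcelain", cwd=repo).splitlines()

print(files_in_commit)
print(still_modified)
```

The leading space in the status line for lakeland.md means "modified in the working directory, not staged", so it is ready to go into a commit of its own.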

In [14]:
%%bash
git add lakeland.md
git commit -m "Add Helvellyn"
[main def1290] Add Helvellyn
 1 file changed, 4 insertions(+), 1 deletion(-)
In [15]:
%%bash
git log --oneline
def1290 Add Helvellyn
90eb38f Include lakes in the scope
6b8a950 Add lakeland
bb375a0 Add a lie about a mountain
aae0ecc First commit of discourse on UK topography
In [16]:
%%bash
git push
To github.com:UCL/github-example.git
   6b8a950..def1290  main -> main
In [17]:
message="""
participant "Cleese's remote" as M
participant "Cleese's repo" as R
participant "Cleese's index" as I
participant Cleese as C

note right of C: nano index.md
note right of C: nano lakeland.md

note right of C: git add index.md
C->I: Add *only* the changes to index.md to the staging area

note right of C: git commit -m "Include lakes in the scope"
I->R: Make a commit from currently staged changes: index.md only

note right of C: git add lakeland.md
note right of C: git commit -m "Add Helvellyn"
C->I: Stage *all remaining* changes (lakeland.md)
I->R: Make a commit from currently staged changes

note right of C: git push
R->M: Transfer commits to GitHub
"""
wsd(message)
Out[17]:
[Sequence diagram of the steps above: index.md and lakeland.md are staged and committed separately, then the commits are pushed to GitHub]