13 Local branching with Git.
14 Branching with local git
Branching (and merging) is a useful way to keep different tasks organized and synchronized, even when working locally with git. Below we will work through a scenario for git branching, showing the commands and two different vizualizations. Eventually branching becomes a key part of working with github and shared repositories, but it is useful to approach it separately first.
14.1 A scenario
Imagine you are running a coffee roaster. You have code that produces a daily report that is emailed to your team. Each day the point of sale system makes new data available. Each night your computer executes a script (report_script.Rmd
) to access the data, create, and email the report.
We are going to manage a situation where we want to: 1. experiment and break things 2. keep things running
It is very hard to do these things at the same time. Branching helps us to do this.
One option might be to maintain different versions of files using different file names (report_script.Rmd
and new_report_script.Rmd
. This becomes very messy when we have many files and folders involved (/images
and /new_images
???). This is even worse when we have multiple experiments and attempts going on in parallel and entirely hopeless when we move online and are working with groups of people.
Recall, though, that git enables us to “swap out” our working directory. Until now we’ve kept this linear, each new version added after the last. But branching enables us to keep parallel lines of commits (as though we had two lines of trays with on our paper plane repository table).
14.2 Coordinating new work while keeping old work running
Currently the report is pretty simple, it’s just a table of sales divided up between in-store and online. Each day’s sales are added as a new row in the table.
cd ~
mkdir scratch_branching
cd scratch_branching
git init
Initialized empty Git repository in scratch_branching/.git/
We can now add the reporting script and the data for monday. To simulate editing a file using an editor we can use some fancy syntax so you can copy and paste it.
cat >> reporting_script.Rmd <<< "# Initial code for table report"
git add reporting_script.Rmd
git commit -m "starting setup"
[master (root-commit) 24bcda4] starting setup
2 files changed, 0 insertions(+), 0 deletions(-)
create mode 100644 data_monday.csv
create mode 100644 reporting_script.Rmd
The scripts runs normally on Monday night.
One Tuesday morning you decide that the data would be better presented as a chart. You have some ideas but rightly decide it will take more than a day or two to get that working.
In the meantime you still have to produce the report. So you create a branch called towards_chart
and begin working there, leaving the master branch untouched to produce the Tuesday night report.
git branch towards_chart
git checkout towards_chart
Again we can simulate editing the file.
cat >> reporting_script.Rmd <<< "# Work towards charts"
git status
On branch towards_chart
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: reporting_script.Rmd
no changes added to commit (use "git add" and/or "git commit -a")
git add reporting_script.Rmd
git commit -m "Worked towards charts"
[towards_chart acc3f42] Worked towards charts
1 file changed, 1 insertion(+)
On Tuesday you see that the report ran fine (using the code in master
) and you continue to work on the charts.
cat >> reporting_script.Rmd <<< "# More work towards charts"
git add reporting_script.Rmd
git commit -m "More work towards charts"
[towards_chart acc3f42] Worked towards charts
1 file changed, 1 insertion(+)
On Wednesday morning, though, you get an error from your reporting script. The online sales system updated and the data files now include a new column saying whether a sale was cash or credit. You know how to get things working again, but you still aren’t ready to launch your chart system. You don’t want to wait until your chart work is finished to get the report going again.
So you switch to master, edit the reporting file to handle the new column, then add and commit the change.
git checkout master
Switched to branch 'master'
Now we are back on the master
branch and we won’t see our work towards the charts at all. So over on the towards_chart
branch we can work away without upsetting the working code on master
.
Check the current content of the script (will show nothing about charts)
cat reporting_script.Rmd
Initial code for table report
So now we can, without involving our work towards the chart, make the bugfix on master
.
cat >> reporting_script.Rmd <<< "# Fix to match new data format"
git add reporting_script.Rmd
git commit -m "A fix to match new data format"
Now if we run our git viz command:
git log --oneline --abbrev-commit --all --graph --decorate --color
we will see our branching starting to show up:
jlh5498@educcomp04:~/scratch_branching$ git log --oneline --abbrev-commit --all --graph --decorate --color
* 48afd7c (HEAD -> master) A fix to match new data format
| * 5c7fc76 (towards_chart) More work towards charts
| * c72d745 Worked towards charts
|/
* c434df6 starting setup
Happily the report runs fine on Wednesday night.
Thursday morning you switch back to the towards_chart
branch and are pleased to get things working.
git checkout towards_chart
cat >> reporting_script.Rmd <<< "# Code to finalize the charts"
git add reporting_script.Rmd
git commit -m "Finished up the charts"
You are ready to add the chart into the report by moving the code to the master
branch.
But if you just merge the code back to master
you may find that the code for the chart doesn’t work with the change to handle the updated data files. So you may have some merging to do. But you don’t know if you can get that done before the report has to run, and you don’t want to get caught fiddling with master
because if the report tries to run you could end up with nothing going out that night.
So you first merge master
over to your towards_chart
branch, and ensure that things work well and the two pieces of work done in parallel work well together (charts and dealing with the new column).
First confirm which branch you are in, this command lists the branches and the one highlighted with the * is the current branch. git status
can also show you.
git branch -v
git branch -v
master 576bc90 A fix to match new data format
* towards_chart bef7b72 Finished up the charts
Then merge over the master
branch into the towards_chart
branch.
git merge master
Auto-merging reporting_script.Rmd
CONFLICT (content): Merge conflict in reporting_script.Rmd
Automatic merge failed; fix conflicts and then commit the result.
Ah, good thing we did this on the branch because we do end up with a conflict. Git can resolve some conflicts but not all. Git shows conflicts by adding special lines of text into the file (using >>>>>>
and <<<<<
as indicators. To resolve them we remove those lines and leve the file the way we want it to be.
# Initial code for table report
<<<<<<< HEAD
# Work towards charts
# More work towards charts
# Code to finalize the charts
=======
# Fix to match new data format
>>>>>>> master
The parts separated by ========
show edits that conflict. Looking at this we can see that we want all the lines, so we edit the file to show:
# Initial code for table report
# Fix to match new data format
# Work towards charts
# More work towards charts
# Code to finalize the charts
Now we can save that file. Git knows that we are fixing a conflict, so git status
shows:
On branch towards_chart
You have unmerged paths.
(fix conflicts and run "git commit")
(use "git merge --abort" to abort the merge)
Unmerged paths:
(use "git add <file>..." to mark resolution)
both modified: reporting_script.Rmd
no changes added to commit (use "git add" and/or "git commit -a")
And now we can add
and commit
to finish the merge.
git add reporting_script.Rmd
git commit -m "merged bug fix with charts code"
[towards_chart e420a05] merged bug fix with charts code
(Git can only flag syntactical conflicts, not semantic conflicts, so one would also try running some test reports to make sure things are all working).
So once you’ve resolved all issues, then you can merge the towards_chart
branch back to master
, signalling that you are ready to launch your new report with charts.
git checkout master
git merge towards_chart
Switched to branch 'master'
Updating 576bc90..e420a05
Fast-forward
reporting_script.Rmd | 3 +++
1 file changed, 3 insertions(+)
Finally we can visualize this branching, editing, and merging in two ways. First we can use this handy command to see a visualization in the command line.
git log --oneline --abbrev-commit --all --graph --decorate --color
Which will show us this (using an image here because the color doesn’t copy):
We can also see this visually using learngitbranching, see the short video below.
14.2.1 Exercises (in-class):
- Go to “Learn Git Branching” site: https://learngitbranching.js.org/. (Note that this site is multi-locale, see the small planet icon in the bottom right?)
- Do exercises 2 and 3 (“Branching in Git” and “Merging in Git”)
- Shift back to Rstudio on the server and replicate the “Merging in Git” lesson. You will need to start with these commands (to create a new, empty, repo):
cd ~
mkdir replicate_learngitbranching
cd replicate_learngitbranching
git init
Remember that when we are working in the terminal it is different from learngitbranching because we have to actually create, edit, and save files before we can commit. We also have to use git commit -m 'some message'
rather than just git commit
.