16  Split a Pull Request

Hey, thanks! Some of this looks great, but can you split this up into separate PRs?

Sometimes maintainers like part of the work that a contributor has done, but they don’t like all of it.

In this situation, a maintainer might respond to a pull request by asking you to split your contributions into a few different pull requests. This keeps different contributions separate and small, and makes them easier to discuss. You might remember the importance of small layers from the week on Coordination and discussion of the Superposition paper in Chapter 7.

One common cause of mixing up different pieces of work in one pull request is forgetting to change branches in your repo. If you add additional commits to the branch behind a pull request, they will get added to the pull request. Remember, a pull request says, “Please come to this repo and get everything on this branch.” So it’s not the same as including some commits in a zip file and mailing them. It’s more like placing things into a particular spot (like a mailbox or a dead drop in a spy movie) and telling people to pick them up from there.

So you can always add additional things before the person comes to pick them up.

This is useful because if there is a conversation around the pull request then you can easily update things. For example, if someone said “Please fix a typo or pull from upstream before we consider your pull request,” you’d be able to do so without opening another PR (just add, commit, push to your branch on your fork and the PR is updated).
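As a rough sketch of that update cycle, the commands below rehearse it in a throwaway setup. The local bare repository standing in for your fork, the branch name fix_readme, and the file contents are all assumptions made for illustration; with a real fork only the add, commit, and push lines matter.

```shell
# Throwaway setup: a local bare repository stands in for your fork on GitHub (assumption).
work="$(mktemp -d)"
git init -q --bare "$work/fork.git"
git clone -q "$work/fork.git" "$work/clone" 2>/dev/null
cd "$work/clone"
git symbolic-ref HEAD refs/heads/main          # call the first branch main on any git version
git config user.name "Example Student"         # hypothetical identity, local to this clone
git config user.email "student@example.com"

printf 'Helo world\n' > README.md
git add README.md
git commit -q -m "Add README"
git push -q -u origin main

# The contribution lives on its own branch; imagine a PR is opened from it.
git checkout -q -b fix_readme
printf 'More docs\n' >> README.md
git add README.md
git commit -q -m "Expand README"
git push -q -u origin fix_readme

# A reviewer asks for a typo fix: add, commit, push to the same branch.
sed 's/Helo/Hello/' README.md > README.tmp && mv README.tmp README.md
git add README.md
git commit -q -m "Fix typo"
git push -q                                    # an open PR from this branch would now show this commit too

git log --oneline origin/fix_readme            # the fork's branch includes "Fix typo"
```

Nothing about the pull request itself had to be touched: it follows whatever is on the branch.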

However, it is a problem if you accidentally add new commits to a branch before a pull request is accepted. Now your pull request has two sets of commits: the first set you meant to include and the second set you didn’t. This mistake is particularly easy to make if you are developing on the main branch in your fork (which you shouldn’t do), but it also happens if you have more than one contribution in progress, as when you are working on something else while waiting for a PR to be accepted. If you’ve accidentally added too many files to your pull request (something that is easy to do if you use git add * or some variant), you’ll also find yourself needing to remove content from your PR.
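One habit that reduces these accidents is to check where you are and stage files by name. The sketch below assumes a throwaway repository and two made-up files (apple.txt and notes_to_self.txt); it is only meant to show the checks, not a required routine.

```shell
# Throwaway repository so nothing here touches real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch main on any git version
git config user.name "Example Student"    # hypothetical identity, local to this repo
git config user.email "student@example.com"

printf 'apple\n' > apple.txt
printf 'scratch\n' > notes_to_self.txt    # hypothetical file that should stay out of the PR

git branch --show-current       # confirm which branch the commit will land on
git status --short              # both files show up as untracked (??)
git add apple.txt               # stage by name instead of git add *
git diff --cached --name-only   # lists exactly what the next commit will contain
git commit -q -m "apple1"
```

That helps with future commits, but not with commits that are already mixed together on one branch.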

This is where the delightfully named git cherry-pick can help us.

16.1 Split before submit

Things would be better if you had created a new branch for the first set of commits, then a second branch for your second set of commits, never adding either set to your main branch and following the “always work on a (short-lived) feature branch” rule. Then each set of commits would be “sent” through a different pull request.

In other words, we would be in a good position to submit two separate pull requests:

gitGraph
  commit id: "main1"
  commit id: "main2"
  commit id: "branch here" type: HIGHLIGHT
  branch first_set
  commit id: "apple1"
  commit id: "apple2"
  checkout main
  branch second_set
  commit id: "orange1"
  commit id: "orange2"
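A history shaped like the graph above could be built with commands along these lines. This is a sketch in a throwaway repository; the make_commit helper and the file names are hypothetical, chosen only so the commit messages match the graph labels.

```shell
# Throwaway repository so nothing here touches real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch main on any git version
git config user.name "Example Student"    # hypothetical identity, local to this repo
git config user.email "student@example.com"

make_commit() {   # hypothetical helper: append a line to a file and commit it
  printf '%s\n' "$1" >> "$2"
  git add "$2"
  git commit -q -m "$1"
}

make_commit main1 main.txt
make_commit main2 main.txt
make_commit "branch here" main.txt

git checkout -q -b first_set main         # first feature branch starts at main
make_commit apple1 apples.txt
make_commit apple2 apples.txt

git checkout -q -b second_set main        # second feature branch also starts at main
make_commit orange1 oranges.txt
make_commit orange2 oranges.txt

git log --oneline --all --graph --decorate
```

Because each branch starts from main, each one can be pushed and offered as its own pull request.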

But instead we have everything mixed up together on one branch.

gitGraph
  commit id: "main1"
  commit id: "main2"
  commit id: "branch here" type: HIGHLIGHT
  branch all_mixed_up
  commit id: "apple1"
  commit id: "orange1"
  commit id: "apple2"
  commit id: "orange2"

Or we might have forgotten to branch at all, and so have everything stuck on main:

gitGraph
  commit id: "main1"
  commit id: "main2"
  commit id: "branch here" type: HIGHLIGHT
  commit id: "apple1"
  commit id: "orange1"
  commit id: "apple2"
  commit id: "orange2"

16.1.1 LearnGitBranching exercise

We can see an example of the overall workflow for splitting a PR using the LearnGitBranching Visualizer. I have created a level called Split Pull Request.

16.1.2 Cherry-pick via commandline git

You can see a situation like this in this repo on GitHub. Bring that to your working space with:

cd ~
git clone https://github.com/jameshowison/i320d-needs-split.git
cd i320d-needs-split

If you then run our gitviz command (see Section A.3 for how to set up a shortcut for that), you will see both our apple and orange edits all together on the main branch.

jlh5498@educcomp04:~/github_repos/i320d-needs-split$ git log --oneline --abbrev-commit --all --graph --decorate --color
* c70df5a (HEAD -> main) orange2
* 423ad05 apple2
* 4f1fe99 orange1
* 91abdfb apple1
* 8e35b12 branch here
* ea4d580 main1
* 252ff3f Initial commit

We eventually want this to look like our first graph above, with two new branches (apple_branch and orange_branch):

jlh5498@educcomp04:~/github_repos/i320d-needs-split$ git log --oneline --abbrev-commit --all --graph --decorate --color
* 09d5482 (HEAD -> orange_branch) orange2
* 2137500 orange1
| * c8fd64d (apple_branch) apple2
| * 1a5ac49 apple1
|/  
* 158f2c4 (main) branch here
* 96d65c3 main2
* 7036c11 main 1
* 252ff3f Initial commit

To get there we will take three steps:

  1. Create a new branch, specifying the starting point
  2. Move the relevant commits to the new branch
  3. Push to the fork, create a new pull request

First, create the new branch, naming the commit it should start from:

git checkout -b apple_branch 8e35b12

The 8e35b12 here is the commit id of the point at which we want the branch to start. Until now, when we’ve created a branch we have done so while sitting at HEAD, but git allows us to create a branch back in time. Git does this by creating a branch label that points at the earlier commit; the commit itself is not changed.
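To see that a branch created “back in time” is only a label on an existing commit, here is a small sketch in a throwaway repository. The make_commit helper and file names are hypothetical, and the commit ids you get will differ from the ones shown in this chapter.

```shell
# Throwaway repository so nothing here touches real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch main on any git version
git config user.name "Example Student"    # hypothetical identity, local to this repo
git config user.email "student@example.com"

make_commit() {   # hypothetical helper: append a line to a file and commit it
  printf '%s\n' "$1" >> "$2"
  git add "$2"
  git commit -q -m "$1"
}

make_commit main1 main.txt
make_commit "branch here" main.txt
make_commit apple1 apples.txt
make_commit orange1 oranges.txt

start="$(git rev-parse --short main~2)"     # id of the "branch here" commit
git checkout -q -b apple_branch "$start"    # new branch label placed on that older commit

git rev-parse --short apple_branch          # prints the same id as $start
git log --oneline --all --graph --decorate  # main still holds apple1 and orange1, untouched
```

The working directory now looks the way it did at “branch here”, while main keeps every commit it had before.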

Then we can copy the commits using git cherry-pick. Note that this doesn’t remove them from the main branch; it creates new commits on the current branch that make the same changes (which is why they get new commit ids in the output below). This is very much like copying files from one directory into another directory (except we are copying commits from one branch to another).

git cherry-pick 91abdfb 423ad05

Again the strings 91abdfb and 423ad05 identify specific commits. We can provide a list (as above) or just one commit, and it is also possible to provide a range if you want a full sequence of commits.

After the cherry-pick we see:

jlh5498@educcomp04:~/github_repos/i320d-needs-split$ git cherry-pick 91abdfb 423ad05
[apple_branch 9ca623a] apple1
 Date: Wed Mar 1 15:31:38 2023 -0600
 1 file changed, 1 insertion(+)
[apple_branch 2b82f41] apple2
 Date: Wed Mar 1 15:31:38 2023 -0600
 1 file changed, 1 insertion(+)
jlh5498@educcomp04:~/github_repos/i320d-needs-split$ git log --oneline --abbrev-commit --all --graph --decorate --color
* 2b82f41 (HEAD -> apple_branch) apple2
* 9ca623a apple1
| * c70df5a (origin/main, origin/HEAD, main) orange2
| * 423ad05 apple2
| * 4f1fe99 orange1
| * 91abdfb apple1
|/  
* 8e35b12 branch here
* ea4d580 main1
* 252ff3f Initial commit

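We can then use git push to push apple_branch up to the fork and make a pull request. Because apple_branch is new, the first push has to name the remote and the branch, for example git push -u origin apple_branch if origin is the name of the remote for your fork.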
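As an aside on the range form of cherry-pick mentioned earlier: it only helps when the commits you want sit next to each other, which is not the case in the example repo. The sketch below therefore assumes a different, made-up history in a throwaway repository (the make_commit helper and file names are hypothetical).

```shell
# Throwaway repository so nothing here touches real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch main on any git version
git config user.name "Example Student"    # hypothetical identity, local to this repo
git config user.email "student@example.com"

make_commit() {   # hypothetical helper: append a line to a file and commit it
  printf '%s\n' "$1" >> "$2"
  git add "$2"
  git commit -q -m "$1"
}

make_commit "branch here" main.txt
make_commit orange1 oranges.txt
make_commit apple1 apples.txt
make_commit apple2 apples.txt

git checkout -q -b apple_branch main~3    # start the new branch at "branch here"
git cherry-pick main~2..main              # everything after orange1, up to and including apple2

git log --oneline --all --graph --decorate
```

A range written A..B leaves out A itself, so here orange1 (main~2) is not copied. To include the first commit as well, the range would be written A^..B.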

Note

Moving around commits using cherry-pick shows us why it is so important to understand commits as full copies of the state of the working directory, as full snapshots of our files. If commits were just the changes (just a bunch of diffs) then we would have to apply them in the order they were created, otherwise we’d get nonsense results.

But because commits are full copies of everything, we can move them around without any logical problems. Think of reordering the trays with the paper planes we used in the first class.

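Under the hood, each commit records a snapshot plus a pointer to its parent. To cherry-pick, git compares a commit with its parent to work out what that commit changed, applies that change on top of the current branch, and records the result as a new snapshot with a new parent. That is why the copies have new commit ids, and why git will stop and ask for help if a change does not apply cleanly.

And branches are just like little post-its stuck on some commits; they are just metadata pointers. Neat, isn’t it?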

See more about this on the GitHub blog https://github.blog/2020-12-17-commits-are-snapshots-not-diffs/
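If you want to poke at this yourself, the sketch below looks inside a commit and compares a cherry-picked copy with its original. It uses a throwaway repository with a made-up history; the make_commit helper and file names are hypothetical.

```shell
# Throwaway repository so nothing here touches real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch main on any git version
git config user.name "Example Student"    # hypothetical identity, local to this repo
git config user.email "student@example.com"

make_commit() {   # hypothetical helper: append a line to a file and commit it
  printf '%s\n' "$1" >> "$2"
  git add "$2"
  git commit -q -m "$1"
}

make_commit "branch here" main.txt
make_commit orange1 oranges.txt
make_commit apple1 apples.txt

git checkout -q -b apple_branch main~2    # start at "branch here"
git cherry-pick main > /dev/null          # copy apple1 (the tip of main) onto apple_branch

git cat-file -p apple_branch              # the commit object: tree (snapshot), parent, author, message
git rev-parse --short main apple_branch   # two different ids, both with the message apple1
git ls-tree --name-only apple_branch      # files in the copy's snapshot: no oranges.txt

orig_change="$(git diff main~1 main)"                  # what the original apple1 changed
copy_change="$(git diff apple_branch~1 apple_branch)"  # what the copy changed
[ "$orig_change" = "$copy_change" ] && echo "same change, different commit"
```

The two commits make the same change but have different parents and different snapshots, so they get different ids.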

16.1.3 Exercises

16.1.3.1 Individual Exercises

  1. Now it is your turn: do the same work to get the orange_branch organized.

16.1.3.2 Group Exercise

Groups of 3. Nominate Maintainer, Contributor A, and Contributor B.

  1. Maintainer creates a repo on GitHub.
  2. Maintainer adds 2 commits and pushes.
  3. Contributor A and Contributor B fork and clone (and add upstream).
  4. Contributor A and Contributor B create a feature branch called will_need_split.
  5. Contributor A and Contributor B add four commits of four files on will_need_split.
  6. Contributor A and Contributor B create a pull request to upstream from their will_need_split branch (including all four commits).
  7. Maintainer rejects the pull request, closing it and commenting “please split this up” (and directing which files go together, probably 2 in each).
  8. Contributor A and Contributor B follow the procedure above to end up with two new branches, split_branch_1 and split_branch_2, sent through separate pull requests with only the right commits/files in them.
  9. Maintainer eventually accepts each of the four split-up pull requests.