21  Split a Pull Request

Hey, thanks! Some of this looks great, but can you split this up into separate PRs?

Sometimes maintainers like part of the work that a contributor has done, but they don’t like all of it.

In this situation, a maintainer might respond to a pull request by asking you to split your contributions into a few different pull requests. This keeps different contributions separate and small, which makes them easier to discuss. This connects with the importance of small layers from the week on Coordination and the discussion of the Superposition paper in Chapter 7. In essence, small contributions are easier to review and discuss.

One common cause of mixing up different pieces of work in one pull request is forgetting to change branches in your repo. If you add more commits to a branch that already has an open pull request, they will be added to that pull request. Remember, a pull request says, “Please come to this repo and get everything on this branch.” So it’s not like putting some commits in a zip file and mailing them; it’s more like placing things in a particular spot (like a mailbox or a dead-drop in a spy movie) and telling people to pick them up from there.

So you can always add additional things before the person comes to pick them up. This flexibility is powerful, but it also means contributors must manage what’s visible to maintainers via PRs (and thus branches) carefully — an example of social accountability embedded in technical practice.

This is useful because if there is a conversation around the pull request then you can easily update things. For example, if someone said “Please fix a typo or pull from upstream before we consider your pull request,” you’d be able to do so without opening another PR (just add, commit, push to your branch on your fork and the PR is updated).
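The add, commit, push cycle described above can be tried safely in a throwaway setup. This is only a sketch: the local bare repository stands in for your fork on GitHub, and the branch name fix_typo, the file notes.txt, and the identity settings are all made up for illustration.

```shell
set -e
demo=$(mktemp -d)
git init --quiet --bare "$demo/fork.git"   # stand-in for your fork (assumption)
git init --quiet "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/main      # make sure the first branch is called main
git config user.name "Contributor A"
git config user.email "contributor@example.com"
git remote add origin "$demo/fork.git"

echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m "main1"
git push --quiet origin main

# The contribution lives on its own branch; a PR would point at this branch.
git checkout --quiet -b fix_typo
echo "my contribution" >> notes.txt
git add notes.txt
git commit --quiet -m "add my contribution"
git push --quiet -u origin fix_typo

# The maintainer asks for a change: add, commit, push to the same branch.
echo "change requested by maintainer" >> notes.txt
git add notes.txt
git commit --quiet -m "address review comment"
git push --quiet origin fix_typo

# The branch on the stand-in fork now holds both commits,
# so an open PR from that branch would show both.
git --git-dir="$demo/fork.git" log --oneline main..fix_typo
```

The last command lists the commits that are on fix_typo but not on main, which is roughly the set of commits a pull request from that branch would offer.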

21.1 Setup exercise: pushing more commits to a PR

  1. Establish a forking network with at least a Maintainer and ContributorA.
  2. ContributorA makes a contribution on a branch.
  3. ContributorA makes a PR from that branch.
  4. Maintainer comments and asks for additional work.
  5. ContributorA adds that work (new commit on the branch, push to fork).
  6. Maintainer sees the new work show up in the PR.

However, it is a problem if you accidentally add new commits to a branch before a pull request is accepted. Now your pull request has two sets of commits: the first set you meant to include and the second set you didn’t. This mistake is particularly easy to make if you are developing on the main branch in your fork (which you shouldn’t do), but it also happens if you have more than one contribution in progress, as when you are doing something else while waiting for a PR to be accepted. If you’ve accidentally added too many files to your pull request (something that is easy to do if you use git add * or some variant), you’ll also find yourself needing to remove content from your PR.
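One way to catch an over-eager git add * before it reaches a pull request is to look at what is staged before committing. The sketch below runs in a throwaway repository; the file names apple.txt and scratch.txt are invented for the example.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init --quiet
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "readme" > README.md
git add README.md
git commit --quiet -m "main1"

echo "apple" > apple.txt            # the work we mean to contribute
echo "scratch notes" > scratch.txt  # a file we do not mean to contribute

git add *                           # stages both new files
git diff --cached --name-only       # lists what the next commit would contain

# Unstage the extra file; the file itself stays in the working directory.
git reset --quiet -- scratch.txt
git diff --cached --name-only       # now lists only apple.txt
```

Staging files by name (git add apple.txt) avoids the problem in the first place.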

This is where the delightfully named git cherry-pick can help us.

21.2 Split before submit

Things would be better if we had created a new branch for the first set of commits, then a second branch for our second set of commits, never adding either set to our main branch and following the “always work on a (short-lived) feature branch” rule. Then each set of commits would be “sent” through a different pull request: one from apple_branch and a separate one from orange_branch.

Naming note: In these examples, we’ll refer to the two sets of work as apple_branch and orange_branch. Each represents a small, focused set of commits — the kind of modular contributions that make code review and coordination easier.

In other words we would be in a good position to submit two separate pull requests:

gitGraph
  commit id: "main1"
  commit id: "main2"
  commit id: "branch here" type: HIGHLIGHT
  branch apple_branch
  commit id: "apple1"
  commit id: "apple2"
  checkout main
  branch orange_branch
  commit id: "orange1"
  commit id: "orange2"

But instead we have everything mixed up together on one branch (all_mixed_up).

gitGraph
  commit id: "main1"
  commit id: "main2"
  commit id: "branch here" type: HIGHLIGHT
  branch all_mixed_up
  commit id: "apple1"
  commit id: "orange1"
  commit id: "apple2"
  commit id: "orange2"

We might have forgotten to branch at all, and so have everything stuck on main:

gitGraph
  commit id: "main1"
  commit id: "main2"
  commit id: "branch here" type: HIGHLIGHT
  commit id: "apple1"
  commit id: "orange1"
  commit id: "apple2"
  commit id: "orange2"

21.3 LearnGitBranching exercise

Now let’s practice seeing this visually before we try it in Git.

We can see an example of the overall workflow for splitting a PR using the LearnGitBranching Visualizer. I have created a level called Split Pull Request.

21.4 Cherry-pick via commandline git

We can see a situation like this in this repo on GitHub. Bring that to your working space with:

cd ~
git clone https://github.com/jameshowison/i320d-needs-split.git
cd i320d-needs-split

If we then run our git viz command (see Section B.3 for how to set up a shortcut for that), we will see both our apple and orange edits all together on the main branch.

jlh5498@educcomp04:~/github_repos/i320d-needs-split$ git viz
* c70df5a (HEAD -> main) orange2
* 423ad05 apple2
* 4f1fe99 orange1
* 91abdfb apple1
* 8e35b12 branch here
* ea4d580 main1
* 252ff3f Initial commit
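If the git viz shortcut is not set up yet, something like the following may produce similar output. The alias definition here is an assumption based on the shape of the output above (one line per commit, a graph, and branch labels); Section B.3 has the version this book actually uses. The alias is set only inside a throwaway repository so that it does not touch your real configuration.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init --quiet
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Assumed definition of the shortcut. This is repo-local;
# "git config --global alias.viz ..." would make it available in every repo.
git config alias.viz "log --oneline --graph --decorate --all"

echo "one" > file.txt
git add file.txt
git commit --quiet -m "main1"
echo "two" >> file.txt
git commit --quiet -am "main2"

git viz
```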

We eventually want this to look like our first graph above, with two new branches (apple_branch and orange_branch):

jlh5498@educcomp04:~/github_repos/i320d-needs-split$ git viz
* 09d5482 (HEAD -> orange_branch) orange2
* 2137500 orange1
| * c8fd64d (apple_branch) apple2
| * 1a5ac49 apple1
|/  
* 158f2c4 (main) branch here
* 96d65c3 main2
* 7036c11 main 1
* 252ff3f Initial commit

To get there we will take three steps:

  1. Create a new branch, specifying the starting point
  2. Move the relevant commits to the new branch
  3. Push to the fork, create a new pull request

First, create the new branch at its starting point:

git checkout -b apple_branch 8e35b12

The 8e35b12 here is the commit id of the point at which we want the branch to start. Until now, when we’ve created a branch we have done so while sitting at HEAD, but git allows us to create a branch back in time. Git does this by creating a branch label that points at the earlier commit; the commit itself is not changed.
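Creating a branch at an earlier commit can be tried in a throwaway repository. In this sketch the commit messages mirror the example, but the file log.txt is invented, and the starting commit is found by position (main~2) rather than by a fixed id, because ids will differ on every machine.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init --quiet
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

for msg in "main1" "branch here" "apple1" "orange1"; do
    echo "$msg" >> log.txt
    git add log.txt
    git commit --quiet -m "$msg"
done

# The "branch here" commit sits two commits behind the tip of main.
start=$(git rev-parse main~2)

# Create the branch at that earlier commit and switch to it.
git checkout --quiet -b apple_branch "$start"

# The new label points at the old commit; main still points at its own tip.
git rev-parse apple_branch
git log --oneline --graph --decorate --all
```

The id printed by git rev-parse apple_branch is the same as the one stored in start, which shows that no new commit was made: only a new label was attached.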

Then we can move the commits using git cherry-pick. Note that this doesn’t remove them from the main branch; it creates new commits on the current branch that make the same changes. This is very much like copying files from one directory into another directory (except we are copying commits from one branch to another).

git cherry-pick 91abdfb 423ad05

Again the strings 91abdfb and 423ad05 identify specific commits. We can provide a list (like above) or just one commit, and it is also possible to provide a range if we want a full sequence of commits.
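The range form is easy to get slightly wrong, because A..B leaves out A itself. The sketch below shows the difference in a throwaway repository; the branch names without_first and with_first and the one-file-per-commit layout are made up for illustration.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init --quiet
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit --quiet -m "base"

# Three commits, each adding its own file.
for name in one two three; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit --quiet -m "$name"
done

base=$(git rev-parse main~3)
one=$(git rev-parse main~2)
three=$(git rev-parse main)

# A..B picks the commits after A up to and including B, so "one" is left out.
git checkout --quiet -b without_first "$base"
git cherry-pick "$one".."$three" > /dev/null
git ls-tree --name-only HEAD    # base.txt, three.txt, two.txt

# A^..B starts from the parent of A, so "one" is included.
git checkout --quiet -b with_first "$base"
git cherry-pick "$one"^.."$three" > /dev/null
git ls-tree --name-only HEAD    # base.txt, one.txt, three.txt, two.txt
```

A range only helps when the commits you want sit next to each other; for interleaved work like the apple and orange commits, listing the ids one by one is the simpler choice.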

After the cherry-pick we see:

jlh5498@educcomp04:~/github_repos/i320d-needs-split$ git cherry-pick 91abdfb 423ad05
[apple_branch 9ca623a] apple1
 Date: Wed Mar 1 15:31:38 2023 -0600
 1 file changed, 1 insertion(+)
[apple_branch 2b82f41] apple2
 Date: Wed Mar 1 15:31:38 2023 -0600
 1 file changed, 1 insertion(+)
jlh5498@educcomp04:~/github_repos/i320d-needs-split$ git viz
* 2b82f41 (HEAD -> apple_branch) apple2
* 9ca623a apple1
| * c70df5a (origin/main, origin/HEAD, main) orange2
| * 423ad05 apple2
| * 4f1fe99 orange1
| * 91abdfb apple1
|/  
* 8e35b12 branch here
* ea4d580 main1
* 252ff3f Initial commit
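The same sequence can be rehearsed without cloning anything by rebuilding a mixed-up history locally. This is a sketch under assumptions: the helper commit_line is hypothetical, the files apple.txt and orange.txt are invented (the real repo may be organized differently), and commits are located by position on main rather than by id.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init --quiet
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical helper: append a line to a file and commit it,
# using the line as the commit message.
commit_line() {
    echo "$2" >> "$1"
    git add "$1"
    git commit --quiet -m "$2"
}

commit_line README.md  "main1"
commit_line README.md  "branch here"
commit_line apple.txt  "apple1"
commit_line orange.txt "orange1"
commit_line apple.txt  "apple2"
commit_line orange.txt "orange2"

# Locate the commits by their position on main.
start=$(git rev-parse main~4)    # "branch here"
apple1=$(git rev-parse main~3)
apple2=$(git rev-parse main~1)

# Step 1: new branch at the starting point. Step 2: copy the apple commits.
git checkout --quiet -b apple_branch "$start"
git cherry-pick "$apple1" "$apple2" > /dev/null

git log --oneline --graph --decorate --all
git ls-tree --name-only apple_branch
```

The final listing shows README.md and apple.txt but no orange.txt: only the apple changes were carried over, and main still holds all six original commits.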

We can then use git push as normal to push the apple_branch up to the fork and make a pull request.

Note

Moving around commits using cherry-pick shows us why it is so important to understand commits as full copies of the state of the working directory, as full snapshots of our files. If commits were just the changes (just a bunch of diffs) then we would have to apply them in the order they were created, otherwise we’d get nonsense results.

But because commits are full copies of everything, we can move them around without any logical problems. Think of reordering the trays with the paper planes we used in the first class.

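In fact, what git does is work out the change each picked commit made relative to its parent, apply that change on top of the current branch, and record the result as a new commit with a new parent. That is why the cherry-picked commits have new ids.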

And branches are just like little post-its added to some commits, they are just metadata pointers. Neat, isn’t it?

See more about this on the GitHub blog https://github.blog/2020-12-17-commits-are-snapshots-not-diffs/ which gets into detail on snapshots vs diffs.

21.5 Exercises

21.5.1 Individual Exercises

  1. Now you work to get the orange_branch organized.

21.5.2 Group Exercises (and homework)

Groups of 3. Nominate Maintainer, Contributor A, and Contributor B.

21.5.2.1 First round exercise

  1. Maintainer creates a repo on GitHub.
  2. Maintainer adds 2 commits and pushes.
  3. Contributor A and Contributor B fork and clone (and add upstream).
  4. Contributor A and Contributor B create a feature branch called will_need_split.
  5. Contributor A and Contributor B add four separate commits, each editing a separate new file, on will_need_split
  6. Contributor A and Contributor B create a pull request to upstream from their will_need_split branch (including all four commits).
  7. Maintainer rejects the pull request, closing it and commenting “please split this up” and directing which commits/files go together, probably 2 commits in each PR. That direction gives the contributors their instructions for the next step.
  8. Contributor A and Contributor B follow the procedure above to end up with two new branches, split_branch_1 and split_branch_2, sent through separate pull requests with only the right commits/files in them.
  9. Maintainer eventually accepts/merges each of the four split-up pull requests.

In the end the Upstream repo will show 6 closed PRs (3 from ContributorA and 3 from ContributorB). Two will be closed without merging, and 4 will be merged and closed.

22 To discuss in group

What kinds of social negotiation happen when maintainers ask contributors to split work? How could contributors respond helpfully?

22.0.0.1 Second round exercise

Now ContributorA and Maintainer swap roles and repeat the exercise (this requires new repos and setup).