Sunday, June 24, 2012

The Anatomy of a Git Pull

Ever seen something like this?

➜  ~/projects/gitblit/[master]>git pull
remote: Counting objects: 5899, done.
remote: Compressing objects: 100% (1322/1322), done.
remote: Total 5746 (delta 4099), reused 5413 (delta 3770)
Receiving objects: 100% (5746/5746), 3.78 MiB | 853 KiB/s, done.
Resolving deltas: 100% (4099/4099), completed with 98 local objects.
From git://github.com/gitblit/gitblit
 * [new branch]      bootstrap  -> origin/bootstrap
 * [new branch]      gh-pages   -> origin/gh-pages
 * [new branch]      issues     -> origin/issues
 * [new branch]      ldap       -> origin/ldap
   8f73a7c..67d4f89  master     -> origin/master
 * [new branch]      rpc        -> origin/rpc
From git://github.com/gitblit/gitblit
 * [new tag]         v0.9.1     -> v0.9.1
 * [new tag]         v0.9.2     -> v0.9.2
 * [new tag]         v0.9.3     -> v0.9.3
Updating 8f73a7c..67d4f89
Fast-forward
 .classpath                                         |  137 +-
 .gitignore                                         |   43 +-
 NOTICE                                             |   56 +
 build.xml                                          |  484 ++-
 distrib/add-indexed-branch.cmd                     |   20 +
[..clipped..]
 docs/screenshots/00.png                            |  Bin 41019 -> 38869 bytes
 374 files changed, 43257 insertions(+), 3508 deletions(-)
 create mode 100644 distrib/add-indexed-branch.cmd
 create mode 100644 distrib/federation.properties
[..clipped..]

Well, David Gerber on the Git-user mailing list did, and asked what all this output is about. I realized that I've built up a mental filter on the output of many Git commands, ignoring the parts which aren't important. So I wanted to dig in, understand and explain each of those lines. Here's a slightly adapted copy of my answer on the mailing list. I'll update it if you have any comments that can improve on the explanations.

I've swapped his example output with what I got from doing a large update in the Gitblit project:


➜  ~/projects/gitblit/[master]>git pull
remote: Counting objects: 5899, done.
Any message prefixed with "remote:" means it's coming from the remote repository.

The first thing it does is to count the number of objects in the repository that will have to be transferred: commits, blobs, trees and tags. 5899 is the number of objects missing in your local repository, I believe.

If you want to find out more about these objects, try playing around with git count-objects -v in your repositories, before and after committing. Also note how git gc modifies the result.

Note that Git distinguishes between "loose" objects, stored one file per object, and objects that have been compressed into "pack files" (think of them as zip files) for efficiency. git count-objects -v reports the two separately.
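If you want to script that before/after comparison, here's a small Python sketch. It assumes git is on your PATH and relies only on git count-objects -v printing one "key: value" pair per line; the function names are my own.

```python
import subprocess


def parse_count_objects(output: str) -> dict[str, str]:
    """Turn the 'key: value' lines printed by `git count-objects -v` into a dict."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore anything that isn't a 'key: value' line
            stats[key.strip()] = value.strip()
    return stats


def count_objects(repo_path: str = ".") -> dict[str, str]:
    """Run `git count-objects -v` in repo_path (assumes git is on the PATH)."""
    result = subprocess.run(
        ["git", "count-objects", "-v"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return parse_count_objects(result.stdout)
```

Run count_objects() in a repository before and after git gc: "count" is the number of loose objects and "in-pack" the number of objects stored in pack files, and you should see objects move from the former to the latter.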

remote: Compressing objects: 100% (1322/1322), done.
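This is the remote compressing objects before transfer, which mostly means delta compression: finding similar objects so that an object can be stored as a set of differences against another one. I reckon 1322 is the number of objects Git had to search for a good delta base for, rather than the number of objects being transferred.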

remote: Total 5746 (delta 4099), reused 5413 (delta 3770)
Now here I'm getting a bit unsure, but this is my understanding. Git does a lot of optimization to make the transfer as fast as possible. Total is the number of objects sent (5746), and 4099 of them were sent delta-compressed, i.e. as differences against another object instead of in full. Reused counts the objects whose compressed form could be copied straight from the pack files on the remote side without compressing them again: 5413 of them, 3770 of which were deltas. The closest thing I could find to an explanation is here.
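To get a feel for what a delta is, here's a toy Python sketch: instead of storing an object in full, you store instructions to copy ranges from a similar base object, plus the bytes that are new. Git's pack format also uses copy and insert instructions, but encodes them in a compact binary form and picks bases with its own heuristics; the difflib-based matching and the function names here are just for illustration.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list[tuple]:
    """Describe target as copy/insert instructions against base (toy format)."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # reuse a range of the base
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # literal new bytes
        # 'delete': those base bytes are simply never copied
    return ops


def apply_delta(base: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the target object from the base plus the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

Applying such instructions is also what the "Resolving deltas" step further down is about: the receiving side rebuilds full objects from their bases.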

Receiving objects: 100% (5746/5746), 3.78 MiB | 853 KiB/s, done.
This is just a progress counter during the transfer across the wire. 3.78 MiB is the amount of data received, in mebibytes (the binary analogue of megabytes: 1024 × 1024 bytes), and 853 KiB/s is the transfer rate in kibibytes per second.
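Binary and decimal prefixes are easy to mix up, so here's the arithmetic as a quick Python sketch (to_bytes is just a helper name I made up):

```python
KIB = 1024        # kibibyte: 2**10 bytes
MIB = 1024 * KIB  # mebibyte: 2**20 bytes
GIB = 1024 * MIB  # gibibyte: 2**30 bytes


def to_bytes(amount: float, unit: str) -> float:
    """Convert a value in one of the binary units Git prints into bytes."""
    factors = {"KiB": KIB, "MiB": MIB, "GiB": GIB}
    return amount * factors[unit]


received = to_bytes(3.78, "MiB")
rate = to_bytes(853, "KiB")
print(f"received: {received:,.0f} bytes ({received / 1_000_000:.2f} MB)")
print(f"rate:     {rate:,.0f} bytes/s")
# received: 3,963,617 bytes (3.96 MB)
# rate:     873,472 bytes/s
```

So the 3.78 MiB in the output is a little under 4 million bytes.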

Resolving deltas: 100% (4099/4099), completed with 98 local objects.
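This is the receiving end resolving the 4099 deltas mentioned above, i.e. rebuilding the full objects from their base objects. "Completed with 98 local objects" means the remote sent a so-called thin pack, where some deltas refer to base objects you already have, and 98 such objects were taken from your own repository to complete it.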


From git://github.com/gitblit/gitblit
 * [new branch]      bootstrap  -> origin/bootstrap
 * [new branch]      gh-pages   -> origin/gh-pages
 * [new branch]      issues     -> origin/issues
 * [new branch]      ldap       -> origin/ldap
   8f73a7c..67d4f89  master     -> origin/master
 * [new branch]      rpc        -> origin/rpc


This is a summary of the changes to the remote-tracking branches (origin/...). Most of them are new, but you were already tracking the branch master, so it shows which commit it was updated from and which it was updated to (from..to).
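The format of these lines is regular enough to parse, should you ever want to post-process it in a script. Here's a minimal Python sketch, assuming the flag characters listed in the OUTPUT section of the git-fetch documentation. Keep in mind that this is human-oriented output which may change between Git versions, and the names here are made up.

```python
import re

# Flag characters as listed in the OUTPUT section of the git-fetch docs
FLAGS = {
    " ": "fast-forward",
    "+": "forced update",
    "-": "pruned",
    "t": "tag update",
    "*": "new ref",
    "!": "rejected",
    "=": "up to date",
}

# ' <flag> <summary> <from> -> <to>', where summary is '[new branch]' or 'old..new'
LINE = re.compile(
    r"^ (?P<flag>.) (?P<summary>\[[^\]]+\]|\S+)\s+(?P<src>\S+)\s+->\s+(?P<dst>\S+)"
)


def parse_ref_update(line: str) -> dict[str, str]:
    """Split one ref update line from git fetch/pull into its parts."""
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"not a ref update line: {line!r}")
    info = match.groupdict()
    info["status"] = FLAGS.get(info["flag"], "unknown")
    if not info["summary"].startswith("["):
        # 'old..new' for fast-forwards, 'old...new' for forced updates
        info["old"], info["new"] = re.split(r"\.{2,3}", info["summary"])
    return info
```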



From git://github.com/gitblit/gitblit
 * [new tag]         v0.9.1     -> v0.9.1
 * [new tag]         v0.9.2     -> v0.9.2
 * [new tag]         v0.9.3     -> v0.9.3

Easy enough, these are the new tags that have been created.

Konstantin Khomoutov adds: Worth mentioning that only the tags attached to objects which are referenced (directly or indirectly) by the head(s) being fetched (`git pull` calls `git fetch` first) are downloaded by default. To get all the tags from the remote one can use `git fetch --tags ...`
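Here's a toy Python model of that rule: commits are plain strings mapped to their parents, and a tag is only fetched automatically if the commit it points to is reachable from one of the fetched heads. It deliberately leaves out details such as annotated tag objects and tags pointing at things other than commits, and the names are hypothetical.

```python
def reachable(graph: dict[str, list[str]], heads: list[str]) -> set[str]:
    """Every commit reachable from the given heads by following parent links."""
    seen, stack = set(), list(heads)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def auto_followed_tags(graph: dict[str, list[str]], heads: list[str],
                       tags: dict[str, str]) -> list[str]:
    """Names of tags whose target commit is in the history of the fetched heads."""
    history = reachable(graph, heads)
    return sorted(name for name, target in tags.items() if target in history)
```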

Updating 8f73a7c..67d4f89
This is your current active branch (master) being updated with the changes we saw earlier. Since you are pulling, and not simply fetching, the changes from the remote branch are being merged into your local branch (because your local branch 'master' is set up to track the remote branch 'origin/master').

Fast-forward
This means that your local branch has not diverged from origin/master. In other words: you haven't made any local commits that aren't already in origin/master. The merge can therefore be done as a fast-forward: Git simply moves your branch forward to the remote's latest commit (67d4f89), without creating a merge commit.

Note: This is an important line! If your pull was not a fast-forward, Git has merged the remote changes into your branch, normally by creating a merge commit for you. If this is not intentional, you should consider undoing the merge (git reset --hard HEAD~1, which also throws away any uncommitted changes, so be careful), and then doing git pull --rebase instead.
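Whether a fast-forward is possible boils down to one question: is your local head an ancestor of the remote head? Here's a toy Python sketch on a hand-made commit graph (the commit names and functions are illustrative, not Git internals); on the command line, git merge-base --is-ancestor answers the same question.

```python
from collections import deque


def is_ancestor(graph: dict[str, list[str]], ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` by following parents."""
    queue, seen = deque([descendant]), {descendant}
    while queue:
        commit = queue.popleft()
        if commit == ancestor:
            return True
        for parent in graph.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def can_fast_forward(graph: dict[str, list[str]], local_head: str, remote_head: str) -> bool:
    """A pull can fast-forward when the local head is already in the remote's history."""
    return is_ancestor(graph, local_head, remote_head)
```

If you have a local commit on top of 8f73a7c, your head is no longer an ancestor of origin/master, and the pull can't fast-forward.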

 .classpath                                         |  137 +-
 .gitignore                                         |   43 +-
 NOTICE                                             |   56 +
 build.xml                                          |  484 ++-
 distrib/add-indexed-branch.cmd                     |   20 +

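These are the changes in "stat" form: the number is the total count of changed lines (lines added plus lines removed) in each file, and the bar of + and - signs shows the proportion between the two. This is the same as doing git diff --stat 8f73a7c..67d4f89.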

 docs/fed_aggregation.png        |  Bin 0 -> 21532 bytes
A change in a binary file can't be expressed as line changes, so the file size before and after is printed instead. Here the file is new, hence the size of 0 before.

 374 files changed, 43257 insertions(+), 3508 deletions(-)
A summary of the changes that were made in your local branch.
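If you want the exact numbers behind those bars, git diff --numstat 8f73a7c..67d4f89 prints the added and deleted line counts per file, separated by tabs, with - for binary files. Here's a small Python sketch that parses that output and rebuilds a summary line from it; the function names are mine, and unlike Git it doesn't bother with singular forms like "1 file changed".

```python
from typing import Optional


def parse_numstat(output: str) -> list[tuple[str, Optional[int], Optional[int]]]:
    """Parse `git diff --numstat` lines: added count, deleted count, path (tab-separated).

    Binary files are listed with '-' for both counts; they become None here.
    """
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        if added == "-":
            rows.append((path, None, None))
        else:
            rows.append((path, int(added), int(deleted)))
    return rows


def stat_number(added: int, deleted: int) -> int:
    """The number `git diff --stat` shows for a text file: all changed lines."""
    return added + deleted


def summarize(rows) -> str:
    """Rebuild a summary line like the one under the stat (always plural here)."""
    insertions = sum(a for _, a, _ in rows if a is not None)
    deletions = sum(d for _, _, d in rows if d is not None)
    return (f"{len(rows)} files changed, {insertions} insertions(+), "
            f"{deletions} deletions(-)")
```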

 create mode 100644 distrib/add-indexed-branch.cmd
 create mode 100644 distrib/federation.properties
[..clipped..]
This is a notice on which of the changed files are actually new files.
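The 100644 after "create mode" is the file's mode, in octal: 100644 is a regular, non-executable file, 100755 an executable one and 120000 a symbolic link. Python's standard stat module can decode these values, as this small sketch shows (describe_mode is just a name I picked):

```python
import stat


def describe_mode(mode: int) -> str:
    """Render a Git file mode (an octal number) the way ls -l would."""
    return stat.filemode(mode)


for mode in (0o100644, 0o100755, 0o120000):
    print(f"{mode:o} -> {describe_mode(mode)}")
# 100644 -> -rw-r--r--
# 100755 -> -rwxr-xr-x
# 120000 -> l---------
```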

If you can elaborate any more on any of these, please do so in a comment, and I'll extend the post.

Saturday, June 16, 2012

Broken Snapshots in Java Builds

Recently I've done a lot of thinking about build tools, especially with regard to Maven, Grails and Gradle, and how they play into release management and versioning with Git. This is just a post to get some of those thoughts off my chest. I'll come back to Gradle in future posts, as I gain more experience with it at work.

A few months ago, I wrote an article on our company blog about Grails' broken snapshot dependency mechanism.
Even though Grails (up to and including Grails 2) supports snapshot dependencies, the feature is flawed in a way that makes it unusable for us. This will be fixed in Grails 3, but we couldn't wait that long, so we ended up hacking together a workaround. This article describes why and how we did it. (cont)
Now, I've done a lot of modularization of huge builds over the years, and I've come to really like Maven's snapshot dependencies as a way of balancing between externalizing a library and keeping it as part of the build.

So when a build tool comes along and claims that it supports snapshots, I expect it to support them fully, all the way, with local and remote repositories, time-stamping, update-policies, the lot.
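For reference, this is roughly what time-stamping buys you. When Maven deploys a snapshot to a remote repository, it replaces -SNAPSHOT in the version with a timestamp and a build number, e.g. 1.0-20120616.123456-3, which turns "which snapshot is newest?" into a simple comparison. Here's a minimal Python sketch that assumes nothing but that version format; real resolvers also read maven-metadata.xml and honour update policies, and the helper names are made up.

```python
import re
from datetime import datetime

# Deployed snapshots look like '<base>-<yyyyMMdd.HHmmss>-<buildNumber>'
SNAPSHOT = re.compile(r"^(?P<base>.+)-(?P<ts>\d{8}\.\d{6})-(?P<build>\d+)$")


def snapshot_key(version: str) -> tuple[datetime, int]:
    """Sort key for a timestamped snapshot version: (timestamp, build number)."""
    m = SNAPSHOT.match(version)
    if m is None:
        raise ValueError(f"not a timestamped snapshot: {version}")
    return datetime.strptime(m["ts"], "%Y%m%d.%H%M%S"), int(m["build"])


def freshest(versions: list[str]) -> str:
    """Pick the newest snapshot, regardless of which repository it came from."""
    return max(versions, key=snapshot_key)
```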

So which build tools properly grok snapshots?


Which do not?

  • Ivy
  • Grails, and Griffon, because they're based on Ivy, but they will switch to Gradle next year
  • SBT uses Ivy, and thereby the Play 2.0 framework is affected as well.
I'd just like to emphasize how terrifying I find it that relatively fresh projects like SBT and Play base themselves on the muddy foundations of Ivy, instead of building something on top of Aether.

I haven't properly tested Lein, but I'll give it a run soon and update this post with the results. If anyone knows already, please comment and I'll sort it in.

What was the point of this blog post again?

The reason I'm bringing this up again is that there were some responses to my workaround on Twitter. Graeme Rocher, Grails project lead, responded with an alternative solution, which I figured would be nice to post in full, as it might work for some.

My tweet announcing the post.

Graeme:
With Grails 2.0 why didn't you just re-order the repositories as per the docs?
We're in fact still stuck on the old Grails 1.3.6, but anyhoo, I replied:
Because we want the freshest snapshot, whether it is from remote or local. See https://github.com/alkemist/grails-snapshot-dependencies-fix/issues/1
Graeme:
Ok, but surely a command line / system property to switch the repo order would have solved that for you?
Me:
TBH, that didn't occur to me. It would be a bit annoying for local dev though: Need to invoke grails twice to update deps.
Graeme:
More annoying than maintaining the hack? It would be "grails -Duselocal=true run-app". You could alias it to another cmd even :-)
In retrospect, my own hack has withstood the test of time pretty well. We haven't made any further Grails upgrades though, and I'm not sure if we will before Grails 3.0.

Some reflections on the approach Graeme suggests:

  • CI builds could always run with useLocal=false, but we would then always have to deploy upstream dependencies through the central Maven repo. We pretty much always do this anyway, so this would work fine for us, but your build might be taking some shortcuts here involving the local Maven repo.
  • Developers would have to make a conscious choice whenever they want to use locally built snapshots, and then run with the switch. This would work fine for us, as it happens less and less now that our snapshot dependencies are very stable, but I can imagine a build where you often want the newest of both remote and local snapshots: you would then first have to run a build to get the remote ones, and then a second one to overwrite them with your locally built snapshots.
  • His workaround (and mine) could be ported to other build tools (like SBT) as well.
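The two strategies in this discussion can be boiled down to a few lines of Python. This is a toy model, not how Grails or Ivy actually resolve dependencies: each repository is reduced to the timestamp of the snapshot it holds, and use_local is a hypothetical stand-in for Graeme's -Duselocal=true switch.

```python
def resolve_by_order(snapshots: dict[str, str], use_local: bool) -> str:
    """Order-based: a switch decides which repository is consulted first.

    `snapshots` maps a repository name ('local'/'remote') to the timestamp
    of the snapshot it holds.
    """
    order = ["local", "remote"] if use_local else ["remote", "local"]
    for repo in order:
        if repo in snapshots:
            return repo
    raise LookupError("no snapshot found")


def resolve_freshest(snapshots: dict[str, str]) -> str:
    """Freshness-based: the newest snapshot wins, whichever repository has it."""
    if not snapshots:
        raise LookupError("no snapshot found")
    # 'yyyyMMdd.HHmmss' timestamps sort chronologically as plain strings
    return max(snapshots, key=snapshots.get)
```

With the order-based switch, whoever runs the build has to know which side holds the newer snapshot; the freshness-based rule answers that by itself, at the cost of checking every repository.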

In the end..

I suppose most Grails project developers don't care, and avoid using snapshots. This implies that the modules they do externalize must get a new release for every change they want to include in the downstream Grails application.

This is a hassle, but OK for slow-moving modules. If you have modules with a lot of development in them, chances are you'll just keep them as part of the Grails application, and your build will grow bigger and bigger, maybe even duplicating libraries you'd rather re-use in other applications.

Well, enough rambling. I hope this post might help out some Grails users, and make people more aware of problems with build tools based on Ivy..

Saturday, June 09, 2012

Github for Windows - first impressions

The other day I was listening to the recent Hanselminutes about Github for Windows (from now on abbreviated to GhfW), and decided to take it for a spin. I've had my share of pain explaining to people how to set up Git on Windows, and I have pretty high hopes that this tool will make it easier.

Dude, where's the frame..
At first I was a bit unsure if it was actually running, and not some popup on the current webpage:

Metro style lack of frame around the application
This is the first Metro style application I've installed, I think. And it didn't take me long to like the feel of the application, because it feels light, smooth and fast. I've heard that GhfW can be a real memory hog, but I can't really confirm that here. It starts off taking 50 megs, and then grows to 120 with a few repos checked out. At the end of writing this blog post it's at 200 megs.

But I already have Git set up
Now, my Windows is already set up to use Git, so I was a bit surprised to see that after logging in, I have to configure my full name and email. GhfW also automatically generates a new key which it authorizes at Github, even though I already have one set up. I guess that GhfW uses an isolated "We'll make everything just work (tm) for you" configuration somewhere, and I can see the reasoning for that.

It scans my homedir for repositories and prompts to import them, which is nice. Some of my local repos aren't hosted on Github, and it's nice to see that you can use GhfW for other repos as well, but I have no idea if my existing SSH-key authentication will work.

After importing some repos initially, I can't figure out how to import further repos. There's probably a button for it somewhere, but I can't find it.

Where are the buttons?
After playing around for a few minutes, I notice a recurring thing in this modern Metro style interface: I look around for a good few seconds before I find the button for what I want to do. It's just too unclear to me what is actually clickable, and what is just text or icons. Actually, most text and icons are clickable, but my brain just doesn't cope with that yet.

A little later: Eureka! In order to import another repo, I have to drag it in from Explorer! I got that instruction by entering some text into the repository filter. See, that's what I mean.

Finally got a hint on how to import more repos
Some things can be right-clicked (repos, local changes), many cannot (commits). It takes a lot of clicking around to figure these things out.. Couldn't they build some conventions into the color scheme or something like that?

I wonder if the designers have done any usability tests on this..

Bugginess
I noticed a few bugs at first: cryptic failure messages when pushing fails in misconfigured repos, and double-clicking below the list of repos opens the top one.

Then it got a bit nasty: I manage to land in a state where GhfW says the work tree is clean. But then it complains when I try to check out a branch, and shows me there are uncommitted changes (it looks like there are some problems with line-ending conversion there).

No local changes (background), but you have changes.
So I drop back to the command-line to fix up the repo, do a hard reset, then back into GhfW, just to see that it immediately has changes. And the changes do not match the git diff from the command-line. What the...

So, it could be that my installation is a bit messed up because of my existing .git settings or something like that, because I don't think they would've shipped it if everyone were having these problems.

The situation above occurred in a repo where I had worked with Windows line-endings checked in, and it could be that this conflicted a bit with how GhfW wanted to work things. They don't prompt you on how to handle line-endings during installation, so they must have some smart way of handling that, and maybe that failed in this situation.
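Here's a simplified Python model of how that could happen: if CRLF line endings were committed, and the tool normalizes them to LF when comparing the work tree against the repository, a file that was just reset still looks modified. This assumes an unconditional CRLF-to-LF conversion; in real Git the behaviour depends on core.autocrlf, .gitattributes and the Git version, and I don't know what GhfW configures, so the function names and behaviour here are purely illustrative.

```python
def clean(worktree: bytes) -> bytes:
    """Checkin-side conversion. ASSUMPTION: every CRLF becomes LF, unconditionally."""
    return worktree.replace(b"\r\n", b"\n")


def looks_modified(committed: bytes, worktree: bytes) -> bool:
    """A file shows up as changed if its converted content differs from the commit."""
    return clean(worktree) != committed


# A file committed with Windows line endings, checked out byte-for-byte
# (as after a hard reset):
committed = b"line one\r\nline two\r\n"
print(looks_modified(committed, committed))  # True: a phantom change
```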

Summary
I really like the rationale behind GhfW, which is making it super easy for Windows users to get started with Git. It certainly is easy to install, start new repos, and get them in and out of Github. They also install a Git command line, in case you want to go beyond the GUI.

The GUI is really nice, but it takes a bit of getting used to if you are new to Metro, I suppose.

For me, it still feels a bit too buggy, and a bit too alien to just recommend to a complete Git newb, even if it were rock stable.

I suspect that they are aiming this product at a user group (I'm guessing suits in companies that buy Github's enterprise stuff) who will not be able to figure it out, so they need to do some usability testing here.

There are some more interesting notes on the design on Github's blog.