Filtering out illegal characters using Guava

The other day, I needed to validate a text-field against a whitelist of characters. Actually, lots of text-fields needed lots of different sets of whitelists, but let's just stick to one for the sake of example.

The text field "First name" is only allowed to contain any of these characters:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz°Žµ·žŒœŸÀÁÂÃÄÅÆ
ÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ
Þßàáâãäåæçèéêëìíîï
ðñòóôõöøùúûüýþÿ

.. and the user should be notified with which illegal characters she has entered in the field, if any.

The knee-jerk reaction to this kind of logic is regular expressions, but my knee-jerk reaction to regexp is to avoid it. The above character set probably doesn't fit with any of the predefined regexp notations, so it would probably be nasty anyway.

Some Guava fruits for coloring up this blog post (image source)

So, let's put Google Guava to some use. It has some nifty utilities for working with sets and other collections.

If you consider the user's input characters as one set, and the above whitelist characters as a second set, you can find the characters that appear in the first set, but not in the second like this (see the whole class here):

[Loading gist....]

That's pretty much it. I've put a whole example project along with tests on Github.

Here are the tests showing off some usage:

[Loading gist....]

I will probably be adding some more Guava examples to that project over time.

Comments

Open source CMS evaluations

I have now seen three more or less serious open source CMS reviews. First guy to hit the field was Matt Raible ( 1 2 3 4 ), ending up with Drupal , Joomla , Magnolia , OpenCms and MeshCMS being runner-ups. Then there is OpenAdvantage that tries out a handful ( Drupal , Exponent CMS , Lenya , Mambo , and Silva ), including Plone which they use for their own site (funny/annoying that the entire site has no RSS-feeds, nor is it possible to comment on the articles), following Matt's approach by exluding many CMS that seem not to fit the criteria. It is somewhat strange that OpenAdvantage cuts away Magnolia because it "Requires J2EE server; difficult to install and configure; more of a framework than CMS", and proceed to include Apache Lenya in the full evaluation. Magnolia does not require a J2EE server. It runs on Tomcat just like Lenya does (maybe it's an idea to bundle Magnolia with Jetty to make it seem more lightweight). I'm still sure that OpenAdvant...

Git tools for keeping patches on top of moving upstreams

At work, we maintain patches for some pretty large open source repositories that regularly release new versions, forcing us to update our patches to match. So far, we've been using basic Git operations to transplant our modifications from one major version of the upstream to the next. Every time we make such a transplant, we simply squash together the modifications we made in the previous version, and land it as one big commit into the next version. Those who are used to very stringent keeping of Git history may wrinkle their nose at this, but it is a pragmatic choice. Maintaining modifications on top of the rapidly changing upstream is a lot of work, and so far we haven't had the opportunity to figure out a more clever way to do it. Nor have we really suffered any consequences of not having an easy to read history of our modifications - it's a relatively small amount of patches, after all. With a recent boost in team size, we may have that opportunity. Also the need for be...

The Git Users Mailing List

A year ago or so, I came across the Git-user mailing list (aka. "Git for human beings"). Over the year, I grew a little addicted to helping people out with their Git problems. When the new git-scm.com webpage launched , and the link to the mailing list had disappeared, I was quick to ask them to add it again . I think this mailing list fills an important hole in the Git community between: The Git developer mailing list git@vger.kernel.org - which I find to be a bit too hard-core and scary for Git newbies. Besides, the Majordomo mailing list system is pretty archaic, and I personally can't stand browsing or searching in the Gmane archives. The IRC channel #git on Freenode, which is a bit out-of-reach for people who never experienced the glory days of IRC. Furthermore, when the channel is busy, it's a big pain to follow any discussion. StackOverflow questions tagged git , these come pretty close, but it's a bit hard to keep an overview of what questio...

Git-SVN Mirror without the annoying update-ref

This post is part of a series on Git and Subversion . To see all the related posts, screencasts and other resources, please click here . So no sooner than I had done my git-svn presentation at JavaZone , I got word of a slightly different Git-SVN mirror setup that makes it a bit easier to work with: In short, my old recipe includes an annoying git update-ref step to keep the git-svn remote reference up to date with the central bare git repo. This new recipe avoids this, so we can simply use git svn dcommit directly. So, longer version, with the details. My original recipe is laid out in five steps: Clone a fresh Git repo from Subversion. This will be our fetching repo. Set up a bare repo. Configure pushing from the fetching repo to bare repo In the shoes of a developer, clone the repo Set up an SVN remote in the developer's repo In the new approach, we redefine those last two steps: (See the original post for how to do the fir...

Git Stash Blooper (Could not restore untracked files from stash)

The other day I accidentally did a git stash -a , which means it stashes *everything*, including ignored output files (target, build, classes, etc). Ooooops.. What I meant to do was git stash -u , meaning stash modifications plus untracked new files. Anyhows, I ended up with a big fat stash I couldn't get back out. Each time I tried, I got something like this: .../target/temp/dozer.jar already exists, no checkout .../target/temp/core.jar already exists, no checkout .../target/temp/joda-time.jar already exists, no checkout .../target/foo.war already exists, no checkout Could not restore untracked files from stash No matter how I tried checking out different revisions (like the one where I actually made the stash), or using --force, I got the same error. Now these were one of those "keep cool for a second, there's a git way to fix this"situation. I figured: A stash is basically a commit. If we look at my recent commits using git log --graph --...