Vibe Coding a GUI for Miller (mlr, the structured data file processing tool)

TL;DR

I (or AI) made a thing: mlr-desktop is a GUI for the awesome tool Miller (mlr)

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

And it was completely done by an AI agent (vibe coding). Read on for more details, or download the first release and try it out (available for Mac, Windows and Linux).

Background

One thing we do at snabble is import large amounts of retail product data from our customers. Think product names and prices.

Ideally, we receive data in the form of JSON, but often we get data from legacy systems that only support CSV export (or worse: some other proprietary text format).

Here's a trivial example of some well-formed CSV:

SKU,Product Name,Price,Barcode
FRO-010,Organic Free-Range Eggs (Dozen),5.99,5012345678901
PNC-025,Fresh Baked Sourdough Loaf,4.25,5023456789012
DRY-050,Basmati Rice (5kg),12.50,5034567890123
PRO-005,Roma Tomatoes (Per Lb),2.99,5045678901234
CNS-012,Black Beans (Canned),0.89,5056789012345
DAI-033,Whole Milk (Gallon),3.75,5067890123456
BEV-070,Sparkling Water (12-Pack),6.50,5078901234567
CLE-045,All-Purpose Kitchen Cleaner,4.99,5089012345678
PET-099,Adult Dog Food (10kg),25.99,5090123456789
BAK-001,All-Purpose Flour (2kg),3.20,5001234567890

It's pretty straightforward to whip up a program that uses a CSV-parsing library to gobble through these files and transform them into the structure we use internally.

But doing so requires a bit of effort, and this activity is only accessible to developers.

Now, some non-programmers may be able to do simple transformations using Excel or similar, but it quickly gets difficult once they have to do a complicated transformation, or the file grows into the hundreds of thousands of lines.

Miller to the rescue

When I work on structured data like this, I quickly turn to Miller (mlr), a command line tool for processing large sets of structured data.

Kind of like jq is for JSON, Miller is a must-have for people dealing with large data files. It's really fast and powerful, but it can be hard to understand how it operates.

Here are some examples where Miller quickly gives me the answers I need.

Show the lines where price is zero: 

mlr --from products.csv --icsv --opprint filter '$Price == 0' then cat

Quick explanation of the provided options:

- read from the products.csv file
- expect the input format to be CSV
- pretty-print the output
- keep only the rows where the price is zero
- pass the resulting records through unchanged (the cat verb)

As you can see, after supplying the input/output arguments, Miller takes a chain of verbs that transform the stream of data. This is very flexible, and once you get familiar with it, you can tweak the verbs to explore the data or produce exactly the output you want:

Count the lines where price is zero:

mlr --from products.csv --icsv filter '$Price == 0' then count
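
To illustrate the chaining a bit further, here's a hypothetical variant against the sample CSV above: keep only a couple of columns, sort by price (highest first), and show the three most expensive products:

mlr --from products.csv --icsv --opprint cut -f SKU,Price then sort -nr Price then head -n 3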

Miller is very powerful and even contains its own DSL for operating on the stream. You can do things like join different datasets, run SQL-like queries, and produce JSON as output.
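
As a rough sketch of that (stock.csv, with columns SKU and stock, is a made-up second file here), you could join stock levels onto the products, compute a new field with the DSL, and emit JSON:

mlr --icsv --ojson join -j SKU -f stock.csv then put '$value = $Price * $stock' products.csv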

I've been preaching the use of it for a while internally, but it's hard for others to get into it. They often resort to just using spreadsheets, simpler command line tools like grep, and manual counting. It's just a bit hard to get started with Miller.

So to this end, I figured, let's use some Vibe Coding to produce a nifty GUI for it!

Introducing Miller Desktop

Let's start with the result. Bear in mind that the tool is meant for desktop users, so the screenshots won't do it justice on mobile:


List of features:
  • Interactive Preview: See real-time output as you build your transformation pipeline
  • Sample Data: Comes pre-loaded with sample grocery data to help you get started
  • Select Input Format: Dropdown with CSV, TSV, JSON, and NDJSON
  • CSV/TSV-specific options: Ragged, Headerless, Custom field separator
  • Select Output Formats: Pretty Print, CSV, TSV, JSON, NDJSON
  • Verb Pipeline Builder: Chain multiple mlr verbs with drag-and-drop reordering
  • Quick Add Shortcuts: Common transformation patterns available with one click
    • Head 5 lines
    • Clean headers (replace spaces with underscores)
    • Filter by column value
    • Label columns
    • Cut columns
    • Add computed columns (split/extract)
  • Command Preview: See the exact mlr command that will be executed (see the sketch after this list)
  • Save Output: Export transformed data to a file
  • Auto-save: Your work is automatically saved between sessions
  • File Input: Load data from files or paste it directly
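
As an illustration of the kind of command that preview shows, the format options above map onto ordinary mlr flags. A hypothetical combination (semicolon-separated, headerless, possibly ragged input; JSON output; naming the columns and keeping the first five rows) might look like this:

mlr --icsv --ifs ';' --implicit-csv-header --allow-ragged-csv-input --ojson label sku,name,price,barcode then head -n 5 products.csv
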
Here's a screencast to see it in action:



Keep in mind, the GUI does not offer access to all of Miller's features. It's more of an 80% solution to help people get started.

It'll actually be quite useful to me as well, as it lets me quickly re-order the verbs and disable/enable them when inspecting files.

The tech stack is a Go backend with the Wails framework, and React with Vite for the frontend. But to be clear, I did not once have to dive into the code. It was all vibe coding with an AI agent.

Source code is published here: https://github.com/tfnico/mlr-desktop

Note that I'm not affiliated with the Miller project. Just a humble user.

Vibe coding with Google's Antigravity (Gemini)

I have tried this out in the past, and ended up a bit disappointed with the results from various AI agents. They mostly got stuck just getting the desktop application to work at all.

Google's Antigravity editor, which came out last week, gave me an excuse to have another go at it, and this time I think it went quite well!

Altogether it was a couple of hours of work. I focused on making minor improvements step by step: prompting Gemini to add a small feature, waiting a bit, firing up the application, testing manually, and repeating.

Gemini did the right thing and got it working about 90% of the time. Occasionally, some bug was introduced, but Gemini was quick to fix these as I pointed them out.

Some things it was unable to figure out. For example, I wanted to be able to re-order the verbs using click & drag, but Gemini couldn't make it happen. Introducing up/down arrows was much easier for it to implement.

I think it was key that I had a good foundation of how Miller works, and what I wanted to achieve, while also being flexible on how the particulars were implemented.

In conclusion

(a) getting this mlr-desktop tool ready in such a short time is fantastic.

And (b), vibe coding is a great fit for quickly spinning up a non-critical wrapper (a GUI) around a well-built core (the mlr tool itself).

I think we'll be seeing that a lot more going forward as a general trend: Hand-crafted cores and APIs, coupled with AI-generated frontends.

Finally, I need to emphasize that Miller is such a fantastic tool, and I suspect it gets only a fraction of the usage it deserves. So here's to spreading the word, and I hope this desktop tool can make it easier for some to "get it".
