Command Line Interface

Vagrant is a great tool and I use it for various projects. Every time I need a new base box for Vagrant, I have to open a browser to pick and download one.

Mostly because I dislike having to switch to the browser while I am working, I wrote a tool that lists the available base boxes and lets me retrieve information about them from the command line.

The tool is called bves (who knows why I decided to give it that name) and you can find it here.


# bves list | head -n 10

1 Debian 7.3.0 64-bit Puppet 3.4.1 (Vagrant 1.4.0)
2 OpenBSD 5.5 64-bit + Chef 11.16.0 + Puppet 3.4.2
3 OpenBSD 5.4 64-bit + Chef 11.8.2 (150GB HDD)
4 OpenBSD 5.3 64-bit (Vagrant 1.2)
5 OpenBSD 5.3 64-bit
6 Aegir-up Aegir (Debian Squeeze 6.0.4 64-bit)
7 Aegir-up Debian (Debian Squeeze 6.0.4 64-bit)
8 Aegir-up LAMP (Debian Squeeze 6.0.4 64-bit)
9 AppScale 1.12.0 (Ubuntu Precise 12.04 64-bit)
10 Arch Linux 64 (2014-06-20)

# bves show 260

Name: Windows 8.1 with IE11 (32bit)
Details: Windows 8.1 with IE11 (32bit)
The Microsoft Software License Terms for the IE VMs are included in the release notes and supersede any conflicting Windows license terms included in the VMs. By downloading and using this software, you agree to these license terms.
Provider: Virtualbox
Size: 3584.0 MB

# vagrant box add Debian7 $(./bves url 1)

Adventures in Go

This Christmas I decided to write a small Go web app (Amazon Wishlist Stats) to see how nice Go actually is.
The web app I ended up building shows you how much you would have to spend to buy everything in your Amazon wishlist.

In theory this is a fairly simple problem: scrape the Amazon wishlist page that the user submits, add up the value of all the items, and show the total on screen.

This requires two parts:

  • Backend
    1. download webpage
    2. parse html
    3. serve static pages
  • Frontend
    1. display form where user can submit wishlist URL
    2. display total and table with items

For the first step I used net/http’s Get; for the second, net/html’s Parse; and for the third, gorilla/mux.
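
A minimal sketch of the serving side (wishlistHandler and the static directory are placeholders of mine, not necessarily what the app uses):

r := mux.NewRouter()
// endpoint the frontend calls with the submitted wishlist URL
r.HandleFunc("/wishlist", wishlistHandler)
// everything else is served as static files
r.PathPrefix("/").Handler(http.FileServer(http.Dir("static")))
http.ListenAndServe(":8080", r)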

Downloading the web page is really straightforward. I didn’t encounter any issues and Go makes this really easy.
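
Something along these lines, where wishlistURL stands for the URL the user submitted (error checking omitted, as in the snippets below):

resp, err := http.Get(wishlistURL)
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)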
Parsing HTML is a bit trickier. The first time around, I called Parse and navigated through the tree until I found what I wanted. A Node struct has a Data field that contains the text we most likely want to find.
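
Navigating the tree looks roughly like this, node being the root returned by Parse (a minimal sketch; the TextNode check stands in for whatever actually identifies a wishlist item):

var walk func(n *html.Node)
walk = func(n *html.Node) {
	if n.Type == html.TextNode {
		// n.Data holds the text; match it against what you are after
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		walk(c)
	}
}
walk(node)

Simple enough, right? Not really. Welcome to encoding 101.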

Amazon wishlist pages are in UTF-8, so I had to do a bit of extra work to get consistent behaviour with characters outside of ASCII:

// charset is the charset-conversion helper that ships alongside the
// html package (today golang.org/x/net/html/charset)
reader, err := charset.NewReader(bytes.NewReader(body), "utf-8")
node, err := html.Parse(reader)

(Error checking not included for clarity’s sake.)

I convert the body stream into UTF-8 (using charset) and then parse the result. Works like a charm.

The frontend can all be read and studied on the site so I won’t go into it. It uses Bootstrap and Angular.

The project was a nice way to get my head wrapped around web development in Go. Scraping is not as nice as I would wish; hopefully the html library matures. I tried using xmlpath, but it didn’t feel right: after spending 30 minutes dealing with multiple issues, I abandoned it and navigated through the HTML myself. I should take a second look at xmlpath, as I am sure it would save me a lot of lines of crappy code.

Knowing Unicode is important in Go! This is a good thing. Read [1] and [2].

I really enjoy Angular. This framework motivated me to do frontend web development.


  1. Joel Spolsky, “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets”
  2. Rob Pike, “Strings, bytes, runes and characters in Go” (The Go Blog)

kill -HUP blog

As you may have noticed, we updated the blog’s theme.
It reads better and is easier on the eyes (probably because it reads better).

We have been pretty much dead as far as writing goes, but we are making a comeback.

A Linux gateway with multiple ISPs

For several reasons that are usually too complicated to describe here, I frequently end up having to set up a Linux box to simulate a Cisco router.

It’s pretty basic: write 1 to /proc/sys/net/ipv4/ip_forward, set up your network interfaces and your default gateway, add some iptables rules, and you’re set.
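
For a single uplink, that boils down to something like this (the interface name and gateway address are examples, not part of the setup below):

# enable packet forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward

# default route towards the ISP
ip route add default via 192.0.2.1 dev eth0

# NAT everything leaving through the external interface
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE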

For the second time now, however, I had a situation where two or more ISPs were being used. And for the second time I found the documentation difficult to find, and the examples lying around rather short or incomplete.
So I decided to write a script that just sets everything up with no fuss, and to make it available here. For the sake of simplicity this version only considers two links, but it is trivial to generalize (the original used four).

Here’s the scenario:

  • two ISPs on interfaces vlan20 and vlan21
  • we have to make sure that all traffic going into vlanX goes out only on vlanX
  • the traffic is load-balanced between both ISPs



#!/bin/bash

# ISP interfaces (from the scenario above)
IF1=vlan20
IF2=vlan21

# names for the per-ISP routing tables (arbitrary) and the file
# where they are registered
T1=isp1
T2=isp2
TABLES=/etc/iproute2/rt_tables

# IP addresses of the interfaces
IP1=`ifconfig ${IF1} | grep "inet addr" | awk '{print $2}' | sed -e 's/.*://'`
IP2=`ifconfig ${IF2} | grep "inet addr" | awk '{print $2}' | sed -e 's/.*://'`

# IP addresses of the gateways at the ISPs
ISP1=`ip route | grep ${IF1} | grep default | awk '{print $3}'`
ISP2=`ip route | grep ${IF2} | grep default | awk '{print $3}'`

# networks $ISP1 and $ISP2 live in
ISP1_NET=`ip route | grep ${IF1} | grep -v default | awk '{print $1}'`
ISP2_NET=`ip route | grep ${IF2} | grep -v default | awk '{print $1}'`

# create the routing tables (if not already there)

x=`grep "$T1" $TABLES`
if [ "$x" == "" ]; then
	echo "Creating routing tables..."
	echo "1 $T1" >> $TABLES
	echo "2 $T2" >> $TABLES
	echo "Routing tables created..."
fi

# per-ISP routing tables: the connected network plus a default route
# via that ISP, so replies leave through the right link
ip route add $ISP1_NET dev $IF1 src $IP1 table $T1
ip route add default via $ISP1 dev $IF1 table $T1
ip route add $ISP2_NET dev $IF2 src $IP2 table $T2
ip route add default via $ISP2 dev $IF2 table $T2

# main routing table
ip route add $ISP1_NET dev $IF1 src $IP1
ip route add $ISP2_NET dev $IF2 src $IP2

# make sure that traffic coming in on a particular interface gets answered from that interface
ip rule add from $IP1 table $T1
ip rule add from $IP2 table $T2

# load-balanced default route
ip route del default dev $IF1
ip route del default dev $IF2
ip route add default scope global \
	nexthop via $ISP1 dev $IF1 weight 1 \
	nexthop via $ISP2 dev $IF2 weight 1





Converting LaTeX to plain text

I use LaTeX for 99% of my writing. It’s practical and convenient, I’m used to it, and BibTeX makes it really easy to maintain a single references database.

Now, it’s a real pain in the ass when someone asks (or forces) me to deliver something in “plain text format”. Come on, am I really supposed to keep a ton of references by hand, updating them every time something new comes up?

Well, here’s today’s nice little tool: catdvi. It translates a .dvi file into the “equivalent” plain text, and keeps the citations!

Just grab it from sourceforge:
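
Once installed, usage is as simple as it gets (file names here are just examples):

$ latex report.tex
$ catdvi report.dvi > report.txt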

Worst-case complexity: why I really don’t care

At some point in college everyone learns worst-case complexity
analysis. And for some obscure reason, almost everyone immediately
decides that it’s the perfect tool for studying the behaviour and
comparing the performance of algorithms. This is wrong. Just wrong.

1 – Worst-case complexity analysis gives you exactly what the
name implies: an upper bound (order of magnitude) on the worst
possible behaviour of an algorithm.

Unless that result is proven to be tight, i.e., there is an
example of some input that actually makes the algorithm achieve
the worst case, you can’t really be sure of anything. If a
procedure is O(n^2), it is also O(n^3); you need the tightest
possible order, not “the best currently known result”. Thus,
unless you can also tell something about the best-case complexity
— which is usually ignored — you may be avoiding an O(n^3)
algorithm that someone will one day prove to be O(n^2).

2 – Let’s assume, for the sake of argument, that a given algorithm A
is O(n^2), and that this result is tight. Two questions arise:

a) How frequently does this happen?

Knowing that there exists some input on which algorithm A will
behave quadratically doesn’t really tell you much. You need to
know more, specifically about the conditions under which *you*
are going to apply the algorithm.

For one, even if you must assume random input, and thus the
possibility of hitting the worst possible case, you can
probably apply the algorithm safely unless you know that the
worst case occurs frequently. Take the simplex algorithm: it
is exponential in the worst case, but that case is so rare
that it remains one of the most popular algorithms for linear
programming. You must also consider the particular environment
in which you are going to use the algorithm. If you actually
take some time to think, you will find that on several
occasions you only have to consider a subset of all possible
inputs, and can maybe avoid the worst possible scenario.

b) How big is your *real* input?

Remember that big O notation provides only an upper bound on
the growth rate and assumes that the size of your input tends
to infinity. Now seriously, does the input of your algorithms
always go to infinity and beyond? So why are you choosing the
perfect algorithm for a scenario harder than yours?

3 – Constants can make a big difference

We always throw away constants when doing worst-case
analysis. They are useless, right? Well, I disagree. Consider
two algorithms, A1 and A2, such that A1 is O(n log(n)) and A2
is O(n^2). It’s pretty simple, right? A1 is faster than
A2. Well, think again, but this time a little harder. What you
actually know is that the asymptotic growth of A2 surpasses
that of A1, and that analysis assumes the size of the input
tends to infinity. What if you know that you will never have
an input with n > 100? What if, on top of that, the big O
notation is hiding a constant of 8 in A1 and of 1/8 in A2?
Suddenly, in real life, the quadratic algorithm is faster than
the n log(n) one…
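
To make this concrete (assuming base-2 logarithms, purely for
the sake of the example): at n = 100, 8 · n · log(n) ≈
8 · 100 · 6.64 ≈ 5300 operations, while n^2/8 = 10000/8 = 1250
operations. The “slower” quadratic algorithm does roughly a
quarter of the work.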

And for the record, I have a pretty good example of this
scenario using finite automata: a DFA with 100 states is huge
in many settings. A blind implementation of the “fastest
n log(n) algorithm” can be rather disappointing when compared
to a simple quadratic approach.

My point is: worst-case complexity analysis is a tool, just
like best-case complexity, average-case complexity, amortized
analysis, and so on. It is a good thing to know and to use,
but it is by no means the answer to all problems!

Blindly choosing an algorithm based solely on a worst-case
complexity result will probably do you more harm than good.
When implementing an algorithm, it is crucial to know your
setup: what kind of input you can expect, the equivalent
algorithms available, tailor-made options, specific
optimizations, the hardware you will be using, and so on.

cgit with charts

cgit is a web frontend to git. I like it because it is fast and does what I need, no more and no less. Well, kind of.

On the stats page, cgit provides a table listing committers along with their number of commits in a given timeframe. The table contains valuable information, but it is somewhat hard to read, at least for me. I prefer to visualize things.

That was when I decided to add some charts, with the help of the d3.js library.
Below is a screenshot of a development version.

If you want it, be my guest and fork it from my github:

git clone git://

EMC and Enterprise Storage

Although EMC provides command-line tools to do (almost) everything you can do through the graphical interface, I still think they should scrap the interface altogether on the enterprise-level storage solutions they offer.

If you have high-tier storage, you most probably know what you are doing (and you should). Instead of focusing on a web interface and nice icons, they should spend that energy writing good (or better) documentation for the command-line tools; more options would also be welcome.

@EMC: you should worry about a good interface for graphs and related visuals. Not that you should not also provide a command-line interface to those graphs, or even externalize the links on them. For example, externalizing the links on the performance graphs would make it easier to integrate them with monitoring tools people already use and prefer (e.g. Zenoss).

Python and Web Programming

For some years now I’ve been keeping a decent collection of hack-ish libraries that do exactly what I want them to do. One provides a nice interface to different databases; another keeps my sessions in check; others take care of cookies and of encoding and decoding. To be honest, I never wrote any big website. I don’t really like web programming that much. But I do keep my libraries around, just in case. It’s like having that handy Swiss army knife.

The other day, a co-worker asked me which framework he should use. He mentioned Django and Pylons, I believe. There are plenty of others, and the big differences are in how they do things and how good the interfaces they present are. Anyway, my answer was a manly one: write your own. I told him that the time he would spend learning one of those frameworks was about the same as doing his own thing, considering he would eventually realize the framework didn’t do exactly what he wanted. My answer has been bothering me for a couple of days, and I’ve changed my mind. What I’m about to say is for all of you out there who want to start writing websites in Python.

Don’t write your own thing. It will be a nightmare to maintain. You will write site-specific stuff that you’ll regret later, and then you’ll have to re-engineer some parts or even the whole thing. You will probably forget the parts of your libraries that you rarely use, and your documentation probably won’t match the (mostly good) documentation those projects (e.g. Django and friends) have. Not only that, but you will also regret not having thought about scalability and/or whatever else you should have thought about when you wrote them. Believe me, my libraries have been with me for more than 8 years now, and 8 years ago I didn’t think about scalability. I didn’t need to. Maybe one day I will, and that is a day I am not looking forward to.

My best advice is to learn one or two web frameworks and pick the one you feel most comfortable with. If you need something the framework doesn’t have, make the effort to extend it. In the long run, you will be better off that way. And if you really don’t want a complete web framework, at least look into some WSGI utility libraries to help you (e.g. Werkzeug). Again, do not write your own thing.

As for me, I’ll be issuing a “rm -rf Projects/web-utilities” in my home folder.

P.S.: One of these days I may even write a tutorial on one of these frameworks.