Converting LaTeX to plain text

I use LaTeX for 99% of my writings. It’s practical, convenient, I’m used to it and BibTeX really makes it easy to maintain a single references database.

Now, it’s really a pain in the ass when someone asks/forces me to handle someting in “plain text format”. Come on, am I really supposed to keep a ton of references by hand? Updating it every time something new comes up?

Well, here’s today’s nice little tool: catdvi. It translates a .dvi file into the “equivalent” plain text, and keeps the citations!

Just grab it from sourceforge: http://catdvi.sourceforge.net/

Worst-case complexity: why I really don’t care

At some point in college everyone learns worst-case complexity
analysis. And for some obscure reason, almost everyone immediately
decides that it’s the perfect tool for studying the behaviour and
comparing the performance of algorithms. This is wrong. Just wrong.

1 – Worst-case complexity analysis gives you exactly what the name
implies: a majoration (order of magnitude) of the worst possible
case.

Unless that result is proven to be tight, i.e., there is an
example of some input that actually makes the algorithm achieve
the worst case, you can’t really be sure of anything. If a
procedure is O(n^2), it is also O(n^3); you need the tightest
possible order, not “the best currently known result”. Thus,
unless you can also tell something about the best-case complexity
— which is usually ignored — you may be avoiding an O(n^3)
algorithm that someone will one day prove to be O(n^2).

2 – Lets assume, for the sake of argument, that a given algorithm A is
O(n^2), and that this result is tight. Two questions arise:

a) How frequently does this happen?

Knowing that there exists some input on which algorithm A will
behave quadratically doesn’t really tell you much. You need to
know more, specifically about the conditions on which *you* are
going to apply the algorithm.

For one, even if you must assume random input, and thus the
possibility of achieving the worst possible case, unless you
know that it occurs frequently, you may probably apply the
algorithm safely. Take for example the simplex algorithm. It’s
exponential in the worst case, but the worst case is so rare
that it is one of the most popular algorithms for linear
programming. You must also consider the particular environment
on which you are go use the algorithm. If you actually take
some time to think, you will find out that on several occasions
you have to consider only a subset of all possible inputs, and
maybe avoid the worst possible scenario.

b) How big is your *real* input?

Remember that the big O notation provides only an upper bound
on the growth rate and assumes that the size of you input will
tend to infinity. Now seriously, does the input of your
algorithms always go to the infinity and beyond? So why are you
choosing the perfect algorithm for a harder scenario?

3 – Constants can make a big difference

We always throw away constants when doing worst-case
analysis. They are useless, right? Well, I disagree. Consider
two algorithms, A1 and A2, such that A1 is O(n log(n)) and A2
is O(n^2). It’s pretty simple, right? A1 is faster than
A2. Well, think again, but this time a little harder. Actually,
what you know is that the asymptotic growth of A2 surpasses
that of A1. This analysis assumes that the size of the input
tends to infinity. What if you know that you will never have an
input such that n>100? What if on top of that, that big O
notation is hiding the constants 8 from A1 and 1/8 from A2?
Suddenly, in real life, a quadratic algorithm is faster than a
logarithmic one…

And for the record, I have a pretty good example of this
scenario using finite automata: a DFA with 100 states is huge
on many setups. A blind implementation of the “fastest n log(n)
algorithm” can be rather disappointing when compared to a
simple quadratic approach.

My point is: worst-case complexity analysis is a tool. Just as
best-case complexity, average-case, amortized analysis, etc. It is a
good thing to know it and to use it, but by all means it’s not the
answer to all problems!

Blindly choosing an algorithm simply based on a worst-case complexity
result will probably do you more harm than good. When implementing an
algorithm, it is crucial to know you setup: what kind of input can you
expect, equivalent algorithms available, tailor-made options, specific
optimizations, hardware you will be using, etc., etc.

cgit with charts

cgit is a web frontend to git. I like it because it is fast and does what I need and not more or less. Well, kind of.

In the stats page, cgit provides a table with the list of committers along with the amount of commits in a given timeframe. The table contains valuable information, but it is somewhat hard to read. At least for me. I prefer to visualize things.

That was when I decided to include some charts, with the help of the d3.js library.
Below is a screenshot of a development version of it.

If you want it, be my guest and fork it from my github:

git clone git://github.com/nlopes/cgit.git

EMC and Enterprise Storage

Although EMC provides command line tools to do (almost) everything you can do through the interface, I still think they should scrap the interface at all on the enterprise level Storage solutions they offer.

If you have a high tiered storage, you most probably (and should) know what you are doing. Instead of focusing in a web interface and nice icons, they should instead spend that energy writing good (or better) documentation on the command line tools; more options are also welcome.

@EMC: you should worry about good interface for graphs and related stuff. Not that you should not provide a command line interface to those graphs, or even externalize the links on the graphs. For example, externalizing links on the performance graphs would make it easier to integrate it with monitoring tools people already use and prefer (e.g. zenoss).

EMC World

Just to tell the world I am currently writing this from EMC World.
VIB Lounge FTW.

Python and Web Programming

For some years now that I’ve been keeping a decent amount of hack-ish libraries that do exactly what I want them to do. I have a library that provides me with a nice interface to different databases; another one keeps my sessions in check; others take care of cookies and encoding and decoding. To be honest, I never wrote any big website. I don’t really like web programming that much. But I do keep my libraries just in case. It’s like having that handy swiss army knife.

The other day, a co-worker asked me which framework he should use. He mentioned django and pylons I believe. There are plenty others, and the big difference is in how they do things and how good the interface they present is. Anyway, my answer was a manly one: write your own. I told him the time he would spend learning one of those frameworks and doing his own thing was about the same, considering he would realize the framework wouldn’t do exactly what he wanted. My answer has been bothering me for a couple of days and I’ve changed my mind. What I’m about to say is for all of you out there that want to start writing some websites in python.

Don’t write your own thing. It will be a nightmare for you to maintain it. You will write site specific stuff that you’ll regret later and then you’ll have to re-engineer some parts or even the whole thing. You will probably forget parts of the libraries that you rarely ever use and your documentation won’t probably match the (mostly good) documentation those projects (e.g.: django and friends) have. Not only that, but you will also regret you didn’t think about scalability and/or some other thing you should have thought when you wrote them. Believe me, my libraries have been with me for more than 8 years now and 8 years ago I didn’t think about scalability. I didn’t need to. Maybe one day I will and that is the day I am not looking forward.

My best advice to you is to learn one or two web frameworks and pick the one you feel most comfortable with. If you need stuff the framework doesn’t have, do the effort to extend it. In the long run, you will be better off that way. And if you really don’t want a complete web framework, at least look into some wsgi utility libraries to help you (e.g. werkzeug). Again, do not write your own thing.

As for me, I’ll be issuing a “rm -rf Projects/web-utilities” in my home folder.

P.S.: One of these days I may even write a tutorial on one of these frameworks.

Fluid: life saver

Do you hate having 30 tabs open in firefox/chrome/whatever when 7 or 8 of those you would really wish they were an application?

I met Fluid this weekend. Mac OS only unfortunately. It made my day.
One of the things I wished for was Google Reader as an application, and now I do. Did you mention Facebook? That also works. I guess Playboy would work too.

Try it and you might find yourself a fan.
http://fluidapp.com

There is plenty more you can do with it. Get your imagination working.

CentOS NetInstall Proxy support

CentOS is widely used by some well known corporations around the globe as their distribution of choice. Without wanting to start a flame war (because I wouldn’t – if you haven’t figured out by now, there aren’t many readers to this blog) no one cares about your opinion on that. Doesn’t matter what you like, what you pick, that’s what those companies picked, they stick with it and it works pretty well for them.

One could argue that is because a rpm based system is better than a dpkg based system (or compared to any other package management system for that matter). One could argue that a distribution based from one from a corporation (Hello Red Hat) would be the best solution without having to pay one cent.
One could argue a lot of things about corporations choosing CentOS as their default distribution and never ever finish arguing. Maybe and probably because it is a worthless discussion.

But CentOS netinstall not supporting proxies? Really?

No comments on that.

A homage to Spinal Tap

Nigel Tufnel: My RAID array are all RAID-11. Look, right across the rack, RAID-11, RAID-11, RAID-11 and…
Marty DiBergi: Oh, I see. And most arrays go up to RAID-10?
Nigel Tufnel: Exactly.
Marty DiBergi: Does that mean it’s faster? Is it any faster?
Nigel Tufnel: Well, it’s one faster, isn’t it? It’s not RAID-10. You see, most blokes, you know, will be serving files at RAID-10. You’re on RAID-10 here, all the way up, all the way up, all the way up, you’re on RAID-10 on your database backup. Where can you go from there? Where?
Marty DiBergi: I don’t know.
Nigel Tufnel: Nowhere. Exactly. What we do is, if we need that extra push over the cliff, you know what we do?
Marty DiBergi: Put it up to RAID-11.
Nigel Tufnel: RAID-11. Exactly. One faster.
Marty DiBergi: Why don’t you just make RAID-10 faster and make RAID-10 be the top performer and make that a little faster?
Nigel Tufnel: [pause] These go to RAID-11.

Upgrading OpenSolaris 134 -> Solaris 11 Express

It’s as easy as:
# pkg set-publisher -P -O http://pkg.opensolaris.org/release/ opensolaris.org
# pkg image-update -v
# reboot
# pkg set-publisher –non-sticky opensolaris.org
# pkg set-publisher –non-sticky extra
# pkg set-publisher -P -g http://pkg.oracle.com/solaris/release/ solaris
# pkg install pkg
# pkg image-update -v –accept

A small explanation:
– set a new OpenSolaris publisher
– upgrade the current image (I was running snv 134)
– reboot into the new environment
– set the current opensolaris.org and extra as non-sticky so that a new one will be higher ranked
– set the new publisher for Solaris 11 Express
– update the packaging client
– upgrade the current image (don’t forget to use –accept, otherwise you will just see a license text and a note to repeat ‘pkg image-update’ with the –accept option)

Return top