In posts on
13 November 2007 tagged bayesian-networks, cloud-computing, ec2, pebl, python with 2 comments
“If computers of the kind I have advocated become the computers of the future, then computing may someday be organized as a public utility just as the telephone system is a public utility… The computer utility could become the basis of a new and important industry.”
– John McCarthy, MIT Centennial in 1961
This is the premise behind cloud or utility computing. You simply request some computing power, use it for as long as you need and only pay for what you use. Once you compare this with the usual process — order machines, configure them, use them and then keep paying for them while they sit idle — you realize just how disruptive (in the good sense) cloud computing can be.
At Loudcloud, the original business model was to provide cloud computing and server management services for web startups. Instead of worrying about managing its environment, a startup could focus on its core competencies while Loudcloud would use its automated systems to quickly create and scale-up computing environments. The changing economy forced Loudcloud to alter its focus and me to alter my address :) but I’ve been interested in the idea ever since.
So, I was really excited when Amazon announced its EC2 (Elastic Compute Cloud) service, which lets anyone request a few compute instances and pay only $0.10 per hour per instance. For $10, you can have a 100-node cluster for an hour!
Amazon’s approach is really simple: they use virtualization to split physical machines into instances. You can use pre-existing Amazon Machine Images (AMIs) or create your own using the tools they provide. I’m not sure what virtualization technology they use but the idea is similar to VMware or Xen. Because the instances are created from machine images, they lose all created data when they are terminated, so you either download your data or store it in Amazon’s S3 storage service.
I started with a public AMI with Fedora and some basic libs, added Python 2.5 and pebl and saved it as a custom image. And now, I have a way to learn Bayesian networks in the cloud. I had to run some small jobs today but our lab Xgrid was busy, so instead of waiting or interrupting my web-browsing work by running the jobs on my laptop, I created an instance using my custom AMI and analyzed some data for 20 cents. These were small jobs and didn’t require a cluster but I have MPI and IPython1 installed on the AMI so I can create an ad-hoc cluster and run larger jobs.
This has the potential to really change academic computing. My lab has a grid which happened to be busy but many other labs don’t have such resources. Instead of going through the university bureaucracy to get some time on the campus clusters, a student can simply use EC2 to get some analysis done. The best part is that you pay per instance-hour. So 100 nodes running for one hour costs the same as one node running for 100 hours but you get your results in an hour instead of about 4 days.
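The instance-hour arithmetic is easy to check. Here is a minimal sketch that assumes only the flat $0.10 per instance-hour rate quoted above and ignores storage and bandwidth charges; the function name `cost_in_cents` is made up for illustration.

```python
# Flat rate from the post: 10 cents per instance-hour.
# Working in whole cents avoids floating-point rounding surprises.
RATE_CENTS_PER_INSTANCE_HOUR = 10


def cost_in_cents(nodes, hours):
    """Cost of running `nodes` instances for `hours` hours each."""
    return nodes * hours * RATE_CENTS_PER_INSTANCE_HOUR


wide = cost_in_cents(nodes=100, hours=1)    # 100 nodes for one hour
narrow = cost_in_cents(nodes=1, hours=100)  # one node for 100 hours

print(wide / 100, narrow / 100)  # both come to 10.0 dollars
print(100 / 24)                  # 100 hours is a bit over 4 days
```

The bill is the same either way; only the wall-clock time changes.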
I’m going to integrate access to EC2 into the next version of pebl so you can simply specify your dataset, some parameters, your EC2 security keys and have an ad-hoc cluster in the cloud chugging away on your data.
In posts on
1 November 2007 tagged cats, redesign with no comments
Till now, I’ve been immune to the feline-infested ridiculousness of the interwebs. No Lolcats for me, thank you. My youtubing was free of home videos of cats doing stupid stuff. And now, suddenly, that’s all changed. Previously, a video of a cat waking up its owner would evoke mild amusement at best; now, I recognize in it some deep universal truth. From ‘meh’ to ‘awwwww’ in less than four days. The culprit for this sudden change? Bruce Springsteen — Heather’s new kitten.
I’ve never had a pet (except for a hermit crab I had for two weeks during the ninth grade but that was for a science project) and never really cared to have one. But if all pets are like Brucie (I doubt it’s possible), I can see the appeal. Anyway, that was my introduction to my new best friend. Expect more cat-laden posts in the future.
ps. The blawg has a new look. That’s not related to Brucie though.
In posts on
21 October 2007 tagged google, programming with 1 comment
According to Google, there are ~68,000 search results for “allintitle:* considered harmful”. No wonder programming is so difficult with so many harmful techniques and methods and tools and tech.
In posts on
20 July 2007 with no comments
Over the last couple of weeks, I’ve used some great open-source (mostly python) tools. Here’s some pagerank-improving props:
I borked up my python installation in a 4AM egg-deleting frenzy so I installed Python 2.5 and re-installed all packages. Without easy_install and the Cheeseshop, this would have been a tedious nightmare. I’ve loved this kind of package management since the first time I did an ‘apt-get install gnome’ (yes, I used CPAN before apt-get, but only minimally; the punctuation made my eyes hurt).
I decided to rewrite a Turbogears webapp in Pylons. I like TG and am still using it for another project but the upcoming TG2 release seems to have a lot of non-backwards-compatible changes and so I’m waiting for that… and I wanted to check out Pylons. It’s similar to TG and I was able to copy most of the code directly. It’s nice having the flexibility to copy over the old templates with minimal changes (thanks to Genshi) and even though I haven’t had the opportunity to use them much, the webhelpers package looks really awesome. I’m also diggin this whole WSGI and paste stuff: I don’t understand most of the details, but I understand enough to get it working and see its larger potential.
SQLAlchemy is easy enough to use and lets me be relational while still being hip with the ORM kids, and Elixir makes creating SA models dead simple. Their DSL-magic happens thanks to what they call Statements, classes that add themselves into the containing class’s dict. This little bit of python black magic lets you write stuff like:
class Widget:
    has_field('name')
    has_field('type')
    belongs_to('machine')
In the above example, has_field and belongs_to modify the Widget class without having a reference to it by getting the reference via sys._getframe. Hackish and beautiful.
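To make the trick concrete, here is a minimal sketch of the frame hack. It is not Elixir’s actual code: the `_fields` and `_relations` attributes are made up for illustration, and `sys._getframe` is a CPython implementation detail, so treat this as a toy.

```python
import sys


def has_field(name):
    """Toy stand-in for an Elixir-style statement (hypothetical, not Elixir's code)."""
    # Frame 0 is has_field itself; frame 1 is whoever called it. When the
    # call sits in a class body, the caller's locals are the namespace
    # that will become the class's attributes.
    class_namespace = sys._getframe(1).f_locals
    class_namespace.setdefault("_fields", []).append(name)


def belongs_to(target):
    """Toy relation statement using the same frame lookup."""
    class_namespace = sys._getframe(1).f_locals
    class_namespace.setdefault("_relations", []).append(target)


class Widget:
    has_field("name")
    has_field("type")
    belongs_to("machine")


print(Widget._fields)     # ['name', 'type']
print(Widget._relations)  # ['machine']
```

Neither function is ever handed the class, yet both end up writing into it. That only works because the calls run while the class body is still executing.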
CocoaDialog lets you create little mac GUI widgets (progress bar, file chooser, dialog boxes, etc) on the command line, passing all params via command line arguments and interacting via stdin and stdout. This has been my way of getting a little mac GUI goodness without learning Objective-C and all the Cocoa libs.
And finally, py2app lets you stuff a Python interpreter, standard libs and any packages and modules you use into a Mac Application bundle so users can run your code just like any other mac app without having to worry about python or any dependencies. They get a native-looking app and I get python.
I’ve been working on a project that would’ve taken about 2 years to develop instead of 2 weeks had I not had the benefit of all these tools. Standing on the shoulders of giants, indeed!
In posts on
24 June 2007 with 3 comments
Zero is zero, right? Can’t have positive and negative zero? Wouldn’t make any sense!! right?
Well, as I learned, IEEE does actually define a negative zero. It doesn’t make sense conceptually but it’s used when one wants to round off small negative values to zero but still indicate that the original value was negative. In python, this results in this weird example:
In [1]: x = -0.0
In [2]: x
Out[2]: -0.0
In [3]: x < 0
Out[3]: False
This is what I was stuck with today as I tried to figure out how to detect negative eigenvalues… which is tricky when -0.0 < 0 is False. Fortunately, Numpy has a function that specifically checks the signbit of numbers.
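Here is a small sketch of the sign-bit check. It uses `math.copysign` from the standard library so it runs without NumPy installed; `numpy.signbit` does the same check elementwise on arrays. The helper name `is_negative` is made up for illustration.

```python
import math


def is_negative(x):
    """True if the sign bit of x is set, including for -0.0."""
    # copysign(1.0, x) returns 1.0 carrying the sign of x,
    # so negative zero yields -1.0 even though -0.0 < 0 is False.
    return math.copysign(1.0, x) < 0


x = -0.0
print(x < 0)             # False: the comparison treats -0.0 as zero
print(is_negative(x))    # True: the sign bit is set
print(is_negative(0.0))  # False
```

With NumPy available, `numpy.signbit(eigenvalues)` would give the same answer for a whole array at once.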
It’s problems like this, which take so long to discover and fix, that make programming frustrating at times.
In posts on
21 June 2007 with 1 comment
Academic journals perform roughly the same function as record labels — they filter and distribute. Both help their customers navigate the large amount of content available. And both face potential obsolescence for similar reasons: the net makes distribution an almost zero-cost venture and facilitates filtering by the users themselves.
So, what’s a journal to do? I think journals need to embrace the net (is fighting it even an option?) and transform from simple providers of knowledge to platforms for knowledge. Nature seems to be taking steps in the right direction: Connotea, an online, social reference manager, has been available for a while now and recently, they released Nature Precedings as an arXiv-like preprint service for the biochem world.
The filtering that journals provide (in the form of peer-review) isn’t going away anytime soon but it’s nice to see some journals expand beyond their traditional role in science.
In posts on
22 May 2007 with no comments
Problem: I don’t post much on this blog because I have nothing meaningful to say.
Solution: Start another one with lower signal-to-noise ratio requirements.
A Tumblelog is a type of blog with lots of small, non-editorial posts. They’re basically a list of photos, bookmarks, videos, quotes and short messages without much commentary or thought — the type of stuff most of us email to our friends. Since Tumblr provides a convenient way to create such blogs, I’ve created one to share with the world all the random junk that makes it into my tubes.
anomalously.tumblr.com
In posts on
15 May 2007 with 2 comments

Imagine if the 2.5 million plastic soda bottles consumed in America in one hour were dumped in one big colorful pile. Or the 426,000 cellphones retired in one day. Or the 750,000 shipping containers that pass through American ports daily. Sure, these images would be meaningful and a far better representation of our over-consumption than bland statistics. But who knew they’d be so beautiful? Link [via BoingBoing]
In posts on
4 May 2007 tagged firefox, gmail with 9 comments

Install DelegateGcal v1.1.2
Here’s a new version of the DelegateGcal extension that:
- works with the DelegateToTodoist extension
- works with Google Apps for your Domain
I can’t seem to get the automatic Firefox extension updating to work and since it’s Friday evening, I’m not going to bother. If you have the older version, just click on the above link and it’ll upgrade. The update.rdf file is located here and if someone can figure out what I’m doing wrong there, please let me know.
Update 11/2007: This extension does not work with the new version of Gmail. There’s a greasemonkey userscript (not written by me) that seems to work well and has more features.