
Me:
I was recently introduced to the world of QR codes thanks to my new Android phone. They’re 2D barcodes that can be used for more than just identifying products: they can contain URLs, phone numbers, contact information and lots more. I can scan a QR code (on the web, on paper, on TV) with my phone and it’ll offer to add a contact to my phone, open a URL in the browser, let me call a number or add an event to my calendar, depending on what’s in the code. The QR code on the left is my contact card; the one on the right is a colored version that also works. Scan either one using an appropriate application on your phone and it’ll offer to add me as a contact with my name, email, URL and phone number. Much easier than typing it all in, and it looks nicer too.
Others:
Turns out I’m not the only anomalous on the net: there’s also http://an.omalo.us/, and since he’s my lexical neighbor (his words), I’m giving him some PageRank props here.


I haven’t posted in months and noticed that my last post was just me bragging about having access to a 13-node XGrid… so, to restart the blog on the same theme, I can now boast about using a 105 GHz cluster for only $8.40 an hour.
My Bayesian network software, pebl, can use Amazon’s EC2 cloud computing platform to create ad-hoc clusters, run code on them and then terminate them once finished. I needed to do some bootstrap learning for a project but it would’ve taken hundreds of hours on my lab desktop and our XGrid was busy. Being prone to both procrastination and impatience, I decided to use Amazon and was able to run my code within a day for only $55. That’s peanuts compared to the amount spent to generate the data I’m analyzing (tens of thousands of dollars) or required to buy and maintain your own cluster (again, tens of thousands).
But the coolest part:
With the current results, I can calculate p-values with three significant digits but I can quote exactly how much it would cost to get additional significant digits!
I think cloud computing will become a big part of scientific computing in the near future, especially for small labs or students who want to test out a hunch without writing up a formal grant. I’ve created an open source project, anyCloud, to take some of this code from pebl and make it more generally applicable to any Python code. It’s in a very early stage but if you’re interested in helping out, I’d appreciate it.

I know it’s not a big deal and I use this grid all the time, but it never fails to amaze me how much computational power we have at our disposal sometimes.
I’m having too much fun with this Radiohead stuff. Made another late-night attempt thanks to a certain Fat Weasel (you Trader Joe’s fans know what I’m talking about). YouTube’s compression deteriorates the quality quite a bit but the original is over 500MB. Anyway, here’s the YouTube video:
And here are some screencaps from the full, uncompressed video:
Using NodeBox to visualize the Radiohead House of Cards data was fun but NodeBox proved to be too slow on my laptop (I think CoreGraphics has moved onto Intel while I’m stuck at G4)… so I decided to download Processing and give it a whirl.
I tried creating a smooth shader-like look by using ellipses to render the data points and faking Gaussian smoothing by hackishly using multiple ellipses with varying Alpha levels. Looks decent but not too exciting.
The jerkiness and unnatural movement of the ear are due to the intentional noise added to the dataset by the creators and not an effect of the motion capture technology (as explained here).
Gotta admit, I do like Processing… even though it makes me use curly braces. It was easy to learn quickly (same concept as NodeBox and other procedural art environments) and the use of Java didn’t get in my way too much.
Radiohead (cool) collaborated with Google (cool) to create a music video using lasers (cool) and 3D scanning devices (cool) instead of cameras and then released some of the resulting data under a CC-license (cool) and put it up on Google Code (cool) to let the internets muck around with it (cool). With so much awesomeness, how could I possibly go to bed?
The Google Code site for the project includes data as CSV files with x,y,z and intensity per data point and some source code for the Processing environment. The data points are actually much denser than the Radiohead video’s style implies, making possible all kinds of visualization. I haven’t used Processing before but have experimented with NodeBox and since the data files are simple text files, I had no problems writing a quick script to render the data. Even a simple 2D grayscale scatterplot looks good without any interpolation or smoothing.

Keep in mind that no regular cameras were used in capturing this and the data is in 3D and just begging to be imported into a proper modeling application. I tried making an animation but the poor laptop can’t handle it and the XGrid is working on some data.. and it’s for lab work anyway, right? ;)
A few years ago, my laptop was stolen from my bedroom in a house less than a block from the police station. Since then, I’ve been very good about backing up work-related stuff.. but have been more lax about music, pictures and other large files.
Fast forward to last week: I dropped off my laptop at the Apple store to have the LCD cable replaced. Due to the way Apple’s flat-rate servicing works, they not only replaced the cable but also the aging hard drive and returned my laptop with a fresh install of Leopard minus all my files. Even though I hadn’t had any problems with the drive and wasn’t expecting it to be replaced (my request was only for the LCD cable), I had asked our department sysadmin to back up my entire hard drive in case the laptop was lost or damaged during shipping.
So when I got my laptop without all my files, I was frustrated with Apple for neither copying over my files nor contacting me before erasing them, but I wasn’t worried. I patted myself on the back for having a good automated backup strategy for work files and for being proactive enough to back up the entire drive before sending in the laptop. At worst, I thought, it would cost me an extra day without my laptop. Earlier today, I learned that the external drive used by our department sysadmin had failed. I had lost all music and photos. I don’t care much about the music but I had many photos I cared about. I panicked and rummaged through all my CDs and DVDs and looked through all the old folders on the external drive.
Luckily, last month, I was playing with Adobe Lightroom and had imported all of my iPhoto pics to it and tested its backup feature with my external drive.. so I was able to recover all photos more than a month old. I have no mp3s but there’s a box full of old CDs under my desk that I could reacquaint myself with (late nineties, here I come!).
So, remember kids, back up your files.. and back up your backups.
ps. I fully expect to make a post in a couple of months about how I’m glad I survived the fire/tornado/whatever but really wished that I had used some off-site backups.
For the next three weeks, I will have to adjust from living in

to

After typing cd /Library/Frameworks/Python.framework/Versions/Current/lib/python2.5/site-packages/ for the hundredth time, I decided there had to be a better way. Even with tab completion, it’s a pain to access deeply-nested directories. I know about cdargs but it’s not very flexible and doesn’t work with regular bash commands like cd, cp, mv, etc.
I wrote up a couple of bash functions that save directory bookmarks to a file and make them available as environment variables. To install, download dirmarks.sh and source it from .bashrc or .bash_profile.
Once installed, use mark bookmark_name to save a bookmark for the current directory and use lsmarks search_term to see/grep the list of saved bookmarks. You can then use the bookmark like any other environment variable: cd $bookmark or cat $bookmark/file.txt. The bookmarks are saved in ~/.dirmarks.
The following example usage should clarify what my words have confused:
    # lsmarks without any arguments
    $ lsmarks
    bubble "/Users/shahad/projects/tg_apps/bubble"
    docs "/Users/shahad/projects/docs"
    pebl "/Users/shahad/projects/pebl"
    py "/Library/Frameworks/Python.framework/Versions/Current"
    pybin "/Library/Frameworks/Python.framework/Versions/Current/bin"
    pylibs "/Library/Frameworks/Python.framework/Versions/Current/lib/python2.5/site-packages"

    # lsmarks with search term
    $ lsmarks py
    py "/Library/Frameworks/Python.framework/Versions/Current"
    pybin "/Library/Frameworks/Python.framework/Versions/Current/bin"
    pylibs "/Library/Frameworks/Python.framework/Versions/Current/lib/python2.5/site-packages"

    # using a directory bookmark
    $ cd $bubble
    $ pwd
    /Users/shahad/projects/tg_apps/bubble

    # adding a directory bookmark
    $ cd bubble/static/images
    $ mark bubble_img
    Adding bookmark bubble_img --> "/Users/shahad/projects/tg_apps/bubble/bubble/static/images"

    # look ma, the new bookmark has been added!
    $ lsmarks bubble
    bubble "/Users/shahad/projects/tg_apps/bubble"
    bubble_img "/Users/shahad/projects/tg_apps/bubble/bubble/static/images"

    # directories with spaces in the name
    $ cd /Users/shahad/Library/Application\ Support/
    $ mark appsu
    Adding bookmark appsu --> "/Users/shahad/Library/Application Support"

    # trying to use bookmark as normal.. fails :(
    $ cd ~
    $ cd $appsu
    -bash: cd: /Users/shahad/Library/Application: Not a directory

    # we have to enclose the dir bookmark in quotes.
    $ cd "$appsu"
    $ pwd
    /Users/shahad/Library/Application Support
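The script itself isn't reproduced in this post, so the following is only a guess at how functions like these could be written, not the actual dirmarks.sh. It assumes bookmarks are stored one per line as name, tab, path (the real ~/.dirmarks format may differ), and the names DIRMARKS_FILE and load_marks are made up for this sketch.

```shell
# Hypothetical sketch; NOT the original dirmarks.sh.
# Assumed file format: one "name<TAB>path" entry per line.
DIRMARKS_FILE="${DIRMARKS_FILE:-$HOME/.dirmarks}"

# mark NAME: save the current directory under NAME and export it as $NAME.
mark() {
    case "$1" in
        ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*)
            # Reject anything that is not a valid shell variable name.
            echo "usage: mark bookmark_name" >&2
            return 1 ;;
    esac
    printf '%s\t%s\n' "$1" "$PWD" >> "$DIRMARKS_FILE"
    export "$1=$PWD"
    echo "Adding bookmark $1 --> \"$PWD\""
}

# lsmarks [TERM]: list saved bookmarks, optionally filtered with grep.
lsmarks() {
    [ -f "$DIRMARKS_FILE" ] || return 0
    grep -- "${1:-}" "$DIRMARKS_FILE"
}

# load_marks: export every saved bookmark as an environment variable.
load_marks() {
    [ -f "$DIRMARKS_FILE" ] || return 0
    while IFS="$(printf '\t')" read -r _name _path; do
        [ -n "$_name" ] || continue
        export "$_name=$_path"
    done < "$DIRMARKS_FILE"
}

# Run once when the file is sourced from .bashrc or .bash_profile.
load_marks
```

With something like this sourced, running mark pylibs inside a directory and then cd "$pylibs" from anywhere else should behave like the session above, although lsmarks here prints tab-separated entries rather than quoted paths.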
Note that if a directory has spaces in its name, you have to enclose the bookmark variable in quotes when you use it. I can’t find a way around this requirement that works on my Mac. If anyone has suggestions, I would appreciate them.
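The quoting requirement comes from bash's word splitting: an unquoted $appsu is split on whitespace before cd ever sees it. One possible workaround, sketched below, is to pass the bookmark's name instead of its value and let a small function do the quoting. The cdm helper is hypothetical and not part of dirmarks.sh; it only covers cd, so cp, mv and friends would still need the quotes.

```shell
# cdm NAME: cd to the directory stored in the variable called NAME.
# Hypothetical helper for illustration; not part of the original script.
cdm() {
    case "$1" in
        ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*)
            # Only accept valid variable names, since eval is used below.
            echo "usage: cdm bookmark_name" >&2
            return 1 ;;
    esac
    # Look the variable up by name; assignments are not word-split.
    eval "_cdm_dir=\${$1}"
    if [ -z "$_cdm_dir" ]; then
        echo "cdm: no such bookmark: $1" >&2
        return 1
    fi
    # Quoting here keeps paths with spaces in one piece.
    cd "$_cdm_dir"
}
```

Usage would be cdm appsu rather than cd "$appsu".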
“If computers of the kind I have advocated become the computers of the future, then computing may someday be organized as a public utility just as the telephone system is a public utility… The computer utility could become the basis of a new and important industry.”
– John McCarthy, MIT Centennial in 1961
This is the premise behind cloud or utility computing. You simply request some computing power, use it for as long as you need and only pay for what you use. Once you compare this with the usual process — order machines, configure them, use them and then keep paying for them while they sit idle — you realize just how disruptive (in the good sense) cloud computing can be.
At Loudcloud, the original business model was to provide cloud computing and server management services for web startups. Instead of worrying about managing its environment, a startup could focus on its core competencies while Loudcloud would use its automated systems to quickly create and scale-up computing environments. The changing economy forced Loudcloud to alter its focus and me to alter my address :) but I’ve been interested in the idea ever since.
So, I was really excited when Amazon announced its EC2 (Elastic Compute Cloud) service which lets anyone request a few compute instances and pay only $0.10 per hour per instance. For $10, you can have a 100-node cluster for an hour!
Amazon’s approach is really simple: they use virtualization to split physical machines into instances. You can use pre-existing Amazon Machine Images (AMI) or create your own using the tools they provide. I’m not sure what virtualization technology they use but the idea is similar to VMware or Xen. Because the instances are created from machine images, they lose all created data when they are terminated, so you either download your data or store it in Amazon’s S3 storage service.
I started with a public AMI with Fedora and some basic libs, added Python 2.5 and pebl and saved it as a custom image. And now, I have a way to learn Bayesian networks in the cloud. I had to run some small jobs today but our lab XGrid was busy, so instead of waiting or interrupting my web-browsing work by running the jobs on my laptop, I created an instance using my custom AMI and analyzed some data for 20 cents. These were small jobs and didn’t require a cluster but I have MPI and IPython1 installed on the AMI so I can create an ad-hoc cluster and run larger jobs.
This has the potential to really change academic computing. My lab has a grid which happened to be busy but many other labs don’t have such resources. Instead of going through the university bureaucracy to get some time on the campus clusters, a student can simply use EC2 to get some analysis done. The best part is that you pay per instance-hour. So 100 nodes running for one hour costs the same as one node running for 100 hours but you get your results in an hour instead of about 4 days.
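The instance-hour pricing can be checked with a few lines of shell arithmetic. This is only a back-of-the-envelope sketch: it uses the 10 cents per instance-hour price mentioned earlier, works in whole cents and whole hours, and assumes the job splits perfectly across nodes. The name ec2_cost_cents is made up for the example.

```shell
# Price per instance-hour in cents (the $0.10 figure quoted in the post).
PRICE_CENTS=10

# ec2_cost_cents NODES HOURS: total cost in cents for NODES instances
# each running for HOURS whole hours.
ec2_cost_cents() {
    echo $(( $1 * $2 * PRICE_CENTS ))
}

ec2_cost_cents 100 1    # 100 nodes for one hour
ec2_cost_cents 1 100    # one node for 100 hours
```

Both calls print 1000, i.e. $10: the bill is the same either way, and only the wall-clock time differs.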
I’m going to integrate access to EC2 into the next version of pebl so you can simply specify your dataset, some parameters, your EC2 security keys and have an ad-hoc cluster in the cloud chugging away on your data.
© ano.malo.us.