Friday, November 23, 2007

Java generics in enhanced for loops

How exactly should the Java compiler translate enhanced for loops? Should it try to enforce type safety (e.g. that a List<String> should only yield Strings)? The JLS thinks it should not. It is a bug that Sun's javac enforces this type safety!
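To make the discussion concrete, here is a rough sketch of how an enhanced for loop over a collection is commonly desugared (my own illustration, not the exact translation specified in the JLS); the point of contention is whether the compiler should additionally insert a synthetic cast on the value returned by next():

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ForEachDesugar {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("foo", "bar");

        // The enhanced for loop:
        for (String s : names) {
            System.out.println(s);
        }

        // A roughly equivalent basic for loop using an explicit Iterator:
        for (Iterator<String> it = names.iterator(); it.hasNext(); ) {
            String s = it.next();   // would an extra (String) cast here be a feature or a bug?
            System.out.println(s);
        }
    }
}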

I got embroiled in this discussion because of a post on my old article on for-each loops in Java. (Disclosure: I am nishrs in that world).

Wednesday, November 14, 2007

HotNets Trip Report

Session 1

Prabal Datta said that by grouping updates and reducing their communications, sensors can achieve much better battery life. Theirs was one of the 3 HotNets papers covered by the press, the other two being our metro-wifi proposal (which was also slashdotted) and Sachin's MIXIT. Prabal's message was actually more nuanced than the title might suggest -- he showed how clock skew can kill energy savings when you are grouping updates, and I found two of his factoids useful:
  • Over short time periods, links are bimodal, but over longer times existing links deteriorate and new ones emerge (so there is no point keeping neighbour info and routing tables up to date between sleep-wake cycles)
  • Packet delivery rate is high when RSSI is above -90dBm and falls precipitously below that (fig 4). They use this as an agile link estimator.
Mark Allman showed that network protocols are highly chatty through the night (e.g. DHCP traffic), and we may not be able to shut them away from the network entirely. But perhaps we can write proxies that will stand in for many computers and allow those computers to go to sleep. When the computers come up, they can obtain updates from their proxies.

VROOM argues for separating logical and physical router configurations. More virtual routers would come up to handle peak traffic, and go away at night, yielding power savings. Virtual routers would make planned maintenance easy - just switch to a new virtual router and install patches & updates on the old one.

Session 2

RJ talked about building a voice-mail-like infrastructure for telephony in Africa.

Michael LeMay addressed Emergency Response Networks. The main takeaway, for me, was that an ERN needs about 100Kbps, which means we can do metro-wifi our way and have it double as an ERN! I quickly changed my slides to point that out.

Ken Calvert gave a longish talk on the difficulties of home networking, which was joint work with HCI researchers. The problems part was very illuminating. Home networking equipment is apparently the most-returned piece of electronics. Did you know that many homes actually set up multiple subnets? They do this a) for tax reasons (to separate home and home-office usage) and b) because husband and wife work for competitor companies. The suggested solution is a portal that does all the configuration in a centralised fashion, rather than having to configure each piece of a home network separately.

Ken is a student of my masters thesis advisor, Simon Lam, and quickly remembered me after I reminded him that we met 5 years ago at ICNP, my last conference. He even remembered that my paper back then was on congestion control! I noticed throughout the conference that he was very courteous, alert and mindful. I hope I get to be half as good as him.


Session 3
The Wireless Manifold was one of the neatest ideas in the conference. Mobility models most often use the unit-disk model, where every node within a unit Euclidean distance is assumed to be within hearing distance. Instead, they measure signal decay between pairs of nodes and extrapolate it to compute a metric over the whole geographic space (say, a building floor). This gives them a manifold, and the positions of two nodes in this manifold alone tell you how the wireless signal will travel between them. All the unit-disk models still apply, except that distances are measured in this manifold rather than in Euclidean space! The utility of this comes from two claims: a) their metric can be computed very easily with only a few signal measurements (they even have a heuristic that they claim gives very good approximations of the manifold, but they don't yet understand why it works so well); b) the metric itself does not change too often, i.e. in a given building, the metric between two places stays roughly the same. They have a geographic routing algorithm that runs based on this idea.
Santosh Vempala presented because his student, Varun, did not want to. Hari Balakrishnan had an interesting suggestion for them - to try building the manifold from the number of transmission retries or ETX, rather than from raw signal decay information.

Injong Rhee talked about how to model human mobility. It turns out the Lévy walk is a good model, and many animals in the wild, such as chimpanzees, use a Lévy walk to ensure survival -- the primary characteristic is to move in straight-line flights whose lengths follow a power law, with random changes of direction in between. This enables foraging creatures to find food sources in optimal ways.
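As an aside, here is a toy sketch of what drawing Lévy-walk steps looks like (my own illustration with made-up parameters, not Injong's model): flight lengths come from a heavy-tailed Pareto distribution via inverse-transform sampling, so the walker mostly makes short hops but occasionally takes a very long straight flight.

import java.util.Random;

public class LevyWalkSketch {
    public static void main(String[] args) {
        Random rng = new Random(42);
        double alpha = 1.5;          // made-up tail exponent
        double minStep = 1.0;        // made-up minimum flight length
        double x = 0, y = 0;
        for (int i = 0; i < 20; i++) {
            // Inverse-transform sample from a Pareto distribution: heavy-tailed flight lengths.
            double step = minStep * Math.pow(1.0 - rng.nextDouble(), -1.0 / alpha);
            double theta = 2 * Math.PI * rng.nextDouble();   // uniformly random direction
            x += step * Math.cos(theta);
            y += step * Math.sin(theta);
            System.out.printf("flight=%.1f position=(%.1f, %.1f)%n", step, x, y);
        }
    }
}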

The next presentation was mine. HotNets talks are 15-minute presentations followed by 15 minutes of discussion. As with the other papers, my talk generated a lot of discussion. One class of questions was why we cannot do it more simply: e.g. why can't we do FON-like authentication and send the traffic to cnn.com directly from the home (because the guest would have to trust the host), or why can't the guest just ssh to their home and connect from there (we don't always know where home is, and the host is exposed in this scenario). There was also some brief discussion (Hari also came up and discussed it after the main talk) of other attacks that might come up, which I showed our architecture could defend against. Jon Wroclawski pointed out that hosts now have to trust the co-op. But that is not as much of a problem as trusting arbitrary residents of the city.

Session 4

Charlie Reis talked about the need to make web browsers safe - with multiple execution frameworks such as JavaScript, Flash, Silverlight, etc., there is no common security framework and the browser cannot guarantee safety. His solution is a common interposition layer that provides web programs with access to the DOM and the network.

Hari Balakrishnan's talk was on using self-certifying addresses of the form AD:EID, where AD is an autonomous domain and EID is an entity ID, to make the Internet more accountable. I have not figured out how this is conceptually different from what HIP achieves.

Day 2
Session 1
Michael Demmer discussed a data-based API that amounts to publish-subscribe. A publisher puts the data out in the Internet cloud, and all interested parties subscribe to it. This separates the content from its origin, and there are mechanisms to time out old content or replace obsolete data with something else. It looks like an API custom-made for their DONA architecture. Haggle was one of the motivating application scenarios!
All is well, but regular request-response becomes more tricky - to do a Google search I would have to create a temporary publication and subscribe to it, and Google's response would be posted to this temporary publication. Interesting, but more difficult than today.
Also, they advocate that banks would put out your data, encrypted, for anyone to subscribe to. What if I download your account data, take a month to crack your private key, and then figure out that you actually have a million $$$?

Next was Mark Allman on Personal Namespaces: I had to go out during the talk. Seemed reminiscent of recent Unmanaged Internet Architecture work.

Four Bit Link Estimation was on using information from the physical, link and network layers to create efficient link estimators for wireless. Their physical or "white" bit is similar to one of Prabal's suggestions.

W5 was on putting users in control of their web2.0 data (like photos & blog posts) while still letting providers like Flickr access it in a secure manner. This would enable me to switch from Flickr to Picasa if needed, without having to download my 1 trillion photos to my desktop and reupload them to Picasa. A neat concept was third-party declassifier programs, which would be allowed to remove monikers like "Alice's private data" and put on another moniker like "Bob's private data" - thus allowing sharing between friends, friends of friends, etc. The point is that Alice would have to authorise the declassifier, and could choose a reputed declassifier known to be trustworthy.

Andrew Miklas used output-only Xen VMs to anonymise network traces in such a way that, even if subpoenaed, identity information could not be extracted from the traces.

The pangs of lunch prevented me from paying attention to this last talk before lunch, by Jeffery Pang. The gist is that service endpoint names, as well as the process of searching for them, can leak information (e.g. "Juvenile Detention Classroom" is the name of a Wi-Fi SSID!). How do you arrange a confidential tryst between service provider and accessor?

Session 3
Murtaza Motiwala introduced path splicing: run multiple versions of routing protocols with different link weights. This discovers multiple paths for a given sender-receiver pair; these paths may share a few nodes. Using hint bits, end hosts can choose, at intermediate nodes, which of the discovered paths to follow. By splicing together multiple paths, one can get highly reliable connectivity: to disconnect two nodes, one has to create a cut in the graph. I wonder, though, how often different paths share nodes that are far away from either end...

Cristian Lumezanu talked about making overlay networks more popular by incorporating ways for hosts to discover mutually advantageous peers who would improve their network paths. He did not mention motivating applications in the talk, but later told me that perhaps network gamers could use this...

Session 4
Dan Halperin said current wireless networks are wrong to treat interference the same as noise in the SINR equation. He gave a really neat talk showing how to cancel interference alone and improve reception.

Next up was Sachin Katti. A polished, great talk, with a single easy to understand motivating example. MIXIT mixes together their random coding ideas with a PPR-like scheme which allows bit errors to be corrected. His proposal is that even if intermediate nodes get a few bits wrong, they would do random coding with other packets and send out the coded packets. The receiver can do bit-level error correction if multiple copies of each packet are received via random coding.

Ratul Mahajan concluded with an account of some pathologies they observed.

Looks like the field is amassing new ways to deal with wireless issues nowadays, and seems like an exciting thing to follow if not do research in.

A final interesting factoid is that nearly every MIT paper (except Sachin's & Kyle's) thanked my friend, Mythili Vutukuru, in the acknowledgements! An interesting research problem in its own right is: Where does she get the time to read and review other people's work, between going to New York almost every weekend and doing her own work?

Tuesday, November 06, 2007

searching emacs subdirectories

How do you make emacs search the subdirectories where you have placed your own customised emacs packages? Like this:

(if (fboundp 'normal-top-level-add-subdirs-to-load-path)
    (let* ((my-lisp-dir "~/.elisp.d/")
           (default-directory my-lisp-dir))
      ;; put the top-level directory itself on load-path ...
      (setq load-path (cons my-lisp-dir load-path))
      ;; ... then let emacs add every subdirectory under it as well
      (normal-top-level-add-subdirs-to-load-path)))

This lets me keep my customisations in separate subdirs:
$ ls -F .elisp.d/
python-mode-1.0/ tuareg-mode-1.45.5/

Choosing a programming environment

I have no religion when it comes to choosing programming environments. I used to use emacs a lot when I was working at Cisco. Most of the work I did was in C, and emacs/etags let me do powerful things and focus on programming rather than on the writing of programs. If you get what I mean...

At IBM, I persisted with emacs because a) it was better than notepad, b) it was a familiar beast and I had all sorts of customisations, and c) the latest WebSphere Portal would not run except from the command line, so no IDE made sense.

And then it became possible to use Eclipse. With its code completion, refactoring, and code-snippet templates for things like for loops, it let me operate at a much higher level of programming concepts than emacs ever did.

And now, working with Nokia phones, I have started to learn python. My first reaction was to look for a python plugin for Eclipse - which had by now supplanted emacs as my favourite editor. There is one, including one for directly writing programs that run on the Nokias.

But then I came across the enhanced python-mode for emacs. It offers something that ought to be an obvious feature for every scripting language -- the ability to run each line of my python program with a single hotkey -- and that made me switch back to emacs. PyDev hardly has the same sort of functionality.

Lesson to myself: for each problem, choose tools that already have a good community using them. That way you can actually focus on the problem, rather than becoming a language lawyer who advocates new approaches and builds the tools needed for the problem instead of working on the solution itself.

Friday, August 10, 2007

for academic papers

This turned out to be a long rant, so here's a summary for the impatient:
  1. Whitespace is your friend, not your enemy. Neatly typeset text in a sea of white invites readers to go through your content. Focus on making the content better and let the background whitespace do its job of helping readers along as they read through material that is usually dry if you are not the author :) Don't resort to tricks to reduce whitespace.
  2. There are different types of content - normal text & floats - and mixing them distracts the reader's eyes, which is never a good thing. Traditional LaTeX wisdom advises against overusing bold and italics because indiscriminately mixing different kinds of text distracts readers. The same principle generalises to floats & text: avoid mixing them.

-----
(Ruminations after submitting a paper to HotNets.)

Up against strict page limits, scientific authors resort to all sorts of tricks to squeeze more content in, like decreasing the font size or shrinking the margins. But this misses the point. It does not matter if you manage to squeeze 300 more characters into the page limit. If it is badly typeset or badly written, readers won't feel like getting through your paper.

Of course, if you use good LaTeX style files, you can't go terribly wrong. But there are still a few points worth keeping in mind:
  • Never change sentence constructions 2 hours before a conference deadline just to decrease the line count of a paragraph or pull a dangling last line back over the edge. The prior construction is almost always better, because more thought went into writing it and because it was not written with a deadline looming.
  • Do not continuously look for more space. The illusion of having less space will encourage precise writing. So, for instance, use single-column formatting in earlier drafts.
  • When you need more space, try to create it by rewriting earlier sentences more crisply, rather than by decreasing fonts and/or fudging with bibliography styles.
  • Conclusions are often a good place to look for space -- remember the golden rule that conclusions should not repeat things said in the introduction or abstract. (I am usually guilty of ignoring this one.)
  • After rewriting the conclusion, try placing a \small command before the bibliography. In my opinion, the references section is horribly hyphenated at normal font size, and \small makes it look a lot better.
  • Design good captions for figures & tables. A good rule of thumb is that captions should be understandable on their own, after the reader has gone through the introduction section. Good captions are concise, and complement rather than repeat the content that talks about the figure.
  • Figures are "floats" and disrupt the flow of reading. Place them at the top-right or top (if it should fill both columns) of the page where the figures are discussed. Readers should be able to take in the figure with a glance as they are reading the content discussing the figure. If possible, the figure should appear just before the content that talks about it, and this would be a good enough reason to place a figure on the top-left.
  • If a bunch of figures are related, collect them together and place them in the top half of a page, with appropriate captions. Readers can go through the figures separately from the text. Most good journals in other sciences seem to do this (Nature, Science, etc.), but unfortunately CS articles are typeset by their authors, and there are no firm conventions.
  • Be careful in using the "here" or h option (e.g. \begin{figure}[htbp]) with figures and tables. LaTeX sometimes places the float after one line of text at the top of a new column or page, and readers skimming through your paper will easily miss that line. Especially in "evaluation" or "simulation" sections, which are usually littered with graphs, little lines of text can get lost between giant figures and column beginnings. (A short sketch follows this list.)
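A minimal LaTeX sketch of the figure and bibliography advice above (the file name, caption text and bibliography style are made up purely for illustration):

% Figure pinned to the top of a column, with a caption that stands on its own
\begin{figure}[t]
  \centering
  \includegraphics[width=\columnwidth]{throughput-vs-guests} % hypothetical plot; needs graphicx
  \caption{Median throughput as the number of guests grows. A good caption is
    readable on its own and complements, rather than repeats, the body text.}
  \label{fig:throughput}
\end{figure}

% Smaller font for the references tames the hyphenation
{\small
\bibliographystyle{abbrv}
\bibliography{refs}}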

Friday, July 13, 2007

Attention to details marks the good UI

MIT is so rich that the music library lends out iPods loaded with a bunch of music from the library's collection! Which means, I have finally started listening to music on these devices.

Today I noticed something very interesting: if I unplug the headphones from the iPod, it automatically pauses playback. I guess they look for changes in offered impedance as the cue to pause. This can turn out to be useful if the headphones get disconnected mid-sentence while you are listening to an audiobook during your run.

But of course there is a bug. I had connected the iPod to my Sony receiver, then moved to my Bose headphones. The iPod paused, but failed to recognise the Bose headphones until I restarted playback from the main menu. A difference in offered impedance/capacitance/inductance?

Thursday, July 12, 2007

80s' viruses as inspiration for DTNs

Viruses transmitted on floppy disks were probably the first application of "delay tolerant" networks. There was no contemporaneous connectivity between nodes, and viruses opportunistically piggybacked themselves onto other files. Sometimes a virus copy would get lucky, reach a hub in the floppy social network, and then reach a HUGE number of other computers very quickly. The only difference between DTN packets and viruses is that viruses usually don't have a targeted destination.

Thursday, June 28, 2007

The iPhone hype

I have always been a closet luddite. Yet, as Pogue puts it, I could not have not heard about the iPhone. And since I've started working with the very cool n800 that John Ankcorn@Nokia Research gave me, I paid some attention to the ads. I have to admit, the iPhone looks very cool. In particular: the landscape-portrait orientation sensing, browsing songs by scrolling through CD album art, and the image zooming. I couldn't put a finger on what it was that made it cool. Pogue puts it brilliantly:
"things on the screen have a physics all their own"

There. That's what I like about the iPhone. This from someone who does not even own an iPod, and who could never understand the appeal of a dumb circular wheel for navigating your albums.

Tuesday, June 26, 2007

Ubuntu as a Windows file?

Wubi, when installed, shows up as a HUGE file on Windows. It also modifies the Windows boot menu to offer to boot this huge file. The file, when booted, turns out to be Ubuntu - not just the Linux kernel, but the whole system. The file is essentially a partition that Windows is not aware of - it can (I am guessing) have empty space where you can put new ext3 files as well as install new programs!

Wireless facts on google video

Last week, on video.google.com, I saw a video about testing Google's metro wifi network in Mountain View when it was first set up (search for "testing metro wifi" if the URL turns out not to be stable). Among the things I gleaned from it are the following facts:
  • wifi is good for 300 ft and goes through 1 wall well
  • The testing looked for
    • coverage (using Torius, a commercial company whose antennae were deployed in the injection layer of the network - which is an intermediate WiMax layer)
    • capacity (choices were ttcp, iperf and nuttcp; iperf was chosen for its simplicity and number of options)
  • The LiveHttpHeaders utility for Firefox was also useful

And yesterday I watched a video of Van Jacobson talking about a new networking paradigm he wants to push. As with many of his papers, he has a quite unique way of looking at things. Definitely worth a watch!

Google video is proving surprisingly useful - lots of other "software engineer" videos are out there, including a 2-hour introduction to python by an author of Python in a Nutshell and the Python Cookbook.

Monday, June 25, 2007

On technology adoption

My officemate Chintan likes to think about how technology gets adopted - the classical model is that enthusiasts initially pick it up despite the cost, and over time it becomes economically worthwhile for everyone.

There can be surprising niches where it makes economic sense even for non-enthusiasts to invest in some things. Here is an example:

Riga Development, a wireless-technology firm in Toronto, has worked with hotels in Canada and the United States to replace ageing analogue thermostats with digital ones that are around 35% more energy-efficient. It wirelessly links the new temperature-control panels with heating and air-conditioning units, at a cost of around $350 per room. Each room can also be controlled from the front desk. And thanks to the wireless mesh network, the panel in each room also acts as a relay for the data traffic from other rooms back to a central control point.

At a medium-sized office park in Las Vegas, wireless temperature controls were installed in a few buildings containing around 200 offices, says the media-shy maintenance manager (who did not want his or the company's name to be used). Temperatures in the Nevada desert tend to extremes and landlords are responsible for energy bills, so managing a building's climate makes a difference to the bottom line. The new wireless thermostats allow rooms to be controlled centrally on a PC or over the web. The adjustments that tenants themselves are able to make can be controlled too, so that heating or air-conditioning is not used to excess. The system was cheap to put in, mainly because it required very little installation, the manager explains. Tenants are happier and the savings on the energy bills have been considerable, he says; “conservatively 25%”.



25% of a normal energy bill would not make a $350-per-room installation viable, especially when, in just a few years, the price might drop to something like $50-60 once mesh networking is all figured out.

Wednesday, June 20, 2007

The not-carbon economy

The Economist of May 31, 2007 has a special report on business and climate change. One article offers an interesting perspective on carbon credits - as a not-carbon economy:

The carbon market is truly innovative. Although it works like any commodity market, what is being bought and sold does not exist. The trade is not actually in carbon, but in not-carbon: in certificates establishing that so many tonnes of carbon dioxide (or the equivalent in other greenhouse gases) have not been emitted by the seller and may therefore be emitted by the buyer.

The purpose of setting up the market was, first, to establish a price for carbon and, second, to encourage efficient emissions reductions by allowing companies which would find it expensive to cut emissions to buy credits more cheaply. It has had some success on both counts—some would argue too much on the second.

When the carbon price becomes high enough, companies will actually cut their own emissions instead of buying carbon credits on the Emissions Trading Scheme (ETS). And cheap sources of credits, such as eliminating HFC-23 emissions in China (these are the CFCs from old fridges), might be plentiful now, but they will all dry up.

The interesting thing here is that becoming a source of carbon credits is an incentive for some people to reduce their emissions. It is also cheaper for established dirty-air producers (say, an old coal-based electricity plant) to subsidise the HFC-23 source in China. And eventually, by the time the cheap carbon credits become scarce, the coal-based plant could build a new, cleaner plant!

The not-carbon economy works because reducing pollution is important only in the long run, so we can let the current big polluters continue for a little longer - and although regulators can enforce extremely strict standards in Western Europe, it is much more important to catch the long tail of polluters in developing countries.

Networks have a long history of making use of ideas in economics - can we come up with an application similar to this? A not-congestion economy perhaps? Or how about a not-IPv6 certificate - this way, we could let established networks stick to old IPv4 addresses and incentivise newer ones to move to IPv6.

Tuesday, June 19, 2007

Seamful design

An engaging CHI paper illustrates how to realize Mark Weiser's goal of seamful design with a game called Feeding Yoshi. The idea is to "deliberately exploit the inherent limitations of technology, rather than being hidden away". The game involves earning points by finding Yoshis and different kinds of fruit to feed them; secured and unsecured wireless access points serve as the yoshis and the fruit. The fact that there is no uniform network access thus becomes a location-based game of hunting for yoshis and fruit, yielding a very engaging user experience.

Thursday, June 14, 2007

How power hungry is communication support?

Anecdotally, most people know that their laptop batteries drain much sooner if wireless is enabled. But how power hungry is radio communication? I got a sense from an aside made by the CEO of Potenco who spoke at the I2V workshop (I attended this after the IDEAS competition).

Potenco has invented a portable power generator - you pull on a string and generate electricity. 1 minute of pulling is good for:
  • 25 minutes of talk time on a cell phone
  • 230 minutes of playing on an iPod
  • 45 minutes of Nintendo gaming
Obviously, this is not totally scientific - what kind of cell phone it is, how strongly the string was pulled, etc. make a big difference. And of course, cell phone connectivity and WiFi are different things.

But at a gross level, a cell phone could be thought of as an iPod plus a connection to the cell tower. Just this connection cuts the runtime from 230 minutes to 25 - a drop of nearly 90%! Even an intensive graphics engine (45 minutes of Nintendo gaming) lasts nearly twice as long as pure communications equipment!

Thoughtfulness makes for success: ubuntu vs maemo

So I am now dealing with two versions of linux: ubuntu on my laptop & maemo on the nokia 770. As I said before, I have my grudges with ubuntu, but they are minor.

On the other hand, most things with maemo have turned out to be painful. There is a development environment called scratchbox that lets you develop on the linux desktop but cross-compile or retarget the applications for the 770. Installing it proved to be a pain - the installer failed at almost the last step. Scratchbox also seems to mount the file system with circular dependencies, so if you try to delete the directory without stopping a service first (bear in mind that you don't even know such a service gets started, and the installation instructions tell you to blow away the directory to reinstall or upgrade from a previous version!), you end up corrupting the file system: the script to stop the service gets deleted, but the /scratchbox directory is not, because of the circular dependencies.

Then, I tried developing on the tiny 770 itself. Installing python failed - there were complaints about version incompatibilities in apt-get. There is no standard set of apt repositories, so one does not know where to fetch the software from. This, despite python being the "most supported" development language for the 770.

Perhaps the biggest indictment is this: the 770 is purportedly targeted at hackers, but it does not come with an xterm installed! Input via the keyboard or handwriting recognition on the tiny device is, not surprisingly, difficult. And ssh is not installed by default; installing it takes a few steps.

Contrast that with ubuntu, which has a comprehensive /etc/apt/sources.list that I almost never had to modify. Perhaps the biggest difference between the maemo and ubuntu approaches is the thoughtfulness put into ubuntu. Each application is carefully selected to be good enough for most people, so people won't spend tons of time downloading the basic applications. There is a comprehensive, well maintained repository for people who want to explore further. With google, there is community support for virtually every problem I have encountered. The discussion forums of ubuntu are more readable than the IRC archives of maemo questions - perhaps because people have to think before responding to a discussion post, rather than shooting an off-the-cuff response on an IRC channel (plus, the conversation can be threaded by replies, and you won't have some clown posting something totally irrelevant between the question and the reply).

I just saw the ads for the iPhone. The thoughtfulness that has gone into it is amazing - two things strike me: when you flip the phone from landscape to portrait mode, the display adjusts itself; and you can flip through your album list using the CD covers - the spines are displayed in a row from left to right and the front of the current album is displayed in the centre. Look at how natural it is to do the intended applications of the iPhone, vs. the same on the 770.

Tuesday, June 12, 2007

oo-boon-too

So ubuntu seems to be the newest linux distro that is making waves. I switched to it mainly because I wanted some linux for my work and some free tool that would resize my XP partition rather than overwrite it (mainly so I could keep my personal documents which I have not backed up). When I searched online, gparted on a Linux Live CD came up as an option, and of the ones I checked out, ubuntu's website looked the most inviting.

It was a breeze to install; it seems extremely usable, thus far. They have graphical tools for updating packages; debian's apt-get/aptitude has really saved Linux from frustrated users.

A really thoughtful (or at least, well thought out) feature is the consistent use of sudo instead of the traditional root account.

I also like this neat feedback at the command line - it makes the default *nix "command not found" response sound positively neanderthal and redundant:

ns@wag-a-bond:~$ eclipse
The program 'eclipse' is currently not installed. You can install it by typing:
sudo apt-get install eclipse
Make sure you have the 'universe' component enabled
bash: eclipse: command not found

It is getting there, but there are still a few (not-so-major) rough edges in this distro as well.
  1. Why not have media support (Flash, audio & video codecs, DVD playback) installed by default? If it is only because of licensing issues, rather than some underhand trick to save space on the live CD, they could prompt the user to install it on first start-up (with all the legal warnings about how it is illegal in some countries to actually watch DVDs on non-Windows platforms).
  2. Instead of using their graphical update tool, I did an aptitude update; aptitude upgrade. There were tons of updates, and a few did not get installed properly. After that, the whole OS crashed! Blue Screens in a linux avatar!
  3. When I download ppt files from the net, they open up in OpenOffice Impress. When I close the Impress window, OpenOffice crashes, apologizes, and opens up OpenOffice Writer! This is the sort of bug that should never have shipped in any release of a stable product - it happens in casual use, and so consistently that it should have been noticed and fixed.
  4. When I install eclipse, it gets set up to use gcj instead of Sun's java, even though I had already installed Sun's java. On top of this, gcj is supposed to be inferior (very slow, with incomplete support in some cases). The "FSF is best" attitude has been the bane of linux - what works best is best, despite Stallman's rants.
    I believe this can be fixed by merely declaring that the eclipse package depends on gcj OR sun-java; the apt tools are powerful enough to handle precisely this kind of dependency.
  5. Support for things like sound is not complete - or at least it does not play well with an XP dual boot: when I start up XP, play sound, and do a soft reboot, I can't hear audio in ubuntu, and vice versa. I recall this problem existed (I think it is an alsa issue) even back in the 90s when I first started playing around with *nix distros. Ten years, and an issue like this is still not fixed!?

Tuesday, June 05, 2007

Grand Challenges can be useful

Fields are to some extent driven by trends. For instance, in networking nowadays there are lots of "clean slate" proposals floating around. A few years back, the trend was to figure out how to innovate without disrupting the already successful network (eg TCP friendly congestion control, network tomography etc). Other trends include things like p2p, overlay nets etc.

Fields like networking, which are more synthetic (let us invent a DHT so we can do cool things with it) than analytic (e.g. let us discover how gravity works), are especially vulnerable to the herd mentality, because we have to create novel things from thin air (rather than picking up something that already exists, e.g. a cancer cell, and studying how it works). I say this although my friends in analytical fields would say that certain "sexy" problems attract a lot of people who ride the trend winds.

So, how do we motivate new work without herding too many people into a single area? Applications can be great forcing functions (the term is from DC himself, though from a very different conversation). Grand Challenges that target some pie-in-the-sky application are one way to force focus without having everyone work on the same piece. This year's IDEAS competition is a great example: the challenge was TB drug adherence.

There are several things worth noting about the design of this challenge:
  1. It is big enough of a problem that it cannot be solved just by the IDEAS entrants
  2. It is not just a sexy thing: i.e. the topic is not HIV-AIDS but TB drug adherence. This encourages concrete work and deters people who join simply because it is fashionable
  3. It is endorsed by someone big or charismatic enough
  4. There is a big enough prize to create motivation for people who really want to work
  5. The problem spec is generic enough for people from different fields to contribute in their own ways.
Points 2 and 5 are probably the most important in designing challenges: 5, because you don't want 100 different p2p implementations, each of which works better than the others in slightly different applications; and 2, because hype is usually why challenges turn out to be empty wastes of effort.

Other successful challenges: putting a man on the moon (well, the real challenge there was: can democracy do it faster than communism?). The Ansari X Prize - this one is too specific, but it achieves its aim by offering high enough prize money.

Homework for the month: What would be good challenge applications in networking?

Wednesday, May 23, 2007

How do RFID smartcards work on mass transit buses?

The T or the MBTA - whatever you want to call the Boston subway - has introduced the CharlieCard, similar to London, Washington & Chicago. It turns out that stored-value smartcards are sort of dumb - the only thing they store is an ID, and no information about the actual value remaining. This makes a lot of sense, because otherwise hackers could easily compromise the cards in the comfort of their homes.

I can imagine how such a system is architected in real subway stations - the readers at the turnstiles can be connected to a central server. But then I see that the same card works on buses! What is more, if you transfer from Bus A to Bus B within x minutes, you don't have to pay again. So clearly the reader in Bus A has to transmit some information to Bus B.

So, is there a wireless network connecting the Boston buses? It would be fun to think of how such a network could be constructed. It definitely needs reliable transport, but the information only has to reach all other points in the network on a human timescale - minutes at least; i.e. I can't tap a reader in Bus A and one second later tap one in Bus B.

Of course, things are much simpler if the card is read-write. The only information that needs to be propagated is the list of routes I have traveled since yesterday, or since whenever the last update was sent out about how much value I have left. The reader in Bus A could then store a digitally signed record of the last use on the card, and Bus B would charge me $0 if it could verify Bus A's signature. If I used the transit system much later in the day, Bus C could charge me $x and display my stored value as yesterday's stored value minus 2x.
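Here is a toy sketch of the signed last-use record idea (entirely my own illustration - the record format, key handling and use of RSA signatures are assumptions, and the real CharlieCard system may work nothing like this):

import java.nio.charset.StandardCharsets;
import java.security.*;

public class TransferRecord {
    // Record written onto the card by Bus A, e.g. "cardId|busId|timestampMillis",
    // signed with a transit-agency private key held by the reader.
    static byte[] sign(String record, PrivateKey agencyKey) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(agencyKey);
        s.update(record.getBytes(StandardCharsets.UTF_8));
        return s.sign();
    }

    // Bus B grants a free transfer only if the signature verifies and the last tap is recent.
    static boolean freeTransfer(String record, byte[] sig, PublicKey agencyPub,
                                long nowMillis, long windowMillis) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(agencyPub);
        s.update(record.getBytes(StandardCharsets.UTF_8));
        if (!s.verify(sig)) return false;   // forged or corrupted record: charge the full fare
        long stamped = Long.parseLong(record.substring(record.lastIndexOf('|') + 1));
        return nowMillis - stamped <= windowMillis;   // within the transfer window
    }
}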

MBTA could even get away with weaker encryption by updating the keys every few days (faster than a hacker could break them with a systematic attack).

But MBTA has a whole new problem now - how to store the stored value of potentially a few million cards on each bus, and update it every day? One realistic option (again assuming read-write cards) is to store both the stored value & the card's ID on the card itself. The reader in each bus will trust the current stored value, deduct the fare after each use, and rewrite the value. At the end of each day, a sync operation can happen with the central server. If misuse of a card is detected, that card can be blacklisted; a much shorter list of blacklisted cards can be stored on each bus so that the hacker can be caught the next time the compromised card is used.

But this would involve storing the values left on millions of cards at each reader at the end of each day...

System Dynamics

My officemate Chintan is in the Engineering Systems Division and studies how different regulatory regimes affect technology adoption, concentrating on VoIP. He models this using System Dynamics and simulates the effect of different parameters using a tool called Vensim. The basic idea in system dynamics is that all variables can be modeled as either a stock or a flow. Stocks are the variables that can be measured in a snapshot of the system. Chintan's example was a parking lot: the parking lot itself is a resource, and how full it is can be measured by taking a snapshot; the cars entering or leaving the lot are the flows.
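A toy sketch of the stock-and-flow idea (the parking-lot numbers and rate equations are made up purely for illustration; this is not what Vensim or Chintan's model actually does):

public class ParkingLotSim {
    public static void main(String[] args) {
        double occupancy = 20;          // stock: cars currently in the lot (the snapshot variable)
        final double capacity = 100;
        final double dt = 0.1;          // time step in hours

        for (double t = 0; t < 12; t += dt) {
            double arrivals = 30 * (1 - occupancy / capacity);  // inflow slows as the lot fills
            double departures = 0.2 * occupancy;                // outflow proportional to the stock
            occupancy += (arrivals - departures) * dt;          // integrate the net flow over dt
            System.out.printf("t=%.1fh occupancy=%.1f%n", t + dt, occupancy);
        }
    }
}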