Thursday, November 27, 2008

The REAL problem with biometrically verifiable ID cards

There has been a recent push towards ID cards for everyone in the UK. There is a huge backlash against this idea, partly because of recent incidents which tell us that the government cannot really be trusted to keep the data safe. But a far more serious problem is that biometric ID cards are unforgeable and can be used by even less trustworthy people, Islamic terrorists for instance, to target specific people:
http://www.pbs.org/wnet/wideangle/episodes/iraqi-exodus/violence-targets-iraqs-christians/3456/
On Sunday, Farques Batool, a Christian who owned a music store, was gunned down and killed in his shop. His teenage nephew was also wounded. A pharmacist was killed Friday by a man who pretended to be an undercover police officer, asked to see the pharmacist's identification card, then shot him. Religion is listed on government-issued ID cards in Iraq.
Verifiable identity is a grave security risk, never mind all the useful side effects, such as being able to get your tax refund without actually filling in forms because the unique ID serves as a primary key into different databases.

Tuesday, May 06, 2008

Virus vs worm

What exactly is the difference between a virus and a worm? Bruce Davie, in his 6.829 class last week, said security was one of those things that you truly don't know unless you know all of it. He also said it was difficult for him to get a clear definition of virus and worm. The one he finally settled on was that a virus, like a biological virus, does not actually carry the code for replication and requires a host, whereas a worm has a life of its own, i.e. it has the code to replicate itself. No user action is required to replicate a worm.

Saturday, March 08, 2008

Looeyville, Kentucky

The 8th floor of the Stata Center houses "true" architecture people as well as the Advanced Network Architecture group. I just went to the restroom and found the following posted on the two loo stalls: "Out of Order" and "Super Scalar".

Both in the same handwriting.

Tuesday, January 01, 2008

Inconsistencies in the Reality Mining dataset?

The MIT Reality Mining dataset has become a touchstone for DTN and social-network-type research, and I am exploring using it for some of my own simulations. I intend to use it as a source trace of when the experiment subjects were in Bluetooth proximity with each other.

Imagine my surprise, therefore, when I ran the following SQL query for the average duration of Bluetooth proximity and found that this average was a negative number!

I chose the start and end dates to coincide with a period when the Media Lab was frantically preparing for its annual showcase to industry sponsors. Nathan Eagle, whose PhD consisted of gathering and mining this dataset, talks about this period in his thesis.

mysql> select avg(endtime-starttime) from (select ds.starttime,ds.endtime,ds.person_oid as src,d.person_oid as dst from devicespan ds left outer join device d on ds.device_oid=d.oid where d.person_oid != 0 and ds.starttime >'2004-10-01' and ds.endtime < '2004-11-01' order by ds.starttime) as xxx;
+------------------------+
| avg(endtime-starttime) |
+------------------------+
| -948.28116818633 |
+------------------------+
1 row in set (0.00 sec)


It turns out that this is because there are 4 erroneous records in this period, including one egregious mistake, which looks like a typo (a date of 7/10 rather than 10/7):
mysql> select * from (select ds.starttime,ds.endtime,ds.person_oid as src,d.person_oid as dst from devicespan ds left outer join device d on ds.device_oid=d.oid where d.person_oid != 0 and ds.starttime >'2004-10-01' and ds.endtime < '2004-11-01' order by ds.starttime) as xxx where endtime < starttime;
+---------------------+---------------------+-----+------+
| starttime           | endtime             | src | dst  |
+---------------------+---------------------+-----+------+
| 2004-10-07 15:51:34 | 2004-07-10 16:02:55 |  92 |   36 |
| 2004-10-31 02:02:02 | 2004-10-31 01:23:40 |  46 |   86 |
| 2004-10-31 02:12:46 | 2004-10-31 01:23:40 |  46 |   22 |
| 2004-10-31 02:18:12 | 2004-10-31 01:44:40 |  46 |   73 |
+---------------------+---------------------+-----+------+


Now, if we exclude these records, we get:
mysql> select avg(endtime-starttime) from (select ds.starttime,ds.endtime,ds.person_oid as src,d.person_oid as dst from devicespan ds left outer join device d on ds.device_oid=d.oid where d.person_oid != 0 and ds.starttime >'2004-10-01' and ds.endtime < '2004-11-01' order by ds.starttime) as xxx where endtime >=starttime;
+------------------------+
| avg(endtime-starttime) |
+------------------------+
| 12460.984108352 |
+------------------------+

There are a total of 22154 records in this period, so if you add in the effect of that one egregious mistake (the first record in the table above), averaged over all the records, you get roughly 12460 + 948 ≈ 13405:
mysql> select (timestamp('2004-07-10 16:02:55')-timestamp('2004-10-07 15:51:34'))/22154;
+---------------------------------------------------------------------------+
| (timestamp('2004-07-10 16:02:55')-timestamp('2004-10-07 15:51:34'))/22154 |
+---------------------------------------------------------------------------+
| -13405.9257470434 |
+---------------------------------------------------------------------------+

(The other three records contribute 3 seconds to the wrong average).

Friday, November 23, 2007

Java generics in enhanced for loops

How exactly should the Java compiler translate enhanced for loops? Should it try to enforce type safety (e.g. that a List<String> should only contain Strings)? The JLS thinks it should not. It is a bug that Sun's javac enforces this type safety!
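
For concreteness, here is a rough sketch of that translation (my own illustration -- the class and variable names are made up, and this is only an approximation of the spec's rewrite, not a quote from the JLS): the enhanced for loop becomes an ordinary iterator loop with a cast on each element, so iterating a raw List with a typed loop variable would only fail at runtime rather than at compile time.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ForEachDesugar {
    @SuppressWarnings({"rawtypes", "unchecked"})
    public static void main(String[] args) {
        List raw = new ArrayList();  // raw type: no compile-time element checks
        raw.add("hello");
        raw.add(42);                 // perfectly legal on a raw List

        // The sugared form under discussion (which javac may reject for a raw List):
        //   for (String s : raw) { System.out.println(s.length()); }
        //
        // A spec-style translation is roughly an iterator loop with a cast on each
        // element, so the mismatch only surfaces as a ClassCastException at runtime:
        for (Iterator i = raw.iterator(); i.hasNext(); ) {
            String s = (String) i.next();  // throws ClassCastException on the Integer
            System.out.println(s.length());
        }
    }
}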

I got embroiled in this discussion because of a post on my old article on for-each loops in Java. (Disclosure: I am nishrs in that world.)

Wednesday, November 14, 2007

HotNets Trip Report

Session 1

Prabal Dutta said that by grouping updates and reducing their communication, sensors can achieve much better battery life. Theirs was one of the 3 HotNets papers covered by the press, the other two being our metro-wifi proposal (which was also slashdotted) and Sachin's MIXIT. Prabal's message was actually more nuanced than the title might suggest -- he showed how clock skew can kill energy savings when you are grouping updates, and I found two of his factoids useful:
  • Over short time periods, links are bimodal, but over longer times existing links deteriorate and new ones emerge (so there is no use keeping neighbour info and routing tables up to date between sleep-wake cycles).
  • Packet delivery rate is high when RSSI is above -90dBm and falls precipitously below that (fig 4). They use this as an agile link estimator.
Mark Allman showed that network protocols are highly chatty through the night (e.g. DHCP traffic), and we may not be able to shut them away from the network entirely. But perhaps we can write proxies that will stand in for many computers and allow those computers to go to sleep. When the computers come back up, they can obtain updates from their proxies.

VROOM argues for separating logical and physical router configurations. More virtual routers would come up to handle peak traffic, and go away at night, yielding power savings. Virtual routers would make planned maintenance easy - just switch to a new virtual router and install patches & updates on the old one.

Session 2

RJ talked about building a voice-mail-like infrastructure for telephony in Africa.

Michael LeMay addressed Emergency Response Networks. The main takeaway, for me, was that an ERN needs about 100 Kbps, which means we can do metro-wifi our way and have it be an ERN! I quickly changed my slides to point that out.

Ken Calvert gave a longish talk on the difficulties of home networking, which was joint work with HCI researchers. The problems part was very illuminating. Home networking equipment is apparently the most-returned piece of electronics. Did you know that many homes actually set up multiple subnets? They do this a) for tax reasons (to separate home and home-office usage) and b) because husband and wife work for competitor companies. The suggested solution is a portal that does all the configuration in a centralised fashion, rather than having to configure each piece of the home network separately.

Ken is a student of my master's thesis advisor, Simon Lam, and quickly remembered me after I reminded him that we met 5 years ago at ICNP, my last conference. He even remembered that my paper from back then was on congestion control! I noticed throughout the conference that he was very courteous, alert and mindful. I hope I get to be half as good as him.


Session 3
The Wireless Manifold was one of the neatest ideas in the conference. Mobility models most often use the unit-disk model, where every node within a unit Euclidean distance is assumed to be within hearing distance. Instead, they measure signal decay between pairs of nodes and extrapolate it to compute a metric over the whole geographic space (say, a building floor). This gives them a manifold, and the positions of two nodes in this manifold alone tell how the wireless signal will travel between them. All the unit-disk models still apply, except that distances are now measured in this manifold rather than in Euclidean space! The utility of this comes from two claims: a) their metric can be computed very easily from only a few signal measurements (they even have a heuristic that they claim gives very good approximations of the manifold, but they don't yet understand why it works so well); b) the metric itself does not change too often, i.e. in a given building, the metric between two places stays roughly the same. They have a geographic routing algorithm that runs based on this idea.
Santosh Vempala presented because his student, Varun, did not want to. Hari Balakrishnan had an interesting suggestion for them: to try building the manifold from the number of transmission retries or ETX, rather than from raw signal-decay information.

Injong Rhee talked about how to model human mobility. It turns out the Lévy walk is a good model. Many animals in the wild, such as chimpanzees, use a Lévy walk to ensure survival -- the primary characteristic is to follow a straight line with high probability but with random perturbations at a power-law frequency. This enables foraging creatures to find food sources in optimal ways.
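
As a rough gloss (mine, not the speaker's): a Lévy walk is a random walk whose straight-line flight lengths are drawn from a heavy-tailed power-law distribution, something like

P(\ell) \propto \ell^{-(1+\alpha)}, \qquad 0 < \alpha < 2,

so occasional very long flights dominate, unlike the exponentially bounded steps of a Brownian or random-waypoint model.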

The next presentation was mine. HotNets is a 15 min presentation followed by a 15 min discussion. As with other papers, there was a lot of discussion generated by my talk. One class of questions was why we cannot do it more simply: e.g. why can't we do FON-like authentication and send the traffic to cnn.com directly from the home (because the guest would have to trust the host), or why can't we just have the guest ssh to the home and connect from there (we don't always know where home is, and the host is exposed in this scenario). There was also some brief discussion (Hari also came up and discussed after the main talk) of other attacks that might come up, which I showed our architecture could defend against. Jon Wroclawski pointed out that hosts now have to trust the co-op. But that's not as much of a problem as trusting arbitrary residents of the city.

Session 4

Charlie Reis talked about the need for making web browsers safe: with multiple execution frameworks such as JavaScript, Flash, Silverlight, etc., there is no common security framework and the browser cannot guarantee safety. His solution is to have a common interposition layer that provides web programs with access to the DOM and the network.

Hari Balakrishnan's talk was on using self-certifying addresses of the form AD:EID, where AD is an autonomous domain and EID is an entity ID, to make the Internet more accountable. I have not figured out how this is conceptually different from what HIP achieves.

Day 2
Session 1
Michael Demmer discussed a data-based API that amounts to publish-subscribe. A publisher puts data out in the Internet cloud, and all interested parties subscribe to it. This separates the content from its origin, and there are mechanisms to time out old content or supersede obsolete data with something else. It looks like an API custom-made for their DONA architecture. Haggle was one of the motivating application scenarios!
All is well, but regular request-response exchanges become trickier: to do a Google search I would have to create a temporary publication and subscribe to it, and Google's response would post to this temporary publication. Interesting, but more difficult than today.
Also, they advocate that banks would put out your data, encrypted, for anyone to subscribe to. What if I download your account data, take a month to crack your private key, and then figure out that you actually have a million dollars?

Next was Mark Allman on Personal Namespaces; I had to step out during the talk. It seemed reminiscent of recent Unmanaged Internet Architecture work.

Four-Bit Link Estimation was on using information from the physical, link and network layers to create efficient link estimators for wireless. Their physical or "white" bit is similar to one of Prabal's suggestions.

W5 was on putting users in control of their Web 2.0 data (like photos and blog posts) while still letting providers like Flickr access it in a secure manner. This would enable me to switch from Flickr to Picasa if needed, without having to download my 1 trillion photos to my desktop and re-upload them to Picasa. A neat concept was third-party declassifier programs, which would be allowed to remove a moniker like "Alice's private data" and put on another moniker like "Bob's private data" - thus allowing sharing between friends, friends of friends, etc. The point is that Alice would have to authorise the declassifier, and could choose a reputed declassifier known to be trustworthy.

Andrew Miklas used output-only Xen VMs to anonymise network traces in such a way that identity information could not be extracted from the traces even under subpoena.

Hunger pangs prevented me from paying full attention to this last talk before lunch, by Jeffery Pang. The gist is that service endpoint names, as well as the process of searching for them, can leak information (e.g. "Juvenile Detention Classroom" is the name of an actual Wi-Fi SSID!). How do you arrange a confidential tryst between service provider and accessor?

Session 3
Murtaza Motiwala introduced path splicing: run multiple instances of the routing protocol with different link weights. This discovers multiple paths for a given sender-receiver pair, and these paths may share a few nodes. Using hint bits, end hosts can choose between the different paths at intermediate nodes. By splicing together multiple paths, one can get highly reliable connectivity: to disconnect two nodes, one has to create a cut in the graph. I wonder, though, how often different paths share nodes that are far away from either edge...

Cristian Lumezanu talked about making overlay networks more popular by incorporating ways for hosts to discover mutually advantageous peers who would improve their network paths. He did not mention motivating applications in the talk, but later told me that perhaps network gamers could use this...

Session 4
Dan Halperin said current wireless networks are wrong to treat interference the same as noise in the SINR equation. He gave a really neat talk showing how to cancel interference alone and improve reception.

Next up was Sachin Katti: a polished, great talk, with a single easy-to-understand motivating example. MIXIT mixes together their random coding ideas with a PPR-like scheme which allows bit errors to be corrected. His proposal is that even if intermediate nodes get a few bits wrong, they would still do random coding with other packets and send out the coded packets. The receiver can do bit-level error correction if multiple copies of each packet are received via random coding.

Ratul Mahajan concluded with an account of some pathologies they observed.

It looks like the field is amassing new ways to deal with wireless issues these days, and it seems like an exciting area to follow, if not to do research in.

A final interesting factoid is that nearly every MIT paper (except Sachin's & Kyle's) thanked my friend, Mythili Vutukuru, in the acknowledgements! An interesting research problem in its own right is: Where does she get the time to read and review other people's work, between going to New York almost every weekend and doing her own work?

Tuesday, November 06, 2007

searching emacs subdirectories

How do you make Emacs search subdirectories where you have placed your own customised Emacs packages? The trick is to bind default-directory to your package directory, add that directory to load-path, and then call normal-top-level-add-subdirs-to-load-path so its subdirectories get added too:

(if (fboundp 'normal-top-level-add-subdirs-to-load-path)
    (let* ((my-lisp-dir "~/.elisp.d/")
           (default-directory my-lisp-dir))
      (setq load-path (cons my-lisp-dir load-path))
      (normal-top-level-add-subdirs-to-load-path)))

This lets me keep my customisations in separate subdirs:
$ ls -F .elisp.d/
python-mode-1.0/ tuareg-mode-1.45.5/