Inverse Distribution
Yesterday
Divesh Srivastava was giving the departmental colloquium here about
Streams, Security, and Scalability. Interesting talk generally, but there was one point I particularly liked, and think the idea can be useful in other scenarios too:
Suppose that you are monitoring the traffic of your lab's network, and would like to detect when a new internet worm is emerging. Note that you don't even know what this new work looks like. Here is how it works: take packets for one-minute (whatever) worth of traffic, form the
n-grams in the input (start with bigram or trigrams, but 4-grams and above are not out of question either), and distribute them into bins based on their frequency. You get an
inverse distribution which for given
x, tells how many
n-grams have frequency
x. Now if the traffic is random-enough, this inverse distribution function should follow a
Poisson distribution, which is a deformed bell curve of some kind. Draw the graph of the function (may want to use logarithmic x axis.)
Now as a new internet worm becomes epidemic, the frequency of the
n-grams in the worm payload get exponentially(?) higher and higher, and in the graphic view of the function, you see another bell curve raising higher and higher and farther to the right. That's a new worm spreading!
This paper contains the details, as well as ways to trigger this automatically.