Behdad Esfahbod's daily notes on GNOME, Pango, Fedora, Persian Computing, Bob Dylan, and Dan Bern!

My Photo
Name:
Location: Toronto, Ontario, Canada

Ask Google.

Contact info
Google
Hacker Emblem Become a Friend of GNOME I Power Blogger
follow me on Twitter
Archives
July 2003
August 2003
October 2003
November 2003
December 2003
March 2004
April 2004
May 2004
July 2004
August 2004
September 2004
November 2004
March 2005
April 2005
May 2005
June 2005
July 2005
August 2005
September 2005
October 2005
November 2005
December 2005
January 2006
February 2006
March 2006
April 2006
May 2006
June 2006
July 2006
August 2006
September 2006
October 2006
November 2006
December 2006
January 2007
February 2007
March 2007
April 2007
May 2007
June 2007
July 2007
August 2007
September 2007
October 2007
November 2007
December 2007
January 2008
February 2008
March 2008
April 2008
May 2008
June 2008
July 2008
August 2008
October 2008
November 2008
December 2008
January 2009
March 2009
April 2009
May 2009
June 2009
July 2009
August 2009
November 2009
December 2009
March 2010
April 2010
May 2010
June 2010
July 2010
October 2010
November 2010
April 2011
May 2011
August 2011
September 2011
October 2011
November 2011
November 2012
June 2013
January 2014
May 2015
Current Posts
McEs, A Hacker Life
Wednesday, August 30, 2006
 Links and Notes

Older stuff that totally rock:

Comments:
Hmm, Behdad! Did you first read the articles on copyright philosophy and then listened to all the Hayedeh on your playlist. Or was it reverse?! Take care of your self dude! (j/k)
 
I'm the author of the "Colour" article and would be interested to know what parts of it you think are "inaccurate on the CS side".
 
Hey Pooya. I normally resonate between Bob Dylan and Hayedeh...!!!
 
I guess that's something like resonating between bugzilla's OO bugs and copyright philosophy articles. (j/k again!)
 
Hi, Great link and notes. Do you happen to know what would be the criteria for Google Code Jam's "Qualification Round" ?
 
Monia, it's a programming assignment. If you register, there are practice assignments you can take to get the feel of it.
 
Ok, here is my comments about the Colour article.

There are statistical tests you can do; for instance, if you look at the file and discover that it contains a copy of the works of Shakespeare, then it doesn't look much like you would expect randomly generated numbers to look. But it could still be randomly generated. The test tells you whether the file has the statistical properties expected from randomly generated files, not whether the file really is randomly generated or not.

This is true as an statement, but the part "The test tells you whether the file has the statistical properties expected from randomly generated files" is ignoring a very important point here, that is, a radom number generator doesn't have to be a uniform random number generator. The works of Shakespeare don't look much like randomly generated data if you are thinking about ASCII encoding and a uniform random number generator, but they don't look as bad when you consider a Markov-based random number generator trained on a huge pile of old English text (not including Shakespear's at all.)

It's not even correct to say "the probability of this being from a random generator is very low" because that's not true

It actually is true for most random number generators. The statement however is not a very informative one, as it holds for too many sequences. For example, with a uniform random number generator, the probability of any sequence of length N-bits being generated from it is the same very low value 2^-N.

- it either was or was not randomly generated, that's not open to probability.

This is an statistician's view. There is another view, called the Bayesian view that actually assigns a probability as degree to which a person believes a proposition. This is a very intuitive notion. For example, if you have not be in Vancouver yesterday and have not read or heard any news, you cannot answer the question "did it rain in Vancouver yesterday" with certainty, but given the time of the year, you have an idea of what the answer possibly is. If it's in the winter, I would say there's a 70% probability that it rained yesterday in Vancouver, while in the summer, that will be less than 10%.

The same is true about the question of "was this text generated by a random number generator?". You definitely can tell by seeing the text. If it's the works of Shakespeare, you would say the probability that this text has been generated by a random number generator is very very low. If it looks like total garbage, you would say "yes, it's quite possible that the data is generated from a uniform random number generator". To understand how this works, one can go back and use the Bayes rule: given that you have a copy of the works of Shakespear in a file in your computer, is it coming from a uniform random number generator, or somebody copied it from another file? The probability of the random number generator generating this file is less than 2^-10,000 given the file is a few pages long. On the other hand, if someone copies a random file off the internet for you, the probability that it's works of Shakespear (given that we know that is available somewhere on the internet) is greater than 2^-100 (assuming gazillions of computers with gazillions of file on each). Comparing those two numbers, you deduce that the probability that the file you have has been generated by the random number generator is essentially zero.

Check out David MacKay's book for similar discussion.
 
Well, I'm disappointed that you characterized my entire article as "inaccurate" based on one paragraph, and I don't think even that paragraph is inaccurate. Yes, a copy of the complete works of Shakespeare will look more plausible as random number generator output if you assume a generator with an output distribution slanted towards English text of that era, instead of uniform over some larger set, like same-length binary files. But it still won't be plausible that a random generator with any significant amount of entropy in its output will generate Hamlet (the exact text, not just "something similar to Hamlet") without being specifically designed to do so.

As for the Bayesian interpretation of probability:

- I am aware there are multiple interpretations possible.
- I do not subscribe to the one you describe, and I do not agree that it's easier to understand. If you want to assign a number to how much you believe a statement, great, but then it's important to be clear that the uncertainty is in your understanding of whether the event occurred, not in whether the event actually did occur.
- It isn't relevant to what my article was about.
- Rewriting it in the terminology of Bayesian probability would not affect the conclusions I drew.
- In an informal article that is not about the philosophy of probability theory, I don't think it's necessary or useful to attempt to present some kind of balanced exposition of ways to interpret the meaning of probability including significant coverage of interpretations I don't agree with myself.
- I'm not sure that this "debate" even matters, because it seems rather metaphysical. If I'm building for instance a data compressor, my code is going to wind up doing the same thing regardless of whether I believe that the input file is a random oracle with certain probabilities of generating ones or zeroes, or a fixed non-random string about which I am uncertain whether the next unread bit is a one or a zero. Since it appears to depend on our personal beliefs, not on anything we could verify by experiment, and there doesn't seem to be any experimental outcome that would cause either of us to change our views of what probability is, when you get right down to it, I think this is a religious question, not a scientific one.
 
Post a Comment



<< Archive
<< Home