McEs, A Hacker Life
DesktopCon Last Call
Tomorrow is the last day to
submit proposals for talks at the Desktop Developers' Conference this July in Ottawa. Last year there were some seventy of us there, pretty cool. I'm looking forward to meeting more faces this year. I almost know what I'm going to talk about, but I've not submitted yet. I have been busy marking two final exams and an assignment, before I head to Niagara Falls and New York in a few hours. I'm going to meet Noah, which should be fun, and a lot of old Iranian friends. I'd like to meet more hackers though.
Jamesh: SourceForge.net's compile farm is a very handy way to make sure your packages build on Solaris and MacOS X. All you need to gain access to that is to be a member of an active sf.net project. One can open a project for tinderboxing only.
New York and Boston next week
Everybody's registering for 6UADEC. Too bad I cannot make it :(. Apparently I should not have taken Roozbeh's words about not having funds to go to 6UADEC seriously, since they are trying anyway. The only thing lost was my opportunity to apply for a Canadian visa to be able to come back to Canada after 6UADEC, and that takes some five weeks itself.
Instead, yesterday I finally got a US visa. To make some of you more ashamed of your government, lemme explain the history of this visa. I applied for it on January 23rd, 2004. I was supposed to
present at the Internationalization and Unicode Conference on April 1st 2004. The officer told me we'd got plenty of time to get my clearance back. In short, that didn't happen until late August. After that I was too busy to go get it (and I needed to bring new documents). So finally I went there on February 7th 2005, to get the visa to visit a friend. Since a clearance is only valid for a few months (three or six, don't know), I had to go through it all over again. But fortunately this time it took less than two months.
So here I go now, with a visa in my passport, and a
Bob Dylan presale ticket for the last performance of the tour in my hand for April 30 in New York. So I'm heading there in two days. No plans on how long or where to stay yet, but if everything goes smoothly I plan to make a trip to Boston too, where I would like to meet some Gnome/Red Hat people, if anybody stands up. Any words, offers, interests are more than welcome.
With Roozbeh, we will make sure to keep you updated on our visa problems, you whose governments think we support our government,
you who travel with a tip of your toes.
Data Crunching
Data Crunching is the name of a new book by Greg Wilson. Follow the title link to have a look inside. Highly recommended to warm your hands up. Fortunately for those of you in Iran, Pragmatic (by definition) sells PDF copies too. I wish I could freely read it, since I'm not going to buy that. Thanks Greg anyway.
Here we go
So I finally found jdub on IRC and he hooked up my feed to
the planet.
I coded through a couple of all-nighters again last week, but the wrist-ache didn't let me continue. Some random links for now:
School of Rock
Watched the movie. Very entertaining.
In the News
I've been extremely busy the last few days. So first, here goes a long news post.
Books arrived. Busy with "Inmates". Pretty good stuff so far. The Bob Dylan Encyclopedia is awesome too. I'm having a really good time reading the Encyclopedia and Lyrics books side by side. I will post a few notes on Bob Dylan later.
Web development. Last few days I've been busy doing web development. Focused mainly on
RiRa Persian Digital Library, but editing other stuff too. Behnam sent me this
Web Development Bookmarklets which totally changed my web development practices. I made a list of pages and services I maintain, and cleaned up their code and styles until I was satisfied (XHTML 1.1) and they viewed perfectly in Firefox. I probably ruined Windows support in the process, but I don't care _much_. Since Opera 8.0 is released now, I'll try to download it and test under that too. I will compile the list of these pages, put it on my homepage, and post it here later. Moreover, I finally decided to put all my web development stuff under the name RiRa Developers Resources. Coming soon.
AdSense. I finally decided to put Google AdSense on this blog. Not that I get any clicks, but that should be enough to pay for the domains like
iraniancalendar.info that I will not renew if I don't get enough dollars for that. Click a couple once in a while if you don't want to see them discontinued. Thanks.
Bloglines. For a few days now, I have found bloglines.com very helpful in organizing the feeds I read. I'm very satisfied with that. I've got to make the switch to del.icio.us soon too. I'm torn between my laptop at home and my terminal at the office. On each, there's Firefox for web development and Epiphany for Persian and long-session browsing. So both Bloglines and del.icio.us are much needed. Epiphany rules with its crash recovery. When I've got to leave office/home, I just kill it, and when I'm back I recover the session and continue browsing. It's got a little better Persian support too, in tooltips and titles especially, thanks to Gtk+-2.0.
Linux World Expo. My good friend Farhang and I headed to the Linux World Expo yesterday, to get some merchandise. Like last year, Novell was making a huge buzz, with very uninformed speakers who had probably been counting the minutes down. One of them focused on migration from Windows, saying that there are lots of solutions, like VMware, Wine, etc...
We asked a couple of questions and got stuff back. I got a Novell cap like last year, Farhang got a huge red umbrella that can wait to be used in Vancouver next fall. We jumped into an IBM talk on Linux Internationalization. The speaker was a teacher at IBM Toronto. Let's not be picky about him calling "UTF-8" simply "UTF", but he even said that Unicode 3.2 is the latest version of the UCS standard! Got a good Tux from HP. The top earnings were the Programming Languages and Unix History posters from O'Reilly. Not surprisingly, Red Hat's booth was a boring place with absolutely nothing to give away. Ah, got a VMware 5.0 workstation too! Will try that sometime. Maybe run Ubuntu under that, or Gentoo.
Gnome. Read
Davyd's presentation on his experience in Gnome. I'm a bit surprised at how he can enumerate the bugs he started with, in 2003. As a result, I feel more strongly that I am a Gnome developer :D.
I wanted to comment on Luis's comments on Gnome 3.0 too, but let that be for another occasion.
Enough for now! I would be happy to answer any questions the audience may have.
Context-Free Grammars, Cool Applications
Cool reading from slashdot on
students generating random papers that get accepted! They have released the source under GPL, with CVS and accepting patches and all. Another interesting one is the
Context Free Design Grammar. It's not anything new, but the examples are beautiful.
Puzzle of the Week: select count(*) on bit vectors
Recently I
mentioned that counting the number of set (a.k.a on, 1) bits in a bit vector is fast. Well, that's definitely not true. This puzzle is about that problem. A naive approach is this C function:
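/* Count the set bits in buf[0..len-1]; destroys the buffer in the process. */
int
count_bits_naive (unsigned char *buf, int len)
{
  int n = 0;
  for (; len > 0; len--, buf++)
    for (; *buf; *buf >>= 1)   /* shift each byte right until it is zero */
      n += *buf & 1;
  return n;
}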
I want to see how much one can speed up this code. Valuable responses would include code, statistics, and theoretical analysis. Assume a 32 or 64-bit machine.
I've got a quite interesting idea about that myself, not sure how it does in reality. BTW, like the sample code above, assume for simplicity that you can trash the array in place.
Submit your solutions through the comment system or via email (preferably) before Sunday April 17, 2005.
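To give a sense of the kind of answer that qualifies, here is a minimal sketch of two standard techniques: a 256-entry lookup table indexed by byte, and a branch-free bit trick that counts a whole 32-bit word at once. The names count_bits_table and popcount32 are made up for illustration, this is not necessarily the idea hinted at above, and no timings are claimed. Unlike the naive version, neither function modifies the buffer.

```c
#include <stdint.h>

/* Number of set bits in each possible byte value; filled on first use. */
static unsigned char bits_in_byte[256];
static int table_ready = 0;

static void
init_table (void)
{
  int i;
  /* bits(i) = lowest bit of i + bits(i / 2); entry 0 is already zero. */
  for (i = 1; i < 256; i++)
    bits_in_byte[i] = (unsigned char) ((i & 1) + bits_in_byte[i >> 1]);
  table_ready = 1;
}

/* Hypothetical name: one table lookup per byte instead of up to 8 shifts. */
int
count_bits_table (const unsigned char *buf, int len)
{
  int n = 0;
  if (!table_ready)
    init_table ();
  for (; len > 0; len--, buf++)
    n += bits_in_byte[*buf];
  return n;
}

/* Hypothetical name: count the set bits of one 32-bit word in parallel. */
unsigned int
popcount32 (uint32_t x)
{
  x = x - ((x >> 1) & 0x55555555u);                 /* sums of 2-bit groups */
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* sums of 4-bit groups */
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* sums of 8-bit groups */
  return (x * 0x01010101u) >> 24;                   /* add the four bytes */
}
```

The table version touches one byte per step; the word version needs the buffer to be read in 32-bit units, so alignment and leftover bytes at the end would have to be handled separately.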
Puzzles of the Weeks
So I decided to run puzzles here. I design a programming puzzle once in a while, mostly stuff that I encounter in my daily programming work. I would submit my solution by the end of the week in another post (and link the two.) To see what to expect, check
this for example. Kinda
IBM Ponder, but programming oriented. I expect most puzzles to be language-agnostic, though my preferred languages are C and Python, and I do JavaScript too where only it works. Bash and UNIX stuff aside.
On Profiling and Optimizing OpenSource Software
A couple weeks ago I read an entry on
Optimizing GNU ld from a Mozilla developer. Very interesting reading, do not miss. It was shortly after start of my work on
optimizing Pango opentype code. We both used
OProfile, which I hereby declare as a quite handy tool. Profiles anything and everything.
From my own experience and what I've read here and there, including the mentioned report, it seems like most OpenSource software is spending most of its time in things like strcmp, a handwritten sort or search, etc, and not in algorithms or other things we expect it to spend time on. Lemme rearrange my argument: If you want to optimize a piece of software, 95% of the time you do not need to find a way to reduce the number of times the fast path is taken, you simply need to get heavy (by C people's definition of heavy, like strcmp) operations out of the way, just that. Yes, by replacing an O(n^2) algorithm in a fast path with an O(n log n) one you gain a lot, but if you are working on such code, you most probably do not need profiling!
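As an illustration of the kind of change meant here, a hypothetical sketch in C: the same loop written once with a strcmp per element, and once with the string resolved to an integer before the loop. Everything in it (struct item, tag_names, tag_lookup, sum_slow, sum_fast) is invented for the example and is not taken from Pango, GNU ld, or any profile mentioned above.

```c
#include <string.h>

/* Hypothetical data: each item refers to a named tag by index. */
enum { TAG_COUNT = 3 };
static const char *const tag_names[TAG_COUNT] = { "kern", "liga", "mark" };

struct item
{
  int tag_id;   /* index into tag_names */
  int value;
};

/* Resolve a tag name to its index, or -1 if unknown. */
int
tag_lookup (const char *name)
{
  int i;
  for (i = 0; i < TAG_COUNT; i++)
    if (strcmp (tag_names[i], name) == 0)
      return i;
  return -1;
}

/* Slow: one strcmp per item, right on the fast path. */
int
sum_slow (const struct item *items, int n, const char *name)
{
  int i, sum = 0;
  for (i = 0; i < n; i++)
    if (strcmp (tag_names[items[i].tag_id], name) == 0)
      sum += items[i].value;
  return sum;
}

/* Faster: the string work happens once, the loop compares integers. */
int
sum_fast (const struct item *items, int n, const char *name)
{
  int i, sum = 0;
  int id = tag_lookup (name);
  for (i = 0; i < n; i++)
    if (items[i].tag_id == id)
      sum += items[i].value;
  return sum;
}
```

The loop still runs the same number of times; only the cost of each iteration changes, which is the point of the argument above.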
Rants on what Arabeyes should do in the OpenSource World
From another posting, to Arabeyes this time. I strongly believe that localization efforts cannot justify starting new applications.
On Fri, 8 Apr 2005, Mohammed Elzubeir wrote:
> Behdad, need I remind you that it took OVER A YEAR for VIM's author to
> incorporate the Arabic patches? We can't just sit around waiting for
> mainstream developers to integrate our work and neither do we want to
> start forks unless we feel that the maintainers have no intentions of
> doing anything about it. After all, the chances of the success of a fork
> is mostly not very high.
That's perfectly fine, yes. First, a year for a patch like Arabic support in VIM is very optimistic, and to be expected. There's a locale patch for glibc to add a new locale that was integrated after two years. And a locale is just data, no code involved.
Second, unlike your preference, to me, a fork is still better than a new project. That's like the Linux kernel stuff. Everybody except Linus provides his patch, and updates the patch when new kernels break it. Nobody starts writing a new kernel simply because Linux does not support his webcam! That's the way you can go, and you can provide prebuilt packages too. You can easily convince distributions like Mandrake, maybe Debian, and even Fedora to ship your patches, and that makes the maintainer more confident in the patch and eases integration.
Third, I want to draw your attention to a recent project called
Poppler, which is a fork of the xpdf codebase by Red Hat guys on freedesktop.org. This was done because xpdf's maintainer/author was not interested in integrating the community's patches upstream. KDE/Gnome are both using Poppler right now, and we are going to see far better PDF support in them. Another example is the
ooo-build project, which is a multi-distro effort to release integration patches for OpenOffice.org simply because the OO.o process is too slow. They eventually integrate upstream, but the project exists, and has proven to be quite useful so far. Now, instead of seeing you guys writing minibidi for CUPS maintainers because they're not willing to link to
GNU FriBidi, I'd like to see somebody starting a page for [L]GPLed patches to CUPS and start advertising that.
On Politics of God
Apparently
Khatami is strongly denying a handshake with Katsav. It is a bit hard to decide whether they actually shook hands or not. On one hand, from the photo, it's hard to imagine that the two, standing side-by-side (or back-to-back?!), have not shaken hands; on the other hand, it's hard again to believe that nobody took a photo of that when Khatami has been subject to a lot of attention by attending the Pope's funeral.
Katsav says he greeted the Iranian president in Persian, and they talked about Yazd, the central city in Iran in which both were born. Khatami denied all this as soon as he entered Iran, right in the MehrAbad airport! His speech was offensive, at best. Everybody I've talked to believes that it did happen, just that Khatami is afraid to confirm it.
Katsav also added that it was just being polite, not to be taken politically... And the Iranian government does not let its sportsmen (wrestlers mostly) compete against Israeli opponents.
On Lucene and its decency
From a
reply of mine in a thread on gnome-devel-list, I'm quoting here since it documents part of my work in 2002 on
RiRa Persian digital library project:
On Wed, 6 Apr 2005, Jamie McCracken wrote:
> 2) Use of an SQL database is a far superior, faster and flexible
> solution to using a dedicated indexer like the lucerne engine (all other
> competing engines like spotlight use sql databases). This is one area
> search services has got right.
Lucene is a decent search engine. You cannot compare it with SQL databases; you can compare it to another search engine, which may or may not use SQL databases as a backend, but as soon as you are talking about search engines, their implementation details don't matter at all. So, SearchServices by using SQL databases is really losing here, since it has to do a lot to catch up with Lucene, which I doubt it can. SQL databases are good things if you want atomicity, transactions, scalability, support for (really) complicated queries: joins, subqueries, etc. None of which is needed at all in a Desktop search service where you have one single server per user that does the indexing too. What SQL databases provide for a search engine is at best the "like" operator and, well, they can use indexes when you are matching the beginning of the string. And all the RDBMS hype comes from decent products like PostgreSQL, not a toy-size one like SQLite.
Lucene, on the other hand, comes from an experienced ex-employee of Excite, and from the Apache Foundation. It's specialized for search services. It allows for localization of search technology: You have an English normalizer, a German one, a Persian one, .... Yes, you have text normalizers there.
> Cause its not just about indexing - We have metadata too and
> that really needs a DB. If all you want is a google on your
> hard drive then yes a dedicated indexer would be best but an
> RDBMS will give you expanidbility and flebility in handling
> structured metadata with more powerful search options.
Very good point. Yes, Lucene accepts metadata too. You can have an unlimited number of fields. In fact, Lucene is quite like a relational database: you have different tables, and each table has a number of fields. Just that you are not forced to have a primary key. At search time, you can search a table, any field of it, with exact or fuzzy matching. Queries can be built in a tree-like fashion, by using AND, OR, and NOT operations. And it already has parsers for parsing Google-like queries. It even
accepts wildcards in query words. It also accepts quotation for searching phrases exactly, something that's a nightmare doing with RDBMS-based systems.
I had an experience with Lucene a couple of years ago. (http://rira.ir/) I was working on a small-sized database of Persian poetry, some 700'000 verses in 17'000 poems. I had it imported in PostgreSQL, in some ten tables. I wanted to add a search service. Using a table for word-item matchings was out of the question. I got Lucene and it was a matter of a couple of hours to write an indexer to fetch data out of PostgreSQL and import it into Lucene. Now some of my observations were really stunning:
- Data was coming out of PostgreSQL views, which were simply natural joins of some six tables (poet, book, part, poem, block, verse), all indexed, etc. The database was tuned up to the best of my knowledge (shared memory size, vacuumed, etc). Lucene and the indexer were running on another machine. The indexing took just under one minute, with the PostgreSQL server making its machine just unusable in this period, perhaps writing join tables on hard disk and fetching back later, etc, while the Lucene machine was as happy as a machine can be.
- The raw SQL dump of the data was 45MiB, which Bzip2 would reduce to 17MiB. The PostgreSQL database to hold this data takes more than 70MiB, not talking about an indexing system on top of that. In Lucene, for each field you can select at index time whether you like this field to be stored in the database (to be returned at search time) or not. I could simply store primary keys to my RDBMS database, but decided to store the whole text in Lucene, since after all, stored AND indexed, the database was a small 30MiB file!! and my search page didn't need to contact the RDBMS for search excerpts anymore.
- For a small project like mine, that didn't need almost any of RDBMS's glories --or to be honest it needs, but the performance of joins I like is not satisfying at all--, I may decide to move completely to Lucene. It provides all I want, and at least fetching number of rows is far cheaper than in PostgreSQL for example. (Don't argue about MySQL and others, they barely have things like views, schemas, etc.)
Update: I forgot to mention where IMHO the speed of Lucene comes from. From looking at the code, Lucene (and probably many other small-scale (and large-scale too?) search engines) works with bit vectors over all documents. So a complex query can be performed by bitwise operations over the long bit vectors of basic queries, each on one phrase. Now you will probably say a bit vector over all documents is HUGE, but no: eight million documents take only one megabyte, which is negligible these days. And eight million documents is pretty much more than what you find on any website.
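A rough sketch of that idea in C, under stated assumptions: each basic query is taken to have already produced a bit vector with one bit per document, and a compound query is just a word-by-word AND, OR, or AND-NOT of those vectors. This is not Lucene's actual code (Lucene is written in Java); all names here (bitvec_bytes, bitvec_and, bitvec_or, bitvec_andnot, bitvec_test) are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for one bit per document, rounded up. */
size_t
bitvec_bytes (size_t n_docs)
{
  return (n_docs + 7) / 8;
}

/* dst = a AND b: documents matching both basic queries. */
void
bitvec_and (uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n_words)
{
  size_t i;
  for (i = 0; i < n_words; i++)
    dst[i] = a[i] & b[i];
}

/* dst = a OR b: documents matching either basic query. */
void
bitvec_or (uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n_words)
{
  size_t i;
  for (i = 0; i < n_words; i++)
    dst[i] = a[i] | b[i];
}

/* dst = a AND NOT b: documents matching a but not b. */
void
bitvec_andnot (uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n_words)
{
  size_t i;
  for (i = 0; i < n_words; i++)
    dst[i] = a[i] & ~b[i];
}

/* Returns 1 if document number doc is set in v, 0 otherwise. */
int
bitvec_test (const uint32_t *v, size_t doc)
{
  return (int) ((v[doc / 32] >> (doc % 32)) & 1u);
}
```

With these definitions, bitvec_bytes(8000000) is 1000000, which is the "eight million documents in one megabyte" figure above, and each operator handles 32 documents per loop iteration.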
Ghasedak, the Student Magazine
Sharvand English conducted an interview with
Ghasedak which is available
here. I've not read the interview yet. I used to be quite active on the Ghasedak board, and still contribute once in a while these days.
On Daylight-*Saving*
There's something I don't understand here: According to the
CNN article, the US may extend the daylight-saving period by two months, from the first Sunday of March to the last Sunday of November, which leaves only three months of the year in the
original time. The change is justified by saving 10'000 barrels of oil a day, which is less than 0.05% of the daily oil consumption in the US.
But my question is: If daylight-saving in March and November is so good and saves energy, why not save more and drop the hassle totally: Just adjust the time one hour and save all year round?
It's just asking people to go to work at 8 instead of 9, but in a way they don't nag.
UI readings
After discussion with
Roozbeh about my
multi-system calendar prototype (more on that later), I ordered
The Inmates Are Running the Asylum: Why High Tech Products Drive Us Crazy and How to Restore the Sanity, added
The Design of Everyday Things to get free shipping, needed still 1.x dollars, so felt lucky and picked
Keys to the Rain: The Definitive Bob Dylan Encyclopedia from Amazon suggestions too. Not that merely buying good books makes me more/less sophisticated, I still got the last batch (Bob Dylan Songs/Chronicles vol. I, LaTeX Companion 2nd ed., Chicago Manual of Style 15th ed.) to read, but at least that makes me
look more sophisticated to the layperson^Wpeople around me. They typically can't tell the difference between me before and after reading these books anyway...
Planets revisited
A few hours ago. It's barely close to midnight in some significant timezones, yet the two planets have boomed: Please welcome
Planet GNOME and
Planet KDE. If I was at
Google I would have made sure to crawl planet.gnome.org today and not again as long as possible. That could route some KDE traffic
our way. :D