McEs, A Hacker Life
UTF-8 Bit Manipulation
Roozbeh:
My reasoning was that the current code is not worth changing without strong profiling data showing measurable gain in real-world use cases. That's all.
Now to your solution and questions. Your approach has two and a half major issues that make it unusable in real code:
- As you mention yourself, it uses 64-bit math,
- It assumes that shifting an integer by its width or more results in zero. That's undefined by the C language,
- It does a "x*3". On many processors that's best implemented as "(x<<1)+x".
So while it theoretically works, using nine integer operations, in practice it's unusable. Oh, your function produces the exact same values as in the glib table BTW. That's good.
Here is my solution that can be written as valid C code using 13 simple 32-bit operations:
def behdad_utf8_skipper(c):
    v = c ^ 0xff
    r = (v > 0xF) << 2
    v >>= r
    s = (v > 0x3) << 1
    v >>= s
    r |= s
    r |= (v >> 1)
    return (0x11234561 >> (r << 2)) & 7
It's basically a negation, followed by a log2 straight from bithacks, followed by a table lookup into a constant that packs eight 4-bit entries. I particularly like the beautiful final constant.
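For anyone who wants to see the three steps at work, here is a minimal sketch that checks the function over all 256 byte values. It assumes the lead-byte rule that the glib table follows, including the old 5- and 6-byte forms: the length is the number of leading one bits, and ASCII, continuation bytes, 0xFE and 0xFF all map to 1. The helper name reference_skip and the per-line comments are made up for this illustration and are not part of glib.

```python
def behdad_utf8_skipper(c):
    v = c ^ 0xff                  # negate the byte: leading ones become leading zeros
    r = (v > 0xF) << 2            # log2, step 1: is the top set bit in the high nibble?
    v >>= r
    s = (v > 0x3) << 1            # log2, step 2: narrow down to a 2-bit window
    v >>= s
    r |= s
    r |= (v >> 1)                 # log2, step 3: r is floor(log2(v)), or 0 when v <= 1
    return (0x11234561 >> (r << 2)) & 7   # pick nibble number r out of the constant


def reference_skip(c):
    """Hypothetical reference: count the leading one bits of the byte."""
    ones = 0
    mask = 0x80
    while mask and (c & mask):
        ones += 1
        mask >>= 1
    # 0 leading ones: ASCII; 1: continuation byte; 7 or 8: 0xFE / 0xFF.
    return ones if 2 <= ones <= 6 else 1


# The constant read as a nibble table, indexed by r = 0..7.
nibbles = [(0x11234561 >> (4 * r)) & 0xF for r in range(8)]

# Byte values where the bit-twiddling version and the reference disagree.
mismatches = [c for c in range(256)
              if behdad_utf8_skipper(c) != reference_skip(c)]

print(nibbles)
print(mismatches)
```

Read from the low nibble up, the constant is just the skip length for each possible log2 result, which is why it looks like a countdown.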
I leave it to others to measure if this is faster than the lookup-table in glib. Enjoyed working this out though. Everyone, go crazy, shave a few ops off!
Labels: bithacks, glib, utf8
In the News
[I didn't see enough buzz made about these, hence posting.] A couple of inspiring moves by FSF and Red Hat to end software patents:
While you're at it, also check out What’s Going On With Red Hat Desktop Systems? An Update.
The list in there just doesn't do justice to how important a role Red Hat's Desktop team plays in the advancement of the Free Software desktop and laptop experience. With recent hires like Matthew Garrett, Richard Hughes, and William Jon McCann, you know how serious we are about doing the Right Thing. I feel so privileged to be part of that.
Labels: fsf, redhat
Climbing CN Tower, check

So, on Saturday, I did it. Reached my goal by raising $253.67 in about 36 hours, and climbed up the 144 floors in just short of 19 minutes. Thank you to all who sponsored me. It means a lot to me!
I woke up at 8:30, took a shower, and walked down to the tower. Had my power bar and energy drink, checked in, and was ready to go. Had to wait in line for an hour to start.
The climb was pretty smooth. I'd never done stairs before. It was easier than I expected. The stair machine in the gym is nothing close to the real thing. I started by running up. Before I knew it I was at floor 30. Then I slowed down to my steady speed, keeping a constant heart rate. Floors 60 to 70 were crowded, so I had to slow down a bit. 70 to 110 was a bit breathtaking, but OK. From 110 up I was counting down the floors, and before I had a chance to start running to drain my remaining energy I was already at the checkpoint.
Worst part was that we had to do another 11 floors after the checkpoint to reach the common area. That was hard.
Was a great day. I spent the rest of the day walking around downtown, having brunch with friends, and otherwise enjoying the sun. Good times.
Labels: cn tower, gnome
Help save the world and enjoy!
[While people are in the fundraising/donation mood...] On Saturday I will climb the tallest free-standing structure on land in the world as part of the 18th Annual Canada Life CN Tower Climb for WWF-Canada.
I will be climbing 1776 steps in my TEAM GNOME T-shirt, aiming for 25 minutes, and with a goal of raising $250 by tomorrow (Friday 18th) night, to help stop global warming.
So here is your chance to help save the world and enjoy while I'm suffering.
Sponsor me now!
Labels: cn tower, gnome
History meme
Liked this one enough to bother.
[behdad:0 ~]$ uname -a
Linux behdad.behdad.org 2.6.24.4-64.fc8 #1 SMP Sat Mar 29 09:54:46 EDT 2008 i686 i686 i386 GNU/Linux
[behdad:0 ~]$ history | awk '{a[$2]++}END{for(i in a){print a[i] " " i}}' | sort -rn | head
1429 cd
685 vim
576 ls
377 ll
369 make
349 makenull
192 git
166 grep
120 python
107 evince
Where ll is the Red Hat / Fedora alias for ls -l, and makenull is my alias for make >/dev/null.
Labels: history, meme, pgo