UTF-8 Bit Manipulation
Roozbeh:
My reasoning was that the current code is not worth changing without strong profiling data showing measurable gain in real-world use cases. That's all.
Now to your solution and questions. Your approach has two and a half major issues that make it unusable in real code:
- As you mention yourself, it uses 64-bit math,
- It assumes that shifting an integer by its width or more results in zero. That's undefined by the C language,
- It does a "x*3". On many processors that's best implemented as "(x<<1)+x".
So while it theoretically works, using nine integer operations, in practice it's unusable. Oh, your function produces the exact same values as in the glib table BTW. That's good.
Here is my solution that can be written as valid C code using 13 simple 32-bit operations:
def behdad_utf8_skipper(c):
    v = c ^ 0xff
    r = (v > 0xF) << 2
    v >>= r
    s = (v > 0x3) << 1
    v >>= s
    r |= s
    r |= (v >> 1)
    return (0x11234561 >> (r << 2)) & 7
It's basically a negation followed by a log2 straight from bithacks, followed by a table lookup. I particularly like the beautiful final constant.
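The three steps (negate, take log2, look up the length in the packed constant) can be checked by brute force over all 256 byte values. The following is only a sketch: the function is repeated so the snippet stands alone, and reference_skip is a hypothetical helper written for this check, not glib code. It counts leading one bits the slow way and assumes the convention that continuation bytes and the invalid bytes 0xFE and 0xFF skip one byte.

```python
def behdad_utf8_skipper(c):
    v = c ^ 0xff                 # invert: leading 1 bits become leading 0 bits
    r = (v > 0xF) << 2           # log2 step: is the top set bit in the high nibble?
    v >>= r
    s = (v > 0x3) << 1           # is it in the upper two bits of what remains?
    v >>= s
    r |= s
    r |= (v >> 1)                # r is now floor(log2(v)), or 0 when v is 0 or 1
    return (0x11234561 >> (r << 2)) & 7   # nibble r of the constant is the length


def reference_skip(c):
    """Hypothetical reference: count the leading 1 bits of the byte one by one."""
    ones = 0
    mask = 0x80
    while mask and (c & mask):
        ones += 1
        mask >>= 1
    # Assumed convention: ASCII (0 leading ones), continuation bytes
    # (1 leading one) and 0xFE/0xFF (7 or 8 leading ones) all skip 1 byte.
    if ones <= 1 or ones >= 7:
        return 1
    return ones


# Collect every byte value where the bit trick and the slow reference disagree.
mismatches = [c for c in range(256) if behdad_utf8_skipper(c) != reference_skip(c)]
print(mismatches)
```

An empty list means the bit-twiddling version agrees with the slow reference for every possible lead byte, under the convention assumed above.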
I leave it to others to measure if this is faster than the lookup-table in glib. Enjoyed working this out though. Everyone, go crazy, shave a few ops off!
Labels: bithacks, glib, utf8
In the News
[I didn't see enough buzz made about these, hence posting.] A couple of inspiring moves by FSF and Red Hat to end software patents:
While at it, also check out "What’s Going On With Red Hat Desktop Systems? An Update."
The list in there just doesn't do justice to how important a role Red Hat's Desktop team plays in the advancement of the Free Software desktop and laptop experience. With recent hires like Matthew Garrett, Richard Hughes, and William Jon McCann, you know how serious we are about doing the Right Thing. I feel so privileged to be part of that.
Labels: fsf, redhat
Help save the world and enjoy!
[While people are in the fundraising/donation mood...] On Saturday I will climb the tallest free-standing structure on land in the world as part of the 18th Annual Canada Life CN Tower Climb for WWF-Canada.
I will be climbing 1776 steps in my TEAM GNOME T-shirt, aiming for 25 minutes, and with a goal of raising $250 by tomorrow (Friday the 18th) night, to help stop global warming.
So here is your chance to help save the world and enjoy while I'm suffering.
Sponsor me now!
Labels: cn tower, gnome
History meme
Liked this one enough to bother.
[behdad:0 ~]$ uname -a
Linux behdad.behdad.org 2.6.24.4-64.fc8 #1 SMP Sat Mar 29 09:54:46 EDT 2008 i686 i686 i386 GNU/Linux
[behdad:0 ~]$ history | awk '{a[$2]++}END{for(i in a){print a[i] " " i}}' | sort -rn | head
1429 cd
685 vim
576 ls
377 ll
369 make
349 makenull
192 git
166 grep
120 python
107 evince
Where "ll" is the Red Hat / Fedora alias for "ls -l", and "makenull" is my alias for "make >/dev/null".
Labels: history, meme, pgo