Glib bithacks
Because one blog post is typically not enough after a long day. I was looking into replacing the glib Unicode tables with ones generated by my
awesome compressor (lack of self-confidence!) when I found that the functions accessing those tables are in fact more in need of some love.
Here is the problem: there's an enum of some 30 entries, the Unicode general categories: this is a letter, this is a digit, etc. You want to test whether a character is in one of a few of these classes. Typically you write something like this:
#define ISDIGIT(Type) ((Type) == G_UNICODE_DECIMAL_NUMBER \
|| (Type) == G_UNICODE_LETTER_NUMBER \
|| (Type) == G_UNICODE_OTHER_NUMBER)
If you are efficiency-conscious, you may convince yourself that gcc takes care of it and collapses the chain of comparisons into a single test. Which it doesn't.
I solved this problem in FriBidi by assigning specially built values to my enum entries, but later found that to be overly complex for the task at hand. It also forces you into 32-bit enum entries, which may not be suitable. Here is part of the patch for my new solution, for your visual enjoyment:
-#define ISDIGIT(Type) ((Type) == G_UNICODE_DECIMAL_NUMBER \
- || (Type) == G_UNICODE_LETTER_NUMBER \
- || (Type) == G_UNICODE_OTHER_NUMBER)
-
-#define ISALPHA(Type) ((Type) == G_UNICODE_LOWERCASE_LETTER \
- || (Type) == G_UNICODE_UPPERCASE_LETTER \
- || (Type) == G_UNICODE_TITLECASE_LETTER \
- || (Type) == G_UNICODE_MODIFIER_LETTER \
- || (Type) == G_UNICODE_OTHER_LETTER)
-
-#define ISMARK(Type) ((Type) == G_UNICODE_NON_SPACING_MARK || \
- (Type) == G_UNICODE_COMBINING_MARK || \
- (Type) == G_UNICODE_ENCLOSING_MARK)
-
+#define IS(Type, Class) (((guint)1 << (Type)) & (Class) ? 1 : 0)
+#define OR(Type, Rest) (((guint)1 << (Type)) | (Rest))
+
+
+
+#define ISDIGIT(Type) IS((Type), \
+ OR(G_UNICODE_DECIMAL_NUMBER, \
+ OR(G_UNICODE_LETTER_NUMBER, \
+ OR(G_UNICODE_OTHER_NUMBER, 0))))
+
+#define ISALPHA(Type) IS((Type), \
+ OR(G_UNICODE_LOWERCASE_LETTER, \
+ OR(G_UNICODE_UPPERCASE_LETTER, \
+ OR(G_UNICODE_TITLECASE_LETTER, \
+ OR(G_UNICODE_MODIFIER_LETTER, \
+ OR(G_UNICODE_OTHER_LETTER, 0))))))
+
+#define ISMARK(Type) IS((Type), \
+ OR(G_UNICODE_NON_SPACING_MARK, \
+ OR(G_UNICODE_COMBINING_MARK, \
+ OR(G_UNICODE_ENCLOSING_MARK, 0))))
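Yes, good old Pascal-like bit-sets! Each nested OR(...) folds into a compile-time constant mask, so a class test boils down to one shift and one AND. This works because the enum has fewer entries than a guint has bits. The real patch is much longer. As a side benefit, the macros only expand Type once, so you don't need to allocate an intermediate variable for it. How's that?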
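For anyone who wants to play with the idea outside glib, here is a minimal standalone sketch. The Category enum and its CAT_* names are made up for illustration (they are not the real GUnicodeType values), and unsigned int stands in for guint. The helper at the end only checks that the bit-set test agrees with the plain comparison chain for every enum value.

```c
#include <assert.h>

/* Hypothetical stand-in for the real category enum; names and order are
 * illustrative only. Every value must be smaller than the number of bits
 * in unsigned int, otherwise the shifts below are undefined. */
typedef enum {
  CAT_CONTROL,
  CAT_LOWERCASE_LETTER,
  CAT_UPPERCASE_LETTER,
  CAT_DECIMAL_NUMBER,
  CAT_LETTER_NUMBER,
  CAT_OTHER_NUMBER,
  CAT_SPACE_SEPARATOR,
  CAT_LAST /* number of categories, not a category itself */
} Category;

/* Same shape as the macros in the patch, with unsigned int for guint. */
#define IS(Type, Class) ((((unsigned int)1 << (Type)) & (Class)) ? 1 : 0)
#define OR(Type, Rest)  (((unsigned int)1 << (Type)) | (Rest))

/* The nested OR()s fold into one constant mask at compile time. */
#define ISDIGIT(Type) IS((Type), \
        OR(CAT_DECIMAL_NUMBER, \
        OR(CAT_LETTER_NUMBER, \
        OR(CAT_OTHER_NUMBER, 0))))

/* The old-style comparison chain, kept only as a reference. */
static int isdigit_chain(Category t)
{
  return t == CAT_DECIMAL_NUMBER
      || t == CAT_LETTER_NUMBER
      || t == CAT_OTHER_NUMBER;
}

/* Asserts that both formulations agree on every category. */
static void check_bitset_matches_chain(void)
{
  int t;
  for (t = 0; t < CAT_LAST; t++)
    assert(ISDIGIT(t) == isdigit_chain((Category)t));
}
```

Calling check_bitset_matches_chain() from a test program is enough to confirm that the two versions are interchangeable for this toy enum. With a real enum, the thing to watch is that no entry reaches the bit width of the mask type.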