Behdad Esfahbod's daily notes on GNOME, Pango, Fedora, Persian Computing, Bob Dylan, and Dan Bern!

My Photo
Name:
Location: Toronto, Ontario, Canada

Ask Google.

Contact info
Google
Hacker Emblem Become a Friend of GNOME I Power Blogger
follow me on Twitter
Archives
July 2003
August 2003
October 2003
November 2003
December 2003
March 2004
April 2004
May 2004
July 2004
August 2004
September 2004
November 2004
March 2005
April 2005
May 2005
June 2005
July 2005
August 2005
September 2005
October 2005
November 2005
December 2005
January 2006
February 2006
March 2006
April 2006
May 2006
June 2006
July 2006
August 2006
September 2006
October 2006
November 2006
December 2006
January 2007
February 2007
March 2007
April 2007
May 2007
June 2007
July 2007
August 2007
September 2007
October 2007
November 2007
December 2007
January 2008
February 2008
March 2008
April 2008
May 2008
June 2008
July 2008
August 2008
October 2008
November 2008
December 2008
January 2009
March 2009
April 2009
May 2009
June 2009
July 2009
August 2009
November 2009
December 2009
March 2010
April 2010
May 2010
June 2010
July 2010
October 2010
November 2010
April 2011
May 2011
August 2011
September 2011
October 2011
November 2011
November 2012
June 2013
January 2014
May 2015
Current Posts
McEs, A Hacker Life
Monday, April 11, 2005
 Puzzle of the Week: select count(*) on bit vectors

Recently I mentioned that counting the number of set (a.k.a on, 1) bits in a bit vector is fast. Well, that's definitely not true. This puzzle is about that problem. A naive approach is this C function:

int
count_bits_naive (unsigned char *buf, int len)
{
int n = 0;
for (; len > 0; len--, buf++)
for (; *buf; *buf >>= 1)
n += *buf & 1;
return n;
}

I want to see how much one can speed up this code. Valueable responses would include code, statistics, and theoretical analysis. Assume a 32 or 64-bit machine.

I've got a quite interesting idea about that myself, not sure how it does in reality. BTW, like the sample code above, assume for simplicity that you can trash the array in place.

Submit your solutions through comment system or via email (preferrably) before Sunday April 17, 2005.

Comments:
I would precompute the number of set bits for every 8-bit value and store it in a 256-byte array:

char a[256] = {0};
for (int i = 1; i < 256; ++i)
a[i] = a[i/2] + i%2;

int n = 0;
for (; len > 0, --len)
n += a[*buf++];
return n;

It is also possible to precompute more values at the expense of higher memory usage.
 
what if we want to count numner of 0's? I am new to computer science and I was wondering if the same algorithm will work for 0's or not. (please mention a book about counting various numbers in datastructures if possible)
 
Have you seen this:
http://graphics.stanford.edu/~seander/bithacks.html
 
Thanks a lot, that page made my night! Just submitted some corrections.
 
#define TRESULT long
BYTE f(TRESULT vect)
{
  TRESULT lululu;
  BYTE sum=0;
  for(lululu=1;lululu;lululu<<=1)
   if(vect&lululu)
    sum++;
  //for(;;)love(!); I don't have much memory, but my CPU is fast!
  return sum;
}
 
Thanks to everybody who commented. Indeed using a precomputed table is the fastet way. But Xet, you really want to use an static table man. For the database search results bit vector that is sparse, a word-wide check for zero words can speed up by a constant factor still.

But on the smart solutions, what I was looking for was a way that takes time proportional to the number of set bits in a word, not the number of bits in the word, and indeed Brian Kernighan's method does exactly that. Thanks to Owrdac for the BitHacks link. You can find it here.

Thanks for the anonymous love letter h.

And finally, my rather dull idea was: If you have two bit vectors A and B, then C=(A|B) and D=(A&B) are two new vectors such that n(A)+n(B) == n(C)+n(D) where n(X) is the number of set bits in X. Moreover, n(C) >= max(n(A), n(B)), and n(D) <= min(n(A), n(B)). So if you have cheap operations for bitwise or, bitwise and, check for all-zero, and check for all-one on vectors, we could use this algorithm: given vector X, if X is all-zero or all-one, return with the answer, otherwise break X into X1 and X2, boost the number of set bits towards all-one and all-zero, by replacing X1 and X2 with (X1|X2) and (X1&X2) and recurse. Apparently this is of no use in Turing-equivalent machines we have today, but maybe in some quantum mechanics computers in the future ;).
 
Post a Comment



<< Archive
<< Home