## (Exercise) Are there more 1s or 0s?

The challenge is this:

Provide a program that will determine, given a set of 1s and 0s, if there are more 0s or more 1s.

So the first question that comes to my mind is about the representation of that set of 1s and 0s. Is it a linked list, a double linked list, an array, a tree, a string, or something else?

If it's a list (single linked or double linked actually doesn't matter much), it's going to be pretty straightforward: just read the list and use a counter. Yes, a single counter is enough, you can add one to the counter when you meet a 1, and subtract one when you meet a 0. At the end, if your counter has its initial value, then you have the same number of 1s and 0s. If it's lesser, then it has more 0s, else it has more 1s.

Let's declare a type for the result.

```
type result = Equal | More_zeros | More_ones
[try]
```

```
let count l =
let rec loop counter = function
| [] -> counter
| 1::tl -> loop (counter+1) tl
| 0::tl -> loop (counter-1) tl
| _::tl -> loop counter tl
(* you might want to debate whether you should stop
when you get something else than a 0 or a 1. *)
in
let r = loop 0 l in
if r = 0 then
Equal
else if r > 0 then
More_ones
else
More_zeros
[try]
```

Well, what's the risk with this program? The risk is that the integer
`counter`

overflows, of course! If you have a very long list of 1s (or
0s) only, you may get things just wrong. However, in OCaml, in
reality, there's not really a chance that the integer overflows
because of the length of a single linked list, especially if you're
using a 64-bit architecture based OCaml, on which the greatest integer
is 4_611_686_018_427_387_903 (about 4⨉10^{18}). There's really
a long way to have such a long list because basically you would need
to allocate more than (about 32⨉10^{6} terabytes) at once,
since basically a linked list of integers is made of blocks that have
2 cells each (one for the integer, one for the address of the next
cell), each cell taking 64 bits (or 8 bytes).

But then, what if you don't have linked lists but some stream that
gives you a very large number of 0s and/or 1s? Well, to begin with,
counting from 0 to 4⨉10^{18} takes a really long time. If your
machine can count from 0 to 10^{9} in a single second (that
would mean your machine is very fast), it would still take
4⨉10^{9} seconds, which is about 4000000000/(60*60*24*365)
years. Oh, that means about 126 years! So let's just assume that a 63
bit signed integer is enough for us. And if you really can't assume
that for some reason, you can always implement 128 bit signed integers
quite easily, and if you don't know how to do that or if you're too
lazy to do it, use the Big_int module.

But let's go back the representation of those 0s and 1s. I'd like to
make the computation as fast as possible and I'll put those 0s and 1s
in a very compact representation. Each 0 and 1 will now only take one
bit in the memory (with a possible constant overhead for the whole).
For that, let's use OCaml's strings, which are basically arrays of
bytes. The longest string I can have on my machine will bear
1_152_921_504_606_846_904 bits (I know that because I multiplied
Sys.max_string_length by 8), and that's a lot (more than
10^{8}).

Now say we want to know whether there are more 0s or 1s as fast as possible. How do we do that?

We don't want to count all 0s and 1s bit by bit, because that would have quite a high cost! Indeed, we would have to get each byte, and for each byte we would have to read each of its 8 bits (that can be done) one by one. We don't want to do that.

Instead, since we have bytes, we can conveniently allocate an array of size 256. Each cell of that array will contain the right number to add to the counter. This way, we can read a byte, get its number of 0s and 1s in O(1).

```
(* this table is computed only once *)
let table =
let number_for_a_byte b =
let r = ref 0 in
for i = 0 to 7 do
if (b lsr i) land 1 = 0 then
decr r
else
incr r
done;
!r
in
let a = Array.make 256 0 in
for i = 0 to 255 do
a.(i) <- number_for_a_byte i
done;
a
[try]
```

Then let's abstract from the means to read new 0s and 1s, by assuming
we'll be provided a function `f`

that given `()`

will return 8 bits in
a value of type `char`

, and will raise the exception `End_of_file`

when it has no more bits to give.

```
let more_zeros_or_ones f =
let c = ref 0 in
begin try while true do
c := !c + table.(int_of_char(f()))
done with End_of_file -> () end;
if !c = 0 then
Equal
else if !c > 0 then
More_zeros
else
More_ones
[try]
```

Note that `int_of_char`

has a zero-cost (i.e., it "disappears" at
compile-time because `int`

and `char`

sort of share the same memory
representation). If you want better performance, you should inline
`f`

, provided that you know what it is (you may want to check if the
compiler does the inlining itself first, just in case).

Also, you may want to use a table with a size larger than 256 if you have a lot of memory but I'm not so sure you'd gain performance unless you use nasty tricks to read larger integers from a string. Then again, you may not end up using a string, in which case you have to think with the whole problem in mind.

*started on 2013-09-25 16:19:07+00:00, (re)generated on 2014-01-15 15:14:11+00:00
*