On the Implementation of OPAMDOC
Early this year, Leo White started the implementation of opamdoc. Then Vincent Botbol worked on it from late May to mid August during his stay in Cambridge.
Now, Vincent's back to studying "computer science research" in Paris, and he continues working on opam-doc when he can on his free time.
I didn't really want to get involved too deep in the implementation of opamdoc (mainly because it takes time). Eventually, since I've ended up doing most of the technical-side implementation of the new ocaml.org web site, I had to eventually get into opamdoc's source code to integrate its output in the website.
If you look at the source code of
opamdoc, you'll see there's a
bunch of files at the root directory. Well, some of them are inherited
from the implementation of ocamldoc, and some are new. I've mostly
contributed to
generate.ml
and
opam_doc_config.ml
.
The former implements the HTML backend of opamdoc, and the latter
contains the JavaScript engine that loads the documentation. This blog
post is mostly about my experience with those two files.
generate.ml
This big file (about 1.5 kloc) contains the functions to retrieve the information from cmt, cmd, cmti, cmdi and cmi files in order to build the HTML document.
Side note:
The
.cmt
files were officially introduced in the standard OCaml compiler in version 4.00.0, so it's pretty recent work. Previously, they were produced by a separate tool developed at OCamlPro for TypeRex.
Two reasons why it's not super easy
Well, this can actually be explain in just a few words. To begin with, there are many cases to address. Well, that is not exact. I should say that there are may cases to address and those cases are fairly poorly documented, which is "normal" given that there was probably no particular motivation to put efforts into documentation. This is true for the source code of ocamldoc and for its official documentation.
For instance, if you look into
info.mli
, you
can see that the first type definition is:
type style_kind =
| SK_bold
| SK_italic
| SK_emphasize
| SK_center
| SK_left
| SK_right
| SK_superscript
| SK_subscript
| SK_custom of string
and the only documentation for this type is
(** The differents kinds of style. *)
.
Well, you can see that there's not really any
documentation needed for those SK_...
until... you see
SK_custom of string
. There you go! You have to guess...
It's not that hard when you have to guess a few times, it's a lot harder when you keep having to guess. That's my point.
Well, the other issue is more interesting. At the time ocamldoc
was
designed and implemented, I bet no one imagined what folks at
JaneStreet would do with OCaml! I'm talking about the implementation
of their open source library, known as Core
. "Core
& co" use a
lot of include
directives. The module Core.Std
includes a
lot of other modules, which also include modules. If you want to
generate a single full HTML page for Core.Std
, you'd end up with a
huge page. And such a page would contain a lot of information
coming straight from other pages, so you'd end up with hundreds of
megabytes. Instead of doing so, opamdoc generates only one page per
package and one per module. If a module includes another one, then the
first will fetch the documentation of the second and there you go. So
we only have 8.4MB of HTML for {async, async_core, async_extra,
async_unix, core, core_bench, core_extended, core_kernel} in total
(well, this number should increase in the future, but linearly, as
people will hopefully start documenting all those packages). And
that's why we have a JavaScript loader.
opam_doc_config.ml, or docs_loader.js
Since the JavaScript may have to load tens of megabytes of HTML, you have to program some nasty functions and loops... and at some point it does become big enough for your browser to stop responding while it's busy loading your documentation. So there are several solutions to that. The best would probably be to stop writing in JavaScript (and use something else that compiles to JavaScript). But that's for next step. Now, we're trying to make the JavaScript work.
The problem with JavaScript is that basically there is one "event"
loop, and all events are happening sequentially or concurrently,
depending on how you wrote your JS, but when one event is making the
browser busy, the browser is unable to do anything else. That's why
your browser may tell you at some point that you have a script making
it ill and ask you whether you want it to stop executing that script.
One workaround for that problem when you know you ask for a lot of
computation time is to divide your big computation into smaller ones.
You can use window.setTimeout
for that, meaning you transform your
recursive calls like f()
into window.setTimeout(f, 0)
so that f
will be called at some point. And if you have iterative loops instead,
write them recursive and use window.setTimeout
. That's bad. Because
then the browser is unable to tell that it's crazy busy... and it's
still really busy. So you can increase the value 0 to 1 or 10 or more.
But if you increase it too much, it'll become too slow...
Ok, well, don't use JavaScript. It's a nightmare.
We will probably rewrite the script using Js_of_ocaml at some point, mainly because when you write in OCaml, you have static type checking, and that saves so much time!
To be followed...
started on 2013-09-24 14:21:00+00:00, (re)generated on 2014-01-15 15:14:11+00:00