On the Implementation of OPAMDOC

Early this year, Leo White started implementing opamdoc. Vincent Botbol then worked on it from late May to mid-August during his stay in Cambridge.

Vincent is now back in Paris studying "computer science research", and he keeps working on opam-doc in his free time when he can.

I didn't really want to get too deeply involved in implementing opamdoc (mainly because it takes time). But since I ended up doing most of the technical implementation of the new ocaml.org website, I eventually had to dig into opamdoc's source code to integrate its output into the site.

If you look at opamdoc's source code, you'll see there are a bunch of files in the root directory. Some of them are inherited from ocamldoc's implementation, and some are new. I've mostly contributed to generate.ml and opam_doc_config.ml. The former implements opamdoc's HTML backend, and the latter contains the JavaScript engine that loads the documentation. This blog post is mostly about my experience with those two files.

generate.ml

This big file (about 1.5 kloc) contains the functions that retrieve information from cmt, cmd, cmti, cmdi and cmi files in order to build the HTML documentation.

Side note:

The .cmt files were officially introduced in the standard OCaml compiler in version 4.00.0, so it's pretty recent work. Previously, they were produced by a separate tool developed at OCamlPro for TypeRex.

Two reasons why it's not super easy

This can actually be explained in just a few words. To begin with, there are many cases to address. That's not quite accurate: I should say that there are many cases to address, and those cases are rather poorly documented, which is "normal" given that there was probably no particular motivation to put effort into documentation. This is true both of ocamldoc's source code and of its official documentation.

For instance, if you look into info.mli, you can see that the first type definition is:

type style_kind =
  | SK_bold
  | SK_italic
  | SK_emphasize
  | SK_center
  | SK_left
  | SK_right
  | SK_superscript
  | SK_subscript
  | SK_custom of string

and the only documentation for this type is (** The differents kinds of style. *). Admittedly, those SK_... constructors don't really need any documentation... until you reach SK_custom of string. There you go! You have to guess...

Guessing a few times isn't that hard; it gets a lot harder when you keep having to guess. That's my point.
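To make the guessing concrete, here is a minimal sketch of how an HTML backend might render each style_kind. The html_tags and render_style helpers are hypothetical (this is not opamdoc's actual code), and treating the SK_custom string as a CSS class name is purely an assumption, which is exactly the problem: the type alone doesn't tell you what that string means.

```ocaml
(* Same constructors as the style_kind type from info.mli. *)
type style_kind =
  | SK_bold
  | SK_italic
  | SK_emphasize
  | SK_center
  | SK_left
  | SK_right
  | SK_superscript
  | SK_subscript
  | SK_custom of string

(* Hypothetical helper: opening and closing HTML tags for each style.
   ASSUMPTION: the SK_custom string is a style name, emitted here as a
   CSS class on a <span>. The real meaning is undocumented. *)
let html_tags = function
  | SK_bold -> ("<b>", "</b>")
  | SK_italic -> ("<i>", "</i>")
  | SK_emphasize -> ("<em>", "</em>")
  | SK_center -> ("<div style=\"text-align:center\">", "</div>")
  | SK_left -> ("<div style=\"text-align:left\">", "</div>")
  | SK_right -> ("<div style=\"text-align:right\">", "</div>")
  | SK_superscript -> ("<sup>", "</sup>")
  | SK_subscript -> ("<sub>", "</sub>")
  | SK_custom name -> (Printf.sprintf "<span class=\"%s\">" name, "</span>")

(* Wrap an already-rendered HTML fragment in the tags for [kind]. *)
let render_style kind body =
  let opening, closing = html_tags kind in
  opening ^ body ^ closing

let () =
  print_endline (render_style SK_bold "important");
  print_endline (render_style (SK_custom "warning") "careful")
```

Every constructor but the last has an obvious HTML counterpart; for SK_custom, whatever you write is a bet on the author's intent.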

The other issue is more interesting. When ocamldoc was designed and implemented, I bet no one imagined what the folks at Jane Street would do with OCaml! I'm talking about the implementation of their open source library, known as Core. "Core & co" use a lot of include directives: the module Core.Std includes many other modules, which themselves include modules. If you generated a single, fully expanded HTML page for Core.Std, you'd end up with a huge page, and since such pages would repeat a lot of information coming straight from other pages, the whole documentation would add up to hundreds of megabytes. Instead, opamdoc generates only one page per package and one per module. If a module includes another one, the first module's page fetches the documentation of the second, and there you go. As a result, we only have 8.4MB of HTML in total for {async, async_core, async_extra, async_unix, core, core_bench, core_extended, core_kernel} (this number should increase in the future, but only linearly, as people will hopefully start documenting all those packages). And that's why we have a JavaScript loader.
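Here is a back-of-the-envelope sketch of why fully expanded pages blow up while one page per module stays small: with expansion, an included module's documentation is copied into every page that (transitively) includes it. The module_doc type, the module names A to D, their sizes, and the fixed 10-byte cost of an include reference are all made-up assumptions; this is not how opamdoc measures anything.

```ocaml
(* Toy model of a module's documentation. All numbers are invented. *)
type module_doc = {
  name : string;              (* module name *)
  own_size : int;             (* bytes of documentation written for this module *)
  includes : module_doc list; (* modules pulled in with [include] *)
}

(* Size of this module's page if every include is expanded inline,
   recursively. *)
let rec inlined_size m =
  List.fold_left (fun acc inc -> acc + inlined_size inc) m.own_size m.includes

(* Size of this module's page if each include is only a reference
   (assumed to cost [link_size] bytes) resolved later by a loader. *)
let linked_size ~link_size m = m.own_size + (link_size * List.length m.includes)

(* Total size of all pages under a given sizing strategy. *)
let total size pages = List.fold_left (fun acc m -> acc + size m) 0 pages

(* A small hierarchy with shared includes (A is reached several ways). *)
let a = { name = "A"; own_size = 1000; includes = [] }
let b = { name = "B"; own_size = 100; includes = [ a ] }
let c = { name = "C"; own_size = 100; includes = [ a; b ] }
let d = { name = "D"; own_size = 10; includes = [ b; c ] }
let pages = [ a; b; c; d ]

let () =
  List.iter
    (fun m ->
      Printf.printf "%s: inlined %d, linked %d\n" m.name (inlined_size m)
        (linked_size ~link_size:10 m))
    pages;
  Printf.printf "total: inlined %d, linked %d\n" (total inlined_size pages)
    (total (linked_size ~link_size:10) pages)
```

With this toy hierarchy the expanded pages total 7610 bytes against 1260 for linked pages, and the gap widens as include chains get deeper.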

opam_doc_config.ml, or docs_loader.js

Since the JavaScript may have to load tens of megabytes of HTML, you end up writing some nasty functions and loops... and at some point the work becomes heavy enough that your browser stops responding while it's busy loading your documentation. There are several solutions to that. The best would probably be to stop writing JavaScript by hand (and use something else that compiles to JavaScript), but that's the next step. For now, we're trying to make the JavaScript work.

The problem with JavaScript is that there is basically a single event loop. Your code may look concurrent, depending on how you wrote it, but events are actually handled one after another, so while one event keeps the browser busy, the browser can't do anything else. That's why your browser may at some point tell you that a script is making it unresponsive and ask whether you want to stop executing it.

One workaround, when you know you need a lot of computation time, is to split your big computation into smaller ones. You can use window.setTimeout for that: you turn recursive calls like f() into window.setTimeout(f, 0), so that f will be called at some later point. If you have iterative loops instead, rewrite them recursively and use window.setTimeout. That's still bad, though, because the browser can no longer tell that it's crazy busy... and it's still really busy. So you can raise the delay from 0 to 1, 10 or more milliseconds. But if you raise it too much, loading becomes too slow...
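To show the chunking idea without a browser, here is a small OCaml simulation (not opamdoc's loader): a FIFO queue stands in for the event loop, and the hypothetical set_timeout_0 stands in for window.setTimeout(f, 0). A long job split into chunks lets a pending event, here a simulated click, run between chunks instead of waiting for the whole load.

```ocaml
(* A toy single-threaded event loop: callbacks run one at a time, to
   completion, in FIFO order, like events in the browser. *)
let queue : (unit -> unit) Queue.t = Queue.create ()

(* Stand-in for window.setTimeout(f, 0): schedule [f] to run later. *)
let set_timeout_0 f = Queue.add f queue

let run_event_loop () =
  while not (Queue.is_empty queue) do
    (Queue.pop queue) ()
  done

(* Process [items] in chunks of [chunk_size], giving control back to the
   event loop between chunks so other queued events can run. *)
let process_in_chunks ~chunk_size ~process items on_done =
  let n = Array.length items in
  let rec step start =
    let stop = min n (start + chunk_size) in
    for i = start to stop - 1 do
      process items.(i)
    done;
    if stop < n then set_timeout_0 (fun () -> step stop) else on_done ()
  in
  set_timeout_0 (fun () -> step 0)

(* Record what happened, in order. *)
let trace = ref []
let record s = trace := s :: !trace

let () =
  let pages = Array.init 5 (Printf.sprintf "page%d") in
  process_in_chunks ~chunk_size:2 ~process:record pages (fun () -> record "done");
  (* A user event arriving while the load is in progress. *)
  set_timeout_0 (fun () -> record "click");
  run_event_loop ();
  print_endline (String.concat " " (List.rev !trace))
```

Running it prints page0 page1 click page2 page3 page4 done: the click is handled right after the first chunk. chunk_size is a knob in the same spirit as the delay discussed above: bigger chunks mean fewer trips through the loop, but longer stretches during which nothing else can run.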

Ok, well, don't use JavaScript. It's a nightmare.

We will probably rewrite the script using Js_of_ocaml at some point, mainly because when you write in OCaml, you have static type checking, and that saves so much time!

To be continued...

started on 2013-09-24 14:21:00+00:00, (re)generated on 2014-01-15 15:14:11+00:00

tags: • ocaml • opamdoc