Author:: José Valim <jose(dot)valim(at)gmail(dot)com>
Status:: Draft
Type:: Standards Track
Created:: 02-Jun-2021

EEP 59: Module attributes for documentation #

Abstract #

This EEP proposes a structured documentation API for Erlang where the documentation is handled as part of the language parser and included directly in the compiled .beam files, as a replacement for EDoc. Python, Elixir, and Clojure are examples of languages that follow this approach of treating documentation as data rather than code comments.

Rationale #

The main limitation in EDoc today is that the documentation is kept as code comments. This requires an explicit tool to parse said code comments, which complicates access to the docs by IDEs, from the shell, etc. EDoc can now emit EEP 48 documentation chunks, which improves the situation, but doing so still requires an explicit step.

Furthermore, the “code comments” approach is more complex implementation-wise, as it requires parsing the source code alongside code comments, parsing the code comments, and so on. Personally, I also enjoy the explicit distinction between documentation and code comments: they have different requirements and different audiences.

This EEP proposes the addition of two module attributes to Erlang: -doc and -moduledoc. It also includes an extra section on additional features that can aid writing documentation, but they are optional and some of them should likely be explored in more depth in other EEPs. Still, they are included to provide a long-term view of the features and challenges related to structured documentation.

As with EEP 48, this proposal pertains exclusively to API references and their documentation. It doesn’t cover guides, tutorials, and other documentation formats.

New module attributes #

This EEP proposes two new attributes: -doc and -moduledoc. They could be used as follows:

-module(base64).
-moduledoc "
Convenience functions for encoding and decoding from base64.
".

-doc "
Encodes the given binary to base64.
".
-spec encode(binary()) -> binary().
encode(Binary) ->
  % ....

-doc "
Decodes the given binary from base64.
".
-spec decode(binary()) -> {ok, binary()} | error.
decode(Binary) ->
  % ....

The new -moduledoc attribute can be listed anywhere in the module and contains the documentation for that module. The -doc attribute must be listed before a function definition, in any position relative to that function's other attributes such as -spec, and contains the documentation for the function that follows. For instance, the example below:

-doc "Example".
-spec example() -> ok.
example() -> ok.

is equivalent to:

-spec example() -> ok.
-doc "Example".
example() -> ok.

Listing multiple -doc attributes with string values for the same function should warn or error accordingly, unless the documentation is being set to hidden. For example, this is valid:

-doc "Example".
-doc hidden.
example() -> ok.

But this should warn/error:

-doc "Example".
-doc "Updated example".
example() -> ok.

This as well:

-doc "Example".
example(one) -> 1;
-doc "Updated example".
example(two) -> 2.

Hidden docs #

The -moduledoc attribute must be either a string or the atom hidden. Marking a module as hidden means it won’t be part of the generated documentation. For example, imagine the base64 module above delegates some of its logic to a private base64_impl module:

-module(base64_impl).
-moduledoc hidden.

Note a module may be hidden but individual functions can still be documented:

-module(base64_impl).
-moduledoc hidden.

-doc "
Some comments as if it was public.
".
decode64(Binary) ->
  % ...

According to EEP 48, this is intentional. For example, base64_impl should be private for users of the base64 functionality, but a developer working directly on the base64 module may still want to access the docs for base64_impl functions directly from their IDE. Each documentation tool should honor hidden accordingly. If no -doc is provided, the documentation defaults to none, as per EEP 48.

The -doc attribute accepts the hidden atom too.
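
As an illustration of how a tool might honor hidden, here is a minimal sketch that filters a docs_v1 tuple (the shape described in EEP 48) down to the entries a public-facing renderer would show. The module and function names are hypothetical and not part of OTP. An IDE could skip this filter and show every entry.

```erlang
-module(doc_visibility).
-export([public_entries/1]).

%% Hypothetical helper, not part of OTP.
%% Takes a docs_v1 tuple as described in EEP 48 and returns the entries
%% a public-facing documentation tool would render.
public_entries({docs_v1, _Anno, _Lang, _Format, hidden, _Meta, _Docs}) ->
    %% The whole module is hidden, so nothing is rendered publicly.
    [];
public_entries({docs_v1, _Anno, _Lang, _Format, _ModuleDoc, _Meta, Docs}) ->
    %% Keep documented and undocumented (none) entries, drop hidden ones.
    [Entry || {_KindNameArity, _A, _Sig, Doc, _M} = Entry <- Docs,
              Doc =/= hidden].
```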

File/include docs #

Some developers prefer not to place the documentation alongside the source code. For such cases, -doc and -moduledoc may also accept a {file, Path} tuple, where Path is a relative path from the root of the project to the documentation source:

-moduledoc({file, "doc/src/manual/my_module.asciidoc"}).
-doc({file, "doc/src/manual/my_module.my_function.asciidoc"}).

The file will be read by the compiler and embedded into the Docs chunk at compilation time.
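
A rough sketch of how the attribute value could be resolved into the content stored in the chunk is shown below. The module name, the function name, and the ProjectRoot parameter are assumptions made for illustration; the EEP does not say how the compiler locates the project root.

```erlang
-module(doc_source).
-export([resolve/2]).

%% Hypothetical helper, not part of OTP.
%% Turns the value given to -doc or -moduledoc into the value that
%% would be embedded. ProjectRoot is an assumed parameter.
resolve(hidden, _ProjectRoot) ->
    {ok, hidden};
resolve({file, Path}, ProjectRoot) ->
    %% file:read_file/1 returns {ok, Binary} | {error, Reason}.
    file:read_file(filename:join(ProjectRoot, Path));
resolve(Doc, _ProjectRoot) when is_binary(Doc) ->
    {ok, Doc};
resolve(Doc, _ProjectRoot) when is_list(Doc) ->
    {ok, unicode:characters_to_binary(Doc)}.
```

A missing file surfaces as {error, enoent}, which a compiler could report as a compilation error.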

Private functions #

It is up for debate whether private functions should support the -doc attribute. Elixir warns if this is used, but this has been a source of complaints in the past. Given that Erlang declares the visibility of a function outside of its definition, via the -export attribute, I personally argue that Erlang should allow -doc for non-exported functions, especially to avoid warnings when options such as -compile(export_all) are used. In such cases, however, the values of the -doc attribute should never go to the Docs chunk, as per EEP 48.
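
The rule that docs for non-exported functions never reach the chunk could look roughly like the sketch below. The module name, function name, and the {{Name, Arity}, Doc} pair format are assumptions for illustration only.

```erlang
-module(doc_exports).
-export([exported_only/2]).

%% Hypothetical helper, not part of OTP.
%% Collected :: [{{Name, Arity}, Doc}] gathered from -doc attributes.
%% Exports   :: [{Name, Arity}] taken from the -export attributes.
%% Returns only the pairs whose function is exported.
exported_only(Collected, Exports) ->
    [Entry || {NameArity, _Doc} = Entry <- Collected,
              lists:member(NameArity, Exports)].
```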

Callbacks and types #

The -doc attribute can also be used to document types and callbacks. Here, we have two options:

1. Use the -doc attribute
2. Introduce distinct -callbackdoc and -typedoc attributes

Those approaches are mostly equivalent.
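
As an illustration of the first option, the fragment below places -doc directly before a type and a callback. It uses the syntax proposed in this EEP, so it assumes a compiler that implements the proposal; the module, type, and callback names are made up for the example.

```erlang
-module(codec).
-export_type([encoded/0]).

%% -doc applies to the type declaration that follows it.
-doc "A binary holding base64-encoded data.".
-type encoded() :: binary().

%% -doc applies to the callback declaration that follows it.
-doc "Invoked to encode a binary. Implementations return base64 data.".
-callback encode(binary()) -> encoded().
```

Under the second option, the same two entries would instead be written with -typedoc and -callbackdoc.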

Metadata #

The new module attributes must also support documentation metadata by passing a map as argument:

-module(base64).
-moduledoc "
Convenience functions for encoding and decoding from base64.
".
-moduledoc #{
  authors => [<<"The Erlang/OTP team">>],
  license => <<"Apache 2 License">>,
  cross_references => [binary]
}.

If -moduledoc is given a map multiple times, the maps are merged. An added benefit is that shared metadata can be moved to a header file:

%% prelude.hrl
-moduledoc #{
  authors => [<<"The Erlang/OTP team">>],
  license => <<"Apache 2 License">>
}.

which we can then include and augment:

-module(base64).
-include("prelude.hrl").
-moduledoc "
Convenience functions for encoding and decoding from base64.
".
-moduledoc #{cross_references => [binary]}.

A list of built-in metadata keys is available in EEP 48.
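
The merging of metadata maps can be sketched as a fold over the maps in source order. The module and function names are hypothetical. The sketch also assumes that, when the same key appears twice, the later attribute wins; the EEP does not specify how conflicts are resolved.

```erlang
-module(doc_metadata).
-export([merge/1]).

%% Hypothetical helper, not part of OTP.
%% Folds the maps given to successive -moduledoc attributes, in source
%% order, into one metadata map.
%% Assumption: on duplicate keys the later map wins, which is what
%% maps:merge/2 does for its second argument.
merge(Maps) ->
    lists:foldl(fun(Map, Acc) -> maps:merge(Acc, Map) end, #{}, Maps).
```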

Compilation #

Compiling a module with the -moduledoc or -doc attributes will automatically generate a Docs chunk into its .beam file, making the documentation directly accessible in the shell. A -nodocs flag can be added to erlc to skip the generation of the chunk.

Release tools should also prune the Docs chunk out of .beam files by default. Note this is already done by beam_lib:strip_release/1 and beam_lib:strip_files/1.
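
To show what "directly accessible" can mean in practice, here is a small sketch that reads and decodes the Docs chunk with beam_lib:chunks/2. The module and function names are hypothetical; the chunk content is assumed to be a term stored with term_to_binary/1, as EEP 48 describes.

```erlang
-module(docs_chunk).
-export([read/1]).

%% Hypothetical helper, not part of OTP.
%% Reads and decodes the "Docs" chunk of a compiled module.
%% Beam is a path to a .beam file or the binary of a loaded module.
read(Beam) ->
    case beam_lib:chunks(Beam, ["Docs"]) of
        {ok, {_Module, [{"Docs", Bin}]}} ->
            %% The chunk holds an external-term-format binary.
            {ok, binary_to_term(Bin)};
        {error, beam_lib, Reason} ->
            %% For example a missing chunk after stripping,
            %% or a file error for a path that does not exist.
            {error, Reason}
    end.
```

After a release tool strips the file, the same call is expected to return an error rather than the documentation.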

Documentation format #

One important question is which format the documentation itself should use. Luckily, EEP 48 is agnostic to the format; however, one must still be listed. There are a couple of options:

1. Erlang/OTP picks a documentation format as the default for the Erlang community, such as AsciiDoc.
2. The documentation format must be explicitly listed via a new attribute called -docformat, such as -docformat "text/asciidoc".

This proposal does not attempt to discuss the preferred documentation format, as it is a separate discussion. It is also possible to support both options above (pick a format but allow it to be overridden).

Other topics #

This section lists other topics and future extensions that are related to this EEP but are not required for its implementation.

Unicode binary strings #

In the examples above, we have used char lists for the documentation. However, the Docs chunk requires the documentation to be a binary string. We have a couple of options:

1. Support binaries exclusively
2. Support lists exclusively with automatic conversion to binaries at some point
3. Support both prior formats and automatically convert lists to binaries at some point - preferably in the parser/compiler, to avoid pushing this concern to everyone downstream

The third option is likely the most reasonable for Erlang, since the string module treats both lists and binaries as strings.
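
A minimal sketch of the conversion step in the third option follows, using unicode:characters_to_binary/1. The module and function names are hypothetical, and raising an error on invalid input is an assumption; a compiler would more likely report a diagnostic.

```erlang
-module(doc_normalize).
-export([to_binary/1]).

%% Hypothetical helper, not part of OTP.
%% Accepts documentation as a binary or as a char list and always
%% returns a binary. Binaries are passed through unchanged, so a
%% latin1 binary is not re-encoded here.
to_binary(Doc) when is_binary(Doc) ->
    Doc;
to_binary(Doc) when is_list(Doc) ->
    case unicode:characters_to_binary(Doc) of
        Bin when is_binary(Bin) -> Bin;
        _Error -> error({invalid_doc_string, Doc})
    end.
```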

Note that similar concerns apply to documentation metadata. Attributes such as License, Authors, and so on, are required to be binaries. Since those attributes are known upfront, they can be validated/normalized by the parser/compiler.

With this in mind, it is important to remember that Erlang binaries are latin1 by default. Therefore, if the documentation contains non-latin1 characters, one would have to write:

-doc <<"Documentation about púnctuation"/utf8>>.

Similarly, the Authors metadata:

-doc #{
  authors => [<<"José Valim"/utf8>>]
}.

For this reason, the Build and Packaging Working Group of the Erlang Ecosystem Foundation has also discussed the option of making Unicode binary strings easier to write in Erlang, such as u"Héllò World". While adding such a construct is not a direct part of this EEP, the point made in this subsection is that, if we require the documentation or its metadata to be a binary, we may need to provide conveniences for writing UTF-8 binary strings in Erlang.
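
To make the latin1 default concrete, the sketch below builds the same character with and without the utf8 type specifier. The module and function names are made up for the example, and the source file is assumed to be UTF-8 encoded, which is the default for Erlang source files.

```erlang
-module(utf8_binaries).
-export([demo/0]).

%% Illustrative only: the same character stored two ways.
demo() ->
    Latin1 = <<"ú">>,       %% one byte holding the code point 250
    Utf8 = <<"ú"/utf8>>,    %% two bytes, the UTF-8 encoding of U+00FA
    {Latin1, Utf8}.
```

Only the second binary is valid UTF-8, which is why the /utf8 suffix matters for documentation stored as binaries.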

Triple-quoted strings #

Another feature often related to structured documentation is triple-quoted strings. To understand why they may be useful, consider the following example that has a double quote in its documentation:

-doc "Remove double-quotes (\") from the given string".
remove_double_quotes(String) ->
  ...

For this reason, we may support triple-quoted strings, which reads better in multi-line format and reduces the need for escaping:

-doc """
Remove double-quotes (") from the given string
""".
remove_double_quotes(String) ->
  ...

Note, however, this is not a strict requirement. While Python and Elixir provide this feature, Clojure is a counterexample: a language with structured documentation but without triple-quoted strings. On the other hand, some features such as doctests, detailed below, may strongly rely on triple-quoted strings.

Doctests #

A direct consequence of making the documentation more structured and accessible is that Erlang can include doctests, which is the ability to run and validate the examples in your documentation. For example, someone could write this:

-doc """
Encodes the given binary to base64.

    1> base64:encode("hello").
    <<"aGVsbG8=">>

""".
-spec encode(binary()) -> binary().
encode(Binary) ->
  % ....

And then in your test suite:

doctests(_Config) ->
  ct_doctest:run(base64).

The doctest function will access the documentation entries in the base64 Docs chunk, extract all of the examples, and run them. Of course, while there is nothing stopping doctests from being implemented on top of EDoc today, this EEP makes doctests considerably simpler to implement.

Doctests would benefit from a separate EEP, as there are some extra considerations, such as doctesting exceptions, unparseable formats, etc., but it is worth mentioning them given their benefits to users and documentation authors.
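
To give a feel for the mechanics, here is a deliberately small sketch of the extract-and-run idea. It is not the ct_doctest module mentioned above; the module and function names are hypothetical. It assumes each example is a single "N> Expr." line followed by exactly one line of expected output, and it ignores exceptions, multi-line expressions, and non-ASCII text.

```erlang
-module(doctest_extract).
-export([examples/1, check/1]).

%% Returns [{Expression, Expected}] as binaries, one pair per shell prompt
%% line found in the documentation text.
examples(Doc) when is_binary(Doc) ->
    pair(binary:split(Doc, <<"\n">>, [global])).

pair([Line, Next | Rest]) ->
    case re:run(Line, "^\\s*[0-9]+> (.+)$", [{capture, all_but_first, binary}]) of
        {match, [Expr]} ->
            %% The line after the prompt is taken as the expected value.
            [{Expr, string:trim(Next)} | pair(Rest)];
        nomatch ->
            pair([Next | Rest])
    end;
pair(_) ->
    [].

%% Evaluates the expression and the expected output and compares the values.
check({Expr, Expected}) ->
    eval(Expr) =:= eval(<<Expected/binary, ".">>).

eval(Source) ->
    {ok, Tokens, _End} = erl_scan:string(binary_to_list(Source)),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    {value, Value, _Bindings} = erl_eval:exprs(Exprs, erl_eval:new_bindings()),
    Value.
```

Comparing evaluated values rather than printed text sidesteps formatting differences, at the cost of not handling output that cannot be parsed back into a term.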

`warn_missing_doc` #

Another optional feature is to support a -compile(warn_missing_doc) option that emits a warning whenever an exported function is not documented.
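
The check behind such a warning could be sketched as below. The module and function names are hypothetical. The sketch assumes that a function marked hidden counts as deliberately handled and therefore does not trigger a warning, which the EEP does not specify.

```erlang
-module(missing_doc).
-export([undocumented/2]).

%% Hypothetical helper, not part of OTP.
%% Exports :: [{Name, Arity}] taken from the -export attributes.
%% Docs    :: the entry list of an EEP 48 docs_v1 tuple.
%% Returns the exported functions that have no doc entry, or whose
%% entry is none. Assumption: hidden counts as an explicit decision.
undocumented(Exports, Docs) ->
    Documented = [{Name, Arity}
                  || {{function, Name, Arity}, _Anno, _Sig, Doc, _Meta} <- Docs,
                     Doc =/= none],
    [NameArity || NameArity <- Exports,
                  not lists:member(NameArity, Documented)].
```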

What about EDoc? #

If this proposal is accepted, what happens with EDoc?

My proposal would be for EDoc to be repurposed with a new backend that updates the EDoc documentation in existing files to the new module attribute format. This way, EDoc can eventually be deprecated and phased out.

The HTML rendering part of EDoc can be repurposed to work on the Docs chunk, allowing the Docs chunk to benefit from it. Another option is to use ExDoc, which supports Erlang projects either by running it as an escript or via Rebar3 integration.

Similarly, if the goal is to eventually phase out EDoc, the Erlang/OTP team needs to discuss what happens with the existing EDoc to XML conversion in erl_docgen. Two options exist:

1. Generate EDoc to XML one last time and use the XML as the source from now on
2. Convert EDoc to the documentation style in this proposal and teach erl_docgen how to convert the Docs chunk to XML

Copyright #

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.