# Type Expansion

Code for this post is available at 651548ffa8.

To confirm that a function type-checks correctly requires inferring the type of the return value and then checking it matches with the type signature. A return value can either be a function parameter, a return value of an inner expression, or a literal.

``````(sig return_value_as_parameter [Integer] Integer)
(def return_value_as_parameter [x]
x)

(sig return_value_from_inner_expression [] Integer)
(def return_value_from_inner_expression []
(* 3 (+ 1 1)))

(sig return_value_as_literal [] Integer)
(def return_value_as_literal [] Integer
0)
``````

In each of these cases, we can easily infer the type of the return value is `Integer` and that it matches with the signature. Things get a bit more complicated when you introduce algebraic data types.

``````(deftype IntOrString []
(union [Integer String]))

(sig foo [] IntOrString)
(def foo []
1)
``````

Now it’s not as simple to compare the inferred type of `Integer` with the signature’s type of `IntOrString`. Instead of an exact `Integer` to `Integer` comparision, we need to expand1 `IntOrString` into a list of its possible types, and then compare `Integer` to each of those until we find a match. If the return type is composed of several other union types, we need to expand recursively to get all possible types.

Recursive expansion does expose the possiblity of an infitie recursive type, like those that define lists and trees. The current version of the Erie compiler will infitie loop when it finds a type like this; fixing that will come later.

``````(deftype RecursiveList [a]
(union [{a nil} {a (RecursiveList a)}]))

; instance of (RecursiveList Integer) might look like
; {4 {3 {2 {1 nil}}}}

(deftype RecursiveTree [a]
(union [{'leaf a} {'left (RecursiveTree a) 'right (RecursiveTree a)}]))

; instance of (RecursiveTree Integer) might look like
; {'left {'leaf 1} 'right {'left {'leaf 2} 'right {'leaf 3}}}
``````

As of today, the Erie compiler is about as naive as it gets when it comes to type expansion. It all happens just-in-time and nothing is cached. This means every time the compiler encounters a signature with a return value of `IntOrString`, it will completely expand it and check, then throw everything away. This is no doubt the slowest way to do type checking, but it was the simplest way for now.

1. I'm using the phrase "type expansion", but I don't know if that is the phrase favored by compiler authors. My googling efforts have been fruitless. It makes sense to me so I'm sticking with it.

# Algebraic Data Types

Code for this post is available at 651548ffa8.

The builtin types are all well and good, but Erie isn’t much without a few additions: maps/dictionaries, structs, and especially alegbraic data types (ADT). Erie will eventually support all of these, but I focused on implementing ADTs first. I refer to them as “union” types throughout the compiler, but they are one in the same.

I want all Erie types to be declared with `deftype`, regardless of the ultimate shape of the type (i.e. ADT, struct, or type alias). By making the signature of `deftype` look like a function, it makes adding type parameters straight forward for supporting polymorphic types.

``````(deftype Maybe [a]
(union [a 'nil]))
``````

One big difference between Erie’s ADTs and those of other langauges that I’m familiar with is Erie doesn’t required “tags” for each option. For example, in Elm, representing `Maybe` looks like:

``````type Maybe a
= Just a
| Nothing
``````

Note how the non-`Nothing` case requires the `Just` tag. This means a value of type `String` does not conform to the `Maybe String` type in Elm. In Erie, `String` will conform to `(Maybe String)` because of the lack of tags. I’m not aware (at least not yet) of why tags are necessary. I may run into the reason later in development, but for now, it seems to be working. If you, as a developer, like the tagged approach you can mimic it with tuples in the definition.

``````(deftype Result [error value]
(union [{'ok value} {'error error}]))
``````

Erie treats a quoted symbol as a type with a single value of itself. It allows for the above which will integrate nicely with Erlang and Elixir.

In the future I want `deftype` to be a macro that generates useful functions regarding that type. Specifically allowing matching in a `case` expression. For now `deftype` is treated as a special form by the compiler. As I continue to expand the functionality of macros, a large amount of that work should shift out of the compiler and into Erie macros.

# Type Checker

Code for this post is available at ed4ee0b78.

Erie now has a very basic type checker. It supports simple builtin types like `Integer` and `String`, and the slightly more complex polymorphic list and tuple types. User-defined types are not supported yet.

## Testing

The type checker has provided a pretty amazing opportunity for test-driven development. The type checker code has so many paths and branches that having a complete test suite helps prevent me from introducing regression bugs.

I have also found `assert_raise` to be indespensible. A proper type checker would collect errors and print them all at the end of compilation, but for now, it `raise`s errors when the type checker finds a mismatch.

``````test "literal list type fails with mismatched types" do
code = """
(sig one [] [Integer])
(def one []
[1 2 "3" 4])
"""

{:ok, forms} = Parser.parse(code)
translator = Translator.from_parsed(forms, {:Core, 1}, false)

assert_raise RuntimeError, fn ->
TypeChecker.check(translator)
end
end
``````

## Error Messages

When writing a type checker, one should aspire to the level of detail and helpfulness in Elm type checker messages. The errors `raise`d by the Erie type checker right now are just good enough to help me debug any example Erie code I write, but definitely not good enough to live in the compiler long-term.

``````def expression_type({:var, _, name}, bindings, _signatures) do
bindings
|> Enum.find(fn {n, _} -> n == name end)
|> case do
nil -> raise "Returning a binding `#{name}` that isn't a function parameter"
{_, type} -> type
end
end
``````

As much as I’d like to add great error messages now, that’s a major project in and of itself.

# REPL Start

Code for this post is available at 2734fc099d.

A basic REPL will go a long way toward making Erie feel like a true Lisp. It doesn’t need to be fancy and support helper functions, tab-completion, or even the “up arrow” to redo the last expression. All we want (for now) is to read an expression, evaluate it on the fly, print the result, and repeat.

I was able to put together something that reads input and starts a new `Task` to compile and evaluate it. But this isn’t terribly useful because Erie is missing a standard library. The only functions that can be called are Elixir or Erlang functions (but they work!).

``````Core(1)> (Elixir.String.split "abc def ghi")
["abc", "def", "ghi"]
Core(2)> ('erlang.atom_to_binary 'derp 'unicode)
"derp"
``````

This is a nice start but defining our own functions in the REPL and calling them would be better.

In Elixir’s REPL, you can do this, but not using `def … do … end` (unless it’s wrapped in `defmodule … do … end`). That’s a bit obnoxious, so I want my REPL to handle `(def …)` gracefully. I do that by stealing a feature of Clojure’s REPL where you’re effectively “in” a module while in the REPL, so any function defined is a part of that module.

I want to make as much use of my `Translator` module as possible, but as it’s currently written, it requires all function definitions to be inside a module (now I see why Eilixr may have made that decision). To get around this, I wrap each call to `(def …)` in a new module thats name is based on a counter. For example, this defintion of `x` in the REPL

``````Core(1)> (def x [] 1)
``````

would be compiled as if I wrote

``````(defmodule Core1) (def x [] 1)
``````

And this defintion of `y`

``````Core(2)> (def y [] 2)
``````

would be compiled as if I wrote

``````(defmodule Core2) (def y [] 2)
``````

This works great, but now calling `x` the intended way produces an error because it’s not defined in the `Core` module.

``````Core(1)> (def x [] 1)
Core(2)> (Core.x)

20:27:31.177 [error] Task #PID<0.143.0> started from Erie.Repl terminating
** (UndefinedFunctionError) function :Core.x/0 is undefined (module :Core is not available)
``````

To solve this, we can keep track of all functions defined in the REPL and their associated modules. If one of them is invoked under the `Core` module, we can intercept the call and translate it to the correct module.

The REPL is by no means complete, but it feels good to be able to write some Erie and see it execute in realtime.

Code for this post is available at 0646b91480.

Erie can now handle some basic macros. Very basic. It only does macro expansion once, and it doesn’t correctly call Erie definied functions from within a macro.

Despite those limitations it can correctly compile this module. However, `cons` and `+` aren’t properly defined as functions, so executing either `one_plus_two` or `the_first` will fail.

``````(defmodule Core)

(defmacro infix [ast]
((lambda [list]
[(Elixir.Enum.at list 1) (Elixir.Enum.at list 0) (Elixir.Enum.at list 2)])
(Elixir.Enum.at ast 0)))

(def one_plus_two []
(infix (1 + 2)))

(defmacro a_list [ast]
['cons 1 nil])

(def the_first []
(Elixir.List.first (a_list nil)))
``````

I’m pretty happy with the progress I’ve made so far on Erie, but there is obviously a lot more to do. Some possible next steps are:

• REPL
• Types
• Standard library
• Syntax highlighting

More advanced macros can definitely wait. A standard library will be very handy, but it’s not preventing me from writing Erie right now. Types are going to be very important soon, but I think a REPL will have the biggest impact. The ability to write small bits of Erie and execute it immediately will help shape the ergonomics and dictate what should be included in the standard library.

# Macro Plan

No Lisp is complete without a macro system. If only it were so easy to implement.

Having never written a language before, let alone a macro system, I’m going to try to break it down into manageable pieces. I have no idea if this is the right approach, but we’ll start with it and see where it takes us.

The steps I’m envisioning are

1. Parse the code.
2. Find instances of `defmacro`.
3. Find invocations of each of those macros.
4. Convert the output of the parser into an Erie abstract syntax tree.
5. Execute each macro with the AST as the argument.
6. Support “unquoting” somehow.
7. Replace the macros in the AST with their results.
8. Repeat until all macros have been expanded.

The simplest macro I can think of is `comment`: replace the inner forms with a single `nil`.

``````(defmacro comment [ast]
nil)

(def loop []
(comment
(Enum.map [1 2 3]
(lambda [x]
(+ 1 x)))))
``````

### Step One

After parsing, we have this.

``````[
[
{:atom, 1, :defmacro},
{:atom, 1, :comment},
{:list, 1, [{:atom, 1, :ast}]},
{:atom, 2, nil}
],
[
{:atom, 4, :def},
{:atom, 4, :loop},
{:list, 4, []},
[
{:atom, 5, :comment},
[
{:symbol, 6, :"Enum.map"},
{:list, 6, [{:integer, 6, 1}, {:integer, 6, 2}, {:integer, 6, 3}]},
[
{:atom, 7, :lambda},
{:list, 7, [{:atom, 7, :x}]},
[{:atom, 8, :+}, {:integer, 8, 1}, {:atom, 8, :x}]
]
]
]
]
]
``````

### Step Two

We can find instances of `defmacro` just like we find instances of `def`. In this case we only find one: `comment`.

### Step Three

We find inovcations of `comment` by looking through the results of the parser for anything that looks like a function call to `comment`.

### Step Four

Converting the output of the parser into an Erie abstract syntax tree will look something like this.

``````[
[:defmacro, :comment, [:ast], nil],
[:def, :loop, [],
[:comment, [:"Enum.map", [1, 2, 3], [:lambda, [:x], [:+, 1, :x]]]]
]
]
``````

### Step Five

To execute the `comment` macro, we need to complie it as if it were a normal function. That is possible with `:compile.forms/1`, but as far as I know, it needs to be contained in a module.

Current strategy is to define all macros as functions in their “macro” module which will be prefixed with `MACRO`. E.g. the module `Erie.Core`’s macros will be defined as functions in `MACRO.Erie.Core`.

TBD.

### Step Seven

The result of the call to `comment` is `nil`, so we can replace that list in the parser output with `nil`. We’ll also remove the macro definition which should leave us with this.

``````[
[
{:atom, 4, :def},
{:atom, 4, :loop},
{:list, 4, []},
{:atom, 5, nil}
]
]
``````

### Step Eight

In this case nothing. `comment` doesn’t need any further expansion, so we can now continue the pipeline and pass this to the translator stage which will produce Erlang Abstract Form for final compilation.

# Proper let Syntax

Code for this post is available at e6c19d930f.

Most of the basic features of Erie are working, but I am still missing `let` bindings. I looked for inspiration in the syntax of other Lisps and found a couple of options. Common Lisp and Racket are techincally the same because Racket treats `[]` as `()`, but I included both because Racket often uses `[]` to make certain forms stand out.

None of the styles strike me as particularly better. Clojure has fewer characters, but I like the way CL and Racket group the binding and the value more explicitly.

### Clojure

``````(let [a 1 b 2]
(+ a b))
``````

### Common Lisp

``````(let ((a 1) (b 2))
(+ a b))
``````

### Racket

``````(let ([a 1]
[b 2])
(+ a b))
``````

Regardless of the syntax, implmenting `let` should be straight forward. Erlang Abstract Format supports `match` which is very similar. In Elixir

``````x = 42
``````

is represented as

``````{:match, 1, {:var, 1, :x}, {:integer, 1, 42}}
``````

in EAF and could be used to represent `let` bindings.

But before implementing that, I found this answer on StackOverflow and it revealed a new way to look at `let`. To sum it up, `let` doesn’t need to be implemented in the compiler. It can be a macro that converts its forms into a tree of nested lambdas. (Samples are using the currently proposed Erie `let` syntax.)

``````(let
[a 1]
[b 2]
(+ a b))
``````

is equivalent to

``````((lambda [a]
((lambda [b]
(+ a b)) 2)) 1)
``````

So instead of adding `let` as a feature in the compiler, I’ll save the syntax decision for when macros are implemented.

# Erie's Function Signatures

Code for this post is available at 9e7d81b810.

I had written previously that a function in Erie might look like this. After the function name is the list of arguments and their types, followed by the return type, a doc comment, and finally the function body.

``````(def to_string [result: (Result String Integer)] String
"""
Convert to result into a printable string to
be used in interfaces.
"""
(case result
[{'ok int} (Integer.to_string int)]
[{'error str} str]))
``````

After further deliberation, I realized including the argument and return types in the definition makes things a little bit awkward if I want to support pattern matching on the arguments. Not only does it make the types redundant, but the colon syntax gets weird.

``````(def to_string [{'ok int}: (Result String Integer)] String
…)

(def to_string [{'error ""}: (Result String Integer)] String
…)

(def to_string [fallthrough: (Result String Integer)] String
…)
``````

Including the comment in the function also gets strange when there are multiple clauses for this function. Which of the clauses should the comment be written under?

My latest thinking is to move the types and doc comment out of the definition and into their own lines. Perhaps something like this.

``````(doc to_string
"""
Convert a result type into a printable
string to be used in interfaces.
""")
(sig to_string (Result String Integer) String)
(def to_string [{'ok int}]
…)

(def to_string [{'error ""}]
…)

(def to_string [fallthrough]
…)
``````

Now the `doc` and `sig` lines could be technically written anywhere in the module, but the formatter can move them to the correct location.

# Parsing Function Definitions

In the previous article, we converted not quite Erie code into Erlang Abstract Format that could be compiled, loaded, and run. I say “not quite” because the code was intentially simplified to get the lexer and parser working. Now we’re going to parse real Erie code with arguments and types in the function definitions. For now all of the type information will be ignored to get things running.

We’ll start with a 0-arity function. The empty list of arguments is represented with `[]`, the return type with the symbol `Integer`, and the last element is the hardcoded return value of `1`.

I’ve been using this guide from Viktor Söderqvist to help me write the tests’ expected values in Erlang Abstract Format.

``````test "0 arity" do
code = """
(defmodule Core)

(def give_me_one [] Integer 1)
"""

{:ok, forms} = Parser.parse(code)

assert {:ok,
[
{:attribute, 1, :module, :"Erie.Core"},
{:attribute, 1, :export, [give_me_one: 0]},
{:function, 3, :give_me_one, 0,
[
{:clause, 3, [], [], [{:integer, 3, 1}]}
]}
]} == Translator.to_eaf(forms)
end
``````

The first thing we need to do is read the name of the module. We can be naïve and expect this to always be the first line. The output from our parser will look something like this. Note the new `:symbol` category that didn’t exist in the previous article.

``````[
[{:atom, 1, :defmodule}, {:symbol, 1, :Core}],
[
{:atom, 3, :def},
{:atom, 3, :give_me_one},
[],
{:symbol, 3, :Integer},
{:atom, 3, 1}
]
]
``````

The new `:symbol` category is applied to any atom that begins with `'` or a capital letter i.e. `'my_symbol` or `MySymbol`. To translate this code to EAF, we’ll build a specialized function that matches the entire `defmodule` list. We automatically prepend `Erie.` to the module name, similar to what Elixir does, to provide some namespacing.

``````def extract_module([[{:atom, _, :defmodule}, {:symbol, line, mod}] | rest]) do
mod = ("Erie." <> Atom.to_string(mod)) |> String.to_atom()
{:ok, mod, line, rest}
end
``````

Once we have the module, we can translate each function. Here we’re matching on the `def` atom, the name of the function, an empty argument list, and ignoring the return type. We’re also storing the function name and arity in a seperate structure so we can export it later to make it publicy invokable.

``````def translate(struct, [form | tail], ret) do
case form do
[{:atom, _, :def}, {:atom, line, name}, [], {:symbol, _l2, _ret_type} | body] ->
body = translate_body(body)

%{struct | functions: [{name, 0} | struct.functions]}
|> translate(tail, [
{:function, line, name, arity, [{:clause, line, [], [], body}]}
| ret
])
end
end

def translate_body([{:integer, line, val} | rest], accum) do
translate_body(rest, [{:integer, line, val} | accum])
end

def translate_body([], accum) do
accum
en
``````

This is able to successfully translate the 0-arity function, which is great, but obviously not enough. Next we’ll translate a 1-arity function that returns the same integer that was passed as an argument.

``````test "1 arity" do
code = """
(defmodule Core)

(def identity ['x Integer] Integer x)
"""

{:ok, forms} = Parser.parse(code)

assert {:ok,
[
{:attribute, 1, :module, :"Erie.Core"},
{:attribute, 1, :export, [identity: 1]},
{:function, 3, :identity, 1,
[
{:clause, 3, [{:var, 3, :x}], [], [{:var, 3, :x}]}
]}
]} == Translator.to_eaf(forms)
end
``````

To handle the `identity` function, we need to translate the arguments and the body. Popping the first element(s) off their respective list, converting to EAF, then rebuilding the list has a been a very simple and straightforward solution.

``````def translate(struct, [form | tail], ret) do
case form do
[{:atom, _, :def}, {:atom, line, name}, params, {:symbol, _l2, _return_type} | body] ->
body = translate_body(body)
params = translate_params(params)
arity = Enum.count(params)

%{struct | functions: [{name, arity} | struct.functions]}
|> translate(tail, [
{:function, line, name, arity, [{:clause, line, params, [], body}]}
| ret
])
end
end

def translate_body(list) do
list |> translate_body([]) |> Enum.reverse()
end

def translate_body([{:atom, line, val} | rest], accum) do
translate_body(rest, [{:var, line, val} | accum])
end

def translate_body([], accum) do
accum
end

def translate_params(list) do
list |> translate_params([]) |> Enum.reverse()
end

def translate_params([{:symbol, line, name}, {:symbol, _, _type} | rest], accum) do
translate_params(rest, [{:var, line, name} | accum])
end

def translate_params([], accum) do
accum
end
``````

With a couple of tweaks to our `.yrl` and `.xrl` files and this code, we can parse something that actually looks like Erie. It also puts us in a good place to extend the parsing code and handle more advanced scenarios.

# Parsing Erie to EAF

To create a new BEAM language, one has many options on what to compile to first, before letting the Erlang compiler take over. Two options are Erlang Abstract Form (EAF) and Core Erlang. Elixir compiles to EAF and Lisp Flavoured Erlang compiles to Core Erlang. The Core Erlang specification is from 2004 and seems a bit too complex. EAF is much simpler to me as it’s a tree made from standard Erlang terms like lists and tuples.

Before getting started on the compiler, spend time learning leex and yecc. There are some fantastic articles linked under References that helped me, and you should read them for more context on this article.

### src/scanner.xrl

This is a simple leex file, but it can already tokenize a basic lisp. It’s not good enough for Erie, but it’s a good base.

It can tokenize integers and floats and treats everything else as an atom. Note how it considers commas as whitespace (stolen from Clojure).

``````Definitions.

Digit      = [0-9]
Alpha      = [A-Za-z_]
Whitespace = [\n\t\s,]

Rules.

\(                 : {token, {'(', TokenLine}}.
\)                 : {token, {')', TokenLine}}.
{Digit}+\.{Digit}+ : {token, {float, TokenLine, TokenChars}}.
{Digit}+           : {token, {int, TokenLine, list_to_integer(TokenChars)}}.
{Alpha}+           : {token, {atom, TokenLine, list_to_atom(TokenChars)}}.
{Whitespace}+      : skip_token.

Erlang code.
``````

Using Elixir, we invoke the resulting `scanner` module. It’s able to identify each token and its category (float/int/atom). Note the call to `to_charlist/1`: our `scanner` module is Erlang and needs special treatment when called from Elixir.

``````iex(1)> code = """
...(1)> (defmodule Core)
...(1)>
...(1)> (def give_me_one 1)
...(1)> """
iex(2)> {:ok, tokens, _} = code |> String.to_charlist() |> :scanner.string()
{:ok,
[
{:"(", 1},
{:atom, 1, :defmodule},
{:atom, 1, :Core},
{:")", 1},
{:"(", 3},
{:atom, 3, :def},
{:atom, 3, :give_me_one},
{:int, 3, 1},
{:")", 3}
], 4}
``````

### src/parser.yrl

After tokenization we need to parse the result, using yecc, into more meaningful forms. Because our lisp is so simple, this parser doesn’t have much to do other than create lists based on the parentheses.

Note that each of the items in the `Terminals` list should map to the the first element of the tuples produced by our `scanner` module.

``````Nonterminals list elems elem group.
Terminals '(' ')' int float atom.
Rootsymbol group.

group -> list       : ['\$1'].
group -> list group : ['\$1'|'\$2'].

list -> '(' ')'       : [].
list -> '(' elems ')' : '\$2'.

elems -> elem       : ['\$1'].
elems -> elem elems : ['\$1'|'\$2'].

elem -> int   : '\$1'.
elem -> float : '\$1'.
elem -> atom  : '\$1'.
elem -> list  : '\$1'.

Erlang code.
``````

The output of our `parser` module looks very similar to our original code.

``````iex(3)> {:ok, list} = :parser.parse(tokens)
{:ok,
[
[{:atom, 1, :defmodule}, {:atom, 1, :Core}],
[{:atom, 3, :def}, {:atom, 3, :give_me_one}, {:int, 3, 1}]
]}
``````

We still don’t have Erlang Abstract Form, but we’re ready to convert the output of the `parser` module into EAF.

### lib/erie.ex

This function is highly customized and won’t work with any other code, but it ultimately produces enough EAF to call our lisp function from IEx.

Note the `:export` line. This is hardcoded, but makes the `give_me_one` function public and allows us to invoke it.

``````def translate([form | tail], ret) do
case form do
[{:atom, _l1, :defmodule}, {:atom, l2, mod}] ->
translate(tail, [
{:attribute, l2, :export, [{:give_me_one, 0}]},
{:attribute, l2, :module, mod} | ret
])

[{:atom, l1, :def}, {:atom, _l2, name}, {:int, l3, val}] ->
translate(tail, [
{:function, l1, name, 0,
[{:clause, l3, [], [], [{:integer, l3, val}]}]
} | ret])

_ ->
translate(tail, ret)
end
end
``````

Our final step is to reverse the forms, compile them, and load the resulting binary. We’re using the builtin Erlang functions `:compile.forms/1` and `:code.load_binary/3`. Note the `'nofile'` charlist. We’re back to calling Erlang from Elixir, so we need to abide.

``````iex(4)> forms = list |> Erie.translate([]) |> Enum.reverse()
[
{:attribute, 1, :module, :Core}
{:attribute, 1, :export, [give_me_one: 0]},
{:function, 3, :give_me_one, 0, [{:clause, 3, [], [], [{:integer, 3, 1}]}]},
]
iex(5)> {:ok, mod, bin} = :compile.forms(forms)
{:ok, :Core, <<70, 79, 82, 49, 0, 0, 1, 188, 66, ...>>}
{:module, :Core}
``````

Now that our module is loaded, we can invoke our function.

``````iex(7)> :Core.give_me_one()
1
``````

And with that, we have successfully tokenized, parsed, and compiled a brand new language on the BEAM.

Archive