Case Expressions

Case matching in Elixir gives you a few different ways to check the type of a value. Here we check if val is a map or list.

def foo(val) do
  case val do
    %{} = map ->
      map |> Map.keys() |> Enum.join(",")

    list when is_list(list) ->
      Enum.join(list, ",")

    _ ->
      ""
  end
end

Erie needs some syntactic way to allow matching in a case expression like this. One option is to compel Elixir’s match operator (=) to work in prefix notation.

(def foo [val]
  (case val
    [(= %{} map) (-> map (Map.keys) (Enum.join ","))
     (= [] list) (Enum.join list ",")
     _ ""]))

I don’t like this (and this applies to Elixir as well) because %{} matches on all maps, regardless of if they’re empty or not, but [] only matches on empty lists. I’d rather match on explicit Erie type names to accomplish this.

(def foo [val]
  (case val
    [(= Map map) …
     (= List list) …
     _ …]))

This could potentially work, but Erie’s union types can make it impossible for the compiler to handle. In this next example, how will the compiler know if list is a (List String) or a (List Integer)? toUpper can only operate on Strings, so the compiler should fail on the last line.

(deftype EitherList []
  (union [(List String) (List Integer)]))

(sig foo [EitherList] (List String))
(def foo [val]
  (case val
    [(= List list) (Enum.map list toUpper)]))

Matching on the entire type name should fix that problem.

(deftype EitherList []
  (union [(List String) (List Integer)]))

(sig foo [EitherList] (List String))
(def foo [val]
  (case val
    [(= (List String) list) (Enum.map list toUpper)
     (= (List Integer) list) (Enum.map list toString)]))

What about when you want undefined polymorphic type parameters? Should this type of matching be allowed? Exploring this in the next example, the first match, on (List String), seems like it should be compilable because it returns the exact same type it operates on. But the second match, on (List Integer) doesn’t seem like it should compile. It returns a different type than it matches on, and how can the compiler know if that fits the EitherList pattern?

(deftype EitherList [a b]
  (union [(List a) (List b)]))

(sig foo [(EitherList a b)] (EitherList a b))
(def foo [val]
  (case val
    [(= (List String) list) (Enum.map list toUpper)
     (= (List Integer) list) (Enum.map list toString)
     (= (List a) list) list]))

For now I’ll disallow matching on the type parameter. I’m not sure it’s even technically possible to ever accomplish so I’m happy to hold off.


That should cover matching on types, but what about values? Elixir’s ability to match on values is also worth copying.

def foo(val) do
  case val do
    [] -> "empty"
    [x] -> "one element"
    [x|_] -> "multiple elements"
  end
end

I don’t really like the [x|_] pattern because it feels too infix-y for a lisp, but I’ll leave it in the examples. It’s likely to change before making its way into Erie for good.

(sig foo [(List String)] String)
(def foo [val]
  (case val
    [[] "empty"
     [x] "one element"
     [x|_] "multiple elements"]))

This should combine well with union type matching. It’s a little dense, but I think it’s still readable.

(deftype EitherList []
  (union [(List String) (List Integer)]))

(sig foo [EitherList] String)
(def foo [val]
  (case val
    [(= (List String) [x]) "one element string list"
     (= (List Integer) []) "empty integer list"
     (= (List String) list) "empty or mulitple value string list"
     _ "anything else"]))

The one problem that still needs exploring is how to get the code to match on the right branch.

(deftype EitherList []
  (union [(List String) (List Integer)]))

(sig foo [EitherList] String)
(def foo [val]
  (case val
    [(= (List String) []) "empty string list"
     (= (List Integer) []) "empty integer list"
     _ "anything else"])

Once this type checks, the compiled Erlang Abstract Form will know that val is a list, but if it’s empty, there won’t be any other type information to glean from it and help with matching. I’m not sure exactly how to fix this, but it’s something to keep thinking about.

Type Expansion

Code for this post is available at 651548ffa8.

To confirm that a function type-checks correctly requires inferring the type of the return value and then checking it matches with the type signature. A return value can either be a function parameter, a return value of an inner expression, or a literal.

(sig return_value_as_parameter [Integer] Integer)
(def return_value_as_parameter [x]
  x)

(sig return_value_from_inner_expression [] Integer)
(def return_value_from_inner_expression []
  (* 3 (+ 1 1)))

(sig return_value_as_literal [] Integer)
(def return_value_as_literal [] Integer
  0)

In each of these cases, we can easily infer the type of the return value is Integer and that it matches with the signature. Things get a bit more complicated when you introduce algebraic data types.

(deftype IntOrString []
  (union [Integer String]))

(sig foo [] IntOrString)
(def foo []
  1)

Now it’s not as simple to compare the inferred type of Integer with the signature’s type of IntOrString. Instead of an exact Integer to Integer comparision, we need to expand1 IntOrString into a list of its possible types, and then compare Integer to each of those until we find a match. If the return type is composed of several other union types, we need to expand recursively to get all possible types.

Recursive expansion does expose the possiblity of an infitie recursive type, like those that define lists and trees. The current version of the Erie compiler will infitie loop when it finds a type like this; fixing that will come later.

(deftype RecursiveList [a]
  (union [{a nil} {a (RecursiveList a)}]))

; instance of (RecursiveList Integer) might look like
; {4 {3 {2 {1 nil}}}}

(deftype RecursiveTree [a]
  (union [{'leaf a} {'left (RecursiveTree a) 'right (RecursiveTree a)}]))

; instance of (RecursiveTree Integer) might look like
; {'left {'leaf 1} 'right {'left {'leaf 2} 'right {'leaf 3}}}

As of today, the Erie compiler is about as naive as it gets when it comes to type expansion. It all happens just-in-time and nothing is cached. This means every time the compiler encounters a signature with a return value of IntOrString, it will completely expand it and check, then throw everything away. This is no doubt the slowest way to do type checking, but it was the simplest way for now.

  1. I'm using the phrase "type expansion", but I don't know if that is the phrase favored by compiler authors. My googling efforts have been fruitless. It makes sense to me so I'm sticking with it.

Algebraic Data Types

Code for this post is available at 651548ffa8.

The builtin types are all well and good, but Erie isn’t much without a few additions: maps/dictionaries, structs, and especially alegbraic data types (ADT). Erie will eventually support all of these, but I focused on implementing ADTs first. I refer to them as “union” types throughout the compiler, but they are one in the same.

I want all Erie types to be declared with deftype, regardless of the ultimate shape of the type (i.e. ADT, struct, or type alias). By making the signature of deftype look like a function, it makes adding type parameters straight forward for supporting polymorphic types.

(deftype Maybe [a]
  (union [a 'nil]))

One big difference between Erie’s ADTs and those of other langauges that I’m familiar with is Erie doesn’t required “tags” for each option. For example, in Elm, representing Maybe looks like:

type Maybe a
  = Just a
  | Nothing

Note how the non-Nothing case requires the Just tag. This means a value of type String does not conform to the Maybe String type in Elm. In Erie, String will conform to (Maybe String) because of the lack of tags. I’m not aware (at least not yet) of why tags are necessary. I may run into the reason later in development, but for now, it seems to be working. If you, as a developer, like the tagged approach you can mimic it with tuples in the definition.

(deftype Result [error value]
  (union [{'ok value} {'error error}]))

Erie treats a quoted symbol as a type with a single value of itself. It allows for the above which will integrate nicely with Erlang and Elixir.

In the future I want deftype to be a macro that generates useful functions regarding that type. Specifically allowing matching in a case expression. For now deftype is treated as a special form by the compiler. As I continue to expand the functionality of macros, a large amount of that work should shift out of the compiler and into Erie macros.

Type Checker

Code for this post is available at ed4ee0b78.

Erie now has a very basic type checker. It supports simple builtin types like Integer and String, and the slightly more complex polymorphic list and tuple types. User-defined types are not supported yet.

Testing

The type checker has provided a pretty amazing opportunity for test-driven development. The type checker code has so many paths and branches that having a complete test suite helps prevent me from introducing regression bugs.

I have also found assert_raise to be indespensible. A proper type checker would collect errors and print them all at the end of compilation, but for now, it raises errors when the type checker finds a mismatch.

test "literal list type fails with mismatched types" do
  code = """
  (sig one [] [Integer])
  (def one []
    [1 2 "3" 4])
  """

  {:ok, forms} = Parser.parse(code)
  translator = Translator.from_parsed(forms, {:Core, 1}, false)

  assert_raise RuntimeError, fn ->
    TypeChecker.check(translator)
  end
end

Error Messages

When writing a type checker, one should aspire to the level of detail and helpfulness in Elm type checker messages. The errors raised by the Erie type checker right now are just good enough to help me debug any example Erie code I write, but definitely not good enough to live in the compiler long-term.

def expression_type({:var, _, name}, bindings, _signatures) do
  bindings
  |> Enum.find(fn {n, _} -> n == name end)
  |> case do
    nil -> raise "Returning a binding `#{name}` that isn't a function parameter"
    {_, type} -> type
  end
end

As much as I’d like to add great error messages now, that’s a major project in and of itself.

REPL Start

Code for this post is available at 2734fc099d.

A basic REPL will go a long way toward making Erie feel like a true Lisp. It doesn’t need to be fancy and support helper functions, tab-completion, or even the “up arrow” to redo the last expression. All we want (for now) is to read an expression, evaluate it on the fly, print the result, and repeat.

I was able to put together something that reads input and starts a new Task to compile and evaluate it. But this isn’t terribly useful because Erie is missing a standard library. The only functions that can be called are Elixir or Erlang functions (but they work!).

Core(1)> (Elixir.String.split "abc def ghi")
["abc", "def", "ghi"]
Core(2)> ('erlang.atom_to_binary 'derp 'unicode)
"derp"

This is a nice start but defining our own functions in the REPL and calling them would be better.

In Elixir’s REPL, you can do this, but not using def … do … end (unless it’s wrapped in defmodule … do … end). That’s a bit obnoxious, so I want my REPL to handle (def …) gracefully. I do that by stealing a feature of Clojure’s REPL where you’re effectively “in” a module while in the REPL, so any function defined is a part of that module.

I want to make as much use of my Translator module as possible, but as it’s currently written, it requires all function definitions to be inside a module (now I see why Eilixr may have made that decision). To get around this, I wrap each call to (def …) in a new module thats name is based on a counter. For example, this defintion of x in the REPL

Core(1)> (def x [] 1)

would be compiled as if I wrote

(defmodule Core1) (def x [] 1)

And this defintion of y

Core(2)> (def y [] 2)

would be compiled as if I wrote

(defmodule Core2) (def y [] 2)

This works great, but now calling x the intended way produces an error because it’s not defined in the Core module.

Core(1)> (def x [] 1)
Core(2)> (Core.x)

20:27:31.177 [error] Task #PID<0.143.0> started from Erie.Repl terminating
** (UndefinedFunctionError) function :Core.x/0 is undefined (module :Core is not available)

To solve this, we can keep track of all functions defined in the REPL and their associated modules. If one of them is invoked under the Core module, we can intercept the call and translate it to the correct module.

The REPL is by no means complete, but it feels good to be able to write some Erie and see it execute in realtime.

Macro and Other Updates

Code for this post is available at 0646b91480.

Erie can now handle some basic macros. Very basic. It only does macro expansion once, and it doesn’t correctly call Erie definied functions from within a macro.

Despite those limitations it can correctly compile this module. However, cons and + aren’t properly defined as functions, so executing either one_plus_two or the_first will fail.

(defmodule Core)

(defmacro infix [ast]
  ((lambda [list]
    [(Elixir.Enum.at list 1) (Elixir.Enum.at list 0) (Elixir.Enum.at list 2)])
    (Elixir.Enum.at ast 0)))

(def one_plus_two []
  (infix (1 + 2)))

(defmacro a_list [ast]
  ['cons 1 nil])

(def the_first []
  (Elixir.List.first (a_list nil)))

I’m pretty happy with the progress I’ve made so far on Erie, but there is obviously a lot more to do. Some possible next steps are:

  • REPL
  • Types
  • Standard library
  • More advanced macros
  • Syntax highlighting

More advanced macros can definitely wait. A standard library will be very handy, but it’s not preventing me from writing Erie right now. Types are going to be very important soon, but I think a REPL will have the biggest impact. The ability to write small bits of Erie and execute it immediately will help shape the ergonomics and dictate what should be included in the standard library.

Macro Plan

No Lisp is complete without a macro system. If only it were so easy to implement.

Having never written a language before, let alone a macro system, I’m going to try to break it down into manageable pieces. I have no idea if this is the right approach, but we’ll start with it and see where it takes us.

The steps I’m envisioning are

  1. Parse the code.
  2. Find instances of defmacro.
  3. Find invocations of each of those macros.
  4. Convert the output of the parser into an Erie abstract syntax tree.
  5. Execute each macro with the AST as the argument.
  6. Support “unquoting” somehow.
  7. Replace the macros in the AST with their results.
  8. Repeat until all macros have been expanded.

The simplest macro I can think of is comment: replace the inner forms with a single nil.

(defmacro comment [ast]
  nil)

(def loop []
  (comment
    (Enum.map [1 2 3]
      (lambda [x]
        (+ 1 x)))))

Step One

After parsing, we have this.

[
  [
    {:atom, 1, :defmacro},
    {:atom, 1, :comment},
    {:list, 1, [{:atom, 1, :ast}]},
    {:atom, 2, nil}
  ],
  [
    {:atom, 4, :def},
    {:atom, 4, :loop},
    {:list, 4, []},
    [
      {:atom, 5, :comment},
      [
        {:symbol, 6, :"Enum.map"},
        {:list, 6, [{:integer, 6, 1}, {:integer, 6, 2}, {:integer, 6, 3}]},
        [
          {:atom, 7, :lambda},
          {:list, 7, [{:atom, 7, :x}]},
          [{:atom, 8, :+}, {:integer, 8, 1}, {:atom, 8, :x}]
        ]
      ]
    ]
  ]
]

Step Two

We can find instances of defmacro just like we find instances of def. In this case we only find one: comment.

Step Three

We find inovcations of comment by looking through the results of the parser for anything that looks like a function call to comment.

Step Four

Converting the output of the parser into an Erie abstract syntax tree will look something like this.

[
  [:defmacro, :comment, [:ast], nil],
  [:def, :loop, [],
    [:comment, [:"Enum.map", [1, 2, 3], [:lambda, [:x], [:+, 1, :x]]]]
  ]
]

Step Five

To execute the comment macro, we need to complie it as if it were a normal function. That is possible with :compile.forms/1, but as far as I know, it needs to be contained in a module.

Current strategy is to define all macros as functions in their “macro” module which will be prefixed with MACRO. E.g. the module Erie.Core’s macros will be defined as functions in MACRO.Erie.Core.

Step Six

TBD.

Step Seven

The result of the call to comment is nil, so we can replace that list in the parser output with nil. We’ll also remove the macro definition which should leave us with this.

[
  [
    {:atom, 4, :def},
    {:atom, 4, :loop},
    {:list, 4, []},
    {:atom, 5, nil}
  ]
]

Step Eight

In this case nothing. comment doesn’t need any further expansion, so we can now continue the pipeline and pass this to the translator stage which will produce Erlang Abstract Form for final compilation.

Proper let Syntax

Code for this post is available at e6c19d930f.

Most of the basic features of Erie are working, but I am still missing let bindings. I looked for inspiration in the syntax of other Lisps and found a couple of options. Common Lisp and Racket are techincally the same because Racket treats [] as (), but I included both because Racket often uses [] to make certain forms stand out.

None of the styles strike me as particularly better. Clojure has fewer characters, but I like the way CL and Racket group the binding and the value more explicitly.

Clojure

(let [a 1 b 2]
  (+ a b))

Common Lisp

(let ((a 1) (b 2))
  (+ a b))

Racket

(let ([a 1]
      [b 2])
  (+ a b))

Regardless of the syntax, implmenting let should be straight forward. Erlang Abstract Format supports match which is very similar. In Elixir

x = 42

is represented as

{:match, 1, {:var, 1, :x}, {:integer, 1, 42}}

in EAF and could be used to represent let bindings.

But before implementing that, I found this answer on StackOverflow and it revealed a new way to look at let. To sum it up, let doesn’t need to be implemented in the compiler. It can be a macro that converts its forms into a tree of nested lambdas. (Samples are using the currently proposed Erie let syntax.)

(let
  [a 1]
  [b 2]
  (+ a b))

is equivalent to

((lambda [a]
  ((lambda [b]
    (+ a b)) 2)) 1)

So instead of adding let as a feature in the compiler, I’ll save the syntax decision for when macros are implemented.

Erie's Function Signatures

Code for this post is available at 9e7d81b810.

I had written previously that a function in Erie might look like this. After the function name is the list of arguments and their types, followed by the return type, a doc comment, and finally the function body.

(def to_string [result: (Result String Integer)] String
  """
  Convert to result into a printable string to
  be used in interfaces.
  """
  (case result
    [{'ok int} (Integer.to_string int)]
    [{'error str} str]))

After further deliberation, I realized including the argument and return types in the definition makes things a little bit awkward if I want to support pattern matching on the arguments. Not only does it make the types redundant, but the colon syntax gets weird.

(def to_string [{'ok int}: (Result String Integer)] String
  …)

(def to_string [{'error ""}: (Result String Integer)] String
  …)

(def to_string [fallthrough: (Result String Integer)] String
  …)

Including the comment in the function also gets strange when there are multiple clauses for this function. Which of the clauses should the comment be written under?

My latest thinking is to move the types and doc comment out of the definition and into their own lines. Perhaps something like this.

(doc to_string
  """
  Convert a result type into a printable
  string to be used in interfaces.
  """)
(sig to_string (Result String Integer) String)
(def to_string [{'ok int}]
  …)

(def to_string [{'error ""}]
  …)

(def to_string [fallthrough]
  …)

Now the doc and sig lines could be technically written anywhere in the module, but the formatter can move them to the correct location.

Parsing Function Definitions

In the previous article, we converted not quite Erie code into Erlang Abstract Format that could be compiled, loaded, and run. I say “not quite” because the code was intentially simplified to get the lexer and parser working. Now we’re going to parse real Erie code with arguments and types in the function definitions. For now all of the type information will be ignored to get things running.

We’ll start with a 0-arity function. The empty list of arguments is represented with [], the return type with the symbol Integer, and the last element is the hardcoded return value of 1.

I’ve been using this guide from Viktor Söderqvist to help me write the tests’ expected values in Erlang Abstract Format.

test "0 arity" do
  code = """
  (defmodule Core)

  (def give_me_one [] Integer 1)
  """

  {:ok, forms} = Parser.parse(code)

  assert {:ok,
          [
            {:attribute, 1, :module, :"Erie.Core"},
            {:attribute, 1, :export, [give_me_one: 0]},
            {:function, 3, :give_me_one, 0,
              [
                {:clause, 3, [], [], [{:integer, 3, 1}]}
              ]}
          ]} == Translator.to_eaf(forms)
end

The first thing we need to do is read the name of the module. We can be naïve and expect this to always be the first line. The output from our parser will look something like this. Note the new :symbol category that didn’t exist in the previous article.

[
  [{:atom, 1, :defmodule}, {:symbol, 1, :Core}],
  [
    {:atom, 3, :def},
    {:atom, 3, :give_me_one},
    [],
    {:symbol, 3, :Integer},
    {:atom, 3, 1}
  ]
]

The new :symbol category is applied to any atom that begins with ' or a capital letter i.e. 'my_symbol or MySymbol. To translate this code to EAF, we’ll build a specialized function that matches the entire defmodule list. We automatically prepend Erie. to the module name, similar to what Elixir does, to provide some namespacing.

def extract_module([[{:atom, _, :defmodule}, {:symbol, line, mod}] | rest]) do
  mod = ("Erie." <> Atom.to_string(mod)) |> String.to_atom()
  {:ok, mod, line, rest}
end

Once we have the module, we can translate each function. Here we’re matching on the def atom, the name of the function, an empty argument list, and ignoring the return type. We’re also storing the function name and arity in a seperate structure so we can export it later to make it publicy invokable.

def translate(struct, [form | tail], ret) do
  case form do
    [{:atom, _, :def}, {:atom, line, name}, [], {:symbol, _l2, _ret_type} | body] ->
      body = translate_body(body)

      %{struct | functions: [{name, 0} | struct.functions]}
      |> translate(tail, [
        {:function, line, name, arity, [{:clause, line, [], [], body}]}
        | ret
      ])
  end
end

def translate_body([{:integer, line, val} | rest], accum) do
  translate_body(rest, [{:integer, line, val} | accum])
end

def translate_body([], accum) do
  accum
en

This is able to successfully translate the 0-arity function, which is great, but obviously not enough. Next we’ll translate a 1-arity function that returns the same integer that was passed as an argument.

test "1 arity" do
  code = """
  (defmodule Core)

  (def identity ['x Integer] Integer x)
  """

  {:ok, forms} = Parser.parse(code)

  assert {:ok,
          [
            {:attribute, 1, :module, :"Erie.Core"},
            {:attribute, 1, :export, [identity: 1]},
            {:function, 3, :identity, 1,
              [
                {:clause, 3, [{:var, 3, :x}], [], [{:var, 3, :x}]}
              ]}
          ]} == Translator.to_eaf(forms)
end

To handle the identity function, we need to translate the arguments and the body. Popping the first element(s) off their respective list, converting to EAF, then rebuilding the list has a been a very simple and straightforward solution.

def translate(struct, [form | tail], ret) do
  case form do
    [{:atom, _, :def}, {:atom, line, name}, params, {:symbol, _l2, _return_type} | body] ->
      body = translate_body(body)
      params = translate_params(params)
      arity = Enum.count(params)

      %{struct | functions: [{name, arity} | struct.functions]}
      |> translate(tail, [
        {:function, line, name, arity, [{:clause, line, params, [], body}]}
        | ret
      ])
  end
end

def translate_body(list) do
  list |> translate_body([]) |> Enum.reverse()
end

def translate_body([{:atom, line, val} | rest], accum) do
  translate_body(rest, [{:var, line, val} | accum])
end

def translate_body([], accum) do
  accum
end

def translate_params(list) do
  list |> translate_params([]) |> Enum.reverse()
end

def translate_params([{:symbol, line, name}, {:symbol, _, _type} | rest], accum) do
  translate_params(rest, [{:var, line, name} | accum])
end

def translate_params([], accum) do
  accum
end

With a couple of tweaks to our .yrl and .xrl files and this code, we can parse something that actually looks like Erie. It also puts us in a good place to extend the parsing code and handle more advanced scenarios.

Archive