Type safety prehistory

Gain a newfound appreciation for progress and preservation by diving into the domain-theoretic arguments of yore and yesteryear.

Most readers are probably familiar with progress and preservation, the two lemmas underlying a standard (syntactic) proof of type safety. (See e.g. TAPL, PFPL, PLFA, Software Foundations, etc.) They’re in every PL textbook, and a recent paper even called progress and preservation “one of the ‘greatest hits’ of programming languages research of the past three decades”. And they’re cool, sure, but what’s so special about using these two lemmas? For the most part the proofs don’t involve any fancy tricks or complex machinery – you just bash out the cases.

But this lack of complex machinery is exactly what’s so great about syntactic type safety. To understand why, let’s take a look at a historical technique for proving type safety.

Proof outline

Proof adapted from Milner’s 1978 paper “A theory of type polymorphism in programming”

This is a semantic type safety proof. Whereas syntactic type safety proves type safety relative to an operational semantics, semantic type safety proves safety relative to a denotational semantics. Here are the steps involved:

  1. Define an untyped denotational semantics for the programming language, including a special value wrong indicating a type error happened
  2. Define a “semantic typing relation” which says which semantic values have each type. Importantly, wrong has no type.
  3. Prove lemmas about the semantic typing relation corresponding to all the inference rules

This is where Milner’s paper stops: he doesn’t even write down any inference rules, instead using these lemmas directly to prove soundness of his typechecking algorithm. But from a modern perspective, we can conclude type safety by induction on the typing derivation: well-typed programs can’t go wrong.

Throughout we’ll use a simple ML as our programming language of interest: \[ \begin{align*} \text{Variables } x &\in \Sigma, \text{ an infinite set of variables}\\ \text{Built-ins } b &\in B, \text{ a set of built-in operations}\\ \text{Terms } e &::= x \mid b \mid e_1e_2 \mid \lambda x.e \mid \texttt{letrec $x = e_1$ in $e_2$} \\ \text{Types } \tau, \nu &::= T_1 \mid \dots \mid T_n \mid \tau \to \nu \end{align*} \] With a simple type system: \[ \frac{(x:\tau) \in \Gamma}{\Gamma \vdash x:\tau} \hspace{1.5em} \frac{\Gamma \vdash e_1:\tau\to \nu \hspace{1.5em} \Gamma \vdash e_2:\tau}{\Gamma \vdash e_1e_2 : \nu} \hspace{1.5em} \frac{\Gamma, x:\tau \vdash e:\nu}{\Gamma \vdash \lambda x.e : \tau \to \nu} \hspace{1.5em} \frac{\Gamma, x:\tau \vdash e_1:\tau \hspace{1.5em} \Gamma,x:\tau \vdash e_2:\nu}{\Gamma \vdash \texttt{letrec $x = e_1$ in $e_2$} : \nu} \] For convenience let’s require all variables in \(\Gamma\) to be distinct, and I’ll assume the built-ins \(b\) and base types \(T_i\) come with their own typing rules as well.
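For concreteness, here’s one way to render this grammar as Haskell data types – a sketch of my own, not anything from Milner; the names Term, Type, and the choice of strings for variables and built-ins are mine:

```haskell
-- A hypothetical rendering of the grammar; variables and built-ins are
-- named by strings, base types T_1 ... T_n by their names.
data Term
  = Var String               -- x
  | Builtin String           -- b
  | App Term Term            -- e1 e2
  | Lam String Term          -- \x. e
  | LetRec String Term Term  -- letrec x = e1 in e2
  deriving Show

data Type
  = BaseT String             -- T_1 | ... | T_n
  | Arrow Type Type          -- tau -> nu
  deriving (Eq, Show)
```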

1. A denotational semantics for untyped ML

Our programming language is a simple ML, with a fixed collection of base types \(T_1 \dots T_n\). We will interpret everything as values in a domain \(V\), which will include a special value \(\bot\) representing nontermination.

In order to even define the denotational semantics, we’re gonna need quite a bit of mathematical machinery. Good references here are Abramsky and Jung’s notes and Amadio and Curien’s “Domains and Lambda Calculi”.

1.1. Domains and fixed points

Recursive definitions are tricky. A recursive definition can be viewed as a kind of limit of all the partial outputs of an infinite process. This motivates the following definition:

Definition. An \(\omega\)-cpo is a partially ordered set \(X\) with a least element \(\bot\) such that every chain \(x_1 \sqsubseteq x_2 \sqsubseteq x_3 \sqsubseteq \dots\) has a least upper bound \(\bigvee_n x_n\). A function \(f : X \to Y\) between \(\omega\)-cpos is continuous if it is monotone and preserves least upper bounds of chains: for every chain \(x_1 \sqsubseteq x_2 \sqsubseteq x_3 \sqsubseteq \dots\) in \(X\), we have \(f(\bigvee_n x_n) = \bigvee_n f(x_n)\).

If we make our domain \(V\) an \(\omega\)-cpo, and make our functions continuous, then we can interpret recursive functions as least fixed points:

Proposition. Let \(X\) be an \(\omega\)-cpo, and \(f : X \to X\) a continuous function. Then the iterates form a chain \(\bot \sqsubseteq f(\bot) \sqsubseteq f(f(\bot)) \sqsubseteq \dots\) (since \(\bot \sqsubseteq f(\bot)\) and \(f\) is monotone), and the least upper bound \(\bigvee_n f^n(\bot)\) is the least fixed point of \(f\): by continuity, \(f(\bigvee_n f^n(\bot)) = \bigvee_n f^{n+1}(\bot) = \bigvee_n f^n(\bot)\).
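Haskell readers may recognize this as the denotational justification for the usual fix combinator. A minimal sketch, under the standard reading of Haskell types as domains with \(\bot\) as nontermination:

```haskell
-- Least fixed point via self-reference; denotationally this computes
-- \/_n f^n(bot), where bot is nontermination.
fix :: (a -> a) -> a
fix f = f (fix f)

-- Example: factorial as the least fixed point of its defining functional.
factorial :: Integer -> Integer
factorial = fix (\rec n -> if n == 0 then 1 else n * rec (n - 1))
```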

Some helpful constructions on \(\omega\)-cpos:

  • If \(S\) is any set, we can turn it into an \(\omega\)-cpo \(S^\bot\) by giving it the discrete order, then adding a least element \(\bot\) (see the sketch after this list).
  • If \(X, Y\) are \(\omega\)-cpos, then the collection of continuous functions \([X \to Y]\) is an \(\omega\)-cpo, ordered pointwise. The operation \(\texttt{fix} : [X \to X] \to X\) assigning each \(f : X \to X\) to its least fixed point \(\bigvee_n f^n(\bot)\) is continuous.
  • We have arbitrary Cartesian products of \(\omega\)-cpos; these work as you’d expect. As a bonus, functions out of a finite product of \(\omega\)-cpos are continuous iff they’re continuous in each argument separately.
  • When taking disjoint unions, we have to be careful that the result still has a least element. So we identify \(\bot_X\) with \(\bot_Y\) in \(X \sqcup Y\), to get the reduced sum \(X \oplus Y\). It works as you’d expect (though it’s not quite the categorical coproduct of \(\omega\)-cpos).
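Here’s a small sketch of the lifting construction \(S^\bot\) from the first bullet, encoded as a Haskell type (the names Lift, Bot, Up, and leq are my own):

```haskell
-- A hypothetical encoding of the lifting construction S^bot: the
-- discrete order on S, plus a new least element Bot.
data Lift s = Bot | Up s deriving (Eq, Show)

-- The partial order: Bot is below everything, and distinct Up-values
-- are incomparable (the discrete order on S).
leq :: Eq s => Lift s -> Lift s -> Bool
leq Bot    _      = True
leq (Up a) (Up b) = a == b
leq _      _      = False
```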

We’ll also need a notion of subdomain. If \(D \subseteq D'\) are \(\omega\)-cpos, then \(D\) is a nice subdomain of \(D'\) (not standard terminology) if

  • the inclusion function \(\iota : D \to D'\) is continuous, and
  • there exists a (necessarily unique) continuous projection function \(\pi : D' \to D\) such that \(\pi(x) \sqsubseteq x\), with equality holding iff \(x \in D\).

More generally, we can consider a domain \(D\) as a nice subdomain of \(D'\) whenever we have an injective continuous function \(\iota : D \to D'\) and a surjective continuous function \(\pi : D' \to D\) satisfying the above conditions. These subdomains have nice properties from a universal algebra perspective, because \(D\) is both a subobject and a quotient of \(D'\); and from a categorical perspective, \(\iota\) and \(\pi\) form an adjunction (domain theorists call this an embedding–projection pair).
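As a concrete instance (reusing the hypothetical Lift type from the sketch above), \(S^\bot\) sits as a nice subdomain of \((S \sqcup T)^\bot\), with the projection collapsing the whole \(T\) side down to \(\bot\):

```haskell
-- An embedding-projection pair exhibiting Lift s as a nice subdomain of
-- Lift (Either s t): embed into the left summand, project the right
-- summand down to Bot.
embed :: Lift s -> Lift (Either s t)
embed Bot    = Bot
embed (Up a) = Up (Left a)

project :: Lift (Either s t) -> Lift s
project (Up (Left a)) = Up a
project _             = Bot  -- Bot and right-injected values both go to Bot

-- project . embed = id, and embed (project y) `leq` y for every y,
-- with equality exactly on the image of embed.
```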

1.2. Solving the domain equation

We want a value \(x \in V\) to be either \(\bot\), or an element of one of our base types \(T_i\), or a continuous function \(V \to V\). We’ll also add the extra value \(\texttt{wrong}\) representing that a type error happened. So we are looking for an \(\omega\)-cpo \(V\) satisfying \[V \cong T_1^\bot \oplus \dots \oplus T_n^\bot \oplus \{\texttt{wrong}\}^\bot \oplus [V \to V]\]

Theorem. There exists an \(\omega\)-cpo \(V\) satisfying this isomorphism. To construct it, let \(E\) denote the flat part \(T_1^\bot\oplus\dots\oplus T_n^\bot\oplus\{\texttt{wrong}\}^\bot\). Construct a sequence of approximate solutions by \[\begin{align*} V_0 &= \{\bot\}\\ V_1 &= E \oplus [V_0 \to V_0]\\ V_2 &= E \oplus [V_1 \to V_1]\\ \vdots \end{align*} \] Consider \(V_0\) as a nice subdomain of \(V_1\) by the inclusion \(\iota_0(\bot) = \bot\) and the projection \(\pi_0(x) = \bot\). Each subsequent \(V_i\) is a nice subdomain of \(V_{i+1}\) by \[ \begin{align*} \iota_i(x) &= x, \text{ if $x \in E$}&&\iota_i(f) = \iota_{i-1}\circ f\circ\pi_{i-1}, \text{ if $f \in [V_{i-1} \to V_{i-1}]$}\\ \pi_i(x) &= x, \text{ if $x \in E$}&&\pi_i(f) = \pi_{i-1}\circ f \circ \iota_{i-1}, \text{ if $f \in [V_i \to V_i]$} \end{align*} \] An element of \(V_i\) intuitively represents the “\(i\)th partial output” of a program. We’ll construct \(V\) as the limit of these \(V_i\)’s. Concretely, define \(V\) to be the set of functions \(f : \mathbb{N} \to \bigcup_i V_i\) such that \(f(i) \in V_i\) and \(\pi_i(f(i+1)) = f(i)\). This is an \(\omega\)-cpo, and we have \(V \cong E \oplus [V \to V]\).
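As an aside, a lazy functional language essentially hands you solutions to such domain equations for free: recursive data types are interpreted by exactly this kind of limit construction. A hypothetical Haskell rendition, with two stand-in base types of my choosing:

```haskell
-- A hypothetical Haskell rendition of
--   V  ~=  T1^bot (+) ... (+) Tn^bot (+) {wrong}^bot (+) [V -> V]
-- with Integer and Bool standing in for the base types. The shared
-- least element bot is represented by nontermination.
data V
  = Wrong            -- the special "type error happened" value
  | BaseInt Integer  -- a stand-in base type
  | BaseBool Bool    -- another stand-in base type
  | Fun (V -> V)     -- the function summand [V -> V]

-- Functions can't be inspected, so printing is best-effort.
instance Show V where
  show Wrong        = "wrong"
  show (BaseInt n)  = show n
  show (BaseBool b) = show b
  show (Fun _)      = "<function>"
```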

1.3. The actual denotational semantics

Now that we’ve defined \(V\), we can actually get to defining the denotational semantics. It will be parametrized by an environment \(\rho : \Sigma \to V\) mapping variables to values; concretely, this means our evaluation function \([\![\cdot]\!]\) will map each term to a continuous function from environments \((\Sigma \to V)\) to values in \(V\). As usual, it’s defined by recursion on the syntax.

\[\begin{align*} [\![ x ]\!]\rho &= \rho(x) \\ [\![ e_1e_2 ]\!]\rho &= \begin{cases} ([\![ e_1 ]\!]\rho)([\![ e_2 ]\!]\rho), &\text{if $[\![ e_1 ]\!] \rho \in [V \to V]$} \\ \texttt{wrong},&\text{otherwise}\end{cases} \\ [\![ \lambda x. e ]\!]\rho &= v \mapsto [\![ e ]\!](\rho[x\coloneqq v])\\ [\![\texttt{letrec $x = e_1$ in $e_2$}]\!]\rho &= [\![ e_2 ]\!](\rho[x \coloneqq v_1]), \text{ where } v_1 = \texttt{fix}(v \mapsto [\![ e_1 ]\!](\rho[x \coloneqq v])) \end{align*} \]

I assume that built-ins \(b \in B\) come with an interpretation \([\![ b ]\!]\) as well.
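Transcribing these equations into Haskell (reusing the hypothetical Term and V types from earlier, with a toy stand-in of my own for the built-in interpretations):

```haskell
-- Environments map variable names to values.
type Env = String -> V

extend :: Env -> String -> V -> Env
extend rho x v y = if y == x then v else rho y

-- The evaluation function, following the equations above. In the letrec
-- case, Haskell's laziness plays the role of fix: the definition of v1
-- refers to itself, computing the least fixed point.
eval :: Term -> Env -> V
eval (Var x)          rho = rho x
eval (Builtin b)      _   = interpBuiltin b
eval (App e1 e2)      rho =
  case eval e1 rho of
    Fun f -> f (eval e2 rho)
    _     -> Wrong               -- applying a non-function goes wrong
eval (Lam x e)        rho = Fun (\v -> eval e (extend rho x v))
eval (LetRec x e1 e2) rho =
  let v1 = eval e1 (extend rho x v1)
  in  eval e2 (extend rho x v1)

-- A toy interpretation of built-ins, just so we can run examples later.
interpBuiltin :: String -> V
interpBuiltin "zero" = BaseInt 0
interpBuiltin "succ" = Fun (\v -> case v of
                              BaseInt n -> BaseInt (n + 1)
                              _         -> Wrong)
interpBuiltin _      = Wrong
```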

2. The semantic typing relation

The next step is to define when a value \(v \in V\) has a type \(\tau\), which I’ll write as \(v : \tau\). This is a semantic typing relation, defined on the semantic values, in contrast to the syntactic typing judgement from earlier. As always, it’s defined by recursion on the structure of the type:

  • For a base type \(T_i\), \(v : T_i\) if either \(v = \bot\) or \(v \in T_i\) (i.e., if \(v \in T_i^\bot\)).
  • For a function type \(\tau \to \nu\), \(v : \tau \to \nu\) if \(v = f \in [V \to V]\) and, for every \(x \in V\) such that \(x : \tau\), we have \(f(x) : \nu\).
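In symbols: \[ v : T_i \iff v \in T_i^\bot \hspace{3em} v : \tau \to \nu \iff v \in [V \to V] \text{ and } \forall x.\, (x : \tau \implies v(x) : \nu) \]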

This definition comes with an annoying technical proof:

Proposition. For every type \(\tau\), the collection \(\{v \in V \mid v : \tau\}\) is a nice subdomain of \(V\). The proof is straightforward by induction on \(\tau\).

As a corollary, whenever \(f : \tau \to \tau\), its least fixed point \(\texttt{fix}(f)\) has type \(\tau\): we have \(\bot : \tau\) (every nice subdomain contains \(\bot\)), so each iterate \(f^n(\bot)\) has type \(\tau\), and nice subdomains are closed under least upper bounds of chains because \(\pi\) preserves them.

Not every value has a type, and this is important: notably, \(\texttt{wrong}\) has no type. So when we prove that every syntactically well-typed term evaluates to a semantically well-typed value, we’ll get as a corollary that well-typed terms can’t go \(\texttt{wrong}\).

We’ll also need a notion of well-typed environments \(\rho : \Sigma \to V\). I’ll write \(\rho : \Gamma\) to mean that for every \((x : \tau) \in \Gamma\), we have that \(\rho(x) : \tau\).

3. Soundness

Suppose that each built-in \(b\) is assigned a type \(\tau\) and a value \([\![ b ]\!]\) such that \([\![ b ]\!] \rho : \tau\) for all \(\rho\). With all that in place, we can finally state the main theorem:

Theorem. If \(\Gamma \vdash e : \tau\), then for every \(\rho : \Gamma\), we have \([\![ e ]\!] \rho : \tau\).

By induction on the typing derivation.

  • If \(e = x\) is a variable, we have \([\![ x ]\!] \rho = \rho(x) : \tau\) since \(\rho : \Gamma\).
  • If \(e = e_1e_2\) is a function application with \(\Gamma \vdash e_1 : \nu \to \tau\) and \(\Gamma \vdash e_2 : \nu\), we have \(([\![ e_1 ]\!] \rho)(v) : \tau\) for every \(v : \nu\) by inductive hypothesis; in particular, picking \(v=[\![ e_2 ]\!] \rho\), we get \([\![ e_1 e_2 ]\!]\rho : \tau\).
  • If \(e = \lambda x. e_1\) is a lambda abstraction and \(\tau = \mu \to \nu\), suppose \(v : \mu\). Then the environment \(\rho[x \coloneqq v]\) is well-typed at \((\Gamma, x:\mu)\), so by inductive hypothesis we get \([\![ e_1 ]\!] (\rho[x \coloneqq v]) : \nu\). Since \([\![ \lambda x. e_1 ]\!]\rho\) is exactly the function \(v \mapsto [\![ e_1 ]\!](\rho[x \coloneqq v])\), it has type \(\mu \to \nu\).
  • If \(e = \texttt{letrec \(x = e_1\) in \(e_2\)}\) is a recursive let binding, with \(\Gamma, x:\nu \vdash e_1 : \nu\) and \(\Gamma, x:\nu \vdash e_2 : \tau\), then let \(v_1 = \texttt{fix}(v \mapsto [\![ e_1 ]\!](\rho[x \coloneqq v]))\) as in the semantics. First, \(v \mapsto [\![ e_1 ]\!](\rho[x \coloneqq v])\) has type \(\nu \to \nu\) by inductive hypothesis as above, so \(v_1 : \nu\) by the corollary about \(\texttt{fix}\). Finally, \([\![ e_2 ]\!](\rho[x \coloneqq v_1]) : \tau\) by the second inductive hypothesis.

And in the thrilling conclusion we can finally prove

Corollary (type safety). If \(\cdot \vdash e : \tau\), then “\(e\) does not go wrong”, i.e. \([\![ e ]\!] \rho \neq \texttt{wrong}\) for any environment \(\rho\) (every \(\rho\) is trivially well-typed at the empty context): we have that \([\![ e ]\!] \rho : \tau\), but \(\texttt{wrong}\) has no type.
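To see the corollary in action on the Haskell sketches from earlier (the environment and the two test terms below are my own toy examples):

```haskell
-- Closed terms never consult the environment, so any rho will do here.
rho0 :: Env
rho0 _ = Wrong

okTerm, badTerm :: Term
okTerm  = App (Builtin "succ") (Builtin "zero")  -- well-typed: succ zero
badTerm = App (Builtin "zero") (Builtin "zero")  -- ill-typed: zero is not a function

-- In GHCi:
--   eval okTerm rho0   ==>  1       (a value of the base type)
--   eval badTerm rho0  ==>  wrong   (ill-typed, and indeed it goes wrong)
```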

Conclusion

What a lot of mathematical machinery! Especially compared to syntactic type safety, which proves type safety for an operational semantics using basically no math at all.

The worst part is, this kind of domain-theoretic semantics doesn’t really scale very well: when adding new programming language features, like effects, it’s often easy to give them an operational semantics and extend a typical progress and preservation proof. But here, you’d have to figure out how to model them denotationally – which can be really hard, as demonstrated by how much machinery it took just to support the simple feature of recursion!

All that said, semantic type safety does have one very big advantage over progress and preservation: you get a lot of flexibility in how you define the semantic typing relation. So for type systems with relatively complicated runtime invariants, it’s been making a return in recent years (though far removed from any domain theory!), with the people behind Iris having success proving type safety via semantic typing relations defined in higher-order separation logic.