Mark Barbone's blog – Markis.cool https://markis.cool/atom.xml Mark Barbone 2023-08-05T00:00:00Z Linear programming for fun and profit https://markis.cool/posts/2023-08-05-lp-for-fun-and-profit.html true 2023-08-05T00:00:00Z

# Linear programming for fun and profit

Linear optimization in everyday life

This is a story of papers, probabilities, and puzzles, of essays, exams, and engineering: a story of general education requirements and linear programming solvers.

As finals week started in Spring term, after more than a year and a half of books and essays, I was finally at the end of HUM, UCSD’s grueling gen-ed writing and literature sequence. I faced just one final obstacle: the final exam of the final course of the sequence. From one essay to the next, my grade had been falling, and with how I’d been doing in HUM 5, all depended on this one exam. I was not prepared.

## The Problem

Here was the format of the test:

• The lecturer posted in advance a list of five potential essay prompts for the final exam.
• At the start of the exam, the lecturer would pick three of the five prompts to give to us.
• During the exam, I would have to write essays responding to two of the three chosen prompts, and I’d be graded on both.

Everything was at stake, and yet I had so little time to prepare! I desperately needed to make optimal use of my time. Here were my own constraints:

1. I’ll assume the lecturer chooses randomly, and suppose without loss of generality I study most for prompt 1, less for prompt 2, etc., and least for prompt 5.
2. I’d like to prepare as little as possible for the exam, while keeping my expected preparedness at least 95%.

## The Strategy

There are ten ways for the lecturer to pick three prompts: 123, 124, 125, 134, 135, 145, 234, 235, 245, and 345.

Looking at all the cases, if I always write the essays I’m most prepared for, there’s a 60% chance I’ll have to do prompt 1, a 60% chance I’ll have to do prompt 2, a 50% chance I’ll have to do prompt 3, and a 30% chance I’ll have to do prompt 4. (I can always avoid the dreaded prompt 5.) As a sanity check, these add up to 200%, since I have to write two essays.
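This case analysis is easy to double-check by brute force; here’s a quick sketch in Python (my own addition, not part of the original post):

```python
# Enumerate all 10 ways the lecturer can pick 3 of the 5 prompts, and count
# how often each prompt ends up among the two I'd actually write
# (always choosing the two I'm most prepared for, i.e. the lowest-numbered).
from itertools import combinations

counts = {p: 0 for p in range(1, 6)}
for picked in combinations(range(1, 6), 3):
    for p in sorted(picked)[:2]:  # the two best-prepared of the three
        counts[p] += 1

print(counts)  # {1: 6, 2: 6, 3: 5, 4: 3, 5: 0} out of 10 cases each
```

The counts 6, 6, 5, 3, 0 out of 10 are exactly the 60%, 60%, 50%, 30%, and 0% above.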

Therefore, letting $$x_i$$ be the amount I prepare for prompt $$i$$, my expected preparedness for the essays I write is given by

$\mathbb{E}[\textsf{preparedness}] = \frac1{20}\left(6x_1 + 6x_2 + 5x_3 + 3x_4\right)$

This is a linear optimization problem! My constraints are all linear inequalities, and the quantity I want to minimize, $$x_1 + \dots + x_5$$, the total amount of preparing, is a linear function.

So to find my optimal studying strategy, I plugged it into an off-the-shelf LP solver. I used GNU GLPK with the MathProg modeling language, since I found a convenient web interface for it. Here’s the program:

var x{1..5} >= 0, <= 1;

# Minimize the total amount of preparing
minimize total: sum{i in 1..5} x[i];

# Decreasing preparedness
subject to c1{i in 1..4}: x[i] >= x[i+1];

# Expected preparedness of at least 95%
subject to c2: 0.95 <= 0.3*x[1] + 0.3*x[2] + 0.25*x[3] + 0.15*x[4];

end;

Finally, here’s the optimal study plan: $$x_1 = x_2 = x_3 = 1$$, $$x_4 = \frac23$$, and $$x_5 = 0$$.

And so I studied the first three prompts fully, only prepared ⅔ of the way for the fourth, and did not prepare at all for the final prompt.
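If you’d rather not set up GLPK, the same model can be cross-checked with SciPy’s `linprog` (my own translation of the MathProg program above, not the solver the post used):

```python
# Same LP as the MathProg model: minimize total preparation subject to
# decreasing preparedness and expected preparedness >= 95%.
from scipy.optimize import linprog

c = [1, 1, 1, 1, 1]  # objective: total amount of preparing
A_ub, b_ub = [], []

# Decreasing preparedness: x[i+1] - x[i] <= 0
for i in range(4):
    row = [0.0] * 5
    row[i], row[i + 1] = -1.0, 1.0
    A_ub.append(row)
    b_ub.append(0.0)

# Expected preparedness >= 0.95, flipped to <= form for linprog
A_ub.append([-0.30, -0.30, -0.25, -0.15, 0.0])
b_ub.append(-0.95)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * 5)
print([round(v, 3) for v in res.x])  # [1.0, 1.0, 1.0, 0.667, 0.0]
```

The optimum is unique here, so any LP solver should recover the same plan.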

## Epilogue

The lecturer chose prompts 2, 3, and 5. I was as prepared as I could be, I wrote my essays with confidence (and lots of stress), and I passed the class.

The cohomology of your programming language is boring https://markis.cool/posts/2023-06-27-cohomology.html true 2023-06-27T00:00:00Z

# The cohomology of your programming language is boring

## 0. Background

Topoi are used in programming languages as an abstraction for different logical systems, in order to make defining semantics easier and simplify certain formal reasoning. For example, reasoning about step-indexing can be done using a topos of time-dependent sets.

In these cases, the benefit of using a topos is that it comes with an interpretation of dependent type theory: most of the time, we can just use the familiar syntax of lambdas, pairs, etc., and it does the right thing with respect to step indexing (or whatever else we care about).

On the other hand, topos theory was originally developed by geometers, who use topoi as a setting for cohomology. Given a topos $$\mathscr{X}$$ of sheaves on a space, its cohomology $$H^\ast(\mathscr{X})$$ gives topological information about the space. In general, for an arbitrary topos, we can define its cohomology as $$H^i(\mathscr{X}) = H^i\mathsf{R}\Gamma(\mathbb{Z}_\mathscr{X})$$, where $$\mathbb{Z}_\mathscr{X}$$ is the constant sheaf on $$\mathbb{Z}$$ and $$\mathsf{R}\Gamma : D(\mathscr{X}) \to D(Ab)$$ is the derived functor of the global sections functor. For a topological space, this is equivalent to singular cohomology under mild assumptions. If $$\mathsf{R}\Gamma(\mathbb{Z}_\mathscr{X}) = \mathbb{Z}$$, then $$\mathscr{X}$$ is topologically boring: it’s weakly homotopy equivalent to a point.

But programming languages people never talk about cohomology of their topoi, because it’s not useful. This post is about doing the useless thing. I’ll assume standard facts about derived categories.

## 1. Extensional MLTT

The most important aspect of topoi for programming languages is that they are models of dependent type theory. This statement also has a kind of converse: all models of extensional MLTT + certain extra features are (elementary) topoi¹. So what’s the cohomology of the initial model of extensional MLTT + the extra features?

Topos: The initial model of extensional MLTT (+ extra features)
Verdict: N/A – doesn’t make sense.

Explanation. From a relative point of view, cohomology is about topoi over $$\mathsf{Sets}$$. If $$f : \mathscr{X} \to \mathsf{Sets}$$ is the unique geometric morphism to $$\mathsf{Sets}$$, then we’re looking at $$\mathsf{R}\Gamma = \mathsf{R}f_\ast$$. But any initial model of a type theory can’t be a Grothendieck topos, only an elementary topos, so it doesn’t come with a geometric morphism to $$\mathsf{Sets}$$. In a sense, from the relative point of view, Grothendieck topos theory is exactly the theory of elementary topoi over $$\mathsf{Sets}$$.

## 2. Cubical type theory

Cubical type theory has an intended semantics in cubical sets, that is, presheaves on a cube category $$\square$$. Different type theories choose different cube categories – for example, BCH uses the “symmetric cube category”, CCHM uses the “de Morgan cube category”, and ABCFHL uses the “Cartesian cube category”.

Topos: Cubical sets (any of the variations), $$\widehat{\square}$$.
Verdict: Topologically boring.

Proof. Every cube category contains a terminal object $$\ast$$, so we have $$\Gamma(\mathcal F) = \mathcal F(\ast)$$. But limits are computed pointwise in functor categories, so $$\Gamma$$ is exact! This implies that all higher cohomology groups vanish, i.e., $$\mathsf{R}\Gamma(\mathbb{Z}_{\widehat{\square}}) = \mathbb{Z}$$.
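Spelling out the homological step (my own addition, a standard argument): choose an injective resolution $$\mathbb{Z}_{\widehat{\square}} \to I^\bullet$$. Since $$\Gamma$$ is exact, it preserves the exactness of the resolution, so

$H^i\,\mathsf{R}\Gamma(\mathbb{Z}_{\widehat{\square}}) = H^i\left(\Gamma(I^\bullet)\right) = \begin{cases} \Gamma(\mathbb{Z}_{\widehat{\square}}) = \mathbb{Z}, & i = 0\\ 0, & i > 0.\end{cases}$

The same computation applies verbatim to the other presheaf topoi below whose sites have terminal objects.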

## 3. Normalization proofs

Normalization by evaluation, for any type system ranging from the simply typed lambda calculus all the way up to variations on MLTT, uses logical relations valued in presheaves on a category $$Ren$$ of contexts and renamings. (What exactly $$Ren$$ looks like depends on the type theory.) This ensures that the semantics are invariant under context extensions and variable renamings.

Topos: Presheaves on $$Ren$$ (any of the variations), $$\widehat{Ren}$$.
Verdict: Topologically boring.

Proof. The categories of contexts and renamings all have terminal objects, so $$\Gamma$$ is exact just like above.

## 4. Step-indexing

Practical programming languages tend to have annoying features like the ability to write infinite loops. Step-indexing parameterizes the semantics with a decreasing counter $$k$$ that represents the number of steps to run – now you only need to consider finitely many iterations of the loop, and better yet, you can define your semantics inductively on $$k$$. This can be represented using presheaves on the poset $$\omega = \{ 0 \leq 1 \leq 2 \leq \dots\}$$, called the topos of trees in this paper.

Topos: The topos of trees, $$\mathcal S$$.
Verdict: Topologically boring.

Proof. Unlike the earlier presheaf topoi, this site does not have a terminal object, and so $$\Gamma$$ is not exact. However, since it has an initial object $$0$$, we can compute $$Hom_{\mathcal{S}}(\mathcal F, \mathbb{Z}_{\mathcal S}) = Hom(\mathcal F(0), \mathbb{Z})$$. This implies that a certain diagram commutes (omitted here), where $$f : \mathcal{S} \to \mathsf{Sets}$$ is the unique geometric morphism, and $$Sh(\mathcal{S})$$ is the category of abelian sheaves on $$\mathcal{S}$$; taking right derived functors gives the corresponding derived diagram. Since (co)limits are computed pointwise in a presheaf category, $$f^\ast$$ is exact and $$\mathsf Lf^\ast(\mathbb{Z}) = \mathbb{Z}_\mathcal{S}$$. Putting it all together,

\begin{align*} \mathsf{R}\Gamma(\mathbb{Z}_\mathcal S) &= \mathsf{R}Hom_\mathcal{S}(\mathbb{Z}_\mathcal S, \mathbb{Z}_\mathcal S) \\ &= \mathsf{R}Hom_\mathcal{S}(\mathsf Lf^\ast\mathbb{Z}, \mathbb{Z}_\mathcal S) \\ &= \mathsf{R}Hom(\mathbb{Z}, \mathbb{Z}) \\ &= \mathbb{Z}. \end{align*}

## 5. Sterling’s normalization topoi

In recent work, Jon Sterling and collaborators prove impressive metatheory results about dependent type theories (most notably normalization of cubical type theory) using topoi. They construct the topoi using gluing, so finally, we have the chance for some interesting topological behavior to happen!

Here, we start out with the category $$\mathcal T$$, the syntactic model of our type theory, as well as a subcategory $$\mathcal A$$ of “atomic terms” – for MLTT, $$\mathcal A = Ren$$ is our category of contexts and renamings. The functor $$\rho : \mathcal A \hookrightarrow \mathcal T$$ induces a functor $$\rho^\ast : \widehat{\mathcal T} \to \widehat{\mathcal A}$$ by precomposition, which turns out to be the inverse image of a geometric morphism of topoi. Then we define our topos $$G$$ by the following pushout diagram of topoi:

Topos: The normalization topos $$G$$.
Verdict: Topologically boring.

Proof. Intuitively, this diagram is reminiscent of the mapping cylinder $$Mf$$ of a continuous map $$f : X \to Y$$, which is constructed using a similar pushout of topological spaces. The inclusion $$Y \hookrightarrow Mf$$ is a homotopy equivalence, so by analogy with spaces, we’d expect $$G$$ to have the same cohomology as $$\widehat{\mathcal T}$$, a presheaf category on a site with a terminal object. This is in fact what happens.

Equivalently, $$G$$ can be described as the comma category $$\widehat{\mathcal{A}} \downarrow \rho^\ast$$, whose objects are triples of a presheaf $$\mathcal{F}$$ on $$\mathcal{A}$$, a presheaf $$\mathcal{G}$$ on $$\mathcal T$$, and a natural transformation $$\mathcal F \to \rho^\ast \mathcal{G}$$, and whose morphisms are pairs of natural transformations in $$\widehat{\mathcal A}$$ and $$\widehat{\mathcal T}$$ making the obvious diagram commute. But this is equivalently presheaves on a site constructed from the following pushout of categories, where $$I = [0 \leq 1]$$ is the directed interval category. The resulting site has a terminal object, the terminal object of $$I \times \mathcal A$$, so again $$\Gamma$$ is exact and $$\mathsf{R}\Gamma(\mathbb{Z}_G) = \mathbb Z$$.

## 6. Sterling and Harper’s parametricity topoi

In their paper Logical Relations as Types, Jon Sterling and Bob Harper give a topos-theoretic framework for understanding parametricity in module systems. They elaborate ML to a core language with a sharp phase distinction between a dependently-typed static module system, and a simply-typed monadic runtime semantics. To support SML-style where clauses, which can only require static data to agree, a special proposition $$\P$$ is introduced, under which all runtime data is considered equal. Using $$\P$$, where this = that clauses can (roughly) be implemented by requiring that $$\P \Rightarrow \texttt{this} = \texttt{that}$$.

Let $$\mathcal{T}$$ be the syntactic model of their core language. The presheaf topos $$\widehat{\mathcal T \sqcup \mathcal T}$$ has two copies of the internal language of $$\mathcal T$$, so let $$\P_L$$, $$\P_R$$ be the two copies of the proposition $$\P$$, and let $$\rho : \widehat{\mathcal T \sqcup \mathcal T} \to \mathbb{S}$$ be the geometric morphism classifying the proposition $$\P_L \lor \P_R$$. They construct a topos $$X$$ using the following pushout of topoi:

Topos: The topos of phase-separated parametricity structures $$X$$.
Verdict: Topologically boring.

Proof.

From Computation 5.29 in the paper, an object in $$X$$ consists of the following data:

• An object of $$\mathbb{S}$$, i.e., a family of sets $$A \to B$$
• A presheaf $$(L, R)$$ in $$\widehat{\mathcal{T} \sqcup \mathcal{T}}$$
• Maps $$A \to L(\ast)$$, $$A \to R(\ast)$$, $$B \to L(\P)$$, and $$B \to R(\P)$$ making the obvious squares commute
Morphisms in $$X$$ are the natural ones, i.e., families of maps making the obvious diagrams commute. So $$X$$ also ends up being equivalent to the category of presheaves on a site with a terminal object, and $$\mathsf{R}\Gamma(\mathbb{Z}_X) = \mathbb{Z}$$.

## 7. Conclusion

In conclusion, the topoi used in programming languages aren’t topologically interesting. This makes sense: programming languages researchers only care about the internal logic of topoi, and they don’t care about any geometric/topological aspects. It’s a bit like taking the QR decomposition of a QR code: sure, you can get a result, and it’s kinda funny, but don’t expect it to be useful or even meaningful.

1. The standard reference is Lambek and Scott, “Introduction to Higher-Order Categorical Logic”.

Reference interpreter for Snek https://markis.cool/posts/2023-05-05-reference-interpreter.html true 2023-05-05T00:00:00Z

# Reference interpreter for Snek

UCSD Compilers S23

This is a reference interpreter implementing Diamondback, PA5.

Notes on the SysV ABI calling convention https://markis.cool/posts/2023-04-17-sysv-abi.html true 2023-04-17T00:00:00Z

# Notes on the SysV ABI calling convention

Get ready to align the stack

The SysV ABI defines the AMD64 calling convention used by both Linux and macOS. The official SysV x86_64 reference may be found here. It uses “eightbyte” as a unit of memory, since that’s the size of a 64-bit register.

I’m ignoring stuff like long double and __int128.

Integers and pointers are passed in INTEGER registers. Floats and doubles are passed in SSE registers. Struct arguments are translated to a sequence of INTEGER and SSE registers following these rules:

1. If it’s larger than eight eightbytes, it gets passed in memory. (But actually there’s an extra rule later on that implies that if it’s larger than two eightbytes, it’s passed in memory – except maybe super wide SIMD vectors.)

2. If it’s weird (has unaligned members, or weird C++ stuff), it gets passed in memory.

3. Else, no struct member falls across an eightbyte boundary, so split it up into eightbyte chunks.

4. Each eightbyte chunk might still contain multiple members. If it’s all floats it’s SSE, otherwise use an INTEGER register.
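As a rough illustration, the chunk-classification rules can be sketched in a few lines of Python (my own simplified model, not official pseudocode: it ignores padding, alignment, and the “weird” cases, and assumes no field crosses an eightbyte boundary; fields are hypothetical (kind, size) pairs):

```python
# Classify a flat struct per the rules above: fields is a list of
# ("int" | "float", size_in_bytes) pairs, laid out in order with no padding.
def classify(fields):
    size = sum(s for _, s in fields)
    if size > 16:  # rule 1 (with the two-eightbyte caveat from the post)
        return "MEMORY"
    # Rule 3: split into eightbyte chunks, tracking which kinds land in each
    chunks = [set() for _ in range((size + 7) // 8)]
    offset = 0
    for kind, s in fields:
        chunks[offset // 8].add(kind)
        offset += s
    # Rule 4: a chunk is SSE only if everything in it is a float/double
    return ["SSE" if kinds == {"float"} else "INTEGER" for kinds in chunks]

print(classify([("float", 4), ("float", 4)]))   # ['SSE']
print(classify([("int", 4), ("float", 4)]))     # ['INTEGER']
print(classify([("float", 8), ("int", 8)]))     # ['SSE', 'INTEGER']
print(classify([("int", 8)] * 3))               # 'MEMORY'
```

So `struct { float x, y; }` travels in one XMM register, while `struct { int a; float b; }` gets demoted to a general-purpose register because the two members share an eightbyte.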

Now that we can classify arguments, we need to assign them to registers.

1. Each INTEGER eightbyte gets assigned the next available out of %rdi, %rsi, %rdx, %rcx, %r8, %r9

2. Each SSE eightbyte gets assigned the next available out of %xmm0, …, %xmm7

3. If at any point there’s not enough registers left, the whole argument goes in memory instead.
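Continuing the sketch (again my own illustration), the register-assignment pass with the whole-argument spill rule might look like:

```python
# Assign classified arguments to registers. Each argument is a list of
# per-eightbyte classes ("INTEGER"/"SSE"), or the string "MEMORY".
INT_REGS = ["%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"]
SSE_REGS = ["%xmm{}".format(i) for i in range(8)]

def assign(args):
    int_i = sse_i = 0
    placement = []
    for cls in args:
        if cls == "MEMORY":
            placement.append("stack")
            continue
        need_int = cls.count("INTEGER")
        need_sse = cls.count("SSE")
        # Rule 3: not enough registers left means the WHOLE argument goes
        # to the stack (registers already assigned stay assigned).
        if int_i + need_int > len(INT_REGS) or sse_i + need_sse > len(SSE_REGS):
            placement.append("stack")
            continue
        regs = []
        for kind in cls:
            if kind == "INTEGER":
                regs.append(INT_REGS[int_i]); int_i += 1
            else:
                regs.append(SSE_REGS[sse_i]); sse_i += 1
        placement.append(regs)
    return placement

print(assign([["INTEGER"], ["SSE"], ["INTEGER", "INTEGER"], "MEMORY"]))
# [['%rdi'], ['%xmm0'], ['%rsi', '%rdx'], 'stack']
```

Note that INTEGER and SSE registers are consumed from independent pools, so a float argument never displaces an integer one.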

We also need to handle the return value. It gets classified in the same way as arguments. Then:

1. If it would get passed in memory, the caller allocates space for it on the stack and passes a pointer to this space as %rdi, like an invisible extra zeroth argument. This same pointer must be returned in %rax.

2. Each INTEGER eightbyte gets assigned the next available out of %rax, %rdx

3. Each SSE eightbyte gets assigned the next available out of %xmm0, %xmm1

Finally, let’s deal with the stuff that gets passed in memory. In order, push to the stack:

1. Padding, if needed, so that the stack will be 16-byte aligned at the end, and enough space for the return value, if needed.

2. Each argument that needs to be passed in memory, in order

Type safety prehistory https://markis.cool/posts/2022-12-20-type-safety-prehistory.html true 2022-12-20T00:00:00Z

# Type safety prehistory

Gain a newfound appreciation for progress and preservation by diving into the domain-theoretic arguments of yore and yesteryear.

Most are probably familiar with progress and preservation, the two lemmas underlying a standard (syntactic) proof of type safety. (See e.g. TAPL, PFPL, PLFA, Software Foundations, etc.) They’re in every PL textbook, and a recent paper even called progress and preservation one of the “greatest hits” of programming languages research of the past three decades. And they’re cool, sure, but what’s so special about using these two lemmas? For the most part the proofs don’t involve any fancy tricks or complex machinery – you just bash out the cases.

But this lack of complex machinery is exactly what’s so great about syntactic type safety. To understand why, let’s take a look at a historical technique for proving type safety.

## Proof outline

Proof adapted from Milner’s 1977 paper “A theory of type polymorphism in programming”

This is a semantic type safety proof. Whereas syntactic type safety proves type safety relative to an operational semantics, semantic type safety proves safety relative to a denotational semantics. Here are the steps involved:

1. Define an untyped denotational semantics for the programming language, including a special value wrong indicating a type error happened
2. Define a “semantic typing relation” which says which semantic values have each type. Importantly, wrong has no type.
3. Prove lemmas about the semantic typing relation corresponding to all the inference rules

This is where Milner’s paper stops: he doesn’t even write down any inference rules, instead directly using these lemmas to prove soundness of the typechecking algorithm. But from a modern perspective, we can conclude, by induction on the typing derivation, type safety: well-typed programs can’t go wrong.

Throughout we’ll use a simple ML as our programming language of interest: \begin{align*} \text{Variables } x &\in \Sigma, \text{ an infinite set of variables}\\ \text{Built-ins } b &\in B, \text{ a set of built-in operations}\\ \text{Terms } e &::= x \mid b \mid e_1e_2 \mid \lambda x.e \mid \texttt{letrec}\ x = e_1\ \texttt{in}\ e_2 \\ \text{Types } \tau, \nu &::= T_1 \mid \dots \mid T_n \mid \tau \to \nu \end{align*}

With a simple type system:

$\frac{(x:\tau) \in \Gamma}{\Gamma \vdash x:\tau} \hspace{1.5em} \frac{\Gamma \vdash f:\tau\to \nu \hspace{1.5em} \Gamma \vdash x:\tau}{\Gamma \vdash f(x) : \nu} \hspace{1.5em} \frac{\Gamma, x:\tau \vdash e:\nu}{\Gamma \vdash \lambda x.e : \tau \to \nu} \hspace{1.5em} \frac{\Gamma, x:\tau \vdash e_1:\tau \hspace{1.5em} \Gamma,x:\tau \vdash e_2:\nu}{\Gamma \vdash \texttt{letrec}\ x = e_1\ \texttt{in}\ e_2 : \nu}$

For convenience let’s require all variables in $$\Gamma$$ to be distinct, and I’ll assume the built-ins $$b$$ and base types $$T_i$$ come with their own typing rules as well.

## 1. A denotational semantics for untyped ML

Our programming language is a simple ML, with a fixed collection of base types $$T_1 \dots T_n$$. We will interpret everything as values in a domain $$V$$, which will include a special value $$\bot$$ representing nontermination.

In order to even define the denotational semantics, we’re gonna need quite a bit of mathematical machinery. A reference here is Abramsky and Jung’s notes, or Amadio and Curien’s “Domains and lambda calculi”.

### 1.1. Domains and fixed points

Recursive definitions are tricky. A recursive definition can be viewed as a kind of limit of all the partial outputs of an infinite process. This motivates the following definition:

Definition. An $$\omega$$-cpo is a partially ordered set $$X$$ with a least element $$\bot$$ such that every chain $$x_1 \sqsubseteq x_2 \sqsubseteq x_3 \sqsubseteq \dots$$ has a least upper bound $$\bigvee_n x_n$$. A function $$f : X \to Y$$ between $$\omega$$-cpos is continuous if it is monotone and preserves upper bounds of chains: for every chain $$x_1 \sqsubseteq x_2 \sqsubseteq x_3 \sqsubseteq \dots$$ in $$X$$, we have $$f(\bigvee_n x_n) = \bigvee_n f(x_n)$$.

If we make our domain $$V$$ an $$\omega$$-cpo, and make our functions continuous, then we can interpret recursive functions as least fixed points:

Proposition. Let $$X$$ be an $$\omega$$-cpo, and $$f : X \to X$$ a continuous function. Then $$\bot \sqsubseteq f(\bot) \sqsubseteq f(f(\bot)) \sqsubseteq \dots$$ since $$\bot \sqsubseteq f(\bot)$$ and $$f$$ is monotone, and the least upper bound $$\bigvee_n f^n(\bot)$$ is the least fixed point of $$f$$.
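Filling in the two halves of that claim (my own addition, a standard argument): continuity gives

$f\left(\bigvee_n f^n(\bot)\right) = \bigvee_n f^{n+1}(\bot) = \bigvee_n f^n(\bot),$

so the least upper bound is indeed a fixed point; and for any other fixed point $$y$$, monotonicity applied to $$\bot \sqsubseteq y$$ gives $$f^n(\bot) \sqsubseteq f^n(y) = y$$ for all $$n$$, hence $$\bigvee_n f^n(\bot) \sqsubseteq y$$.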

Some helpful constructions on $$\omega$$-cpos:

• If $$S$$ is any set, we can turn it into an $$\omega$$-cpo $$S^\bot$$ by giving it the discrete order, then adding a least element $$\bot$$.
• If $$X, Y$$ are $$\omega$$-cpos, then the collection of continuous functions $$[X \to Y]$$ is an $$\omega$$-cpo, ordered pointwise. The operation $$\texttt{fix} : [X \to X] \to X$$ assigning each $$f : X \to X$$ to its least fixed point $$\bigvee_n f^n(\bot)$$ is continuous.
• We have arbitrary Cartesian products of $$\omega$$-cpos; these work as you’d expect. As a bonus, functions out of a finite product of $$\omega$$-cpos are continuous iff they’re continuous in each argument separately.
• When taking disjoint unions, we have to be careful for the result to still have a least element. So we identify $$\bot_X$$ with $$\bot_Y$$ in $$X \sqcup Y$$, to get the reduced sum $$X \oplus Y$$. It works as you’d expect (though it’s not quite the categorical coproduct of $$\omega$$-cpos).

We’ll also need ideas of subdomains. If $$D \subseteq D'$$ are $$\omega$$-cpos, then $$D$$ is a nice subdomain of $$D'$$ (not standard terminology) if

• the inclusion function $$\iota : D \to D'$$ is continuous, and
• there exists a (necessarily unique) continuous projection function $$\pi : D' \to D$$ such that $$\pi(x) \sqsubseteq x$$, with equality holding iff $$x \in D$$.

More generally, we can consider a domain $$D$$ as a nice subdomain of $$D'$$ whenever we have an injective function $$\iota$$ and surjective function $$\pi$$ satisfying the above conditions. These subdomains have nice properties from a universal algebra perspective because $$D$$ is both a subobject and a quotient of $$D'$$, and from a categorical perspective $$\iota$$ and $$\pi$$ are an adjunction.

### 1.2. Solving the domain equation

We want a value $$x \in V$$ to be either $$\bot$$, or an element of one of our base types $$T_i$$, or a continuous function $$V \to V$$. We’ll also add the extra value $$\texttt{wrong}$$ representing that a type error happened. So we are looking for an $$\omega$$-cpo $$V$$ satisfying $V \cong T_1^\bot \oplus \dots \oplus T_n^\bot \oplus \{\texttt{wrong}\}^\bot \oplus [V \to V]$

Theorem. There exists an $$\omega$$-cpo $$V$$ satisfying this isomorphism.

To construct it, let $$E$$ denote the initial terms $$T_1^\bot\oplus\dots\oplus T_n^\bot\oplus\{\texttt{wrong}\}^\bot$$. Construct a sequence of approximate solutions by \begin{align*} V_0 &= \{\bot\}\\ V_1 &= E \oplus [V_0 \to V_0]\\ V_2 &= E \oplus [V_1 \to V_1]\\ \vdots \end{align*} Consider $$V_0$$ as a nice subdomain of $$V_1$$ by the inclusion $$\iota_0(\bot) = \bot$$ and the projection $$\pi_0(x) = \bot$$. Each subsequent $$V_i$$ is a nice subdomain of $$V_{i+1}$$ by \begin{align*} \iota_i(x) &= x, \text{ if } x \in E &&\iota_i(f) = \iota_{i-1}\circ f\circ\pi_{i-1}, \text{ if } f \in [V_{i-1} \to V_{i-1}]\\ \pi_i(x) &= x, \text{ if } x \in E &&\pi_i(f) = \pi_{i-1}\circ f \circ \iota_{i-1}, \text{ if } f \in [V_i \to V_i] \end{align*} An element of $$V_i$$ intuitively represents the “$$i$$th partial output” of a program. We’ll construct $$V$$ as the limit of these $$V_i$$’s. Concretely, define $$V$$ to be the set of functions $$f : \mathbb{N} \to \bigcup_i V_i$$ such that $$f(i) \in V_i$$ and $$\pi_i(f(i+1)) = f(i)$$. This is an $$\omega$$-cpo and we have $$V \cong E \oplus [V \to V]$$.

### 1.3. The actual denotational semantics

Now that we’ve defined $$V$$, we can actually get to defining the denotational semantics. It will be parametrized by an environment $$\rho : \Sigma \to V$$ mapping variables to values; concretely, this means our evaluation function $$[\![\cdot]\!]$$ will map terms to continuous functions $$[\Sigma \to V] \to V$$. As usual, it’s defined by recursion on the syntax.

\begin{align*} [\![ x ]\!]\rho &= \rho(x) \\ [\![ e_1e_2 ]\!]\rho &= \begin{cases} ([\![ e_1 ]\!]\rho)([\![ e_2 ]\!]\rho), &\text{if } [\![ e_1 ]\!]\rho \in [V \to V] \\ \texttt{wrong},&\text{otherwise}\end{cases} \\ [\![ \lambda x. e ]\!]\rho &= v \mapsto [\![ e ]\!](\rho[x\coloneqq v])\\ [\![\texttt{letrec}\ x = e_1\ \texttt{in}\ e_2]\!]\rho &= [\![ e_2 ]\!](\rho[x \coloneqq v_1]), \text{ where } v_1 = \texttt{fix}(v \mapsto [\![ e_1 ]\!](\rho[x \coloneqq v])) \end{align*}

I assume that built-ins $$b \in B$$ come with an interpretation $$[\![ b ]\!]$$ as well.
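To make the flavor of this concrete, here is a toy executable reading of the evaluation function in Python (my own sketch, not Milner’s: terms are tagged tuples, environments are dicts, and letrec and $$\bot$$ are omitted, since nontermination can’t be modeled directly):

```python
# Untyped evaluator with a distinguished "wrong" value for type errors,
# mirroring the application case of the semantics above.
WRONG = "wrong"

def ev(e, rho):
    tag = e[0]
    if tag == "var":                 # [[x]]rho = rho(x)
        return rho[e[1]]
    if tag == "const":               # base-type constants (stand-ins for built-ins)
        return e[1]
    if tag == "lam":                 # [[\x.e]]rho = v -> [[e]](rho[x := v])
        _, x, body = e
        return lambda v: ev(body, {**rho, x: v})
    if tag == "app":                 # applying a non-function goes wrong
        f = ev(e[1], rho)
        if not callable(f):
            return WRONG
        return f(ev(e[2], rho))
    return WRONG

print(ev(("app", ("lam", "x", ("var", "x")), ("const", 42)), {}))  # 42
print(ev(("app", ("const", 1), ("const", 2)), {}))                 # wrong
```

The point of the rest of the proof is exactly that well-typed terms never hit the `WRONG` branch.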

## 2. The semantic typing relation

The next step is to define when a value $$v \in V$$ has a type $$\tau$$, which I’ll write as $$v : \tau$$. This is a semantic typing relation, defined on the semantic values, in contrast to the syntactic typing judgement from earlier. As always, it’s by recursion on the structure of the type.

• For a base type $$T_i$$, $$v : T_i$$ if either $$v = \bot$$ or $$v \in T_i$$ (i.e., if $$v \in T_i^\bot$$)
• For a function type $$\tau \to \nu$$, $$v : \tau \to \nu$$ if $$v = f \in [V \to V]$$ and, for every $$x \in V$$ such that $$x$$ has type $$\tau$$, $$f(x)$$ has type $$\nu$$.

This definition comes with an annoying technical proof:

Proposition. For every type $$\tau$$, the collection $$\{v \in V \mid v : \tau\}$$ is a nice subdomain of $$V$$. The proof is straightforward by induction on $$\tau$$.

As a corollary, whenever $$f : \tau \to \tau$$, its least fixed point $$\texttt{fix}(f)$$ has type $$\tau$$.

Not every value has a type, and this is important: notably, $$\texttt{wrong}$$ has no type. So when we prove that every syntactically well-typed term evaluates to a semantically well-typed value, we’ll get as a corollary that well-typed terms can’t go $$\texttt{wrong}$$.

We’ll also need a notion of well-typed environments $$\rho : \Sigma \to V$$. I’ll write $$\rho : \Gamma$$ to mean that for every $$(x : \tau) \in \Gamma$$, we have that $$\rho(x) : \tau$$.

## 3. Soundness

Suppose that each built-in $$b$$ is assigned a type $$\tau$$ and a value $$[\![ b ]\!]$$ such that $$[\![ b ]\!] \rho : \tau$$ for all $$\rho$$. With all that in place, we can finally state the main theorem:

Theorem. If $$\Gamma \vdash e : \tau$$, then for every $$\rho : \Gamma$$, we have $$[\![ e ]\!] \rho : \tau$$.

By induction on the typing derivation.

• If $$e = x$$ is a variable, we have $$[\![ x ]\!] \rho = \rho(x) : \tau$$ since $$\rho : \Gamma$$.
• If $$e = e_1e_2$$ is a function application with $$\Gamma \vdash e_1 : \nu \to \tau$$ and $$\Gamma \vdash e_2 : \nu$$, we have $$([\![ e_1 ]\!] \rho)(v) : \tau$$ for every $$v : \nu$$ by inductive hypothesis; in particular, picking $$v=[\![ e_2 ]\!] \rho$$, we get $$[\![ e_1 e_2 ]\!]\rho : \tau$$.
• If $$e = \lambda x. e_1$$ is a lambda abstraction and $$\tau = \mu \to \nu$$, suppose $$v : \mu$$. Then the environment $$\rho[x \coloneqq v]$$ is well-typed at $$(\Gamma, x:\mu)$$, so by inductive hypothesis we get $$[\![ e_1 ]\!] (\rho[x \coloneqq v]) : \nu$$.
• If $$e = \texttt{letrec}\ x = e_1\ \texttt{in}\ e_2$$ is a let binding, with $$\Gamma, x:\nu \vdash e_1 : \nu$$, then let $$v_1 = \texttt{fix}(v \mapsto [\![ e_1 ]\!](\rho[x \coloneqq v]))$$ as in the semantics. First, $$v \mapsto [\![ e_1 ]\!](\rho[x \coloneqq v])$$ has type $$\nu \to \nu$$ by the inductive hypothesis as above, so $$v_1 : \nu$$ by the corollary about fixed points. Finally, $$[\![ e_2 ]\!](\rho[x \coloneqq v_1]) : \tau$$ by the second inductive hypothesis.

And in the thrilling conclusion we can finally prove

Corollary (type safety). If $$\cdot \vdash e : \tau$$, then “$$e$$ does not go wrong”, i.e. $$[\![ e ]\!] \rho \neq \texttt{wrong}$$: we have that $$[\![ e ]\!] \rho : \tau$$, but $$\texttt{wrong}$$ has no type.

## Conclusion

What a lot of mathematical machinery! Especially compared to syntactic type safety, which proves type safety for an operational semantics using basically no math at all.

The worst part is, this kind of domain-theoretic semantics doesn’t really scale very well: when adding new programming language features, like effects, it’s often easy to give them an operational semantics and extend a typical progress and preservation proof. But here, you’d have to figure out how to model them denotationally – which can be really hard, as demonstrated by how much it took just to support the simple feature of recursion!

All that said, semantic type safety does have one very big advantage over progress and preservation: you get a lot of flexibility in how you define the semantic typing relation. So for type systems with relatively complicated runtime invariants, it’s been making a return in recent years (though far removed from any domain theory!), with the people behind Iris successfully proving type safety with respect to a denotational semantics in terms of higher-order separation logic.
