This is a story of papers, probabilities, and puzzles, of essays, exams, and engineering: a story of general education requirements and linear programming solvers.

As finals week started in Spring term, after more than a year and a half of books and essays, I was finally at the end of HUM, UCSD’s grueling gen-ed writing and literature sequence. I faced just one final obstacle: the final exam of the final course of the sequence. From one essay to the next, my grade had been falling, and with how I’d been doing in HUM 5, all depended on this one exam. I was not prepared.

Here was the format of the test:

- The lecturer posted in advance a list of five potential essay prompts for the final exam.
- At the start of the exam, the lecturer would pick three of the five prompts to give to us.
- During the exam, I would have to write essays responding to two of the three chosen prompts, and I’d be graded on both.

Everything was at stake, and yet I had so little time to prepare! I desperately needed to make optimal use of my time. Here were my own constraints:

- I’ll assume the lecturer chooses randomly, and suppose without loss of generality I study most for prompt 1, less for prompt 2, etc., and least for prompt 5.
- I’d like to prepare as little as possible for the exam, but I’d like to expect to be at least 95% prepared.

There are ten ways for the lecturer to pick three prompts: 123, 124, 125, 134, 135, 145, 234, 235, 245, and 345.

Looking at all the cases, if I always write the essays I’m most prepared for, there’s a 60% chance I’ll have to do prompt 1, a 60% chance I’ll have to do prompt 2, a 50% chance I’ll have to do prompt 3, and a 30% chance I’ll have to do prompt 4. (I can always avoid the dreaded prompt 5.) As a sanity check, these add up to 200%, since I have to write two essays.
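These counts are easy to double-check mechanically. Here's a quick sketch (my own, assuming as above that the three prompts are chosen uniformly at random and that I always answer the two lowest-numbered ones):

```python
from itertools import combinations

# Count, over all 10 equally likely prompt choices, how often each prompt
# ends up among the two I actually write (the two I prepared most for,
# i.e. the two lowest-numbered prompts in the chosen set).
counts = {p: 0 for p in range(1, 6)}
for chosen in combinations(range(1, 6), 3):
    for p in sorted(chosen)[:2]:  # write the two best-prepared prompts
        counts[p] += 1

print(counts)  # {1: 6, 2: 6, 3: 5, 4: 3, 5: 0}, i.e. 60%, 60%, 50%, 30%, 0%
```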

Therefore, letting \(x_i\) be the amount I prepare for prompt \(i\), my expected preparedness for the essays I write is given by

\[\mathbb{E}[\textsf{preparedness}] = \frac1{20}\left(6x_1 + 6x_2 + 5x_3 + 3x_4\right)\]

This is a linear optimization problem! My constraints are all linear inequalities, and so is the quantity I want to minimize: \(x_1 + \dots + x_5\), the total amount of preparing.

So to find my optimal studying strategy, I plugged it into an off-the-shelf LP solver. I used GNU GLPK with the MathProg modeling language, since I found a convenient web interface for it. Here’s the program:

```
var x{1..5} >= 0, <= 1;
# Minimize the total amount of preparing
minimize total: sum{i in 1..5} x[i];
# Decreasing preparedness
subject to c1{i in 1..4}: x[i] >= x[i+1];
# Expected preparedness of at least 95%
subject to c2: 0.95 <= 0.3*x[1] + 0.3*x[2] + 0.25*x[3] + 0.15*x[4];
end;
```
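If you don't have an LP solver handy, this particular LP is also small enough to solve greedily in plain Python. Since the objective weights every \(x_i\) equally while the constraint coefficients are decreasing, filling the prompts in order of coefficient is optimal here (a sketch for this specific problem, not a general LP solver):

```python
# Greedy solution to the tiny LP: spend study effort where each unit of
# effort buys the most expected preparedness. Optimal for this LP because
# the objective weights all x_i equally while the constraint coefficients
# are decreasing.
coeffs = [0.3, 0.3, 0.25, 0.15, 0.0]  # P(writing essay i) / 2
target = 0.95

x = [0.0] * 5
remaining = target
for i, c in enumerate(coeffs):
    if c == 0 or remaining <= 0:
        break
    x[i] = min(1.0, remaining / c)
    remaining -= c * x[i]

print([round(v, 4) for v in x])  # [1.0, 1.0, 1.0, 0.6667, 0.0]
```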

Finally, here’s the optimal study plan the solver found: \(x_1 = x_2 = x_3 = 1\), \(x_4 = \tfrac{2}{3}\), and \(x_5 = 0\).

And so I studied the first three prompts fully, only prepared ⅔ of the way for the fourth, and did not prepare at all for the final prompt.

The lecturer chose prompts 2, 3, and 5. I was as prepared as I could be, I wrote my essays with confidence (and lots of stress), and I passed the class.

Topoi are used in programming languages as an abstraction for different logical systems, in order to make defining semantics easier and simplify certain formal reasoning. For example, reasoning about step-indexing can be done using a topos of time-dependent sets.

In these cases, the benefit of using a topos is that it comes with an interpretation of dependent type theory: most of the time, we can just use the familiar syntax of lambdas, pairs, etc., and it does the right thing with respect to step indexing (or whatever else we care about).

On the other hand, topos theory was originally developed by geometers, who use topoi as a setting for cohomology. Given a topos \(\mathscr{X}\) of sheaves on a space, its cohomology \(H^\ast(\mathscr{X})\) gives topological information about the space. In general, for an arbitrary topos, we can define its cohomology as \(H^i(\mathscr{X}) = H^i\mathsf{R}\Gamma(\mathbb{Z}_\mathscr{X})\), where \(\mathbb{Z}_\mathscr{X}\) is the constant sheaf on \(\mathbb{Z}\) and \(\mathsf{R}\Gamma : D(\mathscr{X}) \to D(Ab)\) is the derived functor of the global sections functor. For a topological space, this is equivalent to singular cohomology under mild assumptions. If \(\mathsf{R}\Gamma(\mathbb{Z}_\mathscr{X}) = \mathbb{Z}\), then \(\mathscr{X}\) is topologically boring: it’s weakly homotopy equivalent to a point.

But programming languages people never talk about cohomology of their topoi, because it’s not useful. This post is about doing the useless thing. I’ll assume standard facts about derived categories.

The most important aspect of topoi for programming languages is that they are models of dependent type theory. This statement also has a kind of converse: all models of extensional MLTT + certain extra features are (elementary) topoi^{1}. So what’s the cohomology of the initial model of extensional MLTT + the extra features?

**Topos:** The initial model of extensional MLTT (+ extra features)

**Verdict:** N/A – doesn’t make sense.

Cubical type theory has an intended semantics in *cubical sets*, that is, presheaves on a cube category \(\square\). Different type theories choose different cube categories – for example, BCH uses the “symmetric cube category”, CCHM uses the “de Morgan cube category”, and ABCFHL uses the “Cartesian cube category”.

**Topos:** Cubical sets (any of the variations), \(\widehat{\square}\).

**Verdict:** Topologically boring.

Normalization by evaluation, for any type system ranging from the simply typed lambda calculus all the way up to variations on MLTT, uses logical relations valued in presheaves on a category \(Ren\) of contexts and renamings. (What exactly \(Ren\) looks like depends on the type theory.) This ensures that the semantics are invariant under context extensions and variable renamings.

**Topos:** Presheaves on \(Ren\) (any of the variations), \(\widehat{Ren}\).

**Verdict:** Topologically boring.

Practical programming languages tend to have annoying features like the ability to write infinite loops. Step-indexing parameterizes the semantics with a decreasing counter \(k\) that represents the number of steps to run – now you only need to consider finitely many iterations of the loop, and better yet, you can define your semantics inductively on \(k\). This can be represented using presheaves on the poset \(\omega = \{ 0 \leq 1 \leq 2 \leq \dots\}\), called the *topos of trees* in this paper.

**Topos:** The topos of trees, \(\mathcal S\).

**Verdict:** Topologically boring.

Let \(f : \mathcal S \to \mathbf{Set}\) be the unique geometric morphism to the point, so that \(f_\ast = \Gamma\) is the global sections functor and \(f^\ast\) sends a set to the corresponding constant presheaf. Taking derived functors of this adjunction gives \(\mathsf{R}\Gamma = \mathsf{R}f_\ast : D(\mathcal S) \to D(Ab)\) and \(\mathsf{L}f^\ast\).

Since (co)limits are computed pointwise in a presheaf category, \(f^\ast\) is exact and \(\mathsf Lf^\ast(\mathbb{Z}) = \mathbb{Z}_\mathcal{S}\). Putting it all together, \[ \begin{align*} \mathsf{R}\Gamma(\mathbb{Z}_\mathcal S) &= \mathsf{R}Hom_\mathcal{S}(\mathbb{Z}_\mathcal S, \mathbb{Z}_\mathcal S) \\ &= \mathsf{R}Hom_\mathcal{S}(\mathsf Lf^\ast\mathbb{Z}, \mathbb{Z}_\mathcal S) \\ &= \mathsf{R}Hom(\mathbb{Z}, \mathbb{Z}) \\ &= \mathbb{Z}. \end{align*} \]

In recent work, Jon Sterling and collaborators prove impressive metatheory results about dependent type theories (most notably normalization of cubical type theory) using topoi. They construct the topoi using gluing, so finally, we have the chance for some interesting topological behavior to happen!

Here, we start out with the category \(\mathcal T\), the syntactic model of our type theory, as well as a subcategory \(\mathcal A\) of “atomic terms” – for MLTT, \(\mathcal A = Ren\) is our category of contexts and renamings. The functor \(\rho : \mathcal A \hookrightarrow \mathcal T\) induces a functor \(\rho^\ast : \widehat{\mathcal T} \to \widehat{\mathcal A}\) by precomposition, which turns out to be the inverse image of a geometric morphism of topoi. Then we define our topos \(G\) by a pushout of topoi, gluing \(\widehat{\mathcal T}\) to a cylinder on \(\widehat{\mathcal A}\) along \(\rho\) (the topos-theoretic analogue of a mapping cylinder \(Mf\)).

**Topos:** The normalization topos \(G\).

**Verdict:** Topologically boring.

The inclusion \(Y \hookrightarrow Mf\) is a homotopy equivalence, so by analogy with spaces, we’d expect \(G\) to have the same cohomology as \(\widehat{\mathcal T}\), a presheaf category on a site with a terminal object. This is in fact what happens.

Equivalently, \(G\) can be described as the comma category \(\widehat{\mathcal{A}} \downarrow \rho^\ast\), whose objects are triples of a presheaf \(\mathcal{F}\) on \(\mathcal{A}\), a presheaf \(\mathcal{G}\) on \(\mathcal T\), and a natural transformation \(\mathcal F \to \rho^\ast \mathcal{G}\), and whose morphisms are pairs of natural transformations in \(\widehat{\mathcal A}\) and \(\widehat{\mathcal T}\) making the obvious diagram commute. But this is equivalently presheaves on a site built as a pushout of categories, gluing \(I \times \mathcal A\) to \(\mathcal T\) along \(\mathcal A\), where \(I = [0 \leq 1]\) is the directed interval category. This site has a terminal object (the terminal object of \(I \times \mathcal A\)), so again \(\Gamma\) is exact and \(\mathsf{R}\Gamma(\mathbb{Z}_G) = \mathbb Z\).

In their paper Logical Relations as Types, Jon Sterling and Bob Harper give a topos-theoretic framework for understanding parametricity in module systems. They elaborate ML to a core language with a sharp phase distinction between a dependently-typed static module system and a simply-typed monadic runtime semantics. To support SML-style `where` clauses, which can only require static data to agree, a special proposition \(\P\) is introduced, under which all runtime data is considered equal. Using \(\P\), `where this = that` clauses can (roughly) be implemented by requiring that \(\P \Rightarrow \texttt{this} = \texttt{that}\).

**Topos:** The topos of phase-separated parametricity structures \(X\).

**Verdict:** Topologically boring.

From Computation 5.29 in the paper, an object in \(X\) consists of the following data:

- An object of \(\mathbb{S}\), i.e., a family of sets \(A \to B\)
- A presheaf \((L, R)\) in \(\widehat{\mathcal{T} \sqcup \mathcal{T}}\)
- Maps \(A \to L(\ast)\), \(A \to R(\ast)\), \(B \to L(\P)\), and \(B \to R(\P)\) making the obvious squares commute

In conclusion, the topoi used in programming languages aren’t topologically interesting. This makes sense: programming languages researchers only care about the internal logic of topoi, and they don’t care about any geometric/topological aspects. It’s a bit like taking the QR decomposition of a QR code: sure, you can get a result, and it’s kinda funny, but don’t expect it to be useful or even meaningful.

The standard reference is Lambek and Scott, “Introduction to Higher-Order Categorical Logic”.

This is a reference interpreter implementing Diamondback, PA5.

The SysV ABI defines the AMD64 calling convention used by both Linux and Mac. The official SysV x86_64 reference may be found here. They use “eightbyte” as a unit of memory, since it’s the size of a 64-bit register.

I’m ignoring stuff like `long double` and `__int128`.

Integers and pointers are passed in INTEGER registers. Floats and doubles are passed in SSE registers. Struct arguments are translated to a sequence of INTEGER and SSE registers following these rules:

- If it’s larger than eight eightbytes, it gets passed in memory. (But actually there’s an *extra* rule later on that implies that if it’s larger than *two* eightbytes, it’s passed in memory – except maybe super wide SIMD vectors.)
- If it’s *weird* (has unaligned members, or weird C++ stuff), it gets passed in memory.
- Else, no struct member falls across an eightbyte boundary, so split it up into eightbyte chunks. Each eightbyte chunk might still contain multiple members. If it’s all floats it’s SSE, otherwise use an INTEGER register.

Now that we can classify arguments, we need to assign them to registers.

- Each INTEGER eightbyte gets assigned the next available out of `%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9`.
- Each SSE eightbyte gets assigned the next available out of `%xmm0`, …, `%xmm7`.
- If at any point there’s not enough registers left, the *whole* argument goes in memory instead.
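The assignment loop can be sketched like this (a hypothetical helper of my own; it assumes each argument has already been classified into `'MEMORY'` or a list of eightbyte classes as above):

```python
INTEGER_REGS = ["%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"]
SSE_REGS = ["%xmm" + str(i) for i in range(8)]

def assign_registers(args):
    """args: one classification per argument, each 'MEMORY' or a list of
    'INTEGER'/'SSE' eightbyte classes. Returns (per-argument register lists,
    with None for stack arguments; the arguments passed in memory)."""
    next_int, next_sse = 0, 0
    assigned, in_memory = [], []
    for arg in args:
        if arg == "MEMORY":
            assigned.append(None)
            in_memory.append(arg)
            continue
        need_int = arg.count("INTEGER")
        need_sse = arg.count("SSE")
        # Not enough registers left: the *whole* argument goes in memory.
        if next_int + need_int > len(INTEGER_REGS) or next_sse + need_sse > len(SSE_REGS):
            assigned.append(None)
            in_memory.append(arg)
            continue
        regs = []
        for cls in arg:
            if cls == "INTEGER":
                regs.append(INTEGER_REGS[next_int])
                next_int += 1
            else:
                regs.append(SSE_REGS[next_sse])
                next_sse += 1
        assigned.append(regs)
    return assigned, in_memory

# e.g. f(long, struct { double a; double b; }, long):
print(assign_registers([["INTEGER"], ["SSE", "SSE"], ["INTEGER"]])[0])
# [['%rdi'], ['%xmm0', '%xmm1'], ['%rsi']]
```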

We also need to handle the return value. It gets classified in the same way as arguments. Then:

- If it would get passed in memory, the caller allocates space for it on the stack and passes a pointer to this space as `%rdi`, like an invisible extra zeroth argument. This same pointer must be returned in `%rax`.
- Each INTEGER eightbyte gets assigned the next available out of `%rax`, `%rdx`.
- Each SSE eightbyte gets assigned the next available out of `%xmm0`, `%xmm1`.

Finally, let’s deal with the stuff that gets passed in memory. In order, push to the stack:

- Padding, if needed, so that the stack will be 16-byte aligned at the end, and enough space for the return value, if needed.
- Each argument that needs to be passed in memory, in order.

Most are probably familiar with progress and preservation, the two lemmas underlying a standard (syntactic) proof of type safety. (See e.g. TAPL, PFPL, PLFA, Software Foundations, etc.) They’re in every PL textbook, and a recent paper even called progress and preservation “one of the ‘greatest hits’ of programming languages research of the past three decades”. And they’re cool, sure, but what’s so special about using these two lemmas? For the most part the proofs don’t involve any fancy tricks or complex machinery – you just bash out the cases.

But this lack of complex machinery is exactly what’s so great about syntactic type safety. To understand why, let’s take a look at a historical technique for proving type safety.

Proof adapted from Milner’s 1977 paper “A theory of type polymorphism in programming”

This is a *semantic* type safety proof. Whereas syntactic type safety proves type safety relative to an operational semantics, semantic type safety proves safety relative to a denotational semantics. Here are the steps involved:

- Define an untyped denotational semantics for the programming language, including a special value `wrong` indicating a type error happened
- Define a “semantic typing relation” which says which semantic values have each type. Importantly, `wrong` has no type.
- Prove lemmas about the semantic typing relation corresponding to all the inference rules

This is where Milner’s paper stops: he doesn’t even write down any inference rules, instead directly using these lemmas to prove soundness of the typechecking algorithm. But from a modern perspective, we can conclude, by induction on the typing derivation, type safety: well-typed programs can’t go `wrong`.

Throughout we’ll use a simple ML as our programming language of interest: \[ \begin{align*} \text{Variables } x &\in \Sigma, \text{ an infinite set of variables}\\ \text{Built-ins } b &\in B, \text{ a set of built-in operations}\\ \text{Terms } e &::= x \mid b \mid e_1e_2 \mid \lambda x.e \mid \texttt{letrec $x = e_1$ in $e_2$} \\ \text{Types } \tau, \nu &::= T_1 \mid \dots \mid T_n \mid \tau \to \nu \end{align*} \] With a simple type system: \[ \frac{(x:\tau) \in \Gamma}{\Gamma \vdash x:\tau} \hspace{1.5em} \frac{\Gamma \vdash e_1:\tau\to \nu \hspace{1.5em} \Gamma \vdash e_2:\tau}{\Gamma \vdash e_1e_2 : \nu} \hspace{1.5em} \frac{\Gamma, x:\tau \vdash e:\nu}{\Gamma \vdash \lambda x.e : \tau \to \nu} \hspace{1.5em} \frac{\Gamma, x:\tau \vdash e_1:\tau \hspace{1.5em} \Gamma,x:\tau \vdash e_2:\nu}{\Gamma \vdash \texttt{letrec $x = e_1$ in $e_2$} : \nu} \] For convenience let’s require all variables in \(\Gamma\) to be distinct, and I’ll assume the built-ins \(b\) and base types \(T_i\) come with their own typing rules as well.

Our programming language is a simple ML, with a fixed collection of base types \(T_1 \dots T_n\). We will interpret everything as values in a domain \(V\), which will include a special value \(\bot\) representing nontermination.

In order to even define the denotational semantics, we’re gonna need quite a bit of mathematical machinery. A reference here is Abramsky and Jung’s notes, or Amadio and Curien’s “Domains and lambda calculi”.

Recursive definitions are tricky. A recursive definition can be viewed as a kind of limit of all the partial outputs of an infinite process. This motivates the following definition:

**Definition.** An *\(\omega\)-cpo* is a partially ordered set \(X\) with a least element \(\bot\) such that every chain \(x_1 \sqsubseteq x_2 \sqsubseteq x_3 \sqsubseteq \dots\) has a least upper bound \(\bigvee_n x_n\). A function \(f : X \to Y\) between \(\omega\)-cpos is *continuous* if it is monotone and preserves upper bounds of chains: for every chain \(x_1 \sqsubseteq x_2 \sqsubseteq x_3 \sqsubseteq \dots\) in \(X\), we have \(f(\bigvee_n x_n) = \bigvee_n f(x_n)\).

If we make our domain \(V\) an \(\omega\)-cpo, and make our functions continuous, then we can interpret recursive functions as least fixed points:

**Proposition.** Let \(X\) be an \(\omega\)-cpo, and \(f : X \to X\) a continuous function. Then \(\bot \sqsubseteq f(\bot) \sqsubseteq f(f(\bot)) \sqsubseteq \dots\), since \(\bot\) is the least element and \(f\) is monotone, and the least upper bound \(\bigvee_n f^n(\bot)\) is the least fixed point of \(f\).
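To make the chain \(\bot \sqsubseteq f(\bot) \sqsubseteq f(f(\bot)) \sqsubseteq \dots\) concrete, here's a small Python illustration (my own, not from the source material): model a partial function \(\mathbb N \to \mathbb N\) as a dict of known outputs, with \(\bot\) the empty dict, and iterate the functional defining factorial.

```python
# The recursive definition fact(n) = 1 if n == 0 else n * fact(n - 1),
# viewed as an operator F on partial functions (dicts). Iterating F from
# bottom climbs the chain bottom ⊑ F(bottom) ⊑ F(F(bottom)) ⊑ ...; the
# least upper bound of the chain is the factorial function itself.

def F(partial):
    out = {}
    for n in range(10):
        if n == 0:
            out[n] = 1
        elif n - 1 in partial:
            out[n] = n * partial[n - 1]
    return out

approx = {}          # bottom: the nowhere-defined partial function
for _ in range(6):   # F^6(bottom) is defined on inputs 0..5
    approx = F(approx)

print(approx[5])  # 120
```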

Some helpful constructions on \(\omega\)-cpos:

- If \(S\) is any set, we can turn it into an \(\omega\)-cpo \(S^\bot\) by giving it the discrete order, then adding a least element \(\bot\).
- If \(X, Y\) are \(\omega\)-cpos, then the collection of continuous functions \([X \to Y]\) is an \(\omega\)-cpo, ordered pointwise. The operation \(\texttt{fix} : [X \to X] \to X\) assigning each \(f : X \to X\) to its least fixed point \(\bigvee_n f^n(\bot)\) is continuous.
- We have arbitrary Cartesian products of \(\omega\)-cpos; these work as you’d expect. As a bonus, functions out of a finite product of \(\omega\)-cpos are continuous iff they’re continuous in each argument separately.
- When taking disjoint unions, we have to be careful for the result to still have a least element. So we identify \(\bot_X\) with \(\bot_Y\) in \(X \sqcup Y\), to get the *reduced sum* \(X \oplus Y\). It works as you’d expect (though it’s not quite the categorical coproduct of \(\omega\)-cpos).

We’ll also need a notion of subdomain. If \(D \subseteq D'\) are \(\omega\)-cpos, then \(D\) is a *nice subdomain* of \(D'\) (not standard terminology) if

- the inclusion function \(\iota : D \to D'\) is continuous, and
- there exists a (necessarily unique) continuous projection function \(\pi : D' \to D\) such that \(\pi(x) \sqsubseteq x\), with equality holding iff \(x \in D\).

More generally, we can consider a domain \(D\) as a nice subdomain of \(D'\) whenever we have an injective function \(\iota\) and surjective function \(\pi\) satisfying the above conditions. These subdomains have nice properties from a universal algebra perspective because \(D\) is both a subobject and a quotient of \(D'\), and from a categorical perspective \(\iota\) and \(\pi\) form an adjunction.

We want a value \(x \in V\) to be either \(\bot\), or an element of one of our base types \(T_i\), or a continuous function \(V \to V\). We’ll also add the extra value \(\texttt{wrong}\) representing that a type error happened. So we are looking for an \(\omega\)-cpo \(V\) satisfying \[V \cong T_1^\bot \oplus \dots \oplus T_n^\bot \oplus \{\texttt{wrong}\}^\bot \oplus [V \to V]\]

**Theorem.** There exists an \(\omega\)-cpo \(V\) satisfying this isomorphism.
To construct it, let \(E\) denote the non-function summand \(T_1^\bot\oplus\dots\oplus T_n^\bot\oplus\{\texttt{wrong}\}^\bot\). Construct a sequence of approximate solutions by
\[\begin{align*}
V_0 &= \{\bot\}\\
V_1 &= E \oplus [V_0 \to V_0]\\
V_2 &= E \oplus [V_1 \to V_1]\\
\vdots
\end{align*}
\]
Consider \(V_0\) as a nice subdomain of \(V_1\) by the inclusion \(\iota_0(\bot) = \bot\) and the projection \(\pi_0(x) = \bot\). Each subsequent \(V_i\) is a nice subdomain of \(V_{i+1}\) by
\[
\begin{align*}
\iota_i(x) &= x, \text{ if $x \in E$}&&\iota_i(f) = \iota_{i-1}\circ f\circ\pi_{i-1}, \text{ if $f \in [V_{i-1} \to V_{i-1}]$}\\
\pi_i(x) &= x, \text{ if $x \in E$}&&\pi_i(f) = \pi_{i-1}\circ f \circ \iota_{i-1}, \text{ if $f \in [V_i \to V_i]$}
\end{align*}
\]
An element of \(V_i\) intuitively represents the “\(i\)th partial output” of a program. We’ll construct \(V\) as the limit of these \(V_i\)’s. Concretely, define \(V\) to be the set of functions \(f : \mathbb{N} \to \bigcup_i V_i\) such that \(f(i) \in V_i\) and \(\pi_i(f(i+1)) = f(i)\). This is an \(\omega\)-cpo and we have \(V \cong E \oplus [V \to V]\).

Now that we’ve defined \(V\), we can actually get to defining the denotational semantics. It will be parametrized by an environment \(\rho : \Sigma \to V\) mapping variables to values; concretely, this means our evaluation function \([\![\cdot]\!]\) will map terms to continuous functions \([\Sigma \to V] \to V\). As usual, it’s defined by recursion on the syntax.

\[\begin{align*} [\![ x ]\!]\rho &= \rho(x) \\ [\![ e_1e_2 ]\!]\rho &= \begin{cases} ([\![ e_1 ]\!]\rho)([\![ e_2 ]\!]\rho), &\text{if $[\![ e_1 ]\!] \rho \in [V \to V]$} \\ \texttt{wrong},&\text{otherwise}\end{cases} \\ [\![ \lambda x. e ]\!]\rho &= v \mapsto [\![ e ]\!](\rho[x\coloneqq v])\\ [\![\texttt{letrec $x = e_1$ in $e_2$}]\!]\rho &= [\![ e_2 ]\!](\rho[x \coloneqq v_1]), \text{ where } v_1 = \texttt{fix}(v \mapsto [\![ e_1 ]\!](\rho[x \coloneqq v])) \end{align*} \]

I assume that built-ins \(b \in B\) come with an interpretation \([\![ b ]\!]\) as well.
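To make these equations concrete, here's a tiny evaluator in Python (my own encoding, not from Milner's paper; it is set-theoretic rather than domain-theoretic, ignoring \(\bot\)/nontermination, built-ins, `letrec`, and the propagation of \(\texttt{wrong}\) through arguments):

```python
# Terms: ("var", x), ("app", e1, e2), ("lam", x, body), constants ("con", v).
# Following the semantics above, applying a non-function yields WRONG.
WRONG = "wrong"

def ev(term, env):
    tag = term[0]
    if tag == "con":
        return term[1]
    if tag == "var":
        return env[term[1]]
    if tag == "lam":
        _, x, body = term
        return lambda v: ev(body, {**env, x: v})
    if tag == "app":
        f = ev(term[1], env)
        if not callable(f):  # type error: applied a non-function
            return WRONG
        return f(ev(term[2], env))

identity = ("lam", "x", ("var", "x"))
print(ev(("app", identity, ("con", 3)), {}))    # 3
print(ev(("app", ("con", 3), ("con", 4)), {}))  # wrong
```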

The next step is to define when a value \(v \in V\) has a type \(\tau\), which I’ll write as \(v : \tau\). This is a *semantic* typing relation, defined on the semantic values, in contrast to the *syntactic* typing judgement from earlier. As always, it’s by recursion on the structure of the type.
- For a base type \(T_i\), \(v : T_i\) if either \(v = \bot\) or \(v \in T_i\) (i.e., if \(v \in T_i^\bot\))
- For a function type \(\tau \to \nu\), \(v : \tau \to \nu\) if \(v = f \in [V \to V]\) and, for every \(x \in V\) such that \(x\) has type \(\tau\), \(f(x)\) has type \(\nu\).
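Semantic typing quantifies over *all* values, so it isn't something you can decide by running code; still, for intuition, here's a toy approximation in Python (entirely my own illustration): base types as predicates, with the function case spot-checked against a finite list of sample values rather than all of \(V\).

```python
# Types: ("base", predicate) or ("fun", t1, t2). A value "has" a function
# type if it's callable and maps (sampled) values of t1 to values of t2.
# This is only a finite approximation of the universally quantified
# definition in the text.
def has_type(v, ty, samples):
    if ty[0] == "base":
        return ty[1](v)
    _, t1, t2 = ty
    if not callable(v):
        return False
    return all(has_type(v(x), t2, samples)
               for x in samples if has_type(x, t1, samples))

INT = ("base", lambda v: isinstance(v, int))
print(has_type(lambda n: n + 1, ("fun", INT, INT), [0, 1, 2]))  # True
print(has_type("wrong", INT, []))  # False: wrong has no type
```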

This definition comes with an annoying technical proof:

**Proposition.** For every type \(\tau\), the collection \(\{v \in V \mid v : \tau\}\) is a nice subdomain of \(V\). The proof is straightforward by induction on \(\tau\).

As a corollary, whenever \(f : \tau \to \tau\), its least fixed point \(\texttt{fix}(f)\) has type \(\tau\).

Not every value has a type, and this is important: notably, \(\texttt{wrong}\) has no type. So when we prove that every *syntactically* well-typed term evaluates to a *semantically* well-typed value, we’ll get as a corollary that well-typed terms can’t go \(\texttt{wrong}\).

We’ll also need a notion of well-typed environments \(\rho : \Sigma \to V\). I’ll write \(\rho : \Gamma\) to mean that for every \((x : \tau) \in \Gamma\), we have that \(\rho(x) : \tau\).

Suppose that each built-in \(b\) is assigned a type \(\tau\) and a value \([\![ b ]\!]\) such that \([\![ b ]\!] \rho : \tau\) for all \(\rho\). With all that in place, we can finally state the main theorem:

**Theorem.** If \(\Gamma \vdash e : \tau\), then for every \(\rho : \Gamma\), we have \([\![ e ]\!] \rho : \tau\).

By induction on the typing derivation.

- If \(e = x\) is a variable, we have \([\![ x ]\!] \rho = \rho(x) : \tau\) since \(\rho : \Gamma\).
- If \(e = e_1e_2\) is a function application with \(\Gamma \vdash e_1 : \nu \to \tau\) and \(\Gamma \vdash e_2 : \nu\), we have \(([\![ e_1 ]\!] \rho)(v) : \tau\) for every \(v : \nu\) by the inductive hypothesis; in particular, picking \(v=[\![ e_2 ]\!] \rho\), we get \([\![ e_1e_2 ]\!]\rho : \tau\).
- If \(e = \lambda x. e_1\) is a lambda abstraction and \(\tau = \mu \to \nu\), suppose \(v : \mu\). Then the environment \(\rho[x \coloneqq v]\) is well-typed at \((\Gamma, x:\mu)\), so by inductive hypothesis we get \([\![ e_1 ]\!] (\rho[x \coloneqq v]) : \nu\). Since \(v : \mu\) was arbitrary, \([\![ \lambda x. e_1 ]\!]\rho : \mu \to \nu\).
- If \(e = \texttt{letrec \(x = e_1\) in \(e_2\)}\) is a let binding, with \(\Gamma, x:\nu \vdash e_1 : \nu\), then let \(v_1 = \texttt{fix}(v \mapsto [\![ e_1 ]\!](\rho[x \coloneqq v]))\) as in the semantics. First, \(v \mapsto [\![ e_1 ]\!](\rho[x \coloneqq v])\) has type \(\nu \to \nu\) by the inductive hypothesis as above, so \(v_1 : \nu\) by the corollary about \(\texttt{fix}\). Finally \([\![ e_2 ]\!](\rho[x \coloneqq v_1]) : \tau\) by the second inductive hypothesis.

And in the thrilling conclusion we can finally prove

**Corollary** (type safety). If \(\cdot \vdash e : \tau\), then for every \(\rho\), “\(e\) does not go wrong”, i.e. \([\![ e ]\!] \rho \neq \texttt{wrong}\): we have that \([\![ e ]\!] \rho : \tau\), but \(\texttt{wrong}\) has no type.

What a lot of mathematical machinery! Especially compared to syntactic type safety, which proves type safety for an operational semantics using basically no math at all.

The worst part is, this kind of domain-theoretic semantics doesn’t really scale very well: when adding new programming language features, like effects, it’s often easy to give them an operational semantics and extend a typical progress and preservation proof. But here, you’d have to figure out how to model them denotationally – which can be really hard, as demonstrated by how much it took just to support the simple feature of recursion!

All that said, semantic type safety does have one very big advantage over progress and preservation, which is that you get a lot of flexibility in how you define the semantic typing relation. So for type systems with relatively complicated runtime invariants, it’s been making a return in recent years (though far removed from any domain theory!) with the people behind Iris having success proving type safety with respect to a denotational semantics in terms of higher-order separation logic.