Algebraic Subtyping (pt.1) Recursion Scheme

2026-05-31algebraic-subtyping, 2026, draft
Lately I found it's (far) more intriguing to read PhD dissertations compared with other kinds of papers/textbooks. When reading 「A Behavioral Notion of Subtyping」 by Barbara H. Liskov, i derived some interest at subtyping, but wanted to study about it in the functional world intead. Some (re)searches led me to Stephen Dolan's PhD dissertation 「Algebraic Subtyping」. The title immediately illuminated me, since i've been addicted to algebraic stuff (algebraic effects, etc.) but have never actually studied closely about them. This series of pages serve as my interpretation of his complete algebraic subtyping system as well as (some of) the obscure concepts behind.
{-# LANGUAGE DeriveFunctor #-}

import           Data.Map ( Map )
import qualified Data.Map as Map

In Chapter.3 Constructing types of Algebraic Subtyping, Stephen defines the following systems of types progressively, each with more structure than the previous:

\begin{array}{ll} \mathcal{T}_\mathrm{s} & \text{simple types} \\ \mathcal{T}_\mathrm{o} & \text{simple types, subtyping order} \\ \mathcal{T}_\mathrm{b} & \text{simple types, subtyping order (bounded)} \\ \mathcal{T}_\mathrm{l} & \text{simple types, subtyping lattice} \\ \mathcal{T}_\mathrm{V} & \text{simple types, subtyping lattice, type variables} \\ \widetilde{\mathcal{T}_\mathrm{V}} & \text{simple types, subtyping lattice, type variables, recursive types} \end{array}

This page will focus on the first system

\mathcal{T}_\mathrm{s}: simple types

Stephen studies a minimal calculus with subtyping, containing only 3 (foundational) types: booleans, functions & records. Though reductive, these three features are just sufficient to make the subtyping relation realistically awkward:

Booleans give a base type, incomparable to types made with any other type constructor.

Functions give type constructor with type parameters of different variance: contravariant in their domain and covariant in their range.

Records give subtyping relations between type constructors of different arity, since a record type with more fields/labels is a subtype of one with fewer.

Varianece describes how subtyping relationships/directions on types propagate through a type constructor. Concretely, Given A <: B (A is a subtype of B, or wherever you need something of type B, you can use something of type A), what subtyping relationship (if any) holds between F[A] and F[B] for some type constructor (Functor) F.

Covariance means the subtyping direction is preserved: if A <: B, then F[A] <: F[B]. The return type of a funciton is covariant, if f: X \rightarrow A and you need something of type X \rightarrow B, f works fine because every A is a B, thus (X \rightarrow A) <: (X \rightarrow B). Contravariance means the subtyping direction is reversed: if A <: B, then F[B] <: F[A]. Suppose you need a predicate A \rightarrow \text{Bool} on A, the predicate B \rightarrow \text{Bool} works fine, thus (B \rightarrow \text{Bool}) <: (A \rightarrow \text{Bool}).

Stephen formalizes \mathcal{T}_\mathrm{s} as a one-line definition laying out their syntax:

\tau \in \mathcal{T}_\mathrm{s} ::= \text{bool} \mid \tau_1 \rightarrow \tau_2 \mid \{\ell_1 : \tau_1, \ldots, \ell_n : \tau_n\}

Note that in this first simple type system, there ain't any subtyping yet.

The Two Viewpoints of Types

According to Stephen:

Such a definition can be read in two ways. First, we have the syntactic viewpoint: types are trees of symbols according to the above grammar. Alternatively, we have the algebraic viewpoint: types are members of the initial algebra of a functor on Set.

The tree of symbols

Syntactically, the embodiment of \mathcal{T}_\mathrm{s} is just a tree (algebraic data type):

data Ty
   = TBool
   | TFunc Ty Ty
   | TRec  [(Label, Ty)]
   deriving (Show, Eq)

type Label = String

where bool is a leaf, \tau_1 \rightarrow \tau_2 is a node with two children, and \{\ell_1 : \tau_1, \ldots, \ell_n : \tau_n\} is a node whose children are label-type pairs.

Algebra of an endofunctor, in other words, F-algebra

For a category C and endofunctor F, an algebra of F (F-algebra) is a tuple (X, \alpha) where X is an object in C (the carrier of the algebra) and \alpha : F(X) \rightarrow X is a morphism (the structure map, or the evaluator) that collapses one application of F into X.

In the context of haskell, the category C is the Hask category.

A homomorphism between two F-algebras (X, \alpha) and (Y, \beta) is a morphism m : X \rightarrow Y in C such that the following diagram commutes:

\require{amscd} \begin{CD} F(X) @>{F(m)}>> F(Y) \\ @V{\alpha}VV @VV{\beta}V \\ X @>>{m}> Y \end{CD}

Composition of such homomorphisms of algebras is given by composition of the underlying morphisms in C. This yields the category of F-algebras.

Initial F-algebra

An initial algebra of an endofunctor F on a category C is the initial object in the category of F-algebra, where an initial object in a category C is an object Ø such that for all objects x \in C, there's a unique morphism Ø \xrightarrow{\exists!} x.

Essentially, it is an F-algebra (\mu F, \text{in}) such that for every other F-algebra (X, \varphi), there is a unique homomorphism (catamorphism) \text{cata}_{\varphi} : \mu F \rightarrow X, such that the following diagram commutes:

\require{amscd} \begin{CD} F(\mu F) @>{F(\mathsf{cata}_\varphi)}>> FA \\ @V{\mathsf{in}}VV @VV{\varphi}V \\ \mu F @>>{\mathsf{cata}_\varphi}> A \end{CD}

which means \mu F is the freest carrier, the one that contains exactly what the functor generates and nothing more. We can then extend the type system based on this initial algebra.

Finding the endofunctor

Revisiting the set \mathcal{T}_s of simple types:

\tau \in \mathcal{T}_s ::= \text{bool} \mid \tau_1 \rightarrow \tau_2 \mid \{\ell_1 : \tau_1, \ldots, \ell_n : \tau_n\}

Stephen assumes some set \mathcal{L} of record labels, over which \ell_i range. Rather than using a syntactic list, record types are defined as a partial function from record labels to types, making the order of label-type pairs in record types irrelevant. Occasionally, he explicitly denotes this: \{\ell_1 : \tau_1 , \ell_2 : \tau_2\} is the same as {f} where \text{dom} f = \{\ell_1, \ell_2\},\,f(\ell_1) = \tau_1,\,f(\ell_2) = \tau_2.

He then constructs three functors, \text{Bool}_s, \text{Func}_s & \text{Rec}_s, and combines them into a functor \mathcal{F}_s of which \mathcal{T}_s is the (carrier of the) initial algebra.

The boolean type is a closed type constructor which takes no type arguments. There is exactly one boolean type bool with no subtypes embedded in it. The functor \text{Bool}_s : \textbf{Set} \rightarrow \textbf{Set} is therefore the constant functor \text{Bool}_s(A) = 1 where 1 denotes the terminal object in Set, i.e., any singleton set \{⋆\}.

data BoolsF a = BoolsF
   deriving (Show, Eq, Functor)

Function types are parametrized by a pair of types (the domain and the codomain), so the functor \text{Func}_s returns a product: \text{Func}_s(A) = A \times A

data FuncsF a = FuncsF a a
   deriving (Show, Eq, Functor)

A record type associates some labels (from a fixed set \mathcal{L}) to types, but not every label needs to appear, hence it's a partial function from \mathcal{L} to A. The key insight is an isomorphism between partial and total functions:

(A \rightharpoonup B) \cong (A \rightarrow B + 1)

A partial function f : \mathcal{L} \rightharpoonup A either maps a label \ell to some type f(\ell) \in A, or is undefined at \ell. The right-hand side f : \mathcal{L} \rightarrow (A+1) encodes this choice explicitly: inl(a) means "label maps to type a", and inr(⋆) (the unique element of 1) means "label is absent from the record."

Using this isomorphism and the set-theoretic notation for function spaces as exponentials (since the set of functions X \rightarrow Y is written as Y^X), the functor becomes:

\text{Rec}_s(A) = (A+1)^{\mathcal{L}}

This is an exponential functor, or the set of all total functions from \mathcal{L} to A + 1. Each such function is a record-type descriptor: for each label \ell \in \mathcal{L}, it says either which type occupies that field, or that the field is absent.

For the haskell representation, we use Map Label (Maybe a) because:

Maybe aa + 1 (Just = inl, Nothing = inr)

Map Label (Maybe a) ≅ total functions f : \mathcal{L} \rightarrow (A+1)

which is isomorphic to partial functions f : \mathcal{L} \rightharpoonup A

newtype RecsF a = RecsF (Map Label (Maybe a))
   deriving (Show, Eq, Functor)

Now, since a type is either a boolean, or a function type, or a record type, the combined functor \mathcal{F}_s : \textbf{Set} \rightarrow \textbf{Set} is the coproduct (disjoint union) of the three:

\mathcal{F}_s = \text{Bool}_s(A) + \text{Func}_s(A) + \text{Rec}_s(A)
data FsF a
   = FsBool (BoolsF a)
   | FsFunc (FuncsF a)
   | FsRec  (RecsF a)
   deriving (Show, Eq, Functor)

\mathcal{T}_s is Fix FsF

By Lambek's lemma, the structure map \text{in} : F(\mu F) \rightarrow \mu F of an initial algebra is an isomorphism. This means \mu F \cong F(\mu F). In other words, the carrier of the initial algebra is also a fixed point of F.

Albeit the lemma doesn't claim the fixed point of \mathcal{F}_s to be the carrier of initial \mathcal{F}_s-algebra, it's still tempting to prove this exact case for our FsF.

newtype Ts = Ts { unTs :: FsF Ts }
   deriving (Show, Eq)

Given the definition, it's natural/evident that (Ts, Ts :: FsF Ts -> Ts) is a well-typed \mathcal{F}_s-algebra.

We can now define the catemorphism from our (candidate) initial \mathcal{F}_s-algebra into any other \mathcal{F}_s-algebra. In essence, given any \mathcal{F}_s-algebra (A, \varphi), the following diagram must commute:

\require{amscd} \begin{CD} \mathcal{F}_s(\mathcal{T}_s) @>{\mathcal{F}_s(\mathsf{cata}\,\varphi)}>> \mathcal{F}_s(A) \\ @V{\mathsf{Ts}}VV @VV{\varphi}V \\ \mathcal{T}_s @>>{\mathsf{cata}\,\varphi}> A \end{CD}

The commutativity means \mathsf{cata}\,\varphi \circ \mathsf{Ts} = \varphi \circ \mathcal{F}_s(\mathsf{cata}\,\varphi). Note that we can substitute \mathcal{F}_s(\mathsf{cata}\,\varphi) with \mathsf{fmap}\,\mathsf{cata}\,\varphi, thus: \mathsf{cata}\,\varphi \circ \mathsf{Ts} = \varphi \circ \mathsf{fmap}\,\mathsf{cata}\,\varphi.

cata :: (FsF a -> a) -> Ts -> a
cata alg = alg . fmap (cata alg) . unTs
TODO: prove the uniqueness of cata

In this way, we successfully find the endofunctor \mathcal{F}_s, and the simple types \mathcal{T}_s. What's more, the extensibility of initial algebra means anything we can construct in \mathcal{T}_s can be mapped in a structure-preserving way to any other \mathcal{F}_s-algebra, even if the other algebra defines other types as well.

Example \mathcal{F}_s-algebras

This section contains the examples of other \mathcal{F}_s-algebras.

tsBool :: Ts
tsBool = Ts (FsBool BoolsF)

tsFunc :: Ts -> Ts -> Ts
tsFunc dom rng = Ts (FsFunc (FuncsF dom rng))

tsRec :: Map Label (Maybe Ts) -> Ts
tsRec fields = Ts (FsRec (RecsF fields))

prettyAlg :: FsF String -> String
prettyAlg (FsBool  BoolsF)           = "Bool"
prettyAlg (FsFunc  (FuncsF dom rng)) = "(" ++ dom ++ " -> " ++ rng ++ ")"
prettyAlg (FsRec   (RecsF fields))   =
  "{ " ++ Map.foldlWithKey' showField "" fields ++ "}"
  where
    showField acc l Nothing  = acc ++ l ++ " : ∅, "
    showField acc l (Just t) = acc ++ l ++ " : " ++ t ++ ", "

prettyTs :: Ts -> String
prettyTs = cata prettyAlg

depthAlg :: FsF Int -> Int
depthAlg (FsBool  _)                = 0
depthAlg (FsFunc  (FuncsF d r))   = 1 + max d r
depthAlg (FsRec   (RecsF fields)) =
   1 + maximum (0 : [d | Just d <- Map.elems fields])

depthTs :: Ts -> Int
depthTs = cata depthAlg

testTs :: IO ()
testTs = do
   let t1 = tsBool
       t2 = tsFunc tsBool tsBool
       t3 = tsFunc t2 tsBool
       t4 = tsRec (Map.fromList
                    [("name",   Just tsBool)
                    ,("active", Just tsBool)
                    ,("address",Nothing)])
       t5 = tsRec (Map.fromList
                    [("name",    Just tsBool)
                    ,("process", Just t2)])

   mapM_ (\(n,t) -> putStrLn $ n ++ " = " ++ prettyTs t)
         [("t1",t1),("t2",t2),("t3",t3),("t4",t4),("t5",t5)]

\mathcal{F}_s-coalgebra

For a category C and endofunctor F, n coalgebra of F (F-coalgebra) is a tuple (X, \alpha) where X is an object in C (the carrier of the algebra) and a morphism \alpha : X \rightarrow F(X).

Given two F-coalgebras (X, \eta : X \rightarrow F X),\;(Y, \theta : Y \rightarrow F Y), a coalgebra homomorphism is a morphism f : X \rightarrow Y which the following diagram commute:

\require{amscd} \begin{CD} X @>{f}>> Y \\ @V{\eta}VV @VV{\theta}V \\ F(X) @>>{F(f)}> F(Y) \end{CD}

Terminal \mathcal{F}_s-coalgebra

A terminal coalgebra for an endofunctor F on a category C is the terminal object in the category of F-coalgebra, where a terminal object in a category C is an object 1 such that for all objects x \in C, there's a unique morphism !: x \xrightarrow{\exists!} 1.

Essentially, it is an F-coalgebra (\nu F, \text{out}) such that for every other F-coalgebra (X, \gamma), there is a unique coalgebra homomorphism (anamorphism) \text{ana}_{\gamma} : \nu F \rightarrow X, such that the following diagram commutes:

\require{amscd} \begin{CD} X @>{\mathsf{ana}_\gamma}>> \gamma F \\ @V{\gamma}VV @VV{\mathsf{out}}V \\ F X @>>{F(\mathsf{ana}_\gamma)}> F(\gamma F) \end{CD}

which means \nu F is the most complete carrier that supports all possible \mathcal{F}_s-coalgebra structures,