We review the fundamentals of the theory of stochastic optimal control and its application to market making in limit order books.
The usual setup is to work with controlled diffusion processes of the form,
$$ dX_s = b(X_s, \alpha_s)ds + \sigma(X_s, \alpha_s)dW_s $$For example, in the classic Merton's Problem, we have the following,
The wealth process $X_t$ has the dynamics,
$$ \begin{align} dX_t &= H_t dS_t + r\left(X_t - H_tS_t\right) dt \\ &= H_t\left(\mu S_t dt + \sigma S_t dW_t\right)+ r\left(X_t - H_tS_t\right) dt\\ &= rX_t dt +H_t \left[ (\mu-r) S_t dt + \sigma S_t dW_t\right]\\ &= rX_t dt +H_t S_t\left[ (\mu-r) dt + \sigma dW_t\right]\\ &= rX_t dt +\underbrace{H_t S_t}_{\pi_t} \sigma\left[ \underbrace{\frac{(\mu-r)}{\sigma}}_{\lambda} dt + dW_t\right]\\ &= rX_t dt +\pi_t \sigma\left[ \lambda dt + dW_t\right] \end{align} $$
where in the above, $\pi_t$ is the control and $\lambda$ is the market price of risk for the stock.
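As a quick sanity check on these dynamics, the wealth SDE can be simulated with a simple Euler scheme. The sketch below is illustrative only (the function name `simulate_wealth`, the constant dollar position `pi`, and all parameter values are assumptions, not part of the model above); with $\pi_t = 0$ the wealth should simply grow at the risk-free rate.

```python
import numpy as np

def simulate_wealth(x0, r, mu, sigma, pi, T=1.0, n=10_000, seed=0):
    """Euler scheme for dX = r X dt + pi * sigma * (lam dt + dW),
    with lam = (mu - r) / sigma and a constant dollar position pi."""
    rng = np.random.default_rng(seed)
    dt = T / n
    lam = (mu - r) / sigma  # market price of risk
    X = x0
    for _ in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))
        X += r * X * dt + pi * sigma * (lam * dt + dW)
    return X
```

With `pi = 0` the scheme reduces to $dX = rX\,dt$, so the simulated $X_T$ should be close to $x_0 e^{rT}$.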
The finite horizon problem is posed as having the agent maximise his expected utility of terminal wealth over the horizon $\left[0,T\right]$. The value function for the agent is,
$$ u(x):= \sup_{\pi \in \mathcal{A}(x)} \mathbb{E}\left[\left.U(X_T)\right| X_0 = x\right] $$
where $x$ is the initial wealth and $U(\cdot)$ is some utility function.
In general, the value function is of the form,
$$ u(x):= \sup_{\pi \in \mathcal{A}(x)} \mathbb{E}\left[\left.\int_0^T f(s, X_s, \pi_s)ds + F(X_T) \right|X_0=x\right] $$Why does the value function take this more general form, with both a running reward $f$ and a terminal reward $F$?
For example, in Merton's Utility from Terminal Wealth and Intermediate Consumption Problem, the value function is,
$$ u(x):= \sup_{(\pi, c) \in \mathcal{A}(x)} \mathbb{E}\left[\left.\int_0^T e^{-\delta s}U_1(c_s) ds + U_2(X_T) \right|X_0=x\right] $$where $c_t$ is consumption at time $t$, $\delta$ is a discount rate, and $U_1, U_2$ are utility functions.
Intuitively, the above says that we want to choose the optimal controls $(\pi, c)$ to maximize the expected utility of discounted intermediate consumption plus the expected utility of terminal wealth.
We now explore how to go about solving control problems such as the above using the dynamic programming formulation and explore its link to HJB equations.
As usual, we work with controlled diffusions of the form,
$$ dX_s = b(X_s, \alpha_s)ds + \sigma(X_s, \alpha_s)dW_s $$And we're interested in the finite horizon problem,
$$ u(x):= \sup_{\alpha \in \mathcal{A}(x)} \mathbb{E}\left[\left.\int_0^T f(s, X_s, \alpha_s)ds + F(X_T) \right|X_0=x\right] $$where $T \in (0, \infty)$, $\mathcal{A}(x)$ is the set of admissible controls, given $X_0=x$.
The dynamic programming formulation tackles the problem by considering an arbitrary starting state $(t, x)$; we let $\mathcal{A}(t,x)$ be the set of admissible controls with this starting state.
We also introduce new notation: we write $\left(X_s^{t, x}\right)_{s\in [t,T]}$ to denote a path of the controlled diffusion starting at $X_t=x$.
To cater to different starting points, we define an objective functional,
$$ J(t, x;\alpha) := \mathbb{E}\left[\left.\int_t^T f(s, X_s^{t,x}, \alpha_s)ds + F(X_T^{t,x}) \right|X_t=x\right] $$The objective functional states that, if we start from time $t$, with initial value of $X_t=x$, and apply the control $\alpha$ (any arbitrary control), the expectation of the terminal reward $F(X_T^{t,x})$ and the ongoing sum of rewards $\int_t^T f(s, X_s^{t,x}, \alpha_s)ds$, is defined as $J(t, x;\alpha)$.
With that we can write the value function starting at time $t$ as,
$$ u(t, x) := \sup_{\alpha \in \mathcal{A}(t,x)} J(t, x;\alpha) $$and using this notation, we can write the original value function starting at $t=0$ as,
$$ u(x) \equiv u(0,x) $$So far, we have only been introducing additional notation to facilitate the formulation of the problem as a dynamic programming problem.
In this section we introduce the dynamic programming principle, which will lead us to the HJB equation involving a PDE on $u(t,x)$. Solving the PDE and maximising it over the space of admissible controls would give us the optimal control to use.
The dynamic programming principle starts off by comparing two strategies over $[t, T]$:

- Strategy $\mathbf{I}$: apply the optimal control $\hat{\alpha}_s$ over the whole period $[t, T]$.
- Strategy $\mathbf{II}$: apply an arbitrary control $\alpha_s$ over $[t, t+h)$, then switch to the optimal control from $t+h$ onwards.
With the above formulation, it is immediately obvious that strategy $\mathbf{I}$ is at least as good (since $\hat{\alpha}_s$ is the optimal control over the whole period) as strategy $\mathbf{II}$.
Given this, we can write,
$$ u(t,x) \ge \mathbb{E}\left[\underbrace{\int_t^{t+h} f(s, X_s^{t,x}, \alpha_s)ds}_{\text{running reward from $t$ to $t+h$}} + \underbrace{u(t+h, X_{t+h}^{t,x})}_{\text{the optimal value from $t+h$ to $T$}} \right] $$Note that the inequality arises because an arbitrary control $\alpha_s$ is used over $[t, t+h)$, whereas the left-hand side uses the optimal control throughout.
To get equality, we need to maximize the RHS over the space of controls. This means we must write,
$$ u(t,x) = \sup_{(\alpha_s)_{s\in[t,t+h)}} \mathbb{E}\left[\underbrace{\int_t^{t+h} f(s, X_s^{t,x}, \alpha_s)ds}_{\text{running reward from $t$ to $t+h$}} + \underbrace{u(t+h, X_{t+h}^{t,x})}_{\text{the optimal value from $t+h$ to $T$}} \right] $$In the next step, we apply Ito's lemma to $u(t, X_t)$,
$$ \begin{aligned} du &= u_t dt + u_x dX_t + \frac{1}{2} u_{xx} d\langle X\rangle_t\\ &= u_t dt + u_x \left[b(X_t, \alpha_t)dt + \sigma(X_t, \alpha_t)dW_t\right] + \frac{1}{2} u_{xx} \sigma^2(X_t, \alpha_t)dt \\ &=u_t dt + b(X_t, \alpha_t)u_x dt + \frac{1}{2} \sigma^2(X_t, \alpha_t) u_{xx} dt + \sigma(X_t, \alpha_t)u_xdW_t \end{aligned} $$In integral form, we can also write,
$$ \begin{aligned} u(t+h, X_{t+h}^{t,x}) &= u(t, x) + \int_t^{t+h} \left[\frac{\partial u}{\partial t}(s, X_s^{t,x}) + \underbrace{b(X_s, \alpha_s)\frac{\partial u}{\partial x}(s, X_s^{t,x}) + \frac{1}{2} \sigma^2(X_s, \alpha_s) \frac{\partial^2 u}{\partial x^2}(s, X_s^{t,x})}_{\text{this is termed } \mathcal{L}^\alpha(\cdot) \text{, the generator of the diffusion}}\right] ds + \int_t^{t+h} \sigma(X_s, \alpha_s)\frac{\partial u}{\partial x}(s, X_s^{t,x})dW_s\\ &= u(t, x) +\int_t^{t+h}\left( \frac{\partial u}{\partial t} + \mathcal{L}^\alpha u\right )\left(s, X_s^{t,x}\right)ds+ \int_t^{t+h} \sigma(X_s, \alpha_s)\frac{\partial u}{\partial x}(s, X_s^{t,x})dW_s \end{aligned} $$If we substitute $u(t+h, X_{t+h}^{t,x})$ into the inequality involving $u(t,x)$, the stochastic integral would disappear under the expectation operator, and we get,
$$ \begin{aligned} &u(t,x) \ge \mathbb{E}\left[\int_t^{t+h} f(s, X_s^{t,x}, \alpha_s)ds + u(t, x) +\int_t^{t+h}\left( \frac{\partial u}{\partial t} + \mathcal{L}^\alpha u\right )\left(s, X_s^{t,x}\right)ds\right]\\ \Rightarrow\ &\mathbb{E}\left[\int_t^{t+h} \left[f(s, X_s^{t,x}, \alpha_s) +\left( \frac{\partial u}{\partial t} + \mathcal{L}^\alpha u\right )\left(s, X_s^{t,x}\right)\right]ds\right]\le 0 \end{aligned} $$Now we divide by $h$, take the limit as $h$ tends to zero, and pass the limit through the expectation. This gives us,
$$ \begin{aligned} f(t, x, \alpha) + \left( \frac{\partial u}{\partial t} + \mathcal{L}^\alpha u\right ) \left(t, x\right) &\le 0\\ \Rightarrow \frac{\partial u}{\partial t} + \mathcal{L}^\alpha u(t, x)+f(t, x, \alpha) &\le 0 \end{aligned} $$This inequality result is for any arbitrary control $\alpha$.
To get equality, we must have the optimal $\alpha$, with which we can write the HJB Equation,
$$ \frac{\partial u}{\partial t} + \sup_{\alpha \in \mathcal{A}}\left[ \mathcal{L}^\alpha u(t, x)+f(t, x, \alpha)\right] = 0 $$In other words, for the optimal $\alpha=\hat{\alpha}$, we have,
$$ \frac{\partial u}{\partial t} + \mathcal{L}^{\hat{\alpha}} u(t, x)+f(t, x, \hat{\alpha}) = 0 $$So far we've worked with diffusions as the driving source of uncertainty in the control problem. In market making, where participants post limit orders and market orders, it is also important to incorporate counting processes as a driving source of uncertainty in the model.
We will provide a quick introduction to Poisson Counting Processes and see how they fit in to the same dynamic programming framework as we have described above.
A Poisson process is the continuous time analogue of the Bernoulli process. Writing $P(k, dt)$ for the probability of $k$ arrivals in a small time interval of length $dt$, it has the following properties,
$$ P(k, dt)=\begin{cases} 1-\lambda dt, \quad\text{if $k=0$}\\ \lambda dt, \,\,\,\qquad\text{if $k=1$} \\ 0, \,\quad\qquad\text{if $k>1$} \end{cases} $$Note that the above holds in the limit as $dt$ goes to zero. The "arrival rate" $\lambda$ is the expected number of arrivals per unit time.
Using the above assumptions, we consider the probability that there are $k$ arrivals by time $t+dt$, denoted by $P(k, t+dt)$. By the law of total probability, we must have the following relation,
$$ \begin{aligned} P(k, t+dt) &= \underbrace{P(k, t) }_{\text{$k$ arrivals in time $t$}}\times \underbrace{P(0, dt)}_{\text{0 arrivals in time $dt$}} + \underbrace{P(k-1, t)}_{\text{$k-1$ arrivals in time $t$}} \times \underbrace{P(1, dt)}_{\text{1 arrival in time $dt$}}\\ &= P(k, t) \times (1-\lambda dt) + P(k-1, t) \times \lambda dt \end{aligned} $$
The next step is to manipulate the above expression into a differential equation.
We begin by re-arranging terms,
$$ \begin{aligned} P(k, t+dt) &= P(k, t) \times (1-\lambda dt) + P(k-1, t) \times \lambda dt\\ P(k, t+dt) &= P(k, t) - P(k, t) \lambda dt+ P(k-1, t) \lambda dt\\ P(k, t+dt) - P(k,t) &= \lambda \left[P(k-1, t) - P(k, t)\right] dt\\ \end{aligned} $$From the above, if we go to the limit as $dt$ tends to zero,
$$ \begin{aligned} \frac{\partial P(k, t)}{\partial t} &= \lim_{dt \rightarrow 0} \frac{P(k, t+dt) - P(k,t) }{dt}\\ &= \lambda \left[P(k-1, t) - P(k, t)\right] \end{aligned} $$We now show that the following function satisfies the above differential equation,
$$ P(k, t) = e^{-\lambda t}\frac{(\lambda t)^k}{k!} $$Differentiating wrt $t$ gives,
$$ \begin{aligned} \frac{\partial P(k, t)}{\partial t} &= e^{-\lambda t} \frac{\lambda k(\lambda t)^{k-1}}{k!} -\lambda e^{-\lambda t}\frac{(\lambda t)^k}{k!}\\ &= \lambda \left[ e^{-\lambda t} \frac{ k(\lambda t)^{k-1}}{k!} - e^{-\lambda t}\frac{(\lambda t)^k}{k!}\right]\\ &= \lambda \left[ e^{-\lambda t} \frac{ (\lambda t)^{k-1}}{(k-1)!} - e^{-\lambda t}\frac{(\lambda t)^k}{k!}\right]\\ &= \lambda [P(k-1, t)-P(k,t)] \end{aligned} $$which is as desired.
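We can also verify this numerically: stepping the forward equation $\partial_t P(k,t) = \lambda\left[P(k-1,t) - P(k,t)\right]$ from the initial condition $P(k, 0) = \mathbb{1}_{\{k=0\}}$ with a small Euler step should reproduce the Poisson probabilities. A minimal sketch (the parameter values and the truncation at `kmax` are arbitrary choices):

```python
import numpy as np
from math import factorial

lam, T, steps, kmax = 2.0, 1.0, 100_000, 10
dt = T / steps
P = np.zeros(kmax + 1)
P[0] = 1.0  # P(k, 0) = 1 if k == 0 else 0
for _ in range(steps):
    prev = P.copy()
    for k in range(kmax + 1):
        below = prev[k - 1] if k > 0 else 0.0  # there is no k = -1 state
        P[k] += lam * (below - prev[k]) * dt
closed_form = np.array([np.exp(-lam * T) * (lam * T)**k / factorial(k)
                        for k in range(kmax + 1)])
```

After integrating up to $t = T$, the Euler solution should agree with $e^{-\lambda T}(\lambda T)^k / k!$ up to the $O(\Delta t)$ discretization error.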
The simplest Poisson Process is that with a jump size of 1, which is termed the Poisson Counting Process.
It is defined as $(N_t)_{t\ge0}$ with $N_0 = 0$ and, for $t > 0$,
$$ \mathbb{P}(N_t = k) = e^{-\lambda t}\frac{(\lambda t)^k}{k!}, \quad k = 0, 1, 2, \dots $$
Importantly, it also has stationary independent increments, i.e. for $ 0\le s\le t$, the following holds,
$\mathbb{P}(N_{t+h} -N_{s+h} = k) = \mathbb{P}(N_t - N_s = k) = e^{-\lambda (t-s)}\frac{(\lambda( t-s))^k}{k!}$ for all $h > 0$.
Another useful result is that $N_t - \lambda t$ is a martingale.
This is very important: recall that in the derivation of the HJB equation, we needed the stochastic integral with respect to the Wiener process to disappear under the expectation operator. We will need the same for the derivation later on involving the counting process.
Since we often need to work with stock-price SDEs involving jump diffusions, like the one below,
$$ \frac{dS_t}{S_t} = \mu dt + \sigma dW_t - \delta dN_t $$we need to now define what is meant by $dN_t$.
Recall that, roughly speaking, the probability of one count in an interval of amplitude $dt$ is of order $dt$ (given by $\lambda dt$ earlier), while the probability of more than one count is of higher order. We can translate this into the rule,
$$ \begin{aligned} dN_t&=\begin{cases} 1, \quad\text{if there is a jump in the time interval $(t, t+dt]$},\\ 0, \quad\text{otherwise},\\ \end{cases}\\ &d\langle N\rangle_t = dN_t, \qquad dN_t dt = 0 \end{aligned} $$
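These rules translate directly into a discrete-time simulation: in each small step of size $dt$, draw at most one jump with probability $\lambda\,dt$. The sketch below is illustrative only (the jump intensity `lam` and all parameter values are assumptions, not part of the model above):

```python
import numpy as np

def simulate_jump_diffusion(S0, mu, sigma, delta, lam, T=1.0, n=10_000, seed=1):
    """Euler scheme for dS/S = mu dt + sigma dW - delta dN, where per step
    dN = 1 with probability lam * dt (at most one jump per interval)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    S = S0
    for _ in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))
        dN = 1.0 if rng.random() < lam * dt else 0.0
        S += S * (mu * dt + sigma * dW - delta * dN)
    return S
```

With `sigma = 0` and `lam = 0` the scheme reduces to $dS = \mu S\,dt$, so $S_T \approx S_0 e^{\mu T}$, which gives a quick correctness check.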
We also need Ito's lemma to work with jumps; it is a straightforward extension of the usual Ito's lemma, with the additional work of adding back all the jumps which occurred between the limits of integration.
Recall that for a continuous process $dX_s = b(X_s, \alpha_s)ds + \sigma(X_s, \alpha_s)dW_s$, Ito's lemma is as follows,
$$ \begin{aligned} df(X_t) &= f_x dX_t + \frac{1}{2}f_{xx} d\langle X\rangle _t\\ &= f_x \left[b(\cdot) dt + \sigma(\cdot) dW_t\right] + \frac{1}{2}f_{xx} \sigma^2(\cdot) dt \end{aligned} $$In integral form, we have,
$$ \begin{aligned} f(X_t) &= f(X_0) + \int_0^t f_x b(\cdot) ds + \int_0^t f_x \sigma(\cdot) dW_s + \frac{1}{2}\int_0^t f_{xx} \sigma^2(\cdot) ds \end{aligned} $$Since the integral is conceptually a sum, if we add jumps at a finite number of points to the process $X_t$, extending Ito's lemma amounts to simply adding back the jumps.
Adopt the following notation,
$$ N_s = N_{s^-} + 1 \quad \text{if} \quad dN_s=1 $$Then the extended Ito's lemma is,
$$ \begin{aligned} f(X_t) &= f(X_0) + \int_0^t f_x b(\cdot) ds + \int_0^t f_x \sigma(\cdot) dW_s + \frac{1}{2}\int_0^t f_{xx} \sigma^2(\cdot) ds + \sum_{0< s\le t} \left[f\left(X_s\right)-f\left(X_{s^-}\right)\right]\\ &= f(X_0) + \int_0^t f_x b(\cdot) ds + \int_0^t f_x \sigma(\cdot) dW_s + \frac{1}{2}\int_0^t f_{xx} \sigma^2(\cdot) ds + \int_0^t \left[f\left(X_s\right)-f\left(X_{s^-}\right)\right] dN_s \end{aligned} $$Therefore, the differential form of Ito's lemma is,
$$ \begin{aligned} df(X_t) &= f_x dX_t + \frac{1}{2}f_{xx} d\langle X\rangle _t + \left[f\left(X_t\right)-f\left(X_{t^-}\right) \right]dN_t\\ \end{aligned} $$Note: the reason for all the above hard work on Ito's lemma for counting processes is so that we can approach the problem of market making using HJB equations with jumps. For example, one of the simplest ways to model the arrival of market orders is to use a counting process for the number of market orders up to time $t$.
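For a pure-jump process ($b = \sigma = 0$) driven by the counting process itself, the extended Ito formula says $f(N_t) = f(N_0) + \sum_{0 < s \le t}\left[f(N_s) - f(N_{s^-})\right]$, i.e. the jump increments telescope. A tiny numeric check with $f(x) = x^2$ (the rate and seed below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
k = rng.poisson(5.0)                 # number of jumps of a rate-5 Poisson process on [0, 1]
f = lambda x: x ** 2
# at the j-th jump, N goes from j-1 to j, contributing f(j) - f(j-1)
jump_sum = sum(f(j) - f(j - 1) for j in range(1, k + 1))
```

The jump increments telescope, so `jump_sum` equals $f(N_1) - f(N_0) = k^2$ exactly, whatever the realized number of jumps.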
For the counting process we defined earlier, the assumption was that the intensity, or "arrival rate", is a constant $\lambda$. In fact, we can also make the intensity a function of time, $\lambda(t)$,
$$ P(k, dt)=\begin{cases} 1-\lambda(t) dt, \quad\text{if $k=0$}\\ \lambda(t) dt, \,\,\,\qquad\text{if $k=1$} \\ 0, \,\quad\qquad\text{if $k>1$} \end{cases} $$and it can be shown using similar approach as previous that we must have the differential equation,
$$ \begin{aligned} \frac{\partial P(k, t)}{\partial t} &= \lambda(t) \left[P(k-1, t) - P(k, t)\right] \end{aligned} $$and that the solution is,
$$ P(k, t) = \exp{\left(-\int_0^t \lambda(u) du\right)}\frac{\left(\int_0^t \lambda(u) du\right)^k}{k!} $$If $N_t$ is the counting process with the time-dependent intensity $\lambda(t)$, then it is a standard result that $M_t = N_t - \int_0^t \lambda(u) du$ is a martingale.
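A standard way to simulate a counting process with time-varying intensity is thinning: simulate a homogeneous rate-$\lambda_{\max}$ process and accept an arrival at time $t$ with probability $\lambda(t)/\lambda_{\max}$. The martingale result above implies $\mathbb{E}[N_T] = \int_0^T \lambda(u)\,du$, which we can check by simulation. The intensity function and all parameters below are illustrative assumptions:

```python
import numpy as np

def thinning(lam_fn, lam_max, T, rng):
    """Arrival times on [0, T] of a Poisson process with intensity lam_fn,
    obtained by thinning a homogeneous rate-lam_max process."""
    t, times = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)
        if t > T:
            return times
        if rng.random() < lam_fn(t) / lam_max:
            times.append(t)

rng = np.random.default_rng(42)
lam_fn = lambda t: 1.0 + 0.5 * np.sin(t)   # example intensity, bounded by lam_max = 1.5
mean_count = np.mean([len(thinning(lam_fn, 1.5, 10.0, rng)) for _ in range(5000)])
expected = 10.0 + 0.5 * (1.0 - np.cos(10.0))  # integral of lam_fn over [0, 10]
```

Over many simulated paths the average count should be close to $\int_0^{10} \lambda(u)\,du$, up to Monte Carlo error.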
The above result also generalizes to the case where the intensity is itself a stochastic process $(\lambda_t)_{t\ge 0}$, for example one that depends on a control: $M_t = N_t - \int_0^t \lambda_u\, du$ remains a martingale.
To begin the HJB formulation, we consider again the familiar setup, now with a controlled counting process $(N_t^u)_{t\ge 0}$ whose intensity may depend on the control $u$.
The value function is,
$$ H(n):= \sup_{u \in \mathcal{A}} \mathbb{E}\left[\left.\int_0^T F(s, N_s^u, u_s)ds + G(N_T^u) \right| N_0 = n\right] $$If $u$ is an arbitrary control, the performance criterion is,
$$ H^u(n):= \mathbb{E}\left[\left.\int_0^T F(s, N_s^u, u_s)ds + G(N_T^u) \right| N_0 = n\right] $$and the agent seeks to maximise this performance criterion, i.e.,
$$ H(n) = \sup_{u \in \mathcal{A}} H^u(n) $$Next we again introduce the time-indexed notation to facilitate the resolution of the problem using the dynamic programming approach.
We define,
$$ H(t, n) = \sup_{u \in \mathcal{A}} H^u(t, n) $$with
$$ H^u(t, n) = \mathbb{E}_{t, n}\left[\int_t^T F(s, N_s^u, u_s)ds + G(N_T^u) \right] $$where $ \mathbb{E}_{t, n}[\cdot]$ represents expectation conditional on $N_t=n$, and $u$ is an arbitrary control when we refer to $H^u(t, n)$.
We take a slightly easier route to the derivation of the HJB equation this time round by using the martingale optimality principle.
Consider the process,
$$ \left(\underbrace{\int_0^t F(s, N_s^u, u_s)ds}_{\text{using arbitrary control $u$ up to time $t$}} + \underbrace{H(t,N_t)}_{\text{using the optimal control from time $t$ onwards}}\right)_{t\in [0,T]} $$The martingale optimality principle says that the above process is a super-martingale for an arbitrary control $u$, and it is a martingale for the optimal control $\hat{u}$.
More formally, let $(R_t)_{t\in[0,T]}$ be defined as,
$$ R_t = \int_0^t F(s, N_s^u, u_s)ds + H(t,N_t) $$By definition, $R_0 = H(0, n)$, which is our original objective $H(0, n)=\sup_{u \in \mathcal{A}} H^u(0, n)$ (see definitions above).
Now let us examine $R_T$, which is,
$$ R_T = \int_0^T F(s, N_s^u, u_s)ds + G(N_T^u) $$We can then write the following inequality,
$$ \underbrace{R_0 = H(0, n)}_{\text{the optimal value function using optimal control $\hat{u}$}} \ge \underbrace{\mathbb{E}[R_T] = \mathbb{E}\left[\int_0^T F(s, N_s^u, u_s)ds + G(N_T^u)\right]}_{\text{the value function using arbitrary control $u$}} $$What the above says is that $R_0 \ge \mathbb{E}[R_T]$, meaning $R_t$ is a super-martingale.
Equality in the above is achieved when we use the optimal control $\hat{u}$, in which case $R_t$ is a martingale, i.e., $R_0 =\mathbb{E}[R_T]$.
The dynamics of $R_t$ would provide a clue on when $R_t$ would be a martingale. By Ito,
$$ dR_t = F(t, N_t^u, u_t)dt + dH(t, N_t) $$Using the extended Ito's lemma on $H(t, N_t)$, writing $n$ for the pre-jump value $N_{t^-}$, we have,
$$ dH(t, n) = H_t dt + \left[H\left(t, n+1\right)-H\left(t, n\right) \right]dN_t $$Continuing, we have,
$$ dR_t = F(t, N_t^u, u_t)dt + H_t dt + \left[H\left(t, n+1\right)-H\left(t, n\right) \right]dN_t $$The $dN_t$ term in the above is not a martingale increment, which is problematic. But recall the standard result that $M_t = N_t - \int_0^t \lambda(s) ds$ is a martingale. This implies that,
$$ dM_t = dN_t -\lambda(t) dt $$So again continuing,
$$ \begin{aligned} dR_t &= F(t, N_t^u, u_t)dt + H_t dt + \left[H\left(t, n+1\right)-H\left(t, n\right) \right][dM_t + \lambda(t) dt]\\ &=F(t, N_t^u, u_t)dt + H_t dt + \lambda(t)\left[H\left(t, n+1\right)-H\left(t, n\right) \right] dt + \underbrace{\left[H\left(t, n+1\right)-H\left(t, n\right) \right]dM_t}_{\text{local martingale}} \end{aligned} $$This implies that if we take expectations, the local martingale term disappears and we get,
$$ \begin{aligned} \mathbb{E}[dR_t] &= F(t, N_t^u, u_t)dt + H_t dt + \lambda(t)\left[H\left(t, n+1\right)-H\left(t, n\right) \right] dt \\ &= \left[F(t, N_t^u, u_t)+H_t + \lambda(t)\left[H\left(t, n+1\right)-H\left(t, n\right) \right]\right] dt \end{aligned} $$For $R_t$ to be a martingale, we need the entire expression in the brackets to disappear, and we must do this by optimizing over the space of controls $u \in \mathcal{A}$.
This leads us to the HJB equation,
$$ \frac{\partial H}{\partial t}(t, n) + \sup_{u \in \mathcal{A}}\left[F(t, n, u) + \lambda(t)\left[H\left(t, n+1\right)-H\left(t, n\right) \right]\right] = 0 $$With all the basics in place, we are now ready to formulate the market making problem as an optimal control problem.
We model how a market maker (MM) maximises terminal wealth by trading in and out of positions using limit orders (LOs). The MM provides liquidity to the limit order book (LOB) by posting buy and sell LOs and the control variable is the depth, which is measured from the midprice, at which these LOs are posted. To formalise the problem, we list the relevant variables that we use throughout this section:
$(X_t^{\delta})_{0\le t\le T}$ denotes the market maker's cash process, and satisfies the SDE,
$$ dX_t^{\delta} = \underbrace{(S_{t^-} + \delta^+_t)\,dN_t^{\delta, +}}_{\text{selling high}} - \underbrace{(S_{t^-} - \delta^-_t)\,dN_t^{\delta, -}}_{\text{buying low}} $$
$(Q_t^{\delta})_{0\le t\le T}$ denotes the agent's inventory process, with $Q_t^{\delta} = \underbrace{N_t^{\delta,-}}_{\text{qty bought}} - \underbrace{N_t^{\delta,+}}_{\text{qty sold} }$.

$(S_t)_{0\le t\le T}$ denotes the midprice process, assumed to follow $dS_t = \sigma\, dW_t$.

$(N_t^{\delta, \pm})_{0\le t\le T}$ are the counting processes of filled sell ($+$) and buy ($-$) LOs, with fill intensities $\Lambda_t^{\delta, \pm} = \lambda^\pm e^{-\kappa^\pm \delta^\pm_t}$, so that posting deeper in the book lowers the fill rate.
In this section we assume that the MM seeks the strategy $(\delta^{\pm}_t)_{0\le t\le T}$ that maximises cash at the terminal date $T$. We also assume that at time $T$ the MM liquidates her terminal inventory $Q_T$ using an MO, at a price worse than the midprice to account for liquidity-taking fees as well as the MO walking the LOB. Finally, the MM caps her inventory so that it is bounded above by $\bar{q} > 0$ and below by $\underline{q} < 0$, both finite, and includes a running inventory penalty, so that the performance criterion is,
$$ H^\delta(t,x,S,q) = \mathbb{E}_{t,x,S,q}\left[\underbrace{X_T^\delta}_{\text{terminal cash}} + \underbrace{Q_T^\delta \left(S_T - \alpha Q_T^\delta \right)}_{\text{liquidate inventory with penalty at time $T$}} - \underbrace{\phi \int_t^T \left(Q_u^\delta\right)^2 du}_{\text{running penalty term for holding inventory}}\right] $$In the above, $\alpha \ge 0$ represents the fees for taking liquidity (i.e. using an MO) as well as the impact of the MO walking the LOB, and $\phi\ge 0$ is the running inventory penalty parameter. The MM’s value function is,
$$ H(t,x,S,q) = \sup_{\delta^\pm \in \mathcal{A}}H^\delta(t,x,S,q) $$As done previously, we use the martingale optimality principle, but this time round we take additional care to break the process $R_t$ up into its continuous part and its jump part.
The process $R_t$ is,
$$ R_t = -\phi \int_0^t Q_u^2\,du + H(t,X_t, S_t, Q_t) $$which we break up into the continuous part $R_{t, C}$ (assuming $x$ and $q$ fixed, while $t$ and $S$ vary),
$$ \require{cancel} R_{t, C} = -\cancel{\phi \int_0^t Q_u^2\,du} + H_C(t,\cdot, S_t, \cdot) $$and the jump part $R_{t, J}$ (assuming $S$ fixed),
$$ R_{t, J} = -\phi \int_0^t Q_u^2\,du + H_J(t,X_t, \cdot, Q_t) $$Using Ito on $R_{t, C}$,
$$ \begin{aligned} dR_{t, C} &= H_t dt + H_S dS_t + \frac{1}{2}H_{SS}d\langle S \rangle_t\\ &=H_t dt + H_S \sigma dW_t + \frac{1}{2}H_{SS}\sigma^2 dt\\ \mathbb{E}[dR_{t, C}] &= \left[\underbrace{H_t + \frac{1}{2}H_{SS}\sigma^2}_{\text{we need this to be zero to make $R_t$ a martingale}}\right] dt \end{aligned} $$The variables $x$ and $q$ are related as follows,
That is to say that $Q_t^\delta$ jumps due to two different scenarios, $dN_t^{\delta, +}$ jumping or $dN_t^{\delta, -}$ jumping.
Using Ito on $R_{t, J}$,
$$ \begin{aligned} dR_{t, J} &= -\phi q^2 dt + H_t dt + \left[H(t, x + (S+\delta^+), S, q-1) - H(t, x, S, q)\right] dN_t^{\delta, +}\\ &\quad + \left[H(t, x - (S-\delta^-), S, q+1) - H(t, x, S, q)\right] dN_t^{\delta, -}\\ &=-\phi q^2 dt + H_t dt + \left[H(t, x + (S+\delta^+), S, q-1) - H(t, x, S, q)\right]\left[dM_t^{\delta, +}+ \Lambda_t^{\delta, +}dt\right]\\ &\quad + \left[H(t, x - (S-\delta^-), S, q+1) - H(t, x, S, q)\right]\left[dM_t^{\delta, -}+ \Lambda_t^{\delta, -}dt\right]\\ \mathbb{E}[dR_{t, J}] &= \left[-\phi q^2 + H_t + \lambda^+ e^{-\kappa^+ \delta^+_t}\left[H(t, x + (S+\delta^+), S, q-1) - H(t, x, S, q)\right]\right. \\ &\quad \left.+ \lambda^- e^{-\kappa^- \delta^-_t}\left[H(t, x - (S-\delta^-), S, q+1) - H(t, x, S, q)\right]\right]dt \end{aligned} $$Combining the continuous and jump parts, and recalling that $R_t$ is a martingale for the optimal control, we obtain the HJB equation,
$$ \begin{aligned} 0 &= H_t + \frac{1}{2}\sigma^2 H_{SS} - \phi q^2 \\ &\quad + \lambda^+ \sup_{\delta^+}\left\{{e^{-\kappa^+ \delta^+_t}\left[H(t, x + (S+\delta^+), S, q-1) - H(t, x, S, q)\right]}\right\}\\ &\quad + \lambda^- \sup_{\delta^-}\left\{{e^{-\kappa^- \delta^-_t}\left[H(t, x - (S-\delta^-), S, q+1) - H(t, x, S, q)\right]}\right\}\\ \end{aligned} $$The terminal condition (at $t=T$) is $H(T, x, S, q)= x+q(S-\alpha q)$, since the running penalty term vanishes.
Since the terminal condition for $H$ is $H(T, x, S, q)= x+q(S-\alpha q)=x + qS - \alpha q^2$, we use the ansatz $H(t, x, S, q)= x+qS+g(t,q)$, with the requirement that $g(T, q)=-\alpha q^2$ so that we recover the terminal condition.
Next we substitute our ansatz into the HJB equation,
$$ \begin{aligned} 0 &= g_t - \phi q^2 \\ &\quad + \lambda^+ \sup_{\delta^+}\left\{{e^{-\kappa^+ \delta^+_t}\left[\delta^+ +g(t, q-1) - g(t, q)\right]}\right\}\\ &\quad + \lambda^- \sup_{\delta^-}\left\{{e^{-\kappa^- \delta^-_t}\left[\delta^- +g(t, q+1) - g(t, q)\right]}\right\}\\ g(T, q) &= -\alpha q^2 \end{aligned} $$To find the supremum, we take first order conditions wrt $\delta^+$ and $\delta^-$ and set them to zero,
$$ \begin{aligned} \frac{\partial}{\partial \delta^+}\left[e^{-\kappa^+ \delta^+_t}\left[\delta^+ +g(t, q-1) - g(t, q)\right]\right]&=0\\ e^{-\kappa^+ \delta^+_t} - \kappa^+ e^{-\kappa^+ \delta^+_t}\left[\delta^+ +g(t, q-1) - g(t, q)\right]&=0\\ e^{-\kappa^+ \delta^+_t}\left[1-\kappa^+\left[ \delta^+ +g(t, q-1) - g(t, q)\right]\right]&=0\\ \delta^+ = \frac{1}{\kappa^+}- \left[g(t, q-1) - g(t, q)\right] \end{aligned} $$The expression for $\delta^-$ is similar, and we can summarize by, saying that the optimal control $\delta^{\pm, *}$ is given by,
$$ \delta^{\pm, *} = \frac{1}{\kappa^\pm}- \left[g(t, q\mp 1) - g(t, q)\right] $$Substituting the expression for the optimal control back into the HJB equation gives,
$$ \begin{aligned} 0 &= g_t - \phi q^2 \\ &\quad + \lambda^+ \left\{{e^{-\kappa^+ (\frac{1}{\kappa^+}- \left[g(t, q- 1) - g(t, q)\right])}\left[\frac{1}{\kappa^+}- \left[g(t, q - 1) - g(t, q)\right] +g(t, q-1) - g(t, q)\right]}\right\}\\ &\quad + \lambda^- \left\{{e^{-\kappa^- (\frac{1}{\kappa^-}- \left[g(t, q+ 1) - g(t, q)\right])}\left[\frac{1}{\kappa^-}- \left[g(t, q+ 1) - g(t, q)\right] +g(t, q+1) - g(t, q)\right]}\right\}\\ &= g_t - \phi q^2 +\frac{\lambda^+}{e \kappa^+}e^{\kappa^+ \left[g(t, q- 1) - g(t, q)\right]}+\frac{\lambda^-}{e \kappa^-}e^{\kappa^- \left[g(t, q+ 1) - g(t, q)\right]} \end{aligned} $$We need to solve for $g(t, q)$ to find the final form of the expression for $\delta^{\pm, *}$.
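As a sanity check on the first-order condition above, we can maximize $e^{-\kappa \delta}(\delta + \Delta g)$ over a fine grid of $\delta$ and compare with the closed form $\delta^* = 1/\kappa - \Delta g$. The values of `kappa` and `dg` below are arbitrary illustrative choices:

```python
import numpy as np

kappa, dg = 15.0, -0.02              # dg stands in for g(t, q-1) - g(t, q)
objective = lambda d: np.exp(-kappa * d) * (d + dg)
d_star = 1.0 / kappa - dg            # closed-form optimum from the first-order condition
grid = np.linspace(0.0, 1.0, 100_001)
d_grid = grid[np.argmax(objective(grid))]
```

The grid-search maximizer should agree with the closed-form $\delta^*$ up to the grid spacing.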
In general, we need to apply numerical methods for solving the ODE system above. However, for the special case where $\kappa^+=\kappa^-=\kappa$, there is an analytic solution which we will work through below.
Assume $g(t, q) = \frac{1}{\kappa} \log{\omega(t, q)}$ and $\kappa^+=\kappa^-=\kappa$. Substituting into the ODE gives,
$$ \begin{aligned} \frac{1}{\kappa \omega(t, q)} \omega_t(t, q) - \phi q^2+\frac{\lambda^+}{e \kappa}e^{\kappa \left[\frac{1}{\kappa} \log{\omega(t, q-1)} - \frac{1}{\kappa} \log{\omega(t, q)}\right]}+\frac{\lambda^-}{e \kappa}e^{\kappa \left[\frac{1}{\kappa} \log{\omega(t, q+1)} - \frac{1}{\kappa} \log{\omega(t, q)}\right]} &= 0\\ \Rightarrow \frac{1}{\kappa \omega(t, q)} \omega_t(t, q) - \phi q^2+\frac{\lambda^+}{e \kappa}e^{\kappa \left[\frac{1}{\kappa} \log{\frac{\omega(t, q-1)}{\omega(t, q)}}\right]}+\frac{\lambda^-}{e \kappa}e^{\kappa \left[\frac{1}{\kappa} \log{\frac{\omega(t, q+1)}{\omega(t, q)}}\right] } &= 0\\ \Rightarrow \frac{1}{\kappa \omega(t, q)} \omega_t(t, q) - \phi q^2+\frac{\lambda^+}{e \kappa} \frac{\omega(t, q-1)}{\omega(t, q)}+\frac{\lambda^-}{e \kappa} \frac{\omega(t, q+1)}{\omega(t, q)} &= 0\\ \Rightarrow \omega_t(t, q) - \phi \kappa q^2 \omega(t, q)+\lambda^+e^{-1} \omega(t, q-1) +\lambda^-e ^{-1} \omega(t, q+1) &= 0\\ \end{aligned} $$If we impose the bounds $\bar{q} \ge q \ge \underline{q}$, then we can write the following $(\bar{q} - \underline{q} + 1)$-square matrix representing the system of ODEs, $$ \partial_t \underbrace{ \begin{bmatrix} \omega(t, \bar{q})\\ \omega(t, \bar{q}-1)\\ \vdots\\ \vdots\\ \vdots\\ \omega(t, \underline{q}+1)\\ \omega(t, \underline{q})\\ \end{bmatrix}}_{\boldsymbol{\omega(t)}} + \underbrace{ \begin{bmatrix} -\phi \kappa \bar{q}^2 & \lambda^+e^{-1} & 0 & \dots& \dots & 0 \\ \lambda^-e^{-1} & - \phi \kappa (\bar{q}-1)^2 & \lambda^+e^{-1} & 0 &\dots &\vdots\\ 0 & \lambda^-e^{-1} & - \phi \kappa (\bar{q}-2)^2 & \lambda^+e^{-1} & 0 &\vdots \\ \vdots & \vdots &\ddots & \ddots & \ddots &0 \\ \vdots & \vdots & 0 & \lambda^-e^{-1} & - \phi \kappa (\underline{q}+1)^2 & \lambda^+e^{-1} \\ 0&0 & 0 & 0 & \lambda^-e^{-1} &- \phi \kappa \underline{q}^2 \end{bmatrix}}_{\mathbf{A}} \underbrace{\begin{bmatrix} \omega(t, \bar{q})\\ \omega(t, \bar{q}-1)\\ \vdots\\ \vdots\\ \vdots\\ \omega(t, \underline{q}+1)\\ \omega(t, 
\underline{q})\\ \end{bmatrix}}_{\boldsymbol{\omega(t)}}=0 $$ which can be written as a matrix ODE system, $$ \partial_t \boldsymbol{\omega(t)} + \mathbf{A}\boldsymbol{\omega(t)} = \mathbf{0} $$ The above has standard solution in terms of the matrix exponential, $$ \boldsymbol{\omega(t)} = e^{\mathbf{A}(T-t)}\mathbf{z} $$
At the terminal time $T$, the terminal condition is $g(T, q) = -\alpha q^2$, which implies, $$ \begin{aligned} g(T, q) &= \frac{1}{\kappa} \log{\omega(T, q)} = -\alpha q^2\\ \Rightarrow \omega(T,q) &= e^{-\kappa\alpha q^2} \end{aligned} $$ Therefore, $\mathbf{z}$ is given by, $$ \mathbf{z} = \boldsymbol{\omega(T)} = \begin{bmatrix} e^{-\alpha\kappa \bar{q}^2}\\ e^{-\alpha\kappa (\bar{q}-1)^2}\\ \vdots\\ e^{-\alpha\kappa (\underline{q}+1)^2}\\ e^{-\alpha\kappa \underline{q}^2}\\ \end{bmatrix} $$ Next we turn our attention to the term $e^{\mathbf{A}(T-t)}$. Since $\mathbf{A}$ is a real symmetric matrix when $\lambda^+ = \lambda^-$ (the case treated in the example below), we can perform an eigen-decomposition of $\mathbf{A}(T-t)$ into,
$$ \mathbf{A}(T-t)\ = PDP^{-1} $$with $D$ the diagonal matrix with elements being the eigenvalues $\lambda_j$, and $P$ is a matrix whose columns are the corresponding eigenvectors $v_j$.
Finally, we use the standard result, $$ \begin{aligned} e^{\mathbf{A}(T-t)} &= Pe^{D}P^{-1}\\ &=P\begin{bmatrix} e^{\lambda_1}& 0 & \ldots & 0\\ 0 & e^{\lambda_2} & \ldots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 &\ldots & e^{\lambda_{(\bar{q} - \underline{q}+1)}} \end{bmatrix}P^{-1} \end{aligned} $$ which would enable us to work out the final expression for $\boldsymbol{\omega(t)}$.
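The eigen-decomposition route can be cross-checked against `scipy.linalg.expm`. The sketch below builds a small symmetric $\mathbf{A}$ of the form above with $\lambda^+ = \lambda^-$; all parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import expm

phi, kappa, lam, tau = 0.001, 10.0, 1.0, 5.0   # tau plays the role of T - t
qs = np.arange(3, -4, -1)                       # inventory levels q_bar = 3, ..., q_underline = -3
A = (np.diag(-phi * kappa * qs.astype(float)**2)
     + np.diag([lam / np.e] * (len(qs) - 1), 1)
     + np.diag([lam / np.e] * (len(qs) - 1), -1))
d, P = np.linalg.eigh(A * tau)                  # symmetric => orthogonal eigenvectors
exp_eig = P @ np.diag(np.exp(d)) @ P.T          # P e^D P^{-1}, with P^{-1} = P^T
```

Both routes compute the same matrix exponential, so `exp_eig` should match `expm(A * tau)` to numerical precision.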
We can now work through an example of what the solution looks like, with the parameters set in the code below.
import numpy as np
from scipy.sparse import diags
from numpy import linalg as LA
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib inline
np.set_printoptions(linewidth=200)
def compute_analytic(lamda, kappa, phi, alpha, q_max, q_min, T):
    n = q_max - q_min + 1  # size of our ODE system
    # terminal condition: w(T, q) = exp(-alpha * kappa * q^2)
    z = np.array([np.exp(-alpha * kappa * i**2) for i in range(q_max, q_min-1, -1)])
    diagonals = [
        [-phi * kappa * i**2 for i in range(q_max, q_min-1, -1)],
        [lamda / np.e] * (n-1),
        [lamda / np.e] * (n-1),
    ]
    M = diags(diagonals, [0, 1, -1])
    ts = np.linspace(0, T, 100)
    G = np.zeros((n, len(ts)))
    for idx, t in enumerate(ts):
        A = (M * (T - t)).toarray()
        a, v = LA.eigh(A)  # eigen-decomposition of the symmetric matrix A(T-t)
        w_tq = v.dot(np.diag(np.exp(a))).dot(v.T).dot(z)  # w(t) = P e^D P^T z
        G[:, idx] = np.log(w_tq) / kappa  # g(t, q) = log(w(t, q)) / kappa
    del_plus = np.zeros((n-1, len(ts)))
    f, (ax0, ax1) = plt.subplots(1, 2, figsize=(14, 6))
    q_map = dict((q, i) for i, q in enumerate(range(q_max, q_min-1, -1)))
    lookup = lambda q: q_map[q]
    ax0.set_ylabel(r"Sell Depth ($\delta^+$)")
    ax0.set_xlabel("Time (secs)")
    ax1.set_ylabel(r"Buy Depth ($\delta^-$)")
    ax1.set_xlabel("Time (secs)")

    def get_color(q):
        return (q - q_min) / float(q_max - q_min)

    for q in range(q_max, q_min, -1):
        # optimal sell depth: delta^+ = 1/kappa + g(t, q) - g(t, q-1)
        del_plus[lookup(q)] = (G[lookup(q)] - G[lookup(q-1)]) + (1. / kappa)
        a = 0.3 if q != 2 else 1.
        ax0.plot(ts, del_plus[lookup(q)], label='q = %s' % q, lw=4,
                 color=plt.cm.Paired(get_color(q)), alpha=a)
    for q in range(q_min, q_max):
        # optimal buy depth: delta^- = 1/kappa + g(t, q) - g(t, q+1)
        del_plus[lookup(q)-1] = (G[lookup(q)] - G[lookup(q+1)]) + (1. / kappa)
        a = 0.3 if q != 2 else 1.
        ax1.plot(ts, del_plus[lookup(q)-1], label='q = %s' % q, lw=4,
                 color=plt.cm.Paired(get_color(q)), alpha=a)
    ax0.legend()
    ax1.legend()
    title = r'$\kappa^+=\kappa^-={},\lambda^+=\lambda^-={},\phi={},\alpha={},T={}$'
    title = title.format(kappa, lamda, phi, alpha, T)
    f.suptitle(title)
    plt.show()
lamda = 1.0 # arrival rate
kappa = 100.0 # fill rate constant
phi = 0.0001 # running inventory penalty. try 0.001 to see -ve sell depth
alpha = 0.0001 # penalty for terminal liquidation.
T = 30.0 # terminal time.
q_max = 3 # max inventory level
q_min = -3 # min inventory level
compute_analytic( lamda, kappa, phi, alpha, q_max, q_min, T)
We have seen above the special case of the HJB equation with an analytic formula. The next step is to devise a numerical solution using finite differences, which imposes no restriction on the values of $\kappa^+$ and $\kappa^-$.
Recall once again that we're trying to solve the ODE,
$$ \begin{aligned} \frac{\partial g}{\partial t}(t,q) &- \phi q^2 +\frac{\lambda^+}{e \kappa^+}e^{\kappa^+ \left[g(t, q- 1) - g(t, q)\right]}+\frac{\lambda^-}{e \kappa^-}e^{\kappa^- \left[g(t, q+ 1) - g(t, q)\right]}=0\\ g(T, q) &= -\alpha q^2 \end{aligned} $$To implement a finite difference solution, we partition $[0, T]$ into $M$ equal steps of size $\Delta t = T/M$ and write $t_m = m\Delta t$ for $m = 0, 1, \dots, M$.
And we write a numerical approximation for $g$ as,
$$ g(t_m, q) \approx g(m\Delta t, q) = g^m_q $$Using backward differencing, we have,
$$ \frac{\partial g}{\partial t}(m\Delta t, q) = \frac{g(m\Delta t, q)-g((m-1)\Delta t, q)}{\Delta t} + \mathcal{O}(\Delta t) $$Substituting into the ODE gives,
$$ \begin{aligned} \frac{g^{m}_q-g^{m-1}_q}{\Delta t} - \phi q^2 +\frac{\lambda^+}{e \kappa^+}e^{\kappa^+ \left[g^m_{q-1} - g^m_q\right]}+\frac{\lambda^-}{e \kappa^-}e^{\kappa^- \left[g^m_{q+1} - g^m_q\right]}=0\\ g^{m}_q-g^{m-1}_q - \phi q^2 \Delta t+\frac{\lambda^+}{e \kappa^+}e^{\kappa^+ \left[g^m_{q-1} - g^m_q\right]}\Delta t+\frac{\lambda^-}{e \kappa^-}e^{\kappa^- \left[g^m_{q+1} - g^m_q\right]}\Delta t=0\\ g^{m-1}_q =g^m_q - \phi q^2 \Delta t+\underbrace{\frac{\lambda^+}{e \kappa^+}}_{C_1}e^{\kappa^+ \left[g^m_{q-1} - g^m_q\right]}\Delta t+\underbrace{\frac{\lambda^-}{e \kappa^-}}_{C_2}e^{\kappa^- \left[g^m_{q+1} - g^m_q\right]}\Delta t \end{aligned} $$So we just need to solve the above from time $m\Delta t= T$ back to $t=0$, where the terminal condition is given by $g^M_q = -\alpha q^2$.
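As a sanity check on the indexing before writing the full solver, a single backward step $g^m \to g^{m-1}$ of this recursion can be sketched in vectorised form. The function name `backward_euler_step` and the `q_grid` argument are illustrative names introduced here (not part of the original code); `q_grid` is ordered from $q_{max}$ down to $q_{min}$, matching the lookup table used below:

```python
import numpy as np

def backward_euler_step(g, dt, lamda_p, lamda_m, kappa_p, kappa_m, phi, q_grid):
    """One step g^m -> g^{m-1}; row i of g holds inventory q_grid[i] (descending)."""
    terms = -phi * q_grid**2
    # lambda^+ term needs g(q-1): with descending ordering, that is the next row.
    # It is absent at q_min (last row), where no further sell fill is allowed.
    terms[:-1] += lamda_p / (np.e * kappa_p) * np.exp(kappa_p * (g[1:] - g[:-1]))
    # lambda^- term needs g(q+1): the previous row. Absent at q_max (first row).
    terms[1:] += lamda_m / (np.e * kappa_m) * np.exp(kappa_m * (g[:-1] - g[1:]))
    return g + terms * dt
```

The per-$q$ loop used in the solver below is equivalent; the vectorised form just makes the boundary handling at $q_{max}$ and $q_{min}$ explicit via the slice bounds.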
def compute_finite_difference( lamda_p, lamda_m, kappa_p, kappa_m, alpha, q_max, q_min, T):
    # NB: phi (the running inventory penalty) is read from the enclosing scope.
    M = 100                       # number of time steps
    dt = T / M
    C1 = lamda_p / (np.e * kappa_p)
    C2 = lamda_m / (np.e * kappa_m)
    n = q_max - q_min + 1         # size of our ODE system
    # terminal conditions.
    z = np.array([ -alpha * i**2 for i in range(q_max, q_min-1, -1)])
    q_map = dict( (q, i) for i, q in enumerate(range(q_max, q_min-1, -1)))
    lookup = lambda q : q_map[q]
    G = np.zeros((n, M+1))
    G[:,-1] = z
    # march the backward-Euler recursion from t = T down to t = 0.
    for idx in range(M, 0, -1):
        gm_prev = np.zeros(n)
        gm = G[:,idx]
        for q in range(q_max, q_min-1, -1):
            if q == q_max:        # at q_max, only sell fills are possible
                gm_prev[lookup(q)] = gm[lookup(q)] + (- (phi*q**2)
                    + C1 * np.exp( kappa_p*( gm[lookup(q-1)]-gm[lookup(q)] ) ) ) * dt
            elif q == q_min:      # at q_min, only buy fills are possible
                gm_prev[lookup(q)] = gm[lookup(q)] + (- (phi*q**2)
                    + C2 * np.exp( kappa_m*( gm[lookup(q+1)]-gm[lookup(q)] ) ) ) * dt
            else:
                gm_prev[lookup(q)] = gm[lookup(q)] + (- (phi*q**2)
                    + C1 * np.exp(kappa_p*(gm[lookup(q-1)]-gm[lookup(q)]))
                    + C2 * np.exp(kappa_m*(gm[lookup(q+1)]-gm[lookup(q)])))*dt
        G[:,idx-1] = gm_prev
    ts = np.linspace(0, T, M+1)
    del_plus = np.zeros_like(G)
    f, (ax0, ax1) = plt.subplots(1,2,figsize=(14,6))
    def get_color( q ):
        return ((q-q_min) / float( q_max - q_min))
    ax0.set_ylabel(r"Sell Depth ($\delta^+$)")
    ax0.set_xlabel("Time (secs)")
    ax1.set_ylabel(r"Buy Depth ($\delta^-$)")
    ax1.set_xlabel("Time (secs)")
    for q in range(q_max, q_min, -1):
        # optimal sell depth: delta^+ = 1/kappa^+ + g(t,q) - g(t,q-1)
        del_plus[lookup(q)] = ( G[lookup(q)] - G[lookup(q-1)] ) + ( 1. / kappa_p )
        a = 0.3 if q != 2 else 1.
        ax0.plot(ts, del_plus[lookup(q)], label='q = %s' % q, lw=4, color=plt.cm.Paired(get_color(q)), alpha=a)
    for q in range(q_min, q_max):
        # optimal buy depth: delta^- = 1/kappa^- + g(t,q) - g(t,q+1)
        del_plus[lookup(q)-1] = ( G[lookup(q)] - G[lookup(q+1)] ) + ( 1. / kappa_m )
        a = 0.3 if q != 2 else 1.
        ax1.plot(ts, del_plus[lookup(q)-1], label='q = %s' % q, lw=4, color=plt.cm.Paired(get_color(q)), alpha=a)
    ax0.legend()
    ax1.legend()
    title1 = r'$\kappa^+={},\kappa^-={},\lambda^+={},\lambda^-={},\phi={},\alpha={},T={}$'
    title = title1.format(kappa_p, kappa_m, lamda_p, lamda_m, phi, alpha, T)
    f.suptitle(title)
    plt.show()
lamda_p = lamda_m = 1.0 # arrival rate
kappa_p = kappa_m = 100. # fill rate constant
phi = 0.0001 # running inventory penalty. try 0.001 to see -ve sell depth
alpha = 0.0001 # penalty for terminal liquidation.
q_max = 3 # max inventory level
q_min = -3 # min inventory level
T = 30.0 # terminal time.
compute_finite_difference( lamda_p, lamda_m, kappa_p, kappa_m, alpha, q_max, q_min, T)
As we can see from the above, the finite difference solution matches the analytic solution with the same parameters.
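This agreement can also be checked numerically. Below is a minimal, self-contained sketch (not part of the original code, and assuming SciPy is available) that compares the backward-Euler recursion against the closed-form solution obtained via the standard substitution $\omega = e^{\kappa g}$, which turns the nonlinear ODE system into a linear one solvable with a matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

lamda, kappa, phi, alpha, T = 1.0, 100.0, 0.0001, 0.0001, 30.0
q_grid = np.arange(3, -4, -1.0)              # q = 3, 2, ..., -3 (descending)
n = len(q_grid)

# Closed form: with omega = exp(kappa * g), the system becomes omega'(t) = A omega(t),
# so omega(0) = expm(-A T) omega(T) and g(0, q) = log(omega(0, q)) / kappa.
A = np.diag(kappa * phi * q_grid**2)
for i in range(n - 1):
    A[i, i + 1] = -lamda / np.e              # coupling to g(q-1)
    A[i + 1, i] = -lamda / np.e              # coupling to g(q+1)
g_exact = np.log(expm(-A * T) @ np.exp(-kappa * alpha * q_grid**2)) / kappa

# Backward-Euler recursion (same scheme as compute_finite_difference, finer grid).
M, C = 5000, lamda / (np.e * kappa)
dt = T / M
g = -alpha * q_grid**2                       # terminal condition g(T, q)
for _ in range(M):
    terms = -phi * q_grid**2
    terms[:-1] += C * np.exp(kappa * (g[1:] - g[:-1]))   # sell-side term
    terms[1:] += C * np.exp(kappa * (g[:-1] - g[1:]))    # buy-side term
    g = g + terms * dt

print(np.max(np.abs(g - g_exact)))           # maximum absolute discrepancy
```

With this step size the discrepancy is well below plotting resolution, consistent with the visual agreement of the two solvers.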
We can now use the finite difference solver to investigate how changing the different parameters influences the behaviour of the solutions. First, let's try the case where the arrival rates $\lambda^{\pm}$ differ between the bid and ask sides.
lamda_p = 0.75 # buy arrival rate
lamda_m = 1.1 # sell arrival rate
kappa_p = kappa_m = 100. # fill rate constant
phi = 0.0001 # running inventory penalty. try 0.001 to see -ve sell depth
alpha = 0.0001 # penalty for terminal liquidation.
q_max = 3 # max inventory level
q_min = -3 # min inventory level
T = 30.0 # terminal time.
compute_finite_difference( lamda_p, lamda_m, kappa_p, kappa_m, alpha, q_max, q_min, T)
Next, we let the fill-rate parameters differ between the two sides, $\kappa^+ \neq \kappa^-$.
lamda_p = 1.0 # buy arrival rate
lamda_m = 1.0 # sell arrival rate
kappa_p = 150.
kappa_m = 50. # fill rate constant
phi = 0.0001 # running inventory penalty. try 0.001 to see -ve sell depth
alpha = 0.0001 # penalty for terminal liquidation.
q_max = 3 # max inventory level
q_min = -3 # min inventory level
T = 30.0 # terminal time.
compute_finite_difference( lamda_p, lamda_m, kappa_p, kappa_m, alpha, q_max, q_min, T)
Keeping the asymmetric fill rates, we now raise the terminal liquidation penalty $\alpha$.
lamda_p = 1.0 # buy arrival rate
lamda_m = 1.0 # sell arrival rate
kappa_p = 150.
kappa_m = 50. # fill rate constant
phi = 0.0001 # running inventory penalty. try 0.001 to see -ve sell depth
alpha = 0.001 # penalty for terminal liquidation.
q_max = 3 # max inventory level
q_min = -3 # min inventory level
T = 30.0 # terminal time.
compute_finite_difference( lamda_p, lamda_m, kappa_p, kappa_m, alpha, q_max, q_min, T)
Finally, we raise the running inventory penalty $\phi$.
lamda_p = 1.0 # buy arrival rate
lamda_m = 1.0 # sell arrival rate
kappa_p = 150.
kappa_m = 50. # fill rate constant
phi = 0.0005 # running inventory penalty. try 0.001 to see -ve sell depth
alpha = 0.0001 # penalty for terminal liquidation.
q_max = 3 # max inventory level
q_min = -3 # min inventory level
T = 30.0 # terminal time.
compute_finite_difference( lamda_p, lamda_m, kappa_p, kappa_m, alpha, q_max, q_min, T)