Deterministic scalar quantities: lowercase letters. E.g., $x, y, \theta$.
Deterministic vectors: bold lowercase letters. E.g., $\mathbf x, \mathbf y, \boldsymbol \theta$.
On the blackboard, we shall instead write an arrow overhead, e.g., $\vec \theta$.
If we wish to emphasize the individual entries, we shall write
$$ \mathbf x = (x_1, x_2, \dots, x_n) $$
That is, we shall use the unbolded letter to denote entries.
Deterministic matrices: underlined uppercase letters. E.g., $\underline A, \underline \Sigma$. If we wish to emphasize the individual entries, we shall write
$$ \underline \Sigma = (\sigma_{i,j}), $$
where $\sigma_{i,j}$ is the entry in the $i$-th row and $j$-th column. In particular, we shall use lowercase letters to denote the entries of a matrix.
Random variables: uppercase letters. E.g., $X, Y, \Theta$.
Random vectors: bold uppercase letters. E.g., $\mathbf X, \mathbf Y, \boldsymbol \Theta$.
We shall write an arrow overhead on the blackboard, e.g., $\vec \Theta$.
If we wish to emphasize the individual entries, we shall write
$$ \mathbf X = (X_1, X_2, \dots, X_n) $$
That is, we shall use the unbolded letter to denote entries.
We will rarely encounter random matrices, but if we do, we shall use underlined uppercase letters as in the deterministic case.
However, we shall write their entries with non-underlined capital letters to indicate that the entries are random variables. That is, if $\underline A$ is a random matrix, then we will denote its entries as
$$ \underline A = (A_{i,j}). $$
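For typed notes, one way to keep these conventions uniform is a few LaTeX macros. The sketch below is merely one possibility, assuming the standard amsmath and bm packages; the macro names \dvec, \gvec, and \mat are our own illustrative choices, not part of the notes.

```latex
% Preamble sketch collecting the conventions above.
% Macro names (\dvec, \gvec, \mat) are illustrative choices.
\usepackage{amsmath} % pmatrix, align*
\usepackage{bm}      % bold Greek letters via \bm

\newcommand{\dvec}[1]{\mathbf{#1}}   % deterministic vector: \dvec{x}
\newcommand{\gvec}[1]{\bm{#1}}       % bold Greek vector: \gvec{\theta}, \gvec{\Theta}
\newcommand{\mat}[1]{\underline{#1}} % matrix (deterministic or random): \mat{A}

% Usage examples:
% $\dvec{x} = (x_1, x_2, \dots, x_n)$
% $\mat{\Sigma} = (\sigma_{i,j})$
% $\mathbf{Y} = g(\mat{X}, \gvec{\Theta}) + \mathbf{W}$
```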
Data often consists of input/output pairs, which we shall view as specifications of an experiment (experimental inputs, designs, or descriptors) together with noisy experimental responses or observations.
We can write a data set of $n$ such pairs as $\mathcal D = \left\{(\mathbf x_i, y_i)\right\}_{i=1}^n$, where $\mathbf x_i$ is a vector of experimental inputs and $y_i$ is the corresponding scalar response.
We often collect the responses into a response vector
$$ \mathbf y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\y_n \end{pmatrix}, $$
and a data (or design) matrix whose $i$-th row is $\mathbf x_i^\top$:
$$ \underline X = \begin{pmatrix} \text{---} & \mathbf x_1^\top & \text{---} \\ \text{---} & \mathbf x_2^\top & \text{---} \\ & \vdots & \\ \text{---} & \mathbf x_n^\top & \text{---} \end{pmatrix}. $$
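As a purely illustrative NumPy sketch with made-up numbers, stacking $n = 3$ pairs into $\mathbf y$ and $\underline X$ looks like this:

```python
import numpy as np

# Three (x_i, y_i) pairs with 2-dimensional inputs; the numbers are made up.
pairs = [
    (np.array([1.0, 0.5]), 2.1),
    (np.array([0.3, 1.2]), 0.7),
    (np.array([2.0, 0.1]), 3.4),
]

# Response vector y (shape (n,)) and design matrix X (shape (n, d)),
# where row i of X is the input vector x_i.
y = np.array([y_i for _, y_i in pairs])
X = np.stack([x_i for x_i, _ in pairs])

print(y.shape, X.shape)  # (3,) (3, 2)
```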
We often model such data with a function $g$, which may be parametric with parameters $\boldsymbol \Theta$ or nonparametric. We write $g(\underline X, \boldsymbol \Theta)$ or $g(\underline X)$, respectively, to denote the vector of model outputs
$$ g(\underline X, \boldsymbol \Theta) = \begin{pmatrix} g(\mathbf x_1, \boldsymbol \Theta) \\ g(\mathbf x_2, \boldsymbol \Theta) \\ \vdots \\g(\mathbf x_n, \boldsymbol \Theta)\end{pmatrix},~~~g(\underline X) = \begin{pmatrix}g(\mathbf x_1) \\ g(\mathbf x_2) \\ \vdots \\ g(\mathbf x_n)\end{pmatrix} $$
Specifically, we model the noisy response using the random vector
$$ \mathbf Y = g(\underline X, \boldsymbol \Theta) + \mathbf W $$
where $\mathbf W = (W_1, \dots, W_n)$ collects $n$ independent and identically distributed additive Gaussian noise terms $W_i \sim \mathcal N(0, \sigma^2_W)$.
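A minimal NumPy simulation of this model, with an arbitrary linear choice of $g$ and made-up parameter values standing in for $\boldsymbol \Theta$, might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def g(X, theta):
    # Evaluate the model row-wise: the i-th output is g(x_i, theta).
    # A linear model is used here purely as a placeholder.
    return X @ theta

n, d = 5, 2
X = rng.normal(size=(n, d))            # design matrix; rows are the x_i
theta = np.array([1.0, -0.5])          # illustrative parameter values
sigma_W = 0.1                          # noise standard deviation

W = rng.normal(0.0, sigma_W, size=n)   # i.i.d. noise, W_i ~ N(0, sigma_W^2)
Y = g(X, theta) + W                    # noisy response vector
```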
As further examples of these conventions: a Gaussian random vector $\mathbf X$ with mean vector $\boldsymbol \mu$ and covariance matrix $\underline \Sigma$ is written
$$ \mathbf X \sim \mathcal N(\boldsymbol \mu, \underline \Sigma). $$
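One realization of such a vector can be drawn in NumPy; the mean and covariance below are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([0.0, 1.0])               # mean vector
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])          # covariance matrix (symmetric, PSD)

x = rng.multivariate_normal(mu, Sigma)  # one realization of X ~ N(mu, Sigma)
print(x.shape)                          # (2,)
```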
Similarly, a linear model built from fixed basis functions $\varphi_1, \dots, \varphi_d$ with deterministic parameters $\theta_1, \dots, \theta_d$ and additive noise $W$ is written
$$ \begin{align*} Y &= f(\mathbf x) + W \\ &= \theta_1 \varphi_1(\mathbf x) + \cdots + \theta_d \varphi_d(\mathbf x) + W \\ &= \sum_{j=1}^d \theta_j \varphi_j(\mathbf x) + W. \end{align*} $$
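A sketch of this model in NumPy, with arbitrary example basis functions and coefficients, might look like:

```python
import numpy as np

rng = np.random.default_rng(2)

# Fixed example basis functions phi_j; any functions of x would do.
basis = [
    lambda x: 1.0,           # constant term
    lambda x: x[0],          # first coordinate
    lambda x: x[0] * x[1],   # an interaction term
]
theta = np.array([0.5, 2.0, -1.0])  # one coefficient per basis function
sigma_W = 0.1                       # noise standard deviation

def sample_y(x):
    # Y = sum_j theta_j * phi_j(x) + W, with W ~ N(0, sigma_W^2).
    f_x = sum(t * phi(x) for t, phi in zip(theta, basis))
    return f_x + rng.normal(0.0, sigma_W)

y = sample_y(np.array([1.0, 2.0]))
```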
Finally, even a concrete numeric matrix carries the underline, e.g.,
$$ \underline A = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 4 & 5 \\ 8 & 9 & 10 \end{pmatrix}. $$