December 5th, 2025
As a first step, I decompose the bias in a setting with sample selection only.
This simpler setting provides a foundation from which to build up to more complex settings.
I work under standard SUTVA, consistency and positivity assumptions.
Population of individuals:
\(i = 1,2,\ldots,N\)
Binary treatment:
\(\mathcal{D} = \{0,1\}\)
\(Y_{0i}\): outcome for individual \(i\) under no treatment
\(Y_{1i}\): outcome for individual \(i\) under treatment
Only one is observed:
\[ Y_i = D_i\,Y_{1i} + (1-D_i)\,Y_{0i} \]
Individual treatment effect: \[ \tau_i = Y_{1i} - Y_{0i} \]
Target causal parameter: \[ ATE = \mathbb{E}[\tau_i] = \mathbb{E}[Y_{1i} - Y_{0i}] \]
Sample estimator for \(ATE\) in a randomized setting:
\[ \widehat{ATE} = \frac{1}{n_1}\sum_{i=1}^n D_i Y_i \;-\; \frac{1}{n_0}\sum_{i=1}^n (1 - D_i) Y_i \]
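where \(n_1 = \sum_{i=1}^n D_i\) and \(n_0 = \sum_{i=1}^n (1 - D_i)\) are the numbers of treated and untreated individuals in the sample of size \(n\).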
Individuals belong to an unobserved group:
\[ \mathcal{G} = \{L, S\} \]
Interpretation:
Define the average effect within each latent group:
\[ \tau_g = \mathbb{E}[Y_{1i} \mid G_i = g] - \mathbb{E}[Y_{0i} \mid G_i = g] \]
Stack into a vector:
\[ \tau = \begin{bmatrix} \tau_L \\ \tau_S \end{bmatrix}, \qquad \tau_L > \tau_S \]
Define the population distribution of latent groups:
\[ \lambda = \begin{bmatrix} \lambda_L \\ \lambda_S \end{bmatrix}, \qquad \lambda_L + \lambda_S = 1 \]
Interpretation:
\(\lambda_g = P(G_i = g)\) in the full population.
Define the sample distribution of latent groups:
\[ \gamma = \begin{bmatrix} \gamma_L \\ \gamma_S \end{bmatrix}, \qquad \gamma_L + \gamma_S = 1 \]
Interpretation:
\(\gamma_g = P(G_i = g)\) among sampled individuals.
Even if treatment is perfectly randomized within the sample, we may get:
\[ \gamma \neq \lambda \]
With heterogeneous effects \(\tau_L \neq \tau_S\), this mismatch becomes the key driver of bias in the naïve difference-in-means estimator.
Start from the definition based on potential outcomes:
\[ ATE := \mathbb{E}[Y_{1i}] - \mathbb{E}[Y_{0i}] \]
Use the law of total expectation over latent groups:
\[ \begin{aligned} ATE &= \lambda_L \cdot \big(\mathbb{E}[Y_{1i} \mid G_i = L] - \mathbb{E}[Y_{0i} \mid G_i = L]\big) \\ &\quad + \lambda_S \cdot \big(\mathbb{E}[Y_{1i} \mid G_i = S] - \mathbb{E}[Y_{0i} \mid G_i = S]\big) \end{aligned} \]
Recognize the group-specific treatment effects:
\[ \tau_L = \mathbb{E}[Y_{1i} \mid G_i = L] - \mathbb{E}[Y_{0i} \mid G_i = L] \] \[ \tau_S = \mathbb{E}[Y_{1i} \mid G_i = S] - \mathbb{E}[Y_{0i} \mid G_i = S] \]
Then the ATE becomes:
\[ ATE = \lambda_L \tau_L + \lambda_S \tau_S = \lambda^T \tau \]
We define
\[ n_{dg} = \sum_{i=1}^n \mathbf{1}\{D_i = d,\; G_i = g\}, \qquad \forall (d, g) \in \mathcal{D} \times \mathcal{G} \]
These satisfy: \[ n_{1L} + n_{1S} = n_1, \qquad n_{0L} + n_{0S} = n_0 \]
Define the subgroup gammas:
\[ \gamma_{dg} = \frac{n_{dg}}{n_d} \]
Interpretation:
\(\gamma_{dg}\) is the fraction of treated (or untreated) individuals who belong to group \(g\).
In expectation: \(\mathbb{E}[\gamma_{dg}] = P(G=g \mid D=d)\)
Because treatment is randomized, a single realization of the assignment will generally not satisfy these equalities exactly: \[ \gamma_{1g} = \gamma_g, \qquad \gamma_{0g} = \gamma_g \]
But, in expectation, random assignment reproduces \(\gamma_g\):
\[ \mathbb{E}[\gamma_{1g}] = \mathbb{E}[\gamma_{0g}] = \gamma_g \]
Therefore, when computing \(\mathbb{E}[\widehat{ATE}]\), we can safely replace \(\gamma_{dg}\) with \(\gamma_g\).
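The claim that random assignment reproduces \(\gamma_g\) in expectation can be checked with a small simulation. The sketch below is only illustrative: the sample size, the share \(\gamma_L = 0.3\), the number of repetitions and the use of complete randomization (exactly half the sample treated) are all assumptions, not values from the text.

```python
import random

def simulate_group_shares(gamma_L=0.3, n=200, reps=2000, seed=0):
    """Average share of group L among treated and untreated units
    over many random treatment assignments (hypothetical settings)."""
    rng = random.Random(seed)
    # Fixed sample: the first n_L units belong to group L, the rest to S.
    n_L = round(gamma_L * n)
    groups = ["L"] * n_L + ["S"] * (n - n_L)
    n1 = n // 2          # number of treated units (assumed: half the sample)
    n0 = n - n1          # number of untreated units
    sum_g1 = 0.0
    sum_g0 = 0.0
    for _ in range(reps):
        # Complete randomization: draw the treated set without replacement.
        treated = rng.sample(range(n), n1)
        n1L = sum(1 for i in treated if groups[i] == "L")
        n0L = n_L - n1L
        sum_g1 += n1L / n1   # gamma_{1L} in this realization
        sum_g0 += n0L / n0   # gamma_{0L} in this realization
    return sum_g1 / reps, sum_g0 / reps

g1, g0 = simulate_group_shares()
print(f"mean gamma_1L = {g1:.3f}, mean gamma_0L = {g0:.3f}")
```

Individual realizations of \(\gamma_{1L}\) and \(\gamma_{0L}\) differ from \(\gamma_L\), but their averages across assignments should both be close to it.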
Starting from the difference-in-means estimator for the ATE and taking expectations, we obtain:
\[ \begin{aligned} \mathbb{E}[\widehat{ATE}] &= \gamma_L \big(\mathbb{E}[Y_{1i} \mid G_i = L] - \mathbb{E}[Y_{0i} \mid G_i = L]\big) \\ &\quad + \gamma_S \big(\mathbb{E}[Y_{1i} \mid G_i = S] - \mathbb{E}[Y_{0i} \mid G_i = S]\big) \\[4pt] &= \gamma_L \tau_L + \gamma_S \tau_S = \gamma^T \tau \end{aligned} \]
\[ \begin{aligned} Bias &= \mathbb{E}[\widehat{ATE}] - ATE \\ &= \gamma^T \tau - \lambda^T \tau \\ &= (\gamma_L - \lambda_L)(\tau_L - \tau_S) \end{aligned} \]
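As a quick numerical check of the two-group expression, the sketch below computes the bias both directly, as \(\gamma^T \tau - \lambda^T \tau\), and through the closed form \((\gamma_L - \lambda_L)(\tau_L - \tau_S)\). All numbers are hypothetical and chosen only for illustration.

```python
import math

def weighted_effect(shares, effects):
    """Share-weighted average effect, e.g. lambda^T tau or gamma^T tau."""
    return sum(s * t for s, t in zip(shares, effects))

def bias_two_groups(gamma_L, lambda_L, tau_L, tau_S):
    """Closed-form sample-selection bias for two latent groups."""
    return (gamma_L - lambda_L) * (tau_L - tau_S)

# Hypothetical values: population share, sample share and group effects.
lambda_L, gamma_L = 0.5, 0.8
tau_L, tau_S = 70.0, 1.0
tau = [tau_L, tau_S]

ate = weighted_effect([lambda_L, 1 - lambda_L], tau)            # lambda^T tau
expected_estimate = weighted_effect([gamma_L, 1 - gamma_L], tau)  # gamma^T tau
direct_bias = expected_estimate - ate
closed_form_bias = bias_two_groups(gamma_L, lambda_L, tau_L, tau_S)

print(direct_bias, closed_form_bias)
```

Both routes give the same number up to floating-point error. The bias vanishes if either the sample is representative (\(\gamma_L = \lambda_L\)) or the effects are homogeneous (\(\tau_L = \tau_S\)).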
Extending the previous expression to \(k\) groups is straightforward and gives:
\[ Bias = \sum_{g=2}^k (\gamma_g - \lambda_g)(\tau_g - \tau_1) \]
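One way to see where this expression comes from, taking group 1 as the reference group: since both \(\gamma\) and \(\lambda\) sum to one, \(\sum_{g=1}^k (\gamma_g - \lambda_g) = 0\), so subtracting \(\tau_1\) times this sum changes nothing.

\[ \begin{aligned} Bias &= \gamma^T \tau - \lambda^T \tau = \sum_{g=1}^k (\gamma_g - \lambda_g)\,\tau_g \\ &= \sum_{g=1}^k (\gamma_g - \lambda_g)\,\tau_g \;-\; \tau_1 \sum_{g=1}^k (\gamma_g - \lambda_g) \\ &= \sum_{g=1}^k (\gamma_g - \lambda_g)(\tau_g - \tau_1) = \sum_{g=2}^k (\gamma_g - \lambda_g)(\tau_g - \tau_1) \end{aligned} \]

The \(g = 1\) term drops out because \(\tau_1 - \tau_1 = 0\). With \(k = 2\), labelling \(S\) as group 1 and \(L\) as group 2, this reduces to \((\gamma_L - \lambda_L)(\tau_L - \tau_S)\).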
What changes:
Key consequence:
\[ \mathbb{E}[\gamma_{1g}] \neq \mathbb{E}[\gamma_{0g}], \qquad g \in \mathcal{G} \]
\[ \delta = P(D = 1) = \sum_{g \in \mathcal{G}} P(D = 1 \mid G = g) \cdot P(G = g) = \sum_{g \in \mathcal{G}} \pi_g \lambda_g \]
where: \[ \pi_g = P(D = 1 \mid G = g) \]
\[ \begin{aligned} \pi_g &= P(D = 1 \mid G = g) = \frac{P(G = g \mid D = 1)\,P(D = 1)}{P(G = g)} \\ &= \mathbb{E}[\gamma_{1g}] \cdot \frac{\delta}{\lambda_g} \end{aligned} \]
hence
\[ \mathbb{E}[\gamma_{1g}] = \pi_g \cdot \frac{\lambda_g}{\delta} \]
\[ \begin{aligned} \mathbb{E}[\gamma_{0g}] &= P(G = g \mid D = 0) \\ &= \frac{P(G = g) - P(G = g \mid D = 1)\,P(D = 1)}{P(D = 0)} \\[4pt] &= \frac{\lambda_g - \mathbb{E}[\gamma_{1g}]\,\delta}{1 - \delta} = (1 - \pi_g)\,\frac{\lambda_g}{1 - \delta} \end{aligned} \]
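The expressions for \(\delta\), \(\mathbb{E}[\gamma_{1g}]\) and \(\mathbb{E}[\gamma_{0g}]\) translate directly into a few lines of code. This is a minimal sketch with hypothetical inputs: the function name and the values of \(\lambda\) and \(\pi\) are assumptions for illustration.

```python
import math

def conditional_group_shares(lam, pi):
    """Return delta, E[gamma_1g] and E[gamma_0g] for each group g.

    lam: population shares lambda_g (must sum to one)
    pi:  treatment probabilities pi_g = P(D = 1 | G = g)
    """
    delta = sum(p * l for p, l in zip(pi, lam))                    # P(D = 1)
    gamma_1 = [p * l / delta for p, l in zip(pi, lam)]             # P(G = g | D = 1)
    gamma_0 = [(1 - p) * l / (1 - delta) for p, l in zip(pi, lam)]  # P(G = g | D = 0)
    return delta, gamma_1, gamma_0

# Hypothetical example: group L (first entry) selects into treatment more often.
lam = [0.5, 0.5]
pi = [0.8, 0.2]
delta, gamma_1, gamma_0 = conditional_group_shares(lam, pi)
print(delta, gamma_1, gamma_0)
```

Both conditional distributions sum to one, and they coincide only when \(\pi_g = \delta\) for every group, that is, when treatment does not depend on the latent group.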
Even though treatment is now selected (not randomized), the population ATE definition does not change:
\[ ATE := \mathbb{E}[Y_{1i}] - \mathbb{E}[Y_{0i}] \]
So the true ATE remains:
\[ ATE = \lambda^T \tau \]
For clarity, define:
\[ \mu_{dg} := \mathbb{E}[Y_{di} \mid G_i = g], \qquad d \in \mathcal{D},\; g \in \mathcal{G} \]
Interpretation:
So the group-specific treatment effect becomes:
\[ \tau_g = \mu_{1g} - \mu_{0g} \]
Equivalently:
\[ \mu_{1g} = \mu_{0g} + \tau_g \]
Under treatment selection, the bias of the naïve difference-in-means estimator is:
\[ \begin{aligned} Bias &= \mathbb{E}[\widehat{ATE}] - ATE \\[4pt] &= \lambda_L \left[ \left(\frac{\pi_L}{\delta} - 1\right)(\tau_L - \tau_S) \;+\; \frac{\pi_L - \delta}{\delta(1 - \delta)}(\mu_{0L} - \mu_{0S}) \right] \end{aligned} \]
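A sketch of the intermediate steps behind this expression follows. It assumes that selection into treatment operates only through the latent group, so that \(\mathbb{E}[Y_{di} \mid D_i = d, G_i = g] = \mu_{dg}\), and it replaces the sample shares \(\gamma_{dg}\) by their expectations, as in the randomized case.

\[ \begin{aligned} \mathbb{E}[\widehat{ATE}] &= \sum_{g \in \mathcal{G}} \mathbb{E}[\gamma_{1g}]\,\mu_{1g} - \sum_{g \in \mathcal{G}} \mathbb{E}[\gamma_{0g}]\,\mu_{0g} \\ &= \sum_{g \in \mathcal{G}} \frac{\pi_g \lambda_g}{\delta}\,(\mu_{0g} + \tau_g) - \sum_{g \in \mathcal{G}} \frac{(1 - \pi_g)\lambda_g}{1 - \delta}\,\mu_{0g} \\ &= \sum_{g \in \mathcal{G}} \frac{\pi_g \lambda_g}{\delta}\,\tau_g + \sum_{g \in \mathcal{G}} \lambda_g\,\frac{\pi_g - \delta}{\delta(1 - \delta)}\,\mu_{0g} \end{aligned} \]

Subtracting \(ATE = \sum_{g} \lambda_g \tau_g\) gives

\[ Bias = \sum_{g \in \mathcal{G}} \lambda_g \left(\frac{\pi_g}{\delta} - 1\right)\tau_g + \sum_{g \in \mathcal{G}} \lambda_g\,\frac{\pi_g - \delta}{\delta(1 - \delta)}\,\mu_{0g} \]

Because \(\sum_g \lambda_g \pi_g = \delta\), the weights in each sum add up to zero: \(\sum_g \lambda_g (\pi_g/\delta - 1) = 0\) and \(\sum_g \lambda_g (\pi_g - \delta) = 0\). With two groups, the \(S\) weights are therefore minus the \(L\) weights, which collapses each sum into a single term in \(\tau_L - \tau_S\) and \(\mu_{0L} - \mu_{0S}\) respectively.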
In the \(k\)-group setting, again taking group 1 as the reference group, the same argument gives:
\[ Bias = \sum_{g=2}^k \lambda_g \left[ \left(\frac{\pi_g}{\delta} - 1\right)(\tau_g - \tau_1) \;+\; \frac{\pi_g - \delta}{\delta(1 - \delta)}(\mu_{0g} - \mu_{01}) \right] \]
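The \(k\)-group expression can be checked numerically against a direct computation of \(\mathbb{E}[\widehat{ATE}] - ATE\). The sketch below assumes, as the formula does, that selection depends only on the latent group. All parameter values and function names are hypothetical.

```python
import math

def delta_of(lam, pi):
    """Overall treatment probability: delta = sum_g pi_g * lambda_g."""
    return sum(p * l for p, l in zip(pi, lam))

def direct_bias(lam, pi, mu0, tau):
    """Bias computed directly as E[difference in means] minus lambda^T tau."""
    delta = delta_of(lam, pi)
    # Treated mean: groups weighted by P(G = g | D = 1), outcome mu_0g + tau_g.
    treated_mean = sum(p * l / delta * (m + t)
                       for l, p, m, t in zip(lam, pi, mu0, tau))
    # Untreated mean: groups weighted by P(G = g | D = 0), outcome mu_0g.
    untreated_mean = sum((1 - p) * l / (1 - delta) * m
                         for l, p, m in zip(lam, pi, mu0))
    ate = sum(l * t for l, t in zip(lam, tau))
    return treated_mean - untreated_mean - ate

def closed_form_bias(lam, pi, mu0, tau):
    """Closed-form bias; the first list entry plays the role of group 1."""
    delta = delta_of(lam, pi)
    total = 0.0
    for g in range(1, len(lam)):
        effect_term = (pi[g] / delta - 1) * (tau[g] - tau[0])
        baseline_term = ((pi[g] - delta) / (delta * (1 - delta))
                         * (mu0[g] - mu0[0]))
        total += lam[g] * (effect_term + baseline_term)
    return total

# Hypothetical three-group configuration.
lam = [0.5, 0.3, 0.2]      # population shares, sum to one
pi = [0.2, 0.5, 0.8]       # P(D = 1 | G = g)
mu0 = [20.0, 30.0, 40.0]   # untreated mean outcomes
tau = [1.0, 10.0, 70.0]    # group-specific effects

bias_direct = direct_bias(lam, pi, mu0, tau)
bias_closed = closed_form_bias(lam, pi, mu0, tau)
print(bias_direct, bias_closed)
```

The first term in the closed form captures heterogeneous effects combined with selection, and the second captures differences in untreated baselines across groups. Setting all \(\pi_g\) equal makes both vanish.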
We now introduce time, with two periods: \(T = \{1, 2\}\)
Potential outcomes are now indexed by:
\[ Y_{dit} \quad \text{for } d \in \mathcal{D},\; t \in T \]
Main differences vs Section 2:
Standard DiD Parallel Trends focuses on treated vs untreated:
\[ \begin{aligned} &\mathbb{E}[Y_{0i2} \mid D_i = 1] - \mathbb{E}[Y_{0i1} \mid D_i = 1] \\ &= \mathbb{E}[Y_{0i2} \mid D_i = 0] - \mathbb{E}[Y_{0i1} \mid D_i = 0] \end{aligned} \]
In our setting, treatment effects differ by latent group, so we need a different parallel trends (PT) assumption, stated across latent groups:
\[ \begin{aligned} &\mathbb{E}[Y_{0i2} \mid G_i = L] - \mathbb{E}[Y_{0i1} \mid G_i = L] \\ &= \mathbb{E}[Y_{0i2} \mid G_i = S] - \mathbb{E}[Y_{0i1} \mid G_i = S] \end{aligned} \]
Target parameter: the Average Treatment Effect on the Treated:
\[ ATT := \mathbb{E}[Y_{1i2} \mid D_i = 1] - \mathbb{E}[Y_{0i2} \mid D_i = 1] \]
We obtain the DiD representation:
\[ \begin{aligned} ATT &= \big( \mathbb{E}[Y_{1i2} \mid D_i = 1] - \mathbb{E}[Y_{1i1} \mid D_i = 1] \big) \\ &\quad- \big( \mathbb{E}[Y_{0i2} \mid D_i = 0] - \mathbb{E}[Y_{0i1} \mid D_i = 0] \big) \end{aligned} \]
As before, define:
\[ \mu_{dtg} := \mathbb{E}[Y_{dit} \mid G_i = g], \qquad \forall (d, t, g) \in \mathcal{D} \times T \times \mathcal{G} \]
Let \(\tau_g\) be the group-specific treatment effect in period 2:
\[ \tau_g = \mathbb{E}[Y_{1i2} \mid G_i = g] - \mathbb{E}[Y_{0i2} \mid G_i = g] \]
\[ ATT = \sum_{g \in \mathcal{G}} \frac{\pi_g}{\delta}\,\lambda_g\,\tau_g \]
Define sample averages for treated/untreated by period:
\[ \bar{Y}_{dt} := \frac{1}{n_d} \sum_{i=1}^n Y_{it}\,\mathbf{1}\{D_i = d\}, \qquad \forall (d, t) \in \mathcal{D} \times T \]
The standard 2-period DiD estimator:
\[ \widehat{ATT} := (\bar{Y}_{12} - \bar{Y}_{11}) - (\bar{Y}_{02} - \bar{Y}_{01}) \]
\[ \mathbb{E}[\widehat{ATT}] = \sum_{g \in \mathcal{G}} \frac{\pi_g}{\delta}\,\lambda_g\,\tau_g \]
Bias of the DiD estimator: \[ Bias = \mathbb{E}[\widehat{ATT}] - ATT = 0 \]
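A small Monte Carlo sketch can illustrate this zero-bias result. The data-generating process below is an assumption made for illustration, not the one used elsewhere in this document: untreated outcomes follow a common trend in both latent groups (the group-level parallel trends condition), treatment takes effect only in period 2, and selection into treatment depends only on the latent group. All parameter values are hypothetical.

```python
import random

def simulate_did(n=20000, seed=1):
    """Simulate a two-period panel and return (DiD estimate, population ATT)."""
    rng = random.Random(seed)
    lam_L = 0.5                       # population share of group L
    pi = {"L": 0.8, "S": 0.2}         # P(D = 1 | G = g)
    mu0 = {"L": 40.0, "S": 20.0}      # period-1 untreated means
    tau = {"L": 70.0, "S": 1.0}       # period-2 treatment effects
    trend = 5.0                       # common untreated trend (parallel trends)

    sums = {(d, t): 0.0 for d in (0, 1) for t in (1, 2)}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        g = "L" if rng.random() < lam_L else "S"
        d = 1 if rng.random() < pi[g] else 0
        y0_1 = mu0[g] + rng.gauss(0, 1)           # untreated outcome, period 1
        y0_2 = y0_1 + trend + rng.gauss(0, 1)     # untreated outcome, period 2
        y_1 = y0_1                                # nobody is treated in period 1
        y_2 = y0_2 + tau[g] if d == 1 else y0_2   # observed outcome, period 2
        sums[(d, 1)] += y_1
        sums[(d, 2)] += y_2
        counts[d] += 1

    ybar = {key: total / counts[key[0]] for key, total in sums.items()}
    did = (ybar[(1, 2)] - ybar[(1, 1)]) - (ybar[(0, 2)] - ybar[(0, 1)])

    # Population ATT: group effects weighted by P(G = g | D = 1) = pi_g * lambda_g / delta.
    delta = pi["L"] * lam_L + pi["S"] * (1 - lam_L)
    att = (pi["L"] * lam_L * tau["L"] + pi["S"] * (1 - lam_L) * tau["S"]) / delta
    return did, att

did, att = simulate_did()
print(f"DiD estimate = {did:.2f}, population ATT = {att:.2f}")
```

The groups differ in their baseline levels (\(\mu_{0L} \neq \mu_{0S}\)) and in their treatment probabilities, yet the DiD estimate should land close to the ATT, because differencing over time removes the level differences that biased the cross-sectional comparison.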
Interpretation:
Simulation created successfully
| | X | G | D | Y | Y0 | Y1 | True_TE | All | GxD |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 12.00 | L | 1 | 103.54 | 33.54 | 103.54 | 70.0 | All | L_T |
| 1 | 9.00 | L | 1 | 92.93 | 22.93 | 92.93 | 70.0 | All | L_T |
| 2 | 14.09 | L | 0 | 43.00 | 43.00 | 113.00 | 70.0 | All | L_U |
| 3 | 18.13 | L | 0 | 61.03 | 61.03 | 131.03 | 70.0 | All | L_U |
| 4 | 21.10 | S | 0 | 26.00 | 26.00 | 27.00 | 1.0 | All | S_U |
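The rows shown above can be checked against the potential-outcomes definitions used throughout: the observed outcome should satisfy \(Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}\), and `True_TE` should equal \(Y_{1i} - Y_{0i}\). The sketch below re-enters only the five displayed rows by hand. The helper name `is_consistent` is hypothetical, and the check says nothing about the rest of the simulated data.

```python
# The five displayed rows: (X, G, D, Y, Y0, Y1, True_TE).
rows = [
    (12.00, "L", 1, 103.54, 33.54, 103.54, 70.0),
    (9.00,  "L", 1, 92.93,  22.93, 92.93,  70.0),
    (14.09, "L", 0, 43.00,  43.00, 113.00, 70.0),
    (18.13, "L", 0, 61.03,  61.03, 131.03, 70.0),
    (21.10, "S", 0, 26.00,  26.00, 27.00,  1.0),
]

def is_consistent(row, tol=1e-6):
    """Check Y = D*Y1 + (1-D)*Y0 and True_TE = Y1 - Y0 for one row."""
    x, g, d, y, y0, y1, true_te = row
    observed_ok = abs(y - (d * y1 + (1 - d) * y0)) < tol
    effect_ok = abs((y1 - y0) - true_te) < tol
    return observed_ok and effect_ok

checks = [is_consistent(row) for row in rows]
print(checks)
```

In these rows the treated units reveal \(Y_{1i}\) and the untreated units reveal \(Y_{0i}\), which is the switching equation from the setup applied row by row.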