Regression Discontinuity Design with Distribution-Valued Outcomes (R3D)

Fits a regression discontinuity design with distribution-valued outcomes (random distributions), as developed in Van Dijcke (2025) , using local polynomial regression on quantiles (method="simple") or local Fréchet regression (method="frechet"). Supports both sharp and fuzzy designs.

r3d(
  X,
  Y_list,
  T = NULL,
  cutoff = 0,
  method = c("simple", "frechet"),
  p = 2,
  q_grid = seq(0.01, 0.99, 0.01),
  fuzzy = FALSE,
  kernel_fun = c("epanechnikov", "triangular", "uniform"),
  s = 1,
  bandwidths = NULL,
  boot = FALSE,
  boot_reps = 200,
  boot_cores = 1,
  alpha = 0.05,
  test = c("none", "nullity", "homogeneity", "gini"),
  test_ranges = NULL,
  coverage = FALSE,
  weights = NULL,
  ...
)

Arguments

X

Numeric vector of running variable values (length \(n\)).

Y_list

A list of length \(n\), where each element is a numeric vector representing the sample from the outcome distribution of one unit.

T

(Optional) For a fuzzy design: numeric or logical vector of length \(n\) giving partial treatment status (0 or 1). If fuzzy=FALSE, this can be omitted.

cutoff

Numeric scalar: the threshold for treatment assignment. The design treats \(X \ge cutoff\) as above the cutoff.

method

Either "simple" or "frechet":

"simple": Local polynomial regression is done pointwise for each quantile, and then optionally rearranged to maintain monotonicity.
"frechet": A global local Fréchet regression approach is done, projecting the entire curve onto quantile functions.

p

Integer specifying the local polynomial order (e.g., 2 for local quadratic).

q_grid

Numeric vector of quantiles at which to estimate the distributional treatment effect.

fuzzy

Logical flag for fuzzy design. Default FALSE for a sharp design.

kernel_fun

Name of the kernel function. Possible choices are "epanechnikov", "triangular", or "uniform". Defaults to "triangular".

s

Integer expansion order in the pilot local polynomial step used for bandwidth selection. Defaults to 1.

bandwidths

Optional user-specified bandwidths. If NULL (default), bandwidths are automatically selected via r3d_bwselect. For a sharp design, can be either a single numeric value or a vector of length length(q_grid). For a fuzzy design, must be a list with two components: num (for the outcome equation) and den (for the first-stage equation). Where num can be a scalar or vector of length length(q_grid), and den must be a scalar.

boot

Logical. If TRUE, automatically runs r3d_bootstrap after estimation to produce uniform confidence bands and tests.

boot_reps

Integer, number of bootstrap draws if boot=TRUE.

boot_cores

Number of CPU cores for parallelizing the bootstrap. Default 1.

alpha

Significance level for the uniform confidence bands. Default 0.05.

test

Character vector of tests to perform. Options include: "none", "nullity", "homogeneity", or "gini", or a vector combining multiple test types. When "none", no tests are performed. When "nullity", tests the null hypothesis that the treatment effect is zero across all quantiles. When "homogeneity", tests the null hypothesis that the treatment effect is constant across all quantiles. When "gini", tests the null hypothesis that there is no difference in the Gini coefficients of the distributions above and below the cutoff.

test_ranges

List of numeric vectors defining the quantile ranges for testing. Each element should be a vector of length 2 or more defining the ranges. For example, list(c(0.25, 0.75)) to test on quantiles between 0.25 and 0.75, or list(c(0.25, 0.5, 0.75)) to test on ranges \[0.25, 0.5\] and \[0.5, 0.75 \]. If NULL (default), tests are performed on the entire q_grid.

coverage

Logical indicating whether to apply the coverage correction rule of thumb of Calonico et al. (2018) . Default is FALSE.

weights

Optional numeric vector of weights for each observation that will be used to compute weighted quantiles. Defaults to unweighted.

...

Additional arguments passed to the bandwidth selection function r3d_bwselect.

Value

An S3 object of class "r3d", containing (among others):

tau: Estimated distributional treatment effect \(\tau(q)\) for each \(q\) in q_grid.
q_grid: The quantile grid used.
bandwidths: The bandwidths used (either user-provided or MSE-/IMSE-optimal).
w_plus, w_minus: The internal local polynomial weights on the plus/minus side.
e1_mat, e2_mat: Residual matrices used for the multiplier bootstrap.
boot_out: If boot=TRUE, a list of bootstrap results from r3d_bootstrap.

Details

This is the main user-facing function for the R3D approach. Internally, it:

Selects bandwidth(s) via r3d_bwselect (if bandwidths=NULL) or uses user-provided bandwidths,
Computes local polynomial fits on each side of the cutoff for each quantile in q_grid,
For method="frechet", performs a projection of the fitted quantile curves onto the space of valid quantile functions (via isotonic regression),
(Optionally) runs the multiplier bootstrap for uniform inference.

When test includes one or more test types and test_ranges is specified, the function will perform the specified tests on each range in test_ranges. For example, if test=c("nullity", "homogeneity") and test_ranges=list(c(0.25, 0.75), c(0.1, 0.9)), four tests will be performed: nullity and homogeneity tests on the range \[0.25, 0.75\], and nullity and homogeneity tests on the range \[0.1, 0.9\].

The Gini test (test="gini") examines whether there is a statistically significant difference in inequality (as measured by the Gini coefficient) between the distributions above and below the cutoff. It uses the estimated conditional quantile functions to calculate the Gini coefficients.

References

Calonico S, Cattaneo MD, Farrell MH (2018). “On the effect of bias estimation on coverage accuracy in nonparametric inference.” Journal of the American Statistical Association, 113(522), 767–779.

Van Dijcke D (2025). “Regression Discontinuity Design with Distribution-Valued Outcomes.” Working paper.

Examples

if (FALSE) { # \dontrun{
  # Simulate data
  set.seed(123)
  n <- 100
  X <- runif(n, -1, 1)
  Y_list <- lapply(seq_len(n), function(i) {
    # Suppose distribution is Normal with mean depending on X
    rnorm(sample(30:50,1), mean=2 + 2*(X[i]>=0))
  })
  T <- as.numeric(X >= 0) * rbinom(n, 1, 0.8)  # Fuzzy treatment

  # Example 1: Test nullity and homogeneity on full range
  fit1 <- r3d(X, Y_list, T=T, cutoff=0,
              method="frechet", p=2, fuzzy=TRUE,
              boot=TRUE, boot_reps=200, alpha=0.05, 
              test=c("nullity", "homogeneity"))

  # Example 2: Test Gini coefficient difference
  fit2 <- r3d(X, Y_list, cutoff=0, method="simple", 
              boot=TRUE, test="gini")
              
  # Example 3: Test on multiple ranges
  fit3 <- r3d(X, Y_list, cutoff=0, 
              boot=TRUE, test=c("nullity", "homogeneity", "gini"), 
              test_ranges=list(c(0.25, 0.5, 0.75)))

  # Inspect results
  print(fit1)
  summary(fit1)
  plot(fit1)
} # }