
The DiSCos package contains tools for computing counterfactual quantile functions in a Distributional Synthetic Controls (DiSCo) setting, following the method proposed in Gunsilius (2023).

Getting Started

Have a look at the vignette replicating the empirical application in the paper to get started.

Installation

To install the latest stable version, run
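install.packages("DiSCos")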

You can install the latest development version from GitHub with:

devtools::install_github("Davidvandijcke/DiSCos")

If you find any bugs or have any questions, please email the package maintainer.

If you are using this package in your research, please cite:

Gunsilius, Florian F. 2023. “Distributional Synthetic Controls.” Econometrica 91 (3): 1105–17.

Van Dijcke, David, Florian Gunsilius, and Austin Wright. 2024. “Return to Office and the Tenure Distribution.” arXiv preprint arXiv:2405.04352.

FAQ

Q: Why does the code sometimes run slower than expected?
A: The approach in DiSCo is non-parametric and requires integrating over entire distributions, which naturally increases computational complexity. In particular, it can be costly for large datasets or for large values of M (the number of quantile points) and G (the number of grid points).


Q: Is there a faster backend?
A: The optimization relies on libraries (e.g., “quadprog” or “CVXR”) that ultimately call C++ or FORTRAN under the hood. Even though these libraries are optimized, fully non-parametric distance-matching remains computationally heavy.


Q: How can I reduce the runtime while developing or debugging?
A: Several options (see the sketch below):
- Use fewer quantile points by setting M = 100 in DiSCo.
- Use a smaller grid, e.g. G = 100, to reduce the number of evaluation points.
- Start with only two time periods (Ts = 2).
- Disable time-consuming features, such as confidence intervals (CI = FALSE) and permutation tests (permutation = FALSE).
- Parallelize with num.cores in DiSCo, for example by using multiple cores or a cluster to split bootstrap and permutation tasks.
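
For development runs, a call along these lines keeps things fast. This is a minimal sketch, not a definitive template: the simulated data, the target id (id_col.target = 1), and the treatment period (t0 = 2) are placeholders, and the required column names (id_col, time_col, y_col) should be checked against ?DiSCo.

library(DiSCos)

# Hypothetical micro data: one row per individual observation, with a unit id,
# a time period, and an outcome, stored in the columns id_col, time_col, y_col.
df <- data.frame(
  id_col = rep(1:10, each = 200),                   # 10 units; unit 1 is the (placeholder) treated unit
  time_col = rep(rep(1:2, each = 100), times = 10), # only two periods while debugging
  y_col = rnorm(2000)
)

# Fast settings for development: coarse quantile and evaluation grids,
# no confidence intervals, no permutation test, parallel execution.
fit <- DiSCo(
  df,
  id_col.target = 1,    # id of the treated unit (placeholder)
  t0 = 2,               # first treatment period (placeholder)
  M = 100,              # fewer quantile points
  G = 100,              # smaller evaluation grid
  CI = FALSE,           # skip bootstrap confidence intervals
  permutation = FALSE,  # skip the permutation test
  num.cores = max(1, parallel::detectCores() - 1)
)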


Q: Are there other suggestions for handling large real-world data?
A: - Verify correctness with a minimal working example, then gradually increase M, G, and the number of periods (see the timing sketch below).
- Use the discrete or categorical grid option (grid.cat) if it suits your data, to reduce the dimension of the problem.
- Make sure the available memory and CPU cores match the size of the task.
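
To illustrate the first point, you can time a coarse run before committing to full-resolution settings. The values below are arbitrary, and the calls reuse the placeholder data and arguments from the development sketch above; only M, G, CI, permutation, and num.cores are varied.

# Coarse run first ...
system.time(
  fit_small <- DiSCo(df, id_col.target = 1, t0 = 2, M = 100, G = 100)
)

# ... then scale up and switch on inference once the results look sensible.
system.time(
  fit_full <- DiSCo(
    df, id_col.target = 1, t0 = 2,
    M = 1000, G = 1000,
    CI = TRUE, permutation = TRUE,
    num.cores = max(1, parallel::detectCores() - 1)
  )
)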


Q: Can I get a progress bar?
A: When the package was first implemented, there were no functional solutions for progress bars with parallel computation in R. This may have changed—if you’d like to contribute, feel free to become that hero 🦸 and either enhance this repo or create your own parallel progress bar package!