Distributed Quantile Regression by Pilot Sampling and One-Step Updating
- Spark >= 2.3.1
- Python >= 3.7.0
  - pyarrow >= 0.15.0 (please read the compatibility issue with Spark 2.3.x or 2.4.x)
  - statsmodels >= 0.12.0
- See `setup.py` for detailed requirements.
- Zip the code into a portable package (a zipped file `dqr.zip` will be placed into the `projects` folder):

      make zip

- Run the project on a Spark platform:

      PYSPARK_PYTHON=/usr/local/bin/python3.7 \
      spark-submit --py-files projects/dqr.zip \
        projects/dqr_spark.py

- You could also build the code into a standard Python module and deploy it to Spark clusters:

      python setup.py bdist
- Contributed by @edwardguo61
- The required `R` version: 3.5.1
- Files:
  - `dqr/Restimator.R`: one-shot estimation and one-step estimation for distributed quantile regression
  - `dqr/R/simulator.R`: simulation functions to generate random/non-random data
  - `dqr/R/uilts.R`: other functions used
  - `projects/dqr_demo.R`: generates data, conducts estimation, and produces plots. Please run `dqr_demo.R` to see how to use the functions.
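
To give a rough idea of what "pilot sampling and one-step updating" means, here is a minimal NumPy sketch. It is not the code in this repository: all function names (`pilot_estimate`, `partition_summary`, `one_step_update`) are hypothetical, the pilot fit uses a simple iteratively reweighted least squares loop as a stand-in for a real quantile regression solver, and the update assumes i.i.d. errors so that the error density at zero can be estimated by one Gaussian kernel with a rule-of-thumb bandwidth. Each partition only returns small summary statistics, which are then added up to take a single Newton-type step from the pilot estimate.

```python
import numpy as np


def pilot_estimate(X, y, tau, n_iter=50, eps=1e-6):
    """Quantile regression on a small pilot sample via IRLS (illustrative stand-in)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        # Check-loss weights: tau for non-negative residuals, 1 - tau otherwise.
        w = np.where(r >= 0, tau, 1.0 - tau) / np.maximum(np.abs(r), eps)
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta


def partition_summary(X, y, beta0, tau, h):
    """Summary statistics a single worker would compute on its own partition."""
    r = y - X @ beta0
    score = X.T @ (tau - (r <= 0))  # sum of x_i * (tau - 1{r_i <= 0})
    gram = X.T @ X                  # sum of x_i x_i^T
    # Sum of Gaussian kernel values K_h(r_i), used to estimate the density at 0.
    kern = np.exp(-0.5 * (r / h) ** 2).sum() / (h * np.sqrt(2.0 * np.pi))
    return score, gram, kern, len(y)


def one_step_update(summaries, beta0):
    """Combine the partition summaries and take one Newton-type step from beta0."""
    score = sum(s[0] for s in summaries)
    gram = sum(s[1] for s in summaries)
    n = sum(s[3] for s in summaries)
    f0 = sum(s[2] for s in summaries) / n  # kernel estimate of the error density at 0
    return beta0 + np.linalg.solve(f0 * gram, score)


# Simulated data (assumption: i.i.d. standard normal errors, median regression).
rng = np.random.default_rng(0)
N, tau = 20000, 0.5
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + rng.normal(size=N)

# Step 1: pilot estimate from a small random subsample.
pilot_idx = rng.choice(N, size=500, replace=False)
beta0 = pilot_estimate(X[pilot_idx], y[pilot_idx], tau)

# Rule-of-thumb bandwidth based on the pilot residuals (an assumption of this sketch).
pilot_resid = y[pilot_idx] - X[pilot_idx] @ beta0
h = 1.06 * pilot_resid.std() * N ** (-0.2)

# Step 2: one pass over the partitions, then a single update.
parts = np.array_split(np.arange(N), 10)
summaries = [partition_summary(X[i], y[i], beta0, tau, h) for i in parts]
beta1 = one_step_update(summaries, beta0)
print("pilot:   ", beta0)
print("one-step:", beta1)
```

On a cluster, the list comprehension over `parts` would be replaced by a map over data partitions. Only a vector, a small matrix, and two scalars travel back from each worker, which is what makes a single pass over the full data cheap.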
- Rui Pan, Tunan Ren, Baishan Guo, Feng Li, Guodong Li and Hansheng Wang (2021). A Note on Distributed Quantile Regression by Pilot Sampling and One-Step Updating. Journal of Business and Economic Statistics (in press).