Blazingly fast implementation of the Datasaurus paper. Same Stats, Different Graphs.

araffin/datasaurust

DatasauRust

Blazingly fast implementation of the Datasaurus paper (500x faster than the original): "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing" by Justin Matejka and George Fitzmaurice.
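The "same stats" constraint from the paper can be illustrated with a short sketch. The paper keeps the means, standard deviations, and Pearson correlation of x and y unchanged to a fixed number of decimal places. The names below (`SummaryStats`, `summary_stats`, `same_stats`, `floor_to_decimals`) are hypothetical and are not taken from this repository, and comparing by flooring rather than rounding is an assumption.

```rust
/// Summary statistics that the paper keeps fixed while the points move.
/// (Hypothetical type for illustration, not this crate's API.)
#[derive(Debug, Clone, Copy)]
pub struct SummaryStats {
    pub mean_x: f64,
    pub mean_y: f64,
    pub std_x: f64,
    pub std_y: f64,
    pub corr: f64,
}

/// Compute means, sample standard deviations, and Pearson correlation.
/// Assumes at least two points with non-zero variance on both axes.
pub fn summary_stats(points: &[(f64, f64)]) -> SummaryStats {
    let n = points.len() as f64;
    let mean_x = points.iter().map(|p| p.0).sum::<f64>() / n;
    let mean_y = points.iter().map(|p| p.1).sum::<f64>() / n;
    let (mut sxx, mut syy, mut sxy) = (0.0, 0.0, 0.0);
    for &(x, y) in points {
        let (dx, dy) = (x - mean_x, y - mean_y);
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    SummaryStats {
        mean_x,
        mean_y,
        std_x: (sxx / (n - 1.0)).sqrt(),
        std_y: (syy / (n - 1.0)).sqrt(),
        corr: sxy / (sxx * syy).sqrt(),
    }
}

/// Keep only the first `decimals` decimal places (assumption: floor, not round).
fn floor_to_decimals(value: f64, decimals: i32) -> i64 {
    (value * 10f64.powi(decimals)).floor() as i64
}

/// True when every statistic agrees to `decimals` decimal places.
pub fn same_stats(a: &SummaryStats, b: &SummaryStats, decimals: i32) -> bool {
    let pairs = [
        (a.mean_x, b.mean_x),
        (a.mean_y, b.mean_y),
        (a.std_x, b.std_x),
        (a.std_y, b.std_y),
        (a.corr, b.corr),
    ];
    pairs
        .iter()
        .all(|&(u, v)| floor_to_decimals(u, decimals) == floor_to_decimals(v, decimals))
}
```

A caller would compute `summary_stats` once for the seed dataset and reject any candidate dataset for which `same_stats` returns false at the chosen number of decimals.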

datasaurust.mp4

Usage

To run with plotting enabled (-p, which uses gnuplot):

cargo run --release -- -d data/seed_datasets/Datasaurus_data.csv -p

With pre-defined shape:

cargo run --release -- -p -n 3000000 --decimals 2 --shape cat --allowed-distance 0.1
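For readers unfamiliar with the paper, one annealing proposal toward a target shape can be sketched as follows. In the paper's algorithm, a randomly chosen point is nudged, and the move is kept if the point ends up closer to the shape, is already within an allowed distance of it, or is accepted by chance according to the current temperature. This is a minimal sketch under those assumptions, not this repository's implementation: `Rng`, `dist_to_shape`, and `propose_move` are hypothetical names, the shape is assumed to be a list of line segments, and the exact way --allowed-distance is applied by this tool is not confirmed here.

```rust
pub type Point = (f64, f64);
pub type Segment = (Point, Point);

/// Tiny xorshift64* generator so the sketch needs no external crate.
/// The seed must be non-zero.
pub struct Rng(pub u64);

impl Rng {
    /// Uniform sample in [0, 1).
    pub fn uniform(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        (self.0.wrapping_mul(0x2545F4914F6CDD1D) >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Euclidean distance from `p` to the closest point on one segment.
fn dist_to_segment(p: Point, (a, b): Segment) -> f64 {
    let (abx, aby) = (b.0 - a.0, b.1 - a.1);
    let len_sq = abx * abx + aby * aby;
    // Projection of p onto the segment, clamped to its end points.
    let t = if len_sq == 0.0 {
        0.0
    } else {
        (((p.0 - a.0) * abx + (p.1 - a.1) * aby) / len_sq).clamp(0.0, 1.0)
    };
    let (cx, cy) = (a.0 + t * abx, a.1 + t * aby);
    ((p.0 - cx).powi(2) + (p.1 - cy).powi(2)).sqrt()
}

/// Distance from `p` to the nearest segment of the target shape.
pub fn dist_to_shape(p: Point, shape: &[Segment]) -> f64 {
    shape
        .iter()
        .map(|&s| dist_to_segment(p, s))
        .fold(f64::INFINITY, f64::min)
}

/// Propose moving one random point by at most `step / 2` on each axis.
/// Returns the index and new position if the move passes the shape test.
/// The caller still has to verify that the summary statistics are
/// unchanged before committing the move.
pub fn propose_move(
    points: &[Point],
    shape: &[Segment],
    allowed_distance: f64,
    temperature: f64,
    step: f64,
    rng: &mut Rng,
) -> Option<(usize, Point)> {
    let i = ((rng.uniform() * points.len() as f64) as usize).min(points.len() - 1);
    let old = points[i];
    let new = (
        old.0 + (rng.uniform() - 0.5) * step,
        old.1 + (rng.uniform() - 0.5) * step,
    );
    let old_d = dist_to_shape(old, shape);
    let new_d = dist_to_shape(new, shape);
    // Accept if closer, already close enough, or by chance while "hot".
    let accept = new_d < old_d || new_d < allowed_distance || rng.uniform() < temperature;
    if accept {
        Some((i, new))
    } else {
        None
    }
}
```

In a full loop, the temperature would be lowered over the iterations so that fewer worsening moves are accepted toward the end of the run.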

Starting from Gaussian noise:

cargo run --release -- -p -n 3000000 --decimals 2 --shape cat --allowed-distance 0.1 --gaussian

Create videos

Create a video and a GIF from plots saved with --save-plots:

pip install moviepy ffmpeg-python

python scripts/create_video.py logs/cat/ logs/cat.mp4

To go from one shape to another, pass the output of a previous run as the input dataset with -d:

cargo run --release -- -p -n 2000000 --decimals 1 --shape dog --allowed-distance 0.1 --log-interval 10000 -d logs/gaussian_cat/output.csv --save-plots

Note: The original datasets and Python code come from http://www.autodeskresearch.com/papers/samestats
