Data::ExampleDatasets Raku package

Raku package for (obtaining) example datasets.

Currently, this repository contains only the metadata of the datasets. The datasets themselves are downloaded from the repository Rdatasets, [VAB1].

Usage examples

Setup

Here we load the Raku modules Data::Reshapers, Data::Summarizers, and this module, Data::ExampleDatasets:

use Data::Reshapers;
use Data::Summarizers;
use Data::ExampleDatasets;

# (Any)

Get a dataset by using an identifier

Here we get a dataset by using an identifier and display part of the obtained dataset:

my @tbl = example-dataset('Baumann', :headers);
say to-pretty-table(@tbl[^6]);

# +-------+-------------+-----------+-------------+----------+-----------+-------------+
# | group | post.test.2 | pretest.1 | post.test.3 | rownames | pretest.2 | post.test.1 |
# +-------+-------------+-----------+-------------+----------+-----------+-------------+
# | Basal |      4      |     4     |      41     |    1     |     3     |      5      |
# | Basal |      5      |     6     |      41     |    2     |     5     |      9      |
# | Basal |      3      |     9     |      43     |    3     |     4     |      5      |
# | Basal |      5      |     12    |      46     |    4     |     6     |      8      |
# | Basal |      9      |     16    |      46     |    5     |     5     |      10     |
# | Basal |      8      |     15    |      45     |    6     |     13    |      9      |
# +-------+-------------+-----------+-------------+----------+-----------+-------------+

Here we summarize the dataset obtained above:

records-summary(@tbl)

# +-------------+--------------------+--------------------+----------------+--------------------+--------------------+---------------------+
# | group       | pretest.1          | post.test.1        | rownames       | pretest.2          | post.test.2        | post.test.3         |
# +-------------+--------------------+--------------------+----------------+--------------------+--------------------+---------------------+
# | Basal => 22 | Min    => 4        | Min    => 1        | Min    => 1    | Min    => 1        | Min    => 0        | Min    => 30        |
# | Strat => 22 | 1st-Qu => 8        | 1st-Qu => 5        | 1st-Qu => 17   | 1st-Qu => 3        | 1st-Qu => 5        | 1st-Qu => 40        |
# | DRTA  => 22 | Mean   => 9.787879 | Mean   => 8.075758 | Mean   => 33.5 | Mean   => 5.106061 | Mean   => 6.712121 | Mean   => 44.015152 |
# |             | Median => 9        | Median => 8        | Median => 33.5 | Median => 5        | Median => 6        | Median => 45        |
# |             | 3rd-Qu => 12       | 3rd-Qu => 11       | 3rd-Qu => 50   | 3rd-Qu => 6        | 3rd-Qu => 8        | 3rd-Qu => 49        |
# |             | Max    => 16       | Max    => 15       | Max    => 66   | Max    => 13       | Max    => 13       | Max    => 57        |
# +-------------+--------------------+--------------------+----------------+--------------------+--------------------+---------------------+

Remark: The values for the first argument of example-dataset correspond to the values of the columns "Item" and "Package", respectively, in the metadata dataset from the GitHub repository "Rdatasets", [VAB1]. See the datasets metadata sub-section below.

The first argument of example-dataset can take as values:

Strings that correspond to the column "Item" of the metadata dataset
- E.g. example-dataset("mtcars")
Strings that correspond to the columns "Package" and "Item" of the metadata dataset
- E.g. example-dataset("COUNT::titanic")
Regexes
- E.g. example-dataset(/ .* mann $ /)
Whatever or WhateverCode
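
To make the identifier forms above more concrete, here is a minimal, self-contained sketch of how a string or a regex could select entries from a list of "Package::Item" identifiers. It does not call example-dataset and does not show how the package resolves identifiers internally. The identifier list and the variable names are illustrative assumptions, and matching regexes against the full "Package::Item" string is assumed here only because the examples in this document use patterns such as / 'COUNT::titanic' $ /.

```raku
# Illustrative identifiers in "Package::Item" form (an assumption, not read from the metadata)
my @ids = <carData::Baumann COUNT::titanic datasets::mtcars>;

# A regex anchored at the end selects items whose name ends with "mann"
my @by-suffix = @ids.grep(/ .* mann $ /);

# A quoted literal in a regex selects one specific package and item
my @by-full = @ids.grep(/ 'COUNT::titanic' $ /);

# A plain item name is compared with the part after "::"
my @by-item = @ids.grep({ .split('::').tail eq 'mtcars' });

say @by-suffix;
say @by-full;
say @by-item;
```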

Get a dataset by using a URL

Here we get a dataset by using a URL and display a summary of the obtained dataset:

my $url = 'https://raw.githubusercontent.com/antononcube/Raku-Data-Reshapers/main/resources/dfTitanic.csv';
my @tbl2 = example-dataset($url, :headers);
records-summary(@tbl2, field-names => <id passengerSex passengerClass passengerAge passengerSurvival>);

# +-----------------+---------------+----------------+---------------------+-------------------+
# | id              | passengerSex  | passengerClass | passengerAge        | passengerSurvival |
# +-----------------+---------------+----------------+---------------------+-------------------+
# | Min    => 1     | male   => 843 | 3rd => 709     | Min    => -1        | died     => 809   |
# | 1st-Qu => 327.5 | female => 466 | 1st => 323     | 1st-Qu => 10        | survived => 500   |
# | Mean   => 655   |               | 2nd => 277     | Mean   => 23.550038 |                   |
# | Median => 655   |               |                | Median => 20        |                   |
# | 3rd-Qu => 982.5 |               |                | 3rd-Qu => 40        |                   |
# | Max    => 1309  |               |                | Max    => 80        |                   |
# +-----------------+---------------+----------------+---------------------+-------------------+

Datasets metadata

Here we:

Get the dataset of the datasets metadata
Filter it to have only datasets with 13 rows
Keep only the columns "Item", "Title", "Rows", and "Cols"
Display it in "pretty table" format

my @tblMeta = get-datasets-metadata();
@tblMeta = @tblMeta.grep({ $_<Rows> == 13}).map({ $_.grep({ $_.key (elem) <Item Title Rows Cols>}).Hash });
say to-pretty-table(@tblMeta, field-names => <Item Title Rows Cols>)

# +------------+--------------------------------------------------------------------+------+------+
# |    Item    |                               Title                                | Rows | Cols |
# +------------+--------------------------------------------------------------------+------+------+
# | Snow.pumps |    John Snow's Map and Data on the 1854 London Cholera Outbreak    |  13  |  4   |
# |    BCG     |                          BCG Vaccine Data                          |  13  |  7   |
# |   cement   |                  Heat Evolved by Setting Cements                   |  13  |  5   |
# |  kootenay  |   Waterflow Measurements of Kootenay River in Libby and Newgate    |  13  |  2   |
# | Newhouse77 | Medical-Care Expenditure: A Cross-National Survey (Newhouse, 1977) |  13  |  5   |
# |   Saxony   |                         Families in Saxony                         |  13  |  2   |
# +------------+--------------------------------------------------------------------+------+------+
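
The filter-and-project step used above can be tried without network access on a few in-memory records. This is only a sketch: the records below are hypothetical stand-ins for rows of the metadata dataset, and the projection is written with a hash slice, which is an alternative to the grep over key-value pairs shown above.

```raku
# Hypothetical stand-in records; the real metadata has many more rows and columns
my @meta =
    { Package => 'pkgA', Item => 'alpha', Title => 'First',  Rows => 13, Cols => 4 },
    { Package => 'pkgB', Item => 'beta',  Title => 'Second', Rows => 50, Cols => 2 },
    { Package => 'pkgC', Item => 'gamma', Title => 'Third',  Rows => 13, Cols => 7 };

# Columns to keep in each record
my @keep = <Item Title Rows Cols>;

# Keep the records with exactly 13 rows, then project each record onto @keep
my @small = @meta.grep({ $_<Rows> == 13 }).map({ (@keep Z=> $_{@keep}).Hash });

.say for @small;
```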

Keeping downloaded data

By default the data is obtained over the web from Rdatasets, but example-dataset has an option to keep the data "locally." (The data is saved in XDG_DATA_HOME, see [JS1].)

This can be demonstrated with the following timings of a dataset with ~1300 rows:

my $startTime = now;
my $data = example-dataset( / 'COUNT::titanic' $ / ):keep;
my $endTime = now;
say "Getting the data first time took { $endTime - $startTime } seconds";

# Getting the data first time took 0.76011044 seconds

$startTime = now;
$data = example-dataset( / 'COUNT::titanic' $ / ):keep;
$endTime = now;
say "Getting the data second time took { $endTime - $startTime } seconds";

# Getting the data second time took 0.764633055 seconds
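
The general idea behind keeping data locally can be sketched as a small "fetch once, then read from disk" helper. This is not the implementation used by example-dataset. The sub fetch-with-cache, the stand-in downloader, and the temporary cache directory are all hypothetical names introduced for illustration, and the sketch writes to the system temporary directory rather than to XDG_DATA_HOME.

```raku
# Hypothetical helper, not part of Data::ExampleDatasets:
# return the cached text for $key if present, otherwise fetch it and store it.
sub fetch-with-cache(Str $key, &fetch, IO::Path :$dir!) {
    $dir.mkdir unless $dir.d;
    # Turn the identifier into a safe file name, e.g. "COUNT::titanic" -> "COUNT_titanic.csv"
    my $file = $dir.add($key.subst(/\W+/, '_', :g) ~ '.csv');
    return $file.slurp if $file.e;    # cache hit: skip the fetch
    my $text = fetch();               # cache miss: fetch and keep a local copy
    $file.spurt($text);
    $text
}

# Stand-in for a download; counts how often it is actually invoked
my $calls = 0;
my &fake-download = { $calls++; "a,b\n1,2\n" };

# A per-process directory under the system temporary directory (an assumption of this sketch)
my $dir = $*TMPDIR.add("example-cache-{$*PID}");

my $first  = fetch-with-cache('COUNT::titanic', &fake-download, :$dir);
my $second = fetch-with-cache('COUNT::titanic', &fake-download, :$dir);

say "Downloads performed: $calls";
```

In this sketch the second call reads the saved file, so the stand-in downloader runs only once.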

References

Functions, packages, repositories

[AAf1] Anton Antonov, ExampleDataset, (2020), Wolfram Function Repository.

[VAB1] Vincent Arel-Bundock, Rdatasets, (2020), GitHub/vincentarelbundock.

[JS1] Jonathan Stowe, XDG::BaseDirectory, (last updated on 2021-03-31), Raku Modules.

Interactive interfaces

[AAi1] Anton Antonov, Example datasets recommender interface, (2021), Shinyapps.io.
