Feature/903 stansummary sampler diags #907

Merged
mitzimorris merged 47 commits into develop from feature/903-stansummary-sampler-diags on Jul 19, 2020

Conversation

@mitzimorris
Member

@mitzimorris mitzimorris commented Jul 11, 2020

Submission Checklist

  • Run tests: ./runCmdStanTests.py src/test
  • Declare copyright holder and open-source license: see below

Summary:

Modify stansummary console and csv outputs:

  • on the console, there are separate summaries for sampler and model params
  • the csv file contains just the model param summaries
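
A minimal sketch of how column names might be split into the two groups. The rule used here (a trailing `__` marks a sampler param, with `lp__` kept among the model params) is an assumption read off the example output in this PR, and `is_sampler_param` is a hypothetical name, not a function from stansummary_helper.hpp.

```cpp
#include <string>

// Hypothetical helper: true for sampler diagnostic columns such as
// accept_stat__ or energy__. Assumes the naming convention visible in the
// example output: a trailing "__", with lp__ reported among the model params.
bool is_sampler_param(const std::string& name) {
  const std::string suffix = "__";
  if (name == "lp__") {
    return false;  // lp__ is summarized together with the model params
  }
  if (name.size() < suffix.size()) {
    return false;
  }
  // Compare the last two characters of the name against the suffix.
  return name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

Under this rule `stepsize__` and `energy__` land in the sampler table, while `lp__` and `theta[1]` land in the model table.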

Refactor file stansummary.cpp

  • break processing into per-task functions and move from main in stansummary.cpp into stansummary_helper.hpp
  • whenever possible, generalize reporting for both console and csv outputs.

Intended Effect:

Improved readability and understandability.

  • summary columns "MCSE", "EFF" and "R-hat" don't make sense for sampler params
  • when column widths and formats were calculated with the sampler params included, the values in these columns were always reported in scientific notation; omitting the sampler params makes the N_Eff and R_hat values easier to read.

e.g., current output:

~/.cmdstanpy/cmdstan-2.23.0/bin/stansummary output.csv 
Inference for Stan model: lotka_volterra_model
1 chains: each with iter=(1000); warmup=(0); thin=(1); 1000 iterations saved.

Warmup took (6.5) seconds, 6.5 seconds total
Sampling took (6.7) seconds, 6.7 seconds total

                 Mean     MCSE   StdDev     5%    50%    95%    N_Eff  N_Eff/s    R_hat
lp__               28  1.1e-01  2.1e+00     24     29     31  3.3e+02  5.0e+01  1.0e+00
accept_stat__    0.94  2.0e-03  6.7e-02   0.80   0.97   1.00  1.1e+03  1.7e+02  1.0e+00
stepsize__      0.100      nan  1.7e-16  0.100  0.100  0.100      nan      nan      nan
treedepth__       4.8  2.2e-02  6.2e-01    3.0    5.0    5.0  8.2e+02  1.2e+02  1.0e+00
n_leapfrog__       36  5.5e-01  1.6e+01     15     31     63  8.4e+02  1.3e+02  1.0e+00
divergent__      0.00      nan  0.0e+00   0.00   0.00   0.00      nan      nan      nan
energy__          -24  1.6e-01  2.9e+00    -29    -25    -19  3.3e+02  4.9e+01  1.0e+00
theta[1]         0.55  4.8e-03  6.4e-02   0.45   0.55   0.66  1.8e+02  2.6e+01  1.0e+00
theta[2]        0.028  3.0e-04  4.3e-03  0.021  0.028  0.035  2.0e+02  2.9e+01  1.0e+00
theta[3]         0.80  7.0e-03  9.0e-02   0.66   0.79   0.96  1.7e+02  2.5e+01  1.0e+00
theta[4]        0.024  2.7e-04  3.4e-03  0.019  0.024  0.030  1.6e+02  2.4e+01  1.0e+00
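
To illustrate why a single row can change how a whole column is printed, here is a sketch of a column-wide format decision. The rule itself (fall back to scientific notation if any entry is non-finite or outside a fixed magnitude range) and the names `ColumnFormat` and `choose_format` are assumptions made for illustration; the actual logic in stansummary may differ.

```cpp
#include <cmath>
#include <vector>

enum class ColumnFormat { Fixed, Scientific };

// Hypothetical rule: one format is chosen for the entire column. Fixed
// notation is used only if every entry is finite and is either zero or has a
// magnitude in [1e-3, 1e6). The thresholds are arbitrary illustration values.
ColumnFormat choose_format(const std::vector<double>& column) {
  for (double v : column) {
    if (!std::isfinite(v)) {
      return ColumnFormat::Scientific;  // a single nan decides the column
    }
    const double magnitude = std::fabs(v);
    if (magnitude != 0.0 && (magnitude < 1e-3 || magnitude >= 1e6)) {
      return ColumnFormat::Scientific;
    }
  }
  return ColumnFormat::Fixed;
}
```

With a rule of this kind, an N_Eff column that contains the nan from stepsize__ is printed in scientific notation, while the same column restricted to lp__ and theta rows can be printed in fixed notation.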

new output:

Input file: output.csv
Inference for Stan model: lotka_volterra_model
1 chains: each with iter=(1000); warmup=(0); thin=(1); 1000 iterations saved.

Warmup took 6.5 seconds
Sampling took 6.7 seconds
                 Mean   StdDev     5%    50%    95%
accept_stat__    0.94  6.7e-02   0.80   0.97   1.00
stepsize__      0.100  1.7e-16  0.100  0.100  0.100
treedepth__       4.8  6.2e-01    3.0    5.0    5.0
n_leapfrog__       36  1.6e+01     15     31     63
divergent__      0.00  0.0e+00   0.00   0.00   0.00
energy__          -24  2.9e+00    -29    -25    -19

                 Mean     MCSE   StdDev     5%    50%    95%  N_Eff  N_Eff/s  R_hat
lp__               28  1.1e-01  2.1e+00     24     29     31    333       50   1.00
theta[1]         0.55  4.8e-03  6.4e-02   0.45   0.55   0.66    176       26    1.0
theta[2]        0.028  3.0e-04  4.3e-03  0.021  0.028  0.035    197       29    1.0
theta[3]         0.80  7.0e-03  9.0e-02   0.66   0.79   0.96    168       25    1.0
theta[4]        0.024  2.7e-04  3.4e-03  0.019  0.024  0.030    162       24   1.00
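
As a rough cross-check of the model param table, the sketch below assumes the usual relations MCSE = StdDev / sqrt(N_Eff) and N_Eff/s = N_Eff / sampling time. The function names are hypothetical, and the inputs are the rounded values printed above, so agreement is only approximate.

```cpp
#include <cassert>
#include <cmath>

// Monte Carlo standard error from the posterior standard deviation and the
// effective sample size (assumed relation: sd / sqrt(n_eff)).
double mcse(double std_dev, double n_eff) {
  assert(n_eff > 0.0);
  return std_dev / std::sqrt(n_eff);
}

// Effective sample size per second of sampling time.
double n_eff_per_second(double n_eff, double sampling_seconds) {
  assert(sampling_seconds > 0.0);
  return n_eff / sampling_seconds;
}
```

For the lp__ row, `mcse(2.1, 333)` is about 0.115 and `n_eff_per_second(333, 6.7)` is about 49.7, which agree with the reported 1.1e-01 and 50 after rounding.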

How to Verify:

unit tests

Side Effects:

console and csv outputs will be different; downstream processing which expects a fixed format may break.

Documentation:

online CmdStan User's Guide

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.95 | 4.02 | 0.98 | -1.69% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.99 | -0.99% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 0.98 | -1.99% slower |
| gp_regr/gp_regr.stan | 0.19 | 0.19 | 1.0 | 0.15% faster |
| irt_2pl/irt_2pl.stan | 5.35 | 5.36 | 1.0 | -0.18% slower |
| performance.compilation | 86.42 | 84.98 | 1.02 | 1.66% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.7 | 7.69 | 1.0 | 0.02% faster |
| pkpd/one_comp_mm_elim_abs.stan | 19.65 | 21.2 | 0.93 | -7.89% slower |
| sir/sir.stan | 92.08 | 94.09 | 0.98 | -2.18% slower |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 0.98 | -1.88% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.04 | 3.04 | 1.0 | -0.0% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.33 | 0.32 | 1.01 | 1.25% faster |
| arK/arK.stan | 1.79 | 1.79 | 1.0 | 0.28% faster |
| arma/arma.stan | 0.73 | 0.72 | 1.01 | 1.05% faster |
| garch/garch.stan | 0.52 | 0.52 | 1.0 | 0.01% faster |

Mean result: 0.992270978206

Jenkins Console Log
Blue Ocean
Commit hash: c45e3d9


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.97 | 4.08 | 0.97 | -2.82% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.02 | 1.6% faster |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 0.96 | -3.88% slower |
| gp_regr/gp_regr.stan | 0.19 | 0.19 | 0.98 | -1.6% slower |
| irt_2pl/irt_2pl.stan | 5.35 | 5.37 | 1.0 | -0.24% slower |
| performance.compilation | 86.21 | 84.92 | 1.02 | 1.5% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.71 | 7.71 | 1.0 | 0.1% faster |
| pkpd/one_comp_mm_elim_abs.stan | 20.71 | 19.99 | 1.04 | 3.48% faster |
| sir/sir.stan | 93.73 | 92.01 | 1.02 | 1.83% faster |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 1.02 | 1.91% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.04 | 3.04 | 1.0 | -0.12% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.33 | 0.35 | 0.94 | -6.72% slower |
| arK/arK.stan | 1.77 | 1.78 | 0.99 | -0.52% slower |
| arma/arma.stan | 0.72 | 0.72 | 1.0 | 0.36% faster |
| garch/garch.stan | 0.52 | 0.52 | 1.0 | -0.02% slower |

Mean result: 0.99719326257

Jenkins Console Log
Blue Ocean
Commit hash: 0b9463b


@mitzimorris
Member Author

looking for a reviewer with free cycles this weekend.

it would be lovely to get this in for the code freeze, but I know everyone's super busy and everyone feels this way about their stuff, so if not, that's OK too.

@bbbales2
Member

> the csv file contains just the model param summaries

There's now a csv output of the summary written too? Is there a flag for this? Or was there always this option and I just missed it?

@mitzimorris
Member Author

> There's now a csv output of the summary written too? Is there a flag for this? Or was there always this option and I just missed it?

yes, this option has been there for a while. the flag is --csv_file. CmdStanPy uses this to get the summary - perhaps other interfaces do as well.

there's also an option --percentiles="p1,p2,p3,...pN" - this was added recently - #890
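
A minimal sketch of how a comma-separated value such as "5,50,95" could be turned into a list of integers. `parse_percentiles` is a hypothetical helper written for illustration, not the parsing code added in #890.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: split "p1,p2,...,pN" on commas and convert each token
// to an int. Throws std::invalid_argument if a token is not a plain integer.
std::vector<int> parse_percentiles(const std::string& spec) {
  std::vector<int> result;
  std::stringstream stream(spec);
  std::string token;
  while (std::getline(stream, token, ',')) {
    std::size_t consumed = 0;
    int value = 0;
    try {
      value = std::stoi(token, &consumed);
    } catch (const std::exception&) {
      throw std::invalid_argument("not an integer: '" + token + "'");
    }
    if (consumed != token.size()) {
      // Reject tokens with trailing characters, e.g. "50x" or "5.5".
      throw std::invalid_argument("not an integer: '" + token + "'");
    }
    result.push_back(value);
  }
  if (result.empty()) {
    throw std::invalid_argument("no percentiles given");
  }
  return result;
}
```

Range and ordering checks are left to the caller in this sketch.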

@mitzimorris
Member Author

mitzimorris commented Jul 11, 2020

There should be more tests for this feature. I tested it on a set of 4 csv output files from the Bernoulli model and a single csv output file from the Lotka-Volterra case study model & data. The latter is a good test case because it has a parameter which is a 2-D array. The former is also good because a single parameter is the minimum allowed, which makes it a good edge case.

will add relevant model, data, and output files and add logic to unit tests to run these checks.

@bbbales2
Member

@betanalpha should energy__ be one of the things we compute ESS and Rhat on? Is that useful?

Newline comments:

Warmup took 0.0080 seconds
Sampling took 0.012 seconds ########## Put an empty line after this
                Mean   StdDev    5%   50%  95%
accept_stat__   0.91  1.3e-01  0.63  0.97  1.0
stepsize__       1.1  6.7e-16   1.1   1.1  1.1
treedepth__      1.4  4.8e-01   1.0   1.0  2.0
n_leapfrog__     2.4  1.1e+00   1.0   3.0  3.0
divergent__     0.00  0.0e+00  0.00  0.00 0.00
energy__         7.8  1.0e+00   6.8   7.5  9.8

                Mean     MCSE  StdDev     5%   50%   95%  N_Eff  N_Eff/s  R_hat
lp__            -7.3  3.8e-02    0.74   -8.8  -7.0  -6.8    391    32609   1.00
theta           0.25  6.1e-03    0.12  0.076  0.23  0.46    390    32532   1.00
############ Delete an empty line here

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at 
convergence, R_hat=1).

Calling stansummary with no arguments segfaults:

$ bin/stansummary
Segmentation fault (core dumped)

Might be the same segfault if the file with the draws doesn't exist:

$ bin/stansummary --csv_file csv.csv asdfasdfadsf
csv_filename csv.csv
Input file: asdfasdfadsf
Warning: non-fatal error reading metadata
Error: error reading header
terminate called after throwing an instance of 'std::invalid_argument'
  what():  Error with header of input file in parse
Aborted (core dumped)

And I got another segfault trying to write a csv (this is using one chain of the default bernoulli model):

$ bin/stansummary --csv_file csv.csv output.csv
csv_filename csv.csv
Input file: output.csv
Inference for Stan model: bernoulli_model
1 chains: each with iter=(1000); warmup=(0); thin=(1); 1000 iterations saved.

Warmup took 0.0080 seconds
Sampling took 0.012 seconds
                Mean   StdDev    5%   50%  95%
accept_stat__   0.91  1.3e-01  0.63  0.97  1.0
...
and R_hat is the potential scale reduction factor on split chains (at 
convergence, R_hat=1).

stansummary: stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/DenseCoeffsBase.h:118: Eigen::DenseCoeffsBase<type-parameter-0-0, 0>::CoeffReturnType Eigen::DenseCoeffsBase<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0>::operator()(Eigen::Index, Eigen::Index) const [Derived = Eigen::Matrix<double, -1, -1, 0, -1, -1>, Level = 0]: Assertion `row >= 0 && row < rows() && col >= 0 && col < cols()' failed.
Aborted (core dumped)

@mitzimorris
Member Author

according to Aki: "there is no sense computing or reporting Rhat or ESS for sampler diagnostic columns."
is energy__ not a sampler diagnostic?

https://discourse.mc-stan.org/t/cmdstan-guide-now-online/16368/10

@mitzimorris
Member Author

@bbbales2 and/or @SteveBronder - fixed newlines, added checks and error messages for all command line args.
ready for code review.
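
One way such a check could look, sketched under the assumption that input files are validated before any parsing starts. `unreadable_files` is a hypothetical helper, not the code committed in this PR.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: return the subset of filenames that cannot be opened
// for reading, so the caller can report them and exit cleanly instead of
// letting a parser throw on a missing file.
std::vector<std::string> unreadable_files(
    const std::vector<std::string>& filenames) {
  std::vector<std::string> bad;
  for (const std::string& name : filenames) {
    std::ifstream in(name);
    if (!in.good()) {
      bad.push_back(name);
    }
  }
  return bad;
}
```

A caller could print one error line per returned name and return a non-zero exit code when the list is non-empty.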

@mitzimorris
Member Author

remaining todos on this PR:

  • better usage message
  • function-level unit tests for header, stats, and output functions.

@SteveBronder
Contributor

@mitzimorris ping me when the two above are done and I'll review!

@mitzimorris
Member Author

Rewrote usage message as follows:

> bin/stansummary
Usage: stansummary [OPTIONS] stan_csv_file(s)
Report statistics for one or more Stan csv files from HMC sampler run.
Example:  stansummary model_1.csv model_2.csv
Options:
  -a, --autocorr [n]          Display the chain autocorrelation for the n-th
                              input file, in addition to statistics.
  -c, --csv_filename [file]   Write statistics to a csv file.
  -h, --help                  Produce help message, then exit.
  -p, --percentiles [values]  Percentiles to report as ordered set of
                              comma-separated integers from (1,99), inclusive.
                              Default is 5,50,95.
  -s, --sig_figs [n]          Significant figures reported. Default is 2.
                              Must be an integer from (1, 10), inclusive.

@bbbales2 - whaddya think?
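
The constraints stated in the usage message could be checked along these lines. This is a sketch with hypothetical function names; reading "ordered set" as strictly increasing values is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns an empty string if the percentiles satisfy the documented
// constraints (integers from 1 to 99, in order), else an error message.
// Assumption: "ordered set" means strictly increasing, with no duplicates.
std::string check_percentiles(const std::vector<int>& percentiles) {
  if (percentiles.empty()) {
    return "percentiles: at least one value is required";
  }
  for (std::size_t i = 0; i < percentiles.size(); ++i) {
    if (percentiles[i] < 1 || percentiles[i] > 99) {
      return "percentiles: values must be integers from 1 to 99";
    }
    if (i > 0 && percentiles[i] <= percentiles[i - 1]) {
      return "percentiles: values must be strictly increasing";
    }
  }
  return "";
}

// Returns an empty string if n is a valid sig_figs value (1 to 10).
std::string check_sig_figs(int n) {
  if (n < 1 || n > 10) {
    return "sig_figs: must be an integer from 1 to 10";
  }
  return "";
}
```

Returning a message rather than throwing lets the caller print the usage text alongside the error.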

@mitzimorris
Member Author

C++11 raw strings rock!

  std::string usage = R"(Usage: stansummary [OPTIONS] stan_csv_file(s)
Report statistics for one or more Stan csv files from HMC sampler run.
Example:  stansummary model_1.csv model_2.csv
Options:
  -a, --autocorr [n]          Display the chain autocorrelation for the n-th
                              input file, in addition to statistics.
  -c, --csv_filename [file]   Write statistics to a csv file.
  -h, --help                  Produce help message, then exit.
  -p, --percentiles [values]  Percentiles to report as ordered set of
                              comma-separated integers from (1,99), inclusive.
                              Default is 5,50,95.
  -s, --sig_figs [n]          Significant figures reported. Default is 2.
                              Must be an integer from (1, 10), inclusive.
)";
  if (argc < 2) {
    std::cout << usage << std::endl;
    return 0;
  }

@bbbales2
Member

@mitzimorris very nice!

@mitzimorris
Member Author

@SteveBronder ready for re-review.

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.19 | 4.23 | 0.99 | -0.77% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.99 | -1.32% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 0.99 | -1.34% slower |
| gp_regr/gp_regr.stan | 0.19 | 0.19 | 1.02 | 1.77% faster |
| irt_2pl/irt_2pl.stan | 5.48 | 5.39 | 1.02 | 1.68% faster |
| performance.compilation | 87.19 | 85.95 | 1.01 | 1.42% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.38 | 8.38 | 1.0 | 0.0% slower |
| pkpd/one_comp_mm_elim_abs.stan | 26.84 | 25.96 | 1.03 | 3.3% faster |
| sir/sir.stan | 115.02 | 112.11 | 1.03 | 2.53% faster |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 1.01 | 0.68% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.26 | 3.26 | 1.0 | -0.03% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.38 | 0.41 | 0.93 | -8.01% slower |
| arK/arK.stan | 1.82 | 1.83 | 1.0 | -0.24% slower |
| arma/arma.stan | 0.65 | 0.7 | 0.93 | -7.84% slower |
| garch/garch.stan | 0.6 | 0.6 | 1.0 | 0.07% faster |

Mean result: 0.995571088249

Jenkins Console Log
Blue Ocean
Commit hash: 3bb7fae


@bbbales2
Member

Running this it looks good.

@mitzimorris
Member Author

just fixed makefile - stansummary uses boost::program_options, linker was complaining about visibility.
see issue #904

@mitzimorris
Member Author

@bbbales2 or @SteveBronder - if this builds, ready for re-review.

changed makefile so that changes to src/cmdstan/stansummary_helper.hpp will trigger rebuild of bin/stansummary
and added compiler flag so that visibility matches boost::program_options visibility - otherwise ld writes an error novel.

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.27 | 4.16 | 1.03 | 2.54% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.95 | -4.73% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 1.05 | 4.94% faster |
| gp_regr/gp_regr.stan | 0.19 | 0.19 | 1.0 | 0.48% faster |
| irt_2pl/irt_2pl.stan | 5.41 | 5.46 | 0.99 | -1.01% slower |
| performance.compilation | 86.89 | 86.42 | 1.01 | 0.54% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.39 | 8.44 | 0.99 | -0.59% slower |
| pkpd/one_comp_mm_elim_abs.stan | 26.1 | 27.42 | 0.95 | -5.09% slower |
| sir/sir.stan | 112.49 | 110.42 | 1.02 | 1.84% faster |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 0.97 | -2.6% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.3 | 3.45 | 0.95 | -4.75% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.39 | 0.38 | 1.03 | 2.58% faster |
| arK/arK.stan | 1.81 | 1.81 | 1.0 | -0.04% slower |
| arma/arma.stan | 0.66 | 0.65 | 1.02 | 2.0% faster |
| garch/garch.stan | 0.6 | 0.59 | 1.01 | 0.54% faster |

Mean result: 0.998588575779

Jenkins Console Log
Blue Ocean
Commit hash: 7deae49


Contributor

@SteveBronder SteveBronder left a comment

Cool, I think this looks good! One quick thing: idk what's kosher here, but for double-hyphen options is it normally --sig_figs 2 or --sig_figs=2? I always thought single-hyphen flags take no = while double-hyphen flags usually take one. Though I'm fine with whatever.

@mitzimorris
Member Author

not sure what the convention is, but can put the "=" into the docs.
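
For illustration, the sketch below accepts both spellings of a long option. It is a hand-rolled toy with a hypothetical name, not stansummary's parser; stansummary uses boost::program_options, whose default style should also accept both forms, though that is worth confirming against the Boost documentation.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy parser: collects long options given either as
// "--name=value" or as "--name value". Simplification: every long option is
// assumed to take a value, so a valueless flag such as --help would consume
// the following argument. Arguments not starting with "--" are skipped.
std::map<std::string, std::string> parse_long_options(
    const std::vector<std::string>& args) {
  std::map<std::string, std::string> options;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& arg = args[i];
    if (arg.compare(0, 2, "--") != 0) {
      continue;  // positional argument, e.g. an input csv file
    }
    const std::size_t eq = arg.find('=');
    if (eq != std::string::npos) {
      options[arg.substr(2, eq - 2)] = arg.substr(eq + 1);  // --name=value
    } else if (i + 1 < args.size()) {
      options[arg.substr(2)] = args[i + 1];  // --name value
      ++i;
    }
  }
  return options;
}
```

Both `{"--sig_figs", "3", "output.csv"}` and `{"--sig_figs=3", "output.csv"}` yield the same entry for `sig_figs`.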

@avehtari
Member

Sorry, I was on vacation during the discussion. I'm commenting here to ping the people in this thread, but I can also open a separate issue eventually. Would it be possible at some point to change N_Eff to ESS? The N is misleading, since in the manual it refers to the number of iterations in each chain (and is often used for the number of observations). ESS would be clearer, as it is the acronym for effective sample size. The posterior package, used e.g. by CmdStanR, has at least moved to using ESS.

@mitzimorris
Member Author

created new issue.

@bob-carpenter
Member

+1 to that. The problem with "n_eff" is that it's not the number of effective samples. We consider the set of draws to be a single sample from the posterior. It's the effective sample size.

I don't like the n_ prefix for "number" elsewhere, either, but that's a different issue.

@betanalpha
Contributor

betanalpha commented Jul 26, 2020 via email

@WardBrian WardBrian deleted the feature/903-stansummary-sampler-diags branch September 6, 2023 20:16