Feature/903 stansummary sampler diags#907
Conversation
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
…b.com/stan-dev/cmdstan into feature/903-stansummary-sampler-diags
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
|
looking for a reviewer with free cycles this weekend. it would be lovely to get this in for the code freeze, but I know everyone's super busy and everyone feels this way about their stuff, so if not, that's OK too. |
There's now a csv output of the summary written too? Is there a flag for this? Or was there always this option and I just missed it? |
yes, this option has been there for a while. the flag is there's also an option --percentiles="p1,p2,p3,...pN" - this was added recently - #890 |
|
There should be more tests for this feature. I tested it on a set of 4 csv output files from the Bernoulli model and a single csv output file from the the Lotka-Volterra case study model & data. the latter is a good test case because it has a parameter which is a 2-D array. the former is also good because a single parameter is the minimal parameters allowed, so it's a good edge case. will add relevant model, data, and output files and add logic to unit tests to run these checks. |
|
@betanalpha should energy__ be one of the things we compute ESS and Rhat on? Is that useful? Newline comments: Calling stansummary with no arguments segfaults: Might be the same segfault if the file with the draws doesn't exist: And I got another segfault trying to write a csv (this is using one chain of the default bernoulli model): |
|
according to Aki: "there is no sense computing or reporting Rhat or ESS for sampler diagnostic columns." https://discourse.mc-stan.org/t/cmdstan-guide-now-online/16368/10 |
|
@bbbales2 and/or @SteveBronder - fixed newlines, added checks and error messages for all command line args. |
|
remaining todos on this PR:
|
|
@mitzimorris ping me when they two above are done and I'll review! |
|
Rewrote usage message as follows: @bbbales2 - whaddya think? |
|
C++11 raw strings rock! |
|
@mitzimorris very nice! |
…b.com/stan-dev/cmdstan into feature/903-stansummary-sampler-diags
|
@SteveBronder ready for re-review. |
…4.1 (tags/RELEASE_600/final)
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
|
Running this it looks good. |
…b.com/stan-dev/cmdstan into feature/903-stansummary-sampler-diags
|
just fixed makefile - stansummary uses boost::program_options, linker was complaining about visibility. |
|
@bbbales2 or @SteveBronder - if this builds, ready for re-review. changed makefile so that changes to |
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
SteveBronder
left a comment
There was a problem hiding this comment.
Cool I think this looks good! One quick thing, idk what's kosher here but for double hypen options is it normally --sig_figs 2 or --sig_figs=2? I always thought single flags has no = while double flags usually have =. Though I'm fine with whatever
|
not sure what the convention is, but can put the "=" into the docs. |
|
Sorry I was on vacation during the discussion. I make this comment here to ping the people in this thread, but I can also make eventually a separate issue. Would it be possible at some point to change N_Eff to ESS as N is misleading as it in manual refers to to the number iterations in each chain (or is often used for the number of observations). ESS would be more clear as it is acronym for effective sample size. posterior package used .e.g by CmdStanR has at least moved to use ESS. |
|
created new issue. |
|
+1 to that. The problem with "n_eff" is that it's not the number of effective samples. We consider the set of draws to be a single sample from the posterior. It's the effective sample size. I don't like the |
|
N_eff and ESS are largely synonymous in practical use and both are regularly
interpreted as integers. In particular people read “sample size” and presume
it is an integer just as strongly as when reading the “N_”. Moreover when
the significant figures are set such that the N_eff/ESS column returns integers
users will often presume an integer meaning regardless.
The only real solution is to provide better documentation that clarifies what
each column means, perhaps a —verbose option.
Ultimately I don’t have a problem with changing N_eff to ESS, although it would
break the stansummary UI and user written code that parses for the columns by
name (same problem for the summary functions in RStan and PyStan). But it
definitely won’t be sufficient to avoid the “N_eff/ESS is an integer equal to the
number of independent samples” confusion.
… On Jul 24, 2020, at 9:45 AM, Aki Vehtari ***@***.***> wrote:
Sorry I was on vacation during the discussion. I make this comment here to ping the people in this thread, but I can also make eventually a separate issue. Would it be possible at some point to change N_Eff to ESS as N is misleading as it in manual refers to to the number iterations in each chain (or is often used for the number of observations). ESS would be more clear as it is acronym for effective sample size. posterior package used .e.g by CmdStanR has at least moved to use ESS.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#907 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AALU3FTSXT467FNFM5EFZKTR5GGBPANCNFSM4OXBP7PA>.
|
Submisison Checklist
./runCmdStanTests.py src/testSummary:
Modify stansummary console and csv outputs:
Refactor file
stansummary.cppmaininstansummary.cppintostansummary_helper.hppIntended Effect:
Improved readability and understandability.
e.g., current output:
new output:
How to Verify:
unit tests
Side Effects:
console and csv outputs will be different; downstream processing which expects a fixed format may break.
Documentation:
online CmdStan User's Guide
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: