#1864 updated the very old and janky LD code to be a bit easier to work with from a coding perspective. However, the API of creating an object to do calculations with is an anti-pattern, and we want to provide direct computation of LD values.
We should add some methods to the TreeSequence Python API for computing LD that do the same thing and use the same underlying C code, while getting rid of the low-level C-Python LDCalculator object.
The obvious thing to do from a C perspective would be to add the methods:
(The motivation for adding the r2_matrix method is that there are significant performance gains to be made from doing tree transitions in a clever way when computing many pairs of r2 values. We can't take advantage of this by doing the r2_array independently.)
That will cover all the bases in terms of replicating the existing functionality, but there are a few API design choices we should think through first, especially in light of the planned generalisations (#432):
Should we think about windowing here?
What about "mode", should this argument be put in? (We can fail if it's not "site" initially)
What about "sample_sets", does this make sense here too?
Is "r2" a good enough name or should we call it "ld_r2" maybe? Then we could also have "ld_D", "ld_r" (Calculate r LD coefficient #405)? Or should we name it something else entirely?
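To make the naming question concrete, here is a minimal pure-Python sketch of how the three candidate statistics relate for a single pair of biallelic sites. The function names `ld_D`, `ld_r` and `ld_r2` simply mirror the names floated above; they are hypothetical and not part of any existing API, and the inputs are assumed to be plain 0/1 haplotype vectors rather than anything derived from a tree sequence.

```python
from math import sqrt


def ld_D(a, b):
    """D = p_AB - p_A * p_B for two equal-length 0/1 haplotype vectors."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x * y for x, y in zip(a, b)) / n
    return p_ab - p_a * p_b


def ld_r(a, b):
    """Signed correlation r = D / sqrt(p_A (1 - p_A) p_B (1 - p_B))."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # At least one site is monomorphic, so r is undefined.
        return float("nan")
    return ld_D(a, b) / sqrt(denom)


def ld_r2(a, b):
    """r^2, the squared correlation between the two sites."""
    return ld_r(a, b) ** 2
```

Since r2 is just the square of r, and r is a rescaled D, a shared "ld_" prefix would group the three naturally. This sketch says nothing about how the C code evaluates them.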
Basically, it would be good if we could get the basic API "shape" in place for the general two-site statistics so that we can port the existing LD code, and finally deprecate the Python LDCalculator.
Any thoughts @apragsdale @petrelharp?