Add TreeSequence methods for r2 LD and remove LD calculator from external C API #1900

#1864 updated the very old and janky LD code to make it a bit easier to work with from a coding perspective. However, creating an object just to do calculations with is an anti-pattern, and we want to provide methods that compute LD values directly.

We should add methods to the TreeSequence Python API that compute LD using the same underlying C code, and get rid of the low-level C-Python LDCalculator object.

The obvious thing to do from a C perspective would be to add the following functions:
```c
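/* r2 between sites a and b. */
int tsk_treeseq_get_r2(tsk_treeseq_t *self, tsk_id_t a, tsk_id_t b, double *r2);
/* r2 between site a and the sites after (or before) it in the given direction,
 * limited to at most max_sites sites and max_distance in position. */
int tsk_treeseq_get_r2_array(tsk_treeseq_t *self, tsk_id_t a, int direction,
    tsk_size_t max_sites, double max_distance, double *r2, tsk_size_t *num_r2_values);
/* r2 for every pair of sites (a[i], b[j]), written to A. */
int tsk_treeseq_get_r2_matrix(tsk_treeseq_t *self, tsk_size_t num_a_sites, tsk_id_t *a,
    tsk_size_t num_b_sites, tsk_id_t *b, double *A);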
```

(The motivation for adding the `r2_matrix` method is that there are significant performance gains to be had from handling tree transitions cleverly when computing r2 for many pairs of sites. We can't take advantage of this by calling `r2_array` independently for each site.)
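
For reference, here is a brute-force sketch of the values these functions are meant to return, assuming biallelic sites whose 0/1 genotypes (1 = derived allele) are stored in a plain row-major `int` array with one row per site. The names `brute_force_r2` and `naive_r2_matrix` are hypothetical and not part of tskit, and the real implementation would work from the trees rather than from a genotype matrix. The row-major layout of `A` (`A[i * num_b + j]` holds r2 between `a[i]` and `b[j]`) and returning NaN for monomorphic sites are assumptions made for this sketch. Note that `naive_r2_matrix` computes every pair independently, which is exactly the approach the tree-transition optimisation is meant to beat.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical brute-force reference for r^2 between two biallelic sites.
 * geno_a and geno_b hold 0/1 genotypes (1 = derived allele) for n sample
 * haplotypes. Returns NaN if either site is monomorphic, where r^2 is
 * undefined (a choice made for this sketch). */
static double
brute_force_r2(const int *geno_a, const int *geno_b, size_t n)
{
    double n_a = 0, n_b = 0, n_ab = 0;
    for (size_t k = 0; k < n; k++) {
        n_a += geno_a[k];
        n_b += geno_b[k];
        n_ab += geno_a[k] && geno_b[k]; /* haplotype carries both derived alleles */
    }
    double p_a = n_a / n;
    double p_b = n_b / n;
    double p_ab = n_ab / n;
    double denom = p_a * (1 - p_a) * p_b * (1 - p_b);
    if (denom == 0.0) {
        return NAN;
    }
    double D = p_ab - p_a * p_b; /* linkage disequilibrium coefficient */
    return D * D / denom;
}

/* Hypothetical naive r^2 matrix: A[i * num_b + j] = r^2(a[i], b[j]).
 * genotypes is a row-major (num_sites x n) array; every pair is computed
 * independently, with no reuse of work between pairs. */
static void
naive_r2_matrix(const int *genotypes, size_t n, size_t num_a, const size_t *a,
    size_t num_b, const size_t *b, double *A)
{
    for (size_t i = 0; i < num_a; i++) {
        for (size_t j = 0; j < num_b; j++) {
            A[i * num_b + j]
                = brute_force_r2(genotypes + a[i] * n, genotypes + b[j] * n, n);
        }
    }
}
```

For example, with four samples and genotypes `1100`, `1100` and `1010` at sites 0, 1 and 2, calling `naive_r2_matrix` with `a = {0}` and `b = {1, 2}` gives `A = {1.0, 0.0}`: sites 0 and 1 are in perfect LD, while sites 0 and 2 have D = 0.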

That will cover all the bases in terms of replicating the existing functionality, but there are a few API design choices we should think through first, especially in light of the planned generalisations (#432):

- Should we think about windowing here?
- What about "mode"? Should we include this argument? (We can fail if it's not "site" initially.)
- What about "sample_sets"? Does this make sense here too?
- Is "r2" a good enough name, or should we call it "ld_r2"? Then we could also have "ld_D" and "ld_r" (#405). Or should we name it something else entirely?

Basically, it would be good to get the basic API "shape" in place for general two-site statistics, so that we can port the existing LD code and finally deprecate the Python LDCalculator.

Any thoughts @apragsdale @petrelharp?

