We provide a walk-through of how to run tests for segregation distortion at many loci in parallel.
Data sets ufit
, ufit2
, and
ufit3
contain the genotyping output of
updog::multidog()
using three different models.
ufit
: Uses the norm
model without
specifying who the parents are.ufit2
: Uses the f1pp
model, specifying the
parents.ufit3
: Uses the f1
model, specifying the
parents.You can convert this genotyping output to what
multi_lrt()
expects using multidog_to_g()
.
If you did not use the f1pp
or f1
models, use ether the all_gl
(to run tests using genotype
log-likelihoods) or all_g
(to run tests assuming genotypes
are known) options, and you must specify the ID’s of the parents.
o1 <- multidog_to_g(ufit, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp")
o2 <- multidog_to_g(ufit, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp")
If you did use the f1pp
or f1
models, use either the off_gl
(to run tests using genotype
log-likelihoods) or off_g
(to run tests assuming genotypes
are known) options.
o3 <- multidog_to_g(ufit2, type = "off_g")
o4 <- multidog_to_g(ufit2, type = "off_gl")
o5 <- multidog_to_g(ufit3, type = "off_g")
o6 <- multidog_to_g(ufit3, type = "off_gl")
We would recommend always using genotype log-likelihoods. But the option is there for known genotypes.
Parallelization support is provided through the future package. You
specify how to implement parallelization using
future::plan()
. Then you run multi_lrt()
. Then
you shut down the parallelization with future::plan()
. The
most common plan is future::multisession()
, where you
specify the number of parallel processes with the workers
argument. You can get the maximum number of workers via
future::availableCores()
So a typically workload looks something like this:
future::plan(future::multisession(workers = 2))
mout <- multi_lrt(g = o2$g, p1 = o2$p1, p2 = o2$p2)
future::plan(future::sequential())
The output is a data frame. The most important parts of this are the
snp
and p_value
columns. As a reminder, please
ignore the alpha
, xi1
, and xi2
estimates. Those are noisy. Please don’t use them for real work.
mout[c("snp", "p_value")]
#> snp p_value
#> 1 1_44673475 0.205250847
#> 2 11_28836119 0.200142085
#> 3 12_31029646 0.018008460
#> 4 12_8487773 1.000000000
#> 5 2_1426329 0.453995964
#> 6 2_20070837 0.001131421
#> 7 2_40108108 0.410718596
#> 8 4_37820371 0.212515165
#> 9 6_30619745 0.617023681
#> 10 6_4350249 0.649505305
It looks like SNPs 12_31029646 and 2_20070837 are in possible segregation distortion. Let’s look at the posterior mode genotypes of SNP 2_20070837:
o1$g["2_20070837", ]
#> 0 1 2 3 4
#> 40 166 32 2 0
o1$p1["2_20070837"]
#> 2_20070837
#> 0
o1$p2["2_20070837"]
#> 2_20070837
#> 2
So SNP 2_20070837 likely got flagged because of two individuals that are very likely a genotype of 3, which is impossible if the parents have genotypes 0 and 2.
For SNP 12_31029646, let’s compare the expected frequencies against the observed modes.
offspring_gf_2(alpha = 0.1249, xi1 = 1/3, p1 = 2, p2 = 0)
#> [1] 0.2083 0.5834 0.2083 0.0000 0.0000
o1$g["12_31029646", ] / sum(o1$g["12_31029646", ])
#> 0 1 2 3 4
#> 0.1541667 0.5833333 0.2625000 0.0000000 0.0000000
o1$p1["12_31029646"]
#> 12_31029646
#> 2
o1$p2["12_31029646"]
#> 12_31029646
#> 0
So SNP 12_31029646 likely got flagged because there are too many individuals with a genotype of 2, and not enough with a genotype of 0.