--- title: "How To Simulate Traits" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{How To Simulate Traits} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE ) ``` ```{r setup} library(smer) library(genio) ``` For a simple illustration, we generate a synthetic allele count matrix with minor allele frequency `maf` greater than 5%. The function `simulate_traits()` requires the input genotype data to be stored in the PLINK format (`.bim`, `.bed`, and `.fam` files). Here we use the package `genio` to write the allele count matrix to a PLINK file. ```{r genotypes, eval=FALSE} plink_file <- tempfile() n_samples <- 3000 n_snps <- 5000 maf <- 0.05 + 0.45 * runif(n_snps) random_minor_allele_counts <- (runif(n_samples * n_snps) < maf) + (runif(n_samples * n_snps) < maf) allele_count_matrix <- matrix(random_minor_allele_counts, nrow = n_samples, ncol = n_snps, byrow = TRUE, ) # Create .fam and .bim data frames fam <- data.frame( fam = sprintf("F%d", 1:n_samples), # Family ID id = sprintf("I%d", 1:n_samples), # Individual ID pat = 0, # Paternal ID mat = 0, # Maternal ID sex = sample(1:2, n_samples, replace = TRUE), pheno = -9 ) bim <- data.frame( chr = 1, # Chromosome number id = sprintf("rs%d", 1:n_snps), # SNP name posg = 0, # Genetic distance (cM) pos = 1:n_snps, # Base-pair position ref = sample(c("A", "C", "G", "T"), n_snps, replace = T), # Minor allele alt = sample(c("A", "C", "G", "T"), n_snps, replace = T) # Major allele ) # Write to .bed, .fam, and .bim files write_plink( file = plink_file, X = t(allele_count_matrix), fam = fam, bim = bim ) ``` To simulate traits, we need to specify the genetic architecture. E.g., the function requires us to specify - the narrow sense heritability $h^2$ - the indices SNPs contributing to the narrow sense heritability - the heritability due to epistasis - the indices of SNPs contributing to the epistatic trait variance - the path to the output file. The simulated trait is written to file in the PLINK phenotype format. ```{r traits, eval=FALSE} pheno_file <- tempfile() additive_heritability <- 0.3 gxg_heritability <- 0.25 n_snps_gxg_group <- 5 n_snps_additive <- 100 additive_snps <- sort(sample(1:n_snps, n_snps_additive, replace = F)) gxg_group_1 <- sort(sample(additive_snps, n_snps_gxg_group, replace = F)) gxg_group_2 <- sort(sample(setdiff(additive_snps, gxg_group_1), n_snps_gxg_group, replace = F)) simulate_traits( plink_file, pheno_file, additive_heritability, gxg_heritability, additive_snps, gxg_group_1, gxg_group_2 ) ``` The variance components of the traits are constructed to match the input. See Crawford et al. (2017) and [mvMAPIT Documentation: Simulate Traits](https://lcrawlab.github.io/mvMAPIT/articles/tutorial-simulations.html) for a full description of the simulation scheme. ## SessionInfo ```{r seesionInfo} sessionInfo() ```