get_lead_snps()
Get the top variants within 1 MB windows of the genome with association p-values below the given threshold
get_lead_snps(
  df,
  thresh = 5e-09,
  region_size = 1e+06,
  protein_coding_only = FALSE,
  chr = NULL,
  .checked = FALSE,
  verbose = NULL,
  keep_chr = TRUE
)
df: Dataframe
thresh: A number. P-value threshold; only variants with p-values below this threshold are extracted (5e-09 by default)
region_size: An integer (default = 1000000), or a string such as "100kb" or "1MB", indicating the window size for variant labeling. Increase this number for sparser annotation and decrease it for denser annotation.
protein_coding_only: Logical, set this variable to TRUE to only use protein-coding genes for annotation
chr: String, get the top variants from one chromosome only, e.g. chr="chr1"
.checked: Logical, if the input data has already been checked, this can be set to TRUE so it won't be checked again (FALSE by default)
verbose: Logical, set to TRUE to get printed information on the number of SNPs extracted
keep_chr: Logical, set to FALSE to remove the "chr" prefix before each chromosome if present (TRUE by default)
Dataframe of lead variants. Returns the best variant per MB (by default; change the window size with the region_size argument) among variants with p-values below the input threshold (thresh = 5e-09 by default)
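To make the idea of "best variant per window" concrete, here is a minimal base R sketch. It is not topr's implementation: the helper `best_per_window()` and the toy data are hypothetical, and the sketch assumes fixed, non-overlapping windows plus input columns named CHROM, POS and P. The package's own windowing may differ (for instance, the example output on this page lists two chr16 variants roughly 244 kb apart).

```r
# Hypothetical helper for illustration only; not part of topr.
# Assumes fixed, non-overlapping windows and columns CHROM, POS and P.
best_per_window <- function(df, thresh = 5e-09, region_size = 1e6) {
  # Keep variants whose p-value is below the threshold
  hits <- df[df$P < thresh, , drop = FALSE]
  if (nrow(hits) == 0) return(hits)
  # Assign each variant to a fixed window on its chromosome
  window <- paste(hits$CHROM, floor(hits$POS / region_size), sep = ":")
  # Within each window, find the row index with the smallest p-value
  best_rows <- tapply(seq_len(nrow(hits)), window,
                      function(idx) idx[which.min(hits$P[idx])])
  lead <- hits[as.integer(best_rows), , drop = FALSE]
  # Most significant lead variants first
  lead <- lead[order(lead$P), , drop = FALSE]
  rownames(lead) <- NULL
  lead
}

# Toy data, made up for this example
toy <- data.frame(
  CHROM = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  POS   = c(100000, 400000, 2500000, 150000, 900000),
  P     = c(1e-12, 1e-10, 1e-09, 1e-03, 1e-20),
  stringsAsFactors = FALSE
)
lead <- best_per_window(toy)
print(lead)
```

In the toy data, the two chr1 variants at 100000 and 400000 share a window, so only the more significant one is kept, and the chr2 variant with P = 1e-03 is dropped by the threshold.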
get_lead_snps(CD_UKBB)
#> # A tibble: 9 × 8
#>   CHROM       POS ID          REF   ALT          P    OR     AF
#>   <chr>     <int> <chr>       <chr> <chr>    <dbl> <dbl>  <dbl>
#> 1 chr6   31660620 rs148844907 T     A     1.47e-24 2.43  0.0103
#> 2 chr16  50729867 rs2066847   G     GC    7.37e-24 2.14  0.0159
#> 3 chr1   67216513 rs11576518  G     A     8.04e-20 0.777 0.441
#> 4 chr16  50485831 rs76176364  A     G     8.18e-16 1.75  0.0227
#> 5 chr6   32708532 rs144614916 A     C     1.28e-15 1.80  0.0200
#> 6 chr7   50274703 rs2219345   T     G     8.52e-14 1.23  0.593
#> 7 chr9    4984530 rs1887428   G     C     5.04e-11 0.833 0.633
#> 8 chr5   40439961 rs7713270   C     T     7.43e-11 1.20  0.602
#> 9 chr2  233237298 rs13418066  A     C     1.70e- 9 1.18  0.506
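The other arguments can be combined in the same way. The calls below are a sketch that uses only the arguments documented above with the CD_UKBB example dataset; the specific values (1e-10, "chr6", "100kb") are arbitrary choices for illustration, and the output is not shown because it has not been verified here. The guard simply skips the calls when topr is not installed.

```r
# Skipped when topr is not installed
if (requireNamespace("topr", quietly = TRUE)) {
  library(topr)

  # Stricter p-value threshold than the 5e-09 default
  strict <- get_lead_snps(CD_UKBB, thresh = 1e-10)

  # One chromosome only, smaller windows given as a string,
  # and the "chr" prefix removed from the chromosome names
  chr6 <- get_lead_snps(CD_UKBB, chr = "chr6", region_size = "100kb",
                        keep_chr = FALSE)

  # Print information on the number of SNPs extracted
  all_leads <- get_lead_snps(CD_UKBB, verbose = TRUE)
}
```

A smaller region_size is expected to give denser results (more lead variants per chromosome), while a larger one gives sparser results.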