Intro to topr input datasets

Example input datasets

Note: It is highly recommended to check the number of data points in your dataset before plotting, since a very large dataset takes a long time to plot.

I usually reduce the size of the GWAS dataset prior to plotting by filtering out variants with P>1e-03.
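The size check and filtering step described above could look like the following in base R. This is a minimal sketch: the `gwas` data frame and its values are made up for illustration, and `gwas_small` is just a name chosen here.

```r
# Hypothetical GWAS results (illustrative values only, not real data).
gwas <- data.frame(
  CHROM = c("chr1", "chr1", "chr2", "chr3"),
  POS   = c(1006415, 1341559, 2500000, 3700000),
  P     = c(4.7e-04, 0.2, 3e-09, 0.05)
)

# Check the number of data points before plotting.
nrow(gwas)

# Filter out variants with P > 1e-03 (subset() also drops rows where P is NA).
gwas_small <- subset(gwas, P <= 1e-03)
nrow(gwas_small)
```

The reduced data frame can then be used in place of the full dataset when plotting.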

topr comes with three example GWAS datasets: one on ulcerative colitis from the UK Biobank (UC_UKBB) and two on Crohn’s disease (CD_UKBB and CD_FINNGEN), obtained from the UK Biobank and FinnGen, respectively. topr uses gene and exon datasets from Ensembl (GRCh38.pxx) (ENSGENES and ENSEXONS).

See topr reference for more details on the in-built datasets.

Input datasets must include at least three columns (CHROM, POS and P). The naming of the columns is flexible (e.g. the chromosome column can be labelled either chr or chrom, and the label is case insensitive).

topr has three in-built GWAS datasets. Take a look at the Crohn’s disease GWAS (CD_UKBB) by issuing the following command:

head(CD_UKBB)

  CHROM     POS          ID           P       OR
1  chr1 1006415 rs145588482 0.000468758 0.583384
2  chr1 1006415 rs145588482 0.000468758 0.583384
3  chr1 1007256  rs76233940 0.000401567 0.579783
4  chr1 1007256  rs76233940 0.000401567 0.579783
5  chr1 1007256  rs76233940 0.000401567 0.579783
6  chr1 1341559 rs376494450 0.000151216 1.320130

The chromosome in the CHROM column can be represented with or without the chr prefix (e.g. chr1 or 1).

Input data column names and alternatives understood by topr:

Required columns and alternative namings:
CHROM:          CHR,chr,Chrom,chrom,chromosome,CHROMOSOME
POS:            pos,BP,bp,base_pair_location
P:              p,PVAL,pval,PVALUE,pvalue,p_value

Optional columns and alternative namings:
ID:             rsid,rsId,RSID,SNP,snp,rsName,rsname,RSNAME
Gene_Symbol:    Gene_symbol,GENENAME,geneName,genename,GENE,gene
Max_Impact:     max_impact,Impact,impact
REF:            ref
ALT:            alt
BETA:           beta,b,B
OR:             or,odds_ratio
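To make the naming alternatives above concrete, here is a minimal sketch that renames the required columns of a data frame to their canonical names and checks that none are missing. The helper `standardize_names()` and the example data are hypothetical and written for illustration only. They are not part of topr, which handles the alternative namings itself.

```r
# Alternative names for the required columns, copied from the table above.
aliases <- list(
  CHROM = c("CHR", "chr", "Chrom", "chrom", "chromosome", "CHROMOSOME"),
  POS   = c("pos", "BP", "bp", "base_pair_location"),
  P     = c("p", "PVAL", "pval", "PVALUE", "pvalue", "p_value")
)

# Hypothetical helper (not a topr function): rename known aliases
# to the canonical column names CHROM, POS and P.
standardize_names <- function(df) {
  for (canonical in names(aliases)) {
    hit <- names(df) %in% aliases[[canonical]]
    names(df)[hit] <- canonical
  }
  df
}

# Illustrative input using alternative column names and no chr prefix.
gwas <- data.frame(
  chrom = c("1", "2"),
  bp    = c(1006415, 2500000),
  pval  = c(4.7e-04, 3e-09)
)

gwas <- standardize_names(gwas)

# Report any required columns that are still missing.
missing_cols <- setdiff(c("CHROM", "POS", "P"), names(gwas))
```

A check like this can be useful before plotting a dataset from an unfamiliar source, since an empty `missing_cols` means all three required columns are present.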

Thorhildur Juliusdottir
