
PRSice-2 is a dedicated PRS program that automates many of the steps from the previous page, which were carried out there with a sequence of PLINK functions (plus some QC steps). On this page you will run a PRS analysis using PRSice-2, which implements the standard C+T method.

Obtaining PRSice-2

PRSice-2 can be downloaded from:

Operating System Link
Linux 64-bit v2.3.3
OS X 64-bit v2.3.3

and can be directly used after extracting the file.

In this tutorial, you will only need PRSice.R and PRSice_XXX, where XXX is the operating system.

Required Data

This analysis assumes that you have the following files (or you can download them from here):

File Name Description
Height.QC.gz The post QC base data file. While PRSice-2 can automatically apply most filtering on the base file, it cannot remove duplicated SNPs
EUR.QC.bed This file contains the genotype data that passed the QC steps
EUR.QC.bim This file contains the list of SNPs that passed the QC steps
EUR.QC.fam This file contains the samples that passed the QC steps
EUR.height This file contains the phenotype data of the samples
EUR.cov This file contains the covariates of the samples
EUR.eigenvec This file contains the principal components (PCs) of the samples
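
Before starting, it can help to confirm that every file in the table above is present in the working directory. The following is a minimal sketch, not part of PRSice-2: `check_inputs` is a hypothetical helper name, and the script assumes all files sit in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report each listed file that is absent or empty.
# Returns the number of problem files (0 means all were found).
check_inputs() {
    local missing=0
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty: $f" >&2
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# File names are taken from the table above.
check_inputs Height.QC.gz EUR.QC.bed EUR.QC.bim EUR.QC.fam \
    EUR.height EUR.cov EUR.eigenvec \
    && echo "All input files found" \
    || echo "Some input files are missing (see messages above)"
```

This only checks that the files exist and are non-empty; it does not validate their contents.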

Running PRS analysis

To run PRSice-2 we need a single covariate file, and therefore our covariate file and PCs file should be combined. This can be done with R as follows:

# Option 1: base R
covariate <- read.table("EUR.cov", header=TRUE)
pcs <- read.table("EUR.eigenvec", header=FALSE)
colnames(pcs) <- c("FID","IID", paste0("PC",1:6))
cov <- merge(covariate, pcs, by=c("FID", "IID"))
write.table(cov,"EUR.covariate", quote=FALSE, row.names=FALSE)

# Option 2: data.table
library(data.table)
covariate <- fread("EUR.cov")
pcs <- fread("EUR.eigenvec", header=FALSE)
colnames(pcs) <- c("FID","IID", paste0("PC",1:6))
cov <- merge(covariate, pcs, by=c("FID", "IID"))
fwrite(cov,"EUR.covariate", sep="\t")

which generates EUR.covariate.
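
A quick sanity check on the merged file is to confirm that its header contains the ID columns and the six PCs named in the R code above. The sketch below makes some assumptions: `list_columns` and `has_columns` are hypothetical helper names, and the file is assumed to be space- or tab-delimited with a single header row.

```shell
#!/usr/bin/env bash
# Print the column names of a whitespace-delimited file, one per line.
list_columns() {
    head -n 1 "$1" | tr -s ' \t' '\n'
}

# Succeed only if every expected column name appears in the header.
has_columns() {
    local file=$1
    shift
    local col
    for col in "$@"; do
        if ! list_columns "$file" | grep -qx "$col"; then
            echo "No column: $col" >&2
            return 1
        fi
    done
}

# Only run the check when the merged file has actually been created.
if [ -f EUR.covariate ]; then
    has_columns EUR.covariate FID IID PC1 PC2 PC3 PC4 PC5 PC6 \
        && echo "EUR.covariate has the expected ID and PC columns"
fi
```

Any other covariate columns carried over from EUR.cov are left unchecked here, since their names depend on your data.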

PRSice-2 can then be run to obtain the PRS results as follows:

Linux:

Rscript PRSice.R \
    --prsice PRSice_linux \
    --base Height.QC.gz \
    --target EUR.QC \
    --binary-target F \
    --pheno EUR.height \
    --cov EUR.covariate \
    --base-maf MAF:0.01 \
    --base-info INFO:0.8 \
    --stat OR \
    --or \
    --out EUR

OS X:

Rscript PRSice.R \
    --prsice PRSice_mac \
    --base Height.QC.gz \
    --target EUR.QC \
    --binary-target F \
    --pheno EUR.height \
    --cov EUR.covariate \
    --base-maf MAF:0.01 \
    --base-info INFO:0.8 \
    --stat OR \
    --or \
    --out EUR

Windows:

Rscript PRSice.R ^
    --prsice PRSice_win64.exe ^
    --base Height.QC.gz ^
    --target EUR.QC ^
    --binary-target F ^
    --pheno EUR.height ^
    --cov EUR.covariate ^
    --base-maf MAF:0.01 ^
    --base-info INFO:0.8 ^
    --stat OR ^
    --or ^
    --out EUR

The meanings of the parameters are as follows:

Parameter Value Description
prsice PRSice_xxx Informs PRSice.R of the location of the PRSice binary
base Height.QC.gz Informs PRSice of the name of the GWAS summary statistics file
target EUR.QC Informs PRSice that the input genotype files should have a prefix of EUR.QC
binary-target F Indicate if the phenotype of interest is a binary trait. F for no
pheno EUR.height Provide PRSice with the phenotype file
cov EUR.covariate Provide PRSice with the covariate file
base-maf MAF:0.01 Filter out SNPs with MAF < 0.01 in the GWAS summary statistics, using information in the MAF column
base-info INFO:0.8 Filter out SNPs with INFO < 0.8 in the GWAS summary statistics, using information in the INFO column
stat OR Column name of the column containing the effect size
or - Informs PRSice that the effect size is an Odds Ratio
out EUR Informs PRSice that all output should have a prefix of EUR

This will automatically perform "high-resolution scoring" and generate the "best-fit" PRS (in EUR.best), with associated plots of the results. Users should read Section 4.6 of our paper to learn more about issues relating to overfitting in PRS analyses.
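
Once the run has finished, the key numbers can be pulled out of the summary output by column name rather than by eye. The sketch below rests on assumptions you should verify against your own output: that PRSice-2 has written a file named EUR.summary with a header row, and that this header contains columns named `Threshold` and `PRS.R2`. `get_column` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the values of one named column from a whitespace-delimited
# table whose first row is a header. Prints nothing if the column
# name is not found.
get_column() {
    awk -v name="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        col     { print $col }
    ' "$1"
}

# Assumed file and column names; adjust them if your output differs.
if [ -f EUR.summary ]; then
    echo "Best-fit P-value threshold: $(get_column EUR.summary Threshold)"
    echo "Variance explained by PRS:  $(get_column EUR.summary PRS.R2)"
fi
```

Looking columns up by name keeps the snippet working even if the column order differs between PRSice-2 versions.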

Which P-value threshold generates the "best-fit" PRS?


How much phenotypic variation does the "best-fit" PRS explain?
