manhattanly 패키지를 활용해서 interactive plot을 그려보도록 하겠다. 자세한 내용은 http://sahirbhatnagar.com/manhattanly/ 를 참고하기 바란다. Interactive plot은 용량이 커 로드에 시간이 걸린다. 본 실습에서는 top 10000개의 결과만 그리겠다. 서버 https://secondmath.shinyapps.io/gwas/ 에도 올려놓았으니 참고하기 바란다.

Read data: Result of (1)

library(data.table);library(ggplot2);library(manhattanly);library(DT);library(glue)
## See example usage at http://sahirbhatnagar.com/manhattanly/
dir="/home/jinseob2kim/Dropbox/example"
setwd(dir)
res.final = fread("ResGWASAddgene.csv")[order(P)][1:10000]
#set.seed(5385)
#res.prune.addgene = fread("ResPruneAddgene.csv")
#res.final=res.prune.addgene
#res.final = res.prune.addgene[sample(.N,10000)]

## SNP of interest
SNP.interest = res.final[P<=5e-8,SNP]

QQ plot

#devtools::install_github("sahirbhatnagar/manhattanly", build_vignettes = TRUE)
qqly(res.final, snp = "SNP",p="P",gene="symbol", highlight =SNP.interest, annotation1="CHR", annotation2="BP")

Example: qqly

Manhattan plot

manhattanly(res.final, snp = "SNP",p="P", chr = "CHR", bp = "BP", gene="symbol",highlight = SNP.interest ,annotation1="BP",annotation2="BETA")

Example: manhattanly

Volcano plot

Volcano plot은 Beta/OR 값과 p-value를 동시에 살펴볼 수 있는 그림이다. 패키지가 불완전한 관계로 아래의 예를 참고하여 effect size 변수를 지정하는 것을 잊지 말자.

## pre-setting: effect size
z=volcanor(res.final, p = "P",effect_size = "BETA",snp="SNP",gene="symbol",annotation1="BP",annotation2="CHR")
names(z$data)[1]="BETA"

## Plot
volcanoly(z, highlight = SNP.interest, effect_size_line = c(-10,10),genomewideline = -log10(5e-08))

Regional plot

Regional plot은 manhattan plot에서 특정 위치를 자세히 살펴보는 그림이며 locuszoom이 대표적인 웹사이트이다. http://locuszoom.org/genform.php?type=yourdata 에 분석결과를 웹에 업로드한 후 몇 가지 옵션을 지정하면 아래와 같은 그림을 PDF 파일로 받아볼 수 있다.

Locuszoom 실행결과

Locuszoom 실행결과

아래와 같이 결과 파일을 업로드한 후 P값, SNP의 컬럼을 지정하고 보여줄 화면(SNP, Gene, 특정 region)을 지정하면 된다.
Locuszoom 실행화면

Locuszoom 실행화면

Interactive regional plot

TG 결과의 rs2075290으로 Interactive regional plot을 그려보겠다.

Download recombination information

ftp://ftp.hapmap.org/hapmap/recombination/2008-03_rel22_B36/rates/ 에서 미리 다운받자. 먼저 해당 주소의 파일이름들을 모은다.

library(RCurl)
## Loading required package: bitops
dir <- "/home/secondmath/Dropbox/example"

## your base url
url <- "ftp://ftp.hapmap.org/hapmap/recombination/2008-03_rel22_B36/rates/"

filenames=getURL(url,ftp.use.epsv = FALSE, dirlistonly = TRUE)
filenames
## [1] "genetic_map_chr10_b36.txt\ngenetic_map_chr11_b36.txt\ngenetic_map_chr12_b36.txt\ngenetic_map_chr13_b36.txt\ngenetic_map_chr14_b36.txt\ngenetic_map_chr15_b36.txt\ngenetic_map_chr16_b36.txt\ngenetic_map_chr17_b36.txt\ngenetic_map_chr18_b36.txt\ngenetic_map_chr19_b36.txt\ngenetic_map_chr1_b36.txt\ngenetic_map_chr20_b36.txt\ngenetic_map_chr21_b36.txt\ngenetic_map_chr22_b36.txt\ngenetic_map_chr2_b36.txt\ngenetic_map_chr3_b36.txt\ngenetic_map_chr4_b36.txt\ngenetic_map_chr5_b36.txt\ngenetic_map_chr6_b36.txt\ngenetic_map_chr7_b36.txt\ngenetic_map_chr8_b36.txt\ngenetic_map_chr9_b36.txt\n"
filenames=strsplit(filenames, "\r*\n")[[1]]
filenames
##  [1] "genetic_map_chr10_b36.txt" "genetic_map_chr11_b36.txt"
##  [3] "genetic_map_chr12_b36.txt" "genetic_map_chr13_b36.txt"
##  [5] "genetic_map_chr14_b36.txt" "genetic_map_chr15_b36.txt"
##  [7] "genetic_map_chr16_b36.txt" "genetic_map_chr17_b36.txt"
##  [9] "genetic_map_chr18_b36.txt" "genetic_map_chr19_b36.txt"
## [11] "genetic_map_chr1_b36.txt"  "genetic_map_chr20_b36.txt"
## [13] "genetic_map_chr21_b36.txt" "genetic_map_chr22_b36.txt"
## [15] "genetic_map_chr2_b36.txt"  "genetic_map_chr3_b36.txt" 
## [17] "genetic_map_chr4_b36.txt"  "genetic_map_chr5_b36.txt" 
## [19] "genetic_map_chr6_b36.txt"  "genetic_map_chr7_b36.txt" 
## [21] "genetic_map_chr8_b36.txt"  "genetic_map_chr9_b36.txt"

이제 다운로드한다.

for (i in 1:length(filenames)){
  download.file(glue("{url}{filenames[i]}"),glue("{dir}/recombination/{filenames[i]}"))
}

Setting

데이터와 SNP 정보를 미리 setting한다.

## Directory
dir="/home/jinseob2kim/Dropbox/example"
setwd(dir)

## GWAS result with gene
res.final = fread("ResGWASAddgene.csv")
setkey(res.final,BP)

## SNP info 
SNP.show="rs2075290"
chr.show= res.final[SNP==SNP.show,CHR]
SNP.pos = res.final[SNP==SNP.show,BP]

##  Top SNP +- BP range
range = 400* 1000

Recombination 데이터 불러오기

## chr.show = 11
recomb=fread(glue("{dir}/recombination/genetic_map_chr",{chr.show},"_b36.txt"))
setnames(recomb,"position","BP")
setkey(recomb,BP)

LD 계산

PLINK를 이용하여 rs2075290과 주위 \(\pm 400K\)범위의 SNP들 사이의 \(R^2\)값을 계산하겠다. 먼저 parameter를 설정한다.

bfile=glue("{dir}/Nong_1787")
SNP.show="rs2075290"
range = 400* 1000
out.ld = SNP.show

위의 parameter들을 적용하여 PLINK를 실행하여 \(R^2\)값을 구한다.

system(glue("plink --noweb --bfile {bfile} --r2 --ld-snp {SNP.show} --ld-window-kb {range/1000} --ld-window 99999 --ld-window-r2 0 --out {dir}/{out.ld}"),intern=T)
##  [1] ""                                                                                       
##  [2] "@----------------------------------------------------------@"                           
##  [3] "|        PLINK!       |     v1.07      |   10/Aug/2009     |"                           
##  [4] "|----------------------------------------------------------|"                           
##  [5] "|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |"                           
##  [6] "|----------------------------------------------------------|"                           
##  [7] "|  For documentation, citation & bug-report instructions:  |"                           
##  [8] "|        http://pngu.mgh.harvard.edu/purcell/plink/        |"                           
##  [9] "@----------------------------------------------------------@"                           
## [10] ""                                                                                       
## [11] "Skipping web check... [ --noweb ] "                                                     
## [12] "Writing this text to log file [ /home/jinseob2kim/Dropbox/example/rs2075290.log ]"      
## [13] "Analysis started: Sun Dec  1 12:47:58 2019"                                             
## [14] ""                                                                                       
## [15] "Options in effect:"                                                                     
## [16] "\t--noweb"                                                                              
## [17] "\t--bfile /home/jinseob2kim/Dropbox/example/Nong_1787"                                  
## [18] "\t--r2"                                                                                 
## [19] "\t--ld-snp rs2075290"                                                                   
## [20] "\t--ld-window-kb 400"                                                                   
## [21] "\t--ld-window 99999"                                                                    
## [22] "\t--ld-window-r2 0"                                                                     
## [23] "\t--out /home/jinseob2kim/Dropbox/example/rs2075290"                                    
## [24] ""                                                                                       
## [25] "Reading map (extended format) from [ /home/jinseob2kim/Dropbox/example/Nong_1787.bim ] "
## [26] "139025 markers to be included from [ /home/jinseob2kim/Dropbox/example/Nong_1787.bim ]" 
## [27] "Reading pedigree information from [ /home/jinseob2kim/Dropbox/example/Nong_1787.fam ] " 
## [28] "1787 individuals read from [ /home/jinseob2kim/Dropbox/example/Nong_1787.fam ] "        
## [29] "0 individuals with nonmissing phenotypes"                                               
## [30] "Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)"                                  
## [31] "Missing phenotype value is also -9"                                                     
## [32] "0 cases, 0 controls and 1787 missing"                                                   
## [33] "845 males, 942 females, and 0 of unspecified sex"                                       
## [34] "Reading genotype bitfile from [ /home/jinseob2kim/Dropbox/example/Nong_1787.bed ] "     
## [35] "Detected that binary PED file is v1.00 SNP-major mode"                                  
## [36] "Before frequency and genotyping pruning, there are 139025 SNPs"                         
## [37] "1787 founders and 0 non-founders found"                                                 
## [38] "Total genotyping rate in remaining individuals is 0.999403"                             
## [39] "0 SNPs failed missingness test ( GENO > 1 )"                                            
## [40] "0 SNPs failed frequency test ( MAF < 0 )"                                               
## [41] "After frequency and genotyping pruning, there are 139025 SNPs"                          
## [42] "After filtering, 0 cases, 0 controls and 1787 missing"                                  
## [43] "After filtering, 845 males, 942 females, and 0 of unspecified sex"                      
## [44] "Writing LD statistics to [ /home/jinseob2kim/Dropbox/example/rs2075290.ld ] "           
## [45] ""                                                                                       
## [46] "Analysis finished: Sun Dec  1 12:48:06 2019"                                            
## [47] ""

결과는 rs2075290.ld 파일에 저장했다.

ld = fread(glue("{dir}/{out.ld}.ld"))
datatable(ld, rownames=F, caption = glue("LD calculation: {SNP.show}"))

Merge: GWAS결과 + LD + Recombination 정보

Plot을 위한 최종 데이터를 만들기 위해서 GWAS, LD, recombination 정보를 합치겠다. 먼저 GWAS결과와 LD정보를 합치자.

setnames(ld, "BP_B", "BP")
setkey(ld,BP)

## Left join
res.final.ld = res.final[ld[,c("BP","R2")]]
datatable(res.final.ld, rownames=F, caption="Merge: GWAS & LD calculation")

이 데이터와 recombination 정보를 합치자.

zz=merge(res.final.ld, recomb, all=T)[(BP>=SNP.pos-range) & (BP <= SNP.pos+range)]

Interactive plot: ggiraph 패키지 이용

#devtools::install_github('davidgohel/ggiraph')
library(ggiraph)

mlogp.max = zz[,max(-log10(P),na.rm=T)]
zz[,BP:=BP/10^6]
zz[,tooltip := c(paste0("SNP = ",SNP , "\n BP = ", BP*10^6, "\n Gene = ", symbol, "\n P = ", P ,"\n BETA = ", BETA,"\n R2 = ", R2))]
zzsub= zz[SNP==SNP.show]


kk=ggplot(zz) + geom_point_interactive(aes(BP,-log10(P),tooltip=tooltip, color=R2),size=1.5) + 
  geom_point_interactive(data=zzsub, aes(BP,-log10(P),tooltip=tooltip),colour="red") + 
  geom_line(data=zz[!is.na(get("COMBINED_rate(cM/Mb)")),], aes(BP, get("COMBINED_rate(cM/Mb)")* mlogp.max/100),color="blue") + 
  scale_y_continuous(sec.axis = sec_axis(~ . * 100 / mlogp.max , name = "Recombination fraction (cM/Mb)")) + 
  theme( axis.line.y.right = element_line(color = "blue"), axis.ticks.y.right = element_line(color = "blue"),axis.text.y.right = element_text(color = "blue"),axis.title.y.right = element_text(color = "blue")) + 
  ggtitle(glue("Regional plot: {SNP.show}")) + 
  theme(plot.title = element_text(hjust = 0.5)) +  
  geom_text(data =zzsub, aes(x = zzsub[,BP], y = zzsub[,-log10(P)]+0.4, label = SNP.show)) + 
  xlab(glue("Position on Chr {chr.show} (Mb)")) +
  ylab(expression(-log[10](italic(P)))) +
  labs(colour=expression(italic(R)^2)) + 
  theme(legend.position ="right")
  

ggiraph(code= print(kk),zoom_max = 7, width=1,width_svg=7, height_svg=3.5,hover_css = "cursor:pointer;fill:red;stroke:red;")
## Warning: Removed 927 rows containing missing values (geom_interactive_point).

Copyright ©2016 Jinseob Kim