기술통계 in 의학연구

" />

class: center, middle, inverse, title-slide

.title[
# 기술통계 in 의학연구
]
.author[
### 김진섭; <a href="https://github.com/jinseob2kim"><span><i class="fab  fa-github faa-float animated "></i>jinseob2kim</span></a>
]
.institute[
### 차라투
]

---

layout: true

<div class="my-footer"><span><a href="https://www.zarathu.com">Zarathu Co.,Ltd</a>   
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;
<a href="https://github.com/jinseob2kim">김진섭</a></span></div>

---
# Executive Summary

.large[
- 연속변수의 2그룹 비교: 정규분포 인정하면 **t-test**, 아니면 **Wilcox-test**

- 연속변수의 3그룹 이상 비교: 정규분포 인정하면 **one-way ANOVA**, 아니면 **Kruskal–Wallis one-way ANOVA**

- 범주형 변수의 그룹 비교: 샘플수 충분하면 **Chisq-test**, 아니면 **Fisher-test**

]

---
# 기술통계?

- 평균(mean), 중위수(median), 분산(variance), 빈도표(frequency table)등의 **데이터를 설명하는 숫자**들, 히스토그램(histogram), 상자그림(box-plot)같은 **그래프** 를 의미한다.

- 그러나 대부분의 의학 연구는 **단순한 기술 통계가 아닌 그것들의 그룹 비교**(ex: 성별, 질환 유무)가 **Table 1**에 제시됨.

---
# 연구의 흐름?

.large[
1. **기술 통계**로 데이터를 보여주고

2. **단변량(univariate)** 분석으로 가설검정

3. **다변량(multivariate) or 소그룹(subgroup)** 분석 - 다른 변수들의 효과를 보정.
]

--
단변량 분석에서 끝나는 간단한 연구도 많고, 이는 본질적으로 기술 통계의 그룹 비교와 같음. 따라서 Table 1에 필요한 통계를 알면 그것만으로도 간단한 의학 연구를 할 수 있다.

---
class: center, middle

# 연속 변수의 그룹 비교

---
# 2그룹: t-test

.large[
2 그룹 **평균** 비교

- 각 그룹의 평균과 표준편차 필요.

https://www.evanmiller.org/ab-testing/t-test.html 에서 바로 가능.
]

---

<div id="htmlwidget-ce19871d0f7592c64136" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-ce19871d0f7592c64136">{"x":{"filter":"none","vertical":false,"caption":"<caption>data.t: sex and total cholesterol<\/caption>","data":[["person 1","person 2","person 3","person 4","person 5","person 6","person 7","person 8","person 9","person 10","person 11","person 12","person 13","person 14","person 15","person 16","person 17","person 18","person 19","person 20","person 21","person 22","person 23","person 24","person 25","person 26","person 27","person 28","person 29","person 30"],["Female","Female","Female","Female","Female","Male","Female","Female","Male","Female","Male","Male","Female","Male","Male","Male","Female","Female","Male","Male","Female","Female","Female","Male","Female","Male","Male","Male","Female","Male"],[185,181,156,126,140,189,145,139,146,185,158,124,147,159,171,121,142,167,130,170,94,182,149,142,174,114,136,149,120,128]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>sex<\/th>\n      <th>tChol<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":2},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

---

```r
nev.ttest <- t.test(tChol ~ sex, data = data.t, var.equal = F)
nev.ttest
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  tChol by sex
## t = 0.74283, df = 27.957, p-value = 0.4638
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -11.42555  24.42555
## sample estimates:
## mean in group Female   mean in group Male 
##                152.0                145.5
```

.large[
여자의 평균 콜레스테롤 값은 152, 남자는 145.5 이고 `$p$`-value는 0.464

]

- `var.equal = F`는 등분산 가정 없이 분석하겠다는 뜻. 등분산 가정이란 두 그룹의 분산이 같다고 가정하는 것인데, 계산이 좀 더 쉽다는 이점이 있으나 아무 근거 없이 분산이 같다고 가정하는 것은 위험한 가정이다.

---

```r
ev.ttest <- t.test(tChol ~ sex, data = data.t, var.equal = T)
ev.ttest
```

```
## 
## 	Two Sample t-test
## 
## data:  tChol by sex
## t = 0.73386, df = 28, p-value = 0.4691
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -11.64327  24.64327
## sample estimates:
## mean in group Female   mean in group Male 
##                152.0                145.5
```

앞서는 **Welch t-test** 였는데 이름이 바뀐 것을 확인할 수 있고 `$p$`-value도 0.469로 다름. 
- 특별한 경우가 아니고서야 위험한 등분산가정을 할 필요가 없음.

- **R** 에서 디폴트도 `F` 임.

- 2그룹에서 ANOVA 하면 (등분산 가정한) t-test와 동일.

: 등분산 가정없는 ANOVA 도 있음.

---

```r
library(ggpubr)
ggarrange(
  ggboxplot(data.t, "sex", "tChol", fill = "sex"),
  ggbarplot(data.t, "sex", "tChol", fill = "sex", add = "mean_sd")
)
```

---

```r
ggboxplot(data.t, "sex", "tChol", fill = "sex", add = "dotplot") + 
  stat_compare_means(method = "t.test", method.args = list(var.equal = F))
```

---

```r
ggviolin(data.t, "sex", "tChol", fill = "sex", add = "boxplot") + 
   stat_compare_means(method = "t.test", method.args = list(var.equal = T), label.y = 250)
```

---
# 2그룹: wilcox-test

.large[

정규분포 못믿을때: 2 그룹 **중위값** 비교

- 값 자체가 아닌 순위정보만 이용 - **비모수검정**

- **Median [IQR(25%-75% quantile)]** 로 표시
]

--
.large[
- 정규분포 따르는지 아닌지 테스트하는 정규성검정은 불필요.

- 임상적으로 판단하는것이 좋다. 키나 몸무게는 정규분포, CRP나 자녀수는 정규분포 안함.
]

- [정규분포 참고자료](https://jinseob2kim.github.io/Normal_distribution.html)

---

```r
res.wilcox <- wilcox.test(tChol ~ sex, data = data.t)
res.wilcox
```

```
## 
## 	Wilcoxon rank sum test with continuity correction
## 
## data:  tChol by sex
## W = 131, p-value = 0.4417
## alternative hypothesis: true location shift is not equal to 0
```

---

```r
ggboxplot(data.t, "sex", "tChol", fill = "sex") + 
  stat_compare_means(method = "wilcox.test")
```

---
# 3그룹 이상: One-way ANOVA

.large[
3 그룹 이상의 평균 비교

- 2그룹씩 짝을 지어서 **t-test**를 반복할 수도 있으나, Table 1은 보통 하나의 `$P$`-value만 제시함.

- 전체적으로 **튀는 것이 하나라도 있는가?**를 테스트하는 **ANOVA**.

- 어떤 그룹이 차이나는지는 관심없음.

]

--
- 사후(post-hoc) 분석을 이용, 어떤 것이 튀는지를 알아볼 수도 있다.

- 보통 우리가 쓰는 **ANOVA**는 비교할 모든 그룹에서 분산이 같다는 등분산 가정 하에 분석을 수행하며, 실제로 2 그룹일 때 **ANOVA**를 수행하면 등분산 가정 하에 수행한 **t-test**와 동일한 결과를 얻는다. 물론 등분산 가정없는 generalized ANOVA 도 있고 본사는 이것을 디폴트로 사용한다.

---

<div id="htmlwidget-36f0aba97100ab870d43" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-36f0aba97100ab870d43">{"x":{"filter":"none","vertical":false,"caption":"<caption>data.t: group and total cholesterol<\/caption>","data":[["person 1","person 2","person 3","person 4","person 5","person 6","person 7","person 8","person 9","person 10","person 11","person 12","person 13","person 14","person 15","person 16","person 17","person 18","person 19","person 20","person 21","person 22","person 23","person 24","person 25","person 26","person 27","person 28","person 29","person 30"],["C","C","C","B","C","B","B","B","C","A","B","B","A","B","C","A","C","C","A","A","A","A","C","B","C","B","A","B","C","B"],[148,126,128,144,140,117,147,182,146,115,125,171,140,111,132,146,177,145,160,53,127,159,113,163,174,145,187,122,162,162]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>group<\/th>\n      <th>tChol<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":2},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

---

```r
res.aov1 <- oneway.test(tChol ~ group, data = data.aov, var.equal = F)
res.aov1
```

```
## 
## 	One-way analysis of means (not assuming equal variances)
## 
## data:  tChol and group
## F = 0.16299, num df = 2.000, denom df = 14.992, p-value = 0.8511
```

```r
res.aov2 <- oneway.test(tChol ~ group, data = data.aov, var.equal = T)
res.aov2
```

```
## 
## 	One-way analysis of means
## 
## data:  tChol and group
## F = 0.28648, num df = 2, denom df = 27, p-value = 0.7532
```

.large[
> 등분산 가정없는 `$p$`-value인 0.851를 이용하며, 의미는 "**3 그룹에서 총콜레스테롤 값이 비슷하다**(다른 것이 있다고 할 수 없다)" 이다.
]

---

```r
ggboxplot(data.aov, "group", "tChol", fill = "group", order = c("A", "B", "C")) + 
  stat_compare_means(method = "anova")
```

---

```r
ggboxplot(data.aov, "group", "tChol", fill = "group", order = c("A", "B", "C")) + 
  stat_compare_means(method = "anova", label.y = 250) + 
  stat_compare_means(method = "t.test", comparisons = list(c("A", "B"), c("B", "C"), c("C", "A")))
```

---
# 3그룹이상: Kruskal–Wallis ANOVA

.large[

정규분포 못믿을때: 3 그룹 **중위값** 비교

- 값 자체가 아닌 순위정보만 이용 - **비모수검정**

- **Median [IQR(25%-75% quantile)]** 로 표시
]

---

```r
res.kruskal <- kruskal.test(tChol ~ group, data = data.aov)
res.kruskal
```

```
## 
## 	Kruskal-Wallis rank sum test
## 
## data:  tChol by group
## Kruskal-Wallis chi-squared = 0.11726, df = 2, p-value = 0.9431
```

.large[
> `$p$`-value는 0.943. 
]

---

```r
ggboxplot(data.aov, "group", "tChol", fill = "group", order = c("A", "B", "C")) + 
  stat_compare_means(method = "kruskal.test")
```

---
class: center, middle

# 범주형 변수의 그룹 비교

### 그룹 수나 정규분포를 고려할 필요가 없어 연속 변수일 때보다 훨씬 간단하며 딱 하나, **샘플 수**가 충분한지만 확인하면 된다.

---
# 샘플 수 충분: Chi-square test

.large[
**Chi-square test**는 두 범주형 변수가 관계가 있는지 없는지를 파악하는 테스트.

]

- 세 범주형 변수일 때도 이용할 수 있으나 본 강의에서는 생략한다.

---

<div id="htmlwidget-f097dfe9eb00744fc7da" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-f097dfe9eb00744fc7da">{"x":{"filter":"none","vertical":false,"caption":"<caption>data.chi: HTN & DM medication<\/caption>","data":[["person 1","person 2","person 3","person 4","person 5","person 6","person 7","person 8","person 9","person 10","person 11","person 12","person 13","person 14","person 15","person 16","person 17","person 18","person 19","person 20","person 21","person 22","person 23","person 24","person 25","person 26","person 27","person 28","person 29","person 30","person 31","person 32","person 33","person 34","person 35","person 36","person 37","person 38","person 39","person 40","person 41","person 42","person 43","person 44","person 45","person 46","person 47","person 48","person 49","person 50"],[1,0,0,0,1,1,0,0,0,0,0,0,0,1,1,0,1,0,0,0,1,0,0,0,0,1,1,0,0,1,0,1,0,0,0,1,1,0,0,1,1,1,1,1,0,1,1,1,0,1],[0,0,1,1,0,0,0,0,1,1,0,0,1,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,1,1,0,0,1,1,1,1,0,1,1,0,0,1,0,0,1,1,0]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>HTN_medi<\/th>\n      <th>DM_medi<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":[1,2]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

---

```r
tb.chi <- table(data.chi)
tb.chi
```

```
##         DM_medi
## HTN_medi  0  1
##        0 15 13
##        1 14  8
```

언뜻 봐서는 관계가 있는지 아닌지 잘 모르겠다. **Chi-square test**를 해보자.

```r
res.chi <- chisq.test(tb.chi)
res.chi
```

```
## 
## 	Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tb.chi
## X-squared = 0.18246, df = 1, p-value = 0.6693
```

> `$p$`-value는 0.669, **혈압약 복용과 당뇨약 복용은 유의한 관계가 없다**고 말할 수 있다.

---
# 샘플 수 부족: Fisher's exact test

.large[
분석할 테이블에서 **샘플 수가 너무 작은 항이 있으면** **chi-square test** 가 부정확해진다. 이 때는 **fisher's exact test**를 쓴다.

- 현 상황보다 더 극한 상황이 나올 확률을 [직접 계산](https://techntalk.tistory.com/entry/%ED%94%BC%EC%85%94%EC%9D%98-%EC%A0%95%ED%99%95%EB%8F%84-%EA%B2%80%EC%A0%95Fishers-Exact-Probability-Test)함.  
]

---

<div id="htmlwidget-a0416e37bd34686af8c0" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-a0416e37bd34686af8c0">{"x":{"filter":"none","vertical":false,"caption":"<caption>data.fisher: HTN & DM medication<\/caption>","data":[["person 1","person 2","person 3","person 4","person 5","person 6","person 7","person 8","person 9","person 10","person 11","person 12","person 13","person 14","person 15","person 16","person 17","person 18","person 19","person 20","person 21","person 22","person 23","person 24","person 25","person 26","person 27","person 28","person 29","person 30","person 31","person 32","person 33","person 34","person 35","person 36","person 37","person 38","person 39","person 40","person 41","person 42","person 43","person 44","person 45","person 46","person 47","person 48","person 49","person 50"],[1,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,1],[0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,1,0,0,1,0,0,0,0,0,0,0,0]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>HTN_medi<\/th>\n      <th>DM_medi<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":[1,2]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

---

아까와 마찬가지로 테이블로 두 약물 복용상태를 비교하면 아래와 같다.

```r
tb.fisher <- table(data.fisher)
tb.fisher
```

```
##         DM_medi
## HTN_medi  0  1
##        0 31  8
##        1  9  2
```

--
혈압약과 당뇨약을 모두 복용한 사람이 2명으로 좀 작아보이지만 무시하고 **chi-square test**를 수행하면 결과는 나오나 `Warning` 메시지가 뜬다.

```r
chisq.test(tb.fisher)
```

```
## Warning in chisq.test(tb.fisher): Chi-squared approximation may be incorrect
```

```
## 
## 	Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tb.fisher
## X-squared = 4.5971e-31, df = 1, p-value = 1
```

**두 약을 모두 복용한 사람이 2명뿐**, 분석할 테이블에서 샘플 수가 너무 작은 항이 있으면 **chi-square test**의 계산이 부정확해진다.

---

**fisher's exact test** 를 해 보자.

```r
res.fisher <- fisher.test(tb.fisher)
res.fisher
```

```
## 
## 	Fisher's Exact Test for Count Data
## 
## data:  tb.fisher
## p-value = 1
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.07627205 5.55561549
## sample estimates:
## odds ratio 
##  0.8636115
```

.large[
> `$p$`-value는 1, 마찬가지로 혈압약 복용과 당뇨약 복용은 유의한 관계가 없다고 할 수 있다. 
]

--
의문: **무조건 fisher's test만 하면 간단한데** **chi-square test**는 왜 하나?

- 샘플 수가 작을 때는 **fisher's test**만 하는 것이 실제로 더 간단하고 방법론적으로도 아무 문제가 없다. 그러나 샘플 수나 그룹 수가 늘어날수록 **fisher's test**는 계산량이 급격하게 증가한다. **chi-square test**를 먼저 수행하는 것을 권유한다.

---
class: center, middle

# 연속변수의 짝지은 그룹비교

### Paired t-test

---
# 2그룹 : Paired t-test

각 사람의 혈압을 한 번은 사람이 직접, 한 번은 자동혈압계로 측정했다고 하자. 이 때 직접 잰 혈압과 자동혈압계의 측정값을 비교한다면 **t-test**로 충분할까?

- t-test는 혈압 재는 방법마다 **평균을 먼저 구한 후** 그것이 같은지를 테스트하므로  **짝지은 정보를 활용하지 못한다.**

- 각 사람마다 두 혈압값의 **차이를 먼저 구한 후** 평균이 0인지를 테스트하면, **짝지은 정보를 활용**하면서 계산도 더 간단한 방법이 된다.

--
.large[
- **차이값을 먼저 구한 후** 그 평균이 0인지를 테스트하는 방법이 **paired t-test**
]

---
<div id="htmlwidget-0ffd69b92e1a57914fb9" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-0ffd69b92e1a57914fb9">{"x":{"filter":"none","vertical":false,"caption":"<caption>data.pt: systolic blood pressure measured by hand & machine<\/caption>","data":[["person 1","person 2","person 3","person 4","person 5","person 6","person 7","person 8","person 9","person 10","person 11","person 12","person 13","person 14","person 15","person 16","person 17","person 18","person 19","person 20","person 21","person 22","person 23","person 24","person 25","person 26","person 27","person 28","person 29","person 30"],[128,128,120,119,124,130,127,131,127,120,119,131,129,129,125,129,126,132,123,111,125,125,119,112,129,126,125,128,125,128],[128,119,119,126,118,124,128,129,117,130,123,128,128,133,127,126,127,119,122,125,124,127,128,126,127,123,129,123,142,123]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>SBP_hand<\/th>\n      <th>SBP_machine<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":[1,2]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

---
그냥 t-test 로 비교해보자.

```r
pt.ttest <- t.test(data.pt$SBP_hand, data.pt$SBP_machine)
pt.ttest
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  data.pt$SBP_hand and data.pt$SBP_machine
## t = -0.45768, df = 57.863, p-value = 0.6489
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.224307  2.024307
## sample estimates:
## mean of x mean of y 
##     125.0     125.6
```

각 방법의 **평균을 먼저 구한 후 차이를 비교** 했고, `$p$`-value는 0.649이다. 이제 **paired t-test**를 수행하자.

---

```r
pt.ttest.pair <- t.test(data.pt$SBP_hand, data.pt$SBP_machine, paired = TRUE)
pt.ttest.pair
```

```
## 
## 	Paired t-test
## 
## data:  data.pt$SBP_hand and data.pt$SBP_machine
## t = -0.46171, df = 29, p-value = 0.6477
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -3.257804  2.057804
## sample estimates:
## mean difference 
##            -0.6
```

이번에는 **사람마다 차이값을 먼저 구한 후 그것이 0인지 테스트** 했고, `$p$`-value는 0.648이다.

---

```r
ggpaired(data.pt, cond1 = "SBP_hand", cond2 = "SBP_machine", fill = "condition", palette = "jco") + 
  stat_compare_means(method = "t.test", paired = T)
```

---
**Paired t-test**의 비모수버전은 **wilcoxon-signed rank test** 이며 아래와 같이 실행한다.

```r
pt.wilcox.pair <- wilcox.test(data.pt$SBP_hand, data.pt$SBP_machine, paired = T)
pt.wilcox.pair
```

```
## 
## 	Wilcoxon signed rank test with continuity correction
## 
## data:  data.pt$SBP_hand and data.pt$SBP_machine
## V = 214, p-value = 0.9482
## alternative hypothesis: true location shift is not equal to 0
```

- 본 강의에서는 다루지 않겠지만 짝지은 3개 이상의 그룹은 [**repeated measure ANOVA**](https://statistics.laerd.com/statistical-guides/repeated-measures-anova-statistical-guide.php) 를 이용한다.

---

```r
ggpaired(data.pt, cond1 = "SBP_hand", cond2 = "SBP_machine", fill = "condition", palette = "jco") + 
  stat_compare_means(method = "wilcox.test", paired = T)
```

---
class: center, middle

# 범주형 변수의 짝지은 그룹비교

### Mcnemar test, Symmetry test for a paired contingency table

---
# 2그룹: Mcnemar test

.large[
약 복용 전후로 복통증상 발생에 차이가 있는지 알아본다고 하자.
]

---

<div id="htmlwidget-c95e5874afb5d04a4ac7" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-c95e5874afb5d04a4ac7">{"x":{"filter":"none","vertical":false,"caption":"<caption>data.mc: Abdominal pain - before & after<\/caption>","data":[["person 1","person 2","person 3","person 4","person 5","person 6","person 7","person 8","person 9","person 10","person 11","person 12","person 13","person 14","person 15","person 16","person 17","person 18","person 19","person 20","person 21","person 22","person 23","person 24","person 25","person 26","person 27","person 28","person 29","person 30"],[0,1,0,0,0,1,0,0,0,0,1,1,0,1,0,1,1,0,0,0,1,1,1,0,1,1,0,1,0,1],[0,1,1,0,1,0,0,0,0,1,0,0,0,1,1,0,1,0,1,1,1,0,0,1,0,0,0,1,1,0]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>Pain_before<\/th>\n      <th>Pain_after<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":[1,2]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

---

이 데이터를 2 by 2 테이블로 정리하면 아래와 같다.

```r
table.mc <- table(data.mc)
table.mc
```

```
##            Pain_after
## Pain_before 0 1
##           0 8 8
##           1 9 5
```

--
먼저 앞서 배운 **Chi-sqaure test** 를 이용한 결과를 보자.

```r
mc.chi <- chisq.test(table.mc)
mc.chi
```

```
## 
## 	Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table.mc
## X-squared = 0.17514, df = 1, p-value = 0.6756
```

이것은 약 복용 전 복통 증상과 복용 후의 복통 증상이 얼마나 관계가 있는지 알아보는 테스트로 **짝지은 정보를 활용하지 않는다**.

---
짝지은 정보를 활용하는 **mcnemar test**를 수행하자.

```r
mc.mcnemar <- mcnemar.test(table.mc)
mc.mcnemar
```

```
## 
## 	McNemar's Chi-squared test with continuity correction
## 
## data:  table.mc
## McNemar's chi-squared = 0, df = 1, p-value = 1
```

.large[
> **Mcnemar test**는 약 복용후 증상발생이 달라진 사람 즉, **discordant pair**만 분석에 이용한다. 따라서 **condordant pair** 가 어떻든지 통계결과는 동일하게 나온다.
]

---
# 3그룹: symmetry test

.large[
Symmetry test for a paired contingency table*

- Mcnemar test의 일반화, 3그룹 이상에서도 가능. 
]

http://rcompanion.org/handbook/H_05.html

```
##              Yes.after No.after Maybe.after
## Yes.before           6        0           1
## No.before            5        3           7
## Maybe.before        11        1          12
```

---

`nominalSymmetryTest` 함수를 쓴다.

```r
## Symmetry test
nominalSymmetryTest(AndersonRainGarden)
```

```
## $Global.test.for.symmetry
##   Dimensions  p.value
## 1      3 x 3 0.000476
## 
## $Pairwise.symmetry.tests
##                                        Comparison p.value p.adjust
## 1       Yes.before/Yes.after : No.before/No.after  0.0736   0.0771
## 2 Yes.before/Yes.after : Maybe.before/Maybe.after 0.00937   0.0281
## 3   No.before/No.after : Maybe.before/Maybe.after  0.0771   0.0771
## 
## $p.adjustment
##   Method
## 1    fdr
## 
## $statistical.method
##         Method
## 1 McNemar test
```

---
class: center, middle

# 실습

---
# [웹 애플리케이션](https://app.zarathu.com)

.large[
직접 만든 기초통계 웹: https://app.zarathu.com/basic.

- 5메가 이하의 **excel**, **csv** 형태 혹은 **sas**, **spss** 프로그램으로 만든 데이터를 업로드하면 Table 1과 회귀분석, 로지스틱 회귀분석을 간단하게 수행하고 결과를 **excel**로 바로 다운받을 수 있다. 
]

---
<img src="https://blog.zarathu.com/posts/2018-11-24-basic-biostatistics/app.gif" style="display: block; margin: auto;" />

---
# [Rstudio Addins](https://rstudio.github.io/rstudioaddins/)

5메가보다 큰 데이터는 [**jsmodule** R package](https://github.com/jinseob2kim/jsmodule) 를 개인컴퓨터에 설치하여 이용한다.

```r
install.packages("jsmodule")
```

패키지를 설치한 후 **[Rstudio](https://www.rstudio.com/)** 프로그램의 **Addins** 탭을 누르면 **Basic statistics** 항목이 보일 것이다. 데이터를 읽고 그것의 이름을 드래그 한 상태로 **Basic statistics** 를 누르면 된다.

---
<img src="https://blog.zarathu.com/posts/2018-11-24-basic-biostatistics/addin.gif" style="display: block; margin: auto;" />

---
# Executive Summary

.large[
- 연속변수의 2그룹 비교: 정규분포 인정하면 **t-test**, 아니면 **Wilcox-test**

- 연속변수의 3그룹 이상 비교: 정규분포 인정하면 **one-way ANOVA**, 아니면 **Kruskal–Wallis one-way ANOVA**

- 범주형 변수의 그룹 비교: 샘플수 충분하면 **Chisq-test**, 아니면 **Fisher-test**

본사가 개발한 [웹](www.app.zarathu.com) 과 [R 패키지](https://CRAN.R-project.org/package=jsmodule) 에서 바로 Table 1 을 얻을 수 있다.
]

- 강의록은 https://blog.zarathu.com/posts/2018-11-24-basic-biostatistics 참고

---
class: center, middle

# END