class: center middle main-title section-title-1 # Missing Data .class-info[ **Session 18** .light[STA 379/679: Causal Inference <br> Lucy D'Agostino McGowan ] ] --- class: title title-1 # Visualizing missingness .left-code[ ```r library(visdat) vis_dat(nhanes) ``` ] .right-plot[ <img src="18-missing-data_files/figure-html/unnamed-chunk-2-1.png" width="504" style="display: block; margin: auto;" /> ] --- class: title title-1 #
Application Exercise 1. `install.packages("visdat")` 2. Load the causaldata package and use the `vis_dat()` function to examine the `nhefs` data frame 3. Fit a linear model predicting `pregnancies` from `birthcontrol` 4. Knit, commit, push to GitHub
05
:
00
--- class: title title-1 # Types of Missing Data .pull-left-3[ .box-inv-1[Missing completely at random] .box-1[The probability a variable is missing is the same for all observations] ] -- .pull-middle-3[ .box-inv-1[Missing at random] .box-1[The probability a variable is missing depends only on available information (the other variables)] ] -- .pull-right-3[ .box-inv-1[Missing not at random] .box-1[The probability a variable is missing depends on external factors] ] --- class: title title-1 # Types of Missing Data .pull-left-3[ .box-2[Missing completely at random] .box-2[The probability a variable is missing is the same for all observations] ] .pull-middle-3[ .box-inv-1[Missing at random] .box-1[The probability a variable is missing depends only on available information (the other variables)] ] .pull-right-3[ .box-2[Missing not at random] .box-2[The probability a variable is missing depends on external factors] ] --- class: title title-1 # Imputation .box-1[Impute the mean/median or most common value for all missing variable] -- .box-1[Use the observed variables to create a *model* to predict what the missing value would be] -- .box-1[Use the observed variables to create many *models* to predict what the missing value would be (multiple imputation)] --- class: title title-1 # Predictive mean matching .box-inv-1[Variable A has some missing values] -- .box-1[Calculates the predicted value of A for all observations using the other measured variables] -- .box-1[For observation i, finds the set of observations that have the closest predicted value for variable A.] -- .box-1[Randomly select one of the non-missing values of A from this set for the missing observation] --- class: title title-1 # Example .left-code[ ```r library(visdat) vis_dat(nhanes) ``` ] .right-plot[ <img src="18-missing-data_files/figure-html/unnamed-chunk-4-1.png" width="504" style="display: block; margin: auto;" /> ] --- class: title title-1 # Predictive mean matching ```r library(mice) set.seed(1) nhanes_imp <- mice(nhanes, m = 1, method = "pmm") ``` --- class: title title-1 # Predictive mean matching ```r *library(mice) set.seed(1) nhanes_imp <- mice(nhanes, m = 1, method = "pmm") ``` --- class: title title-1 # Predictive mean matching ```r library(mice) set.seed(1) *nhanes_imp <- mice(nhanes, m = 1, method = "pmm") ``` ``` ## ## iter imp variable ## 1 1 bmi hyp chl ## 2 1 bmi hyp chl ## 3 1 bmi hyp chl ## 4 1 bmi hyp chl ## 5 1 bmi hyp chl ``` --- class: title title-1 # Predictive mean matching .pull-left[ ```r nhanes_imp$imp$bmi ``` ``` ## 1 ## 1 29.6 ## 3 27.2 ## 4 20.4 ## 6 25.5 ## 10 21.7 ## 11 22.0 ## 12 22.7 ## 16 26.3 ## 21 30.1 ``` ] .pull-right[ ```r nhanes %>% filter(bmi == 29.6) ``` ``` ## age bmi hyp chl ## 15 1 29.6 1 NA ``` ] --- class: title title-1 # Predictive mean matching ```r complete(nhanes_imp) ``` ``` ## age bmi hyp chl ## 1 1 29.6 1 131 ## 2 2 22.7 1 187 ## 3 1 27.2 1 187 ## 4 3 20.4 2 204 ## 5 1 20.4 1 113 ## 6 3 25.5 2 184 ## 7 1 22.5 1 118 ## 8 1 30.1 1 187 ## 9 2 22.0 1 238 ## 10 2 21.7 2 131 ## 11 1 22.0 1 187 ## 12 2 22.7 1 187 ## 13 3 21.7 1 206 ## 14 2 28.7 2 204 ## 15 1 29.6 1 187 ## 16 1 26.3 1 187 ## 17 3 27.2 2 284 ## 18 2 26.3 2 199 ## 19 1 35.3 1 218 ## 20 3 25.5 2 218 ## 21 1 30.1 1 187 ## 22 1 33.2 1 229 ## 23 1 27.5 1 131 ## 24 3 24.9 1 284 ## 25 2 27.4 1 186 ``` --- class: title title-1 #
Application Exercise 1. `install.packages("mice")` 2. Use the `mice()` function to create a single imputed dataset of the `nhefs` data using predictive mean matching 3. Fit a linear model predicting `pregancies` from `birthcontrol` on the imputed dataset 4. Explore the help files for `?mice`. What types of univariate imputation methods are included?
08
:
00
--- class: title title-1 # Multiple imputation ```r set.seed(1) nhanes_imp <- mice(nhanes, m = 5, method = "pmm") ``` --- class: title title-1 # Multiple imputation ```r set.seed(1) nhanes_imp <- mice(nhanes, * m = 5, method = "pmm") ``` --- class: title title-1 # Multiple imputation ```r nhanes_imp$imp$bmi ``` ``` ## 1 2 3 4 5 ## 1 27.2 35.3 27.2 29.6 24.9 ## 3 28.7 27.2 35.3 30.1 29.6 ## 4 25.5 20.4 22.0 20.4 25.5 ## 6 24.9 21.7 22.7 24.9 27.4 ## 10 28.7 22.7 20.4 20.4 22.7 ## 11 30.1 29.6 35.3 22.7 29.6 ## 12 22.0 27.2 26.3 27.4 22.5 ## 16 22.0 27.2 35.3 22.7 24.9 ## 21 26.3 29.6 27.2 29.6 27.2 ``` --- class: title title-1 # Multiple imputation ```r complete(nhanes_imp, 1) ``` ``` ## age bmi hyp chl ## 1 1 27.2 1 199 ## 2 2 22.7 1 187 ## 3 1 28.7 1 187 ## 4 3 25.5 1 218 ## 5 1 20.4 1 113 ## 6 3 24.9 1 184 ## 7 1 22.5 1 118 ## 8 1 30.1 1 187 ## 9 2 22.0 1 238 ## 10 2 28.7 1 206 ## 11 1 30.1 1 199 ## 12 2 22.0 1 187 ## 13 3 21.7 1 206 ## 14 2 28.7 2 204 ## 15 1 29.6 1 206 ## 16 1 22.0 1 187 ## 17 3 27.2 2 284 ## 18 2 26.3 2 199 ## 19 1 35.3 1 218 ## 20 3 25.5 2 199 ## 21 1 26.3 1 229 ## 22 1 33.2 1 229 ## 23 1 27.5 1 131 ## 24 3 24.9 1 206 ## 25 2 27.4 1 186 ``` --- class: title title-1 # Multiple imputation ```r complete(nhanes_imp, 2) ``` ``` ## age bmi hyp chl ## 1 1 35.3 1 284 ## 2 2 22.7 1 187 ## 3 1 27.2 1 187 ## 4 3 20.4 1 118 ## 5 1 20.4 1 113 ## 6 3 21.7 2 184 ## 7 1 22.5 1 118 ## 8 1 30.1 1 187 ## 9 2 22.0 1 238 ## 10 2 22.7 1 187 ## 11 1 29.6 1 206 ## 12 2 27.2 1 199 ## 13 3 21.7 1 206 ## 14 2 28.7 2 204 ## 15 1 29.6 1 186 ## 16 1 27.2 1 184 ## 17 3 27.2 2 284 ## 18 2 26.3 2 199 ## 19 1 35.3 1 218 ## 20 3 25.5 2 218 ## 21 1 29.6 1 204 ## 22 1 33.2 1 229 ## 23 1 27.5 1 131 ## 24 3 24.9 1 238 ## 25 2 27.4 1 186 ``` --- class: title title-1 # Multiple imputation ```r complete(nhanes_imp, 3) ``` ``` ## age bmi hyp chl ## 1 1 27.2 1 187 ## 2 2 22.7 1 187 ## 3 1 35.3 1 187 ## 4 3 22.0 1 184 ## 5 1 20.4 1 113 ## 6 3 22.7 2 184 ## 7 1 22.5 1 118 ## 8 1 30.1 1 187 ## 9 2 22.0 1 238 ## 10 2 20.4 1 187 ## 11 1 35.3 2 184 ## 12 2 26.3 1 206 ## 13 3 21.7 1 206 ## 14 2 28.7 2 204 ## 15 1 29.6 1 229 ## 16 1 35.3 2 187 ## 17 3 27.2 2 284 ## 18 2 26.3 2 199 ## 19 1 35.3 1 218 ## 20 3 25.5 2 206 ## 21 1 27.2 1 131 ## 22 1 33.2 1 229 ## 23 1 27.5 1 131 ## 24 3 24.9 1 284 ## 25 2 27.4 1 186 ``` --- class: title title-1 # Multiple imputation ```r complete(nhanes_imp, 4) ``` ``` ## age bmi hyp chl ## 1 1 29.6 1 187 ## 2 2 22.7 1 187 ## 3 1 30.1 1 187 ## 4 3 20.4 1 187 ## 5 1 20.4 1 113 ## 6 3 24.9 1 184 ## 7 1 22.5 1 118 ## 8 1 30.1 1 187 ## 9 2 22.0 1 238 ## 10 2 20.4 1 238 ## 11 1 22.7 1 131 ## 12 2 27.4 2 184 ## 13 3 21.7 1 206 ## 14 2 28.7 2 204 ## 15 1 29.6 1 229 ## 16 1 22.7 1 113 ## 17 3 27.2 2 284 ## 18 2 26.3 2 199 ## 19 1 35.3 1 218 ## 20 3 25.5 2 184 ## 21 1 29.6 1 206 ## 22 1 33.2 1 229 ## 23 1 27.5 1 131 ## 24 3 24.9 1 199 ## 25 2 27.4 1 186 ``` --- class: title title-1 # Multiple imputation ```r complete(nhanes_imp, 5) ``` ``` ## age bmi hyp chl ## 1 1 24.9 1 238 ## 2 2 22.7 1 187 ## 3 1 29.6 1 187 ## 4 3 25.5 2 238 ## 5 1 20.4 1 113 ## 6 3 27.4 1 184 ## 7 1 22.5 1 118 ## 8 1 30.1 1 187 ## 9 2 22.0 1 238 ## 10 2 22.7 1 187 ## 11 1 29.6 1 206 ## 12 2 22.5 1 187 ## 13 3 21.7 1 206 ## 14 2 28.7 2 204 ## 15 1 29.6 1 206 ## 16 1 24.9 1 238 ## 17 3 27.2 2 284 ## 18 2 26.3 2 199 ## 19 1 35.3 1 218 ## 20 3 25.5 2 187 ## 21 1 27.2 1 187 ## 22 1 33.2 1 229 ## 23 1 27.5 1 131 ## 24 3 24.9 1 218 ## 25 2 27.4 1 186 ``` --- class: title title-1 #
Application Exercise 1. Use the `mice()` function to create a 5 imputed datasets of the `nhefs` data using predictive mean matching 3. Fit a linear model predicting `pregnancies` from `birthcontrol` on each of the 5 imputed datasets 4. Knit, commit, and push to GitHub
08
:
00
--- class: title title-1 # Pool Results ```r fit <- with(data = nhanes_imp, lm(age ~ bmi)) ``` --- class: title title-1 # Pool Results ```r fit <- with(data = nhanes_imp, lm(age ~ bmi)) fit ``` ``` ## call : ## with.mids(data = nhanes_imp, expr = lm(age ~ bmi)) ## ## call1 : ## mice(data = nhanes, m = 5, method = "pmm") ## ## nmis : ## age bmi hyp chl ## 0 9 8 10 ## ## analyses : ## [[1]] ## ## Call: ## lm(formula = age ~ bmi) ## ## Coefficients: ## (Intercept) bmi ## 3.71598 -0.07405 ## ## ## [[2]] ## ## Call: ## lm(formula = age ~ bmi) ## ## Coefficients: ## (Intercept) bmi ## 4.5686 -0.1054 ## ## ## [[3]] ## ## Call: ## lm(formula = age ~ bmi) ## ## Coefficients: ## (Intercept) bmi ## 4.28034 -0.09311 ## ## ## [[4]] ## ## Call: ## lm(formula = age ~ bmi) ## ## Coefficients: ## (Intercept) bmi ## 3.84211 -0.07974 ## ## ## [[5]] ## ## Call: ## lm(formula = age ~ bmi) ## ## Coefficients: ## (Intercept) bmi ## 3.74803 -0.07538 ``` --- class: title title-1 # Pool Results ```r fit <- with(data = nhanes_imp, lm(age ~ bmi)) pool(fit) ``` ``` ## Class: mipo m = 5 ## term m estimate ubar b t dfcom df ## 1 (Intercept) 5 4.03102333 1.069419646 0.1415549368 1.239285570 23 16.86913 ## 2 bmi 5 -0.08554488 0.001494635 0.0001806221 0.001711381 23 17.25864 ## riv lambda fmi ## 1 0.1588394 0.1370676 0.2239293 ## 2 0.1450164 0.1266501 0.2128701 ``` --- class: title title-1 # Pool Results ```r fit <- with(data = nhanes_imp, lm(age ~ bmi)) *pool(fit) ``` ``` ## Class: mipo m = 5 ## term m estimate ubar b t dfcom df ## 1 (Intercept) 5 4.03102333 1.069419646 0.1415549368 1.239285570 23 16.86913 ## 2 bmi 5 -0.08554488 0.001494635 0.0001806221 0.001711381 23 17.25864 ## riv lambda fmi ## 1 0.1588394 0.1370676 0.2239293 ## 2 0.1450164 0.1266501 0.2128701 ``` --- class: title title-1 # Pool Results ```r with(data = nhanes_imp, lm(age ~ bmi)) %>% pool() %>% tidy() ``` ``` ## term estimate std.error statistic p.value b ## 1 (Intercept) 4.03102333 1.11323204 3.621009 0.002132875 0.1415549368 ## 2 bmi -0.08554488 0.04136885 -2.067857 0.053982698 0.0001806221 ## df dfcom fmi lambda m riv ubar ## 1 16.86913 23 0.2239293 0.1370676 5 0.1588394 1.069419646 ## 2 17.25864 23 0.2128701 0.1266501 5 0.1450164 0.001494635 ``` --- class: title title-1 # Pooling effect estimates .box-1.medium[ `$$\bar\theta = \frac{1}{m}\sum_{i=1}^m\theta_i$$` ] --- class: title title-1 # Pooling standard errors .box-1.medium[ `$$V_{Total} = V_{within} + V_{between} + \frac{V_{between}}{m}$$` ] -- .box-1["Rubin's Rules"] --- class: title title-1 # Pooling standard errors .box-1.medium[ `$$V_{within} =\frac{1}{m}\sum_{i=1}^mSE_i^2$$` ] --- class: title title-1 # Pooling standard errors .box-1.medium[ `$$V_{between} =\frac{\sum_{i=1}^m(\theta_i-\bar{\theta})^2}{m-1}$$` ] --- class: title title-1 #
Application Exercise 1. Use the `with()` function to fit a model on each of your 5 imputed datasets. 2. Use the `pool()` function to pool the results. Compare to your previous results 3. Knit, commit, and push to GitHub
08
:
00
--- class: title title-1 # Adding in the PS weighting .box-1[Across] -- .box-inv-1[Create m imputed datasets, estimate the propensity score within each one then average across the m datasets, pool the results] -- .box-1[Within] -- .box-inv-1[Create m imputed datasets, estimate the propensity score within each one, run a seperate outcome model for each propensity score, pool the results.] --- class: title title-1 # Adding in the PS weighting ```r nhanes$exp <- rbinom(25, 1, 0.5) nhanes_imp <- mice(nhanes, m = 5, method = "pmm") ``` --- class: title title-1 # MatchThem package .box-1[From the makers of `MatchIt` we have a package that can handle the weights for you!] ```r library(MatchThem) nhanes_weighted <- weightthem(exp ~ bmi + hyp + chl, nhanes_imp, approach = "across", estimand = "ATE") ``` --- class: title title-1 # MatchThem package .box-1[From the makers of `MatchIt` we have a package that can handle the weights for you!] ```r *library(MatchThem) nhanes_weighted <- weightthem(exp ~ bmi + hyp + chl, nhanes_imp, approach = "across", estimand = "ATE") ``` --- class: title title-1 # MatchThem package .box-1[From the makers of `MatchIt` we have a package that can handle the weights for you!] ```r library(MatchThem) *nhanes_weighted <- weightthem(exp ~ bmi + hyp + chl, nhanes_imp, approach = "across", estimand = "ATE") ``` --- class: title title-1 # MatchThem package .box-1[From the makers of `MatchIt` we have a package that can handle the weights for you!] ```r library(MatchThem) nhanes_weighted <- weightthem(exp ~ bmi + hyp + chl, * nhanes_imp, approach = "across", estimand = "ATE") ``` --- class: title title-1 # MatchThem package .box-1[From the makers of `MatchIt` we have a package that can handle the weights for you!] ```r library(MatchThem) nhanes_weighted <- weightthem(exp ~ bmi + hyp + chl, nhanes_imp, * approach = "across", estimand = "ATE") ``` --- class: title title-1 # MatchThem package .box-1[From the makers of `MatchIt` we have a package that can handle the weights for you!] ```r library(MatchThem) nhanes_weighted <- weightthem(exp ~ bmi + hyp + chl, nhanes_imp, approach = "across", * estimand = "ATE") ``` --- class: title title-1 # MatchThem package ```r library(survey) with(nhanes_weighted, svyglm(age ~ exp)) %>% pool() %>% tidy() ``` ``` ## term estimate std.error statistic p.value b df dfcom ## 1 (Intercept) 1.57185008 0.1910444 8.2276685 4.828489e-08 0 21.22865 23 ## 2 exp 0.06565242 0.3074550 0.2135351 8.329471e-01 0 21.22865 23 ## fmi lambda m riv ubar ## 1 0.08254692 0 5 0 0.03649797 ## 2 0.08254692 0 5 0 0.09452855 ``` --- class: title title-1 # MatchThem package ```r *library(survey) with(nhanes_weighted, svyglm(age ~ exp)) %>% pool() %>% tidy() ``` ``` ## term estimate std.error statistic p.value b df dfcom ## 1 (Intercept) 1.57185008 0.1910444 8.2276685 4.828489e-08 0 21.22865 23 ## 2 exp 0.06565242 0.3074550 0.2135351 8.329471e-01 0 21.22865 23 ## fmi lambda m riv ubar ## 1 0.08254692 0 5 0 0.03649797 ## 2 0.08254692 0 5 0 0.09452855 ``` --- class: title title-1 # MatchThem package ```r library(survey) *with(nhanes_weighted, svyglm(age ~ exp)) %>% pool() %>% tidy() ``` ``` ## term estimate std.error statistic p.value b df dfcom ## 1 (Intercept) 1.57185008 0.1910444 8.2276685 4.828489e-08 0 21.22865 23 ## 2 exp 0.06565242 0.3074550 0.2135351 8.329471e-01 0 21.22865 23 ## fmi lambda m riv ubar ## 1 0.08254692 0 5 0 0.03649797 ## 2 0.08254692 0 5 0 0.09452855 ``` --- class: title title-1 # MatchThem package ```r library(survey) with(nhanes_weighted, svyglm(age ~ exp)) %>% * pool() %>% tidy() ``` ``` ## term estimate std.error statistic p.value b df dfcom ## 1 (Intercept) 1.57185008 0.1910444 8.2276685 4.828489e-08 0 21.22865 23 ## 2 exp 0.06565242 0.3074550 0.2135351 8.329471e-01 0 21.22865 23 ## fmi lambda m riv ubar ## 1 0.08254692 0 5 0 0.03649797 ## 2 0.08254692 0 5 0 0.09452855 ``` --- class: title title-1 # MatchThem package ```r library(survey) with(nhanes_weighted, svyglm(age ~ exp)) %>% pool() %>% * tidy() ``` ``` ## term estimate std.error statistic p.value b df dfcom ## 1 (Intercept) 1.57185008 0.1910444 8.2276685 4.828489e-08 0 21.22865 23 ## 2 exp 0.06565242 0.3074550 0.2135351 8.329471e-01 0 21.22865 23 ## fmi lambda m riv ubar ## 1 0.08254692 0 5 0 0.03649797 ## 2 0.08254692 0 5 0 0.09452855 ``` --- class: title title-1 # MatchThem package .box-1[From the makers of `MatchIt` we have a package that can handle the weights for you!] ```r library(MatchThem) nhanes_weighted <- weightthem(exp ~ bmi + hyp + chl, nhanes_imp, * approach = "within", estimand = "ATE") ``` --- class: title title-1 # MatchThem package ```r library(survey) with(nhanes_weighted, svyglm(age ~ exp)) %>% pool() %>% tidy() ``` ``` ## term estimate std.error statistic p.value b df ## 1 (Intercept) 1.58535946 0.2354188 6.7342084 9.412533e-05 0.017759358 8.811045 ## 2 exp 0.03782741 0.3140540 0.1204487 9.053624e-01 0.005292547 19.462903 ## dfcom fmi lambda m riv ubar ## 1 23 0.4887464 0.38452637 5 0.6247650 0.03411080 ## 2 23 0.1476952 0.06439279 5 0.0688246 0.09227887 ``` --- class: title title-1 #
Application Exercise 1. `install.packages("MatchThem")` 2. Use the `weightthem()` function to estimate the uncertainty for the causal effect of `birthcontrol` on `pregnancies` 2. Knit, Commit, Push to GitHub
08
:
00
---