Module 11 Assignment
12.1: Set up an additive model for the ashina data from the ISwR package. This data set contains additive effects of subject, period, and treatment. Compare the results with those obtained from a paired t-test.
> # 12.1
> library(ISwR)
> data(ashina)
> str(ashina)
'data.frame': 16 obs. of 3 variables:
$ vas.active: int -167 -127 -58 -103 -35 -164 -3 25 -61 -45 ...
$ vas.plac : int -102 -39 32 28 16 -42 -27 -30 -47 8 ...
$ grp : int 1 1 1 1 1 1 1 1 1 1 ...
> # Add subject as a factor
> ashina$subject <- factor(1:16)
> # Create two data frames: active and placebo
> attach(ashina)
> act <- data.frame(vas = vas.active, subject, treat = 1, period = grp)
> plac <- data.frame(vas = vas.plac, subject, treat = 0, period = grp)
> combined <- rbind(act, plac)
> combined$treat <- factor(combined$treat)
> combined$period <- factor(combined$period)
> # Fit the additive model
> model <- lm(vas ~ subject + period + treat, data = combined)
> summary(model)
Call:
lm(formula = vas ~ subject + period + treat, data = combined)
Residuals:
Min 1Q Median 3Q Max
-48.94 -18.44 0.00 18.44 48.94
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -113.06 27.39 -4.128 0.000895
subject2 51.50 37.58 1.370 0.190721
subject3 121.50 37.58 3.233 0.005573
subject4 97.00 37.58 2.581 0.020867
subject5 125.00 37.58 3.326 0.004604
subject6 31.50 37.58 0.838 0.415070
subject7 119.50 37.58 3.180 0.006215
subject8 132.00 37.58 3.513 0.003142
subject9 80.50 37.58 2.142 0.049003
subject10 116.00 37.58 3.087 0.007518
subject11 121.50 37.58 3.233 0.005573
subject12 154.50 37.58 4.111 0.000925
subject13 131.00 37.58 3.486 0.003318
subject14 125.00 37.58 3.326 0.004604
subject15 99.00 37.58 2.634 0.018768
subject16 80.50 37.58 2.142 0.049003
period2 NA NA NA NA
treat1 -42.87 13.29 -3.227 0.005644
(Intercept) ***
subject2
subject3 **
subject4 *
subject5 **
subject6
subject7 **
subject8 **
subject9 *
subject10 **
subject11 **
subject12 ***
subject13 **
subject14 **
subject15 *
subject16 *
period2
treat1 **
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 37.58 on 15 degrees of freedom
Multiple R-squared: 0.7566, Adjusted R-squared: 0.4969
F-statistic: 2.914 on 16 and 15 DF, p-value: 0.02229
> # Compare with paired t-test
> t.test(vas.active, vas.plac, paired = TRUE)
Paired t-test
data: vas.active and vas.plac
t = -3.2269, df = 15, p-value = 0.005644
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-71.1946 -14.5554
sample estimates:
mean difference
-42.875
In this section, I used the ashina data set from the ISwR package to build an additive model examining the effects of treatment, subject, and period. I stacked the active and placebo measurements into one long-format data frame and fit the model lm(vas ~ subject + period + treat). The results showed that the active treatment significantly reduced pain (estimate = -42.87, p = 0.005644). Many of the subject effects were also significant, which makes sense since pain responses vary across individuals. The period2 coefficient came out as NA (a singularity) because period was derived from grp, which is constant within each subject; once the subject dummies are in the model, period is a linear combination of them and carries no additional information. A paired t-test produced identical results (mean difference = -42.875, p = 0.005644), confirming the treatment effect. The additive model, however, makes the subject-to-subject variation explicit by modeling it directly.
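As a quick consistency check (a sketch, assuming ashina is loaded as above): the treat1 coefficient from the additive model should equal the mean of the within-subject differences that the paired t-test is based on.

```r
library(ISwR)
data(ashina)

# Within-subject differences: active minus placebo
d <- ashina$vas.active - ashina$vas.plac

mean(d)    # -42.875, matching the treat1 estimate from the additive model
t.test(d)  # one-sample t-test on the differences: same t, df, and p-value
           # as t.test(vas.active, vas.plac, paired = TRUE)
```

This works because, with subject effects in the model, the treatment contrast is estimated entirely from the within-subject comparisons, which is exactly what the paired t-test uses.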
12.3: Consider the following definitions
Note:
rnorm() is a built-in R function that generates a vector of normally distributed random numbers. Its first argument is the sample size: rnorm(n) returns n random draws.
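For illustration (not part of the assignment code), rnorm(n, mean, sd) draws from a standard normal by default, and set.seed() makes the draws reproducible:

```r
set.seed(1)                  # fix the random-number stream for reproducibility
z <- rnorm(8)                # 8 draws from N(0, 1)
rnorm(3, mean = 10, sd = 2)  # 3 draws from a normal with mean 10, sd 2
```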
Your assignment:
Generate the model matrices for models z~a*b, z~a:b, etc. In your blog posting, discuss the implications. Carry out the model fits and notice which models contain singularities.
> # 12.3
> # Define the g1 function and generate the factors a and b
> g1 <- function(from, to, length.out)
+ factor(rep(from:to, length.out = length.out))
> a <- g1(2, 3, 8)
> b <- g1(2, 4, 8)
> x <- 1:8
> y <- c(1:4, 8:5)
> z <- rnorm(8)
> # Generate model matrices
> model.matrix(~ a * b)
(Intercept) a3 b3 b4 a3:b3 a3:b4
1 1 0 0 0 0 0
2 1 1 1 0 1 0
3 1 0 0 1 0 0
4 1 1 0 0 0 0
5 1 0 1 0 0 0
6 1 1 0 1 0 1
7 1 0 0 0 0 0
8 1 1 1 0 1 0
attr(,"assign")
[1] 0 1 2 2 3 3
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
attr(,"contrasts")$b
[1] "contr.treatment"
> model.matrix(~ a : b)
(Intercept) a2:b2 a3:b2 a2:b3 a3:b3 a2:b4 a3:b4
1 1 1 0 0 0 0 0
2 1 0 0 0 1 0 0
3 1 0 0 0 0 1 0
4 1 0 1 0 0 0 0
5 1 0 0 1 0 0 0
6 1 0 0 0 0 0 1
7 1 1 0 0 0 0 0
8 1 0 0 0 1 0 0
attr(,"assign")
[1] 0 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
attr(,"contrasts")$b
[1] "contr.treatment"
> model.matrix(~ a + b)
(Intercept) a3 b3 b4
1 1 0 0 0
2 1 1 1 0
3 1 0 0 1
4 1 1 0 0
5 1 0 1 0
6 1 1 0 1
7 1 0 0 0
8 1 1 1 0
attr(,"assign")
[1] 0 1 2 2
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
attr(,"contrasts")$b
[1] "contr.treatment"
> # Fit models and check for singularities
> summary(lm(z ~ a * b))
Call:
lm(formula = z ~ a * b)
Residuals:
1 2 3 4
-9.956e-01 5.472e-01 0.000e+00 -5.551e-17
5 6 7 8
4.996e-16 5.551e-17 9.956e-01 -5.472e-01
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1023 0.8033 -0.127 0.910
a3 0.4713 1.3913 0.339 0.767
b3 -1.2353 1.3913 -0.888 0.468
b4 -0.5064 1.3913 -0.364 0.751
a3:b3 0.2805 1.9676 0.143 0.900
a3:b4 1.9707 2.1253 0.927 0.452
Residual standard error: 1.136 on 2 degrees of freedom
Multiple R-squared: 0.7063, Adjusted R-squared: -0.0279
F-statistic: 0.962 on 5 and 2 DF, p-value: 0.5807
> summary(lm(z ~ a : b))
Call:
lm(formula = z ~ a:b)
Residuals:
1 2 3 4
-9.956e-01 5.472e-01 0.000e+00 0.000e+00
5 6 7 8
0.000e+00 1.665e-16 9.956e-01 -5.472e-01
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.833 1.136 1.614 0.248
a2:b2 -1.936 1.391 -1.391 0.299
a3:b2 -1.464 1.607 -0.911 0.458
a2:b3 -3.171 1.607 -1.974 0.187
a3:b3 -2.419 1.391 -1.739 0.224
a2:b4 -2.442 1.607 -1.520 0.268
a3:b4 NA NA NA NA
Residual standard error: 1.136 on 2 degrees of freedom
Multiple R-squared: 0.7063, Adjusted R-squared: -0.0279
F-statistic: 0.962 on 5 and 2 DF, p-value: 0.5807
> summary(lm(z ~ a + b))
Call:
lm(formula = z ~ a + b)
Residuals:
1 2 3 4 5 6
-0.7824 0.4275 -0.6656 -0.4263 0.2393 0.6656
7 8
1.2087 -0.6668
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.3155 0.6138 -0.514 0.634
a3 1.1108 0.7223 1.538 0.199
b3 -1.2614 0.8340 -1.513 0.205
b4 0.3724 0.9008 0.413 0.701
Residual standard error: 0.978 on 4 degrees of freedom
Multiple R-squared: 0.5647, Adjusted R-squared: 0.2382
F-statistic: 1.73 on 3 and 4 DF, p-value: 0.2986
This part of the assignment helped me understand how interactions work in model matrices and what happens when they are included or excluded in a linear model. I used a custom function to create two factors (a and b) and generated a numeric response variable z with random normal values. After fixing an issue where one of the factors had only one level, I tested three models: a full factorial (z ~ a*b), interaction-only (z ~ a:b), and additive-only (z ~ a + b). The full model included both main and interaction effects but left only two residual degrees of freedom given the small sample size. The interaction-only model produced a singularity: its six cell dummies sum to the intercept column, so the model matrix is rank-deficient and R drops one coefficient (a3:b4 is reported as NA). Note that z ~ a*b and z ~ a:b span the same column space, which is why they give identical residuals and R-squared. The additive model was simpler, though none of its terms reached significance with this random z. Overall, this exercise showed how R encodes factors through treatment contrasts, why singularities occur, and the importance of choosing a model that fits the data rather than automatically using the most complex one.
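The rank deficiency behind the singularity in z ~ a:b can be confirmed directly; a sketch, assuming a and b are defined as above:

```r
# The interaction-only design has an intercept plus 6 cell dummies,
# but the dummies sum to the intercept column, so the matrix loses a rank.
mm <- model.matrix(~ a:b)
ncol(mm)            # 7 columns (intercept + 6 cell indicators)
qr(mm)$rank         # 6: one column is linearly dependent on the others

# alias() reports which coefficient lm() dropped (shown as NA in summary)
alias(lm(z ~ a:b))
```

Checking qr(model.matrix(...))$rank against ncol() this way is a general diagnostic for any formula you suspect of being over-parameterized.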