## Linear Regression

1) Solution

$n_1$ = number of observations in Y = 24
$n_2$ = number of observations in X = 24
$s_1^2 = \sum (Y - \bar{Y})^2 / (n_1 - 1) = 52744$
$s_2^2 = \sum (X - \bar{X})^2 / (n_2 - 1) = 1113194$
SE of slope $= \sqrt{\sum (Y - \hat{Y})^2 / (n - 2)} \,/\, \sqrt{\sum (X - \bar{X})^2} = 1.49$
$b_1 = \sum (X - \bar{X})(Y - \bar{Y}) \,/\, \sum (X - \bar{X})^2 = 0.19465675$
$b_0 = \bar{Y} - b_1 \bar{X} = 244.4506$
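The slope, intercept, and standard-error formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `simple_ols` and the toy data are assumptions made here, not the 24 observations from the exercise.

```python
import math

def simple_ols(x, y):
    """Return (b0, b1, se_b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of X and sum of cross-products of X and Y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept
    # Residual sum of squares, then the standard error of the slope
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b0, b1, se_b1

# Hypothetical toy data, used only to exercise the formulas.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]
b0_demo, b1_demo, se_demo = simple_ols(x_demo, y_demo)
print(b0_demo, b1_demo, se_demo)
```

Passing the actual X and Y columns to the same function should reproduce the reported slope and intercept, provided the data match.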

2) a) The relationship is linear. Below is the scatter plot.

b) The correlation is 0.89.

c) Hours = 244.45 + 0.19 * New Subscriptions
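As a rough consistency check, the correlation can be recovered from the slope through r = b1 * (s_X / s_Y). The sketch below assumes that 52744 is the sample variance of Y (Hours) and 1113194 is the sample variance of X (New Subscriptions). The helper `predict_hours` and the input value 1000 are hypothetical and only illustrate how the fitted line is used.

```python
import math

b0 = 244.4506        # intercept reported in the solution
b1 = 0.19465675      # slope reported in the solution
var_y = 52744        # assumed: sample variance of Y (Hours)
var_x = 1113194      # assumed: sample variance of X (New Subscriptions)

# Correlation implied by the slope and the two standard deviations
r_from_slope = b1 * math.sqrt(var_x / var_y)
print(round(r_from_slope, 2))

def predict_hours(new_subscriptions):
    """Predicted hours from the fitted line Hours = b0 + b1 * New Subscriptions."""
    return b0 + b1 * new_subscriptions

# 1000 new subscriptions is a hypothetical input, not a value from the data.
print(predict_hours(1000))
```

Under these assumptions the implied correlation rounds to the 0.89 reported in part b).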

A federation of play-based nursery and elementary schools has found, in the children at its schools, a correlation of 0.65 between time spent playing with brick-based construction toys in nursery school (in minutes per day) and first grade math score (on a scale of 0 to 100). Which of the following is implied by this information?

(A) Playing with brick-based construction toys in nursery school causes an increase in first grade math score.
(B) If the time spent playing with brick-based construction toys had been recorded in hours per day instead of minutes per day, then the correlation would have been 0.65/60.
(C) When time spent playing with brick-based construction toys (in minutes per day) increases by 1, the average increase in first grade math score is 0.65.
(D) If two children are selected from those included in the study, then the one with the greater time spent playing with brick-based construction toys will have the higher first grade math score.
(E) Less than half of the variation in first grade math score can be explained by the regression line of first grade math score on time spent playing with brick-based construction toys in nursery school.

Solution
Since r = 0.65, $R^2 = 0.65 \times 0.65 \approx 0.42$, so the regression line explains about 42% of the variation in first grade math score.
Therefore the answer is (E): less than half of the variation in first grade math score can be explained by the regression line of first grade math score on time spent playing with brick-based construction toys in nursery school.
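The reasoning behind (E), and the reason (B) fails, can be checked numerically. The sketch below uses made-up play times and scores (hypothetical, not the federation's data) to show that converting minutes to hours leaves the correlation unchanged, and it squares r = 0.65 to get the share of variation explained.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only.
minutes = [10.0, 20.0, 30.0, 45.0, 60.0]
scores = [55.0, 60.0, 58.0, 72.0, 80.0]
hours = [m / 60 for m in minutes]   # same play times, rescaled to hours

r_minutes = pearson_r(minutes, scores)
r_hours = pearson_r(hours, scores)
print(r_minutes, r_hours)           # the two values agree up to rounding error

r_squared = 0.65 ** 2               # coefficient of determination for r = 0.65
print(r_squared, r_squared < 0.5)
```

Because correlation is unit-free, rescaling X cannot turn 0.65 into 0.65/60, which rules out (B).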

3) Please consider the data presented above for the monthly sales of Ever-cool brand of refrigerators in 1,000s of dollars and answer the following questions:

Independent variables are

Price (in dollars); Promotional Expenditure (in 1,000s of dollars); Quality of service (scale of 1-10); location (categorical variable: city area: 1; suburban area: 0).

A. Based on the relevant residual plots, do you see any evidence of violation of assumptions (Linearity, Normality, Equal variance)?
B. State the multiple regression equation and interpret the meaning of the slopes, b1, b2, b3, and b4.
C. At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these results, indicate the independent variables to include in this model (based on the t-test results).
D. Construct a 95% confidence interval estimate of the population slope between Quality and the monthly sales. (Please note that Minitab can't do this directly; however, you may use the relevant information from the Minitab output and then construct the confidence interval manually.)
E. Perform the overall F-test and comment on the significance of the model.
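The manual construction asked for in part D follows the usual form: estimate ± t critical value × standard error. The sketch below is a hedged illustration. The coefficient 0.275 is assumed to be the Quality slope from the fitted equation. The standard error and the t critical value are hypothetical placeholders: in practice the standard error comes from the Minitab coefficient table, and the critical value comes from a t table with n - k - 1 degrees of freedom.

```python
def slope_confidence_interval(b, se, t_crit):
    """Return (lower, upper) for the interval b +/- t_crit * se."""
    margin = t_crit * se
    return b - margin, b + margin

b3 = 0.275       # assumed: Quality coefficient from the fitted equation
se_b3 = 0.10     # HYPOTHETICAL standard error; read the real one from the output
t_crit = 2.086   # HYPOTHETICAL: two-sided 95% t value for 20 degrees of freedom

lower, upper = slope_confidence_interval(b3, se_b3, t_crit)
print(lower, upper)
```

If the resulting interval excludes zero, the slope is significant at the 0.05 level, which agrees with the corresponding t-test.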

Solution
A) Linearity: From the above plot, we can see that the data are fairly linear.
Equal variance: The residuals do not follow any pattern; they are completely random. Hence there is no violation of homoscedasticity.
Normality: From the above table, we can see that the data are approximately normally distributed; apart from the tails, there is not much deviation.

B) Multiple linear Regression equation:

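Sales = 327.96 - 0.222 * Price + 2.841 * Promotion + 0.275 * Quality + 3.738 * Location
b1 = -0.222: For each one-dollar increase in price, monthly sales decrease by 0.222 (in 1,000s of dollars), keeping all other variables constant.
b2 = 2.841: For each additional 1,000 dollars of promotional expenditure, monthly sales increase by 2.841 (in 1,000s of dollars), keeping all other variables constant.
b3 = 0.275: For each one-point increase in quality of service, monthly sales increase by 0.275 (in 1,000s of dollars), keeping all other variables constant.
b4 = 3.738: Monthly sales in a city area are 3.738 (in 1,000s of dollars) higher than in a suburban area, keeping all other variables constant.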
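To make the slope interpretations concrete, the fitted equation can be wrapped in a small function. This sketch assumes the coefficients attach to the predictors in the order they are listed in the problem (Price, Promotional Expenditure, Quality, Location). The function name `predict_sales` and the input values are hypothetical.

```python
def predict_sales(price, promotion, quality, city):
    """Predicted monthly sales (in 1,000s of dollars) from the fitted equation.

    price: dollars; promotion: 1,000s of dollars; quality: 1-10 scale;
    city: 1 for a city area, 0 for a suburban area.
    Assumes coefficients map to predictors in the order listed in the problem.
    """
    return 327.96 - 0.222 * price + 2.841 * promotion + 0.275 * quality + 3.738 * city

# Hypothetical store profile, used only to illustrate the slope meanings.
base = predict_sales(price=500, promotion=10, quality=7, city=0)
plus_one_dollar = predict_sales(price=501, promotion=10, quality=7, city=0)
in_city = predict_sales(price=500, promotion=10, quality=7, city=1)

print(plus_one_dollar - base)   # change for a one-dollar price increase
print(in_city - base)           # city versus suburban difference
```

Each difference isolates one coefficient because all other inputs are held fixed, which is what "keeping all other variables constant" means in the interpretations.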

C) From the above table, we can see that every variable except City:1/Suburban:0 is significant for the model, as their p-values are less than 0.05. The p-value of City:1/Suburban:0 is 0.289, which is greater than 0.05, so this variable is not significant.

D) E)    ANOVA 