Please read the instructions and questions in the “Assignment_2_2024.pdf” file carefully and use “Auto.csv” to finish the assignment. You should submit both 1) your R code and 2) a PDF report with your answers through the link “Submit Assignment 2 Here”.
Guidelines:
➢ Use R and RStudio for this assignment (do not use Excel or any other software).
➢ Submit both the R code and a PDF report on your findings.
➢ Work on this assignment individually.

Simple Linear Regression

This exercise involves the Auto data set studied in the lab, which can be found in the file Auto.csv. Make sure that the missing values have been removed from the data.

1) Use read.csv() to load Auto.csv. Use na.omit() to remove the rows containing missing observations. Use the lm() function to perform a simple linear regression with mpg as the dependent variable and weight as the predictor. Use the summary() function to print the regression results. Take a screenshot of your output.
2) From the regression results, please answer the following questions:
   a. Is there a relationship between the predictor (weight) and the dependent variable (DV) (mpg)?
   b. How significant is the relationship between the predictor (weight) and the DV (mpg)?
   c. Is the relationship between the predictor and the DV positive or negative?
   d. What is the predicted mpg associated with a weight of 2000? What are the associated 95% confidence and prediction intervals?
3) Please make a scatter plot of the dependent variable (mpg) against the predictor (weight). Please display the least squares regression line in red. Take a screenshot of your output.
4) Please produce the four diagnostic plots of the least squares regression fit. Comment on each plot and state whether it indicates any problems with the fit.

Multiple Linear Regression

This exercise relates to the College data set, which can be found in the file College.csv. It contains a number of variables for 777 different universities and colleges in the US.
The variables are:
• Private : Public/private indicator
• Apps : Number of applications received
• Accept : Number of applicants accepted
• Enroll : Number of new students enrolled
• Top10perc : New students from top 10% of high school class
• Top25perc : New students from top 25% of high school class
• F.Undergrad : Number of full-time undergraduates
• P.Undergrad : Number of part-time undergraduates
• Outstate : Out-of-state tuition
• Room.Board : Room and board costs
• Books : Estimated book costs
• Personal : Estimated personal spending
• PhD : Percent of faculty with Ph.D.’s
• Terminal : Percent of faculty with terminal degree
• S.F.Ratio : Student/faculty ratio
• perc.alumni : Percent of alumni who donate
• Expend : Instructional expenditure per student
• Grad.Rate : Graduation rate

5) First, load the data. Use the lm() function to perform a multiple linear regression with Grad.Rate as the dependent variable and the following 10 variables as the predictors (independent variables): Private, Apps, Accept, Enroll, Top10perc, Top25perc, PhD, Terminal, S.F.Ratio, and Expend. Use the summary() function to print the results. Take a screenshot of your output.
6) From the results, which predictors appear to have statistically significant effects on the dependent variable?
7) What do the results imply? For example, the positive coefficient of Top10perc can be interpreted as follows: new students from the top 10% of their high school class have a positive and significant influence on the graduation rate. How would you interpret the coefficients of all the other significant variables?
8) First, use the * symbol to fit a linear regression model with interaction effects (the dependent variable is Grad.Rate; the two independent variables are Private and Top10perc; the interaction term is the product of Private and Top10perc).
Then, use the : symbol to fit the same linear regression model with interaction effects (the dependent variable is Grad.Rate; the two independent variables are Private and Top10perc; the interaction term is the product of Private and Top10perc). Take a screenshot of your output and then answer the question: is the interaction term significant?
9) Use the lm() function to perform a multiple linear regression with Grad.Rate as the dependent variable and Private, Apps, Accept, Enroll, Top10perc, Top25perc, PhD, Terminal, S.F.Ratio, and Expend as the predictors (independent variables), as in Question (5). Then compute the VIF for each predictor (refer to pages 101-102 of the textbook to understand this concept). Do the VIF values for some variables indicate a problematic amount of collinearity? Take a screenshot of your output and then answer the question.
10) Use the lm() function to perform a multiple linear regression with Grad.Rate as the dependent variable and Private, Apps, Accept, Enroll, Top10perc, Top25perc, PhD, Terminal, S.F.Ratio, and Expend as the predictors (independent variables), as in Question (5). Then use the Backward Selection Method (refer to p. 79 of the textbook) to arrive at the optimal model, in which all the remaining variables have p-values below 0.05. Take a screenshot of the regression results for the final optimal model.

What to submit:
1. R code.
   a. Include all the code needed to accomplish the tasks.
   b. Add clear and concise comments indicating which part of the assignment each code chunk pertains to.
   c. Make the code easily readable.
   d. Name the file in the format LastnameFirstname_A2.R
2. Report.
   a. Take screenshots of your outputs in RStudio and answer all the questions.
   b. Submit in PDF format.
   c. Answer the questions clearly and concisely.
   d. Include appropriate plots. Make sure the plots are properly labeled.
   e. The assignment will be graded on the correctness of the answers, the comprehensiveness of the analysis, the clarity of the presentation of results, and the neatness of the report.
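
For orientation only, the calls named in questions 1) to 3) can be sketched on made-up data. This is a minimal sketch, not a solution: the data frame demo, its simulated coefficients, and the object names fit, new_car, conf_int and pred_int are all assumptions introduced here, and the numbers it prints say nothing about Auto.csv.

```r
# Synthetic stand-in for the real data (an assumption; NOT Auto.csv).
set.seed(1)
demo <- data.frame(weight = runif(100, 1600, 5000))
demo$mpg <- 40 - 0.006 * demo$weight + rnorm(100, sd = 4)  # invented relationship
demo$mpg[c(3, 17)] <- NA      # inject two missing values for illustration

demo <- na.omit(demo)         # drop rows that contain missing observations

# Simple linear regression: mpg as the dependent variable, weight as the predictor.
fit <- lm(mpg ~ weight, data = demo)
print(summary(fit))

# Predicted value at weight = 2000 with 95% confidence and prediction intervals.
new_car  <- data.frame(weight = 2000)
conf_int <- predict(fit, newdata = new_car, interval = "confidence", level = 0.95)
pred_int <- predict(fit, newdata = new_car, interval = "prediction", level = 0.95)
print(conf_int)
print(pred_int)

# Scatter plot with the least squares line drawn in red.
plot(demo$weight, demo$mpg, xlab = "weight", ylab = "mpg")
abline(fit, col = "red")
```

The prediction interval is always wider than the confidence interval at the same point, because it also accounts for the variability of an individual observation around the fitted line.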