
R programming

Check the attachments.

Please read the instructions and questions in the "Assignment_3_ 2024.pdf" file carefully and use "Auto.csv" to finish the assignment. You should submit both (1) your R code and (2) a PDF report with your answers through the "Submit Assignment 3 Here" link.

Guidelines:

· Use R only for Part 2 of this assignment

· Submit both the R code and a report on your findings

· Work is to be done individually for this assignment

1. Suppose we collect data for a group of students in a statistics class with variables $X_1$ = hours studied, $X_2$ = undergrad GPA, and $Y$ = receive an A. We fit a logistic regression and produce estimated coefficients $\hat{\beta}_0 = -7$, $\hat{\beta}_1 = 0.06$, $\hat{\beta}_2 = 1$. (You do not need R code to solve this question.)

(1) Estimate the probability that a student who studies for 50 hours and has an undergrad GPA of 3.5 gets an A in the class. (Hint: for logistic regression, $p(X) = \dfrac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2}}$.)
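The hint's formula can be checked numerically. The R sketch below is a minimal illustration, assuming only the three coefficient estimates stated in question 1; `logistic_prob` is a hypothetical helper name, and `plogis()` is base R's logistic distribution function, which computes the same $e^z/(1+e^z)$.

```r
# Estimated coefficients from question 1: intercept, hours studied, GPA
b <- c(b0 = -7, b1 = 0.06, b2 = 1)

# Hypothetical helper: p(X) = exp(eta) / (1 + exp(eta)), eta = linear predictor
logistic_prob <- function(hours, gpa, coef = b) {
  eta <- coef[["b0"]] + coef[["b1"]] * hours + coef[["b2"]] * gpa
  exp(eta) / (1 + exp(eta))
}

p <- logistic_prob(hours = 50, gpa = 3.5)
p                          # linear predictor is -7 + 3 + 3.5 = -0.5, so p is about 0.378
all.equal(p, plogis(-0.5)) # same value via base R's logistic CDF
```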

(2) How many hours would a student with a GPA of 3.4 need to study to have a 50% chance of getting an A in the class? (Hint: we can use the equation $\log\left(\dfrac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$.)
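Setting $p(X) = 0.5$ makes the left-hand side of the hint's equation $\log(1) = 0$, so the hours follow from solving $\hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 \cdot 3.4 = 0$ for $X_1$. A small R sketch of that rearrangement, where `hours_needed` is a hypothetical helper rather than anything from the textbook:

```r
# Estimated coefficients from question 1
b <- c(b0 = -7, b1 = 0.06, b2 = 1)

# Hypothetical helper: solve b0 + b1 * hours + b2 * gpa = log(p / (1 - p)) for hours.
# qlogis(p) is base R's log-odds (logit) function; qlogis(0.5) is 0.
hours_needed <- function(gpa, p = 0.5, coef = b) {
  (qlogis(p) - coef[["b0"]] - coef[["b2"]] * gpa) / coef[["b1"]]
}

h <- hours_needed(gpa = 3.4)
h   # (0 + 7 - 3.4) / 0.06, i.e. 60 hours

# Check: plugging h back into the model gives a probability of 0.5
plogis(b[["b0"]] + b[["b1"]] * h + b[["b2"]] * 3.4)
```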

2. Questions (3) to (8) should be answered using the Weekly data set, which is part of the ISLR package. These data are similar in nature to the Smarket data from the Chapter 4 lab, except that they contain 1,089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010.

(3) Use library(ISLR) (or require(ISLR)) to load the ISLR package.

a) Use the summary() function to produce numerical summaries of the Weekly data.

b) Use the pairs() function to produce a scatterplot matrix of the variables in the data.

c) Do you see a relationship between Year and Volume? What is the pairwise correlation between Year and Volume?

d) Is the relationship positive or negative?
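One possible R sketch for (3), assuming the ISLR package is installed. The plot at the end is an optional, labeled way to eyeball the sign of the Year and Volume relationship; it is an addition, not something the question requires.

```r
# install.packages("ISLR")  # run once if the package is not installed
library(ISLR)

# a) Numerical summaries of every column
summary(Weekly)

# b) Scatterplot matrix; Direction is a factor and is drawn as its integer codes
pairs(Weekly, main = "Weekly data: scatterplot matrix")

# c) Pairwise correlation between Year and Volume
cor(Weekly$Year, Weekly$Volume)

# Correlation matrix of the numeric columns (column 9, Direction, is a factor)
round(cor(Weekly[, -9]), 2)

# d) Optional labeled plot to judge whether the trend is positive or negative
plot(Weekly$Year, Weekly$Volume,
     xlab = "Year", ylab = "Volume", main = "Weekly volume by year")
```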

(4) Use the full data set to perform a logistic regression with Direction as the dependent variable and Lag1, Lag2, Lag3, Lag4, and Volume as independent variables (i.e., predictors). Use the summary() function to print the results. Do any of the predictors appear to be statistically significant? If so, which ones? Take a screenshot of your output and then answer the questions.
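A sketch of the model in (4), following the `glm(..., family = binomial)` pattern from the Chapter 4 lab; the object name `glm.full` is an arbitrary choice. `coef(summary(...))` returns the coefficient table with the p-values in the `Pr(>|z|)` column.

```r
library(ISLR)

# Logistic regression on the full data set
glm.full <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Volume,
                data = Weekly, family = binomial)

summary(glm.full)

# Coefficient table only; check the Pr(>|z|) column for significance
coef(summary(glm.full))

# Shows which level is coded as 1; predicted probabilities refer to "Up"
contrasts(Weekly$Direction)
```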

(5) Based on the results of (4), compute the confusion matrix and the overall fraction of correct predictions. (Hint: refer to the code from the Chapter 4 lab session in the textbook; use 0.5 as the predicted-probability cut-off for the classifier.) What is the precision rate? What is the recall rate? Take a screenshot of your output and then answer the questions.
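Precision and recall are not printed by `table()` directly, so a small helper can compute them from the confusion matrix. This sketch assumes "Up" is the positive class (precision = TP / (TP + FP), recall = TP / (TP + FN)); `class_metrics` is a hypothetical name, and the toy labels below are made up purely to show the arithmetic.

```r
# Hypothetical helper: confusion matrix and rates for a two-class classifier.
# Assumes "Up" is the positive class; change `positive` if you define it otherwise.
class_metrics <- function(pred, truth, positive = "Up", levels = c("Down", "Up")) {
  pred  <- factor(pred,  levels = levels)
  truth <- factor(truth, levels = levels)
  cm <- table(Predicted = pred, Actual = truth)  # rows = predictions, cols = truth
  tp <- cm[positive, positive]
  fp <- sum(cm[positive, ]) - tp   # predicted positive, actually negative
  fn <- sum(cm[, positive]) - tp   # actually positive, predicted negative
  list(confusion = cm,
       accuracy  = mean(pred == truth),
       precision = tp / (tp + fp),
       recall    = tp / (tp + fn))
}

# Toy example with made-up labels (not Weekly results)
truth <- c("Up", "Up", "Down", "Down", "Up")
pred  <- c("Up", "Down", "Up", "Down", "Up")
m <- class_metrics(pred, truth)
m$confusion
m$accuracy   # 3 of 5 correct: 0.6
m$precision  # 2 true positives out of 3 predicted "Up"
m$recall     # 2 true positives out of 3 actual "Up"
```

For question (5), `truth` would be `Weekly$Direction` and `pred` would be `ifelse(predict(fit, type = "response") > 0.5, "Up", "Down")`, where `fit` stands for whatever name you gave the model from (4).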

(6) Now fit the logistic regression model using a training data period from 1990 to 2009 with Lag2 as the only predictor. Compute the confusion matrix and the overall fraction of correct predictions for the held-out data (i.e., the test data from 2010). In addition, calculate the precision rate and recall rate. (Hint: refer to the code from the Chapter 4 lab session in the textbook; use 0.5 as the predicted-probability cut-off for the classifier.) Take a screenshot of your output and then answer the questions.
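A sketch of (6), assuming the split described in the question (train on 1990 to 2009, test on 2010) and treating "Up" as the positive class; the object names are illustrative choices modeled on the Chapter 4 lab.

```r
library(ISLR)

# Training period 1990-2009; held-out (test) data is 2010, as the question specifies
train <- Weekly$Year <= 2009
Weekly.2010 <- Weekly[!train, ]
Direction.2010 <- Weekly$Direction[!train]

# Logistic regression with Lag2 as the only predictor, fit on the training rows only
glm.lag2 <- glm(Direction ~ Lag2, data = Weekly, family = binomial, subset = train)

# Predicted P(Direction = "Up") for 2010, classified with a 0.5 cut-off;
# factor() keeps both levels in the table even if one class is never predicted
probs <- predict(glm.lag2, Weekly.2010, type = "response")
pred <- factor(ifelse(probs > 0.5, "Up", "Down"), levels = levels(Weekly$Direction))

cm <- table(Predicted = pred, Actual = Direction.2010)
cm
mean(pred == Direction.2010)        # overall fraction of correct predictions

# Precision and recall, treating "Up" as the positive class (an assumption)
cm["Up", "Up"] / sum(cm["Up", ])    # precision: TP / (TP + FP)
cm["Up", "Up"] / sum(cm[, "Up"])    # recall:    TP / (TP + FN)
```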

(7) Repeat (6) using KNN with K = 1. Compute the confusion matrix and the overall fraction of correct predictions for the held-out data. In addition, calculate the precision rate and recall rate. (Hint: refer to the code from the Chapter 4 lab session in the textbook; if you encounter an error such as "dims of 'test' and 'train' differ", try knn(data.frame(train.X), …).) Use set.seed(1).
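A sketch of (7) using `knn()` from the `class` package (shipped with standard R installations). Wrapping the single predictor in `data.frame()` keeps it two-dimensional, which is the workaround the hint describes for the "dims of 'test' and 'train' differ" error; the variable names are assumptions.

```r
library(ISLR)
library(class)   # provides knn()

train <- Weekly$Year <= 2009

# One-column data frames keep the predictor two-dimensional for knn()
train.X <- data.frame(Lag2 = Weekly$Lag2[train])
test.X  <- data.frame(Lag2 = Weekly$Lag2[!train])
train.Direction <- Weekly$Direction[train]
Direction.2010  <- Weekly$Direction[!train]

set.seed(1)  # knn() breaks ties among neighbors at random
knn.pred <- knn(train.X, test.X, train.Direction, k = 1)

cm <- table(Predicted = knn.pred, Actual = Direction.2010)
cm
mean(knn.pred == Direction.2010)   # overall fraction correct
cm["Up", "Up"] / sum(cm["Up", ])   # precision, "Up" as the positive class
cm["Up", "Up"] / sum(cm[, "Up"])   # recall
```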

(8) Repeat (6) using KNN with K=10. Compute the confusion matrix and the overall fraction of correct predictions for the held-out data. In addition, please calculate the precision rate and recall rate.
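Because only `k` changes between (7) and (8), one option is to wrap the KNN steps in a function and call it with `k = 10`. `knn_eval` is a hypothetical helper; it re-runs `set.seed(1)` inside so both values of `k` use the same tie-breaking stream, which is a choice, not a requirement of the question.

```r
library(ISLR)
library(class)

train <- Weekly$Year <= 2009
train.X <- data.frame(Lag2 = Weekly$Lag2[train])
test.X  <- data.frame(Lag2 = Weekly$Lag2[!train])
Direction.2010 <- Weekly$Direction[!train]

# Hypothetical helper: fit KNN for a given k and return held-out metrics
knn_eval <- function(k) {
  set.seed(1)
  pred <- knn(train.X, test.X, Weekly$Direction[train], k = k)
  cm <- table(Predicted = pred, Actual = Direction.2010)
  c(k         = k,
    accuracy  = mean(pred == Direction.2010),
    precision = cm["Up", "Up"] / sum(cm["Up", ]),  # "Up" as the positive class
    recall    = cm["Up", "Up"] / sum(cm[, "Up"]))
}

knn_eval(10)

# Side by side with the K = 1 result from (7)
rbind(knn_eval(1), knn_eval(10))
```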

3. The quantity $\dfrac{p(X)}{1 - p(X)}$ is called the odds. Please answer the following questions (you do not need R code to solve them):

(9) On average, what fraction of people with odds of 0.35 of defaulting on their credit card payment will in fact default?

(10) Suppose that an individual has a 15% chance of defaulting on her credit card payment. What are the odds that she will default?
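Both questions only need the definition of the odds and its inverse: solving $\text{odds} = p/(1-p)$ for $p$ gives $p = \text{odds}/(1+\text{odds})$. A minimal R sketch with two hypothetical helper names:

```r
# Odds from a probability: p / (1 - p)
prob_to_odds <- function(p) p / (1 - p)

# Probability from odds: invert odds = p / (1 - p) to get p = odds / (1 + odds)
odds_to_prob <- function(odds) odds / (1 + odds)

odds_to_prob(0.35)  # 0.35 / 1.35, about 0.259
prob_to_odds(0.15)  # 0.15 / 0.85, about 0.176
```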

4. The logistic regression model that results from predicting the probability of default from student status can be seen in the following table. We create a dummy variable that takes on a value of 1 for students and 0 for non-students. Please answer the following questions (You do not need R code for these questions).

(11) How should the coefficient on Student[Yes] be interpreted?

(12) For a non-student, what are the estimated odds of default? Is the probability of default less than the probability of not defaulting?
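The table referred to above is not reproduced here, so the sketch below uses made-up coefficients (`beta0 = -2`, `beta1 = 0.5`) purely for illustration; substitute the values from the table. With the dummy equal to 0, the log-odds reduce to $\hat{\beta}_0$, so the odds are $e^{\hat{\beta}_0}$, and being a student multiplies the odds by $e^{\hat{\beta}_1}$.

```r
# Illustrative values only, NOT taken from the assignment's table
beta0 <- -2    # intercept
beta1 <- 0.5   # coefficient on Student[Yes]

# Non-student: dummy = 0, so the log-odds are just beta0
odds_nonstudent <- exp(beta0)
p_nonstudent <- odds_nonstudent / (1 + odds_nonstudent)  # same as plogis(beta0)

# Default is less likely than no default exactly when the odds are below 1,
# i.e. when beta0 < 0
p_nonstudent < 1 - p_nonstudent

# Student[Yes]: being a student multiplies the odds of default by exp(beta1)
exp(beta1)
```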

What to submit:

1. R code.

a. Should include all the code needed to accomplish the tasks.

b. Clear and concise comments indicating which part of the assignment each code chunk pertains to.

c. Code should be easily readable.

d. Filename should be in the format LastnameFirstname_A3.R.

2. Report.

a. Take screenshots of your outputs in RStudio and answer all the questions. Submit in PDF format.

b. Answer questions clearly and concisely.

c. Include appropriate plots. Make sure the plots are properly labeled.

The assignment will be graded on the correctness of the answers, the comprehensiveness of the analysis, the clarity of the presentation of results, and the neatness of the report.

