R programming

Check the attachments.

Please read the instructions and questions carefully in the “Assignment_4_2023_Fall.pdf” file and use “Auto.csv” to finish the assignment. You should submit both 1) your R code and 2) a PDF report with answers through the link “Submit Assignment 4 Here”.

Guidelines:

· Use only R for this assignment

· Submit both R code and Report on findings

· Work is to be done individually for this assignment

Fitting a Classification Tree

1. This problem involves the OJ data set, which is part of the ISLR package. (Hint: the first three lines of code should be library(tree), library(ISLR), and attach(OJ).)

1.1 Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations. Take a screenshot of your code. (Hint: set.seed (2), train=sample())

1.2 Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the tree. Take a screenshot of the summary statistics. How many terminal nodes does the tree have? What is the training misclassification error rate?

1.3 Plot the tree and take a screenshot of the tree (Hint: plot() and text())

1.4 Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the accuracy rate?

1.5 Apply the cv.tree() function to the training set in order to determine the optimal tree size. (Use set.seed(7)). Print the results (Hint: the results should contain the size, k, method etc).

1.6 Produce a plot with tree size (i.e. size) on the x-axis and cross-validated classification error rate (i.e. dev) on the y-axis.

1.7 Which tree size corresponds to the lowest cross-validated classification error rate (i.e. dev)?

1.8 Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. Take a screenshot of a pruned tree. What is the accuracy rate for the pruned tree? Is it improved compared to the accuracy rate in (1.4)?

1.9 If cross-validation does not lead to selection of a pruned tree (i.e. the accuracy rate produced in (1.8) is lower than the one in (1.4)), then create a pruned tree with five terminal nodes. What is the accuracy rate now?
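
The workflow in questions 1.1 to 1.9 might be sketched as follows. This is only an illustrative outline, assuming the tree and ISLR packages are installed; the helper accuracy() and the object names (tree.oj, cv.oj, prune.oj) are hypothetical and not part of the assignment. Note that when cv.tree() is called with FUN = prune.misclass, dev holds the number of cross-validated misclassifications rather than a rate.

```r
# Sketch only: assumes the 'tree' and 'ISLR' packages are installed.
library(tree)
library(ISLR)

# 1.1 Random training sample of 800 rows; the rest form the test set
set.seed(2)
train <- sample(1:nrow(OJ), 800)
OJ.test <- OJ[-train, ]

# 1.2 / 1.3 Fit, summarize, and plot a classification tree
tree.oj <- tree(Purchase ~ ., data = OJ, subset = train)
print(summary(tree.oj))
plot(tree.oj)
text(tree.oj, pretty = 0)

# Hypothetical helper: prints the confusion matrix and returns
# accuracy = correct predictions / all predictions
accuracy <- function(fit, newdata) {
  pred <- predict(fit, newdata, type = "class")
  conf <- table(predicted = pred, actual = newdata$Purchase)
  print(conf)
  sum(diag(conf)) / sum(conf)
}

# 1.4 Test-set accuracy of the unpruned tree
acc.full <- accuracy(tree.oj, OJ.test)

# 1.5 / 1.6 Cross-validation guided by misclassification
set.seed(7)
cv.oj <- cv.tree(tree.oj, FUN = prune.misclass)
print(cv.oj)
plot(cv.oj$size, cv.oj$dev, type = "b",
     xlab = "Tree size", ylab = "CV misclassifications (dev)")

# 1.7 / 1.8 Size with the lowest dev. Sizes of 1 are skipped here as a
# guard, because a root-only tree cannot be plotted.
keep <- cv.oj$size > 1
best.size <- cv.oj$size[keep][which.min(cv.oj$dev[keep])]
prune.oj <- prune.misclass(tree.oj, best = best.size)
plot(prune.oj)
text(prune.oj, pretty = 0)
acc.pruned <- accuracy(prune.oj, OJ.test)
```

For question 1.9, the same prune.misclass() call with best = 5 requests a tree with five terminal nodes.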


Fitting a Regression Tree

2. In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we will seek to predict Sales using regression trees and related approaches, treating the response as a quantitative variable.

2.1 Use the validation-set approach to split the data set into a training set and a test set. (Hint: use set.seed(2). In the validation-set approach, half of the observations are selected as the training set and the other half are treated as the test set.) Take a screenshot of your code.

2.2 Fit a regression tree to the training set.

a) Use summary() to print the results. How many terminal nodes do you get? What is the RMD (Residual Mean Deviance)?

b) Plot the tree and take a screenshot of the tree;

c) What test MSE do you obtain?

2.3 Use cross-validation in order to determine the optimal level of tree complexity (use set.seed(2)).

a) Produce a plot with tree size on the x-axis and cross-validated error (i.e. dev) on the y-axis.

b) What is the optimal level of tree complexity?

c) Prune the tree to the optimal size. Does pruning the tree improve the test MSE?

2.4 Use the bagging approach to analyze this data. Take a screenshot of the results. What test MSE do you obtain? (Hint: use set.seed(1); mtry=10, since the Carseats data set has 10 predictors and the bagging approach uses all of them.)

2.5 Use random forests to analyze this data.

a) What test MSE do you obtain? (Hint: use set.seed(1); mtry=10/3, since we usually use 1/3 of the predictors when building a random forest of regression trees.)

b) Use the importance() function to determine which variables are most important. Take a screenshot of your results.

c) Plots of these importance measures can be produced using the varImpPlot() function. Take a screenshot of your output.

d) So which variables are most important?
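
Questions 2.1 to 2.5 might be approached along the following lines. This is a rough sketch under stated assumptions: the tree, ISLR, and randomForest packages are installed, and the helper mse() and all object names are hypothetical. Because mtry must be a whole number, the sketch uses floor(10/3), which is 3, in place of the hinted 10/3.

```r
# Sketch only: assumes 'tree', 'ISLR', and 'randomForest' are installed.
library(tree)
library(ISLR)
library(randomForest)

# 2.1 Validation-set split: half of the rows for training, half for testing
set.seed(2)
train <- sample(1:nrow(Carseats), nrow(Carseats) / 2)
cs.test <- Carseats[-train, ]
test.y <- cs.test$Sales

# Hypothetical helper: mean squared error against the test responses
mse <- function(pred) mean((pred - test.y)^2)

# 2.2 Regression tree on the training rows
tree.cs <- tree(Sales ~ ., data = Carseats, subset = train)
print(summary(tree.cs))
plot(tree.cs)
text(tree.cs, pretty = 0)
mse.tree <- mse(predict(tree.cs, newdata = cs.test))

# 2.3 Cross-validation (default FUN = prune.tree, so dev is deviance)
set.seed(2)
cv.cs <- cv.tree(tree.cs)
plot(cv.cs$size, cv.cs$dev, type = "b",
     xlab = "Tree size", ylab = "CV deviance (dev)")
# Sizes of 1 are skipped as a guard against a root-only tree
keep <- cv.cs$size > 1
best.size <- cv.cs$size[keep][which.min(cv.cs$dev[keep])]
prune.cs <- prune.tree(tree.cs, best = best.size)
mse.pruned <- mse(predict(prune.cs, newdata = cs.test))

# 2.4 Bagging: a random forest that tries all 10 predictors at each split
set.seed(1)
bag.cs <- randomForest(Sales ~ ., data = Carseats, subset = train,
                       mtry = 10, importance = TRUE)
print(bag.cs)
mse.bag <- mse(predict(bag.cs, newdata = cs.test))

# 2.5 Random forest with roughly one third of the predictors per split
set.seed(1)
rf.cs <- randomForest(Sales ~ ., data = Carseats, subset = train,
                      mtry = floor(10 / 3), importance = TRUE)
mse.rf <- mse(predict(rf.cs, newdata = cs.test))
print(importance(rf.cs))
varImpPlot(rf.cs)

print(c(tree = mse.tree, pruned = mse.pruned, bagging = mse.bag, rf = mse.rf))
```

The final line prints the four test MSE values side by side, which makes the comparisons asked for in 2.3c, 2.4, and 2.5a easier to report.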

What to submit:

1. R code.

a. Should include all the code to accomplish the tasks.

b. Clear and concise comments to indicate what part of the assignment each code chunk pertains to.

c. Code should be easily readable.

d. Filename should be in the format of: LastnameFirstname_A4.R

2. Report.

a. Take screenshots of your outputs in R Studio and answer all the questions. Submit in PDF format.

b. Answers questions clearly and concisely.

c. Includes appropriate plots. Make sure the plots are properly labeled.

The assignment will be graded on the correctness of the answers, comprehensiveness of the analysis, clarity of results’ presentation and neatness of the report.
