Computer Science Homework 2

Homework 2.

Question 1. Decision Tree Classifier [10 Points]

Data: The zip file “hw2.q1.data.zip” contains 3 CSV files:

· “hw2.q1.train.csv” contains 10,000 rows and 26 columns. The first column ‘y’ is the output variable with 2 classes: 0, 1. The remaining 25 columns contain input features: x_1, …, x_25.

· “hw2.q1.test.csv” contains 2,000 rows and 26 columns. The first column ‘y’ is the output variable with 2 classes: 0, 1. The remaining 25 columns contain input features: x_1, …, x_25.

· “hw2.q1.new.csv” contains 30 rows and 26 columns. The first column ‘ID’ is an identifier for 30 unlabeled samples. The remaining 25 columns contain input features: x_1, …, x_25.

Task 1. [4 points]

Use 5-fold cross-validation with the 10,000 labeled examples from “hw2.q1.train.csv” to determine the fewest rules with which a decision tree classifier can achieve a mean cross-validation accuracy of at least 0.96. Report the number of rules needed, the cross-validation accuracy obtained, and all the hyper-parameter values for the DecisionTreeClassifier.

Fewest number of rules needed: ………………. (to achieve mean cross-validation accuracy of at least 0.96)

Mean cross-validation accuracy: ………………………. (rounded to 4 decimal places)

Non-default hyper-parameter values for selected DecisionTreeClassifier model:
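
One way to approach this search is sketched below. It assumes that "number of rules" means the number of leaves, since each root-to-leaf path in a decision tree can be read as one rule, and it caps the leaf count with the `max_leaf_nodes` parameter. The helper name `fewest_leaves_for_target` and the synthetic demo data are illustrative only; with the real file you would load “hw2.q1.train.csv”, take the ‘y’ column as the target, and call the helper with `target=0.96`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def fewest_leaves_for_target(X, y, target=0.96, max_leaves=64, cv=5, seed=0):
    """Return (leaf cap, mean CV accuracy) for the smallest cap reaching target.

    Hypothetical helper: returns None if no cap up to max_leaves reaches target.
    """
    for n_leaves in range(2, max_leaves + 1):
        tree = DecisionTreeClassifier(max_leaf_nodes=n_leaves, random_state=seed)
        mean_acc = cross_val_score(tree, X, y, cv=cv, scoring="accuracy").mean()
        if mean_acc >= target:
            return n_leaves, round(mean_acc, 4)
    return None


# Synthetic stand-in with 25 features (NOT the assignment data).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=25, n_informative=3, n_redundant=0, random_state=0
)
result = fewest_leaves_for_target(X_demo, y_demo, target=0.80, max_leaves=20)
print(result)
```

Note that `max_leaf_nodes` is an upper bound: a tree fitted on a given fold may end up with fewer leaves, which can be checked on a fitted tree with `get_n_leaves()`.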


Task 2. [2 Points]

Train a DecisionTreeClassifier with the hyper-parameter values determined in Task 1 on all 10,000 training samples and use it to predict the output class ‘y’ for the 2,000 examples in “hw2.q1.test.csv”. Report the following:

· Accuracy on 2,000 test examples: …………………… (rounded to 4 decimal places)

· Classification report for the 2,000 test examples:

· Confusion matrix for the 2,000 test examples:
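
The three reported quantities can be produced with functions from `sklearn.metrics`. The following is a minimal sketch, not a reference solution: the helper `evaluate_on_test` and the synthetic data are assumptions made for illustration, and `max_leaf_nodes=8` is a placeholder rather than the value Task 1 would yield.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_on_test(model, X_train, y_train, X_test, y_test):
    """Fit on the full training set, then score the held-out test set."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "accuracy": round(accuracy_score(y_test, y_pred), 4),
        "report": classification_report(y_test, y_pred, digits=4),
        "confusion": confusion_matrix(y_test, y_pred),
    }


# Synthetic stand-in for the train/test CSV files (NOT the assignment data).
X_all, y_all = make_classification(
    n_samples=600, n_features=25, n_informative=3, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=100, stratify=y_all, random_state=0
)

# Placeholder hyper-parameter value; substitute the one found in Task 1.
summary = evaluate_on_test(
    DecisionTreeClassifier(max_leaf_nodes=8, random_state=0), X_tr, y_tr, X_te, y_te
)
print("Accuracy:", summary["accuracy"])
print(summary["report"])
print(summary["confusion"])
```

In the confusion matrix returned by scikit-learn, rows are true classes and columns are predicted classes.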


Task 3. [2 Points]

Use the model trained in Task 2 to predict the output class ‘y’ for the 30 examples in “hw2.q1.new.csv”. Specify the predicted classes in the table below:

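ID      predicted y
1       ……………
2       ……………
3       ……………
4       ……………
5       ……………
6       ……………
7       ……………
8       ……………
9       ……………
10      ……………
11      ……………
12      ……………
13      ……………
14      ……………
15      ……………
16      ……………
17      ……………
18      ……………
19      ……………
20      ……………
21      ……………
22      ……………
23      ……………
24      ……………
25      ……………
26      ……………
27      ……………
28      ……………
29      ……………
30      ……………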
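
Filling in a table like the one above amounts to dropping the ‘ID’ column, predicting on the remaining features, and pairing each ID with its prediction. The sketch below assumes pandas is used for the CSV handling; `predict_new` is a hypothetical helper, and the data frames are synthetic stand-ins shaped like the assignment files.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def predict_new(model, new_df, id_col="ID"):
    """Return a two-column table: the identifier and the predicted class."""
    features = new_df.drop(columns=id_col)
    return pd.DataFrame({
        id_col: new_df[id_col].to_numpy(),
        "predicted y": model.predict(features),
    })


# Synthetic stand-ins shaped like the assignment files (NOT the real data).
rng = np.random.default_rng(0)
cols = [f"x_{i}" for i in range(1, 26)]
train_df = pd.DataFrame(rng.normal(size=(200, 25)), columns=cols)
train_df.insert(0, "y", (train_df["x_1"] > 0).astype(int))
new_df = pd.DataFrame(rng.normal(size=(30, 25)), columns=cols)
new_df.insert(0, "ID", np.arange(1, 31))

# Placeholder hyper-parameter value; substitute the one found in Task 1.
model = DecisionTreeClassifier(max_leaf_nodes=4, random_state=0)
model.fit(train_df[cols], train_df["y"])

table = predict_new(model, new_df)
print(table.to_string(index=False))
```

With the real files, `train_df` and `new_df` would come from `pd.read_csv` on “hw2.q1.train.csv” and “hw2.q1.new.csv”.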



Task 4. [2 Points]

Of the 25 input variables, which ones are relevant for this classification task?

The following … input variables are relevant for this classification task: …………………

Display your trained decision tree:
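
One reading of "relevant" is "used in at least one split of the trained tree", which corresponds to a nonzero entry in the tree's `feature_importances_`. The sketch below follows that reading and prints the tree as text with `export_text`. The helper `used_features` is hypothetical, and the synthetic data is built so that only x_1 and x_2 determine the class.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text


def used_features(tree, feature_names):
    """Names of features the fitted tree splits on (importance > 0)."""
    return [
        name
        for name, importance in zip(feature_names, tree.feature_importances_)
        if importance > 0
    ]


# Synthetic stand-in: the class depends only on x_1 and x_2 (NOT the real data).
rng = np.random.default_rng(0)
names = [f"x_{i}" for i in range(1, 26)]
X = rng.normal(size=(400, 25))
y = ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(int)

tree = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0).fit(X, y)
relevant = used_features(tree, names)
tree_text = export_text(tree, feature_names=names)

print(len(relevant), "features used:", relevant)
print(tree_text)
```

For a graphical display, `sklearn.tree.plot_tree` draws the same tree with matplotlib.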

Question 2. Supervised machine learning classifiers [10 Points]

Data: The zip file “hw2.q2.data.zip” contains 3 CSV files:

· “hw2.q2.train.csv” contains 8,000 rows and 11 columns. The first column ‘y’ is the output variable with 4 classes: 0, 1, 2, 3. The remaining 10 columns contain input features: x1, …, x10.

· “hw2.q2.test.csv” contains 2,000 rows and 11 columns. The first column ‘y’ is the output variable with 4 classes: 0, 1, 2, 3. The remaining 10 columns contain input features: x1, …, x10.

· “hw2.q2.new.csv” contains 30 rows and 11 columns. The first column ‘ID’ is an identifier for 30 unlabeled samples. The remaining 10 columns contain input features: x1, …, x10.

Task 1. [6 points]

Use 4-fold cross-validation with the 8,000 labeled examples from “hw2.q2.train.csv” to identify a classifier that achieves a mean cross-validation accuracy of at least 0.96. You should try several Scikit-Learn classifiers, including: GaussianNB, DecisionTreeClassifier, RandomForestClassifier, ExtraTreesClassifier, KNeighborsClassifier, LogisticRegression, SVC, and MLPClassifier. Try different hyper-parameter values for the better performing classifiers to obtain a good set of hyper-parameter values. Then select the best performing model. Report the following:


Selected model with hyper-parameter values:

Mean cross-validation accuracy: ………………………. (rounded to 4 decimal places)
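
A first pass over the listed classifiers could look like the sketch below, which scores each one with 4-fold cross-validation at mostly default settings and ranks them by mean accuracy. Several choices here are assumptions rather than part of the assignment: the distance- and gradient-based models are wrapped in a `StandardScaler` pipeline, `max_iter` is raised for LogisticRegression and MLPClassifier, and the data is a small synthetic 4-class problem. The helper names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def candidate_models(seed=0):
    """The eight classifiers named in the task, mostly at default settings."""
    return {
        "GaussianNB": GaussianNB(),
        "DecisionTreeClassifier": DecisionTreeClassifier(random_state=seed),
        "RandomForestClassifier": RandomForestClassifier(random_state=seed),
        "ExtraTreesClassifier": ExtraTreesClassifier(random_state=seed),
        # Scaling is an assumption: these models are sensitive to feature scale.
        "KNeighborsClassifier": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "LogisticRegression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "SVC": make_pipeline(StandardScaler(), SVC()),
        "MLPClassifier": make_pipeline(
            StandardScaler(), MLPClassifier(max_iter=500, random_state=seed)
        ),
    }


def rank_by_cv_accuracy(models, X, y, cv=4):
    """Return (name, mean CV accuracy) pairs, best first."""
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic 4-class stand-in with 10 features (NOT the assignment data).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, n_informative=6, n_redundant=0,
    n_classes=4, random_state=0,
)
ranking = rank_by_cv_accuracy(candidate_models(), X_demo, y_demo)
for name, score in ranking:
    print(f"{name:24s} {score:.4f}")
```

The top few models from such a ranking are natural candidates for a hyper-parameter search, for example with `GridSearchCV` using the same `cv=4`.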


Task 2. [2 Points]

Train the classifier with the hyper-parameter values determined in Task 1 on all 8,000 training samples and use it to predict the output class ‘y’ for the 2,000 examples in “hw2.q2.test.csv”. Report the following:

· Accuracy on 2,000 test examples: …………………… (rounded to 4 decimal places)

· Classification report for the 2,000 test examples:

· Confusion matrix for the 2,000 test examples:


Task 3. [2 Points]

Use the model trained in Task 2 to predict the output class ‘y’ for the 30 examples in “hw2.q2.new.csv”. Specify the predicted classes in the table below:

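ID        predicted y
ID_001    ……………
ID_002    ……………
ID_003    ……………
ID_004    ……………
ID_005    ……………
ID_006    ……………
ID_007    ……………
ID_008    ……………
ID_009    ……………
ID_010    ……………
ID_011    ……………
ID_012    ……………
ID_013    ……………
ID_014    ……………
ID_015    ……………
ID_016    ……………
ID_017    ……………
ID_018    ……………
ID_019    ……………
ID_020    ……………
ID_021    ……………
ID_022    ……………
ID_023    ……………
ID_024    ……………
ID_025    ……………
ID_026    ……………
ID_027    ……………
ID_028    ……………
ID_029    ……………
ID_030    ……………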
