Computer Science Homework 2

Homework 2.

Question 1. Decision Tree Classifier [10 Points]

Data: The zip file “hw2.q1.data.zip” contains 3 CSV files:

· “hw2.q1.train.csv” contains 10,000 rows and 26 columns. The first column ‘y’ is the output variable with 2 classes: 0, 1. The remaining 25 columns contain input features: x_1, …, x_25.

· “hw2.q1.test.csv” contains 2,000 rows and 26 columns. The first column ‘y’ is the output variable with 2 classes: 0, 1. The remaining 25 columns contain input features: x_1, …, x_25.

· “hw2.q1.new.csv” contains 30 rows and 26 columns. The first column ‘ID’ is an identifier for 30 unlabeled samples. The remaining 25 columns contain input features: x_1, …, x_25.

Task 1. [4 points]

Use 5-fold cross-validation with the 10,000 labeled examples from “hw2.q1.train.csv” to determine the fewest rules with which a decision tree classifier can achieve a mean cross-validation accuracy of at least 0.96. Report the number of rules needed, the cross-validation accuracy obtained, and all the hyper-parameter values for the DecisionTreeClassifier.
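One way to approach this search, sketched under assumptions: each leaf of a decision tree corresponds to one root-to-leaf rule, so capping `max_leaf_nodes` caps the number of rules. The sketch below raises that cap from 2 upward and stops at the first value whose mean 5-fold accuracy reaches the target. The helper name `fewest_leaves`, the upper bound of 64 leaves, and the synthetic stand-in data from `make_classification` are illustrative assumptions, not part of the assignment; the real features would come from “hw2.q1.train.csv”.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def fewest_leaves(X, y, target=0.96, max_leaves=64, cv=5, seed=0):
    """Return (n_leaves, mean_cv_accuracy) for the smallest leaf cap that
    reaches `target`, or None if no cap up to `max_leaves` does.
    Hypothetical helper: one leaf is treated as one rule."""
    for n_leaves in range(2, max_leaves + 1):
        clf = DecisionTreeClassifier(max_leaf_nodes=n_leaves, random_state=seed)
        score = cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()
        if score >= target:
            return n_leaves, round(float(score), 4)
    return None


# Synthetic stand-in data (assumption): 25 features, binary target.
X, y = make_classification(
    n_samples=1000, n_features=25, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

# A lower target is used here because the stand-in data is not the homework data.
result = fewest_leaves(X, y, target=0.80)
print(result)
```

After fitting a tree with the selected cap, `clf.get_n_leaves()` reports the number of leaves actually grown, which can be lower than the cap if the tree becomes pure early.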

Fewest number of rules needed: ………………. (to achieve mean cross-validation accuracy of at least 0.96)

Mean cross-validation accuracy: ………………………. (rounded to 4 decimal places)

Non-default hyper-parameter values for selected DecisionTreeClassifier model:


Task 2. [2 Points]

Train a DecisionTreeClassifier with the hyper-parameter values determined in Task 1 on all 10,000 training samples and use it to predict the output class ‘y’ for the 2,000 examples in “hw2.q1.test.csv”. Report the following:

· Accuracy on 2,000 test examples: …………………… (rounded to 4 decimal places)

· Classification report for the 2,000 test examples:

· Confusion matrix for the 2,000 test examples:
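The three reported quantities above map onto functions in `sklearn.metrics`. The following is a minimal sketch, assuming synthetic stand-in data in place of the train and test CSV files and a hypothetical `max_leaf_nodes=8` in place of the values selected in Task 1.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the train/test files (assumption).
X, y = make_classification(
    n_samples=1200, n_features=25, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=200, random_state=0
)

# Hypothetical hyper-parameters; use the ones chosen in Task 1 instead.
clf = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.4f}")

# Per-class precision, recall, and F1, printed to 4 decimal places.
print(classification_report(y_test, y_pred, digits=4))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The diagonal of the confusion matrix counts correct predictions, so its sum divided by the total number of test examples equals the accuracy.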


Task 3. [2 Points]

Use the model trained in Task 2 to predict the output class ‘y’ for the 30 examples in “hw2.q1.new.csv”. Specify the predicted classes in the table below:

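ID      predicted y
1       ………
2       ………
3       ………
4       ………
5       ………
6       ………
7       ………
8       ………
9       ………
10      ………
11      ………
12      ………
13      ………
14      ………
15      ………
16      ………
17      ………
18      ………
19      ………
20      ………
21      ………
22      ………
23      ………
24      ………
25      ………
26      ………
27      ………
28      ………
29      ………
30      ………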
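Filling a table like the one above amounts to calling `predict` on the unlabeled rows and pairing each prediction with its ID. A minimal sketch follows, assuming synthetic stand-in data, integer IDs 1 to 30, and a hypothetical `max_leaf_nodes=8`; with the real file, the ‘ID’ column would be separated from the 25 feature columns before predicting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): 500 labeled rows plus 30 "new" rows.
X, y = make_classification(
    n_samples=530, n_features=25, n_informative=3, n_redundant=0, random_state=0
)
X_train, y_train = X[:500], y[:500]
X_new = X[500:]            # features only; labels are treated as unknown
ids = np.arange(1, 31)     # stand-in for the 'ID' column

# Hypothetical hyper-parameters; use the Task 1 values instead.
clf = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0).fit(X_train, y_train)
pred_new = clf.predict(X_new)

# Print one row per sample, keeping IDs and predictions aligned.
print(f"{'ID':<8}predicted y")
for sample_id, label in zip(ids, pred_new):
    print(f"{int(sample_id):<8}{int(label)}")
```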



Task 4. [2 Points]

Of the 25 input variables which ones are relevant for this classification task?

The following … input variables are relevant for this classification task: …………………

Display your trained decision tree:
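One reasonable proxy for relevance is the set of features the fitted tree actually splits on: any feature that never appears in a split has an impurity-based importance of exactly 0. The sketch below lists the features with non-zero importance and prints the tree as text with `export_text`. The synthetic data, the `x_1 … x_25` name list, and `max_leaf_nodes=8` are assumptions for illustration; `sklearn.tree.plot_tree` is an alternative if a graphical display is preferred.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (assumption). With shuffle=False the informative
# features are the first three columns, which makes the output easy to check.
X, y = make_classification(
    n_samples=1000, n_features=25, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
feature_names = [f"x_{i}" for i in range(1, 26)]

# Hypothetical hyper-parameters; use the Task 1 values instead.
clf = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0).fit(X, y)

# Features with non-zero importance are the ones the tree relies on.
used = [
    name
    for name, importance in zip(feature_names, clf.feature_importances_)
    if importance > 0
]
print("Features used by the tree:", used)

# Text rendering of the fitted tree, one line per node.
tree_text = export_text(clf, feature_names=feature_names)
print(tree_text)
```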

Question 2. Supervised machine learning classifiers [10 Points]

Data: The zip file “hw2.q2.data.zip” contains 3 CSV files:

· “hw2.q2.train.csv” contains 8,000 rows and 11 columns. The first column ‘y’ is the output variable with 4 classes: 0, 1, 2, 3. The remaining 10 columns contain input features: x1, …, x10.

· “hw2.q2.test.csv” contains 2,000 rows and 11 columns. The first column ‘y’ is the output variable with 4 classes: 0, 1, 2, 3. The remaining 10 columns contain input features: x1, …, x10.

· “hw2.q2.new.csv” contains 30 rows and 11 columns. The first column ‘ID’ is an identifier for 30 unlabeled samples. The remaining 10 columns contain input features: x1, …, x10.

Task 1. [6 points]

Use 4-fold cross-validation with the 8,000 labeled examples from “hw2.q2.train.csv” to identify a classifier that achieves a mean cross-validation accuracy of at least 0.96. You should try several Scikit-Learn classifiers, including: GaussianNB, DecisionTreeClassifier, RandomForestClassifier, ExtraTreesClassifier, KNeighborsClassifier, LogisticRegression, SVC, and MLPClassifier. Try different hyper-parameter values for the better-performing classifiers to obtain a good set of hyper-parameter values. Then select the best-performing model. Report the following:
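A first pass over the listed classifiers can be sketched as a loop over `cross_val_score` with `cv=4`. The example below is a minimal illustration, assuming synthetic 4-class stand-in data and mostly default settings. Wrapping the distance-based and gradient-based models in a `StandardScaler` pipeline is a design choice made here, not something the assignment specifies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 10 features, 4 classes.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

candidates = {
    "GaussianNB": GaussianNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "ExtraTreesClassifier": ExtraTreesClassifier(random_state=0),
    # Scaling inside the pipeline is refit on each training fold.
    "KNeighborsClassifier": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "LogisticRegression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "MLPClassifier": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
}

# Mean 4-fold cross-validation accuracy per candidate.
scores = {
    name: cross_val_score(model, X, y, cv=4, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26}{score:.4f}")

best_name = max(scores, key=scores.get)
print("Best candidate:", best_name)
```

The ranking on the stand-in data says nothing about the homework data; the loop is only meant to show the comparison pattern.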


Selected model with hyper-parameter values:

Mean cross-validation accuracy: ………………………. (rounded to 4 decimal places)
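For the hyper-parameter search on a promising classifier, `GridSearchCV` runs the same 4-fold cross-validation over every combination in a grid and records the best mean accuracy. The sketch below is illustrative only: the choice of `RandomForestClassifier`, the small grid, and the synthetic stand-in data are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): 10 features, 4 classes.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Hypothetical grid; widen or change it for the classifier being tuned.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=4,
    scoring="accuracy",
)
search.fit(X, y)

print("Best hyper-parameters:", search.best_params_)
print(f"Mean cross-validation accuracy: {search.best_score_:.4f}")
```

By default `GridSearchCV` refits the best configuration on all of the supplied data, so `search.best_estimator_` is ready to use for prediction afterwards.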


Task 2. [2 Points]

Train the classifier with the hyper-parameter values determined in Task 1 on all 8,000 training samples and use it to predict the output class ‘y’ for the 2,000 examples in “hw2.q2.test.csv”. Report the following:

· Accuracy on 2,000 test examples: …………………… (rounded to 4 decimal places)

· Classification report for the 2,000 test examples:

· Confusion matrix for the 2,000 test examples:


Task 3. [2 Points]

Use the model trained in Task 2 to predict the output class ‘y’ for the 30 examples in “hw2.q2.new.csv”. Specify the predicted classes in the table below:

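ID        predicted y
ID_001    ………
ID_002    ………
ID_003    ………
ID_004    ………
ID_005    ………
ID_006    ………
ID_007    ………
ID_008    ………
ID_009    ………
ID_010    ………
ID_011    ………
ID_012    ………
ID_013    ………
ID_014    ………
ID_015    ………
ID_016    ………
ID_017    ………
ID_018    ………
ID_019    ………
ID_020    ………
ID_021    ………
ID_022    ………
ID_023    ………
ID_024    ………
ID_025    ………
ID_026    ………
ID_027    ………
ID_028    ………
ID_029    ………
ID_030    ………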
