

Group Project

Mostafa Rezaei

Big Data (Introduction to Data Science)

General information

The group project gives you the opportunity to practice many of the skills we learned in the class.

It includes 5 steps:

Step 1: Find and describe a data set

Find a publicly available data set. It should not come from the UCI ML Repository or be any other data set
commonly used in ML competitions. Open Data Initiative websites are also good places to find data sets,
for example:



Once you find a dataset, you should figure out:

• the individual or organization that created it;
• the purpose of its creation;
• its terms of use;
• how the data was collected and the sampling procedure;
• the definitions of the variables and their units.

Step 2: Perform an initial EDA

Perform an initial exploratory data analysis (EDA), creating plots and group summaries to understand the
variation within each variable, including typical values, clusters, outliers, and missing values.
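As an illustration only, here is a minimal sketch of an initial EDA in R. It assumes R's built-in `airquality` data set as a stand-in for your own data and uses dplyr and ggplot2; the variables chosen (`Ozone`, `Month`) and the histogram bin width are arbitrary. It counts missing values per variable, prints typical values, summarizes one variable by group, and draws a histogram to look for clusters and outliers.

```r
library(dplyr)
library(ggplot2)

# Stand-in data set (assumption): replace with your own data
air <- airquality

# Missing values per variable
missing_counts <- colSums(is.na(air))
missing_counts

# Typical values and spread of every variable
summary(air)

# Group summaries: how ozone varies within each month
ozone_by_month <- air |>
  group_by(Month) |>
  summarise(
    n = n(),
    n_missing = sum(is.na(Ozone)),
    median_ozone = median(Ozone, na.rm = TRUE),
    iqr_ozone = IQR(Ozone, na.rm = TRUE)
  )
ozone_by_month

# Distribution of a single variable: look for clusters and outliers
ggplot(air, aes(x = Ozone)) +
  geom_histogram(binwidth = 10, na.rm = TRUE)
```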

Step 3: Perform an in-depth EDA

In this step, you should perform an in-depth EDA in order to discover interesting covariations and patterns
in the data.

As discussed in class, you should go through an iterative cycle of asking questions about the data and
finding answers using data transformation and visualization. Investigate the answers you obtain with
curiosity and skepticism, and follow up with further, more detailed questions.


Step 4: Build a prediction model

In this step, you should think about an interesting prediction problem using your dataset. Think about
the details of the prediction model, including:

• the response variable and predictor variables;
• the evaluation metrics;
• how you will conduct cross-validation (CV) to estimate the out-of-sample performance of the model and
to tune its hyper-parameters.
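One possible way to combine tuning and out-of-sample estimation, sketched in base R under stated assumptions: the `airquality` data as a stand-in, `Ozone` as the response, `Temp` and `Wind` as predictors, root mean squared error (RMSE) as the metric, and the polynomial degree as the hyper-parameter. It holds out 20% of the rows as a test set, uses 5-fold CV on the remaining rows to choose the degree, then refits and scores the chosen model on the test set. The split fraction, k, seed, and candidate degrees are arbitrary choices.

```r
set.seed(42)

# Stand-in data (assumption): predict Ozone from Temp and Wind
air <- na.omit(airquality[, c("Ozone", "Temp", "Wind")])

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Hold out 20% of rows as a final test set, never used for tuning
test_idx <- sample(nrow(air), size = round(0.2 * nrow(air)))
train <- air[-test_idx, ]
test <- air[test_idx, ]

# Assign each training row to one of k folds (same folds for every degree)
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(train)))

# Mean held-out RMSE across the k folds for one candidate degree
cv_rmse <- function(degree) {
  fold_errors <- vapply(seq_len(k), function(fold) {
    fit <- lm(Ozone ~ poly(Temp, degree) + poly(Wind, degree),
              data = train[folds != fold, ])
    held_out <- train[folds == fold, ]
    rmse(held_out$Ozone, predict(fit, newdata = held_out))
  }, numeric(1))
  mean(fold_errors)
}

# Tune the hyper-parameter (polynomial degree) by CV
degrees <- 1:4
cv_results <- data.frame(
  degree = degrees,
  cv_rmse = vapply(degrees, cv_rmse, numeric(1))
)
best_degree <- cv_results$degree[which.min(cv_results$cv_rmse)]

# Refit on all training rows, then estimate out-of-sample RMSE on the test set
final_fit <- lm(Ozone ~ poly(Temp, best_degree) + poly(Wind, best_degree),
                data = train)
test_rmse <- rmse(test$Ozone, predict(final_fit, newdata = test))

cv_results
best_degree
test_rmse
```

The test set plays no part in choosing the degree, so its RMSE is not inflated by the tuning; the minimum CV RMSE on its own would be optimistically biased. With little data, nested CV is a common alternative to a single held-out set.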

Step 5: Present your findings!

The value of your research is limited if you keep it to yourself, so in this step you will polish your
most interesting findings into a presentation for the world to see. See below for details.

Deliverables

1. Create two plots that visually represent your most notable findings from your EDA:

• The plots should be created using ggplot2
• Ensure they are polished and self-contained, with meaningful titles, subtitles, labels, and captions
• Save each plot separately as a PNG or PDF file
• For preparing plots for communication, see

2. Build a prediction model and calculate its out-of-sample performance

• Specify the response and predictor variables
• Specify the evaluation metrics used
• Specify how you perform CV to obtain an estimate of its out-of-sample performance and to tune its
hyper-parameters

3. Provide an R Notebook file (with the extension .Rmd) containing your code and code outputs, such as
plots and tables

• Output an HTML file from the notebook, ensuring it correctly displays all code and outputs
• Use minimal comments in your code and follow the Tidyverse style guide: org/index.html
• On the first line of the R Notebook, briefly list the contributions of each team member

4. Record a 5-minute presentation of your work

• Create a 5-slide presentation
• Slide 1: Contextual information about the dataset
• Slide 2: A detailed description of the dataset
• Slides 3 and 4: The two plots showcasing your primary EDA findings
• Slide 5: Information and results of your trained prediction model
• Your recorded presentation should not exceed 5 minutes
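For deliverable 1, a minimal sketch of a polished, self-contained ggplot2 figure saved to its own file. The data (`airquality`), the title wording, and the file name `eda_plot_1.png` are placeholders; the subtitle, axis units, and caption follow the `airquality` help page. `tempdir()` is used only so the example writes nowhere permanent; save to your project folder instead.

```r
library(ggplot2)

air <- airquality  # stand-in data (assumption)
air$Month <- factor(air$Month, levels = 5:9,
                    labels = c("May", "June", "July", "August", "September"))

ozone_plot <- ggplot(air, aes(x = Temp, y = Ozone, colour = Month)) +
  geom_point(na.rm = TRUE) +
  labs(
    title = "Ozone rises with temperature",
    subtitle = "Daily readings in New York, May to September 1973",
    x = "Maximum daily temperature (degrees F)",
    y = "Mean ozone, 1 pm to 3 pm (ppb)",
    colour = "Month",
    caption = paste("Source: NY State Dept. of Conservation and",
                    "National Weather Service, via R's airquality data")
  ) +
  theme_minimal()

# Save the plot as its own file (PNG here; a .pdf extension gives a PDF)
plot_path <- file.path(tempdir(), "eda_plot_1.png")
ggsave(plot_path, ozone_plot, width = 7, height = 5, dpi = 300)
```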

Submission details

• Upload your work to the dedicated group project assignment on BB
• Only one person per group needs to submit their group’s work
• Compress all your files into a single ZIP file containing:
– The two PNG or PDF files of the EDA plots
– Your R Notebook, with the extension .Rmd
– The HTML file created from your R Notebook
– Your presentation file in PDF format
– The recording of your presentation

