Group Project

Mostafa Rezaei

Big Data (Introduction to Data Science)

General information

The group project gives you the opportunity to practice many of the skills we learned in the class.

It includes 5 steps:

Step 1: Find and describe a data set

Find a publicly available data set. It should not come from the UCI ML Repository or be any other
data set commonly used in ML competitions. Open Data Initiative websites are also good places to find
data sets, for example:



Once you find a dataset, you should figure out:

• the individual or organization that created it;
• the purpose of its creation;
• its terms of use;
• how the data was collected and the sampling procedure;
• the definitions of the variables and their units.

Step 2: Perform an initial EDA

Perform an initial EDA, where you create plots and group summaries to understand the variation of each
variable, including typical values, clusters, outliers, missing values, etc.
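
As a rough illustration of what an initial EDA pass might look like, the sketch below counts missing values, builds one group summary, and draws one distribution plot. It assumes the dplyr and ggplot2 packages are installed and uses R's built-in airquality data purely as a stand-in; your own data set must come from elsewhere, as Step 1 requires.

```r
library(dplyr)
library(ggplot2)

# ASSUMPTION: airquality (built into R) only stands in for your own data set.

# Missing values per variable
missing_counts <- airquality |>
  summarise(across(everything(), \(x) sum(is.na(x))))

# Group summary: typical values and spread of Ozone within each month
ozone_by_month <- airquality |>
  group_by(Month) |>
  summarise(
    n = n(),
    n_missing = sum(is.na(Ozone)),
    median_ozone = median(Ozone, na.rm = TRUE),
    iqr_ozone = IQR(Ozone, na.rm = TRUE)
  )

# Distribution of a single variable; unusual values show up in the tails
ozone_hist <- ggplot(airquality, aes(x = Ozone)) +
  geom_histogram(binwidth = 10, na.rm = TRUE)
```

Printing `missing_counts`, `ozone_by_month`, and `ozone_hist` in the notebook shows the table and plot outputs.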

Step 3: Perform an in-depth EDA

In this step, you should perform an in-depth EDA in order to discover interesting covariations and patterns
in the data.

As discussed in class, you should go through an iterative cycle of asking questions about the data and
finding answers to the questions using data transformation and visualization. Investigate the answers you
obtain with curiosity and skepticism, and follow up with further (more detailed) questions.
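
One turn of this question-and-answer cycle could be sketched as follows. The question, the follow-up, and the use of the built-in airquality data are illustrative assumptions, not part of the assignment; the sketch assumes dplyr and ggplot2 are installed.

```r
library(dplyr)
library(ggplot2)

# ASSUMPTION: airquality is a stand-in data set used only for illustration.
air <- airquality |>
  filter(!is.na(Ozone))

# Question: does ozone rise with temperature?
ozone_temp_cor <- cor(air$Ozone, air$Temp)

ozone_temp_plot <- ggplot(air, aes(x = Temp, y = Ozone)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# Follow-up question: is the relationship the same in every month?
cor_by_month <- air |>
  group_by(Month) |>
  summarise(n = n(), cor_ozone_temp = cor(Ozone, Temp))
```

A monthly correlation that differs sharply from the overall one would be a natural prompt for the next, more detailed question.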

Step 4: Build a prediction model

In this step, you should think about an interesting prediction problem using your dataset. Think about
the details of the prediction model, including

• the response variable and predictor variables;
• the evaluation metrics;
• how you will conduct CV to estimate the out-of-sample performance of the model and to tune the
hyper-parameters of the model.
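
The three items above can be sketched in base R as follows. Everything specific here is an assumption made for illustration: the airquality stand-in data, Ozone as the response, a linear model, 5 folds, and RMSE and MAE as metrics. Your own choices will depend on your data set and prediction problem.

```r
set.seed(1)

# ASSUMPTION: complete cases of the built-in airquality data stand in for your data.
air <- na.omit(airquality)

# Randomly assign each row to one of k folds
k <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(air)))

# Fit on k - 1 folds, evaluate on the held-out fold
cv_results <- lapply(seq_len(k), function(i) {
  train <- air[fold != i, ]
  test <- air[fold == i, ]
  fit <- lm(Ozone ~ Temp + Wind + Solar.R, data = train)
  pred <- predict(fit, newdata = test)
  data.frame(
    fold = i,
    rmse = sqrt(mean((test$Ozone - pred)^2)),
    mae = mean(abs(test$Ozone - pred))
  )
})
cv_results <- do.call(rbind, cv_results)

# Average over folds to estimate out-of-sample performance
cv_summary <- colMeans(cv_results[, c("rmse", "mae")])
```

Each row is held out exactly once, so `cv_summary` is computed only from predictions on data the model did not see during fitting.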

Step 5: Present your findings!

The value of your research is limited if you keep it to yourself. So in this step you will polish your most
interesting findings into a presentation for the world to see. See below for details.

Deliverables

1. Create two plots that visually represent your most notable findings from your EDA:

• The plots should be created using ggplot2
• Ensure the plots are polished and self-contained, with meaningful titles, subtitles, labels, and captions
• Save each plot separately as a PNG or PDF file
• For preparing plots for communication, see
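
A minimal sketch of a polished, self-contained plot saved to its own file is shown below. The data set, the wording of the title and labels, the theme, and the file name are all illustrative assumptions; the temporary directory is used only so the sketch runs anywhere.

```r
library(ggplot2)

# ASSUMPTION: airquality is a stand-in; the labels describe that data set only.
ozone_plot <- ggplot(airquality, aes(x = Temp, y = Ozone)) +
  geom_point(alpha = 0.6, na.rm = TRUE) +
  labs(
    title = "Ozone concentration rises on hotter days",
    subtitle = "Daily readings in New York, May to September 1973",
    x = "Maximum daily temperature (degrees Fahrenheit)",
    y = "Mean ozone (parts per billion)",
    caption = "Source: airquality data set shipped with R"
  ) +
  theme_minimal()

# Hypothetical output path; use a .pdf extension to save a PDF instead
plot_file <- file.path(tempdir(), "ozone_vs_temp.png")
ggsave(plot_file, plot = ozone_plot, width = 7, height = 5, dpi = 300)
```

`ggsave()` picks the output format from the file extension, so the same call covers both PNG and PDF.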

2. Build a prediction model and calculate its out-of-sample performance

• Specify the response and predictor variables
• Specify the evaluation metrics used
• Specify how you perform CV to obtain an estimate of its out-of-sample performance and to tune
its hyper-parameters
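
Tuning a hyper-parameter with CV might look like the sketch below, which compares polynomial degrees for a single predictor. The stand-in airquality data, the polynomial model, the candidate degrees, and the helper name `cv_rmse` are all hypothetical choices for illustration.

```r
set.seed(1)

# ASSUMPTION: airquality stands in for your data; degree is the tuned hyper-parameter.
air <- na.omit(airquality)
k <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(air)))

# Hypothetical helper: mean RMSE over the k held-out folds for one degree
cv_rmse <- function(degree) {
  fold_rmse <- vapply(seq_len(k), function(i) {
    train <- air[fold != i, ]
    test <- air[fold == i, ]
    fit <- lm(Ozone ~ poly(Temp, degree), data = train)
    pred <- predict(fit, newdata = test)
    sqrt(mean((test$Ozone - pred)^2))
  }, numeric(1))
  mean(fold_rmse)
}

# Evaluate every candidate value on the same folds and keep the best one
degrees <- 1:4
tuning <- data.frame(
  degree = degrees,
  rmse = vapply(degrees, cv_rmse, numeric(1))
)
best_degree <- tuning$degree[which.min(tuning$rmse)]
```

Because `best_degree` is chosen using these same folds, its CV error is a slightly optimistic estimate; holding out a separate test set, or nesting the tuning inside an outer CV loop, gives a cleaner out-of-sample figure.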

3. Provide an R Notebook file (with the extension .Rmd) containing your code and code outputs, such as
plots and tables

• Output an HTML file from the notebook, ensuring it correctly displays all code and outputs
• Use minimal comments in your code and follow the Tidyverse style guide:

org/index.html
• On the first line of the R Notebook, briefly list the contributions of each team member

4. Record a 5-minute presentation of your work

• Create a 5-slide presentation
• Slide 1: Contextual information about the dataset
• Slide 2: A detailed description of the dataset
• Slides 3 and 4: The two plots showcasing your primary EDA findings
• Slide 5: Information and results of your trained prediction model
• Your recorded presentation should not exceed 5 minutes

Submission details

• Upload your work on the dedicated assignment for the group project on BB
• Only one person per group needs to submit their group’s work
• Compress all your files into a ZIP file, containing

– The two PNG or PDF files of the EDA plots
– Your R Notebook, with the extension .Rmd
– The HTML file created from your R Notebook
– Your presentation file in PDF format
– The recording of your presentation
