Project
Deadline: Sunday 30/11/2025 @ 23:59
[Total Mark for this Project is 14]
Group Details:
CRN: 14845
Name: DEEQA AHMED
Name: NORAH ALDOSSARI
Name: JUMARA ALABBAD
Name: AMJAD ALSHAHRANI
Name: NOURH AL-MARRAR
ID: s210031140
ID: s220017802
ID: s200173248
ID: s210024897
ID: s220036429
Instructions:
• You must submit two separate copies (one Word file and one PDF file) using the Assignment Template on
Blackboard via the allocated folder. These files must not be in compressed format.
• It is your responsibility to check and make sure that you have uploaded both the correct files.
• Zero mark will be given if you try to bypass the SafeAssign (e.g., misspell words, remove spaces between
words, hide characters, use different character sets, convert text into image or languages other than English
or any kind of manipulation).
• Email submission will not be accepted.
• You are advised to make your work clear and well-presented. This includes filling your information on the cover
page.
• You must use this template, failing which will result in zero mark.
• You MUST show all your work, and text must not be converted into an image, unless specified otherwise by
the question.
• Late submission will result in ZERO mark.
• The work should be your own, copying from students or other resources will result in ZERO mark.
• Use Times New Roman font for all your answers.
Restricted – مقيد
Project
Pg. 01
Learning Outcome(s):
CLO 1, 2, 5
1. Demonstrate an understanding of the concepts of decision analysis and decision support systems (DSS) including probability, modelling, decisions under uncertainty, and real-world problems.
2. Describe advanced Business Intelligence, Business Analytics, Data Visualization, and Dashboards.
5. Improve hands-on skills using Excel, and Orange for building Decision Support Systems.
Project
14 Marks
Teams & Datasets:
• Teams: 3–4 students. Email team member names to your instructor by 5 Oct 2025. After that, teams will be randomly formed and datasets assigned.
• Dataset: Each team must work on a unique dataset. Choose one from the link below (or the instructor’s assigned set):
Saudi Open Data Platform:
Tools (pick what fits your dataset):
• Required: Microsoft Excel or Orange Data Mining for EDA/modelling.
• For dashboards: Excel or Power BI (recommended).
Learning Outcomes:
You will:
1. Frame a data-driven decision problem with stakeholders and KPIs.
2. Clean and profile data; document data quality issues and fixes.
3. Perform descriptive statistics and EDA with appropriate visuals.
4. Test a quantitative hypothesis; interpret correlation/regression.
5. Train and evaluate at least two ML models aligned to the decision.
6. Build a dashboard summarizing actionable insights for decision-makers.
7. Produce clear recommendations, assumptions, and what-if takeaways.
Deliverables:
1. Report (PDF + Word), which must incorporate all 7 of the following tasks and be written using the provided template. (10 marks distributed among the tasks below).
2. Slide deck (10–12 slides in 6 mins) for in-class presentation. (4 marks).
Project Tasks & Rubric (Total = 14 marks):
Evidence rule: every task must include screenshots/figures and a 2-4 sentence
interpretation (what it shows, why it matters for the decision).
Task 1: Problem & Data Understanding (2 marks)
• Decision context: Who is the decision-maker? What decision will the analysis support? Define KPIs (2–4).
• Dataset description: source reliability, collection method, size, time span, unit of analysis.
• Data dictionary: list key features, types, and expected roles (predictor/target/ID).
• Hypothesis: one testable relationship between two numerical variables (directional, with rationale).
Task 2: Data Quality & Preparation (1 mark)
• Show tests and fixes with before/after evidence for:
o Missing values
o Duplicates
o Outliers
o Noise/irregularities (e.g., inconsistent categories, types, units)
• Include a Data Quality Log table: issue → method → action → impact.
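For illustration only (the project itself requires Excel or Orange), the first three checks could be sketched in Python as shown below. The function names, column names, and sample rows are hypothetical and do not come from any project dataset. The outlier rule used is the common 1.5 × IQR fence, which is one possible choice and not a method the brief prescribes.

```python
from statistics import quantiles


def count_missing(rows, column):
    """Count rows where the column is absent, None, or an empty string."""
    return sum(1 for row in rows if row.get(column) in (None, ""))


def find_duplicates(rows, key_columns):
    """Return rows whose key columns repeat an earlier row."""
    seen = set()
    duplicates = []
    for row in rows:
        key = tuple(row.get(col) for col in key_columns)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical rows, for illustration only
sample_rows = [
    {"region": "A", "staff": 10},
    {"region": "B", "staff": None},
    {"region": "A", "staff": 10},
    {"region": "C", "staff": 12},
]

print("missing staff values:", count_missing(sample_rows, "staff"))
print("duplicate rows:", find_duplicates(sample_rows, ["region"]))
print("outliers:", iqr_outliers([10, 11, 12, 13, 14, 15, 100]))
```

Each result maps naturally onto one row of the Data Quality Log: the check is the method, the count is the evidence, and the fix applied afterwards is the action.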
Task 3: Descriptive Statistics & EDA (2 marks)
• Central tendency (mean/median/mode) and distribution shape (variance, SD, skewness, kurtosis).
• Appropriate visuals (histograms/boxplots/density, bar/line where relevant).
• 2–3 insightful questions you posed from trends/patterns (and brief answers).
Task 4: Hypothesis Testing & Relationship Analysis (2 marks)
• Correlation analysis (numeric pair; comment on strength/direction).
• Simple linear regression (or appropriate alternative): equation, R², residuals check, and practical interpretation linked to KPIs.
• Conclusion: accept/reject hypothesis; implications for the decision.
Task 5: Visual Analytics for Decision-Makers (1 mark)
• A small, coherent visual story (3–4 charts) with correct chart types, clear labels, and callouts.
• Each chart must answer a stakeholder-relevant question; include a 1–2 sentence takeaway.
Task 6: Predictive/Descriptive Modeling (2 marks)
• Choose 1–2 models suitable for your data/task (e.g., Decision Tree, k-NN, Random Forest, SVM, k-means for segmentation if classification/regression is not applicable).
• Document training setup (feature set, split).
• Evaluation:
o For classification: confusion matrix, accuracy, precision/recall, and 1 key trade-off.
o For regression: MAE/RMSE and an error plot.
o For clustering: silhouette (or WCSS elbow) + business interpretation of clusters.
• Brief model selection rationale tied to the decision.
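As a rough illustration of the classification and regression metrics named above, the following Python sketch computes them from lists of actual and predicted values. It is not tied to Excel or Orange, the function names are hypothetical, and the label and value lists are made-up examples, not model output.

```python
import math


def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification task."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(actual, predicted, positive=1):
    """Accuracy, precision, and recall derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def mae(actual, predicted):
    """Mean absolute error for a regression model."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error for a regression model."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up labels and values, for illustration only
labels_true = [1, 0, 1, 1, 0, 0]
labels_pred = [1, 0, 0, 1, 1, 0]
print(classification_metrics(labels_true, labels_pred))
print("MAE:", mae([3, 5], [2, 7]), "RMSE:", rmse([3, 5], [2, 7]))
```

Precision and recall usually pull in opposite directions as the decision threshold moves, which is one way to frame the "key trade-off" the rubric asks for.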
Task 7: Interactive Dashboard & Decision Support (2 marks)
• Excel or Power BI dashboard with 3–5 tiles: KPIs, filters/slicers, and at least one “what-if” (e.g., price, volume, threshold).
• One paragraph on how a manager would use this dashboard to make or justify a decision.
Report Template (section outline):
1. Executive Summary (½ page) – problem, method, 2–3 key findings,
recommendation.
2. Decision Context & KPIs
3. Data Understanding & Preparation (with Data Quality Log)
4. EDA & Descriptive Statistics
5. Hypothesis & Relationship Analysis
6. Visual Analytics for Decision-Makers
7. Modeling & Evaluation
8. Dashboard & Decision Use Case (with screenshot)
9. Recommendations, Sensitivity/What-If Notes, Limitations, Ethics
10. References (data source + any methods you cite)
Project Report:
Project Introduction
This project analyzes data regarding the “Number of Ministry of Health Hospitals Implementing
the Home Health Care Program, Manpower, and Beneficiaries by Health Region for the Year
2023.” The focus is on five major regions: Riyadh, the Holy Capital (Makkah), Jeddah, Taif, and
Medina.
The aim of the project is to provide analytical support for the Director of the Home Health Care
Program at the Ministry of Health. It seeks to develop an optimal strategy for resource allocation
and service expansion. The decisions supported by this analysis include:
1. Redistributing manpower across health regions.
2. Identifying locations for new home health care centers.
3. Prioritizing regions that need immediate performance improvement.
4. Guiding the annual budget allocation for maximum impact.
Secondary stakeholders such as regional health directors, the Healthcare Planning Committee,
the Budget Allocation Department, and the Quality Assurance Team are also involved to ensure
a comprehensive approach and the integrated achievement of the program’s objectives.
Task 1: Problem & Data Understanding
1- Decision-Maker: The decision-maker is the Director of the Home Health Care Program at the Ministry of Health, who uses the dataset to make informed decisions aimed at enhancing home health care services, improving patient outcomes, optimizing resource allocation, and ensuring higher quality in the delivery of care.
Decision Support: The dataset assists in making decisions related to:
– Optimizing resource allocation and guiding strategic service expansion within the Home Health Care Program.
– Adjusting staffing, budgeting, and service distribution to enhance accessibility and improve the overall quality of home health care services.
2- Key Performance Indicators (KPIs): The two KPIs relevant to this dataset are:
1- Resource Efficiency
2- Service Coverage Intensity

| KPI | Description | Why It Matters |
|---|---|---|
| Resource Efficiency | Measures the number of beneficiaries that each staff member can serve. | This metric helps evaluate manpower utilization across different regions. |
| Service Coverage Intensity | Measures the number of active cases managed by each staff member. | Indicates workload distribution and operational efficiency. |
KPI 1: Resource Efficiency:
Formula: Total Beneficiaries ÷ Number of Manpower
Main Findings
Taif demonstrates the highest efficiency at 350.7 beneficiaries per staff member, meaning its staff serve the greatest number of beneficiaries relative to workforce size. In contrast, Riyadh has the lowest efficiency at 190.0, despite having the largest workforce.
Jeddah and Medina show balanced performance, with efficiencies of 275.5 and 265.8, respectively. The Holy Capital has a moderate efficiency of 219.1, which suggests room for improvement.
KPI 2: Service Coverage Intensity:
Formula: Cases under Service ÷ Number of Manpower
Main Findings
Taif has the highest service intensity at 32.3, indicating that each staff member handles the
greatest number of active cases. In contrast, the Holy Capital exhibits the lowest intensity at 20.6,
suggesting potential underutilization of staff resources.
Jeddah demonstrates strong efficiency with a service intensity of 29.0, despite having fewer staff
members than Riyadh. Meanwhile, Riyadh, despite having the largest workforce, maintains only
a moderate service intensity of 23.5.
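The two KPI formulas can be expressed as a short calculation. The sketch below is a minimal illustration, assuming each region is a record holding beneficiary, case, and manpower counts. The function names, field names, and input figures are hypothetical placeholders and are not values from the 2023 dataset.

```python
def resource_efficiency(total_beneficiaries, manpower):
    """KPI 1: Total Beneficiaries / Number of Manpower."""
    if manpower <= 0:
        raise ValueError("manpower must be positive")
    return total_beneficiaries / manpower


def service_coverage_intensity(cases_under_service, manpower):
    """KPI 2: Cases under Service / Number of Manpower."""
    if manpower <= 0:
        raise ValueError("manpower must be positive")
    return cases_under_service / manpower


def kpi_table(regions):
    """Return (name, efficiency, intensity) tuples, highest efficiency first."""
    rows = [
        (
            r["name"],
            round(resource_efficiency(r["beneficiaries"], r["manpower"]), 1),
            round(service_coverage_intensity(r["cases"], r["manpower"]), 1),
        )
        for r in regions
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder inputs for illustration only (NOT the real 2023 figures)
example_regions = [
    {"name": "Region B", "beneficiaries": 4000, "cases": 440, "manpower": 20},
    {"name": "Region A", "beneficiaries": 3000, "cases": 250, "manpower": 10},
]

for name, efficiency, intensity in kpi_table(example_regions):
    print(f"{name}: efficiency={efficiency}, intensity={intensity}")
```

Because both KPIs divide by the same manpower figure, a region with a large workforce can rank low on both even when its absolute beneficiary count is high, which is the pattern described for Riyadh.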
3- Dataset Description:
• Data Source and Reliability: The data were obtained from the National Open Data Platform (open.data.gov.sa), which is the official government-authorized source for publishing open data in Saudi Arabia. Since the data originate directly from the Ministry of Health, they are considered highly credible and reliable. The platform adheres to strict data governance standards, ensuring the accuracy and validity of the published information.
• Data Collection Methodology: Data collection was conducted through the administrative reporting systems of the Ministry of Health hospitals across all regions. The process follows a standardized reporting framework, in which each health region submits aggregated annual data on the implementation of the Home Healthcare Program. All data are validated and reviewed by the Ministry’s central monitoring system before being officially released.
• Dataset Size:
– Number of Records: 20 (representing the health regions)
– Number of Variables: 6 key variables
– Coverage: Comprehensive, encompassing all health regions across the Kingdom of Saudi Arabia
• Time Frame:
– Reference Period: Reflects data from the year 2023
– Collection Cycle: Annual aggregation
• Unit of Analysis: The primary unit of analysis in this study is the health region, which represents the administrative divisions of the national healthcare system. Each health region acts as a distinct observation unit, allowing for comparative analyses to assess program implementation levels, resource allocation efficiency, and service delivery outcomes across the healthcare system.
Data dictionary: This data dictionary defines each variable’s purpose, improving analysis and
interpretation of results throughout the project.
| Variable | Description | Type | Role | Purpose of Analysis |
|---|---|---|---|---|
| Health Region (Identifier Variable) | This variable consists of the names of the 20 health administrative regions in Saudi Arabia. | Categorical (Nominal) | Identifier – It is used to distinguish and group data based on geographical location. | This variable enables regional comparisons and helps in recognizing geographical patterns. |
| Number of Hospitals (Predictor Variable) | This variable represents the count of Ministry of Health hospitals that are implementing the home health care program in each region. | Numerical Discrete | Predictor – This variable is expected to influence service capacity and coverage. | This variable measures the availability and distribution of healthcare infrastructure. |
| Home Health Care Center (Predictor Variable) | Indicates the presence (1) or absence (0) of dedicated home health care centers. Values: 0 = No dedicated center, 1 = Has a dedicated center. | Binary (Dichotomous) | Predictor – Represents the impact of specialized infrastructure. | Assesses the effect of specialized facilities on program outcomes. |
| Number of Manpower (Predictor Variable) | The total number of healthcare staff, including doctors, nurses, and technicians, working in the program. | Numerical Discrete | Predictor – This is a primary resource input variable. | This variable is used to evaluate human resource allocation and productivity. |
| Number of Cases Under Service (Target Variable) | This variable represents the count of active cases currently receiving home health care services. | Numerical Discrete | Target – It measures the current utilization of the program and the associated workload. | This metric indicates the intensity of service delivery in real-time. |
| Cumulative Number of Beneficiaries (Primary Target Variable) | This represents the total number of unique beneficiaries served since the program began. | Numerical Discrete | Primary Target – This is the main outcome measure used to assess the effectiveness of the program. | The purpose of this analysis is to evaluate the overall reach and impact of the program. |
Hypothesis: A testable hypothesis is as follows:
– Primary Hypothesis: There is a statistically significant positive relationship between healthcare manpower allocation and cumulative beneficiary coverage across health regions in the Home Health Care Program.
– Variables:
• Independent Variable: Number of healthcare staff (manpower)
• Dependent Variable: Cumulative number of beneficiaries (patients served)
– Expected Relationship: A positive correlation is anticipated; regions with a higher number of healthcare staff are expected to serve more beneficiaries.
– Statistical Tests:
• Pearson Correlation Coefficient
• Simple Linear Regression: y = a + bx, where y is the cumulative number of beneficiaries and x is the number of healthcare staff
• Significance Level: α = 0.05
– Rationale: Adequate manpower enhances capacity, efficiency, and patient coverage. Understanding this relationship can help identify whether staffing levels directly influence program performance and inform workforce planning decisions.
– Expected Outcomes:
• Strong positive correlation (r > 0.7)
• Statistically significant results (p
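The planned tests can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis actually performed in the report: the function names are hypothetical, the x and y lists are made-up numbers and not the regional manpower and beneficiary figures, and the code stops at the t statistic because converting it to a p-value requires a t distribution with n − 2 degrees of freedom (available, for example, in a statistics package).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def t_statistic(r, n):
    """t statistic for H0: no linear correlation, with n - 2 degrees of freedom."""
    denom = 1 - r * r
    if denom <= 0:  # perfect correlation
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / denom)


# Made-up numbers for illustration only
staff = [1, 2, 3, 4, 5]
beneficiaries = [2, 4, 5, 4, 5]

r = pearson_r(staff, beneficiaries)
a, b = fit_line(staff, beneficiaries)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
print(f"y = {a:.2f} + {b:.2f}x")
print(f"t = {t_statistic(r, len(staff)):.3f}")
```

For simple linear regression with a single predictor, R² equals the square of the Pearson coefficient, so the r > 0.7 expectation corresponds to the model explaining roughly half or more of the variance in beneficiaries.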