Description
Teams & Datasets:
- Teams: 3–4 students. Email team member names to your instructor by 5 Oct 2025. After that, teams will be randomly formed and datasets assigned.
- Dataset: Each team must work on a unique dataset. Choose one from the link below (or the instructor’s assigned set):
Saudi Open Data Platform:
Tools (pick what fits your dataset):
- Required: Microsoft Excel or Orange Data Mining for EDA/modelling.
- For dashboards: Excel or Power BI (recommended).
Learning Outcomes:
You will:
- Frame a data-driven decision problem with stakeholders and KPIs.
- Clean and profile data; document data quality issues and fixes.
- Perform descriptive statistics and EDA with appropriate visuals.
- Test a quantitative hypothesis; interpret correlation/regression.
- Train and evaluate at least two ML models aligned to the decision.
- Build a dashboard summarizing actionable insights for decision-makers.
- Produce clear recommendations, assumptions, and what-if takeaways.
Deliverables:
- Report (PDF + Word), written using the provided template, covering all 7 tasks below (10 marks distributed across the tasks).
- Slide deck (10–12 slides, presented in class in 6 minutes) (4 marks).
Project Tasks & Rubric (Total = 14 marks):
Evidence rule: every task must include screenshots/figures and a 2–4 sentence interpretation (what it shows and why it matters for the decision).
Task 1: Problem & Data Understanding (2 marks)
- Decision context: Who is the decision-maker? What decision will the analysis support? Define KPIs (2–4).
- Dataset description: source reliability, collection method, size, time span, unit of analysis.
- Data dictionary: list key features, types, and expected roles (predictor/target/ID).
- Hypothesis: one testable relationship between two numerical variables (directional, with rationale).
Task 2: Data Quality & Preparation (1 mark)
- Show tests and fixes, with before/after evidence, for:
  - Missing values
  - Duplicates
  - Outliers
  - Noise/irregularities (e.g., inconsistent categories, types, units)
- Include a Data Quality Log table: issue → method → action → impact.
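Optional cross-check (not required; the required tools remain Excel or Orange): the checks behind a Data Quality Log can be sketched in a few lines of standard-library Python. The sample rows, the field names `region` and `value`, and the 1.5×IQR outlier fence are illustrative assumptions, not part of the assigned dataset or a prescribed method.

```python
import statistics

# Hypothetical rows standing in for records of the assigned dataset.
rows = [
    {"region": "Riyadh", "value": 120.0},
    {"region": "riyadh ", "value": 118.0},   # inconsistent category spelling
    {"region": "Jeddah", "value": None},     # missing value
    {"region": "Jeddah", "value": 95.0},
    {"region": "Jeddah", "value": 95.0},     # exact duplicate
    {"region": "Dammam", "value": 101.0},
    {"region": "Dammam", "value": 110.0},
    {"region": "Makkah", "value": 105.0},
    {"region": "Makkah", "value": 900.0},    # suspicious extreme value
]

def missing_count(records, field):
    """Count rows where the field is absent or None."""
    return sum(1 for r in records if r.get(field) is None)

def normalize_category(text):
    """Fix inconsistent categories: trim whitespace and unify case."""
    return text.strip().title()

def drop_duplicates(records):
    """Keep the first occurrence of each fully identical row."""
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

cleaned = [dict(r, region=normalize_category(r["region"])) for r in rows]
cleaned = drop_duplicates(cleaned)
values = [r["value"] for r in cleaned if r["value"] is not None]

print("missing:", missing_count(rows, "value"))
print("rows before/after dedup:", len(rows), len(cleaned))
print("outliers:", iqr_outliers(values))
```

Each printed line maps onto one row of the Data Quality Log (issue → method → action → impact); whether a flagged outlier is removed, capped, or kept is a judgement the team still has to justify.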
Task 3: Descriptive Statistics & EDA (2 marks)
- Central tendency (mean/median/mode), dispersion (variance, SD), and distribution shape (skewness, kurtosis).
- Appropriate visuals (histograms/boxplots/density, bar/line where relevant).
- 2–3 insightful questions you posed from trends/patterns (and brief answers).
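Optional cross-check for Task 3: the sketch below computes the requested summary statistics for one numeric column. The skewness and kurtosis expressions are intended to mirror the sample formulas Excel documents for `SKEW` and excess `KURT`, so results should be comparable with a spreadsheet, but treat that correspondence as an assumption to verify. The `sample` values are hypothetical.

```python
import statistics

def describe(values):
    """Central tendency, dispersion and shape for one numeric column."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (divides by n - 1)
    z = [(x - mean) / sd for x in values]
    # Adjusted sample skewness (needs n >= 3).
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    # Sample excess kurtosis (needs n >= 4); 0 for a normal shape.
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),
        "sd": sd,
        "skewness": skew,
        "kurtosis": kurt,
    }

# Hypothetical column of monthly values.
sample = [12, 15, 15, 18, 20, 22, 25, 31, 48]
for name, value in describe(sample).items():
    print(f"{name:>9}: {value:.3f}")
```

A positive skewness with mean above median (as in this made-up sample) is the kind of pattern worth turning into one of the 2–3 questions.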
Task 4: Hypothesis Testing & Relationship Analysis (2 marks)
- Correlation analysis (numeric pair; comment on strength and direction).
- Simple linear regression (or appropriate alternative): equation, R², residuals check, and practical interpretation linked to KPIs.
- Conclusion: accept/reject hypothesis; implications for the decision.
Task 5: Visual Analytics for Decision-Makers (1 mark)
- A small, coherent visual story (3–4 charts) with correct chart types, clear labels, and callouts.
- Each chart must answer a stakeholder-relevant question; include a 1–2 sentence takeaway.
Task 6: Predictive/Descriptive Modeling (2 marks)
- Choose 1–2 models suitable for your data/task (e.g., Decision Tree, k-NN, Random Forest, SVM, or k-means for segmentation if classification/regression is not applicable).
- Document the training setup (feature set, train/test split).
- Evaluation:
  - For classification: confusion matrix, accuracy, precision/recall, and one key trade-off (e.g., precision vs. recall).
  - For regression: MAE/RMSE and an error plot.
  - For clustering: silhouette (or WCSS elbow) plus a business interpretation of the clusters.
- Brief model selection rationale tied to the decision.
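Optional cross-check for Task 6: the metrics Orange or Excel report can be reproduced from the raw test-set predictions. This sketch assumes a binary classifier with positive class `1` and uses hypothetical labels and predictions; the regression part shows MAE and RMSE on made-up values.

```python
import math

def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(actual, predicted):
    """Accuracy, precision and recall derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def regression_errors(actual, predicted):
    """Mean absolute error and root mean squared error."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

# Hypothetical test-set labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print("TP, FP, FN, TN:", confusion_counts(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
print("MAE, RMSE:", regression_errors([10, 12, 15], [11, 12, 13]))
```

Raising a model's decision threshold typically trades recall for precision; stating which error is costlier for the decision-maker is one way to frame the required trade-off.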
Task 7: Interactive Dashboard & Decision Support (2 marks)
- Excel or Power BI dashboard with 3–5 tiles: KPIs, filters/slicers, and at least one “what-if” (e.g., price, volume, threshold).
- One paragraph on how a manager would use this dashboard to make or justify a decision.
- Dataset: I have selected this dataset; please adhere to it (https://open.data.gov.sa/en/datasets/view/f6917c69…).
Report Template (section outline)
- Executive Summary (½ page) – problem, method, 2–3 key findings, recommendation.
- Decision Context & KPIs
- Data Understanding & Preparation (with Data Quality Log)
- EDA & Descriptive Statistics
- Hypothesis & Relationship Analysis
- Visual Analytics for Decision-Makers
- Modeling & Evaluation
- Dashboard & Decision Use Case (with screenshot)
- Recommendations, Sensitivity/What-If Notes, Limitations, Ethics
- References (data source + any methods you cite)
Project Report: