Description
see , task 4 and 6
| YEAR | Month | Tourists Number (Overnight Visitors) (000) | Overnight Stay (000) | Tourists Spending (SAR Mn) | Average Length of Stay (Night) | Avergae Spending per Trip (SAR) | Average Spending per Night (SAR) |
|---|---|---|---|---|---|---|---|
| 2021 | January | 5,920 | 30,553 | 8,947 | 5.2 | 1,511 | 293 |
| 2021 | February | 4,376 | 21,794 | 6,681 | 5.0 | 1,527 | 307 |
| 2021 | March | 4,408 | 22,466 | 6,682 | 5.1 | 1,516 | 297 |
| 2021 | April | 4,431 | 22,860 | 6,950 | 5.2 | 1,569 | 304 |
| 2021 | May | 6,728 | 45,029 | 11,448 | 6.7 | 1,702 | 254 |
| 2021 | June | 4,166 | 26,069 | 7,031 | 6.3 | 1,688 | 270 |
| 2021 | July | 7,533 | 59,880 | 13,917 | 7.9 | 1,847 | 232 |
| 2021 | August | 5,482 | 30,082 | 8,878 | 5.5 | 1,619 | 295 |
| 2021 | September | 4,185 | 18,999 | 6,145 | 4.5 | 1,468 | 323 |
| 2021 | October | 5,399 | 23,105 | 7,546 | 4.3 | 1,398 | 327 |
| 2021 | November | 6,713 | 31,475 | 11,018 | 4.7 | 1,641 | 350 |
| 2021 | December | 4,494 | 21,018 | 6,939 | 4.7 | 1,544 | 330 |
| 2021 | Grand Total | 63,834 | 353,331 | 102,184 | 5.5 | 1,601 | 289 |
| 2022 | January | 7,675 | 34,353 | 11,355 | 4.5 | 1,479 | 331 |
| 2022 | February | 6,450 | 28,567 | 9,134 | 4.4 | 1,416 | 320 |
| 2022 | March | 6,205 | 28,204 | 8,296 | 4.5 | 1,337 | 294 |
| 2022 | April | 8,036 | 40,268 | 10,115 | 5.0 | 1,259 | 251 |
| 2022 | May | 7,246 | 31,569 | 9,526 | 4.4 | 1,315 | 302 |
| 2022 | June | 4,033 | 18,237 | 6,001 | 4.5 | 1,488 | 329 |
| 2022 | July | 10,796 | 71,351 | 17,657 | 6.6 | 1,636 | 247 |
| 2022 | August | 7,008 | 34,453 | 11,232 | 4.9 | 1,603 | 326 |
| 2022 | September | 4,196 | 16,914 | 5,232 | 4.0 | 1,247 | 309 |
| 2022 | October | 4,838 | 19,287 | 5,944 | 4.0 | 1,229 | 308 |
| 2022 | November | 7,003 | 29,206 | 8,260 | 4.2 | 1,179 | 283 |
| 2022 | December | 4,350 | 17,197 | 4,508 | 4.0 | 1,036 | 262 |
| 2022 | Grand Total | 77,837 | 369,606 | 107,259 | 4.7 | 1,378 | 290 |
| 2023 | January | 6,294 | 34,513 | 8,333 | 5.5 | 1,324 | 241 |
| 2023 | February | 5,608 | 29,356 | 7,621 | 5.2 | 1,359 | 260 |
| 2023 | March | 6,502 | 35,353 | 9,743 | 5.4 | 1,499 | 276 |
| 2023 | April | 6,881 | 38,120 | 9,428 | 5.5 | 1,370 | 247 |
| 2023 | May | 5,003 | 27,152 | 6,714 | 5.4 | 1,342 | 247 |
| 2023 | June | 9,548 | 77,843 | 15,412 | 8.2 | 1,614 | 198 |
| 2023 | July | 8,118 | 57,518 | 11,261 | 7.1 | 1,387 | 196 |
| 2023 | August | 6,986 | 37,990 | 8,627 | 5.4 | 1,235 | 227 |
| 2023 | September | 6,074 | 34,494 | 8,309 | 5.7 | 1,368 | 241 |
| 2023 | October | 6,202 | 39,443 | 9,070 | 6.4 | 1,462 | 230 |
| 2023 | November | 8,220 | 49,168 | 10,769 | 6.0 | 1,310 | 219 |
| 2023 | December | 6,484 | 34,393 | 9,120 | 5.3 | 1,407 | 265 |
| 2023 | Grand Total | 81,920 | 495,341 | 114,407 | 6.0 | 1,397 | 231 |
| 2024 | January | 7,780 | 41,292 | 10,040 | 5.3 | 1,290 | 243 |
| 2024 | February | 7,743 | 40,588 | 10,182 | 5.2 | 1,315 | 251 |
| 2024 | March | 6,803 | 44,765 | 10,059 | 6.6 | 1,479 | 225 |
| 2024 | April | 7,200 | 41,381 | 8,354 | 5.8 | 1,160 | 202 |
| 2024 | May | 5,976 | 33,301 | 7,794 | 5.6 | 1,304 | 234 |
| 2024 | June | 9,613 | 101,340 | 14,938 | 10.5 | 1,554 | 147 |
| 2024 | July | 8,063 | 51,038 | 9,634 | 6.3 | 1,195 | 189 |
| 2024 | August | 6,448 | 35,904 | 8,704 | 5.6 | 1,350 | 242 |
| 2024 | September | 5,907 | 33,424 | 8,174 | 5.7 | 1,384 | 245 |
| 2024 | October | 5,794 | 30,939 | 8,300 | 5.3 | 1,433 | 268 |
| 2024 | November | 7,907 | 44,552 | 10,286 | 5.6 | 1,301 | 231 |
| 2024 | December | 6,923 | 40,095 | 8,811 | 5.8 | 1,273 | 220 |
| 2024 | Grand Total | 86,157 | 538,618 | 115,276 | 6.3 | 1,338 | 214 |
| 2025 | January | 7,539 | 42,233 | 10,564 | 5.6 | 1,401 | 250 |
| 2025 | February | 6,654 | 39,915 | 8,392 | 6.0 | 1,261 | 210 |
| 2025 | March | 11,260 | 66,561 | 15,180 | 5.9 | 1,348 | 228 |
| 2025 | April | 5,793 | 40,005 | 9,376 | 6.9 | 1,619 | 234 |
| 2025 | May | 7,537 | 51,908 | 12,158 | 6.9 | 1,613 | 234 |
| 2025 | June | 7,816 | 72,910 | 15,237 | 9.3 | 1,949 | 209 |
| 2025 | Grand Total | 46,599 | 313,532 | 70,907 | 6.7 | 1,522 | 226 |
College of Computing and Informatics
Project
Deadline: Sunday 30/11/2025 @ 23:59
[Total Mark for this Project is 14]
Group Details:
Name:
Name:
Name:
Name:
CRN:
ID:
ID:
ID:
ID:
Instructions:
• You must submit two separate copies (one Word file and one PDF file) using the Assignment Template on
Blackboard via the allocated folder. These files must not be in compressed format.
• It is your responsibility to check and make sure that you have uploaded both the correct files.
• Zero mark will be given if you try to bypass the SafeAssign (e.g., misspell words, remove spaces between
words, hide characters, use different character sets, convert text into image or languages other than English
or any kind of manipulation).
• Email submission will not be accepted.
• You are advised to make your work clear and well-presented. This includes filling your information on the cover
page.
• You must use this template, failing which will result in zero mark.
• You MUST show all your work, and text must not be converted into an image, unless specified otherwise by
the question.
• Late submission will result in ZERO mark.
• The work should be your own, copying from students or other resources will result in ZERO mark.
• Use Times New Roman font for all your answers.
Restricted – مقيد
Project
Pg. 01
Learning Outcome(s): CLO 1, 2, 5
1. Demonstrate an understanding of the concepts of decision analysis and decision support systems (DSS) including probability, modelling, decisions under uncertainty, and real-world problems.
2. Describe advanced Business Intelligence, Business Analytics, Data Visualization, and Dashboards.
5. Improve hands-on skills using Excel, and Orange for building Decision Support Systems.
Project
14 Marks
Teams & Datasets:
• Teams: 3–4 students. Email team member names to your instructor by 5 Oct 2025. After that, teams will be randomly formed and datasets assigned.
• Dataset: Each team must work on a unique dataset. Choose one from the link below (or the instructor’s assigned set):
Saudi Open Data Platform:
Tools (pick what fits your dataset):
• Required: Microsoft Excel or Orange Data Mining for EDA/modelling.
• For dashboards: Excel or Power BI (recommended).
Learning Outcomes:
You will:
1. Frame a data-driven decision problem with stakeholders and KPIs.
2. Clean and profile data; document data quality issues and fixes.
3. Perform descriptive statistics and EDA with appropriate visuals.
4. Test a quantitative hypothesis; interpret correlation/regression.
5. Train and evaluate at least two ML models aligned to the decision.
6. Build a dashboard summarizing actionable insights for decision-makers.
7. Produce clear recommendations, assumptions, and what-if takeaways.
Deliverables:
1. Report (PDF + Word), which must incorporate all of the following 7 tasks and be written using the provided template. (10 marks distributed among the tasks below.)
2. Slide deck (10–12 slides in 6 mins) for in-class presentation. (4 marks).
Project Tasks & Rubric (Total = 14 marks):
Evidence rule: every task must include screenshots/figures and a 2-4 sentence
interpretation (what it shows, why it matters for the decision).
Task 1: Problem & Data Understanding (2 marks)
• Decision context: Who is the decision-maker? What decision will the analysis support? Define KPIs (2–4).
• Dataset description: source reliability, collection method, size, time span, unit of analysis.
• Data dictionary: list key features, types, and expected roles (predictor/target/ID).
• Hypothesis: one testable relationship between two numerical variables (directional, with rationale).
Task 2: Data Quality & Preparation (1 mark)
• Show tests and fixes with before/after evidence for:
  o Missing values
  o Duplicates
  o Outliers
  o Noise/irregularities (e.g., inconsistent categories, types, units)
• Include a Data Quality Log table: issue → method → action → impact.
Task 3: Descriptive Statistics & EDA (2 marks)
• Central tendency (mean/median/mode) and distribution shape (variance, SD, skewness, kurtosis).
• Appropriate visuals (histograms/boxplots/density, bar/line where relevant).
• 2–3 insightful questions you posed from trends/patterns (and brief answers).
Task 4: Hypothesis Testing & Relationship Analysis (2 marks)
• Correlation analysis (numeric pair; comment on strength/direction).
• Simple linear regression (or appropriate alternative): equation, R², residuals check, and practical interpretation linked to KPIs.
• Conclusion: accept/reject hypothesis; implications for the decision.
Task 5: Visual Analytics for Decision-Makers (1 mark)
• A small, coherent visual story (3–4 charts) with correct chart types, clear labels, and callouts.
• Each chart must answer a stakeholder-relevant question; include a 1–2 sentence takeaway.
Task 6: Predictive/Descriptive Modeling (2 marks)
• Choose 1–2 models suitable for your data/task (e.g., Decision Tree, k-NN, Random Forest, SVM, k-means for segmentation if classification/regression is not applicable).
• Document training setup (feature set, split).
• Evaluation:
  o For classification: confusion matrix, accuracy, precision/recall, and 1 key trade-off.
  o For regression: MAE/RMSE and an error plot.
  o For clustering: silhouette (or WCSS elbow) + business interpretation of clusters.
• Brief model selection rationale tied to the decision.
Task 7: Interactive Dashboard & Decision Support (2 marks)
• Excel or Power BI dashboard with 3–5 tiles: KPIs, filters/slicers, and at least one “what-if” (e.g., price, volume, threshold).
• One paragraph on how a manager would use this dashboard to make or justify a decision.
Report Template (section outline):
1. Executive Summary (½ page) – problem, method, 2–3 key findings,
recommendation.
2. Decision Context & KPIs
3. Data Understanding & Preparation (with Data Quality Log)
4. EDA & Descriptive Statistics
5. Hypothesis & Relationship Analysis
6. Visual Analytics for Decision-Makers
7. Modeling & Evaluation
8. Dashboard & Decision Use Case (with screenshot)
9. Recommendations, Sensitivity/What-If Notes, Limitations, Ethics
10. References (data source + any methods you cite)
Project Report:
Task 1 — Problem & Data Understanding
1. Decision Context
• Decision-maker: Ministry of Tourism or internal tourism strategy team / regional tourism manager.
• Decision to be supported: Optimize domestic tourism promotion and resource allocation across months by analyzing trends in visitor numbers, average length of stay, and spending. This enables setting monthly marketing budgets, targeting promotions during low-demand months, and forecasting revenue from domestic tourism.
Key Performance Indicators (KPIs):
1. Monthly Domestic Tourists (000): Number of overnight visitors per month.
2. Monthly Tourists Spending (SAR Mn): Total direct spending by domestic overnight
tourists.
3. Average Length of Stay (nights): Indicates depth of visit and potential influence on
spending.
4. Average Spending per Trip (SAR): Revenue efficiency per trip, useful for evaluating
ROI of promotions.
Importance of KPIs: Monthly tourist counts and spending drive revenue and fiscal planning,
while average length of stay and average spending per trip are operational levers that can be
influenced through marketing, packages, and pricing strategies.
2. Dataset Description
• Source: Saudi Open Data Platform (file titled “Domestic Tourism Statistics H1 2025”). This is the official government open-data portal, generally considered reliable for macro tourism statistics.
• Collection method: Aggregated administrative or survey-based statistics compiled monthly by tourism authorities. The dataset does not document the raw collection methodology.
• Unit of analysis: Monthly domestic overnight-tourism summary (one row per month).
• Time span & frequency: Monthly observations from January 2021 through June 2025. The dataset includes “Grand Total” rows, which are aggregates and should be treated separately.
• Dataset size: 59 rows and 8 columns. After removing the five aggregate rows (one per year), 54 monthly observations remain, covering January 2021 to June 2025.
3. Data Dictionary
| Original Column Name | Cleaned Name | Type | Units | Role |
|---|---|---|---|---|
| YEAR | YEAR | Integer / Categorical | Year (e.g., 2021) | Identifier / Time component |
| Month | Month | Categorical / Datetime | Month name (January–December) | Identifier / Time component |
| Tourists Number (Overnight Visitors) (000) | Tourists_Number_000 | Numeric (float) | Thousands of tourists | KPI / Predictor or Target |
| Overnight Stay (000) | Overnight_Stay_000 | Numeric (float) | Thousand nights | KPI / Predictor |
| Tourists Spending (SAR Mn) | Tourists_Spending_SAR_Mn | Numeric (float) | Million SAR | KPI / Target |
| Average Length of Stay (Night) | Avg_Length_of_Stay | Numeric (float) | Nights | Predictor / Behavioral |
| Avergae Spending per Trip (SAR) | Avg_Spending_per_Trip_SAR | Numeric (float) | SAR per trip | KPI / Predictor |
| Average Spending per Night (SAR) | Avg_Spending_per_Night_SAR | Numeric (float) | SAR per night | KPI / Predictor |
Note: The units are embedded in the column names. Tourists_Number_000 is in thousands,
and Tourists_Spending_SAR_Mn is in million SAR.
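As a quick check on these units, the three average columns appear to be derivable from the three volume columns. The sketch below works through the January 2021 row; the variable names are illustrative, and the derivation is an observation from the published numbers, not something the data source documents.

```python
# January 2021 row, in the dataset's scaled units.
tourists_k = 5_920    # Tourists_Number_000 (thousands of tourists)
nights_k = 30_553     # Overnight_Stay_000 (thousands of nights)
spending_mn = 8_947   # Tourists_Spending_SAR_Mn (million SAR)

# Keep the scale factors explicit so thousands and millions are never mixed.
avg_stay = (nights_k * 1_000) / (tourists_k * 1_000)               # nights per trip
spend_per_trip = (spending_mn * 1_000_000) / (tourists_k * 1_000)  # SAR per trip
spend_per_night = (spending_mn * 1_000_000) / (nights_k * 1_000)   # SAR per night

print(round(avg_stay, 1), round(spend_per_trip), round(spend_per_night))
# prints: 5.2 1511 293
```

These values match the 5.2 nights, SAR 1,511 per trip, and SAR 293 per night reported for January 2021.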
4. Initial Data Observations
• The dataset contains 59 rows and 8 columns. The Month column includes rows labeled “Grand Total” which are pre-aggregated totals.
• Date range: January 2021 – June 2025.
• Some column names contain minor errors (e.g., “Avergae Spending per Trip”), and numeric fields may be stored as strings in certain rows.
• Column names indicate scale: (000) for counts, (SAR Mn) for spending, which must be considered during analysis.
5. Hypothesis
• Hypothesis: Average Length of Stay (nights) is positively associated with Average Spending per Trip (SAR).
• Rationale: Longer stays provide tourists more opportunities to spend (accommodation, activities, meals). Policies that increase average stay, such as multi-night package deals, may increase spending per trip and total monthly revenue.
• Test Plan: Conduct Pearson correlation analysis between Avg_Length_of_Stay and Avg_Spending_per_Trip_SAR, and fit a simple linear regression:
  Avg_Spending_per_Trip_SAR = β0 + β1 × Avg_Length_of_Stay + ε
• Null hypothesis H0: β1 ≤ 0 (no positive relationship).
• Alternative hypothesis H1: β1 > 0 (positive relationship).
• Evaluate the coefficient β1, p-value, R², and residual diagnostics, then interpret results in the KPI context.
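The test plan could be sketched in plain Python as follows. This is an illustration only: it uses the twelve 2021 monthly values from the dataset, not the full cleaned series, and the function name fit_simple_ols is hypothetical. It computes the slope, intercept, Pearson r, R², and the t statistic for the slope; the one-sided p-value would still have to be read from a Student's t distribution with n − 2 degrees of freedom using a statistics package or table.

```python
import math

# 2021 monthly values from the dataset (January to December).
stay = [5.2, 5.0, 5.1, 5.2, 6.7, 6.3, 7.9, 5.5, 4.5, 4.3, 4.7, 4.7]
spend = [1511, 1527, 1516, 1569, 1702, 1688, 1847, 1619, 1468, 1398, 1641, 1544]

def fit_simple_ols(x, y):
    """Return (beta0, beta1, r) for the model y = beta0 + beta1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta1 = sxy / sxx                 # slope
    beta0 = mean_y - beta1 * mean_x   # intercept
    r = sxy / math.sqrt(sxx * syy)    # Pearson correlation
    return beta0, beta1, r

beta0, beta1, r = fit_simple_ols(stay, spend)
n = len(stay)
r_squared = r ** 2
# t statistic for H0: beta1 <= 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt((n - 2) / (1 - r_squared))
# Residuals for the diagnostics step (plot them against stay to check for patterns).
residuals = [y - (beta0 + beta1 * x) for x, y in zip(stay, spend)]

print(f"beta0={beta0:.1f} beta1={beta1:.1f} r={r:.3f} R2={r_squared:.3f} t={t_stat:.2f}")
```

A positive beta1 with a large t statistic would support H1; the residuals list is what the residual check would inspect.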
6. Recommended Data Actions
1. Remove rows where Month == ‘Grand Total’ or handle them as separate summary
rows.
2. Standardize column names (correct typos and replace spaces/special characters with
underscores).
3. Create a Date column for time-series indexing (e.g., Date = YEAR + Month).
4. Convert scaled units to consistent numeric units when necessary, clearly stating any
conversion factors.
5. Check for missing or non-numeric values in numeric columns and handle appropriately.
6. Document all changes in a Data Quality Log (issue → method → action → impact).
Dataset Preview Interpretation:
The first 20 rows show monthly aggregates of domestic overnight tourism from January 2021
with corresponding spending and behavioral metrics. Visitor counts are in thousands, and
spending is in million SAR. Rows labeled “Grand Total” represent pre-computed aggregates
and should be removed for time-series analysis.
Task 2. Data Quality & Preparation
Systematic data-quality checks and transformations were performed to prepare the dataset for
analysis. Identified issues, detection methods, remedial actions, and impacts are documented
below.
Before / After Snapshot (Evidence)
• Before: The raw worksheet contained 59 rows and 8 columns, including five “Grand Total” rows (one per year, 2021–2025), each aggregating that year's months. Several column headers contained typos and unit annotations (e.g., “(000)”, “(SAR Mn)”), and there was no Date column for parsing.
• After: The cleaned dataset contains 54 rows (monthly observations only) with standardized column names. Aggregate rows were removed, numeric columns were converted to numeric types, a Date column (first day of month) was created, and base-unit columns were added (Tourists_Number, Overnight_Stay, Tourists_Spending_SAR) to avoid implicit unit errors.
Data Quality Tests and Fixes
Issue: Presence of aggregated rows (“Grand Total”)
• Test: Filtered Month column for value == “Grand Total”.
• Action / Fix: Removed the rows where Month == “Grand Total”.
• Impact: Prevented double-counting and ensured time-series integrity.
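A minimal sketch of this filter in plain Python, assuming the sheet has been loaded as a list of row dictionaries. The helper name split_aggregates and the shortened "Tourists Number" key are illustrative; the sample values come from the 2021 data. Keeping the totals in a separate list preserves them for later reconciliation.

```python
# A few rows in the shape of the raw sheet.
rows = [
    {"YEAR": 2021, "Month": "November", "Tourists Number": 6713},
    {"YEAR": 2021, "Month": "December", "Tourists Number": 4494},
    {"YEAR": 2021, "Month": "Grand Total", "Tourists Number": 63834},
    {"YEAR": 2022, "Month": "January", "Tourists Number": 7675},
]

def split_aggregates(records, label="Grand Total"):
    """Separate monthly rows from pre-aggregated summary rows."""
    monthly = [r for r in records if r["Month"].strip() != label]
    totals = [r for r in records if r["Month"].strip() == label]
    return monthly, totals

monthly_rows, total_rows = split_aggregates(rows)
print(len(monthly_rows), len(total_rows))  # prints: 3 1
```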
Issue: Column name inconsistency / typo
• Test: Manual header inspection.
• Action / Fix: Renamed columns to standardized snake_case names, e.g., Avergae Spending per Trip (SAR) → Avg_Spending_per_Trip_SAR; created a formal data dictionary with units.
• Impact: Improved reproducibility and reduced risk of errors in analysis scripts.
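The renaming step can be expressed as an explicit mapping taken from the data dictionary. The sketch below is one possible way to apply it; the function name rename_columns is hypothetical, and it assumes headers may contain line breaks or repeated spaces from the spreadsheet layout. Raising on unmapped headers makes silent schema drift visible.

```python
RENAME_MAP = {
    "YEAR": "YEAR",
    "Month": "Month",
    "Tourists Number (Overnight Visitors) (000)": "Tourists_Number_000",
    "Overnight Stay (000)": "Overnight_Stay_000",
    "Tourists Spending (SAR Mn)": "Tourists_Spending_SAR_Mn",
    "Average Length of Stay (Night)": "Avg_Length_of_Stay",
    "Avergae Spending per Trip (SAR)": "Avg_Spending_per_Trip_SAR",  # typo is in the source header
    "Average Spending per Night (SAR)": "Avg_Spending_per_Night_SAR",
}

def rename_columns(record, mapping=RENAME_MAP):
    """Return a copy of one row with standardized headers; unknown headers raise."""
    # Collapse line breaks and repeated spaces before looking up each header.
    normalized = {" ".join(key.split()): value for key, value in record.items()}
    unknown = set(normalized) - set(mapping)
    if unknown:
        raise KeyError(f"Unmapped headers: {sorted(unknown)}")
    return {mapping[key]: value for key, value in normalized.items()}

raw_row = {"YEAR": 2021, "Month": "January", "Avergae Spending per\nTrip (SAR)": 1511}
clean_row = rename_columns(raw_row)
print(clean_row)
```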
Issue: Numeric fields stored or containing non-numeric characters
• Test: Coerced candidate numeric columns to numeric and flagged any NA conversions.
• Action / Fix: Converted Tourists_Number_000, Overnight_Stay_000, Tourists_Spending_SAR_Mn, Avg_Length_of_Stay, Avg_Spending_per_Trip_SAR, Avg_Spending_per_Night_SAR to numeric types.
• Impact: Enables correct aggregation, descriptive statistics, and modeling.
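A dependency-free sketch of the coercion test, assuming values may arrive as strings with thousands separators (for example "5,920"). The helper to_number is illustrative and is not the tool the report used; values that cannot be parsed become NaN and are collected for review.

```python
import math

def to_number(value):
    """Parse values such as '5,920' or ' 5.2 '; return NaN when parsing fails."""
    if isinstance(value, (int, float)):
        return float(value)
    try:
        return float(str(value).replace(",", "").strip())
    except ValueError:
        return math.nan

raw_values = ["5,920", "30,553", 5.2, "n/a", ""]
parsed = [to_number(v) for v in raw_values]
# Keep the original text of anything that failed, for the Data Quality Log.
flagged = [raw for raw, num in zip(raw_values, parsed) if math.isnan(num)]
print(parsed[:3], flagged)
```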
Issue: Implicit scaled units in column names
• Test: Reviewed column annotations (000) and (SAR Mn) and assessed need for base units.
• Action / Fix: Created explicit base-unit columns:
  o Tourists_Number = Tourists_Number_000 × 1,000
  o Overnight_Stay = Overnight_Stay_000 × 1,000
  o Tourists_Spending_SAR = Tourists_Spending_SAR_Mn × 1,000,000
• Impact: Eliminates unit-mismatch errors and simplifies interpretation.
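The three conversions above can be kept in one lookup so the factors are stated in a single place. This is a sketch under the assumption that rows are dictionaries keyed by the cleaned column names; add_base_units is a hypothetical helper.

```python
# Scaled column -> (base-unit column, conversion factor), as listed above.
SCALE_FACTORS = {
    "Tourists_Number_000": ("Tourists_Number", 1_000),
    "Overnight_Stay_000": ("Overnight_Stay", 1_000),
    "Tourists_Spending_SAR_Mn": ("Tourists_Spending_SAR", 1_000_000),
}

def add_base_units(record):
    """Return a copy of the row with base-unit columns added next to the scaled ones."""
    out = dict(record)
    for scaled, (base, factor) in SCALE_FACTORS.items():
        out[base] = record[scaled] * factor
    return out

row = {
    "Tourists_Number_000": 5920,
    "Overnight_Stay_000": 30553,
    "Tourists_Spending_SAR_Mn": 8947,
}
converted = add_base_units(row)
print(converted["Tourists_Number"], converted["Tourists_Spending_SAR"])
# prints: 5920000 8947000000
```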
Issue: Date handling for time-series
• Test: Attempted to parse YEAR + Month into a datetime column.
• Action / Fix: Constructed Date as the first day of each month after removing aggregate rows.
• Impact: Enables temporal sorting, trend plotting, and model time-splits.
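One way to build the Date value, sketched with an explicit month list so the result does not depend on the system locale. The helper month_start is illustrative. A "Grand Total" label raises ValueError, which is why the aggregate rows have to be removed first.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def month_start(year, month_name):
    """Build the first-of-month date from a year and an English month name."""
    month = MONTHS.index(month_name.strip()) + 1  # ValueError for non-month labels
    return date(int(year), month, 1)

print(month_start(2021, "January"), month_start("2025", " June "))
# prints: 2021-01-01 2025-06-01
```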
Issue: Duplicates
• Test: Checked for exact duplicate rows.
• Action / Fix: None required.
• Impact: Dataset is unique per row.
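The report checks for exact duplicate rows. A slightly stricter variant, sketched below as an assumption and not as the method actually used, checks that each (YEAR, Month) pair occurs only once, which also catches a month that was entered twice with different values. The helper name duplicate_keys is hypothetical.

```python
from collections import Counter

def duplicate_keys(records, key_fields=("YEAR", "Month")):
    """Return the key tuples that occur more than once."""
    counts = Counter(tuple(rec[f] for f in key_fields) for rec in records)
    return [key for key, count in counts.items() if count > 1]

rows = [
    {"YEAR": 2021, "Month": "January", "Tourists_Number_000": 5920},
    {"YEAR": 2021, "Month": "February", "Tourists_Number_000": 4376},
    {"YEAR": 2022, "Month": "January", "Tourists_Number_000": 7675},
]
print(duplicate_keys(rows))  # prints: []
```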
Data Quality Log
| Issue | Method | Action | Impact |
|---|---|---|---|
| Aggregate ‘Grand Total’ rows | Filter Month == ‘Grand Total’ | Removed 5 aggregate rows from main dataset | Prevents double-counting and enables correct time-series analysis |
| Typo in header (‘Avergae’) | Manual header inspection | Renamed columns to standardized names | Improves script reproducibility |
| Non-numeric encodings | to_numeric coercion | Converted numeric columns to floats; flagged any NAs | Enables calculations and plotting |
| Implicit scaled units | Review of unit markers | Added base-unit columns (counts and SAR) | Prevents unit errors in aggregation/modeling |
| Missing/invalid dates | Construct Date from YEAR+Month | Created Date column; no NaT after cleaning | Enables temporal analysis |
| Duplicates check | duplicated() function | 0 duplicates found | No impact; dataset is unique per row |
Cleaned Dataset
• File: cleaned_domestic_tourism.xlsx
• Contents: 54 monthly observations, standardized column names, numeric types, Date column, base-unit columns.
Figures / Screenshots
• Figure A: Before cleaning, tail rows including “Grand Total”. Caption: “Before cleaning: file contains aggregate ‘Grand Total’ rows that must be removed for time-series analysis.”
• Figure B: After cleaning, head of cleaned table showing Date, standardized column names, and base-unit columns. Caption: “After cleaning: aggregate rows removed; numeric columns converted and base-unit columns added.”
Task 3. Descriptive Statistics & Exploratory Data Analysis (EDA)
Descriptive Statistics (2023–2025 Subset)
To focus on the most recent tourism trends, the analysis covers January 2023 to June 2025 (30 monthly observations). Monthly domestic tourists ranged from 5,003 thousand (May 2023) to 11,260 thousand (March 2025), and monthly tourist spending ranged from SAR 6,714 Mn (May 2023) to SAR 15,412 Mn (June 2023). These month-to-month fluctuations reflect seasonal patterns in domestic travel. The mean and standard deviation values confirm a high level of variability in both visitor volume and spending, which is important for forecasting and planning.
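The summary statistics for this subset can be reproduced with Python's statistics module. The sketch below is illustrative and covers only Tourists_Number_000 for the 30 months from January 2023 to June 2025, copied from the dataset; the dictionary name summary is an assumption.

```python
import statistics

# Tourists_Number_000, January 2023 to June 2025 (30 months).
tourists = [
    6294, 5608, 6502, 6881, 5003, 9548, 8118, 6986, 6074, 6202, 8220, 6484,  # 2023
    7780, 7743, 6803, 7200, 5976, 9613, 8063, 6448, 5907, 5794, 7907, 6923,  # 2024
    7539, 6654, 11260, 5793, 7537, 7816,                                     # 2025 (H1)
]

summary = {
    "n": len(tourists),
    "min": min(tourists),
    "max": max(tourists),
    "mean": statistics.mean(tourists),
    "median": statistics.median(tourists),
    "stdev": statistics.stdev(tourists),  # sample standard deviation
}
for name, value in summary.items():
    print(f"{name}: {value:,.1f}")
```

A mean above the median, as here, is a first hint of the right skew discussed for the histogram.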
Distribution of Tourists (Histogram)
The histogram of monthly tourist numbers shows a right-skewed distribution, where most months fall within a moderate range, while a few months show unusually high volumes. This pattern indicates that domestic tourism has clear seasonal peaks that the Ministry should account for when planning marketing campaigns or resource allocation.
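The skew can be checked numerically. The sketch below computes the moment-based skewness and fixed-width bin counts for the same 30 monthly values; the function names and the 1,000-wide bins are illustrative choices, not the settings behind the report's chart.

```python
# Tourists_Number_000, January 2023 to June 2025 (30 months).
tourists = [
    6294, 5608, 6502, 6881, 5003, 9548, 8118, 6986, 6074, 6202, 8220, 6484,
    7780, 7743, 6803, 7200, 5976, 9613, 8063, 6448, 5907, 5794, 7907, 6923,
    7539, 6654, 11260, 5793, 7537, 7816,
]

def skewness(values):
    """Population (moment-based) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def histogram(values, width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = {}
    for v in values:
        edge = int(v // width) * width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))

print(f"skewness = {skewness(tourists):.2f}")
for edge, count in histogram(tourists, 1000).items():
    print(f"{edge:>6}-{edge + 999}: {'#' * count}")
```

A positive skewness value together with a long tail of sparsely filled upper bins is what a right-skewed histogram looks like in numbers.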
Spending Variability (Boxplot)
The boxplot for tourist spending (SAR) highlights pronounced variability, including high-spending months that act as natural peaks in demand. The wide interquartile range suggests that monthly revenue from domestic tourism is inconsistent, reinforcing the need for decision-makers to identify low-performing periods and design targeted interventions.
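The quantities a boxplot draws can be computed directly. The sketch below applies the common 1.5 × IQR fence rule to monthly spending for January 2023 to June 2025; the rule and the default quantile method of statistics.quantiles are assumptions here, since the report does not state how its boxplot was configured.

```python
import statistics

# Tourists_Spending_SAR_Mn, January 2023 to June 2025 (30 months).
spending = [
    8333, 7621, 9743, 9428, 6714, 15412, 11261, 8627, 8309, 9070, 10769, 9120,
    10040, 10182, 10059, 8354, 7794, 14938, 9634, 8704, 8174, 8300, 10286, 8811,
    10564, 8392, 15180, 9376, 12158, 15237,
]

q1, q2, q3 = statistics.quantiles(spending, n=4)  # quartile cut points
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
# Months outside the fences are the points a boxplot would draw individually.
outliers = [v for v in spending if v < lower_fence or v > upper_fence]

print(f"Q1={q1:.1f} median={q2:.1f} Q3={q3:.1f} IQR={iqr:.1f}")
print("outliers:", sorted(outliers))
```

Under this rule the flagged months are all high-spending peaks (June 2023, June 2024, March 2025, June 2025), so they are better treated as seasonal signal than as errors to remove.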
Trend Over Time (Line Chart)
The line chart tracking tourist numbers across 2023–2025 shows a clear seasonal wave pattern, with periodic increases aligned with holidays and mid-year periods. The upward movement in early 2025 suggests positive momentum in domestic tourism, which could be leveraged through expanded promotions or stay-incentive programs.
Insights from EDA (3 questions + answers)
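1. Which months show the strongest tourism peaks?
The EDA shows that mid-year and holiday months produce the highest visitor volumes. June, July, and November are the three highest-volume months in both 2023 and 2024, and March 2025 is the single largest month in the subset (11,260 thousand tourists).
2. Is monthly spending aligned with visitor numbers?
Spending generally follows the same seasonal trend as tourist volume, although some months show high spending despite moderate visitor counts, likely due to longer stays or higher per-trip expenditure. For example, June 2025 recorded 7,816 thousand tourists but SAR 15,237 Mn in spending, with an average stay of 9.3 nights.
3. Is the domestic tourism trend improving over time?
The upward shift in early 2025 suggests growing domestic tourism activity, supported by increases across both visitor numbers and total spending: first-half totals rose from 45,115 thousand tourists and SAR 61,367 Mn in 2024 to 46,599 thousand tourists and SAR 70,907 Mn in 2025.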
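The trend question can be checked by comparing like-for-like periods, since 2025 only covers January to June. The sketch below sums first-half tourist counts per year from the dataset and prints the year-over-year change; the names h1_tourists and pct_change are illustrative.

```python
# Tourists_Number_000, January to June of each year.
h1_tourists = {
    2023: [6294, 5608, 6502, 6881, 5003, 9548],
    2024: [7780, 7743, 6803, 7200, 5976, 9613],
    2025: [7539, 6654, 11260, 5793, 7537, 7816],
}
h1_totals = {year: sum(values) for year, values in h1_tourists.items()}

def pct_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100

for year in (2024, 2025):
    change = pct_change(h1_totals[year], h1_totals[year - 1])
    print(year, f"{h1_totals[year]:,}", f"{change:+.1f}%")
```

Comparing half-years avoids reading the partial 2025 total as a decline against full-year figures.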