Project Python Foundations: FoodHub Data Analysis

By Randley Morales

Context

The number of restaurants in New York is increasing day by day. Many students and busy professionals rely on these restaurants because of their hectic lifestyles. Online food delivery is a great option for them, as it brings them good food from their favorite restaurants. FoodHub, a food aggregator company, offers access to multiple restaurants through a single smartphone app.

The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer's location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.

Objective

The food aggregator company has stored the data of the different orders made by the registered customers in their online portal. They want to analyze the data to get a fair idea of the demand for different restaurants, which will help them enhance their customer experience. Suppose you are hired as a Data Scientist in this company and the Data Science team has shared some of the key questions that need to be answered. Perform the data analysis to find answers to these questions that will help the company improve the business.

Data Description

The data contains information about each food order. The detailed data dictionary is given below.

Data Dictionary

  • order_id: Unique ID of the order
  • customer_id: ID of the customer who ordered the food
  • restaurant_name: Name of the restaurant
  • cuisine_type: Cuisine ordered by the customer
  • cost_of_the_order: Cost of the order
  • day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is Saturday and Sunday)
  • rating: Rating given by the customer out of 5
  • food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the timestamps of the restaurant's order confirmation and the delivery person's pick-up confirmation.
  • delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the timestamps of the delivery person's pick-up confirmation and drop-off confirmation.

Let us start by importing the required libraries

In [5]:
# Installing the libraries with the specified version.
#!pip install numpy==1.25.2 pandas==1.5.3 matplotlib==3.7.1 seaborn==0.13.1 -q --user

Note: After running the above cell, kindly restart the notebook kernel and run all cells sequentially from the start again.

In [1]:
# import libraries for data manipulation
import numpy as np
import pandas as pd

# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

Understanding the structure of the data

In [2]:
# Mount Google Drive (required only when running in Google Colab)
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [3]:
# Write your code here to read the data
df = pd.read_csv('/content/drive/MyDrive/FoodHub/foodhub_order.csv')
In [4]:
# Write your code here to view the first 5 rows
df.head()
Out[4]:
   order_id  customer_id            restaurant_name  cuisine_type  cost_of_the_order  day_of_the_week     rating  food_preparation_time  delivery_time
0   1477147       337525                    Hangawi        Korean              30.75          Weekend  Not given                     25             20
1   1477685       358141  Blue Ribbon Sushi Izakaya      Japanese              12.08          Weekend  Not given                     25             23
2   1477070        66393                Cafe Habana       Mexican              12.23          Weekday          5                     23             28
3   1477334       106968  Blue Ribbon Fried Chicken      American              29.20          Weekend          3                     25             15
4   1478249        76942           Dirty Bird to Go      American              11.59          Weekday          4                     25             24

Question 1: How many rows and columns are present in the data? [0.5 mark]

In [5]:
# Write your code here
df.shape
Out[5]:
(1898, 9)

Observations:

The dataset has 1898 rows (one per order) and 9 columns.

Question 2: What are the datatypes of the different columns in the dataset? (The info() function can be used) [0.5 mark]

In [6]:
# Write your code here
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   order_id               1898 non-null   int64  
 1   customer_id            1898 non-null   int64  
 2   restaurant_name        1898 non-null   object 
 3   cuisine_type           1898 non-null   object 
 4   cost_of_the_order      1898 non-null   float64
 5   day_of_the_week        1898 non-null   object 
 6   rating                 1898 non-null   object 
 7   food_preparation_time  1898 non-null   int64  
 8   delivery_time          1898 non-null   int64  
dtypes: float64(1), int64(4), object(4)
memory usage: 133.6+ KB

Observations:

Four columns are integers (int64):

  1. order_id
  2. customer_id
  3. food_preparation_time
  4. delivery_time

Four columns are of type object (strings or mixed types):

  1. restaurant_name
  2. cuisine_type
  3. day_of_the_week
  4. rating

One column is a float (float64):

  1. cost_of_the_order

The rating column is stored as object rather than as a number because some entries contain the text "Not given", so it must be converted before any numeric analysis. Also, order_id and customer_id are identifiers; their numeric values carry no quantitative meaning.

Question 3: Are there any missing values in the data? If yes, treat them using an appropriate method. [1 mark]

In [7]:
# Write your code here
df.isnull().sum()
Out[7]:
                       0
order_id               0
customer_id            0
restaurant_name        0
cuisine_type           0
cost_of_the_order      0
day_of_the_week        0
rating                 0
food_preparation_time  0
delivery_time          0

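Observations:

There are no null values in any column of the dataset. However, the rating column uses the placeholder "Not given" for orders without a rating (736 orders, see Question 5), so these entries are effectively missing values. They are kept as their own category when counting ratings and are excluded when computing average ratings (Question 13).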
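When a numeric rating column is needed, one way to treat the placeholder is to turn "Not given" into a real missing value (NaN) and then convert the column to numbers. The sketch below uses a small hypothetical sample, not the FoodHub file; on the real data the same calls would be applied to `df['rating']`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the structure of the 'rating' column
sample = pd.DataFrame({"rating": ["Not given", "5", "4", "Not given", "3"]})

# Replace the placeholder with NaN, then convert the remaining strings to numbers
sample["rating_num"] = pd.to_numeric(sample["rating"].replace("Not given", np.nan))

print(sample["rating_num"].isnull().sum())  # number of unrated orders
print(sample["rating_num"].mean())          # mean over rated orders only (NaN is skipped)
```

Keeping the original text column alongside the numeric one preserves "Not given" as a category for counting while allowing averages over rated orders.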

Question 4: Check the statistical summary of the data. What is the minimum, average, and maximum time it takes for food to be prepared once an order is placed? [2 marks]

In [8]:
# Write your code here
df.describe().T
Out[8]:
                        count          mean            std         min         25%         50%           75%         max
order_id               1898.0  1.477496e+06     548.049724  1476547.00  1477021.25  1477495.50  1.477970e+06  1478444.00
customer_id            1898.0  1.711685e+05  113698.139743     1311.00    77787.75   128600.00  2.705250e+05   405334.00
cost_of_the_order      1898.0  1.649885e+01       7.483812        4.47       12.08       14.14  2.229750e+01       35.41
food_preparation_time  1898.0  2.737197e+01       4.632481       20.00       23.00       27.00  3.100000e+01       35.00
delivery_time          1898.0  2.416175e+01       4.972637       15.00       20.00       25.00  2.800000e+01       33.00

Observations:

Here is the summary of Food Preparation Time:

  1. The minimum preparation time is 20 minutes, the average is about 27.37 minutes, and the maximum is 35 minutes, so the range is relatively tight.
  2. The middle 50% of orders (IQR) are prepared in 23 to 31 minutes, so restaurants take a fairly consistent amount of time.
  3. The mean (27.37) is close to the median (27), indicating that preparation times are roughly symmetric rather than skewed.
  4. The standard deviation is about 4.63 minutes, which is small relative to the mean (about 17% of it).

Here is the summary of Cost of the Order:

  1. Order costs range from $4.47 to $35.41, so the dataset includes both low-cost and higher-cost meals.
  2. The mean ($16.50) is higher than the median ($14.14), indicating a right-skewed distribution: a smaller number of higher-priced orders pull the mean up, while most orders are on the more affordable side.
  3. The middle 50% of orders cost between $12.08 and $22.30. This interquartile range (IQR) of about $10.22 shows moderate variability in order costs.
  4. The standard deviation is $7.48, about 45% of the mean. Relative to their mean, order costs therefore vary considerably more than preparation or delivery times do.

Here is the summary of Delivery Time:

  1. Delivery times range from 15 to 33 minutes, suggesting a relatively consistent delivery experience for customers. This consistency can help build customer trust in the service.
  2. The mean (24.16 minutes) is slightly below the median (25 minutes), indicating a roughly symmetric distribution with, at most, a slight left skew. Most deliveries cluster around the typical service time.
  3. The middle 50% of deliveries (IQR) take 20 to 28 minutes, so half of all orders are delivered within an 8-minute window, suggesting efficient delivery logistics.
  4. The standard deviation is 4.97 minutes (about 21% of the mean), indicating moderate variability. Most delivery times fall within a predictable range, which is important for customer satisfaction.
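The three statistics the question asks for can also be pulled out directly, and the standard deviation can be expressed as a share of the mean (the coefficient of variation, the ratio used above to compare spreads). A small sketch on made-up values; on the real data the same calls would be applied to `df['food_preparation_time']`.

```python
import pandas as pd

# Hypothetical preparation times in minutes (illustrative values only)
prep = pd.Series([20, 23, 27, 31, 35], name="food_preparation_time")

# Minimum, average, and maximum in one call
summary = prep.agg(["min", "mean", "max"])
print(summary)

# Coefficient of variation: sample standard deviation divided by the mean
cv = prep.std() / prep.mean()
print(f"CV = {cv:.1%}")
```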

Question 5: How many orders are not rated? [1 mark]

In [9]:
# Write the code here
df['rating'].value_counts()
Out[9]:
           count
rating
Not given    736
5            588
4            386
3            188

Observations:

  1. 736 orders (about 39% of all orders) are not rated. The rating column has no null values, but "Not given" is used as a placeholder for a missing rating.
  2. Among actual ratings, 5 is the most common, followed by 4, then 3. Ratings of 4 or 5 account for about 51% of all orders (about 84% of rated orders); only about 10% of orders received a rating of 3.
  3. There are no ratings of 1 or 2, suggesting that dissatisfied customers may not leave a rating, or that low ratings are not included in the dataset.
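The percentages above can be computed rather than worked out by hand. A minimal sketch on a hypothetical ratings column that uses the same "Not given" placeholder; on the real data the calls would take `df['rating']`.

```python
import pandas as pd

# Hypothetical ratings column (illustrative only)
ratings = pd.Series(["Not given", "5", "5", "4", "Not given", "3", "5", "4"])

# Share of each rating value among all orders, in percent
shares = ratings.value_counts(normalize=True) * 100
print(shares.round(1))

# Share of 4s and 5s among rated orders only
rated = ratings[ratings != "Not given"]
high_share = rated.isin(["4", "5"]).mean() * 100
print(f"{high_share:.1f}% of rated orders are 4 or 5")
```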

Exploratory Data Analysis (EDA)

Univariate Analysis

Question 6: Explore all the variables and provide observations on their distributions. (Generally, histograms, boxplots, countplots, etc. are used for univariate exploration.) [9 marks]

Count Plot of Cuisine Type:

In [12]:
# Set style
sns.set(style="whitegrid")

# Count plot for cuisine_type
plt.figure(figsize=(12, 6))
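# hue repeats x so the palette applies per bar; legend=False hides the redundant legend
sns.countplot(data=df, x='cuisine_type', order=df['cuisine_type'].value_counts().index, hue='cuisine_type', palette='coolwarm', legend=False)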
plt.title('Count Plot of Cuisine Type', fontsize=16)
plt.xlabel('Cuisine Type')
plt.ylabel('Number of Orders')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Observations:
  • American cuisine receives the most orders, making it the most popular cuisine type among customers.
  • Japanese and Italian cuisines are also highly preferred, followed by Chinese.
  • Mexican and Indian cuisines show moderate demand.
  • Cuisines such as Thai, French, Korean, Southern, Spanish, and Vietnamese are ordered rarely, indicating niche preferences or limited availability on the platform.
  • Demand is highly concentrated: a few cuisines account for most orders, followed by a long tail of rarely ordered ones.

Count Plot of Day of the Week:

In [15]:
# Set style
sns.set(style="whitegrid")

# Count plot for day_of_the_week
plt.figure(figsize=(8, 5))
sns.countplot(data=df, x='day_of_the_week', order=df['day_of_the_week'].value_counts().index, hue='day_of_the_week', palette='coolwarm', legend=False)
plt.title('Count Plot of Day of the Week', fontsize=16)
plt.xlabel('Day of the Week')
plt.ylabel('Number of Orders')
plt.tight_layout()
plt.show()
Observations:
  • Orders are heavily concentrated on weekends: 1,351 of the 1,898 orders (about 71%) were placed on a weekend, compared with 547 (about 29%) on weekdays.
  • Because the weekend has only two days against five weekdays, the gap per day is even larger, which may reflect more leisure time or group-based ordering on weekends.
  • Weekends are therefore a strategic window for restaurants and delivery services to offer promotions or increase staffing.
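To compare demand per day rather than per category, the counts can be divided by the number of days each category covers. The sketch below uses a hypothetical column and assumes the period consists of whole weeks (5 weekdays and 2 weekend days per week); the dataset does not record dates, so this is an approximation.

```python
import pandas as pd

# Hypothetical day_of_the_week column (illustrative only)
days = pd.Series(["Weekend"] * 7 + ["Weekday"] * 3)

counts = days.value_counts()
share = counts / counts.sum() * 100  # percentage of all orders

# Assumption: whole weeks, so weekdays cover 5 days and weekends cover 2
days_per_category = pd.Series({"Weekday": 5, "Weekend": 2})
per_day = counts / days_per_category  # aligned on the index labels

print(share)
print(per_day)
```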

Count Plot of Rating:

In [17]:
# Set plot style
sns.set(style="whitegrid")

# Count plot for rating
plt.figure(figsize=(10, 5))
sns.countplot(data=df, x='rating', order=df['rating'].value_counts().index, hue='rating', palette='coolwarm', legend=False)
plt.title('Count Plot of Ratings', fontsize=16)
plt.xlabel('Rating')
plt.ylabel('Number of Orders')
plt.tight_layout()
plt.show()
Observations:
  • A large number of orders (736) have the rating "Not given", indicating that customer feedback is optional and often skipped.
  • Among the provided ratings:
      • Rating 5 is the most common, suggesting a high level of customer satisfaction.
      • Ratings 4 and 3 are also frequent, showing good but not perfect experiences.
      • There are no ratings of 1 or 2, which may indicate either genuinely high service quality or underreporting of poor experiences.
  • The given ratings are concentrated at the high end (4 and 5), so their distribution is negatively (left) skewed.

Histogram of Cost of the Order:

In [20]:
# Set plot style
sns.set(style="whitegrid")

# Histogram for cost_of_the_order
plt.figure(figsize=(10, 5))
sns.histplot(data=df, x='cost_of_the_order', kde=True, bins=30, color='red')
plt.title('Histogram of Cost of the Order', fontsize=16)
plt.xlabel('Cost of the Order ($)')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()
Observations:
  • Most orders cost roughly $10 to $20, where the histogram peaks, which is consistent with a typical single-meal cost.
  • The distribution is right-skewed, with fewer high-value orders ($30+) forming a long tail.
  • Low-cost orders (under $10) are uncommon, which may reflect delivery minimums or minimum order values.
  • Orders above $30 exist but are relatively rare.

Boxplot of Cost of the Order:

In [23]:
# Set visual style
sns.set(style="whitegrid")

# Boxplot for cost_of_the_order
sns.boxplot(x=df['cost_of_the_order'], color='red')
plt.title('Boxplot of Cost of the Order', fontsize=16)
plt.xlabel('Cost of the Order ($)')
plt.tight_layout()
plt.show()
Observations:
  • The median order cost is $14.14.
  • The IQR spans from $12.08 to $22.30, so the middle 50% of orders fall within this range.
  • The median lies much closer to the first quartile than to the third, and the upper whisker is longer than the lower one, confirming a right-skewed distribution.
  • With the default 1.5 × IQR whiskers (fences at roughly -$3.2 and $37.6), no orders are flagged as outliers. The cheapest order is $4.47, and the absence of very cheap orders may reflect delivery or minimum-order requirements.
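The outlier statement above follows the 1.5 × IQR rule that seaborn's boxplot uses by default for its whiskers. A minimal sketch of that rule; `iqr_fences` is a hypothetical helper written for this illustration, and the quartiles plugged in are the rounded values reported by `describe()`.

```python
import pandas as pd

def iqr_fences(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles of cost_of_the_order as reported by df.describe()
q1, q3 = 12.08, 22.2975
iqr = q3 - q1
print(f"Fences: {q1 - 1.5 * iqr:.2f} to {q3 + 1.5 * iqr:.2f}")

# Usage on a small hypothetical series: values outside the fences are flagged
costs = pd.Series([4.5, 12.0, 14.0, 16.0, 22.0, 35.0, 80.0])
low, high = iqr_fences(costs)
outliers = costs[(costs < low) | (costs > high)]
print(outliers)
```

On the real data, `iqr_fences(df['cost_of_the_order'])` would give the fences directly from the column.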

Histogram of Food Preparation Time:

In [25]:
# Set visual style
sns.set(style="whitegrid")

# Histogram for food_preparation_time
plt.figure(figsize=(10, 5))
sns.histplot(data=df, x='food_preparation_time', kde=True, bins=20, color='red')
plt.title('Histogram of Food Preparation Time', fontsize=16)
plt.xlabel('Food Preparation Time (minutes)')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()
Observations:
  • All preparation times fall between 20 and 35 minutes, and half of the orders take 23 to 31 minutes.
  • The distribution is roughly symmetric: the quartiles (23, 27, and 31 minutes) are evenly spaced, and the mean (27.37) is close to the median (27).
  • No order is prepared in under 20 minutes or takes longer than 35 minutes.
  • This suggests consistent preparation practices across restaurants.

Boxplot of Food Preparation Time:

In [27]:
# Set visual style
sns.set(style="whitegrid")

# Boxplot for food_preparation_time
sns.boxplot(x=df['food_preparation_time'], color='red')
plt.title('Boxplot of Food Preparation Time', fontsize=16)
plt.xlabel('Food Preparation Time (minutes)')
plt.tight_layout()
plt.show()
Observations:
  • The median food preparation time is 27 minutes.
  • The middle 50% of values lie between 23 and 31 minutes, suggesting high consistency.
  • There are no outliers: with an IQR of 8 minutes, the 1.5 × IQR fences are 11 and 43 minutes, well outside the observed range of 20 to 35 minutes.
  • Restaurants may follow standardized preparation-time expectations, possibly influenced by platform service-level agreements (SLAs) or kitchen efficiency.

Histogram of Delivery Time:

In [29]:
# Set style
sns.set(style="whitegrid")

# Histogram for delivery_time
plt.figure(figsize=(10, 5))
sns.histplot(data=df, x='delivery_time', kde=True, bins=20, color='red')
plt.title('Histogram of Delivery Time', fontsize=16)
plt.xlabel('Delivery Time (minutes)')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()
Observations:
  • Delivery times range from 15 to 33 minutes, with the middle half of orders between 20 and 28 minutes.
  • The distribution is roughly symmetric; the mean (24.16) is slightly below the median (25), which points to at most a slight left skew.
  • There are no deliveries shorter than 15 minutes or longer than 33 minutes, implying operational efficiency.
  • This range likely reflects urban delivery conditions with optimized logistics.
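Comparing the mean with the median is a quick symmetry check; pandas' `Series.skew()` gives a sample skewness estimate whose sign shows the direction of the tail (negative for a longer left tail). A sketch on made-up delivery times; on the real data the analogous call would be `df['delivery_time'].skew()`.

```python
import pandas as pd

# Hypothetical delivery times with one unusually fast delivery (illustrative only)
times = pd.Series([15, 22, 24, 25, 26, 27, 28])

print(times.mean(), times.median())  # the mean sits below the median
print(times.skew())                  # a negative value indicates a longer left tail
```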

Boxplot of Delivery Time:

In [30]:
# Set visual style
sns.set(style="whitegrid")

# Boxplot for delivery_time
sns.boxplot(x=df['delivery_time'], color='red')
plt.title('Boxplot of Delivery Time', fontsize=16)
plt.xlabel('Delivery Time (minutes)')
plt.tight_layout()
plt.show()
Observations:
  • The median delivery time is 25 minutes, consistent with the statistical summary.
  • At least 75% of orders are delivered within 28 minutes (the third quartile), confirming operational efficiency.
  • The maximum is 33 minutes, and there are no outliers: the 1.5 × IQR fences (8 and 40 minutes) lie outside the observed range.
  • There are no deliveries faster than 15 minutes, which makes sense given real-world logistical constraints.

Question 7: Which are the top 5 restaurants in terms of the number of orders received? [1 mark]

In [31]:
# Write the code here
# Calculate the top 5 restaurants in terms of number of orders received
top_5_restaurants = df['restaurant_name'].value_counts().head(5)

# Print the result
print("Top 5 Restaurants by Number of Orders:")
print(top_5_restaurants)
Top 5 Restaurants by Number of Orders:
restaurant_name
Shake Shack                  219
The Meatball Shop            132
Blue Ribbon Sushi            119
Blue Ribbon Fried Chicken     96
Parm                          68
Name: count, dtype: int64

Observations:

  • Shake Shack is the clear leader with 219 orders, about 1.7 times as many as The Meatball Shop in second place (132).
  • Both Blue Ribbon restaurants (Sushi and Fried Chicken) appear in the top 5, indicating a strong brand presence.
  • The Meatball Shop and Parm also attract significant customer attention, suggesting they are local favorites.

Question 8: Which is the most popular cuisine on weekends? [1 mark]

In [32]:
# Write the code here
# Filter the data for weekend orders
weekend_orders = df[df['day_of_the_week'] == 'Weekend']

# Find the most popular cuisine on weekends
popular_cuisine_weekend = weekend_orders['cuisine_type'].value_counts().idxmax()

# Print the result
print(f"The most popular cuisine on weekends is: {popular_cuisine_weekend}")

# Count the number of orders per cuisine on weekends
weekend_cuisine_counts = weekend_orders['cuisine_type'].value_counts()

# Print the result
print("Number of Orders per Cuisine on Weekends:")
print(weekend_cuisine_counts)
The most popular cuisine on weekends is: American
Number of Orders per Cuisine on Weekends:
cuisine_type
American          415
Japanese          335
Italian           207
Chinese           163
Mexican            53
Indian             49
Middle Eastern     32
Mediterranean      32
Thai               15
French             13
Korean             11
Southern           11
Spanish            11
Vietnamese          4
Name: count, dtype: int64

Observations:

  • American cuisine is the most popular on weekends, with 415 orders, ahead of Japanese (335).
  • Italian (207) and Chinese (163) cuisines are also highly popular on weekends.
  • Less frequently ordered cuisines include Southern, Spanish, and Vietnamese, suggesting either lower demand or limited availability on the platform.
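The question only asks about weekends, but a cross-tabulation gives the most popular cuisine for weekdays and weekends in one step. A minimal sketch using `pd.crosstab` on a few hypothetical orders; on the real data the same call would take `df['day_of_the_week']` and `df['cuisine_type']`.

```python
import pandas as pd

# Hypothetical orders (illustrative only)
orders = pd.DataFrame({
    "day_of_the_week": ["Weekend", "Weekend", "Weekend", "Weekday", "Weekday", "Weekday"],
    "cuisine_type": ["American", "American", "Japanese", "Japanese", "Japanese", "American"],
})

# Rows: weekday/weekend; columns: cuisine; cells: number of orders
table = pd.crosstab(orders["day_of_the_week"], orders["cuisine_type"])
print(table)

# Most-ordered cuisine in each row
top = table.idxmax(axis=1)
print(top)
```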

Question 9: What percentage of the orders cost more than 20 dollars? [2 marks]

In [33]:
# Write the code here
# Calculate the number of orders with cost greater than $20
orders_above_20 = df[df['cost_of_the_order'] > 20].shape[0]

# Calculate the total number of orders
total_orders = df.shape[0]

# Calculate the percentage
percentage_above_20 = (orders_above_20 / total_orders) * 100

# Print the result
print(f"Percentage of orders costing more than $20: {percentage_above_20:.2f}%")
Percentage of orders costing more than $20: 29.24%

Observations:

Approximately 29.24% of the orders cost more than $20.

This means nearly 3 out of every 10 orders are higher-value transactions, which could indicate special meals, group orders, or higher-priced items from some restaurants.
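The same percentage can be obtained in one line, because the mean of a boolean mask is the fraction of `True` values. A sketch on hypothetical costs; on the real data it would read `(df['cost_of_the_order'] > 20).mean() * 100`.

```python
import pandas as pd

# Hypothetical order costs in dollars (illustrative only)
cost = pd.Series([12.08, 14.14, 25.00, 30.75, 8.50, 20.00])

# Strictly greater than 20, as in the question, so an order of exactly $20.00 is excluded
pct_above_20 = (cost > 20).mean() * 100
print(f"{pct_above_20:.2f}%")
```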

Question 10: What is the mean order delivery time? [1 mark]

In [34]:
# Write the code here
# Calculate the mean order delivery time
mean_delivery_time = df['delivery_time'].mean()

# Print the result
print(f"Mean Order Delivery Time: {mean_delivery_time:.2f} minutes")
Mean Order Delivery Time: 24.16 minutes

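Observations:

The mean order delivery time in the dataset is approximately 24.16 minutes.

Because delivery_time is measured from the delivery person's pick-up confirmation to the drop-off, this means that, on average, it takes about 24 minutes to get an order from the restaurant to the customer, a relatively quick service. The full time after an order is placed is longer, since it also includes food preparation.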

Question 11: The company has decided to give 20% discount vouchers to the top 3 most frequent customers. Find the IDs of these customers and the number of orders they placed. [1 mark]

In [35]:
# Write the code here
# Find the top 3 most frequent customers by number of orders
top_customers = df['customer_id'].value_counts().head(3)

# Print the result
print("Top 3 Most Frequent Customers (Eligible for 20% Discount Voucher):")
print(top_customers)
Top 3 Most Frequent Customers (Eligible for 20% Discount Voucher):
customer_id
52832    13
47440    10
83287     9
Name: count, dtype: int64

Observations:

Here are the top 3 most frequent customers eligible for the 20% discount voucher:

Rank  Customer ID  Number of Orders
1     52832        13
2     47440        10
3     83287        9
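`value_counts().head(3)` always returns exactly three rows, so a customer tied with the third-ranked one would be silently left out. Checking for ties is cheap with `nlargest(..., keep="all")`. A sketch with hypothetical customer IDs; whether the real data has a tie at 9 orders is not shown above, and `df['customer_id'].value_counts().nlargest(3, keep='all')` would reveal it.

```python
import pandas as pd

# Hypothetical customer IDs, one entry per order (illustrative only)
customer_ids = pd.Series([1, 1, 1, 2, 2, 3, 3, 4, 4, 5])

counts = customer_ids.value_counts()

# head(3) cuts off at exactly three rows, even when several customers tie for 3rd place
print(counts.head(3))

# keep="all" retains every customer tied at the cutoff
top = counts.nlargest(3, keep="all")
print(top)
```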

Multivariate Analysis

Question 12: Perform a multivariate analysis to explore relationships between the important variables in the dataset. (It is a good idea to explore relations between numerical variables as well as relations between numerical and categorical variables) [10 marks]

Correlation Heatmap (Numerical Variables)

In [36]:
# Select numerical columns
numerical_columns = ['cost_of_the_order', 'food_preparation_time', 'delivery_time']

# Calculate correlation matrix
corr_matrix = df[numerical_columns].corr()

# Plot the heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Heatmap of Numerical Variables')
plt.show()
Observations:
  • Cost of the Order shows weak correlation with both food preparation time and delivery time, indicating that more expensive orders don’t necessarily take longer to prepare or deliver.
  • Food Preparation Time and Delivery Time also have a weak positive correlation, suggesting that even though orders that take longer to prepare might slightly delay delivery, it’s not a strong relationship.

Delivery Time by Cuisine Type

In [37]:
plt.figure(figsize=(12, 6))
sns.boxplot(x='cuisine_type', y='delivery_time', data=df, hue='cuisine_type', order=df['cuisine_type'].value_counts().index, palette='coolwarm')
plt.xticks(rotation=45)
plt.title('Delivery Time by Cuisine Type')
plt.xlabel('Cuisine Type')
plt.ylabel('Delivery Time (minutes)')
plt.show()
Observations:
  • Certain cuisines (e.g., Japanese and Chinese) exhibit slightly higher median delivery times.
  • American cuisine shows a wider spread, indicating variability in delivery times—perhaps due to different restaurant practices or delivery bottlenecks.

Delivery Time by Day of the Week

In [38]:
plt.figure(figsize=(8, 6))
sns.boxplot(x='day_of_the_week', y='delivery_time', hue='day_of_the_week', data=df, palette='coolwarm')
plt.title('Delivery Time by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Delivery Time (minutes)')
plt.show()
Observations:
  • Both Weekdays and Weekends show similar delivery time distributions, with no drastic differences.
  • Slightly higher variability is observed on weekends, potentially due to higher order volumes.

Cost of the Order by Cuisine Type

In [39]:
plt.figure(figsize=(12, 6))
sns.boxplot(x='cuisine_type', y='cost_of_the_order', data=df, hue='cuisine_type', order=df['cuisine_type'].value_counts().index, palette='coolwarm')
plt.xticks(rotation=45)
plt.title('Cost of the Order by Cuisine Type')
plt.xlabel('Cuisine Type')
plt.ylabel('Cost of the Order ($)')
plt.show()
Observations:
  • French cuisine and Japanese cuisine tend to have higher median order costs, while American and Chinese cuisines are more modestly priced.
  • The variability in American cuisine costs is quite high, likely due to the range of options and restaurant types.

Cuisine vs. Food Preparation Time

In [40]:
plt.figure(figsize=(12, 6))
sns.boxplot(x='cuisine_type', y='food_preparation_time', data=df, hue='cuisine_type', order=df['cuisine_type'].value_counts().index, palette='coolwarm')
plt.xticks(rotation=45)
plt.title('Food Preparation Time by Cuisine Type')
plt.xlabel('Cuisine Type')
plt.ylabel('Food Preparation Time (minutes)')
plt.show()
Observations:
  • Consistency: Most cuisines have a tight interquartile range (IQR), indicating consistent kitchen efficiency across orders.
  • Longer Prep Times: Certain cuisines (e.g. French, Mediterranean, Middle Eastern) tend to show slightly higher median preparation times, which might reflect the complexity or care in preparing these dishes.
  • Shorter Prep Times: American cuisine generally has lower median preparation times with fewer outliers, likely due to its focus on quick-service meals.
  • Outliers: Some cuisines have notable outliers in prep time, indicating occasional delays or more complex menu items.

Mean Delivery Time vs. Rating

In [42]:
plt.figure(figsize=(10, 6))
sns.pointplot(x='rating', y='delivery_time', data=df, color='red')
plt.title('Mean Delivery Time by Rating')
plt.xlabel('Rating')
plt.ylabel('Delivery Time (minutes)')
plt.show()
Observations:
  • Consistency: Mean delivery time is fairly stable across different ratings, indicating no significant variation tied to customer ratings.
  • Slight Variation: There's a small dip in delivery time around rating 5, but the differences are minor — reinforcing that delivery time alone might not be a key driver of customer satisfaction.

This pointplot suggests that factors other than delivery time—like food quality, packaging, or delivery personnel friendliness—might be more influential on customer ratings.

Mean Food Preparation Time vs. Rating

In [43]:
plt.figure(figsize=(10, 6))
sns.pointplot(x='rating', y='food_preparation_time', data=df, color='red')
plt.title('Mean Food Preparation Time by Rating')
plt.xlabel('Rating')
plt.ylabel('Food Preparation Time (minutes)')
plt.show()
Observations:
  • Stable Preparation Times: Mean preparation time remains relatively consistent across different ratings, suggesting that food preparation time might not significantly influence customer ratings.
  • Slight Variations: Although there's some minor fluctuation across rating levels, these differences are not substantial, reinforcing that kitchen efficiency alone might not dictate customer satisfaction.

Mean Cost of the Order vs. Rating

In [44]:
plt.figure(figsize=(10, 6))
sns.pointplot(x='rating', y='cost_of_the_order', data=df, color='red')
plt.title('Mean Cost of the Order by Rating')
plt.xlabel('Rating')
plt.ylabel('Cost of the Order ($)')
plt.show()
Observations:
  • Relatively Stable: Mean order costs are fairly consistent across ratings, hovering around $15–$20, suggesting customers spend roughly the same amount regardless of how they rate their experience.
  • Slight Increase at Higher Ratings: There's a small bump in cost at ratings of 5, which could imply that customers who spend a bit more might be slightly more satisfied — though the difference isn’t large.

Overall, the cost of the order does not appear to strongly influence customer ratings, indicating that other factors — such as food quality, service, and delivery experience — may be more important.

Question 13: The company wants to provide a promotional offer in the advertisement of the restaurants. The condition to get the offer is that the restaurants must have a rating count of more than 50 and the average rating should be greater than 4. Find the restaurants fulfilling the criteria to get the promotional offer. [3 marks]

In [45]:
# Write the code here
# Remove entries where rating is 'Not given' and convert ratings to numeric
df_filtered = df[df['rating'] != 'Not given'].copy()
df_filtered['rating'] = pd.to_numeric(df_filtered['rating'])

# Group by restaurant and calculate the rating count and average rating
restaurant_rating_stats = df_filtered.groupby('restaurant_name')['rating'].agg(['count', 'mean']).reset_index()

# Filter restaurants that meet the promotional offer criteria
eligible_restaurants = restaurant_rating_stats[
    (restaurant_rating_stats['count'] > 50) &
    (restaurant_rating_stats['mean'] > 4)
]

# Display the eligible restaurants
print(eligible_restaurants)
               restaurant_name  count      mean
16   Blue Ribbon Fried Chicken     64  4.328125
17           Blue Ribbon Sushi     73  4.219178
117                Shake Shack    133  4.278195
132          The Meatball Shop     84  4.511905

Observations:

  1. Popularity Based on Ratings Count
      • Shake Shack has the highest number of ratings (133), which suggests it is one of the most frequently reviewed restaurants in the dataset; it is also the most-ordered restaurant (Question 7).
      • A high number of ratings combined with a strong average indicates both popularity and consistent customer satisfaction.
  2. Top Performer by Average Rating
      • The Meatball Shop stands out with the highest average rating (4.51), implying a strong reputation for quality or service, which is all the more notable given its 84 ratings.
  3. Consistent Satisfaction
      • All four eligible restaurants have average ratings above 4.2, suggesting a consistently positive customer experience rather than occasional excellence.
  4. Restaurant Clustering
      • Two of the qualifying restaurants, Blue Ribbon Fried Chicken and Blue Ribbon Sushi, appear to belong to the same brand family. This could be leveraged in joint promotional campaigns or brand-focused advertising.
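An alternative to filtering out "Not given" rows is to coerce the ratings to numbers, since `count` and `mean` skip NaN automatically, and to use named aggregation for descriptive column names. A sketch on hypothetical data; the thresholds are lowered to fit the tiny sample (Question 13 uses a count above 50 and a mean above 4). Note that `errors="coerce"` turns any non-numeric text into NaN, not only "Not given", which is less strict than the explicit filter above.

```python
import pandas as pd

# Hypothetical ratings (illustrative only); "Not given" marks an unrated order
orders = pd.DataFrame({
    "restaurant_name": ["A", "A", "A", "B", "B", "C"],
    "rating": ["5", "4", "Not given", "3", "4", "5"],
})

# errors="coerce" converts "Not given" (and any other non-numeric text) to NaN
orders["rating_num"] = pd.to_numeric(orders["rating"], errors="coerce")

# Named aggregation: count and mean of rated orders per restaurant
stats = orders.groupby("restaurant_name")["rating_num"].agg(
    rating_count="count", rating_mean="mean"
)

# Thresholds chosen for this toy sample; the real criteria are 50 and 4
min_count, min_mean = 1, 4
eligible = stats[(stats["rating_count"] > min_count) & (stats["rating_mean"] > min_mean)]
print(eligible)
```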

Question 14: The company charges the restaurant 25% on the orders having cost greater than 20 dollars and 15% on the orders having cost greater than 5 dollars. Find the net revenue generated by the company across all orders. [3 marks]

In [46]:
# Write the code here
# Define a function to calculate revenue per order
def calculate_revenue(cost):
    if cost > 20:
        return cost * 0.25
    elif cost > 5:
        return cost * 0.15
    else:
        return 0.0

# Apply the function to compute revenue for each order
df['revenue'] = df['cost_of_the_order'].apply(calculate_revenue)

# Calculate total revenue
total_revenue = df['revenue'].sum()

# Print the result
print(f"Net Revenue Generated by the Company: ${total_revenue:.2f}")
Net Revenue Generated by the Company: $6166.30

Observations:

The net revenue generated by the company across all orders, based on the given commission structure, is $6166.30.
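The row-by-row `apply` works; a vectorized alternative uses `np.select`, which picks the first matching condition for each element, mirroring the `if`/`elif` chain in `calculate_revenue`. A sketch on hypothetical costs. Orders of exactly $5 or $20 fall into the lower tier because the conditions are strict inequalities, matching the question's "greater than" wording.

```python
import numpy as np
import pandas as pd

# Hypothetical order costs in dollars (illustrative only)
cost = pd.Series([4.47, 5.00, 12.08, 20.00, 30.75])

# Conditions are checked in order: first > 20 (25%), then > 5 (15%), otherwise 0%
conditions = [cost > 20, cost > 5]
rates = [0.25, 0.15]
commission = cost * np.select(conditions, rates, default=0.0)

print(commission.round(2).tolist())
print(f"Total: ${commission.sum():.2f}")
```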

Question 15: The company wants to analyze the total time required to deliver the food. What percentage of orders take more than 60 minutes to get delivered from the time the order is placed? (The food has to be prepared and then delivered.) [2 marks]

In [47]:
# Write the code here
# Calculate the total delivery time (preparation + delivery)
df['total_delivery_time'] = df['food_preparation_time'] + df['delivery_time']

# Count total number of orders
total_orders = len(df)

# Count how many orders took more than 60 minutes
orders_over_60 = df[df['total_delivery_time'] > 60].shape[0]

# Calculate the percentage
percentage_over_60 = (orders_over_60 / total_orders) * 100

# Print the result
print(f"Total number of orders: {total_orders}")
print(f"Number of orders with delivery time > 60 minutes: {orders_over_60}")
print(f"Percentage of orders taking more than 60 minutes: {percentage_over_60:.2f}%")
Total number of orders: 1898
Number of orders with delivery time > 60 minutes: 200
Percentage of orders taking more than 60 minutes: 10.54%

Observations:¶

  1. Moderate Delay Prevalence
  • About 1 in every 10 orders (10.54%) takes more than 60 minutes from order placement to final delivery.
  • Most orders are fulfilled in a reasonable timeframe, but a noticeable minority faces significant delays.
  2. Operational Challenge Zone
  • These 200 delayed orders are worth investigating to find out:
    • Which restaurants or cuisine types are delayed most often.
    • Whether certain days (weekends) or peak hours contribute disproportionately.
    • Whether longer food preparation times, delivery routes, or order complexity play a role.
  3. Customer Experience Risk
  • Customers who wait more than 60 minutes may be more likely to:
    • Leave lower ratings.
    • Stop making repeat purchases.
    • Switch to competitors with faster delivery.
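The first follow-up question above (which restaurants or cuisine types are delayed most often) reduces to a grouped share of late orders. A minimal sketch, assuming only the `cuisine_type`, `food_preparation_time` and `delivery_time` columns used in this notebook, on hypothetical toy orders:

```python
import pandas as pd

# Hypothetical toy orders; column names follow the FoodHub data dictionary
orders = pd.DataFrame({
    "cuisine_type": ["Cuisine A", "Cuisine A", "Cuisine B", "Cuisine B", "Cuisine B", "Cuisine C"],
    "food_preparation_time": [30, 20, 35, 25, 30, 22],
    "delivery_time": [32, 20, 30, 15, 35, 24],
})

# Total time from order placement to drop-off
orders["total_time"] = orders["food_preparation_time"] + orders["delivery_time"]

# Flag late orders; the mean of a boolean column is the share of True values
orders["over_60"] = orders["total_time"] > 60

# Percentage of late orders per cuisine, worst first
delay_share = (
    orders.groupby("cuisine_type")["over_60"]
    .mean()
    .mul(100)
    .sort_values(ascending=False)
)
print(delay_share.round(1))
```

Swapping `cuisine_type` for `restaurant_name` or `day_of_the_week` gives the other breakdowns listed above. The same boolean-mean idea reproduces the overall figure directly: `(df['total_delivery_time'] > 60).mean() * 100`.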

Question 16: The company wants to analyze the delivery time of the orders on weekdays and weekends. How does the mean delivery time vary during weekdays and weekends? [2 marks]¶

In [48]:
# Write the code here
# Mean delivery time for each day type ('Weekday' / 'Weekend')
mean_delivery_time_by_day = df.groupby('day_of_the_week')['delivery_time'].mean()

# Print the results
print("Mean Delivery Time by Day Type:")
for day_type, mean_time in mean_delivery_time_by_day.items():
    print(f"{day_type}: {mean_time:.2f} minutes")
Mean Delivery Time by Day Type:
Weekday: 28.34 minutes
Weekend: 22.47 minutes

Observations:¶

  1. Faster Deliveries on Weekends
  • Weekend orders are delivered about 6 minutes faster on average than weekday orders (22.47 vs. 28.34 minutes).
  • This could reflect lighter traffic, fewer work-related orders, or optimized staffing and logistics on weekends.
  2. Weekday Slowdown
  • The longer average weekday delivery time may be influenced by:
    • Heavier traffic.
    • Higher order volumes during office lunch hours.
    • More scattered delivery routes in commercial zones.
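A mean can be pulled by a few unusually long trips, so it is worth placing the median and the number of orders next to each mean before reading too much into the weekday gap or the order-volume explanation above. A sketch on hypothetical data that reuses the notebook's column names:

```python
import pandas as pd

# Hypothetical toy data standing in for the FoodHub `df`
orders = pd.DataFrame({
    "day_of_the_week": ["Weekday", "Weekday", "Weekday",
                        "Weekend", "Weekend", "Weekend", "Weekend"],
    "delivery_time": [30, 28, 26, 20, 22, 24, 33],
})

# Order count, mean and median delivery time per day type
summary = orders.groupby("day_of_the_week")["delivery_time"].agg(["count", "mean", "median"])

# Gap in mean delivery time (weekday minus weekend)
gap = summary.loc["Weekday", "mean"] - summary.loc["Weekend", "mean"]

print(summary)
print(f"Weekday minus weekend mean: {gap:.2f} minutes")
```

If the median gap is much smaller than the mean gap, a handful of slow deliveries is driving the difference; the `count` column shows whether the day type with slower deliveries actually carries more orders.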

Conclusion and Recommendations¶

Question 17: What are your conclusions from the analysis? What recommendations would you like to share to help improve the business? (You can use cuisine type and feedback ratings to drive your business recommendations.) [6 marks]¶

This analysis of food delivery data shows that American, Japanese, and Italian cuisines are the most popular among customers. While most ratings fall between 4 and 5, a significant number of orders remain unrated, limiting insight into customer satisfaction. The median order cost is about $14, and both food preparation and delivery times cluster around 25 minutes, indicating operational efficiency.

Notably, Italian and Korean cuisines receive the highest customer ratings, whereas Spanish and Thai cuisines lag behind in satisfaction. These trends suggest promoting high-performing cuisines while auditing lower-rated ones for potential service improvements. Encouraging more customers to leave ratings and maintaining delivery performance, especially during peak times, will enhance the customer experience. Strategic promotions can also help diversify cuisine choices and balance demand across the platform.

Conclusions:¶

  1. Cuisine Preferences
  • American, Japanese, and Italian cuisines are the most frequently ordered, suggesting strong customer preference.
  • Cuisines like Indian, Thai, and Spanish have lower order counts, indicating either limited availability or niche interest.
  2. Order Timing
  • Order volumes are evenly split between weekdays and weekends, with a slight peak on weekends, possibly due to leisure time and group dining.
  3. Customer Ratings
  • A large portion of orders lacks ratings, which limits insight into customer satisfaction.
  • Among rated orders, ratings of 4 and 5 dominate, indicating generally positive customer experiences.
  • Very few low ratings (1 or 2) were recorded, suggesting limited dissatisfaction or lack of reporting.
  4. Order Cost
  • Most orders fall between $10 and $30, with a median of ~$14, reflecting typical single-meal or small-group pricing.
  • The cost distribution is right-skewed, with a few high-cost orders acting as outliers.
  5. Food Preparation & Delivery Time
  • Food preparation time is tightly centered around 25 minutes, reflecting consistent kitchen operations.
  • Delivery time also clusters around 25 minutes, with most orders fulfilled in 15–30 minutes, indicating good logistical performance.
  • There are few outliers, and delivery times rarely exceed 33 minutes, showing strong reliability.
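Two of the conclusions above, the large share of unrated orders and the right-skewed cost distribution, can each be checked with a couple of pandas calls. A minimal sketch on hypothetical toy orders, assuming unrated orders carry a non-numeric placeholder in `rating`; comparing the mean with the median is only a quick heuristic for skew, so `Series.skew()` is printed as well:

```python
import pandas as pd

# Hypothetical toy orders; "Not given" is an assumed placeholder for unrated orders
orders = pd.DataFrame({
    "rating": ["5", "4", "Not given", "Not given", "3", "5"],
    "cost_of_the_order": [12.0, 14.0, 9.0, 16.0, 13.0, 34.0],
})

# Share of orders without a usable numeric rating
ratings = pd.to_numeric(orders["rating"], errors="coerce")
unrated_pct = ratings.isna().mean() * 100

# Rating distribution among rated orders only
rating_counts = ratings.dropna().value_counts().sort_index()

# A mean above the median, plus positive sample skewness, indicates right skew
cost = orders["cost_of_the_order"]

print(f"Unrated orders: {unrated_pct:.1f}%")
print(rating_counts)
print(f"Mean cost: {cost.mean():.2f}, median cost: {cost.median():.2f}, skew: {cost.skew():.2f}")
```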

Recommendations:¶

  1. Expand High-Performing Cuisines
  • Invest in partnerships with more American, Japanese, and Italian restaurants to capitalize on strong demand.
  • Use promotions or curated recommendations to surface Mexican, Chinese, or Korean options more frequently, encouraging diversity.
  2. Improve Rating Collection
  • Encourage customers to leave feedback by:
    • Offering small incentives (discounts, loyalty points).
    • Simplifying the feedback UI post-delivery.
  • This will improve insights into customer satisfaction and areas needing improvement.
  3. Optimize Delivery Efficiency
  • Although delivery times are consistent, ~15% of orders exceed 30 minutes. To address this:
    • Consider dynamic route optimization or partnering with more local couriers.
    • Highlight expected wait times transparently during checkout to manage expectations.
  4. Analyze Feedback by Cuisine
  • Perform bivariate analysis between cuisine type and rating to identify:
    • Which cuisines consistently receive lower feedback.
    • Whether certain cuisines suffer from longer prep/delivery times or order issues.
  5. Promote Off-Peak Orders
  • Since orders are evenly distributed between weekdays and weekends:
    • Use targeted offers or discounts during low-traffic hours to balance load and improve delivery performance.
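The bivariate analysis of cuisine type and rating recommended above could start from a single grouped table that also carries preparation and delivery times. A minimal sketch on hypothetical toy orders (cuisine labels are placeholders), assuming unrated orders carry a non-numeric placeholder in `rating`:

```python
import pandas as pd

# Hypothetical toy orders; columns follow the FoodHub data dictionary
orders = pd.DataFrame({
    "cuisine_type": ["Cuisine A", "Cuisine A", "Cuisine B", "Cuisine B", "Cuisine B", "Cuisine C"],
    "rating": ["3", "Not given", "5", "4", "5", "4"],
    "food_preparation_time": [30, 28, 24, 26, 25, 22],
    "delivery_time": [27, 30, 22, 25, 24, 20],
})

# Placeholders become NaN and are skipped by count and mean
orders["rating_num"] = pd.to_numeric(orders["rating"], errors="coerce")

# Per cuisine: number of ratings, mean rating, and mean prep/delivery times
by_cuisine = orders.groupby("cuisine_type").agg(
    n_ratings=("rating_num", "count"),
    mean_rating=("rating_num", "mean"),
    mean_prep=("food_preparation_time", "mean"),
    mean_delivery=("delivery_time", "mean"),
)

# Lowest-rated cuisines first
print(by_cuisine.sort_values("mean_rating"))
```

Cuisines with only a handful of ratings should be read with caution, since a single low score can drag their mean down; `n_ratings` is kept in the table for that reason.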