I came across this challenge after learning SQL on Datacamp and figured it would help me practice and master the concepts. It would also look good in my portfolio, right? These challenges are the best for beginners because they equip you with a taste of what it's like to solve real-life problems using SQL.
Problem Statement
The full code for these queries can be found on my GitHub
After creating the database and tables, I had to answer the following questions.
CASE STUDY QUESTIONS
This is the fun part. I used Postgresql to write all my queries.
What is the total amount each customer spent at the restaurant?
How many days has each customer visited the restaurant?
What was the first item from the menu purchased by each customer?
What is the most purchased item on the menu and how many times was it purchased by all customers?
Which item was the most popular for each customer?
Which item was purchased first by the customer after they became a member?
Which item was purchased just before the customer became a member?
What are the total items and amount spent for each member before they became a member?
If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customers A and B have at the end of January?
What is the total amount each customer spent at the restaurant?
I joined the tables to get the price and customer id which were in different tables and used the SUM()
and GROUP BY
functions.
Conclusion: A has spent the most while C has spent the least amount
How many days has each customer visited the restaurant?
I used COUNT(DISTINCT)
and GROUP BY
to get the answer.
Conclusion: B is the most frequent customer while C is the least frequent customer
What was the first item from the menu purchased by each customer?
I SELECT DISTINCT
from a subquery to get unique combinations of customer_id
and first_buy
in case someone bought the same item twice on the first day. I also use the subquery to get the minimum /earliest order date
for each customer from the sales
table. The GROUP BY
clause groups the rows by customer_id
and calculates the minimum order_date
for each group.
I then joined the subquery and sales s2
and then joined the outcome to the menu(m)
for the final result
Conclusion: Customer A's first items were curry and sushi, B bought curry and C bought ramen.
What is the most purchased item on the menu and how many times was it purchased by all customers?
First I wrote a subquery to get the most purchased item product_id
then used it in a WHERE
clause during the joining of the sales and menu tables to get the product_name
and product_count
. I learnt that In SQL when you use the GROUP BY
clause, you need to include all non-aggregated columns that are present in the SELECT
clause otherwise, it will throw you an error.
Conclusion: Seems like ramen is the most popular product. Bought 3 times by A and C and 2 times by B.
Which item was the most popular for each customer?
I learnt about using RANK() OVER
and PARTITION BY
to rank the items for each customer based on the purchase count. The outer query selects the rows where the rank = 1, indicating the most popular item for each customer.
Conclusion: Ramen is A and C's favourite and B likes everything equally.
Which item was purchased first by the customer after they became a member?
I used a subquery that got the customer_id
, product_id
, order_date
, and assigned a row number to each row partitioned by customer_id
. The inner join condition ensured that only sales made after a customer became a member were included. I later used an outer query to join the subquery with the menu
table (m
) based on the matching product_id
.
Conclusion: A's first purchase after becoming a member was curry and B's was sushi.
Which item was purchased just before the customer became a member?
This question only differs from the one above because we are looking for the item last purchased before the customer became a member. The logic is quite the same.
Conclusion: A purchased sushi and curry just before becoming a member while B purchased sushi.
What are the total items and amount spent for each member before they became a member?
The main query joins the sales
table (s
) with the menu
table to get the price of each item, then later joined with the subquery to ensure the order date is before they became a member.
Conclusion: B bought 3 items worth 40 dollars before becoming a member whereas A bought 2 items worth 25 dollars.
If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
The subquery sets the points earned for each product and it's joined to the main query to get the total points for each customer by grouping the outcome by customer_id
Conclusion: B has the most points followed by A and C has the least points.
In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customers A and B have at the end of January?
First I joined all the tables to get data for customers A and B. The CASE statement sets the points calculator to x20 when the order date is within 7 days after the join date, x20 when the product is sushi any day and x10 if none of the other conditions are met. The WHERE
statement filters out sales that were not done in January. Aim was to get the total points at the end of January, including orders made before the customer became a member during that month.
Then did the aggregation function to get the total points. I also learnt about a function to extract the month from a date for filtering.
Conclusion: A had the most points that month followed by B.
I really had fun solving these problems
If you have any suggestions on areas I could improve on, feel free to tell me.