The Complete Guide to SQL Window Functions – Big Tech Interviews
SQL window functions are a powerful tool that can be used to solve complex problems with ease by data analysts, data scientists, and even software engineers. This article will discuss the different types of window functions and how to use them effectively. We will also provide examples of real-world scenarios where SQL window functions can be used to get the desired results. If you want to improve your SQL skills, this is the perfect article for you!
SQL window functions originated in SQL/DS, the SQL-based database platform developed by IBM in the 1980s. Since then, this powerful tool has been adopted and used by many other popular SQL databases, including SQL Server and MySQL.
There are several different types of SQL window functions and each of them serves a specific purpose. These include ranking functions, moving averages, cumulative sums, and more.
For example, the SQL function ROW_NUMBER is commonly used to rank rows in a given result set, while the SQL function PERCENT_RANK is used to calculate percentile ranks.
With that said, there are three unofficial categories of SQL window functions,
- Aggregate window functions
- Ranking window functions
- Value window functions
Aggregate Window Functions in SQL
These window functions are commonly used to calculate summaries of groups of rows in a result set, such as using SUM() to sum up the total sales for each product category, COUNT() to count the number of distinct values in a column, or AVG() to calculate the average age of customers.
Some SQL aggregate window functions include:
- SUM: Returns the sum of all values in a given set
- COUNT: Returns the number of rows in a result set
- AVG: Returns the arithmetic mean of all values in a given set
- Max: Returns the maximum value in a given set
- Min: Returns the minimum value in a given set
When we utilize these aggregate window functions, we’re able to replace the need to use a GROUP BY because SQL window functions automatically group the results for us. Additionally, aggregate window functions allow you to apply a set of aggregate functions simultaneously to each row in a result set, which can be very useful when performing complex calculations.
Let’s review an example of an aggregate SQL window function to better understand how it works.
Prompt: Write a query to find the comparison result (higher/lower/same) of the average salary of employees in a department to the company’s average salary.
Table: salary |----|-------------|--------|------------| | id | emp_id | amount | pay_date | |----|-------------|--------|------------| | 1 | 1 | 1000 | 2022-04-30 | | 2 | 2 | 5000 | 2022-04-31 | | 3 | 3 | 15000 | 2022-04-21 | | 4 | 1 | 9000 | 2022-04-24 | | 5 | 2 | 7000 | 2022-05-25 | | 6 | 3 | 9000 | 2022-05-25 | |----|-------------|--------|------------| Table: employee |-------------|---------------| | employee_id | department_id | |-------------|---------------| | 1 | 1 | | 2 | 3 | | 3 | 3 | | 4 | 5 | | 5 | 6 | | 6 | 2 | |-------------|---------------|
Explanation
In April, the company’s average salary is (1000+5000+15000+9000)/4 = 7500
The average salary for department ‘1’ is 1000, which is the salary of employee_id ‘1’ since there is only one employee in this department. So the comparison result is ‘higher’ since 7500 > 1000 obviously.
This question can be solved by using SQL window functions. We will first use an aggregate window function to calculate the average salary for each department, then compare that value against the company’s overall average salary.
select department_salary.pay_month, department_id, case when department_avg > company_avg then 'higher' when department_avg < company_avg then 'lower' else 'same' end as comparison from ( select department_id, avg(amount) as department_avg, date_format(pay_date, '%Y-%m') as pay_month from salary join employee on salary.emp_id = employee.employee_id group by department_id, pay_month ) as department_salary join ( select avg(amount) as company_avg, date_format(pay_date, '%Y-%m') as pay_month from salary group by date_format(pay_date, '%Y-%m') ) as company_salary on department_salary.pay_month = company_salary.pay_month;
The above SQL query returns the comparison result for each department to the company’s average salary, which indicates whether that department’s average salary is higher, lower, or the same as the overall company average.
In this case, we can see that departments 1 and 3 have higher salaries than the company average, while departments 2 and 5 have lower salaries. This indicates that overall, the company’s average salary is higher than the average salaries of each department.
Ranking Window Functions in SQL
Ranking window functions are typically used to rank records within a result set, such as when comparing employees’ salaries or exam scores. SQL provides several ranking functions that can be used to generate rank numbers, row numbers, and dense rank values for each record in the query results.
Some SQL ranking window functions include:
- DENSE RANK: Assigns a rank number to each row, with no gaps in the ranking values.
- ROW NUMBER: Assigns a consecutive integer to each row in the result set. RANK: Assigns a rank number to each row, but allows for gaps in the ranking values if there are ties. When
- RANK: Assigns a rank number to each row, but allows for gaps in the ranking values if there are ties. When two or more rows have the same value, they receive the same rank with the following row having a higher rank number than all of them.
- NTILE: Divides an ordered set of rows into a specified number of groups
- PERCENT_RANK: Assigns a rank number to each row, scaled between 0 and 1 (1 being the highest-ranking row).
Using SQL Window Functions in Big Tech InterviewsSQL window functions are often used in tech interviews to test your understanding of SQL concepts. It is important to understand how these functions work and what they can be used for, so that you can demonstrate your SQL knowledge in a tech interview.
When working with SQL window functions, it is important to consider the use-case and understand the differences between each type of window function, so that you can select the best one for the job. By understanding SQL window functions, you will be better equipped to answer SQL interview questions
These SQL ranking window functions are especially helpful when preparing for big tech interviews. Knowing how to use SQL ranking functions can be a great asset in the job market, as they enable you to quickly and accurately answer difficult SQL questions.
Let’s put into practice DENSE_RANK () one of the more common ranking window functions with an example about finding the second highest sold item.
Prompt: Write a query to find the second highest sold item in each product category. If there’s a tie, order by category_id ASC.
Table: product_category +-----------+---------+ |product_id |category | +-----------+---------+ |1 |bike | |2 |bike | |3 |bike | |4 |bike | |5 |motobike | |6 |motobike | |7 |car | +-----------+---------+ Table: orders +----+-----------+-------+ |id |product_id |amount | +----+-----------+-------+ |1 |1 |1 | |2 |2 |2 | |3 |3 |2 | |4 |2 |1 | |5 |1 |2 | |6 |3 |2 | |7 |4 |4 | |8 |5 |2 | |9 |5 |3 | |10 |6 |5 | +----+-----------+-------+
Explanation
First, we need to total up the units that were sold and rank the products by category.
select o.product_id ,p.category ,sum(o.amount) units_sold ,dense_rank() over ( partition by p.category order by sum(o.amount) desc, o.product_id asc ) cat_rank from orders o left join product_category p on p.product_id=o.product_id group by o.product_id ,p.category
Logic
- We are joining on the product_id so that we can tie the category to what was sold.
- First we SUM() the units_sold so we can rank units by how many were sold.
- We apply DENSE_RANK() so that ties don’t obscure ranking positions.
- This means that a 2 will be returned for the 2nd highest value in the event there is a tie for the 1 position, whereas RANK() would skip the 2 and move on to 3.
- We PARTITION BY the category so that we can get a different rank within that same category for each product_id.
- The ORDER BY dictates that the rank will be based on the highest sum of the amount ordered, and then by product_id to resolve ties per the requirements.
After this, we’ll place Step 1 in a CTE (called “ranked”) so we can query our second-place results.
select category, product_id second_highest_sold_item from ranked where cat_rank=2 order by category asc
Logic
- Our WHERE criteria use the rank we generated earlier.
Final Query
with ranked as ( select o.product_id ,p.category ,sum(o.amount) units_sold ,dense_rank() over ( partition by p.category order by sum(o.amount) desc, o.product_id asc ) cat_rank from orders o left join product_category p on p.product_id=o.product_id group by o.product_id ,p.category ) select category, product_id second_highest_sold_item from ranked where cat_rank=2 order by category asc;
Value Window Functions in SQL
Value window functions are especially helpful when you want to compare two columns with each other in SQL. This can be used to identify changes in values or patterns over time.
SQL value window functions are:
- LAG: This function returns the value of a column from the previous row.
- LEAD: This function returns the value of a column from the next row.
- FIRST_VALUE and LAST_VALUE: These functions return the first or last value in an ordered set, respectively.
- PERCENTILE_DISC and PERCENTILE_CONT: These functions return the percentile of a set of values.
- CUME_DIST: This function returns the cumulative distribution value for a row within a query result.
These window functions enable you to compare two columns side by side, and they allow you to find the difference between these two columns. For SQL Interviews, window functions are a must-have to efficiently and accurately answer questions.
Using the SQL value window functions, you can compare two columns side by side, find the difference between two values, identify patterns over time, and more. With these SQL window functions in your arsenal of SQL skills, you’ll be prepared to tackle even the toughest SQL interview questions.
Prompt: Write a query to retrieve all the customer_name(s) (ordered) whose transactions have a -10 second gap from each other.
Table: customer_transactions +---+-------------+-------------------+------------------+ |id |customer_name|transaction_time |transaction_amount| +---+-------------+-------------------+------------------+ |1 |A |2022-02-21 15:21:10|533 | |2 |B |2022-02-21 15:21:20|234 | |3 |D |2022-02-21 15:21:25|789 | |4 |D |2022-02-21 15:21:45|34 | |5 |F |2022-02-21 15:21:40|12 | |6 |A |2022-02-21 15:22:05|445 | |7 |B |2022-02-21 15:22:15|236 | |8 |C |2022-02-21 15:22:30|643 | |9 |F |2022-02-21 15:22:40|563 | |10 |C |2022-02-21 15:22:60|876 | +---+-------------+-------------------+------------------+
Explanation
First, we’ll measure the number of seconds between each transaction.
select customer_name ,date_part('seconds' ,transaction_time - lag(transaction_time) over ( partition by customer_name order by transaction_time asc ) ) seconds_between from customer_transactions
Logic
- We are using the DATE_PART() function to determine the amount of seconds between two dates/times.
- We use the LAG() window function to find the previous transaction time to each transaction time record.
- PARTITION BY the customer_name so that the window is specific to each customer.
- ORDER BY transaction_time ASC so that time is in normal, temporal order.
- We perform a calculation between the current record’s transaction time and the previous transaction time (per the LAG() that we built) and measure in seconds using DATE_PART().
Next, we’ll put Step 1 in a CTE (called “measured”), we are able to select only the records where our new, calculated seconds_between is within the requested criteria.
select customer_name from measured where seconds_between<=10 order by customer_name asc;
Logic
- The WHERE clause specifies the criteria from the requirement.
Final Query
with measured as ( select customer_name ,date_part('seconds' ,transaction_time - lag(transaction_time) over ( partition by customer_name order by transaction_time asc ) ) seconds_between from customer_transactions ) select customer_name from measured where seconds_between<=10 order by customer_name asc;
Conclusion
SQL window functions enable you to compare two columns side by side, and they allow you to find the difference between these two columns. You’ll want to reconsider using window functions if there’s a lot of data you need to process, as window functions can be time-consuming.
However, for SQL Interviews, SQL window functions are very important and powerful tools. With these SQL window functions in your arsenal of SQL skills, you’ll be prepared to tackle even the toughest SQL interview questions.
Using window functions, you can answer SQL Interview Questions like the one in this article with ease. Knowing when and how to properly use each window function is essential for successfully completing Big Tech Interviews. With these SQL skills, you’ll be ready to show off your SQL interview knowledge and ace the SQL portion of any interview
Check out Big Tech Interviews for more real window functions to practice!