33 Must-Know Data Analyst SQL Interview Questions

John H.

John H.

I love iced coffee, cute pictures of dogs, and SQL. I've previously worked at Big Tech as a data analyst and now spend my time writing and helping job seekers ace their big tech interview @ bigtechinterviews.com.

33 Must-Know Data Analyst SQL Interview Questions

SQL is a powerful programming language that is widely used for managing and analyzing data. Whether to switch from a business analyst, break in from an entirely different career path, or an existing one who is looking for a new job, it’s important to be prepared to answer easy to difficult SQL interview questions.

Here are some of the most common Uber, Amazon, and Google SQL interview questions that you may be asked during our data analyst SQL interview, along with advice on how to answer them. They’ll even be helpful if you’re in the field of data science too!

1. What is SQL and what are its main features?

SQL, or Structured Query Language, is a standard programming language for accessing and manipulating databases. SQL is renowned for its simple yet powerful syntax, which makes it easy to query data in a variety of ways. SQL also supports a wide range of data types, including numeric, text, and date/time values.

2. What are some of the most common SQL commands?

Some of the most common SQL commands are CREATE TABLE, INSERT INTO, UPDATE, DELETE, and SELECT.

  • CREATE TABLE is used to create a new table in a database.
  • INSERT INTO is used to insert data into a table.
  • UPDATE is used to update data in a table.
  • DELETE is used to delete data from a table.
  • SELECT is used to select data from a table.

3. How would you use SQL to find the right clients and orders?

				
					Table: Orders
+----------+---------+-----------+-------------+--------+
| order_id | sale_id | client_id | city        | amount |
+----------+---------+-----------+-------------+--------+
| 1        | 1       | 101       | Los Angeles | 1000.00|
| 2        | 2       | 102       | New York    | 1500.50|
| 3        | 3       | 103       | Chicago     | 2000.00|
| 4        | 4       | 104       | New York    | 2500.75|
| 5        | NULL    | NULL      | Los Angeles | 3000.00|
+----------+---------+-----------+-------------+--------+
Table: Clients
+-----------+--------------+-------------+
| client_id | client_name  | city        |
+-----------+--------------+-------------+
| 101       | John Doe     | New York    |
| 102       | Jane Smith   | New York    |
| 103       | Jim Beam     | Los Angeles |
| 104       | Jill Jackson | Los Angeles |
| 105       | Jack Johnson | Chicago     |
+-----------+--------------+-------------+
				
			

To tackle this SQL question, we’ll leverage the presence of a “client_id” field in both the Orders and Clients tables, identifying it as the key for joining these tables. The Orders table records every transaction’s details, while the Clients table stores comprehensive information about each client, including those without recorded orders. 

 

Our goal is to merge data from these two tables so that every order over $2000 is displayed alongside the corresponding client’s name. To accomplish this, we’ll perform a LEFT JOIN operation, which ensures we include all orders meeting our criteria, even those without associated client information due to null values in the “client_id” field.

 
Given the specifics, our SQL query should look as follows:
				
					SELECT Orders.order_id, Clients.client_name
FROM Orders
LEFT JOIN Clients ON Orders.client_id = Clients.client_id
WHERE Orders.amount > 2000
				
			
This query carefully combines orders with their respective clients, filtering to only include those orders with amounts exceeding $2000. The LEFT JOIN is crucial here, as it allows for the inclusion of orders that may not have a specific client associated, evident from the presence of rows with null “client_id” in the Orders table. By executing this query, we achieve the desired output, effectively marrying financial records with customer data and adhering to the specified amount threshold.
 

4. What are some common errors that occur when writing SQL queries?

One common error that occurs when writing SQL queries is forgetting to include a WHERE clause. Without a WHERE clause, your query will return all rows from the table you’re querying, which can make it difficult to find the specific information you’re looking for. Another common error is using incorrect syntax, which can lead to unexpected results or errors when your query is executed. Finally, it’s important to make sure that your SQL queries are properly formatted and easy to read; otherwise, they may be difficult for others to understand or debug if something goes wrong.

For example, the following SQL query would return all rows from the orders table, regardless of the order_date:

SELECT *

FROM orders

This would return a very large dataset that would be difficult to work with. To fix this, we can add a WHERE clause to filter the data by order_date:

SELECT *

FROM orders

WHERE order_date = ‘2018–01–01’

5. What’s the difference between a primary and foreign key?

A primary key is a column (or set of columns) in a database table that uniquely identifies each row in the table. A foreign key is a column (or set of columns) in one table that contains values that match the primary key values in another table. Foreign keys are used to create relationships between tables; for example, a foreign key in a “customer” table could reference the primary key in an “orders” table, linking each customer with their respective orders.

6. What’s a SQL join and how is it used?

A SQL join is used to combine data from two or more tables into a single result set. Joins are performed using the JOIN keyword, followed by the name of the table to join with. There are a number of different types of joins, including inner joins, outer joins, and self-joins. Inner joins return rows from both tables that have matching values in the specified columns, while outer joins return all rows from both tables, including rows with no matching values. Self-joins are used to join a table to itself; for example, you could use a self-join to find all customers who live in the same city as another customer.

sql-join-example

7. What’s a subquery and how is it used as a data analyst?

A subquery is a SQL query that is embedded within another SQL query. Subqueries are often used to find data that satisfies certain conditions; for example, you could use a subquery to find all customers who live in the same city as a particular customer. Subqueries can be used with various SQL commands, including SELECT, FROM, WHERE, and ORDER BY.

For example, consider the following customer table:

				
					Table: customers
+------------+----------------+
|     id    |  name | city    |
+------------+----------------+
|     1      | john | New York|
|     2      | jane | Boston  |
|     3      | Joe  | Chicago |
+------------+----------------+
				
			

Suppose we want to find all customers who live in the same city as customer with id=1. We could use the following SQL query:

SELECT * FROM customers WHERE city IN (SELECT city FROM customers WHERE id = 1)

This would return all rows from the customer’s table, including the row with id=1. To exclude this row, we could add a condition to the subquery that checks for customer_id != 1:

SELECT * FROM customers WHERE city IN (SELECT city FROM customers WHERE id = 1 AND id != 1)

9. What’s a SQL window function and how is it used?

A SQL window function is a function that performs a calculation on a set of values and returns a single value. Unlike aggregate functions, which return one result per group, window functions return one result per row. Common window functions include RANK, DENSE_RANK, and NTILE.

Window functions are often used with the ORDER BY clause to calculate a value for each row

8. What are some of the most important SQL data types?

SQL supports a number of different data types, including numeric, text, date/time, and Boolean values. Numeric values include integers and floating-point numbers, while text values include character strings and date/time values include date, time, and timestamp values. Boolean values can either be TRUE or FALSE.

9. What’s an index and how is it used?

A crucial component in database management systems, an index serves the purpose of enhancing the efficiency of SQL queries. It is applied to specific columns within a table and is instrumental in expediting searches for particular values within those columns. In the context of Google data science interview questions, understanding the role of indexes becomes pertinent. During query execution, the database management system meticulously examines whether an index is present for the targeted columns. If an index exists, the system leverages it to swiftly pinpoint the required data, thereby contributing to a notable improvement in query performance.

10. What’s the difference between a view and a table?

A view is a virtual table that is based on the results of an SQL query. Views are often used to provide security or simplify complex queries. For example, you could create a view that only includes customer information that is relevant to your current project. Tables, on the other hand, are database structures that actually store data.

11. What’s the difference between a WHERE and a HAVING clause?

The WHERE clause is used to filter rows from a table based on specified conditions; for example, you could use a WHERE clause to find all customers who live in a particular city. The HAVING clause is used to filter rows from a table based on aggregated values; for example, you could use a HAVING clause to find all customers who have placed more than 10 orders.

12. Tell me as a data analyst, what does the ORDER BY keyword do?

The ORDER BY keyword is used to sort the results of an SQL query in ascending or descending order. By default, ORDER BY will sort the results in ascending order; to sort the results in descending order, you can use the DESC keyword.

13. What’s a primary key?

A primary key is a column or set of columns that uniquely identify a row in a table. Primary keys must contain unique values, and they cannot be NULL.

14. What’s a foreign key?

A foreign key is a column or set of columns that contains values that match the primary key values in another table. Foreign keys are used to create relationships between tables; for example, a foreign key in a “customer” table could reference the primary key in an “orders” table, linking each customer with their respective orders

15. How do window functions work?

Window functions are a type of SQL function that operates on a set of rows and returns a single value. Window functions are typically used to calculate aggregated values, such as sums or averages, over a specified window of rows. For example, you could use a window function to calculate the average order total for each customer.

16. What’s the difference between a business analyst, data scientist, and data analyst?

For example, Amazon Business Analysts focus on understanding and improving business processes, acting as a bridge between business stakeholders and technical teams. Their responsibilities include analyzing business needs, gathering requirements, and proposing solutions, requiring strong communication and problem-solving skills. 

Data Analysts concentrate on collecting, processing, and analyzing structured data to provide insights, employing statistical analysis and data visualization. Proficiency in data analysis tools and a good understanding of the business context are essential for this role. 

Data Scientists have a more specialized focus, developing advanced models and algorithms to extract insights and predictions from both structured and unstructured data. They require strong programming skills, expertise in machine learning algorithms, statistical analysis, and domain-specific knowledge to solve complex problems and build predictive models.

17. What are some of the most common SQL functions?

Some of the most common SQL functions are SUM(), AVG(), COUNT(), MIN(), and MAX(). These functions are used to calculate aggregated values, such as sums, averages, or counts.

18. How have you used SQL to solve a problem?

This is a common data analyst and business analyst SQL interview question that is designed to assess your real-world experience with the language. When answering this question, be sure to describe a specific problem that you were able to solve using SQL. This will help to show the interviewer that you have a good understanding of how SQL can be used in practice.

19. What’s the difference between a lag and lead function in SQL?

Lag and lead functions are used to access data from a previous or future row in a table. Lag functions return data from a row that is preceding the current row, while lead functions return data from a row that is following the current row.

If the current row is customer_id 3 (Joe), a lag function would return customer_id 2 (Jane), while a lead function would return customer_id 4 (Sarah).

				
					+-------------+-------------+
| customer_id |    name     |
+-------------+-------------+
|      1      |    john     |
|      2      |    jane     |
|      3      |    Joe      |  -- Current customer
|      4      |  Michael    |  -- Next customer
|      5      |    Lisa     |  -- Next customer
|      6      |    Mark     |  -- Next customer
|      7      |   Emily     |  -- Next customer
|      8      |   David     |  -- Next customer
|      9      |   Sarah     |  -- Next customer
+-------------+-------------+

				
			

20. Write a SQL query to select the employees who earn the second-highest in each department

This SQL query will select the second-highest salary from each department by first selecting all distinct salaries from employees in each department, then ordering them in descending order, and finally selecting the top 2 salaries.

				
					console.log( 'Code is Poetry' );
				
			

21. What is a correlated subquery?

A correlated subquery is a type of SQL query that contains a reference to a value from outer query. Correlated subqueries are typically used when you want to find rows from a table that match certain conditions, but you can only know those conditions after examining other rows in the same table.

For example, you could use a correlated subquery to find all employees who make more than the average salary in their department. In this case, you would need to calculate the average salary for each department before you could compare each employee’s salary to it.

22. What’s a SQL aggregate function and how is it used?

A SQL aggregate function is a function that performs a calculation on a set of values and returns a single value. Common aggregate functions include COUNT, SUM, MAX, and MIN. Aggregate functions are often used with the GROUP BY clause to return one result per group; for example, you could use the COUNT() function to find the number of customers in each city.

For example, consider the following customer table:

Suppose we want to find the number of customers in each city. We could use the following SQL query:

SELECT city, COUNT(*) AS “Number of Customers”

FROM customers

GROUP BY city

This would return the following result:

23. When would you not want to use a window function in a SQL?

Window functions are a type of SQL function that return a value for each row in the query result, based on values from other rows in the same result. For example, you could use a window function to calculate the running total of all order totals in your customer orders table.

Window functions are not typically used with aggregate functions, because the results would not make sense. For example, if you tried to find the average salary for each department using a window function, you would end up with the same average salary for every department, because the window function would calculate the average salary for each row in the result set (which would be all employees in all departments).

24. How can you find duplicate rows in a SQL table?

				
					Table: clock_in_logs
+--------+-------------+-------------+--------------------------+
| log_id | employee_id | location_id |       clock_in           |
+--------+-------------+-------------+--------------------------+
|   1    |      1      |      1      | 2024-03-19T08:00:00.000Z |
|   2    |      2      |      2      | 2024-03-19T08:15:00.000Z |
|   3    |      3      |      1      | 2024-03-19T07:45:00.000Z |
|   4    |      4      |      3      | 2024-03-19T08:30:00.000Z |
|   5    |      5      |      1      | 2024-03-19T08:05:00.000Z |
|   6    |      1      |      1      | 2024-03-19T08:00:00.000Z |
|   7    |      2      |      2      | 2024-03-19T08:15:00.000Z |
|   8    |      3      |      1      | 2024-03-19T07:45:00.000Z |
+--------+-------------+-------------+--------------------------+
				
			

To find duplicate rows in the clock_in_logs  table, specifically to identify logs where the same employee_id, location_id, and clock_in time have been recorded more than once, you can use a SQL query that groups the entries by these fields. By counting the occurrences and filtering for counts greater than 1, you can isolate the duplicates. 

Here is how you might write that query:If we wanted to find all the duplicate rows, we could use the following SQL query:

				
					SELECT employee_id, location_id, clock_in, COUNT(*) as duplicate_count
    FROM clock_in_logs
    GROUP BY employee_id, location_id, clock_in
    HAVING COUNT(*) > 1;
				
			

25. How do you optimize a SQL query?

There are a few different ways to optimize a SQL query. One way is to make sure that the columns you’re interested in are indexed, so the database can more quickly find the data you’re looking for. Another way is to use the EXPLAIN command to see how the database will execute your query, and then make changes to your query based on that information. Finally, you can use query hints to give database-specific instructions on how to execute your query.

Another way you can optimize a SQL query is to use a tool like SQL Profiler to see where the bottlenecks are in your query and then make changes accordingly.

26. How do you find the top 5 customers by sales?

There are a few different ways to find the top 5% of customers by sales. One way is to use the GROUP BY clause to group together rows with the same values in the columns you’re interested in. For example, suppose we have a customer orders table with the following data:

If we wanted to find the top 5 customers by sales, we could use the following SQL query:

SELECT name, city, SUM(sales) AS “Total Sales”

FROM customer_orders

GROUP BY name, city

ORDER BY “Total Sales” DESC

LIMIT 5

This would return the following result:

27. How does a data analyst view a relational database?

A relational database is a database that stores data in tables. Tables are similar to folders in a file system, where each table stores data about a particular subject. For example, a customer orders table might store data about customer orders, and a product table might store data about products.

Relational databases are the most common type of database, and they are used by most businesses because they are easy to use and easy to scale.

28. What types of relationships are there with a database?

There are three types of relationships in a database: one-to-one, one-to-many, and many-to-many.

A one-to-one relationship is when each row in one table is related to only one row in another table. For example, a customer table might have a one-to-one relationship with an orders table, where each customer is related to only one order.

A one-to-many relationship is when each row in one table is related to multiple rows in another table. For example, a customers table might have a one-to-many relationship with an orders table, where each customer is related to multiple orders.

A many-to-many relationship is when each row in one table is related to multiple rows in another table, and each row in the other table is related to multiple rows in the first table. For example, a customers table might have a many-to-many relationship with a products table, where each customer is related to multiple products and each product is related to multiple customers.

29. What DDL and DML?

DDL (Data Definition Language) is a language used to create and modify database structures like tables, views, and indexes.

30. Write a query to get the total two-day rolling average for sales by day.

For example, suppose we have a sales table with the following data:

If we wanted to get the total two-day rolling average for sales by day, we could use the following SQL query:

SELECT date, sales, AVG(sales) OVER (ORDER BY date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS “Two-Day Rolling Average”

FROM sales

GROUP BY date

ORDER BY date

This would return the following result:

31. How would you teach SQL to a beginner?

If you were teaching SQL to a beginner, you would start by explaining the basics of databases and how they are used to store data. You would then move on to explaining the different types of SQL queries and how they are used to retrieve data from a database. Finally, you would teach them how to use SQL to insert, update, and delete data from a database.

32. What are some common date functions in SQL?

Some common date functions in SQL are:

  • CURRENT_DATE: Returns the current date.
  • CURRENT_TIME: Returns the current time.
  • CURRENT_TIMESTAMP: Returns the current date and time.
  • DATE_ADD: Adds a specified number of days, months, or years to a date.
  • DATE_SUB: Subtracts a specified number of days, months, or years from a date.
  • DAY: Returns the day of the month for a given date.
  • MONTH: Returns the month for a given date.
  • YEAR: Returns the year for a given date.

33. What are some advanced SQL functions?

There are many advanced SQL functions, but some of the most common are aggregate functions, window functions, and pivoting.

Aggregate functions are used to calculate a single value from multiple values. For example, the SUM() function calculates the sum of a column of values, and the AVG() function calculates the average of a column of values.

Window functions are used to calculate a value for each row in a table based on the values in other rows in the table. For example, the RANK() function assigns a rank to each row in a table, and the LAG() function returns the value of a column in a previous row.

Pivoting is when you rotate data from one format to another. For example, you can pivot data from a horizontal format to a vertical format, or from a columnar format to a row-based format.

Additional Resources

Nail your data analyst SQL interview with these 33 data analyst SQL questions

Your data analyst interview will test you on these concepts explicitly and implicitly. It’s not enough to just know what SQL queries show up at the data analyst interview; you need to be knowledgeable about how the subject fits into data analytics and its impact on your soon-to-be colleagues. First, get comfortable with the background and functionality of these concepts and queries. Then you can move on to the next step: practice solving the problems.

Frequently Asked Questions

What is the salary range for a data analyst?  
  • The salary for a data analyst can vary greatly depending on factors such as experience, location, and industry. In general, entry-level positions might start in the range of $50,000 to $90,000 annually, while more experienced analysts can earn from $90,000 to over $150,000.
How to practice SQL for data analysis?  
  • To practice SQL for data analysis, start by exploring free online resources and tutorials that offer hands-on SQL exercises. Websites like Big Tech Interviews, Khan Academy, and Codeacademy provide interactive SQL courses tailored for beginners to advanced users. Additionally, installing a database system such as MySQL or PostgreSQL and practicing writing queries using real datasets can be incredibly beneficial.
How to prepare for a SQL data analyst interview?  
  • Preparing for a SQL data analyst interview involves a combination of reviewing SQL basics and functions, practicing SQL queries on real-world datasets, and understanding database management fundamentals. Brush up on different types of SQL commands, including data manipulation, data definition, and data control commands. Familiarize yourself with common interview questions on joins, subqueries, and functions.
Where can I find data analyst SQL interview questions and answers in PDF format?  
  • Data analyst SQL interview questions and answers can be found here by clicking this link. Additionally, you might more resources on academic sharing sites such as Academia.edu or through specific SQL training and courses online.

Do you want to ace your SQL interview?

Practice free and paid real SQL interview questions with step-by-step answers.

Do you want to Ace your SQL interview?

Practice free and paid SQL interview questions with step-by-step video solutions!