Breaking Subqueries
In this series on Top SQL Query Mistakes, we've seen several examples of SQL queries that look perfectly solid on first inspection, but can lead to erroneous results and/or performance degradation. Last week, learned how the placement of predicates can adversely affect query execution - particularly in outer joins. Today's installment will focus on subqueries, and how they can break an SQL statement when changes are made to any of its underlying tables.
Single vs. Multiple Value Subqueries
Even before we compare single and multiple value subqueries, we should briefly cover what a subquery is. A subquery is a complete SQL query that is nested inside a larger query. A subquery may be placed in the SELECT, FROM, and WHERE clauses.
Now that we know what a subquery is and where it can go in a query, it should be noted that, like any SELECT query, a subquery may return one or more rows. This distinction is quite important, because it affects how you would write your query statement. For example, here's a query against the Sakila Sample Database in Navicat Premium 16 that fetches all of the actors who appeared in the film "ALONE TRIP":
Since there should only be one film named "ALONE TRIP", we can use the equals (=) operator to match the film_ids against.
Contrast the above query to the following one:
In this case, the subquery selects all of the actors who appeared in the movie. Naturally, this subquery would return multiple rows. In that case, we should employ the IN() function to match actor_ids against.
How Single Row Subqueries Break
As mentioned earlier, a subquery can be placed in the SELECT clause to fetch a column that is in some way correlated to the main query table. For example, consider these two related products and factories tables, shown in the Navicat Data Modeler:
The products and factories tables are linked using the common sku field.
Now, let's write a query to extract the factory_id for each product. One way to do that would be to write the query using correlated subquery to retrieve the product factory_id:
Note that the point here is to illustrate a technique; there are more efficient ways to retrieve the same information. In any event, we do get the correct result set, and all is well.
The query will continue to work perfectly well until the day arrives that the company decides to build a new factory as sales increase:
The extra row in the factories table causes our query to generate an error now:
The error is telling us that the outer query expected a scalar value, but our subquery returned a result set. We can fix the issue and list all factories that manufacture each product by using a JOIN:
One More Thing...
Be aware that the same error can occur in any clause where a column or expression is tested against a subquery, for example "column = (SELECT value FROM Table)". In that case, the solution is to use the IN() function instead of the equality (=) operator.