Navicat Blog

The SQL Semi Join Oct 15, 2024 by Robert Gravelle

Most database developers and administrators are familiar with the standard inner, outer, left, and right JOIN types. While these can be written using ANSI SQL, there are other types of joins that are based on relational algebra operators that don't have a syntax representation in SQL. Today we'll be looking at one such join type: the Semi Join. Next week we'll tackle the similar Anti Join. To gain a better understanding of how these types of joins work, we'll execute some SELECT queries in Navicat Premium Lite 17 against the PostgreSQL dvdrental database. It's a free database that's based on the MySQL Sakila Sample Database.

Semi Joins Explained

Imagine for a moment that ANSI SQL did support Semi Joins. If it did, the syntax would probably be similar to that of the Cloudera Impala syntax extension, which is LEFT SEMI JOIN and RIGHT SEMI JOIN. With that in mind, here's what a query that utilizes a Semi Join might look like:

SELECT *
FROM actor
LEFT SEMI JOIN film_actor USING (actor_id)

The above query would return all actors that played in films. That catch is that we don't want any films in the results, nor do we want multiple rows of the same actor. We only want each actor once (or zero times) in the result. The word "Semi" originates from Latin and translates to "half" in English. Hence, our query implements only "half the join", in this case, the left half. In SQL, there are two alternative syntaxes that we can use to accomplish a Semi Join: EXISTS and IN.

Semi Joins Using EXISTS

Here is the equivalent of the Semi Join using EXISTS:

SELECT *
FROM actor a
WHERE EXISTS (
  SELECT * 
  FROM film_actor fa
  WHERE a.actor_id = fa.actor_id
)

If we execute our query in Navicat Premium Lite 17, we can see that it works just as expected:

semi_join_exists (147K)

Rather than use a join, the EXISTS operator checks for the presence of one or more rows for each actor in the film_actor table. Thanks to the WHERE clause, most databases will be able to recognize that we're performing a SEMI JOIN rather than an ordinary EXISTS() predicate.

Semi Joins Using IN

IN and EXISTS are exactly equivalent SEMI JOIN emulations, so the following query will produce the exact same results in most databases as the previous EXISTS query:

SELECT *
FROM actor
WHERE actor_id IN (
  SELECT actor_id FROM film_actor
)

Here again is the above query and results in Navicat Premium Lite 17:

semi_join_in (157K)

EXISTS is considered to be the more powerful (albeit a bit more verbose) syntax.

Conclusion

In today's blog we learned how to emulate a Semi Join using ANSI SQL syntax. In addition to being the optimal solution in terms of "correctness", there are also some performance benefits when using a "SEMI" JOIN rather than an INNER JOIN, as the database can stop looking for matches as soon as it found the first.

Interested in giving Navicat Premium Lite 17 a try? You can download it for a 14-day fully functional FREE trial. It's available for Windows, macOS, and Linux operating systems.

Navicat Blogs
Feed Entries
Blog Archives
Share