How to Master Anti Joins and Apply Them to Business Problems

Anyone who has tried to learn SQL knows it’s tricky to get the hang of joins. You’re breezing through SELECT statements and comparison operators and ORDER BY, and wham! you run smack into joins. But, get over this conceptual speed bump, and you’re well on your way to becoming proficient at SQL.

Even after learning the principles of inner, outer, left, and right joins, you might still have a nagging question: how do I find values from one table that are NOT present in another table?

That’s where anti joins come in. They can be helpful in a variety of business situations when you’re trying to find something that hasn’t happened, such as:

  • Customers who did not place an order
  • Customers who have not visited your website
  • Salespeople who did not close a deal

We’ll walk through each of these situations later on, but first, here’s a primer on how to write an anti join.

How to perform an anti join

Unlike most SQL joins, an anti join doesn’t have its own syntax. To find all the values from Table_1 that are not in Table_2, you’ll need to use a combination of LEFT JOIN and WHERE.

1) Select every column from Table_1. Assign Table_1 an alias: t1.

SELECT *
  FROM Table_1 t1

SELECT from Table_1

2) LEFT JOIN Table_2 to Table_1 and assign Table_2 an alias: t2. By using a LEFT JOIN, your query will return all values from Table_1 (whether or not they are also present in Table_2).

SELECT *
  FROM Table_1 t1
  LEFT JOIN Table_2 t2 ON t1.id = t2.id

Both tables

LEFT JOIN the tables

3) Use a WHERE clause to filter out values that are present in Table_2.

SELECT *
  FROM Table_1 t1
  LEFT JOIN Table_2 t2 ON t1.id = t2.id
 WHERE t2.id IS NULL

Add a WHERE clause

The entire query will return only values that are in Table_1 but not in Table_2.

What an anti join looks like

Anti join examples

Once you’ve got the basics down, you can solidify your knowledge by practicing on real-world data. Try working through these three business situations on your own in Mode. Just sign up for an account and click here to write a new query against the tables in the Mode Public Warehouse.

Which customers didn’t place an order in August?

The first table (demo.accounts) has a record of all customer accounts. The second table (demo.orders) has a record of every order.

1) Select the columns you want from the accounts table: id (aliased as customer_id) and name (aliased as customer_name).

SELECT a.id as customer_id,
       a.name as customer_name
  FROM demo.accounts a

2) LEFT JOIN the orders table to the accounts table on the account_id, for orders that occurred in August 2016.

SELECT a.id as customer_id,
       a.name as customer_name
  FROM demo.accounts a
  LEFT JOIN demo.orders o
    ON a.id = o.account_id
   AND o.occurred_at BETWEEN '2016-08-01' AND '2016-08-31 23:59:59'

3) Exclude accounts that did place an order in August by adding a WHERE o.id is NULL clause.

SELECT a.id as customer_id,
       a.name as customer_name
  FROM demo.accounts a
  LEFT JOIN demo.orders o
    ON a.id = o.account_id
   AND o.occurred_at BETWEEN '2016-08-01' AND '2016-08-31 23:59:59'
 WHERE o.id is NULL

Which customers haven’t visited your website this year?

The first table (demo.accounts) has a record of all customer accounts. The second table (demo.web_events) has a record of every web visit.

1) Select the columns you want from the accounts table: id (aliased as customer_id) and name (aliased as customer_name).

SELECT a.id as customer_id,
       a.name as customer_name
  FROM demo.accounts a

2) LEFT JOIN the web events table to the accounts table on the account_id, for visits that occurred in 2016:

    SELECT a.id as customer_id,
           a.name as customer_name
           FROM demo.accounts a
 LEFT JOIN demo.web_events we
        ON we.account_id = a.id
       AND we.occurred_at BETWEEN '2016-01-01' AND '2016-12-31 23:59:59'

3) Exclude accounts that did visit the website in 2016 by adding a WHERE we.id is NULL clause.

    SELECT a.id as customer_id,
           a.name as customer_name
           FROM demo.accounts a
 LEFT JOIN demo.web_events we
        ON we.account_id = a.id
       AND we.occurred_at BETWEEN '2016-01-01' AND '2016-12-31 23:59:59'
     WHERE we.id is NULL

Which sales reps didn’t close a deal from September 5th - September 15th?

For this query, you’ll use three tables: demo.accounts, demo.orders, and demo.sales_reps.

1) While you already have your primary table containing records of all sales reps (demo.sales_reps), you’ll need a second table containing a record of all sales reps who did close a deal during the time range in order to perform your anti join. Since this table doesn’t exist already, you’ll need to start by creating a subquery.

SELECT o.id AS order_id,
       sr.id as sales_rep_id
  FROM demo.orders o
  JOIN demo.accounts a
    ON o.account_id = a.id
  JOIN demo.sales_reps sr
    ON sr.id = a.sales_rep_id
 WHERE o.occurred_at BETWEEN '2016-09-05' AND '2016-09-15 23:59:59'

Now you’ve got the two things you need: a table containing a record of all sales reps (demo.sales_reps) and a subquery that pulls a record of the sales rep associated with all orders that occurred over the specified time range. You’ll be able to join this subquery to a table just like you would join a table to a table.

2) Select the columns you want from the sales reps table. In this case, you’ll pull each sales rep’s id, name, and region id.

SELECT sr.id AS sales_rep_id,
       sr.name AS sales_rep_name,
       sr.region_id AS sales_rep_region_id
  FROM demo.sales_reps sr

3) LEFT JOIN the subquery (aliased as a) on sales_rep_id.

    SELECT sr.id AS sales_rep_id,
           sr.name AS sales_rep_name,
           sr.region_id AS sales_rep_region_id
      FROM demo.sales_reps sr
 LEFT JOIN(
    SELECT o.id AS order_id,
           sr.id as sales_rep_id
      FROM demo.orders o
      JOIN demo.accounts a
        ON o.account_id = a.id
      JOIN demo.sales_reps sr
        ON sr.id = a.sales_rep_id
     WHERE o.occurred_at BETWEEN '2016-09-05' AND '2016-09-15 23:59:59'
         ) a
        ON sr.id = a.sales_rep_id

4) Finally, exclude sales reps that are associated with an order in your subquery by adding a WHERE a.order_id is NULL clause.

    SELECT sr.id AS sales_rep_id,
           sr.name AS sales_rep_name,
           sr.region_id AS sales_rep_region_id
      FROM demo.sales_reps sr
 LEFT JOIN(
    SELECT o.id AS order_id,
           sr.id as sales_rep_id
      FROM demo.orders o
      JOIN demo.accounts a
        ON o.account_id = a.id
      JOIN demo.sales_reps sr
        ON sr.id = a.sales_rep_id
     WHERE o.occurred_at BETWEEN '2016-09-05' AND '2016-09-15 23:59:59'
         ) a
        ON sr.id = a.sales_rep_id
     WHERE a.order_id is NULL
No coding experience? No problem. Learn SQL and Python using real-world data with our free tutorials.