InternotesSharing Web Development Techniques

Blossoms

Understanding Joins

A crash course in how joins work.

If you want, you can play along at db<>fiddle https://dbfiddle.uk/97i-4Yks. The sample uses PostgreSQL, but you’ll find that you can switch to pretty any other DBMS.

Books & Authors

There are two sample tables which illustrate a one-to-many relationship:

CREATE TABLE miniauthors (
    id INT PRIMARY KEY NOT NULL,
    fullname VARCHAR(24)
);
CREATE TABLE minibooks (
    id INT PRIMARY KEY NOT NULL,
    title VARCHAR(24),
    authorid INT REFERENCES miniauthors(id)
);

The sample data looks something like this:

SELECT * FROM minibooks;
SELECT * FROM miniauthors;
id title authorid
1 Roughing It 4
2 Alice's Adventures in Wonderland 3
3 Frankenstein 2
4 The Time Machine
5 Sense & Sensibility 1
6 The Hunting of the Snark 3
7 The Innocents Abroad 4
8 Pride and Prejudice 1
id fullname
1 Jane Austen
2 Mary Wollstonecraft Shelley
3 Lewis Carroll
4 Mark Twain
5 Jules Verne

One-toMany Relationship

Note that the minibooks table has a foreign key (minibooks.authorid) which references the miniauthors table (miniauthors.id):

Books and Authors Relationship

You can see the data in the following figure:

Books and Authors Data

You’ll see:

  • Most of the books reference an author.

  • Many of the books reference the same author as other books.

  • A book may have a NULL foreign key, so it doesn’t reference an author.
    An author may have no books referencing it.

The way these tables are constructed is right and proper, but also inconvenient. To get a complete set of details, you will need to look up both tables.


Combining the Tables

If you want to look at both the books and authors, it would be more convenient if the data were in one table:

Books and Authors Full Join

The down side of this is that the author details are repeated, and would be difficult to maintain. It is also hard to justify including authors which are not referenced by a book. This is definitely a wrong way to design tables, and it would violate the rules of normalisation.

However, it’s OK to generate a virtual table, which is a temporary combination of the two tables. This is called a join.

As you’ll see shortly, there are different types of joins, which are really just variations on how much data you want to include.


Books FULL (OUTER) JOIN Authors

To get a full combination of all of the data in the two tables, you can use a FULL OUTER JOIN:

SELECT *
FROM minibooks FULL /* OUTER */ JOIN miniauthors
    ON minibooks.authorid=miniauthors.id;
id title authorid id fullname
1 Roughing It 4 4 Mark Twain
2 Alice’s Adventures in Wonderland 3 3 Lewis Carroll
3 Frankenstein 2 2 Mary Wollstonecraft Shelley
4 The Time Machine
5 Sense and Sensibility 1 1 Jane Austen
6 The Hunting of the Snark 3 3 Lewis Carroll
7 The Innocents Abroad 4 4 Mark Twain
8 Pride and Prejudice 1 1 Jane Austen
5 Jules Verne

As you see from the comment in the query, the word OUTER is optional. You can decide to leave it in or leave it out; it’s best to make up your mind. In this discussion, we’ll leave it in only as a comment and in the description.

The full outer join has the benefit of having all of the data in a single virtual table, which would make it easy to select columns and to filter the rows.

Note here that we’ve joined from the foreign key in one table to the primary key in the other, which is how the foreign key constraint was defined, making it a one-to-many (or many-to-one) relationship. You’re not limited to this relationship. You can join on any columns which you consider should be matched, as long as they’re compatible.

It’s not very often that you’ll actually use the full outer join as it is, which includes rows from both tables which don’t match anything in the other. You’re more likely to want to limit the results to those which do match.


Using Aliases

When joining tables, it’s generally a good idea to simplify your table names, using aliases:

SELECT *
FROM minibooks AS b FULL /* OUTER */ JOIN miniauthors AS a
    ON b.authorid=a.id;

Reducing the table name to a single letter is obviously just a convenience. You don’t have to of course, especially if the table names are already short and convenient.

Well, occasionally you will need alias at least one of the tables. This is especially the case if you’re joining a table to itself.

Filtered Joins

You don’t often want the full outer join, as it includes unmatched rows from both tables.

Filtering for Books

For example, if you want to focus on the books table, you might want the following:

Books and Authors Left Join

This is the sort of result you want when developing, say, a price list. Note that the result may include books which don’t reference and author, which is OK if the author information is useful but not strictly necessary. On the other hand, it doesn’t doesn’t include any authors where there’s no matching book.


You can get this result using a WHERE clause:

SELECT *
FROM minibooks AS b FULL /* OUTER */ JOIN miniauthors AS a
    ON b.authorid=a.id
WHERE b.id IS NOT NULL;

This will give you:

id title authorid id fullname
1 Roughing It 4 4 Mark Twain
2 Alice’s Adventures in Wonderland 3 3 Lewis Carroll
3 Frankenstein 2 2 Mary Wollstonecraft Shelley
4 The Time Machine
5 Sense and Sensibility 1 1 Jane Austen
6 The Hunting of the Snark 3 3 Lewis Carroll
7 The Innocents Abroad 4 4 Mark Twain
8 Pride and Prejudice 1 1 Jane Austen

Nobody ever does it this way!

Instead, we write the above as a left outer join:

SELECT *
FROM minibooks AS b LEFT /* OUTER */ JOIN miniauthors AS a
    ON b.authorid=a.id;

This will give exactly the same result, and is more idiomatic; it’s also easier to write.

Why is it a LEFT join? Because the books table is written on the left, and that’s the table we’re focussing on. If you had written the tables the other way round, and you’re more than welcome to do this if that’s what you want, then you’ll get the same results with at RIGHT join. Well, not quite, since the column order would also be the other way round, but you can easily fix that in your SELECT clause.

Just remember that a left outer join is simply a filtered full outer join where the left table rows are not missing.


Filtering for Authors

Suppose, on the other hand, that you want to focus on the authors table:

Books and Authors Right Join

This is the sort of thing you might want if you want to discuss the authors, and samples of their work, if any; author without samples are still to be included. Note that it doesn’t include any books which don’t match an author.

You can filter the full join this way:

SELECT *
FROM minibooks AS b FULL /* OUTER */ JOIN miniauthors AS a
    ON b.authorid=a.id
WHERE a.id IS NOT NULL;

This will give you something like:

id title authorid id fullname
1 Roughing It 4 4 Mark Twain
2 Alice’s Adventures in Wonderland 3 3 Lewis Carroll
3 Frankenstein 2 2 Mary Wollstonecraft Shelley
5 Sense and Sensibility 1 1 Jane Austen
6 The Hunting of the Snark 3 3 Lewis Carroll
7 The Innocents Abroad 4 4 Mark Twain
8 Pride and Prejudice 1 1 Jane Austen
5 Jules Verne

Again, nobody writes it this way. Rather, we use a right outer join:

SELECT *
FROM minibooks AS b RIGHT /* OUTER */ JOIN miniauthors AS a
    ON b.authorid=a.id;

Remember, that the left and right outer joins are simply filtered full outer joins. Whether you use LEFT or RIGHT depends on which way you wrote the two tables, and which table is of more interest.

In practice, you might find it convenient to put your preferred table on the left and the use a left outer join. However, that’s not always practical.

Only Fully Matching Rows

There are times when you only want rows which are matched on both sides:

Books and Authors Inner Join

For example, you might have products and suppliers, and decide that you want to list only products for which you have a supplier. Suppliers without matching products would be irrelevant.

Again, you can use a filter for this, though now it’s starting to get crowded:

SELECT *
FROM minibooks AS b FULL /* OUTER */ JOIN miniauthors AS a
    ON b.authorid=a.id
WHERE b.id IS NOT NULL AND a.id IS NOT NULL;

This will give you something like this:

id title authorid id fullname
1 Roughing It 4 4 Mark Twain
2 Alice’s Adventures in Wonderland 3 3 Lewis Carroll
3 Frankenstein 2 2 Mary Wollstonecraft Shelley
5 Sense and Sensibility 1 1 Jane Austen
6 The Hunting of the Snark 3 3 Lewis Carroll
7 The Innocents Abroad 4 4 Mark Twain
8 Pride and Prejudice 1 1 Jane Austen

Note that now we have no books without authors or authors without books.


Again, this is generally written this way:

SELECT *
FROM minibooks AS b /* INNER */ JOIN miniauthors AS a
    ON b.authorid=a.id;

Notes

Join Types

As you see there are four main types of joins:

  • The full outer join combines all the rows from both tables.

  • The left and right outer joins filter the full outer join, including unmatched rows from one of the tables.

  • The inner join filters the full outer join excluding all unmatched rows.

All of this is moot if there are no unmatched rows. In that case, any of the above will give you the same results, and you may as well use the inner join since it’s easier to write.

If there are unmatched rows, then you’ll need to decide whether or not you want to include them, and, if so, which ones.


Less Common Join Types

There are other common join types, including:

  • The CROSS JOIN. This combines every row in one table with every row in the other.

    This type of join is almost never useful, but it can be used, for example, if you have a single-row table or virtual table with values you want to combine with another table.

An example of where the CROSS join can be used for data values is:

WITH data AS (SELECT 0.1 AS tax /* FROM DUAL */)
SELECT *
FROM books CROSS JOIN data;

This effectively adds another column to the the other table, allowing you to use its value in calculations.

  • The NATURAL JOIN. This is really an INNER join with an implied ON clause which you leave out. It matches columns from both tables which have the same name. As a minor bonus, you would only get one copy of that column.

A NATURAL join might look like this:

SELECT *
FROM first_table NATURAL JOIN other_table;

Not all DBMSs support natural joins. Even where they do, you need to be sure that the columns with the same name are really the ones you want joined.

Older Syntax

Historically there is an older syntax for a CROSS join:

SELECT *
FROM first_table, other_table       --  aka first_table CROSS JOIN other_table
--  etc
;

This didn’t make the cross join any more useful, but you could finish it off with a WHERE clause to implement, say, an INNER join:

SELECT *
FROM first_table, other_table
WHERE first_table.otherid=othertable.id
;

The main reason to use this syntax these days is nostalgia.


However, if one of your tables is a single-row (virtual) data table, the older syntax can be simpler:

WITH data AS (SELECT 0.1 AS tax /* FROM DUAL */)
SELECT *
FROM books , data;

Omitting INNER and OUTER

As you see the the above discussion, both the INNER and OUTER keywords are optional. You just need to make up your mind whether you want to include them in your code.

Note that:

  • LEFT, RIGHT, and FULL all imply OUTER.

  • Using JOIN without another keyword implies an INNER JOIN.

In this author’s humble opinion, the second point is unfortunate, as it’s not necessarily obvious that you’ve asked for the result to be filtered. It might have been better if it implied a FULL OUTER JOIN as that is the basic combination without filtering. However, this author was somehow not invited to comment.