#	SQL & Optional Data

SQL includes `null`, which is used to indicate __no value__. The problem is that it doesn’t explain _why_ there is no value. Here are some possibilities:

- Unknown
- Not Important
- Not Yet

To test for `null`, you _cannot_ use `something=null`. Instead, you must use `something is null`. This reflects an interpretation of `null` as being unknown.

##	An Example

For example, suppose you have a table of __authors__, and you choose to record the __date of birth__ and the __date of death__.

```sql
CREATE TABLE authors (
	id INT PRIMARY KEY,
	name VARCHAR(40),
	born varchar,
	died varchar
);
```

What does it mean for the date of death to be null? What about the date of birth?

For the date of birth, it is easy to interpret a `null` as unknown.

For the date of death, however, there are _two_ possible interpretations:

- The author hasn’t died (yet)
- The author has died, but the actual data is unknown (or irrelevant)

SQL does not, of itself, accommodate alternative interpretations of `null`, so the reason remains a mystery.

##	One-Maybe Relationship

(This is my own term; many call it a __one-to-zero-or-one__, which is not quite as cool).

### One-Many Relationship

Normally, tables are related using a __Foreign Key__ to __Primary Key__ relationship, from a __child__ to __parent__ table:

```sql
CREATE TABLE child (
	id INT PRIMARY KEY,
	parentid INT REFERENCES parent(id)
	--	etc
);
```

Since many child rows can reference the same parent row, we say that there is a __one to many__ relationship between the parent and child.

### One-Maybe Relationship

A __one to one__ relationship is that of a _single_ row in one table to a _single_ row to another. Generally, it is between their primary keys.

Very few _true_ __one to one__ relationships exist, as that would involve creating both rows at the same time. In reality, One table is the main table, while the other is a secondary table, providing additional, possibly optional, columns.

You can create this relationship without changing your main table by creating a secondary table like this:

```sql
CREATE TABLE secondary (
	id INT PRIMARY KEY REFERENCES main(id)
	--	etc
);
```

If you are tempted to __auto-increment__ your primary key, don’t. The whole point is that it matches one of rows of the main table. The primary key nature ensures that it doesn’t happen twice.

---

## The Example Again

Here is a possible `authors` table, _without_ a date of death:

```sql
CREATE TABLE authors (
	id INT PRIMARY KEY,
	name VARCHAR(40),
	born varchar
);
INSERT INTO authors(id,name,born) VALUES
	(1,'Fred','now'),
	(2,'Wilma','then'),
	(3,'Betty','whenever'),
	(4,'Barney','sometime');
```

Here is a possible secondary table which will record deaths, if any:

```sql
CREATE TABLE deaths (
	id INT PRIMARY KEY REFERENCES authors(id),
	died varchar
);
INSERT INTO deaths(id,died) VALUES
	(1,null),
	(3,'yesterday'),
	(4,'anotherday');
```

Note:

- __id__ `1` has died, but the date is unknown
- __id__ `2` is missing, and can be presumed living

###	Fetching the data

To fetch the full data, we will need a `JOIN`, and since there may be missing rows in the secondary table, we will need a `LEFT JOIN`.

```sql
SELECT *
FROM authors a LEFT JOIN deaths d ON a.id=d.id;
```

You will get something like this:

| a.id | a.name | a.born   | d.id   | d.died     |
|------|--------|----------|--------|------------|
|  1   | Freddy | now      |  1     | NULL       |
|  2   | Wilma  | then     |  NULL  | NULL       |
|  3   | Betty  | whenever |  3     | yesterday  |
|  4   | Barney | sometime |  4     | anotherday |

### Missing vs Empty

The problem is with the above result set is the `died` column. For __id__ `1`, the `died` column is `null` because the date is unknown. For __id__ `2`, it is `null` because there is no entry in the `deaths` table.

How do we distinguish between them? The solution is in the `d.id` column. Because the row is missing, it is `null` for __id__ `2`.

Knowing this, we can implement the following logic:

- if `d.id` is `null`, then the row is missing, and the author is living.
- else the author has died:
	- if `died` is `not null`, we can use the date
	- else the date is unknown

In SQL, this can be expressed with the `CASE` operator:

```sql
CASE
	WHEN d.id IS NULL THEN
		'living'
	ELSE WHEN d.died is NOT NULL THEN
		d.died
	ELSE
		'unknown'
END
```

If a column may or may not be `null`, you can use the `coalesce` function:

```sql
coalesce(d.died,'unknown')
```

This means we will use the `d.died` value, if any, or `unknown` if it is null. This is a neater version of the two `ELSE` clauses above

```sql
CASE
	WHEN d.id IS NULL THEN
		'living'
	ELSE coalesce(d.died,'unknown')
END
```

###	The Finished Statement

Combining the above, we can write:

```sql
SELECT
	a.id,a.name,a.born,
	CASE WHEN d.id IS NULL THEN 'living'
		ELSE coalesce(d.died,'unknown') END AS died
FROM authors a LEFT JOIN deaths d ON a.id=d.id;
```

This gives:

| id | name   | born     | died       |
|----|--------|----------|------------|
| 1  | Freddy | now      | unknown    |
| 2  | Wilma  | then     | living     |
| 3  | Betty  | whenever | yesterday  |
| 4  | Barney | sometime | anotherday |
