#	SQL & Optional Data

SQL includes `null`, which is used to indicate __no value__. The problem is that it doesn’t explain _why_ there is no value. Here we look at adding more meaning to `null` to allow for optional data.

Here are some possibile interpretions of `null`:

- The value is _unknown_
- The value is _not important_
- There is no value _yet_

To test for `null`, you _cannot_ use `something=null`. Instead, you must use `something is null`. This reflects an interpretation of `null` as being unknown.

##	An Example

For example, suppose you have a table of __authors__, and you choose to record the __date of birth__ and the __date of death__.

```sql
CREATE TABLE authors (
	id INT PRIMARY KEY,
	name VARCHAR(40),
	born varchar,
	died varchar
);
```

(The `born` and `died` values are `varchar` here to simplify the example; in reality, they should be `date` types or similar).

What does it mean for the date of death to be null? What about the date of birth?

For the date of birth, it is easy to interpret a `null` as unknown.

For the date of death, however, there are _two_ possible interpretations:

- The author hasn’t died (yet)
- The author has died, but the actual data is unknown (or irrelevant)

SQL does not, of itself, accommodate alternative interpretations of `null`, so the reason remains a mystery.

##	One-Maybe Relationship

(This is my own term; many call it a __one-to-zero-or-one__, which is not quite as cool).

### One-Many Relationship

Normally, tables are related using a __Foreign Key__ to __Primary Key__ relationship, from a __child__ to __parent__ table:

```sql
CREATE TABLE child (
	id INT PRIMARY KEY,
	parentid INT REFERENCES parent(id)
	--	etc
);
```

Since many child rows can reference the same parent row, we say that there is a __one to many__ relationship between the parent and child.

### One-Maybe Relationship

A __one to one__ relationship is that of a _single_ row in one table to a _single_ row to another. Generally, it is between their primary keys.

Very few _true_ __one to one__ relationships exist, as that would involve creating both rows at the same time. In reality, One table is the main table, while the other is a secondary table, providing additional, possibly optional, columns.

You can create this relationship without changing your main table by creating a secondary table like this:

```sql
CREATE TABLE secondary (
	id INT PRIMARY KEY REFERENCES main(id)
	--	etc
);
```

Note that the `id` is both a __primary key__ and a __foreign key__. This is the key, so to speak, of this type of relationship.

If you are tempted to __auto-increment__ your primary key, don’t. The whole point is that it matches one of rows of the main table. The primary key nature ensures that it doesn’t happen twice.

---

## The Example Again

Here is a possible `authors` table, _without_ a date of death:

```sql
CREATE TABLE authors (
	id INT PRIMARY KEY,
	name VARCHAR(40),
	born varchar
);
INSERT INTO authors(id,name,born) VALUES
	(1,'Fred','now'),
	(2,'Wilma','then'),
	(3,'Betty','whenever'),
	(4,'Barney','sometime');
```

Here is a possible secondary table which will record only deaths, if any:

```sql
CREATE TABLE deaths (
	id INT PRIMARY KEY REFERENCES authors(id),
	died varchar
);
INSERT INTO deaths(id,died) VALUES
	(1,null),
	(3,'yesterday'),
	(4,'anotherday');
```

Note:

- __id__ `1` has died, but the date is unknown
- __id__ `2` is missing, and can be presumed living

###	Fetching the data

To fetch the full data, we will need a `JOIN`, and since there may be missing rows in the secondary table, we will need an `OUTER JOIN`; in this case it will be a `LEFT JOIN`.

```sql
SELECT *
FROM authors a LEFT JOIN deaths d ON a.id=d.id;
```

You will get something like this:

| a.id | a.name | a.born   | d.id   | d.died     |
|------|--------|----------|--------|------------|
|  1   | Freddy | now      |  1     | NULL       |
|  2   | Wilma  | then     |  NULL  | NULL       |
|  3   | Betty  | whenever |  3     | yesterday  |
|  4   | Barney | sometime |  4     | anotherday |

### Missing vs Empty

The problem is with the above result set is the `d.died` column. For __id__ `1`, the `d.died` column is `null` because the date is unknown. For __id__ `2`, it is `null` because there is no entry in the `deaths` table.

How do we distinguish between them? The solution is in the `d.id` column. Because the row is missing, it is `null` for __id__ `2`.

Knowing this, we can implement the following logic:

- if `d.id` is `null`, then the row is missing, and the author is living.
- else the author has died:
	- if `d.died` is `not null`, we can use the date
	- else the date is unknown

In SQL, this can be expressed with the `CASE` operator:

```sql
CASE
	WHEN d.id IS NULL THEN
		'living'
	ELSE WHEN d.died is NOT NULL THEN
		d.died
	ELSE
		'unknown'
END
```

If a column may or may not be `null`, you can use the `coalesce` function to provide an alternative:

```sql
coalesce(d.died,'unknown')
```

This means we will use the `d.died` value, if any, or `unknown` if it is null. This is a neater version of the two `ELSE` clauses above

```sql
CASE
	WHEN d.id IS NULL THEN
		'living'
	ELSE coalesce(d.died,'unknown')
END
```

###	The Finished Statement

Combining the above, we can write:

```sql
SELECT
	a.id,a.name,a.born,
	CASE WHEN d.id IS NULL THEN 'living'
		ELSE coalesce(d.died,'unknown') END AS died
FROM authors a LEFT JOIN deaths d ON a.id=d.id;
```

This gives:

| id | name   | born     | died       |
|----|--------|----------|------------|
| 1  | Freddy | now      | unknown    |
| 2  | Wilma  | then     | living     |
| 3  | Betty  | whenever | yesterday  |
| 4  | Barney | sometime | anotherday |
