Wednesday, March 30, 2011

What is the difference between Left, Right, Outer and Inner Joins?

A Visual Explanation of SQL Joins

I thought Ligaya Turmelle's post on SQL joins was a great primer for novice developers. Since SQL joins appear to be set-based, the use of Venn diagrams to explain them seems, at first blush, to be a natural fit. However, like the commenters to her post, I found that the Venn diagrams didn't quite match the SQL join syntax reality in my testing.
I love the concept, though, so let's see if we can make it work. Assume we have the following two tables. Table A is on the left, and Table B is on the right. We'll populate them with four records each.
id name       id  name
-- ----       --  ----
Pirate     1   Rutabaga
2  Monkey     2   Pirate
Ninja      3   Darth Vader
4  Spaghetti  4   Ninja
Let's join these tables by the name field in a few different ways and see if we can get a conceptual match to those nifty Venn diagrams.
SELECT * FROM TableA
INNER JOIN TableB
ON TableA.name = TableB.name
 
id  name       id   name
--  ----       --   ----
1   Pirate     2    Pirate
3   Ninja      4    Ninja
Inner join produces only the set of records that match in both Table A and Table B.
Venn diagram of SQL inner join
SELECT * FROM TableA
FULL OUTER JOIN TableB
ON TableA.name = TableB.name
 
id    name       id    name
--    ----       --    ----
1     Pirate     2     Pirate
2     Monkey     null  null
3     Ninja      4     Ninja
4     Spaghetti  null  null
null  null       1     Rutabaga       
null  null       3     Darth Vader
Full outer join produces the set of all records in Table A and Table B, with matching records from both sides where available. If there is no match, the missing side will contain null.
Venn diagram of SQL cartesian join
SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name
 
id  name       id    name
--  ----       --    ----
1   Pirate     2     Pirate
2   Monkey     null  null
3   Ninja      4     Ninja
4   Spaghetti  null  null
Left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null.
Venn diagram of SQL left join
SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name
WHERE TableB.id IS null
 
id  name       id     name
--  ----       --     ----
2   Monkey     null   null
4   Spaghetti  null   null
To produce the set of records only in Table A, but not in Table B, we perform the same left outer join, thenexclude the records we don't want from the right side via a where clause.
join-left-outer.png
SELECT * FROM TableA
FULL OUTER JOIN TableB
ON TableA.name = TableB.name
WHERE TableA.id IS null 
OR TableB.id IS null
 
id    name       id    name
--    ----       --    ----
2     Monkey     null  null
4     Spaghetti  null  null
null  null       1     Rutabaga
null  null       3     Darth Vader
To produce the set of records unique to Table A and Table B, we perform the same full outer join, thenexclude the records we don't want from both sides via a where clause.
join-outer.png
There's also a cartesian product or cross join, which as far as I can tell, can't be expressed as a Venn diagram:
SELECT * FROM TableA
CROSS JOIN TableB
This joins "everything to everything", resulting in 4 x 4 = 16 rows, far more than we had in the original sets. If you do the math, you can see why this is a very dangerous join to run against large tables.

Another Explanation of SQL Joins


Simple Example: Lets say you have a Students table, and a Lockers table.
Each student can be assigned to a locker, so there is a "LockerNumber" column in the student table. More than one student could potentially be in a single locker, but especially at the begining of the school year, you may have some incoming students without lockers and some lockers that have no students assigned.
For the sake of this example, lets say you have 100 students, 70 of which have lockers. You have a total of 50 lockers, 40 of which have at least 1 student.
INNER JOIN is equivalent to "show me all students with lockers".
Any students without lockers, or any lockers without students are missing.
Returns 70 rows
LEFT OUTER JOIN would be "show me all students, with their corresponding locker if they have one". 
This might be a general student list, or could be used to identify students with no locker. 
Returns 100 rows
RIGHT OUTER JOIN would be "show me all lockers, and the students assigned to them if there are any".
This could be used to identify lockers that have no students assigned, or lockers that have too many students. 
Returns 80 rows (list of 70 students in the 40 lockers, plus the 10 lockers with no student)
FULL OUTER JOIN would be silly and probably not much use. 
Something like "show me all students and all lockers, and match them up where you can" 
Returns 110 rows (all 100 students, including those without lockers. Plus the 10 lockers with no student)
CROSS JOIN is also fairly silly in this scenario.
It doesn't use the linked "lockernumber" field in the students table, so you basically end up with a big giant list of every possible student-to-locker pairing, whether or not it actually exists.
Returns 5000 rows (100 students x 50 lockers). Could be useful (with filtering) as a starting point to match up the new students with the empty lockers.