A Deeper Look at Advanced Select Queries
By: John Cox
http://net.tutsplus.com/tutorials/databases/a-deeper-look-at-advanced-select-queries-new-premium-tutorial
To the SQL novice, joins can be intimidating and somewhat frustrating. Even when you think you have the query nailed, along comes an unexpected result which make you throw up your arms in surrender to the SQL gods. In this tutorial, we are going to build upon the excellent quick tip of working with MySQL and INNER JOIN recently posted and crack open the real power of a SELECT statement.
By: John Cox
http://net.tutsplus.com/tutorials/databases/a-deeper-look-at-advanced-select-queries-new-premium-tutorial
To the SQL novice, joins can be intimidating and somewhat frustrating. Even when you think you have the query nailed, along comes an unexpected result which make you throw up your arms in surrender to the SQL gods. In this tutorial, we are going to build upon the excellent quick tip of working with MySQL and INNER JOIN recently posted and crack open the real power of a SELECT statement.
Tutorial Details
- Program: MySQL
- Difficulty: Moderate
- Estimated Completion Time: 20 Minutes
Introduction
These basic games that everyone learns in their earliest days of school is exactly the same skills that you use when you are working with SELECTs. We just have a different syntax that we have to use. Every time we use a SELECT command we are simply grouping a data set. Using JOINs we are simply grouping multiple data sets. We are finding ducks and cows that are either similar or different and pushing that off into an array that we can work with in our logic layer. It's the syntax that is confounding, but what we are doing is the same thing that we did in our very earliest days of school.
"Every time we use a SELECT command we are simply grouping a data set. Using JOINs we are simply grouping multiple data sets. We are finding ducks and cows that are either similar or different and pushing that off into an array that we can work with in our logic layer. It's the syntax that is confounding..."
I am going to cover a few things in this tutorial, but none are extremely difficult to put into practice. I am using MySQL as my database, but for the most part I will be using simple SELECT syntax that should at least be familiar regardless of your flavor of SQL. For my query tool, I am using Sequel Pro for Mac which is relatively fast and descriptive for my purposes today. I am using a very simple table structure of animal types, farms, and wild. For those playing along at home, the table structure looks something like this:
- --
- -- Table structure for table `animal_types`
- --
- CREATE TABLE `animal_types` (
- `animal_type_id` int(11) NOT NULL AUTO_INCREMENT,
- `animal_type_description` varchar(255) NOT NULL,
- PRIMARY KEY (`animal_type_id`)
- ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=7 ;
- --
- -- Dumping data for table `animal_types`
- --
- INSERT INTO `animal_types` VALUES(1, 'duck');
- INSERT INTO `animal_types` VALUES(2, 'cow');
- INSERT INTO `animal_types` VALUES(3, 'goose');
- INSERT INTO `animal_types` VALUES(4, 'cat');
- INSERT INTO `animal_types` VALUES(5, 'sheep');
- INSERT INTO `animal_types` VALUES(6, 'horse');
- --
- -- Table structure for table `farm`
- --
- CREATE TABLE `farm` (
- `animal_id` int(11) NOT NULL AUTO_INCREMENT,
- `animal_type` int(11) NOT NULL,
- `animal_description` varchar(255) NOT NULL,
- `animal_name` varchar(255) NOT NULL,
- PRIMARY KEY (`animal_id`)
- ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=5 ;
- --
- -- Dumping data for table `farm`
- --
- INSERT INTO `farm` VALUES(1, 2, 'spotted', 'Mal');
- INSERT INTO `farm` VALUES(2, 4, 'spotted', 'Zoe');
- INSERT INTO `farm` VALUES(3, 1, 'solid color', 'Wash');
- INSERT INTO `farm` VALUES(4, 2, 'solid color', 'Inara');
- --
- -- Table structure for table `wild`
- --
- CREATE TABLE `wild` (
- `animal_id` int(11) NOT NULL AUTO_INCREMENT,
- `animal_type` int(11) NOT NULL,
- `animal_description` varchar(255) NOT NULL,
- `animal_name` varchar(255) NOT NULL,
- PRIMARY KEY (`animal_id`)
- ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=6 ;
- --
- -- Dumping data for table `wild`
- --
- INSERT INTO `wild` (`animal_id`, `animal_type`, `animal_description`, `animal_name`) VALUES(1, 2, 'spotted', 'Jayne');
- INSERT INTO `wild` (`animal_id`, `animal_type`, `animal_description`, `animal_name`) VALUES(2, 6, 'solid color', 'Kaylee');
- INSERT INTO `wild` (`animal_id`, `animal_type`, `animal_description`, `animal_name`) VALUES(3, 1, 'spotted', 'Simon');
- INSERT INTO `wild` (`animal_id`, `animal_type`, `animal_description`, `animal_name`) VALUES(4, 3, 'solid color', 'River');
- INSERT INTO `wild` (`animal_id`, `animal_type`, `animal_description`, `animal_name`) VALUES(5, 5, 'solid color', 'Sheppard');
Back to Basics: Breaking the Shortcut Habit
Since it is the syntax, and not necessarily the concepts that are difficult, we should take a couple of steps back and remember some of the basics. There are many different shortcuts that you can take in writing a SELECT, but many times those shortcuts will cause trouble down the line. I tend to write my queries out, so that I don't have ambiguous column names, and I can see exactly what I am expecting to get.
Let's first start with my animal_types table which has a few animal types entered in. If I wanted to select everything in that table, I could write my select like so:
- SELECT *
- FROM animal_types
Which would give me a data set of:
The problem with using the " * " selector is that a year from now I probably will not remember what I am expecting from this SELECT. Furthermore, when I am joining tables, I will probably have ambiguous column names, meaning there are multiple columns that have the same name, thus SQL has no way of setting the results apart. Instead, let's be specific, which takes just a little more typing, but corrects some of these problems with taking shortcuts:
- SELECT animal_type_id,
- animal_type_description
- FROM animal_types
Which gives the exact data set back, but I see exactly what I should be getting back:
Even though we are explicitly naming our column names, we can go just a little bit further, and that is explicitly saying these columns belong to this table. This is another shortcut that we take all the time, which adds just a little more time to writing queries, but ends up saving the debugging time later when you are joining other tables. Instead of our last query, let's instead explicitly prefix our columns with a table prefix:
- SELECT att.animal_type_id,
- att.animal_type_description
- FROM animal_types att
We have added the prefix on the table animal_types of att, which tells SQL, when you see the prefix of u, realize that it belongs to the table named animal_types. Our data set looks exactly the same again:
Let's also tell SQL what to name our columns when it gives us our results. Many times when you are working with JOINs or other advanced queries, two columns will have the same name, which makes it difficult to distinguish in your logic layer. I like to explicitly tell SQL to deliver my column names in the manner that I want, and not necessarily as they are named. To do this, we are going to use the AS keyword:
- SELECT att.animal_type_id AS animal_type_id,
- att.animal_type_description AS animal_type_description
- FROM animal_types att
Which again gives us the exact same data set but we have again been even more explicit in our query:
There is one more way to be sure that I am telling SQL to SELECT exactly what I want, and from where I want, and that is to add the database prefix onto my table. Programming languages and different tools make it very easy for the database name to be implied, but there are times where you might call a different database in a query. When you add the database prefix you take the implication of working with a single database out of the mix.
SELECT att.animal_type_id AS animal_type_id, att.animal_type_description AS animal_type_description FROM tutorial.animal_types att
Again, exact same data set:
Except now we are telling SQL to explicitly look in the database "tutorial" for the table "animal_types". Again, this comes in handy when we are pulling information from multiple databases into a single select.
Sub-Selects: Multiple Selects Bundled Together
MySQL 4.1 added the ability to use Sub-Selects. A word of caution on Sub-Selects before we get too deep into them, use them wisely. The execution plan for Sub-Selects may well be much different than you would imagine, and thus when you believe that you are using an INDEX, you are really working with a full table scan. In my experience, you can optimize your Sub-Selects to be just as performance friendly as a natural join, but when you are dealing with optimization you will want to pay close attention to the EXPLAIN results on your query. For more information, take a look at a recent MySQL optimization tutorial which gives you insight on the query execution plans.
What a Sub-Select is basically executing a SELECT statement within a SELECT statement. Let's take a look at a very common use:
- SELECT f.animal_id AS animal_id,
- f.animal_type AS animal_type,
- f.animal_description AS animal_description,
- f.animal_name AS animal_name
- FROM tutorial.farm f
- WHERE f.animal_type IN (SELECT att.animal_type_id FROM tutorial.animal_types att where att.animal_type_id = 2)
Which gives us a data set of:
In this case we are saying to return all farm animals that have are of an animal type 2. Let's build on this just a bit, and instead look for animal type 1 or 2.
- SELECT f.animal_id AS animal_id,
- f.animal_type AS animal_type,
- f.animal_description AS animal_description,
- f.animal_name AS animal_name
- FROM tutorial.farm f
- WHERE f.animal_type IN (SELECT att.animal_type_id FROM tutorial.animal_types att where att.animal_type_id = 2 OR att.animal_type_id = 1)
Which now gives us a result of:
We are using the IN keyword to tell SQL to look at the results from our Sub-Select. In this case, we could easily rewrite this to not use a Sub-Select and get the same results:
- SELECT f.animal_id AS animal_id,
- f.animal_type AS animal_type,
- f.animal_description AS animal_description,
- f.animal_name AS animal_name
- FROM tutorial.farm f
- WHERE f.animal_type IN (1, 2)
Exact same results as before:
However, when we change our Sub-Select to be something like:
- SELECT f.animal_id AS animal_id,
- f.animal_type AS animal_type,
- f.animal_description AS animal_description,
- f.animal_name AS animal_name
- FROM tutorial.farm f
- WHERE f.animal_type IN (SELECT att.animal_type_id FROM tutorial.animal_types att where att.animal_type_description LIKE "%duck%" OR att.animal_type_description LIKE "%cow%")
We begin to see the power:
We can use a Sub-Select to search and give us a data set to limit our farm results. We can take this just a bit further by also using a Sub-Select in our query SELECT. Say for instance that we want to get the animal_type_description as a Sub-Select in our main query:
- SELECT f.animal_id AS animal_id,
- f.animal_type AS animal_type,
- f.animal_description AS animal_description,
- f.animal_name AS animal_name,
- (SELECT att.animal_type_description FROM tutorial.animal_types att where att.animal_type_id = f.animal_type) AS description
- FROM tutorial.farm f
- WHERE f.animal_type IN (SELECT att.animal_type_id FROM tutorial.animal_types att where att.animal_type_description LIKE "%duck%" OR att.animal_type_description LIKE "%cow%")
In this query we are placing a condition on our Sub-Select to return the animal_type_description when it matches the f.animal_id. This condition will last for this pass, and will be reset on the next data set. Basically, we have joined our tables together by simply using a Sub-Select. Again, re-read the caution from the beginning of this section on the performance, but as you can see you can use a Sub-Select to quickly join a table based on a condition performed in the Sub-Select.
INNER JOINS: The Exclusive Club
For instance, let's say that I want to see the animal type description for my farm animals, but only when an animal type id exists. I could write an INNER JOIN that would return this result:
SELECT f.animal_id AS farm_animal_id,- f.animal_type AS farm_animal_type,
- f.animal_description AS farm_animal_description,
- f.animal_name AS farm_animal_name,
- att.animal_type_description AS description
- FROM tutorial.farm f
- INNER JOIN tutorial.animal_types att
- ON f.animal_type = att.animal_type_id;
Awesome. That said, this example does not show the exclusivity that we get from our INNER JOIN. Let's do something that happens every day with our data structure and delete an entry from our definition look up table, animal types. I am going to delete the "cat" description:
- DELETE FROM animal_types
- WHERE animal_type_description = "cat"
And now I want to re-run my original query:
- SELECT f.animal_id AS farm_animal_id,
- f.animal_type AS farm_animal_type,
- f.animal_description AS farm_animal_description,
- f.animal_name AS farm_animal_name,
- att.animal_type_description AS description
- FROM tutorial.farm f
- INNER JOIN tutorial.animal_types att
- ON f.animal_type = att.animal_type_id;
All I have done is removed an attribute from my look-up table, and "Zoe" is still in my farm animals table:
- SELECT f.animal_name as name
- FROM tutorial.farm f
But since there is not a match in my animal types table, I will not get a result from my farm animals.
Let's play the matching game with our wild animals table to test the concept a little more:
- SELECT w.animal_id AS wild_animal_id,
- w.animal_type AS wild_animal_type,
- w.animal_description AS wild_animal_description,
- w.animal_name AS wild_animal_name,
- att.animal_type_description AS description
- FROM tutorial.wild w
- INNER JOIN tutorial.animal_types att
- ON w.animal_type = att.animal_type_id
Again, always remember an INNER JOIN is going to return data from two tables where there is a match in both of the tables. When you compare two pictures and make a single picture from the two, you will only be drawing the matches.
OUTER JOINS: The Descriptive Club
INNER JOINs are easy to use, but can be limiting. There are two types of joins with MySQL which do not have the exclusivity rules, and they are LEFT OUTER JOIN and RIGHT OUTER JOIN. Other SQL databases also have a FULL OUTER JOIN, which you can simulate in MySQL with a UNION. I tend to exclusively use LEFT OUTER JOINs, as I think of my OUTER JOINS as adding description to my SELECT statements.
When you are using a LEFT OUTER JOIN you are basically saying, get me everything from the LEFT table and join anything that matches the condition from the RIGHT table non-exclusively:
- SELECT f.animal_id AS farm_animal_id,
- f.animal_type AS farm_animal_type,
- f.animal_description AS farm_animal_description,
- f.animal_name AS farm_animal_name,
- att.animal_type_description AS description
- FROM tutorial.farm f
- LEFT OUTER JOIN tutorial.animal_types att
- ON f.animal_type = att.animal_type_id
Gives us a result of:
Our farm table is our LEFT table as that is where my natural selection comes from, and it will always return a result from the LEFT table, even when there is nothing that matches from our RIGHT table. Let's take a look at an example of a RIGHT OUTER JOIN using the same basic SELECT:
- SELECT f.animal_id AS farm_animal_id,
- f.animal_type AS farm_animal_type,
- f.animal_description AS farm_animal_description,
- f.animal_name AS farm_animal_name,
- att.animal_type_description AS description
- FROM tutorial.farm f
- RIGHT OUTER JOIN tutorial.animal_types att
- ON f.animal_type = att.animal_type_id
Which gives us a result of:
Which in my mind is not nearly as useful as a data set, because I end up will NULL entries in my LEFT data set. When I write a query, I generally want to know everything in my LEFT without those nasty NULL entries to deal with. Where OUTER JOINS really come in handy is when we are joining multiple tables together to build upon the information. Lets add a new look up table to the mix for our descriptions of whether the animal is striped, spotted or solid:
- --
- -- Table structure for table `animal_descriptions`
- --
- CREATE TABLE `animal_descriptions` (
- `animal_description_id` int(11) NOT NULL AUTO_INCREMENT,
- `animal_description` varchar(255) NOT NULL,
- PRIMARY KEY (`animal_description_id`)
- ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=4 ;
- --
- -- Dumping data for table `animal_descriptions`
- --
- INSERT INTO `animal_descriptions` VALUES(1, 'Spotted');
- INSERT INTO `animal_descriptions` VALUES(2, 'Striped');
- INSERT INTO `animal_descriptions` VALUES(3, 'Solid Color');
And after I change the values in my farm and wild tables to integers to match my description ids, I can then make multiple OUTER joins on my original query:
- SELECT f.animal_id AS farm_animal_id,
- f.animal_type AS farm_animal_type,
- d.animal_description AS farm_animal_description,
- f.animal_name AS farm_animal_name,
- att.animal_type_description AS description
- FROM tutorial.farm f
- LEFT OUTER JOIN tutorial.animal_types att
- ON f.animal_type = att.animal_type_id
- LEFT OUTER JOIN tutorial.animal_descriptions d
- on f.animal_description = d.animal_description_id
We are saying give me all farm animal names, and while you are at it, give me the type of animal it is, as well as the description of the animal, even when there might not be a match for the farm animal in the RIGHT tables:
Bam! We have what we are after. The only limit to a join is your imagination and the performance of your database server. The more that you normalize, the more that you will probably have the need to add non-exclusive description to you results.
Combining JOINS: Mix and Match
We have one animal that is striped. Let's remove striped from our new animal description table, and then just use LEFT OUTER JOINs to simulate our last example from the wild side:
- DELETE FROM tutorial.animal_descriptions where animal_description_id = '1'
And to select from our wild animals just as we did our farm animals:
- SELECT w.animal_id AS wild_animal_id,
- w.animal_type AS wild_animal_type,
- d.animal_description AS farm_animal_description,
- w.animal_name AS wild_animal_name,
- att.animal_type_description AS description
- FROM tutorial.wild w
- LEFT OUTER JOIN tutorial.animal_types att
- ON w.animal_type = att.animal_type_id
- LEFT OUTER JOIN tutorial.animal_descriptions d
- on w.animal_description = d.animal_description_id
Which gives us a NULL value where our Striped animal description used to be:
That's all well and good, until you realize that your data structure is no longer intact because of that NULL value. Let's combine our OUTER join description with the exclusivity of an INNER JOIN on our description:
SELECT w.animal_id AS wild_animal_id,- w.animal_type AS wild_animal_type,
- d.animal_description AS farm_animal_description,
- w.animal_name AS wild_animal_name,
- att.animal_type_description AS description
- FROM tutorial.wild w
- LEFT OUTER JOIN tutorial.animal_types att
- ON w.animal_type = att.animal_type_id
- INNER JOIN tutorial.animal_descriptions d
- on w.animal_description = d.animal_description_id
Which gets rid of that NULL value, which might be the difference of a bug in your application because of a data integrity issue, and peace and quiet on the weekend:
We can also self join a table which comes in handy from time to time. What we are doing when joining tables together is getting information from table.a and combining it table.b. SQL doesn't really care if table.b is the same table as table.a. Here's an example:
SELECT w.animal_id AS wild_animal_id,- w.animal_type AS wild_animal_type,
- d.animal_description AS farm_animal_description,
- same.animal_name AS wild_animal_name_from_same_table,
- att.animal_type_description AS description
- FROM tutorial.wild w
- LEFT OUTER JOIN tutorial.animal_types att
- ON w.animal_type = att.animal_type_id
- INNER JOIN tutorial.animal_descriptions d
- on w.animal_description = d.animal_description_id
- LEFT OUTER JOIN tutorial.wild same
- ON w.animal_id = same.animal_id
In this example, we are combining the same wild table to itself on animal id to get restate it's name:
While the result might not be that interesting in this example, think about how many times in your applications that you have created multiple tables to express a parent to child relationship. You could easily cut out the multiple table structure by joining a table on itself to change the meaning of the relationship.
Unions: A Different Way to Join
A UNION is the combination of multiple queries which combine our data sets into one. Let's take the situation where we want a description of all animals, whether farm or wild. We can simply express that using a UNION:
- SELECT f.animal_id AS animal_id,
- f.animal_type AS animal_type,
- d.animal_description AS animal_description,
- f.animal_name AS animal_name,
- 'Farm' AS domain,
- att.animal_type_description AS description
- FROM tutorial.farm f
- LEFT OUTER JOIN tutorial.animal_types att
- ON f.animal_type = att.animal_type_id
- LEFT OUTER JOIN tutorial.animal_descriptions d
- ON f.animal_description = d.animal_description_id
- UNION ALL
- SELECT w.animal_id AS animal_id,
- w.animal_type AS animal_type,
- d.animal_description AS animal_description,
- w.animal_name AS animal_name,
- 'Wild' AS domain,
- att.animal_type_description AS description
- FROM tutorial.wild w
- LEFT OUTER JOIN tutorial.animal_types att
- ON w.animal_type = att.animal_type_id
- LEFT OUTER JOIN tutorial.animal_descriptions d
- ON w.animal_description = d.animal_description_id
In this UNION the one thing that I have not explained is the explicit column "domain". I am merely saying either give me "Farm" or "Wild" depending on when you are running. I am also using UNION ALL instead of UNION as I want all results and not just distinct values. Our complex query gives us the expected data set that we were hoping for:
We can also use a Sub-Select in our UNION:
- SELECT f.animal_id AS animal_id,
- f.animal_type AS animal_type,
- d.animal_description AS animal_description,
- f.animal_name AS animal_name,
- 'Farm' AS domain,
- att.animal_type_description AS description
- FROM tutorial.farm f
- LEFT OUTER JOIN tutorial.animal_types att
- ON f.animal_type = att.animal_type_id
- LEFT OUTER JOIN tutorial.animal_descriptions d
- ON f.animal_description = d.animal_description_id
- UNION ALL
- SELECT w.animal_id AS animal_id,
- w.animal_type AS animal_type,
- d.animal_description AS animal_description,
- w.animal_name AS animal_name,
- 'Wild' AS domain,
- att.animal_type_description AS description
- FROM tutorial.wild w
- LEFT OUTER JOIN tutorial.animal_types att
- ON w.animal_type = att.animal_type_id
- LEFT OUTER JOIN tutorial.animal_descriptions d
- ON w.animal_description = d.animal_description_id
- WHERE w.animal_id IN (SELECT animal_id FROM wild WHERE animal_id <= 3)
Basically saying give me a union of the two queries, but only give me the results in the wild side that have an animal id less than or equal to 3. The produced data set is just as expected:
Unions are very easy to use, but a couple of notes. The first side of the UNION will determine the name of the column. In addition, both sides of the UNION must have the identical column numbers. You can't have one query with 4 columns of data with the other query with 3.
Simulating a FULL OUTER JOIN: Practical Use
- SELECT f.animal_id AS farm_animal_id,
- f.animal_type AS farm_animal_type,
- d.animal_description AS farm_animal_description,
- f.animal_name AS farm_animal_name,
- att.animal_type_description AS description
- FROM tutorial.farm f
- LEFT OUTER JOIN tutorial.animal_types att
- ON f.animal_type = att.animal_type_id
- LEFT OUTER JOIN tutorial.animal_descriptions d
- on f.animal_description = d.animal_description_id
- UNION
- SELECT f.animal_id AS farm_animal_id,
- f.animal_type AS farm_animal_type,
- d.animal_description AS farm_animal_description,
- f.animal_name AS farm_animal_name,
- att.animal_type_description AS description
- FROM tutorial.farm f
- RIGHT OUTER JOIN tutorial.animal_types att
- ON f.animal_type = att.animal_type_id
- RIGHT OUTER JOIN tutorial.animal_descriptions d
- on f.animal_description = d.animal_description_id
In this case I am using a UNION statement instead of the UNION ALL statement as I just want distinct values. Here's my result set:
No comments:
Post a Comment