вторник, 27 мая 2008 г.

Intersection in MySQL

Recently I had to deal with one task which required to do two sets intersection. For example, if we have set A = {a, b, c} and set B = {b, c, d} the result should be set C = {b, c}. So the question was how do we do that with MySQL. My little investigation and few minutes of googling showed that there is no special command for set intersection in MySQL (like INTERSECT) or set difference, although there is one for set union (UNION. Now how come that there is one for union, bot none for other set operations. Unions are used more often?!). Instead, there is some tricky workaround using different kinds of JOINs.
Now for my task, I had one simple table words shown below:
-------------------------
word_field \\ link_field
-------------------------
a \\ link1
a \\ link2
b \\ link1
b \\ link3
c \\ link1
---------------------

Suppose, given some "word_field" values, we need to find all "link_field" values which are common for them. To make it clear, given "a" and "b" the answer would be "link1". I was able to produce some ugly looking query:

SELECT DISTINCT w1.link FROM
(SELECT * FROM words WHERE word = "a" OR word = "b") AS w1
JOIN words w2 ON w1.link = w2.link AND w1.word <> w2.word

AND (w2.word = "a" OR w2.word = "b");

I do strongly believe there exist some nice solution, which even could get rid of join, but the query above just worked for me and gave the correct results. With the lack of time and exams approaching I hadn't any chance to optimize it.
Now my concern is that how useful would be the implementation of "INTERSECT" command for MySQL. Let's say tables R and S both are on disk then it would take the same amount of disk I/O for JOIN and INTERSECT as they both will need to bring all the tuples of those two tables. On the other hand, the query
(SELECT ...) INTERSECT (SELECT ...)
makes more sense than
SELECT (JOIN (JOIN)).

1 комментарий:

qu1j0t3 комментирует...

select a.link_field from words a join words b on a.word_field='a' and b.word_field='b' and a.link_field=b.link_field;