Mysql – In Mysql 5.5, why is the upper bound of a character range not inclusive

MySQLmysql-5.5

So I'm trying to query on a character range, i.e. all manufacturers with names starting with a or b:

SELECT manufacturer FROM warehouse WHERE manufacturer >= 'a' AND manufacturer <= 'b'

or:

SELECT manufacturer FROM warehouse WHERE manufacturer BETWEEN 'a' and 'b'

For both of these queries, I only get back manus with names starting with a. If I changed the range to a-c, I would get back all with a-b. Why is the upper bound of the range not inclusive? My main issue with this is how can I include all 'Z's in the range. A-Z would not include any z's in this case.

The manual here: http://dev.mysql.com/doc/refman/5.5/en/comparison-operators.html#operator_between
… seems pretty self explanatory. Unfortunately things are not working the way they should.

The weird part is, this works when I create a small test table. However it does not work properly on my 15 million row table.

Best Answer

Perhaps you can try REGEXP with the BINARY operator

SELECT manufacturer FROM warehouse
WHERE manufacturer REGEXP BINARY '^[ab]';

if you want a range (say from A-M)

SELECT manufacturer FROM warehouse
WHERE manufacturer REGEXP BINARY '^[A-M]';

case sensitivity ranges

SELECT manufacturer FROM warehouse
WHERE manufacturer REGEXP BINARY '^[A-Ma-m]';

Notice I used the ^ regular expression operator. This searches "from the beginning of the string".

As for your question, why does BETWEEN 'a' and 'b' fail to get all b's?

Look at what you are asking for in the query: Every manu that is >= 'a' and <= 'b'.

  • Does 'a' return?. If you have a manu named 'a', then yes.
  • Does 'aa' return?. If you have a manu named 'a', then yes.
  • Does 'azzzzzzzzzzzzzzz' return?. I think you know the answer to this one.
  • Does 'b' return?. If you have a manu named 'b', then yes.
  • Does 'ba' return?. Even if you had a manu named 'ba', then no.

Although inefficient, here is a query that would have worked:

SELECT manufacturer FROM warehouse
WHERE LEFT(manufacturer,1) IN ('a','b');

UPDATE 2013-02-11 15:22 EDT

I have a nice suggestion: Create an index that only stores two characters of manufacturer.

Perhaps this:

ALTER TABLE manufacturer 
ADD INDEX manufacturer_prefix (manufacturer(2));

If you have any index of the entire manufacturer field, drop that index. You can then do effective two-character range searches.

Related Question