MySQL – Search Alphanumeric string using wildcards – underscore (_) and asterisk (*)

Tags: MySQL, pattern matching, string-searching

I have a table, products, with a product_id column as follows:

product_id
110177
110177T
177

I want a query that returns only the two rows below, but my current query also returns the id 177:

110177
110177T

Query – select * from products where product_id like '%177%'

How should I change the query so that it discards the string 177 and returns only the other two rows?

Best Answer

If you want to eliminate the 177 value, you have to do the following (see the fiddle here):

CREATE TABLE p (p_id VARCHAR (25));

and populate it:

INSERT INTO p VALUES
('110177'),
('110177T'),
('177');

and then run the following SQL:

SELECT * FROM p WHERE p_id LIKE '___177%';

Result:

p_id
110177
110177T

Note that the ___177% (3 underscores) predicate here will pick up all values that start with exactly 3 characters (each underscore matches any single character), followed by 177, followed by any other characters or none.

This is due to the difference between the % (percent) and _ (underscore) wildcards: the _ placeholder represents one, and precisely one, character, whereas the % wildcard represents 0 or more characters.

So, the 177 isn't picked up because it has no characters before the 177 - it's explained well here.
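To see the two wildcards side by side, here is a minimal sketch against the same three values. The table name wildcard_demo is made up for illustration, and the rows listed in the comments are what the wildcard rules above imply for this data.

```sql
-- Hypothetical demo table holding the three ids from the question.
CREATE TABLE wildcard_demo (p_id VARCHAR(25));
INSERT INTO wildcard_demo VALUES ('110177'), ('110177T'), ('177');

-- % matches zero or more characters, so every value containing 177 matches,
-- including the bare '177'.
SELECT p_id FROM wildcard_demo WHERE p_id LIKE '%177%';
-- 110177, 110177T, 177

-- Each _ matches exactly one character: three characters, then 177.
SELECT p_id FROM wildcard_demo WHERE p_id LIKE '___177%';
-- 110177, 110177T

-- One _ followed by % means "at least one character before 177",
-- which may be handy if the prefix is not always three characters long.
SELECT p_id FROM wildcard_demo WHERE p_id LIKE '_%177%';
-- 110177, 110177T
```

The last pattern is only an alternative worth considering; whether a fixed or a variable prefix length is right depends on how your product ids are built.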

The != solution proposed by @Akina will also work but it implies knowing the values to be excluded in advance - my reading of your question is that you want to eliminate any really short product_ids and not just particular ones!
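Both approaches can be written down directly. This is only a sketch: the table name id_demo is hypothetical, and the length threshold of 3 is my assumption about what counts as a "really short" id.

```sql
-- Hypothetical demo table holding the three ids from the question.
CREATE TABLE id_demo (p_id VARCHAR(25));
INSERT INTO id_demo VALUES ('110177'), ('110177T'), ('177');

-- Exclude one known value explicitly (the != / <> approach).
SELECT p_id FROM id_demo WHERE p_id LIKE '%177%' AND p_id <> '177';
-- 110177, 110177T

-- Exclude by length instead, without naming the values in advance.
SELECT p_id FROM id_demo WHERE p_id LIKE '%177%' AND CHAR_LENGTH(p_id) > 3;
-- 110177, 110177T
```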

If you have more sophisticated requirements, you should take a look at regular expressions - an example from PostgreSQL can be found here - MySQL documentation here.
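As a rough idea of what the regular expression route could look like in MySQL, here is a small sketch using the REGEXP operator. The table name regexp_demo and the second, stricter pattern are assumptions for illustration, not something taken from the question.

```sql
-- Hypothetical demo table holding the three ids from the question.
CREATE TABLE regexp_demo (p_id VARCHAR(25));
INSERT INTO regexp_demo VALUES ('110177'), ('110177T'), ('177');

-- Same idea as LIKE '___177%': any three characters at the start, then 177.
SELECT p_id FROM regexp_demo WHERE p_id REGEXP '^.{3}177';
-- 110177, 110177T

-- Stricter: one or more digits, then 177, then an optional single letter.
SELECT p_id FROM regexp_demo WHERE p_id REGEXP '^[0-9]+177[A-Za-z]?$';
-- 110177, 110177T
```

Unlike LIKE, REGEXP matches anywhere in the string unless you anchor the pattern with ^ and $.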

p.s. welcome to the forum!

Related Solutions

MySQL – How to GROUP_CONCAT DISTINCT values in a MySQL query that gets number of records and min/max values

UPDATE Now I see your error. The inner query uses aggregation but also selects sizeRange, which is not a column you aggregate on. So you only get arbitrary "samples" of that column. Strictly speaking, your query is not valid SQL, but MySQL allows it under a relaxed sql_mode (one without ONLY_FULL_GROUP_BY).

So your query is inherently erroneous. Will see if I can help fix it.
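To illustrate the kind of error meant here, a sketch with a made-up table follows. Only the column name sizeRange comes from the question; the table stock_demo, its other columns, and the sample rows are assumptions, and this is not your actual query.

```sql
-- Hypothetical table; only sizeRange is a name taken from the question.
CREATE TABLE stock_demo (
  product   VARCHAR(50),
  sizeRange VARCHAR(10),
  price     DECIMAL(8,2)
);
INSERT INTO stock_demo VALUES
  ('shirt', 'S', 10.00), ('shirt', 'M', 12.00), ('shirt', 'XL', 15.00);

-- Strict grouping for this session (note: this replaces the session's other modes).
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- Rejected under ONLY_FULL_GROUP_BY: sizeRange is neither aggregated nor grouped.
-- SELECT product, sizeRange, MIN(price) FROM stock_demo GROUP BY product;

-- Accepted: every selected column is either grouped or aggregated.
SELECT product,
       GROUP_CONCAT(DISTINCT sizeRange ORDER BY sizeRange) AS sizes,
       COUNT(*)   AS records,
       MIN(price) AS min_price,
       MAX(price) AS max_price
FROM   stock_demo
GROUP  BY product;
```

With the relaxed mode, the rejected query would run and return one arbitrary sizeRange per group, which is the "sample" effect described above.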

ORIGINAL answer

My guess for you would be to check the value of group_concat_max_len.

It is by default just 1024, though you typically don't really want a limit for that.

The problem might be that you are only getting partial results, where, by chance or by order of evaluation, "S" and "XL" occupy the first 1024 characters or more. I see no reason why "M" or "L" would not be there -- GROUP_CONCAT doesn't make such distinctions.

So, try out:

SET group_concat_max_len := 1000000;

And execute your query again. If this works, make sure to set said parameter in your MySQL configuration file.
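For reference, a short sketch of how the variable can be inspected and changed at different scopes. The value 1000000 is just the one used above, not a recommendation.

```sql
-- Inspect the current limit (in bytes); the MySQL default is 1024.
SELECT @@group_concat_max_len;

-- Raise it for the current connection only.
SET SESSION group_concat_max_len = 1000000;

-- Raise it for new connections until the server restarts
-- (needs a suitably privileged account).
SET GLOBAL group_concat_max_len = 1000000;

-- To survive a restart, put the setting in the [mysqld] section of the
-- configuration file, for example:
--   group_concat_max_len = 1000000
```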

You may find my related post useful.

PostgreSQL FTS and Trigram-similarity Query Optimization

Assessment

In your last query, the bitmap index scan looking for 'hat' produces 307 hits.
Postgres then runs a bitmap heap scan to filter merchants that are similar enough (similarity(...) > 0.2), producing 12 rows. Your test uses 30K rows, so your real-life query will produce around 300 times as many hits: roughly 90k from the index scan and 3.5k after filtering, extrapolating from the test case at hand. An additional index on merchant will help.

Advice

I suggest you create an additional trigram index for the similarity search. Be sure to read the chapter in the manual about trigram index support. We need the additional module pg_trgm installed (like you obviously have).
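If you want to get a feel for what the module does before building the index, a few exploratory statements like the following may help. The sample strings are arbitrary and I am deliberately not quoting result values here.

```sql
-- Install the module once per database (needs sufficient privileges).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Show the trigrams extracted from a string.
SELECT show_trgm('walmart');

-- Similarity ranges from 0 (no shared trigrams) to 1 (identical trigram sets).
SELECT similarity('walmart', 'wal-mart') AS sim;

-- The % operator is true when the similarity exceeds the current threshold.
SELECT 'walmart'::text % 'wal-mart'::text AS similar_enough;
```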

For your first request:

How can I search for a query like 'WALMART BAGS' which will first return me product BAG with merchant WALMART and then BAGS from other merchants.

I suggest this query using the similarity operator %:

-- SELECT set_limit(0.2)  -- Adjust similarity operator only if needed

SELECT *
FROM   products
WHERE  to_tsvector('english', product) @@ to_tsquery('bag')
AND    merchant % 'walmart'
ORDER  BY merchant <-> 'walmart'
--    LIMIT  n; -- possibly limit to top n results
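A side note on the commented-out set_limit() line: as far as I know, PostgreSQL 9.6 and later also expose the threshold as a configuration parameter, and set_limit() is considered deprecated there. A small sketch, with 0.2 being the value from the question:

```sql
-- Show the threshold currently used by the % operator (default 0.3).
SELECT show_limit();

-- PostgreSQL 9.6 or later: set it for the session via the configuration parameter.
SET pg_trgm.similarity_threshold = 0.2;

-- Older releases: the function form, as in the commented-out line above.
-- SELECT set_limit(0.2);
```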

Again, you can choose between GiST and GIN, but this time GiST carries a decisive advantage:

This can be implemented quite efficiently by GiST indexes, but not by GIN indexes. It will usually beat the first formulation when only a small number of the closest matches is wanted.

Therefore, I suggest this index:

CREATE INDEX prod_merchant_trgm_idx ON products USING gist (merchant gist_trgm_ops);

As for your second request:

Can I have both GIN and GIST index working for me?

Yes, you can. It would hardly make sense to have both types for the same (combination of) column(s), but Postgres can combine GiST and GIN indices very well in the same query. I quote the excellent manual yet again, on Combining Multiple Indexes:

To combine multiple indexes, the system scans each needed index and prepares a bitmap in memory giving the locations of table rows that are reported as matching that index's conditions. The bitmaps are then ANDed and ORed together as needed by the query. Finally, the actual table rows are visited and returned. The table rows are visited in physical order, because that is how the bitmap is laid out; this means that any ordering of the original indexes is lost, and so a separate sort step will be needed if the query has an ORDER BY clause. For this reason, and because each additional index scan adds extra time, the planner will sometimes choose to use a simple index scan even though additional indexes are available that could have been used as well.
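To check whether both indexes are actually used for your data, something along these lines could serve as a starting point. It is a sketch under assumptions: the minimal table definition and the name prod_product_fts_idx are hypothetical, the pg_trgm extension must already be installed, and the plan you get depends on your data and on the planner.

```sql
-- Assumed minimal schema; only the column names come from the queries above.
CREATE TABLE IF NOT EXISTS products (product text, merchant text);

-- GIN index for the full-text condition (index name is hypothetical).
CREATE INDEX IF NOT EXISTS prod_product_fts_idx
    ON products USING gin (to_tsvector('english', product));

-- GiST trigram index for the similarity condition, as suggested above.
CREATE INDEX IF NOT EXISTS prod_merchant_trgm_idx
    ON products USING gist (merchant gist_trgm_ops);

-- Inspect the plan; whether the two indexes are combined is up to the planner.
EXPLAIN
SELECT *
FROM   products
WHERE  to_tsvector('english', product) @@ to_tsquery('english', 'bag')
AND    merchant % 'walmart';
```

Running this with EXPLAIN (ANALYZE) on a realistically sized table gives actual timings rather than estimates.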