Postgresql – How simplify/optimize a search of keywords in two tables at the same time

functionsperformanceplpgsqlpostgresqlpostgresql-9.4query-performance

I'm trying to model a function which will be used by a search engine (of cars for the example).
The last version of PostgreSQL will be installed on a Windows server (I don't know server's specifications for the moment). The default Postgresql configuration will be used.
In this search engine, you can type many words and each word can correspond to a specific car part or a tag.
The function below seems to work but i have some doubts about its performance on the long term because the 'tag' table and the 'parts' table will constantly grow. We can consider that each table will have more than a million of row.
The fields cars.id, parts.id, cars_parts, cars_tags.id are declared as primary keys so there is an index on each of them.
Each item in "tags" table contain only one word. Items in "parts" can contain many word.
How can I improve simplify it and improve its performance ?

CREATE OR REPLACE FUNCTION search(keywords text[], itemsLimit int)
    RETURNS TABLE(nbr bigint, cars_id bigint, cars_name varchar, cars_description text) AS $$
DECLARE
    potential_parts text[];
    potential_tags text[]; 
    keyword text;
BEGIN
    FOREACH keyword IN ARRAY keywords
    LOOP
    potential_parts := array_append(potential_parts, '%' || keyword || '%');
    potential_tags := array_append(potential_tags, keyword || '%');
    END LOOP;

    RETURN QUERY
    SELECT COUNT(*) nb, id, name, description
    FROM (
        SELECT cars.id, cars.name, cars.description
        FROM cars
        INNER JOIN cars_parts CP ON cars.id = CP.car_id
        INNER JOIN parts P ON CP.part_id = P.id
        WHERE p.name ILIKE ANY(potential_parts)

        /* Duplicates required to get cars having (part + tag) on top,
         using an order by on the count */
        UNION ALL

        SELECT cars.id, cars.name, cars.description
        FROM cars
        INNER JOIN cars_tags CT ON cars.id = CT.car_id
        INNER JOIN tags T ON CT.tag_id = T.id
        WHERE T.name ILIKE ANY(potential_tags)
    ) ids
    GROUP BY id, name, description
    ORDER BY nb DESC
    LIMIT itemsLimit;
END;
$$ LANGUAGE plpgsql;

-- Example of call :
    SELECT * FROM search(array['Fast', 'Wheel-554D'], 10);

Best Answer

The default Postgresql configuration will be used.

The default PostgreSQL configuration is meant to be adequate for smallish servers. If you are worried about scalability, which you seem to be, then restricting yourself to using the default configuration is going to be a poor plan.

WHERE p.name ILIKE ANY(potential_parts)

You are probably going to want a trgm_ops index on p.name, and similarly on T.name

You might also need to add user-space code which prevents people from searching on things of less than 3 or 4 characters.

We can consider that each table will have more than a million of row.

Make a million rows of dummy data, and see what happens. No use speculating when you can just do it.