Postgresql fulltext slow

performancepostgresqlpostgresql-performance

I am trying PostgreSQL for the first time, but am finding the fulltext search very slow.

My database has a table with 5 columns and 5 million rows

CREATE TABLE table_test (
    id integer NOT NULL,
    titre character varying(210),
    annee integer,
    pays character varying(50),
    resume_article text
);
INSERT INTO table_test(titre, annee, pays, resume_article) VALUES
('Blanche-Neige et les Sept Nains', '1937', 'USA', 'L''action débute lorsqu''un jour, la Reine interrogeant son miroir, celui-ci lui répond que la plus belle du royaume est dorénavant la princesse Blanche-Neige'),
('Les Petits Mouchoirs', '2010', 'France', 'Max, riche propriétaire d''un hôtel-restaurant, et sa femme Véro invitent chaque année leurs amis dans leur maison au Cap Ferret pour célébrer l’anniversaire d''Antoine et le début des vacances.'),
('La Reine des neiges', '2013', 'USA', 'Le roi et la reine partent immédiatement voir les trolls dont le roi est capable de guérir Anna.'),
('Au service de Sa Majesté', 'UK', '1937', 'Un mauvais garçon new-yorkais, fuit au Royaume unis en empruntant l''identité de sa victime présumée, un Canadien.');

My tests

// 1st test
ALTER TABLE table_test ADD COLUMN fulltext tsvector NOT NULL DEFAULT '';
UPDATE table_test SET fulltext = (to_tsvector('french', COALESCE(titre, '')) || to_tsvector('french', COALESCE(resume_article, '')));
CREATE INDEX IF NOT EXISTS fulltext_all ON table_test USING GIN(fulltext);

SELECT * FROM table_test WHERE fulltext @@ to_tsquery('reine') AND pays='USA'
//Result in 12/15 seconds


// 2nd test
ALTER TABLE table_test ADD COLUMN fulltext tsvector NOT NULL DEFAULT '';
UPDATE table_test SET fulltext= to_tsvector('simple',"titre" || ' ' || "resume_article");
CREATE INDEX IF NOT EXISTS fulltext_all ON table_test USING GIN(fulltext);

SELECT * FROM table_test WHERE fulltext @@ to_tsquery('reine') AND pays='USA'
//Result in 15/20 seconds


SELECT * FROM table_test WHERE (titre ILIKE '%reine%' OR resume_article ILIKE '%reine%') AND pays='USA'
//Result in 5/6 seconds

And if the result is more than 1000 rows, I need to wait 2 or 3 minutes.

    CREATE TABLE table_test (
    id integer NOT NULL,
    titre character varying(210),
    annee integer,
    pays character varying(50),
    resume_article text
    tsv tsvector NOT NULL,
    CONSTRAINT table_test_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
CREATE INDEX idx_pays ON table_test USING btree(pays);
CREATE INDEX idx_tsv ON table_test USING gin(tsv);

EXPLAIN ANALYZE SELECT * FROM table_test WHERE tsv@@ to_tsquery('french', 'maison');

"Bitmap Heap Scan on table_test  (cost=3980.87..363013.72 rows=430047 width=212) (actual time=675.558..37781.252 rows=429066 loops=1)"
"  Recheck Cond: (tsv @@ '''maison'''::tsquery)"
"  Rows Removed by Index Recheck: 2172041"
"  Heap Blocks: exact=41111 lossy=79485"
"  ->  Bitmap Index Scan on idx_tsv  (cost=0.00..3873.36 rows=430047 width=0) (actual time=486.134..486.134 rows=429066 loops=1)"
"        Index Cond: (tsv @@ '''maison'''::tsquery)"
"Planning time: 0.506 ms"
"Execution time: 37848.042 ms"

When I try this query without EXPLAIN ANALYSE, it's 2 minutes to wait. Can anyone explain where my mistakes are?

Best Answer

How do you know it's the full text search that's slow? It is very likely that the slow down is caused more by AND pays='USA' You do not have an index on it and with 5 million records 10 seconds seems fair enough.

Create an index on the pays field. This will speed things up provided that the pays field has good cardinality.