I work as a software engineer with python and django. Currently I am struggling with a design choice made before my time. We have a transaction table that logs all customer activity. Due to the success of the platform the data in the table is rapidly increasing. I have issues getting the query time to a manageable size. I have this example of a query that runs extremely slow. I guess some good indexing could do the job but I don't really know where to start. I would love some tips on how to help myself (any quality posts/books or other resources) or how to solve this problem. If somehow possible i would like to not make manual queries and just use the ORM. The * in the select i placed to make the query more readable.
SELECT *
FROM "customer_customer"
INNER JOIN "customer_transaction"
ON ("customer_customer"."id" = "customer_transaction"."customer_id")
WHERE ("customer_customer"."status" = 1
AND NOT ("customer_customer"."id" IN ( SELECT U1."customer_id" AS Col1
FROM "customer_transaction" U1
WHERE U1."transaction_type" IN (30, 14)
)
)
AND "customer_transaction"."date" >= '2018-05-11 11:01:43.598530+02:00')
As Asked in the comments here are additional infos:
Currently I am running the commands on my local computer. The Query is generated by the orm.
Create of the customer table:
--
-- PostgreSQL database dump
--
-- Dumped from database version 9.6.0
-- Dumped by pg_dump version 9.6.0
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
--
-- Name: customer_customer; Type: TABLE; Schema: public; Owner: postgres
--
CREATE TABLE customer_customer (
id integer NOT NULL,
firstname character varying(63),
phone character varying(31),
terms_accepted boolean NOT NULL,
user_id integer,
cashed_vip_points integer NOT NULL,
vip_points integer NOT NULL,
receive_mail boolean NOT NULL,
mailchimp_email character varying(254),
image character varying(100),
image_thumb character varying(100),
favorite_hash character varying(31),
has_accepted_favorite_hint boolean NOT NULL,
address_id integer,
blog_url character varying(200),
instagram_username character varying(200),
overrule_default_vip_points integer,
status integer NOT NULL,
signature boolean,
signature_date date,
store_id_id integer,
shopping_mail boolean NOT NULL,
CONSTRAINT customer_customer_overrule_default_vip_points_check CHECK ((overrule_default_vip_points >= 0))
);
ALTER TABLE customer_customer OWNER TO postgres;
--
-- Name: customer_customer_id_seq; Type: SEQUENCE; Schema: public; Owner: postgres
--
CREATE SEQUENCE customer_customer_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
ALTER TABLE customer_customer_id_seq OWNER TO postgres;
--
-- Name: customer_customer_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: postgres
--
ALTER SEQUENCE customer_customer_id_seq OWNED BY customer_customer.id;
--
-- Name: customer_customer id; Type: DEFAULT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_customer ALTER COLUMN id SET DEFAULT nextval('customer_customer_id_seq'::regclass);
--
-- Name: customer_customer customer_customer_address_id_key; Type: CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_customer
ADD CONSTRAINT customer_customer_address_id_key UNIQUE (address_id);
--
-- Name: customer_customer customer_customer_favorite_hash_key; Type: CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_customer
ADD CONSTRAINT customer_customer_favorite_hash_key UNIQUE (favorite_hash);
--
-- Name: customer_customer customer_customer_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_customer
ADD CONSTRAINT customer_customer_pkey PRIMARY KEY (id);
--
-- Name: customer_customer customer_customer_user_id_key; Type: CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_customer
ADD CONSTRAINT customer_customer_user_id_key UNIQUE (user_id);
--
-- Name: customer_customer_211f6852; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_customer_211f6852 ON customer_customer USING btree (store_id_id);
--
-- Name: customer_customer customer_custo_address_id_41aab9497590bc7_fk_address_address_id; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_customer
ADD CONSTRAINT customer_custo_address_id_41aab9497590bc7_fk_address_address_id FOREIGN KEY (address_id) REFERENCES address_address(id) DEFERRABLE INITIALLY DEFERRED;
--
-- Name: customer_customer customer_custom_store_id_id_67b8071f917b6245_fk_stores_store_id; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_customer
ADD CONSTRAINT customer_custom_store_id_id_67b8071f917b6245_fk_stores_store_id FOREIGN KEY (store_id_id) REFERENCES stores_store(id) DEFERRABLE INITIALLY DEFERRED;
--
-- Name: customer_customer customer_customer_user_id_482ced6557101913_fk_auth_user_id; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_customer
ADD CONSTRAINT customer_customer_user_id_482ced6557101913_fk_auth_user_id FOREIGN KEY (user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED;
--
-- PostgreSQL database dump complete
--
And of the transaction table:
--
-- PostgreSQL database dump
--
-- Dumped from database version 9.6.0
-- Dumped by pg_dump version 9.6.0
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
--
-- Name: customer_transaction; Type: TABLE; Schema: public; Owner: postgres
--
CREATE TABLE customer_transaction (
id integer NOT NULL,
points integer,
transaction_type integer NOT NULL,
customer_id integer NOT NULL,
date timestamp with time zone NOT NULL,
product_id integer,
fotostream_entry_id integer,
acommit_transaction_id character varying(36),
amount numeric(6,2),
has_storno_id integer,
merged_customernumber_id integer,
message_de character varying(255),
message_fr character varying(255),
points_befor_migration integer,
store_id integer,
storno_from_id integer,
user_id integer,
_transaction_type_messages_id integer,
aac_import_row character varying(5000)
);
ALTER TABLE customer_transaction OWNER TO postgres;
--
-- Name: customer_transaction_id_seq; Type: SEQUENCE; Schema: public; Owner: postgres
--
CREATE SEQUENCE customer_transaction_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
ALTER TABLE customer_transaction_id_seq OWNER TO postgres;
--
-- Name: customer_transaction_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: postgres
--
ALTER SEQUENCE customer_transaction_id_seq OWNED BY customer_transaction.id;
--
-- Name: customer_transaction id; Type: DEFAULT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_transaction ALTER COLUMN id SET DEFAULT nextval('customer_transaction_id_seq'::regclass);
--
-- Name: customer_transaction customer_transaction_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_transaction
ADD CONSTRAINT customer_transaction_pkey PRIMARY KEY (id);
--
-- Name: customer_transacti_acommit_transaction_id_71030b1b69b97709_like; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transacti_acommit_transaction_id_71030b1b69b97709_like ON customer_transaction USING btree (acommit_transaction_id varchar_pattern_ops);
--
-- Name: customer_transacti_acommit_transaction_id_71030b1b69b97709_uniq; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transacti_acommit_transaction_id_71030b1b69b97709_uniq ON customer_transaction USING btree (acommit_transaction_id);
--
-- Name: customer_transaction_7473547c; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transaction_7473547c ON customer_transaction USING btree (store_id);
--
-- Name: customer_transaction_928570bc; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transaction_928570bc ON customer_transaction USING btree (merged_customernumber_id);
--
-- Name: customer_transaction_9524d7ad; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transaction_9524d7ad ON customer_transaction USING btree (_transaction_type_messages_id);
--
-- Name: customer_transaction_9bea82de; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transaction_9bea82de ON customer_transaction USING btree (product_id);
--
-- Name: customer_transaction_b65a298f; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transaction_b65a298f ON customer_transaction USING btree (fotostream_entry_id);
--
-- Name: customer_transaction_cb24373b; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transaction_cb24373b ON customer_transaction USING btree (customer_id);
--
-- Name: customer_transaction_d9b62ea2; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transaction_d9b62ea2 ON customer_transaction USING btree (storno_from_id);
--
-- Name: customer_transaction_date_bd33b3ac; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transaction_date_bd33b3ac ON customer_transaction USING btree (date);
--
-- Name: customer_transaction_e8701ad4; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transaction_e8701ad4 ON customer_transaction USING btree (user_id);
--
-- Name: customer_transaction_f2c0da2f; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transaction_f2c0da2f ON customer_transaction USING btree (has_storno_id);
--
-- Name: customer_transaction_transaction_type_36582b63; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transaction_transaction_type_36582b63 ON customer_transaction USING btree (transaction_type);
--
-- Name: customer_transaction_transaction_type_custome_3619995d_idx; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transaction_transaction_type_custome_3619995d_idx ON customer_transaction USING btree (transaction_type, customer_id, date);
--
-- Name: customer_transaction_transaction_type_customer_id_3eb6f7d0_idx; Type: INDEX; Schema: public; Owner: postgres
--
CREATE INDEX customer_transaction_transaction_type_customer_id_3eb6f7d0_idx ON customer_transaction USING btree (transaction_type, customer_id);
--
-- Name: customer_transaction D4d691342aa107b3b4fb5a167936d123; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_transaction
ADD CONSTRAINT "D4d691342aa107b3b4fb5a167936d123" FOREIGN KEY (merged_customernumber_id) REFERENCES customer_customernumber(id) DEFERRABLE INITIALLY DEFERRED;
--
-- Name: customer_transaction D6e0c79ad7a40ca02054ed28c4d6999c; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_transaction
ADD CONSTRAINT "D6e0c79ad7a40ca02054ed28c4d6999c" FOREIGN KEY (_transaction_type_messages_id) REFERENCES customer_transactiontype(id) DEFERRABLE INITIALLY DEFERRED;
--
-- Name: customer_transaction D9460b882ac4401f6adf8077475229ed; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_transaction
ADD CONSTRAINT "D9460b882ac4401f6adf8077475229ed" FOREIGN KEY (fotostream_entry_id) REFERENCES fotostream_fotostreamentry(id) DEFERRABLE INITIALLY DEFERRED;
--
-- Name: customer_transaction cust_storno_from_id_6a48315f632674fa_fk_customer_transaction_id; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_transaction
ADD CONSTRAINT cust_storno_from_id_6a48315f632674fa_fk_customer_transaction_id FOREIGN KEY (storno_from_id) REFERENCES customer_transaction(id) DEFERRABLE INITIALLY DEFERRED;
--
-- Name: customer_transaction custo_has_storno_id_116b248645f7fd59_fk_customer_transaction_id; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_transaction
ADD CONSTRAINT custo_has_storno_id_116b248645f7fd59_fk_customer_transaction_id FOREIGN KEY (has_storno_id) REFERENCES customer_transaction(id) DEFERRABLE INITIALLY DEFERRED;
--
-- Name: customer_transaction customer_product_id_428b30409c797b6b_fk_lookbook_productbase_id; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_transaction
ADD CONSTRAINT customer_product_id_428b30409c797b6b_fk_lookbook_productbase_id FOREIGN KEY (product_id) REFERENCES lookbook_productbase(id) DEFERRABLE INITIALLY DEFERRED;
--
-- Name: customer_transaction customer_t_customer_id_7962b09af88fe147_fk_customer_customer_id; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_transaction
ADD CONSTRAINT customer_t_customer_id_7962b09af88fe147_fk_customer_customer_id FOREIGN KEY (customer_id) REFERENCES customer_customer(id) DEFERRABLE INITIALLY DEFERRED;
--
-- Name: customer_transaction customer_transacti_store_id_4014f4c86692b54b_fk_stores_store_id; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_transaction
ADD CONSTRAINT customer_transacti_store_id_4014f4c86692b54b_fk_stores_store_id FOREIGN KEY (store_id) REFERENCES stores_store(id) DEFERRABLE INITIALLY DEFERRED;
--
-- Name: customer_transaction customer_transaction_user_id_3497aca364c6a472_fk_auth_user_id; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--
ALTER TABLE ONLY customer_transaction
ADD CONSTRAINT customer_transaction_user_id_3497aca364c6a472_fk_auth_user_id FOREIGN KEY (user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED;
--
-- PostgreSQL database dump complete
--
And here is the execution plan:
Merge Join (cost=14112.60..11422372541.38 rows=505000 width=671) (actual time=1351.692..16713328.367 rows=146436 loops=1)
Merge Cond: (customer_customer.id = customer_transaction.customer_id)
Buffers: shared hit=1922626 read=720356, temp read=98686908 written=1364
-> Index Scan using customer_customer_pkey on customer_customer (cost=14096.99..11422109016.84 rows=81175 width=121) (actual time=1342.257..16649665.313 rows=35553 loops=1)
Filter: ((status = 1) AND (NOT (SubPlan 1)))
Rows Removed by Filter: 309213
Buffers: shared hit=156156 read=72783, temp read=98686908 written=1364
SubPlan 1
-> Materialize (cost=14096.57..78342.29 rows=805641 width=4) (actual time=0.007..52.406 rows=356853 loops=161642)
Buffers: shared hit=1667 read=25275, temp read=98686908 written=1364
-> Bitmap Heap Scan on customer_transaction u1 (cost=14096.57..71166.08 rows=805641 width=4) (actual time=147.297..485.822 rows=797943 loops=1)
Recheck Cond: (transaction_type = ANY ('{30,14}'::integer[]))
Heap Blocks: exact=24756
Buffers: shared hit=1667 read=25275
-> Bitmap Index Scan on customer_transaction_transaction_type_customer_id_3eb6f7d0_idx (cost=0.00..13895.16 rows=805641 width=0) (actual time=140.944..140.944 rows=797943 loops=1)
Index Cond: (transaction_type = ANY ('{30,14}'::integer[]))
Buffers: shared hit=1 read=2185
-> Index Scan using customer_transaction_cb24373b on customer_transaction (cost=0.43..252918.54 rows=2144835 width=550) (actual time=0.039..63012.608 rows=2143881 loops=1)
Filter: (date >= '2018-05-11 11:01:43.59853+02'::timestamp with time zone)
Rows Removed by Filter: 1048039
Buffers: shared hit=1766470 read=647573
Planning time: 16.013 ms
Execution time: 16713362.490 ms
Best Answer
Assuming you're after Customers and their Transactions where the Customer has no Type 14 or 30 Transactions ...
You'll want to consider indexes on the customer ID fields in both tables and, for the transactions table, perhaps the customer ID and transaction date.
Other things: