I'm dealing with a HUGE amount of data entry. I need to know the best practice to insert/update table records after new entry inserted in another table.
By huge i mean we get more than 2 million rows a day, it's an ongoing process.
I tried creating a trigger that will insert or update the data depending on a complex logic. But I doubt this is a good approach for doing this, I'm afraid it will be heavy on the database.
I use PostgreSQL 9.1
The current trigger looks like:
DECLARE
AVar integer;
AdateVar timestamp without time zone;
AnameVar character varying(40);
BEGIN
SELECT id
INTO AVar
FROM table1
WHERE ST_DWithin(NEW.position,ST_SetSRID(ST_MakePoint(longitudedecimal,latitudedecimal),4326) ,0.01447534783);
select Adate
INTO AdateVar
from table2
where id = NEW. id
ORDER BY Adate DESC limit 1;
IF (aVar > 0) THEN
select name into AnameVar FROM table1 WHERE id = AVar;
INSERT into table2 (id,name,date) SELECT NEW.id,AnameVar,NEW.timestamp;
ELSE ...
END IF;
RETURN NULL;
End;
EDIT
here is a piece of the trigger header and function, the function is too long. coz as i know the trigger cannot call more than one function! so it became very long and complicated, I posted just one case out of 12:
CREATE TRIGGER ts_trigger
AFTER INSERT
ON table1
FOR EACH ROW
EXECUTE PROCEDURE test_trigger();
declare
Aposition geometry;
Cposition geometry;
Plong double precision;
Plat double precision;
Adate timestamp without time zone;
Cdate timestamp without time zone;
startDate timestamp without time zone;
CnotifDate timestamp without time zone;
AnotifDate timestamp without time zone;
lastmsg timestamp without time zone;
InsideCircle integer;
InsideSquare integer;
insidePoint integer;
distance character varying(40);
-- this variables used to calculate the time in the table3
inAction character varying(40);
diff character varying(40);
days character varying(40);
hours character varying(40);
str character varying(40);
CinAction character varying(40);
AinAction character varying(40);
BEGIN
select time_stamp INTO Adate from table1 where userid=NEW.userid and time_stamp < NEW.time_stamp order by time_stamp desc limit 1 ;
select position INTO Aposition from table1 where userid=NEW.userid and time_stamp < NEW.time_stamp order by time_stamp desc limit 1;
select time_stamp INTO Cdate from table1 where userid=NEW.userid and time_stamp > NEW.time_stamp order by time_stamp limit 1;
select position INTO Cposition from table1 where userid=NEW.userid and time_stamp > NEW.time_stamp order by time_stamp limit 1;
SELECT p.num INTO InsideCircle FROM table3 p WHERE ST_DWithin(Aposition,ST_SetSRID(ST_MakePoint(p.longitudedecimal,p.latitudedecimal ), 4326) ,0.02171302174) ORDER BY ST_Distance(ST_SetSRID(ST_MakePoint(p.longitudedecimal,p.latitudedecimal ), 4326),Aposition) limit 1;
SELECT p.num INTO InsideSquare FROM table3 p WHERE ST_DWithin(NEW.position ,ST_SetSRID(ST_MakePoint(p.longitudedecimal,p.latitudedecimal ), 4326) ,0.02171302174) ORDER BY ST_Distance(ST_SetSRID(ST_MakePoint(p.longitudedecimal,p.latitudedecimal ), 4326),NEW.position) limit 1;
SELECT p.num INTO insidePoint FROM table3 p WHERE ST_DWithin(Cposition,ST_SetSRID(ST_MakePoint(p.longitudedecimal,p.latitudedecimal ), 4326) ,0.02171302174) ORDER BY ST_Distance(ST_SetSRID(ST_MakePoint(p.longitudedecimal,p.latitudedecimal ), 4326),Cposition) limit 1;
-
IF (InsideCircle >0 and (InsideCircle =InsideSquare or InsideSquare is null))THEN
select startDate INTO startDate from myTable where id=NEW.userid and num = InsideCircle and startDate =Adate;
IF (InsideSquare >0)then
if (Cdate is not null )then
if (insidePoint is null)THEN
diff = NEW.time_stamp -startDate;
str= split_part(diff,' ',2);
IF(str = '')then
hours= split_part(diff,':',1);
days = '0';
ELSE
str= split_part(diff,' ',3);
IF(str = '') then
hours ='00';
days = split_part(diff,' ',1);
ELSE
hours= split_part( split_part(diff,' ',3),':',1);
days = split_part(diff,' ',1);
END IF;
END IF;
inAction = days || ',' ||hours;
Update myTable SET notifDate = NEW.time_stamp , time_inAction=inAction WHERE id=NEW.userid and num =InsideSquare ;
END IF;
END IF;
Return Null;
END;
Best Answer
The fragment you posted so far can be simplified to:
It is very inefficient to run separate queries with assignments in plpgsql instead of a single query.
Basic type names like
date
ortimestamp
lead to confusing error messages and other conflicts,id
andname
are the worst possible column names, non-descriptive and with countless duplicates all over your tables. Revisit your naming convention ...You probably do not need any of this. Inserting millions of rows shouldn't be handled by triggers which are fired for each row. Extremely expensive. You need a set-based solution without triggers.
Probably best to
COPY
to a tmeporary staging table andINSERT
/UPDATE
from there. But basic information is missing.Related answers:
Optimizing bulk update performance in PostgreSQL
How to update selected rows with values from a CSV file in Postgres?