I am working on an API that needs to process a large dataset. The query that selects the records is complex and takes time to complete. I cannot paginate using ROW_NUMBER because not all records returned by the query will be marked closed; the same records (potentially all of them, but not necessarily) may be reselected on the next run.
I am thinking of using a global temporary table as the working set. Below is my approach:
- API will first create the global temporary table.
- API will execute the query and populate the temp table.
- API will take data in chunks and process it.
- API will drop the table after processing all records.
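In SQL Server terms, the four steps above could be sketched like this (the table and column names are illustrative assumptions, not my real schema):

```sql
-- 1. Create the working set (## = global temporary table in SQL Server).
CREATE TABLE ##WorkingSet (RecordId INT PRIMARY KEY, Payload NVARCHAR(MAX));

-- 2. Populate it once with the expensive query.
INSERT INTO ##WorkingSet (RecordId, Payload)
SELECT r.RecordId, r.Payload
FROM dbo.Records AS r
WHERE r.IsClosed = 0;           -- stand-in for the complex selection criteria

-- 3. Take data in chunks from ##WorkingSet and process it (loop not shown).

-- 4. Drop the table after processing all records.
DROP TABLE ##WorkingSet;
```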
The API can be scheduled to run at an interval of 5 or 10 minutes. There will not be concurrent instances running; only one instance will run at a time.
Do you see any issues with using a global temporary table here? Would a permanent table be more appropriate in this scenario?
Best Answer
Using a permanent table (as the commenter suggests) makes it a lot easier to track dependencies. Instead of recreating the table every time, just `TRUNCATE` it. When you implement a pattern like this, you have to make sure you are making progress and not selecting the same records over and over again. One way to do this is to process records in the order of the primary key on the input table. Here is the pseudocode for doing this:
Of course, the above only works if the `Key` column is ascending. If it isn't, you have to maintain a column in the Input table that tracks whether work has been done or not (and then SELECT TOP the ones where it hasn't).
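For the non-ascending case, the processed-flag variant might look like the following sketch; the `Processed` bit column and `dbo.ProcessKeys` procedure are assumptions you would adapt to your schema:

```sql
-- Mark-and-sweep variant: claim a batch of unprocessed rows, flag them,
-- and remember which keys were claimed so they can be processed.
DECLARE @BatchSize INT = 1000;
CREATE TABLE #Picked ([Key] INT PRIMARY KEY);

WHILE 1 = 1
BEGIN
    TRUNCATE TABLE #Picked;

    -- Flag a batch and capture the affected keys in one statement.
    UPDATE t
    SET t.Processed = 1
    OUTPUT inserted.[Key] INTO #Picked ([Key])
    FROM (SELECT TOP (@BatchSize) [Key], Processed
          FROM dbo.Input
          WHERE Processed = 0) AS t;

    IF @@ROWCOUNT = 0 BREAK;    -- every row has been handled

    EXEC dbo.ProcessKeys;       -- hypothetical: processes the keys in #Picked
END
```

Flagging and capturing the keys in a single `UPDATE ... OUTPUT` statement keeps the claim atomic, which matters if a second instance is ever allowed to run.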