I think you're asking how to implement a solution you've already decided on for a more general problem you don't describe. If you were to outline the actual problem this is supposed to solve, you might get better suggestions about how to solve it.
Working within the very limited information provided:
Update: I found your other question, which you really should've linked to. You seem to be trying to roll your own message queue. Don't do that. Read these:
Have I convinced you that you shouldn't try to do this yourself yet? Look into:
Some of what you want isn't available in current PostgreSQL versions. For example:
- INSERTs should not do any query in that table or any kind of unique index. INSERTs shall just locate the best page for the main file/main btree for this table and just insert the row in between two other rows, ordered by ID.
That'd require an index-organized table, which PostgreSQL doesn't have yet. The closest you'll get would be a one-column table with a `PRIMARY KEY`. With regular `VACUUM` on PostgreSQL 9.2 you'd be able to use index-only scans to access it most of the time.
As for allowing duplicates: you don't really seem to want to permit them at all; you're just saying you want to work around concurrency issues by temporarily permitting them.
You can remove such duplicates during `INSERT` so the table itself doesn't need to permit them. However, that'll cause issues with:
- INSERTs will happen in bulk (about 1000 per transaction) and must not fail, except for disk full, etc. There must not be any chance of deadlocks.
... assuming that those inserts occur concurrently from multiple transactions. You'll have races between the checks for existence and the insert that can cause insert batches to fail and have to be re-tried.
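A minimal sketch of that racy check-then-insert pattern, using a hypothetical one-column table named `queue_ids` (the `VALUES` list stands in for a 1000-row batch):

```sql
-- Hypothetical table name; the VALUES list stands in for a real batch.
INSERT INTO queue_ids (id)
SELECT v.id
FROM (VALUES (1), (2), (3)) AS v(id)
WHERE NOT EXISTS (
    SELECT 1 FROM queue_ids q WHERE q.id = v.id
);
-- Two sessions running this concurrently race between the existence
-- check and the insert: with a unique constraint, the losing batch
-- fails and must be retried; without one, duplicates slip through.
```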
I suspect that your best bet is to have a one-column table without a `PRIMARY KEY`. Just create an ordinary b-tree index on it, and leave the table without a `PRIMARY KEY`. Since it genuinely has no primary key (the only column may have duplicates) this is entirely reasonable.
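A sketch of that layout, with a hypothetical table name:

```sql
CREATE TABLE queue_ids (
    id bigint NOT NULL
);

-- An ordinary (non-unique) b-tree index: supports fast ordered lookups
-- without rejecting duplicate ids the way a PRIMARY KEY would.
CREATE INDEX queue_ids_id_idx ON queue_ids (id);
```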
(BTW, given that SQL is supposedly all about sets, it astounds me how awful it is at "add this entry to the set if not already present").
That's a great question.
And there are good answers.
The engine will definitely use the index even if your query doesn't reference every key column.
That's especially so when the columns you do use form a leftmost prefix of the key, as in your case.
(Can anyone else speak to queries that use the key columns in a different order?)
You will benefit just fine from filtering on the first key column alone, or on several of them.
What will make a difference - for any index - is staying inside the INCLUDEd columns.
No matter how many key columns you use in your WHERE clause, the performance hit of having to go back to the primary key for additional columns can be huge, as it roughly doubles the "operations".
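For illustration, a covering index of the kind described above might look like this (T-SQL syntax; the table and column names are made up). The key columns are what the engine seeks on; the INCLUDEd columns just ride along at the leaf level, so no lookup back to the primary key is needed:

```sql
CREATE NONCLUSTERED INDEX ix_orders_lookup
ON dbo.orders (customer_id, order_date)      -- key columns, used for seeks
INCLUDE (status, total_amount, ship_date);   -- ride-along leaf-level columns
```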
When it comes to weighing performance against size, you have the same trade-off as with any index.
Since you know you want the same columns returned in all cases, if you are READ focused you will probably want the single index with all 6 columns, INCLUDE-ing everything.
It will certainly save you db size compared to making both indexes.
On WRITE, you obviously have a bigger burden with a larger index; that is a significant additional amount of sorting.
If you insert just one row at a time, it may hardly matter at all.
If you do bulk inserts, you'll definitely want to test the two indexes to see the write performance for your actual inserts.
Best Answer
Indexes have more than one use. The primary use (in most cases) is to quickly identify the rows that meet a query's conditions. When doing this, the leftmost columns of the index are considered.
Say you have a table, `users`, with 100 columns, a mix of dates, strings, and numbers. Five of the columns in the table are: `id`, `username`, `country`, `acct_expir_date`, and `access_level` (an int). `id` is the primary key; both it and `username` are unique.

Let's say there's a (non-clustered) index on `country` and `username`. A query filtering on `country` could use this index to identify all users from that country, locate their records in the actual table, and return the requested data.
However, a query that filters only on `username` would not be able to use the index: the leftmost column in the index, `country`, is not part of the `WHERE` clause.

That said, indexes do have a second use. If all the columns in a query exist in an index, then the query can ignore the actual table, and use the index as if it were the table itself. This is called a covering index.
Let's assume we now have another non-clustered index on our table, on the columns `acct_expir_date`, `username`, `id`, and `access_level`, in that order, and a query that filters on `acct_expir_date` and selects just those four columns.

This query would probably use our index even if the `SELECT` column list were `*` (all columns); however, since our index contains all the columns in the query, the `users` table itself will not be touched: the query will simply use the information in the index to provide the requested data.

So, in your case, the columns in the index up to the primary key are there for the index's first use, locating records. The remaining columns may be there for the second use: providing a covering index.
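A hypothetical reconstruction of that index and a query it would cover (the index name and the literal date are assumptions):

```sql
CREATE INDEX ix_users_expiry
ON users (acct_expir_date, username, id, access_level);

-- Every column the query touches exists in the index, so the engine can
-- answer it from the index alone without visiting the users table.
SELECT acct_expir_date, username, id, access_level
FROM users
WHERE acct_expir_date < '2020-01-01';
```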
NOTE: As stated by Balazs Papp in the comments, columns to the right of the primary key in the index can be used to help identify specific rows when a range of primary key values is of interest; however, each individual row in the index would have to be checked, since the rows cannot be sorted on anything to the right of the primary key (it being unique). Because the index is already in memory, narrowing down the actual selection of rows further via the index would probably still be faster than loading the full records and then narrowing things down. However, if the extra columns are going to be used in searches regularly, they might be more useful to the left of any unique columns, where they can narrow things down more quickly.
UPDATE: Tables that can be loaded into memory in their entirety don't generally need covering indexes. If your table is not all that large, then any performance gain from the indexes may be lost in maintaining and storing them. Consider the overall size of the table, the size of the table's rows vs. the index's rows, and how often queries that might use the index run vs. how often the data in those columns is added or updated.

If the indexes seem like they might be causing more harm than good, you may want to consider dropping them. However, do so with great care; if there is some critical query whose performance they're boosting, removing them could cause significant issues. I would remove no more than one per month or so; over the course of a month of normal business, whatever use the index might have should be encountered. And only do so with the knowledge of key people in your department: not just your boss, but someone who deals with reports of performance issues, so any problems that arise can be resolved.

Also, keep in mind how long it would take to recreate the index; there may actually be cases where creating the index, using it for one monthly report, then dropping it would still give a performance benefit, without the maintenance and (long-term) storage costs.
UPDATE 2: For the sake of (improved) completeness, as joanolo mentions in the comments, indexes also have a third general use: they keep the rows sorted by the indexed columns. If that sort order matches your `ORDER BY` clause, the engine can avoid having to sort the records, by retrieving them in the necessary order to begin with. This isn't particularly relevant to the OP's question, since a unique column means that any values to its right in the index aren't being sorted (the position of the rows is fixed once the index reaches a unique column), but it is true.
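As a sketch of that third use (reusing the hypothetical `acct_expir_date` index from the example above): because the index is already ordered on its leading column, the engine can satisfy the `ORDER BY` by scanning the index in order rather than performing a separate sort:

```sql
SELECT acct_expir_date, username
FROM users
ORDER BY acct_expir_date;  -- matches the index's leading column, so no
                           -- explicit sort step is needed
```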