To be more precise, NOT FENCED routines run in the same process space as the database manager itself. The engine is written in C, so calling a not fenced routine is just like calling another C function from main(). This is where both the memory-corruption risk and the performance benefit come from: a not fenced routine has access to all the same resources -- memory, files, etc. -- as the database manager process itself.
For a FENCED routine the database manager launches a separate process (db2fmp), which in turn executes the routine code. As a result, operating system protection prevents a fenced routine from accessing any memory area or resource that belongs to the database manager.
SQL routines cannot be fenced, strictly speaking, because they don't "run" as native code, but they are even better than not fenced -- they are bytecode that the DB2 runtime engine itself executes, so there is no separate thread or process at all.
C and C++ routines can be fenced, in which case they execute in separate processes, or not fenced, in which case they are loaded into the database manager process space and called as functions.
Java routines can only run fenced, simply because they need a separate process -- the Java virtual machine -- to execute. If you declare them as NOT FENCED, the option is quietly ignored.
Having said all that, your only real choice between fenced and not fenced is with C/C++ routines. Typically you would run them fenced for safety, switching to not fenced only when you are reasonably sure they cannot harm the database manager and they need the extra performance1.
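To make the choice concrete, here is a sketch of the DDL involved (the function name, library name, and signature are invented for illustration, and clause defaults vary by DB2 version, so check the CREATE FUNCTION reference before relying on this):

-- A hypothetical C UDF, registered fenced (the safe default).
-- THREADSAFE is declared up front because not fenced routines must be threadsafe:
CREATE FUNCTION scrub_text(VARCHAR(255))
RETURNS VARCHAR(255)
LANGUAGE C
PARAMETER STYLE SQL
EXTERNAL NAME 'myudflib!scrub_text'
NO SQL
DETERMINISTIC
FENCED THREADSAFE;

-- Once you trust the code and need the speed,
-- it can be switched in place:
ALTER FUNCTION scrub_text(VARCHAR(255)) NOT FENCED;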
1 - The performance difference between fenced and not fenced routines comes from the overhead of allocating a fenced process, as well as from the interprocess communication between the database manager and the fenced routine. Even then, a fenced process is not created every time a fenced routine is called; a pool of them is created and reused for such calls. All this means that you might see a benefit from declaring a routine as not fenced only if that routine is called very frequently (as in, tens of times per second or more).
Right, I've found the answer to my own question, and thought I'd post it here in case anyone else ever has the same basic question:
Given that a cluster shards the data across numerous nodes, there is no real guarantee that an AUTO_INCREMENT
primary key will reflect the order in which records were added, especially when dealing with multiple-master replication.
Details can be found here
In essence, the answer to my question boils down to this:
Each node is given a range of available increment values for the next N inserts. However, you have no real say in which node will insert the next row, and thus you can't be absolutely sure that the next row to be inserted will receive MAX(pk_id_val) + 1
as its PK value.
A simple diagram to clarify:
+------------+ +------------+
| Node 1 | <==> | Node 2 |
+------------+ +------------+
| PK 1-3 | | PK 4-6 | (Range of available increment vals)
+------------+ +------------+
INSERT INTO cluster_tbl (field) VALUES ('test');
INSERT INTO cluster_tbl (field) VALUES ('test');
INSERT INTO cluster_tbl (field) VALUES ('test');
If all three insert queries are sent to Node 1, then all is fine and dandy, and the PK's will be 1, 2 and 3 respectively. However, if the first query is sent to node 1, the second to Node 2, and the third back to node 1 again, then the PK's will be 1, 4 and 2 respectively.
In which case a SELECT * FROM cluster_tbl ORDER BY pk_field DESC LIMIT 0,1;
query will serve up the second row that was inserted, not the third.
A possible solution: add an indexed timestamp column that records the insertion time, defined as follows:
`insert` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
And order the query by descending timestamp values:
SELECT *
FROM cluster_tbl
ORDER BY `insert` DESC
LIMIT 0,1;
That, for me at least, did the trick.
Best Answer
We've been doing this for almost five years, and we think that explicitly testing modifications is definitely doable, but quite slow. Besides, we cannot easily run such tests concurrently from several connections unless we use separate databases. Instead, we test modifications implicitly -- we use them to build up at least some of the test data, and we verify that our selects return the expected results.
I've written an article entitled Close Those Loopholes: Lessons learned from Unit Testing T-SQL, as well as some blog posts.
Regarding your question "Is there a threshold in complexity where it gets completely hopeless?": complex modules need tests much more than simple ones.
To simplify maintenance, we generate expected results, and we store them in separate files - that makes a huge difference.
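For a rough idea of the shape this takes (the table, procedure, and column names below are invented; the comparison relies on EXCEPT, available in SQL Server 2005 and later):

-- Expected results, generated once and stored in a separate file,
-- then loaded into a table for the comparison:
CREATE TABLE #Expected (CustomerId INT, Total MONEY);
INSERT INTO #Expected VALUES (1, 120.00);
INSERT INTO #Expected VALUES (2, 75.50);

-- The modifications under test build up the data implicitly...
EXEC dbo.AddOrder @CustomerId = 1, @Amount = 120.00;
EXEC dbo.AddOrder @CustomerId = 2, @Amount = 75.50;

-- ...and the select is what we actually verify:
-- both set differences must come back empty.
SELECT CustomerId, Total FROM dbo.CustomerTotals
EXCEPT
SELECT CustomerId, Total FROM #Expected;

SELECT CustomerId, Total FROM #Expected
EXCEPT
SELECT CustomerId, Total FROM dbo.CustomerTotals;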