First, let me address your concern about data redundancy. I agree that the second schema is more likely to reduce redundancy, and it is probably closer to what I would go with. One thing to be aware of, though, is that these will be unique search terms: unless you have some way of normalizing the data, misspellings will get through and be treated as different search terms.
Also, I would switch the way you link the search term to the date searched, so that DateofSearchTerm links to SearchTerm (not the other way around).
CREATE TABLE SearchTerm (
ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
SearchTerm VARCHAR( 255 ) NOT NULL,
Hit INTEGER NOT NULL
);
CREATE TABLE DateofSearchTerm (
DateID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
searchID INTEGER NOT NULL,
Date DATE NOT NULL
);
It would also be a good idea to make searchID a foreign key that references SearchTerm.ID, to ensure there are no orphan rows (rows in DateofSearchTerm that do not have a matching parent in SearchTerm).
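As a sketch of what that foreign key buys you, here is the same schema in SQLite (driven from Python's stdlib `sqlite3`, to match the AUTOINCREMENT syntax above); note that SQLite only enforces foreign keys once `PRAGMA foreign_keys` is turned on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.execute("""
    CREATE TABLE SearchTerm (
        ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        SearchTerm VARCHAR( 255 ) NOT NULL,
        Hit INTEGER NOT NULL
    )""")
conn.execute("""
    CREATE TABLE DateofSearchTerm (
        DateID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        searchID INTEGER NOT NULL REFERENCES SearchTerm(ID),
        Date DATE NOT NULL
    )""")

conn.execute("INSERT INTO SearchTerm (SearchTerm, Hit) VALUES ('sqlite', 1)")
conn.execute("INSERT INTO DateofSearchTerm (searchID, Date) VALUES (1, '2024-01-01')")

# An orphan row (searchID with no matching parent) is now rejected:
try:
    conn.execute("INSERT INTO DateofSearchTerm (searchID, Date) VALUES (99, '2024-01-02')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
print(orphan_allowed)
```

The exact syntax for declaring the constraint differs between engines, but the effect is the same: the database, not the application, guarantees referential integrity.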
As for your anecdotal story about your friend, the biggest thing I see you missing in your designs are indexes. A table 3-million strong should have no problem running quickly with proper indexes.
Using your design, you would want an index on the SearchTerm.SearchTerm column, and on DateofSearchTerm.Date.
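To illustrate (in SQLite, via Python's `sqlite3`; index names here are my own), creating those two indexes is a one-liner each, and `EXPLAIN QUERY PLAN` confirms that a lookup by term now uses the index instead of scanning the whole table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SearchTerm (
        ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        SearchTerm VARCHAR( 255 ) NOT NULL,
        Hit INTEGER NOT NULL
    );
    CREATE TABLE DateofSearchTerm (
        DateID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        searchID INTEGER NOT NULL,
        Date DATE NOT NULL
    );
    CREATE INDEX idx_searchterm_term ON SearchTerm (SearchTerm);
    CREATE INDEX idx_date_date ON DateofSearchTerm (Date);
""")

# The query plan shows a SEARCH using the index, not a full-table SCAN.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM SearchTerm WHERE SearchTerm = 'foo'"
).fetchall()
print(plan)
```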
I think if you follow the advice to make searchID a foreign key, that will also become a proper index. I don't use SQL Server, so I don't know whether foreign keys there automatically get an index; make sure one exists.
Indeed, the proposed single-row accounting schema allows you to do proper double-entry accounting (always specifying the account debited and the account credited) without introducing redundancy in the "amount" data.
The one-row schema gives you an implementation of double entry that balances by construction, so that it is impossible to ever "lose balance". The machine can recompute the ledgers on the fly.
You have to do two SELECTs instead of one to retrieve a ledger.
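A minimal sketch of what that looks like (table and column names here are illustrative, not from the original question): each transaction row carries one amount plus the debit and credit accounts, and an account's ledger is assembled from one SELECT on each side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Txn (
        ID INTEGER PRIMARY KEY,
        Date DATE NOT NULL,
        DebitAccount TEXT NOT NULL,
        CreditAccount TEXT NOT NULL,
        Amount NUMERIC NOT NULL,
        Description TEXT
    );
    INSERT INTO Txn VALUES
        (1, '2024-01-01', 'Cash', 'Sales', 100, 'sale'),
        (2, '2024-01-02', 'Rent', 'Cash',   40, 'rent');
""")

account = 'Cash'
# Two SELECTs: one for the debit side of the ledger, one for the credit side.
debits  = conn.execute("SELECT Date, Amount FROM Txn WHERE DebitAccount = ?",  (account,)).fetchall()
credits = conn.execute("SELECT Date, Amount FROM Txn WHERE CreditAccount = ?", (account,)).fetchall()
balance = sum(a for _, a in debits) - sum(a for _, a in credits)
print(balance)
```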
Note that, besides transaction splits, there are other transactions, such as foreign exchange, that might end up as 2 records instead of 4. It all depends on whether you would denormalize a bit and just enter 4 transactions with similar descriptions.
You may want to prevent the modification of any entered transaction to maintain an audit trail; this is required if you want to be able to audit the transaction log.
It appears that in the thread above, for the CPA, "fully normalized" means following rules recognized by all accountants, whereas for the programmer it has a different meaning: that no derived or redundant data is stored.
All there is to accounting data is a set of transactions giving an amount and the accounts from which and to which it flows, along with a date and some description (and other attachments). Ledgers and balances are simple views derived from this transactional data by doing sums.
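Under the same illustrative schema as above, those derived balances can literally be a view that sums the signed flows per account; because every transaction debits and credits the same amount, the balances sum to zero by construction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Txn (
        ID INTEGER PRIMARY KEY,
        Date DATE NOT NULL,
        DebitAccount TEXT NOT NULL,
        CreditAccount TEXT NOT NULL,
        Amount NUMERIC NOT NULL
    );
    INSERT INTO Txn VALUES
        (1, '2024-01-01', 'Cash', 'Sales', 100),
        (2, '2024-01-02', 'Rent', 'Cash',   40);

    -- Balances are a view over the transaction log; nothing redundant is stored.
    CREATE VIEW Balance AS
        SELECT Account, SUM(Signed) AS Balance FROM (
            SELECT DebitAccount  AS Account,  Amount AS Signed FROM Txn
            UNION ALL
            SELECT CreditAccount AS Account, -Amount AS Signed FROM Txn
        ) AS flows
        GROUP BY Account;
""")

balances = dict(conn.execute("SELECT Account, Balance FROM Balance").fetchall())
print(balances)
```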
It depends on the amount of data and how it will be used. You can have one table with a flag for intraday/COB datasets, then queries that filter on those flags. After that you'll probably use EXCEPT to get unprocessed transactions. Then you need to keep this data for 2-5-7 years. There may be more requirements. It is OK if you have enough capacity and the only use for this table is one-off reconciliation and then storing data. However, if you need to run reports on COB data, join it to dimensions, do more reconciliation, enrich it with more attributes, or send it to downstream systems, then you would go for a separate table.