Assuming your RDBMS supports windowing functions, something like this will give you what you are asking for:
select *
from( select dataentry.*,
row_number() over (partition by fname, mname, lname order by title) n
from dataentry )
where n=1;
But note that I have arbitrarily chosen to order by title
when discarding 'duplicates' - and even more arbitrarily chosen to overlook that if two matching rows also have matching title
s the database will order them in an undefined and possibly unpredictable way. You will need to adjust that to meet your requirements.
There are a few different strategies you could take, where on one end you pursue aggressive normalization to, on the other end, full denormalization. The full denormalization would be equivalent to your second example where all relevant info simply ends up in the transaction table without references to other tables.
Full Normalization
So, to completely normalize, you would still want a Categories
table, but you want to even eliminate the storage of redundant information in this table, so you would need a CategoryList
table and a SubCategoryList
table as
CategoryList
id category
----------------
1 Food
2 Household
etc...
and
SubCategoryList
id subcategory
----------------
1 Work Lunch
2 Fast Food
3 Grocery Store
4 Mortgage
5 Repairs
etc...
You could then construct your category table from these two tables as
Categories
id category_id subcategory_id
----------------------------------
1 1 1
2 1 2
3 1 3
4 2 4
5 2 5
Treatment of NULL
subcategories can easily be handled by either 1) simply placing a NULL entry for the subcategory_id
column in the appropriate row of the Categories
table, or 2) adding a subcategory entry id, subcategory
where the subcategory field is NULL
.
Last but not least you would add a foreigh key reference from your Transactions
table to the appropriate id
in the Categories
table.
Does it really need to be so normalized?
Well, in my opinion, no it doesn't. I've heard a quote, though I can't remember who spoke it, but it basically goes "Normalize until it hurts, denormailize until it works." Especially in the case where you don't have a lot of categories, the fully normalized design may be a little bit of overkill.
What might simply make more sense would be to keep the above mentioned CategoryList
and SubCategoryList
tables to enumerate your types, but skip making the separate Categories
table, and then simply have your Transactions
table referencing the CategoryList
and SubCategoryList
tables as
id txdate amount category subcategory account ...
--------------------------------------------------------------
1 6/25/15 15.25 1 2 cash ...
This way, you save on storage, and you can easily update/modify any category or subcategory entry in the list without needing to modify your entire Transactions
table. Further, you can simply permit the subcategory
column of the Transactions
table to permit NULL
entries, if need be.
Hope this helps!
Best Answer
You can accomplish that using
ROW_NUMBER()
.Take a look at the example below:
But I'd prefer doing that in the application layer.