If you have an index on `LastLoggedIn`, deleting a few tens of thousands of records should be fast without needing to partition. I ran a small test on MS SQL:
```sql
CREATE TABLE Temp.Temp
(
    UserID INT NOT NULL PRIMARY KEY,
    LastLoggedIn DATETIME2(0) NOT NULL INDEX IX_LastLoggedIn,
    Dummy1 VARCHAR(30),
    Dummy2 INT
);
```
I inserted ~57 M dummy records, then selected a `LastLoggedIn` date such that 35,000 records were at least that old. Deleting those records took less than a second. This was on a four-core machine with 128 GB of memory, under light load; your mileage may vary.
Of course, if you run this purge only every few weeks, you may have millions of records to delete rather than thousands (300 K records took seven seconds for me; 1.3 M records took 30 seconds). But in that scenario, it's part of your monthly maintenance window rather than daily upkeep, and a brief delay should be acceptable.
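For the multi-million-row case, a common mitigation is to delete in batches so no single transaction holds locks for long. A minimal T-SQL sketch (the batch size of 50,000 and the six-month cutoff are arbitrary illustrative choices, not values from my test):

```sql
-- Hypothetical batched purge: delete old rows in chunks of 50,000
-- so each transaction stays short and lock contention stays low.
DECLARE @cutoff DATETIME2(0) = DATEADD(MONTH, -6, SYSUTCDATETIME());

WHILE 1 = 1
BEGIN
    DELETE TOP (50000) FROM Temp.Temp
    WHERE LastLoggedIn < @cutoff;

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to purge
END;
```

Each iteration commits independently, so concurrent readers and writers are blocked only briefly between batches.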
If you get poor performance with just an index, then sure, consider partitioning. As others have pointed out, there will be a significant cost to `UPDATE` operations, a greater overhead than maintaining an index. You will have many more updates than purges, and the latter can happen during relatively idle periods, so I would worry about the updates more.
If you need always-on uptime, and the delete locks the table for an unacceptable length of time, that might justify using partitioning, but even there I'd look for alternatives. You could `SELECT` a list of old users and then delete them one at a time (confirming their `LastLoggedIn` dates first, of course) with a cursor, for example.
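A sketch of that cursor approach in T-SQL (the cutoff value and cursor name are assumptions for illustration):

```sql
-- Collect candidate users, then delete them one at a time,
-- re-checking LastLoggedIn in case a user logged in between
-- the SELECT and the DELETE.
DECLARE @cutoff DATETIME2(0) = DATEADD(MONTH, -6, SYSUTCDATETIME());
DECLARE @UserID INT;

DECLARE old_users CURSOR LOCAL FAST_FORWARD FOR
    SELECT UserID FROM Temp.Temp WHERE LastLoggedIn < @cutoff;

OPEN old_users;
FETCH NEXT FROM old_users INTO @UserID;
WHILE @@FETCH_STATUS = 0
BEGIN
    DELETE FROM Temp.Temp
    WHERE UserID = @UserID
      AND LastLoggedIn < @cutoff;  -- confirm the date again

    FETCH NEXT FROM old_users INTO @UserID;
END;
CLOSE old_users;
DEALLOCATE old_users;
```

This trades total throughput for minimal lock duration: each single-row delete is essentially instantaneous from a blocking standpoint.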
I'm an MS SQL guy, so my apologies if any of this is incorrect for PostgreSQL.
What I would do is create a table with the stock_id (which can be the alphanumeric code or an integer), the timestamp of the measurement, and the current value. That is your entry data: three columns.
From that point you can add columns for calculations (the difference, absolute or percent) against the previous value. Having everything in the same table will simplify the model and ease your queries. Try to create a date (not timestamp) column and partition the table by it. This can noticeably reduce the amount of data scanned, as long as you filter on that column in your queries.
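A rough sketch of that layout in PostgreSQL (all table and column names here are illustrative assumptions; note that in PostgreSQL a primary key on a partitioned table must include the partition column):

```sql
-- Three entry columns plus derived columns and a date partition key.
CREATE TABLE stock_measurement (
    stock_id      INTEGER       NOT NULL,
    measured_at   TIMESTAMP     NOT NULL,
    current_value NUMERIC(12,4) NOT NULL,
    measured_on   DATE          NOT NULL,  -- partition key, set from measured_at
    diff_absolute NUMERIC(12,4),           -- vs. previous value
    diff_percent  NUMERIC(8,4),
    PRIMARY KEY (stock_id, measured_at, measured_on)
) PARTITION BY RANGE (measured_on);

-- One partition per month, created as needed:
CREATE TABLE stock_measurement_2024_01
    PARTITION OF stock_measurement
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```

Queries that include a predicate on measured_on can then be pruned to the relevant partitions.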
Best Answer
Specific answers
First of all, here is my personal approach to your specific questions:
No, I do not recommend proceeding that way. Since you have the chance to make use of a relational database management system (RDBMS), I personally consider that it would be highly convenient and reasonable to design and implement your database in a relational fashion. I am convinced that there is no need to use a format as limited as CSV when one can take advantage of the robustness of relational tools. If you opt for the CSV method, you would be adding unnecessary complexity to your project.
Pros. CSV format has its use in certain contexts, e.g., enabling information interchange between applications that handle data in incompatible formats. Another feature of CSV is that it is easily modified by spreadsheet applications such as MS Excel or LibreOffice Calc. Still, given that you are going to use an RDBMS, I do not think either of these facts is relevant in terms of database design or storage on a relational platform. If you come across the need to deliver some of your data in CSV format in the future, you can export your structured data to a comma-delimited file without undermining your database's soundness.
Cons. As you may already know, by storing CSV data in a single column, you would be breaking First Normal Form. This way, having unnormalized data (nonsimple or multivalued columns), you lose many of the essential technical advantages provided by RDBMSs, which are firmly grounded in relational theory. Since many people have rightly enumerated the abundant problems that arise from storing CSV data in an RDBMS, I will only mention that you would be putting data integrity at risk (widely considered one of the most important properties of a database), the performance of your system would be substantially diminished, and you would end up re-implementing in your own application many of the capabilities that RDBMSs offer natively, which have been vastly optimized over the years.
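To make the performance point concrete, consider this hypothetical contrast (the table and column names are invented for illustration): finding users interested in a given gender from a CSV column cannot use an index, while the normalized form is a plain indexable lookup.

```sql
-- Anti-pattern: genders packed into one CSV column.
-- The LIKE predicate cannot use an index, forcing a full scan,
-- and the delimiters must be handled manually to avoid false matches.
SELECT UserID
FROM UserProfile
WHERE ',' + MatchGenders + ',' LIKE '%,F,%';

-- Normalized alternative: one row per (user, gender) pair.
-- A simple equality predicate, fully indexable.
SELECT UserID
FROM UserMatchGender
WHERE GenderCode = 'F';
```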
Yes, to me that is the best approach and can be easily implemented in a relational system by means of normalization.
As I understand your specifications, `MatchGender` represents the gender of potential partners with which a given `User` is interested in getting in touch. So, the assertions *a User seeks one-to-many MatchGenders* and *a MatchGender is sought by one-to-many Users* are business rules of particular importance in this specific context.
If the above is correct, then there is a many-to-many relationship between `User` and `MatchGender`, and this kind of scenario is very common in relational database development. I will model this aspect and offer an implementation example below, in my suggestion.
Pros. You can expect predictable results since your database will be built on relational principles. You can take full advantage of RDBMS capabilities, such as aggregate functions, JOINs, referential integrity, etc. You can prevent data redundancy and full table scans. Your database structure will be more flexible for future changes or extensions. You can deliver database performance that largely meets the needs of your clients. Your data retrieval will be straightforward.
Cons. At first glance, the relational approach may seem a bit cumbersome, since it requires the database designers and programmers to be more disciplined, but as the project grows, that discipline will pay off.
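To show the shape of such a many-to-many resolution, here is a rough sketch (all names are my own assumptions for illustration, not the definitive DDL derived from a model):

```sql
-- Referenced entities.
CREATE TABLE Gender (
    GenderCode CHAR(1)     NOT NULL,
    Name       VARCHAR(30) NOT NULL,
    CONSTRAINT PK_Gender PRIMARY KEY (GenderCode)
);

CREATE TABLE UserProfile (
    UserID   INT         NOT NULL,
    UserName VARCHAR(30) NOT NULL,
    CONSTRAINT PK_UserProfile PRIMARY KEY (UserID)
);

-- Associative (junction) table resolving the many-to-many
-- relationship: one row per (user, sought gender) pair.
CREATE TABLE UserMatchGender (
    UserID     INT     NOT NULL,
    GenderCode CHAR(1) NOT NULL,
    CONSTRAINT PK_UserMatchGender PRIMARY KEY (UserID, GenderCode),
    CONSTRAINT FK_UMG_User   FOREIGN KEY (UserID)
        REFERENCES UserProfile (UserID),
    CONSTRAINT FK_UMG_Gender FOREIGN KEY (GenderCode)
        REFERENCES Gender (GenderCode)
);
```

The composite primary key prevents duplicate pairs, and the foreign keys let the RDBMS enforce referential integrity natively.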
Other relevant aspects of your scenario
I want to describe some other points about my understanding of your situation. This way, I think that the statements *a User pursues one-to-many Hobbies* and *a Hobby is pursued by one-to-many Users* are important business rules as well. So, there is also a many-to-many relationship between the entities `User` and `Hobby`, and this also has to be modelled and implemented.
I assume that you will have a set of `Hobbies` that will be inserted into the database by some sort of system administrator. If so, a `HobbyCode` that uniquely identifies a given occurrence of a `Hobby` may be defined, and you can set it as the meaningful PRIMARY KEY of that table.
Modeling and implementation suggestion
Therefore, to deal with your situation, I propose the following IDEF1X data model:
And then, the following DDL can be derived from such a data model and physically implemented with ease: