A few things you will have to think about before starting:
How do you decide if a title is the same as a previous one? Say there is an article 'A god table for storing news items'. Your script discovers and stores it in your DB. Later the editors of that news site notice the typo and correct the title as 'A good table for storing news items'. What do you do in such a case? (The same applies to the content.)
If you want to do something with your data after storing, you will need a unique identifier in the table. Using this, you will be able to point to an item from other tables.
You may find useful to store the URL of the articles. However, from time to time these can change just like titles do.
Short news are typically found in many places on the internet. Both their title and contents are the same (like in the case of short news from news agencies) - are these the same or not? If you store the URL, then these are different. If not, these are the same. What to do with these?
If you decide that you will store these from each site you process, than you will find useful to identify your source sites. (Be careful, the URL will look like 'http://dba.stackexchange.com/questions/23568/how-to-design-a-good-table-for-real-time-news-extraction' and the site will be 'dba.stackexchange.com'. Here the latter forms a part of the former, but this is not necessarily true for other sites.) In this case, you will have to create another table where you list all the source sites. This table will be referenced from your other table.
So far we have the following database structure:
Table "news_item"
+--------------+------------+--------------+---------------------+-----+-----------+
| news_item_id | title | content | time_created | URL | source_id |
+--------------+------------+--------------+---------------------+-----+-----------+
| 1 | First item | some content | 2012-09-03 22:53:14 | ... | 1 |
. .
. .
. .
+--------------+------------+--------------+---------------------+-----+-----------+
Table "source"
+-----------+-----------------------+
| source_id | name |
+-----------+-----------------------+
| 1 | dba.stackexchange.com |
. .
. .
. .
+-----------+-----------------------+
You can write your SQL script only after answering at least the above questions.
Perhaps as a third alternative you may want to try
X = RIGHT(CAST(time as CHAR(14)),2);
Here is a fourth alternative requiring some project management decisions: Instead of storing date('YmdHis'))
you should store date('U'))
. The U
stands for seconds since Jan 1, 1970. Then, your formula would be simpler
X = MOD(time,60);
If you must keep what you have, use the BENCHMARK function
SET @dt=20130416152057;
SELECT SUBSTR(CAST(@dt AS CHAR(14)),9,2);
SELECT MOD(FLOOR(@dt / 10000), 100);
SET @iterations=1000000;
SELECT BENCHMARK(@iterations,SUBSTR(CAST(@dt AS CHAR(14)),9,2));
SELECT BENCHMARK(@iterations,MOD(FLOOR(@dt / 10000), 100));
SET @iterations=10000000;
SELECT BENCHMARK(@iterations,SUBSTR(CAST(@dt AS CHAR(14)),9,2));
SELECT BENCHMARK(@iterations,MOD(FLOOR(@dt / 10000), 100));
I tried it on a random DB Server 1 millions times and then 10 million times
mysql> SET @dt=20130416152057;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT SUBSTR(CAST(@dt AS CHAR(14)),9,2);
+-----------------------------------+
| SUBSTR(CAST(@dt AS CHAR(14)),9,2) |
+-----------------------------------+
| 15 |
+-----------------------------------+
1 row in set (0.00 sec)
mysql> SELECT MOD(FLOOR(@dt / 10000), 100);
+------------------------------+
| MOD(FLOOR(@dt / 10000), 100) |
+------------------------------+
| 15 |
+------------------------------+
1 row in set (0.00 sec)
mysql> SET @iterations=1000000;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT BENCHMARK(@iterations,SUBSTR(CAST(@dt AS CHAR(14)),9,2));
+----------------------------------------------------------+
| BENCHMARK(@iterations,SUBSTR(CAST(@dt AS CHAR(14)),9,2)) |
+----------------------------------------------------------+
| 0 |
+----------------------------------------------------------+
1 row in set (0.15 sec)
mysql> SELECT BENCHMARK(@iterations,MOD(FLOOR(@dt / 10000), 100));
+-----------------------------------------------------+
| BENCHMARK(@iterations,MOD(FLOOR(@dt / 10000), 100)) |
+-----------------------------------------------------+
| 0 |
+-----------------------------------------------------+
1 row in set (0.40 sec)
mysql> SET @iterations=10000000;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT BENCHMARK(@iterations,SUBSTR(CAST(@dt AS CHAR(14)),9,2));
+----------------------------------------------------------+
| BENCHMARK(@iterations,SUBSTR(CAST(@dt AS CHAR(14)),9,2)) |
+----------------------------------------------------------+
| 0 |
+----------------------------------------------------------+
1 row in set (1.50 sec)
mysql> SELECT BENCHMARK(@iterations,MOD(FLOOR(@dt / 10000), 100));
+-----------------------------------------------------+
| BENCHMARK(@iterations,MOD(FLOOR(@dt / 10000), 100)) |
+-----------------------------------------------------+
| 0 |
+-----------------------------------------------------+
1 row in set (4.02 sec)
mysql>
On the particular hardware I ran this on, your first expression is about 2.67 times faster.
Give it a Try !!!
Best Answer
You can use the
NOW()
function in mysql. Note, this assumes thattermDate
is the later date: