MySQL – Storing prices with price history and different specificity (global, user group, single user) in a MySQL database

Tags: MySQL, performance, query-performance, subquery

I need to store prices along with their history. A price can have different levels of specificity: it can apply to a single user (most specific), to a group (less specific), or to the product as a whole (global, least specific). There are also two types of prices: product price and delivery price. I got this working, but I feel uneasy about it whenever I write queries, because they tend to get long and complicated.

I designed two tables with identical structure:

base_prices

| id | referent_id | product_id | valid_from | valid_until | amount | pieces_per_lot |

delivery_prices

| id | referent_id | product_id | valid_from | valid_until | amount | pieces_per_lot |

For a user-specific price, referent_id is a positive integer matching the user's id.
For a group-specific price, referent_id is a negative integer matching the group's id (I made group ids negative integers).
For a global product price, referent_id is NULL.
This way, when I query the price for a specific product, a specific user and that user's group, all I need to do is filter the table by product_id and order the results by the referent_id column in descending order. User-specific prices then come first, group prices next, and global prices last.
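
As a rough illustration of that ordering, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it is self-contained; both engines place NULL last under ORDER BY ... DESC. The sample rows and the helper name candidate_prices are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption made for a self-contained demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE base_prices ("
    " id INTEGER PRIMARY KEY, referent_id INTEGER, product_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO base_prices (referent_id, product_id, amount) VALUES (?, ?, ?)",
    [
        (None, 1, 10.0),  # global price for product 1
        (-3, 1, 9.0),     # price for group 3 (stored as -3)
        (7, 1, 8.0),      # price for user 7
        (8, 1, 7.5),      # price for a different user, must be filtered out
    ],
)


def candidate_prices(conn, product_id, user_id, group_id):
    """Return matching rows, most specific first: user, group, then global."""
    return conn.execute(
        "SELECT referent_id, amount FROM base_prices"
        " WHERE product_id = ?"
        "   AND (referent_id = ? OR referent_id = ? OR referent_id IS NULL)"
        " ORDER BY referent_id DESC",
        (product_id, user_id, group_id),
    ).fetchall()


rows = candidate_prices(conn, product_id=1, user_id=7, group_id=-3)
print(rows)  # [(7, 8.0), (-3, 9.0), (None, 10.0)]
```

The first row of the result is the price that applies; a user without a personal price falls through to the group row, and then to the global one.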

Date ranges (valid_from to valid_until) cannot overlap between two rows for the same product and referent.
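
One way to check that rule is a self-join that reports conflicting pairs. The sketch below is only an illustration: it again uses sqlite3 in place of MySQL, assumes the rule applies per product and referent, and uses invented sample dates. Two closed ranges overlap when each one starts no later than the other ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE base_prices ("
    " id INTEGER PRIMARY KEY, referent_id INTEGER, product_id INTEGER,"
    " valid_from TEXT, valid_until TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO base_prices"
    " (id, referent_id, product_id, valid_from, valid_until, amount)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, None, 1, "2024-01-01", "2024-06-30", 10.0),
        (2, None, 1, "2024-07-01", "2024-12-31", 11.0),  # follows row 1, no overlap
        (3, None, 1, "2024-06-15", "2024-07-15", 12.0),  # overlaps rows 1 and 2
        (4, 7, 1, "2024-01-01", "2024-12-31", 8.0),      # other referent, no conflict
    ],
)

# "a.referent_id IS b.referent_id" is SQLite's NULL-safe equality;
# MySQL spells the same comparison "a.referent_id <=> b.referent_id".
OVERLAP_SQL = """
SELECT a.id, b.id
FROM base_prices AS a
JOIN base_prices AS b
  ON a.id < b.id
 AND a.product_id = b.product_id
 AND a.referent_id IS b.referent_id
 AND a.valid_from <= b.valid_until
 AND b.valid_from <= a.valid_until
ORDER BY a.id, b.id
"""

overlaps = conn.execute(OVERLAP_SQL).fetchall()
print(overlaps)  # [(1, 3), (2, 3)]
```

An empty result means the table currently satisfies the rule; the same condition could be used inside an insert or update trigger to reject a conflicting row.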

Now a query to get current prices would look something like this:

SELECT
    *
FROM
    base_prices AS o    # "outer" and "inner" are reserved words, so use short aliases
WHERE
    o.id = (
        SELECT
            i.id
        FROM
            base_prices AS i
        WHERE
                NOW() BETWEEN i.valid_from AND i.valid_until
            AND i.product_id = o.product_id
            AND (
                    i.referent_id = @user_id    # HERE GOES USER ID
                OR  i.referent_id = @group_id   # HERE GOES GROUP ID
                OR  i.referent_id IS NULL
            )
        ORDER BY
            i.referent_id DESC  # FOR SPECIFICITY
        LIMIT 1                 # keep only the most specific row
    )

What bugs me is that I always have to write this kind of sub-query to get items, which doesn't seem like good design to me. As data builds up, these queries will become much more expensive. I also had to write a few triggers to preserve data integrity during inserts and updates.

My feeling is that current prices and historical prices should be separated, but I was in a hurry, so this is what I came up with. I was thinking about scheduling events on insert, based on the valid_from column, that would move data from the historical prices table to a (new) current prices table.

I am inexperienced, so this seems a bit odd to me. It might be OK, but I would like to see some suggestions and opinions.

If I left something out or there is not enough information, please leave a comment.

Best Answer

There are a couple of ways you could separate the current values from the historic ones: you could simply include a boolean field that is true for the latest price and filter on that, or you could keep current prices in a separate table. Both options involve a little extra work to maintain integrity but would make a query for current prices more efficient. This doesn't remove the need for the sub-query to pick the first row that matches the user/group/NULL condition, though fewer rows will be scanned in processing both the inner and outer queries.
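
As a rough sketch of the boolean-flag option: the example below uses Python's sqlite3 module in place of MySQL, and the helper names refresh_is_latest and current_price, the index name, and the sample rows are all made up for illustration. The flag is recomputed from the date range (in MySQL this could be done by a scheduled event), and the price lookup then filters on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE base_prices ("
    " id INTEGER PRIMARY KEY, referent_id INTEGER, product_id INTEGER,"
    " valid_from TEXT, valid_until TEXT, amount REAL,"
    " is_latest INTEGER NOT NULL DEFAULT 0)"
)
# Hypothetical index so the lookup can skip historic rows quickly.
conn.execute(
    "CREATE INDEX idx_latest_product ON base_prices (is_latest, product_id, referent_id)"
)
conn.executemany(
    "INSERT INTO base_prices"
    " (referent_id, product_id, valid_from, valid_until, amount)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        (None, 1, "2023-01-01", "2023-12-31", 10.0),  # historic global price
        (None, 1, "2024-01-01", "2024-12-31", 11.0),  # current global price
        (7, 1, "2023-01-01", "2023-12-31", 8.0),      # expired price for user 7
        (-3, 1, "2024-01-01", "2024-12-31", 9.5),     # current price for group 3
    ],
)


def refresh_is_latest(conn, today):
    """Recompute the flag for every row from its validity range."""
    conn.execute(
        "UPDATE base_prices SET is_latest = (? BETWEEN valid_from AND valid_until)",
        (today,),
    )


def current_price(conn, product_id, user_id, group_id):
    """Return the most specific current amount, or None if there is none."""
    row = conn.execute(
        "SELECT amount FROM base_prices"
        " WHERE is_latest = 1 AND product_id = ?"
        "   AND (referent_id = ? OR referent_id = ? OR referent_id IS NULL)"
        " ORDER BY referent_id DESC LIMIT 1",
        (product_id, user_id, group_id),
    ).fetchone()
    return row[0] if row else None


refresh_is_latest(conn, "2024-05-01")
print(current_price(conn, 1, 7, -3))   # 9.5 (the user's own price has expired)
print(current_price(conn, 1, 99, -5))  # 11.0 (falls back to the global price)
```

The trade-off is the one described above: the flag has to be kept in step with the dates, but queries for current prices no longer need to evaluate the date range on every row.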

The following might be more efficient as a straight query if only the current prices are in the base table (add the is_latest=true check or your current date filter to the where and on clauses if you keep the historic values in the same table):

SELECT CASE WHEN usermatch.id IS NOT NULL THEN usermatch.field1 WHEN groupmatch.id IS NOT NULL THEN groupmatch.field1 ELSE nullmatch.field1 END AS field1
     , CASE WHEN usermatch.id IS NOT NULL THEN usermatch.field2 WHEN groupmatch.id IS NOT NULL THEN groupmatch.field2 ELSE nullmatch.field2 END AS field2
     -- (... and so on ...)
     , CASE WHEN usermatch.id IS NOT NULL THEN usermatch.fieldN WHEN groupmatch.id IS NOT NULL THEN groupmatch.fieldN ELSE nullmatch.fieldN END AS fieldN
FROM   base_prices AS nullmatch
-- Join on product_id (not id) and filter referent_id on the joined alias,
-- so each global row is paired with the group and user rows for the same product.
LEFT OUTER JOIN 
       base_prices AS groupmatch ON groupmatch.product_id = nullmatch.product_id AND groupmatch.referent_id = @GroupIDHere
LEFT OUTER JOIN 
       base_prices AS usermatch ON usermatch.product_id = nullmatch.product_id AND usermatch.referent_id = @UserIDHere
WHERE nullmatch.referent_id IS NULL

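This will only work if you always have a row for each product where referent_id is NULL. The variant below has the same requirement, but it looks up the group through the users table (assuming that table has a groupid column holding the negative group id), so only the user id needs to be supplied: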

SELECT CASE WHEN usermatch.id IS NOT NULL THEN usermatch.field1 WHEN groupmatch.id IS NOT NULL THEN groupmatch.field1 ELSE nullmatch.field1 END AS field1
     , CASE WHEN usermatch.id IS NOT NULL THEN usermatch.field2 WHEN groupmatch.id IS NOT NULL THEN groupmatch.field2 ELSE nullmatch.field2 END AS field2
     -- (... and so on ...)
     , CASE WHEN usermatch.id IS NOT NULL THEN usermatch.fieldN WHEN groupmatch.id IS NOT NULL THEN groupmatch.fieldN ELSE nullmatch.fieldN END AS fieldN
FROM   users
CROSS JOIN 
       base_prices AS nullmatch
-- Join on product_id (not id) and filter referent_id on the joined alias.
LEFT OUTER JOIN 
       base_prices AS groupmatch ON groupmatch.product_id = nullmatch.product_id AND groupmatch.referent_id = users.groupid
LEFT OUTER JOIN 
       base_prices AS usermatch ON usermatch.product_id = nullmatch.product_id AND usermatch.referent_id = users.id
WHERE users.id = @UserIDHere
AND   nullmatch.referent_id IS NULL

In both cases, be careful using this construct in a view, because further filtering on the fields produced by those CASE expressions will not be able to use any indexes you may have defined.

Obviously, a CASE expression per field is a bit of a faff, but it should stop the query runner searching the table once for every row returned, as it might with the sub-query arrangement.

In other databases you might be able to use window functions (like ROW_NUMBER) to do this more conveniently and efficiently. MySQL only supports those from version 8.0 onwards; earlier versions do not.
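
On a server that does have window functions (MySQL 8.0 or later), the ROW_NUMBER approach might look like the sketch below. It is run here through Python's sqlite3 module (SQLite 3.25 or later is needed for window functions) purely so the example is self-contained; the sample rows and the specificity_rank alias are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE base_prices ("
    " id INTEGER PRIMARY KEY, referent_id INTEGER, product_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO base_prices (referent_id, product_id, amount) VALUES (?, ?, ?)",
    [
        (None, 1, 10.0), (-3, 1, 9.0), (7, 1, 8.0),  # product 1: global, group, user
        (None, 2, 20.0), (-3, 2, 18.0),              # product 2: global, group
        (None, 3, 30.0), (8, 3, 25.0),               # product 3: global, other user
    ],
)

# Rank the candidate rows per product, most specific first, and keep rank 1.
RANKED_SQL = """
SELECT product_id, referent_id, amount
FROM (
    SELECT product_id, referent_id, amount,
           ROW_NUMBER() OVER (
               PARTITION BY product_id
               ORDER BY referent_id DESC
           ) AS specificity_rank
    FROM base_prices
    WHERE referent_id = :user_id
       OR referent_id = :group_id
       OR referent_id IS NULL
) AS ranked
WHERE specificity_rank = 1
ORDER BY product_id
"""

best = conn.execute(RANKED_SQL, {"user_id": 7, "group_id": -3}).fetchall()
print(best)  # [(1, 7, 8.0), (2, -3, 18.0), (3, None, 30.0)]
```

This returns one row per product in a single pass, without a CASE expression per column and without requiring a global row to exist for every product.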

Related Solutions

MySQL – Slow performance of MySQL Join Query

Please provide the output of SHOW CREATE TABLE; the EXPLAIN output is of little use without it.

OR is a performance killer in many contexts.

( p.pricelist = "name_abc" AND p.iln = "sellerID_123" ) OR ( p.pricelist = "name_def" AND p.iln = "sellerID_456" ) OR ...

Turn that into

JOIN ( SELECT id FROM p WHERE 
( p.pricelist = "name_abc" AND p.iln = "sellerID_123" ) OR 
( p.pricelist = "name_def" AND p.iln = "sellerID_456" ) OR ... ) x ON x.id = foo.id

Also needed (on p):

INDEX(pricelist, iln, id)

(With the CREATEs, I could be more specific.)

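The idea behind this "trick" is to move the costly work of the OR into a subquery that returns only the necessary ids. In addition, because the index contains every column the subquery needs (it is a covering index), all of that work can be done in the index without touching the table rows.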
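
A small, self-contained sketch of the rewrite follows. It uses Python's sqlite3 module rather than MySQL, and the table definitions, the index name, the price and label columns, and the sample rows are all assumptions, since the real CREATE TABLE statements are not available. It only shows the shape of the query, not its performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE p (id INTEGER PRIMARY KEY, pricelist TEXT, iln TEXT, price REAL);
    CREATE TABLE foo (id INTEGER PRIMARY KEY, label TEXT);
    -- Index covering every column the derived table reads.
    CREATE INDEX p_pricelist_iln_id ON p (pricelist, iln, id);
    INSERT INTO p VALUES
        (1, 'name_abc', 'sellerID_123', 5.0),
        (2, 'name_abc', 'sellerID_456', 6.0),
        (3, 'name_def', 'sellerID_456', 7.0),
        (4, 'name_xyz', 'sellerID_789', 8.0);
    INSERT INTO foo VALUES (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four');
    """
)

# The OR conditions live only inside the derived table x, which returns ids;
# the outer query then joins on those ids.
DERIVED_SQL = """
SELECT foo.id, foo.label
FROM foo
JOIN (
    SELECT id FROM p
    WHERE (p.pricelist = 'name_abc' AND p.iln = 'sellerID_123')
       OR (p.pricelist = 'name_def' AND p.iln = 'sellerID_456')
) AS x ON x.id = foo.id
ORDER BY foo.id
"""

matched = conn.execute(DERIVED_SQL).fetchall()
print(matched)  # [(1, 'one'), (3, 'three')]
```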

MySQL – Selecting minimum value using a subquery

Slightly modifying your second query will give you both the merchant id and the lowest price (over all products that pass the conditions, which I guess is what you want):

SELECT p.p_m_id, MIN(p_price) AS min_p_price 
FROM tgmp_affiliates ga 
JOIN tgmp_prices p 
    ON ga.a_code = p.p_gtin 
        AND ga.a_code > '' 
JOIN tgmp_merchants m 
    ON m.m_id = p.p_m_id 
WHERE ga.site_id = '34' 
    AND p.site_id = '34' 
    AND ga.a_parent = '25573' 
    AND p.p_type = 'games' 
    AND m.m_hide = 0 
GROUP BY p.p_m_id ;

Then you can join this - as a derived table - to all the tables that you need data from in the results:

SELECT
    m.*, p.*, ga.*                     -- whatever columns you want  
FROM tgmp_affiliates ga 
JOIN tgmp_prices p 
    ON ga.a_code = p.p_gtin 
        AND ga.a_code > '' 
JOIN tgmp_merchants m 
    ON m.m_id = p.p_m_id 
JOIN
      ( SELECT p.p_m_id, MIN(p_price) AS p_price 
        FROM tgmp_affiliates ga 
        JOIN tgmp_prices p 
            ON ga.a_code = p.p_gtin 
                AND ga.a_code > '' 
        JOIN tgmp_merchants m 
            ON m.m_id = p.p_m_id 
        WHERE ga.site_id = '34' 
            AND p.site_id = '34' 
            AND ga.a_parent = '25573' 
            AND p.p_type = 'games' 
            AND m.m_hide = 0 
        GROUP BY p.p_m_id 
      ) AS tmp
    ON  tmp.p_m_id = p.p_m_id 
    AND tmp.p_price = p.p_price
WHERE ga.site_id = '34' 
    AND p.site_id = '34' 
    AND ga.a_parent = '25573' 
    AND p.p_type = 'games' 
ORDER BY p.p_price ;