Checking existence and syncing data with millions of rows

data synchronizationdatabase-design

I am developing sms based service platform. In India, we have two different types of routes to send sms – promo & trans. Sms sent via promo route has to be checked against nationally maintained database (DND), which currently holds 231450000 mobile numbers.

My question is

  1. What is the efficient way to store this data (DND).
  2. What is the efficient way to check, whether the number exists within these numbers or not.
  3. Twice in a week, a delta dump of these numbers (DND) is provided by telecom authority. How to sync the dump with existing db.
  4. Should I use mysql to store this data, which will consist of mobile numbers, or any other in-memory storage?

EDIT:

  1. Each row of data consists of mobile number alongwith few preferences like area code, user preference etc
  2. The data dump provided twice a week contains only additions / deletions and not the whole data.
  3. Lookup frequency can be very large. Actually every SMS sent from the platform has to first make a lookup against the data (DND). So if the platform is handling 100 sms per second, it has to make 100 lookups per second against this data. The lookup simply checks existence of a particular mobile number against the data(DND) in database.
  4. The new delta data dump (additions / deletions), has to be synced in few hours only, as the dump is provided by telecom authority in the day and the same has to go live by midnight 00:00 hours. So the total time to sync data will be around 3 hours.

Sample DND dump

Sample of DND dump provided. Here <code>A</code> under <code>Opstype</code> means record has to be added, while as <code>D</code> means record has to be deleted.

Sample of DND dump provided. Here A under Opstype means record has to be added, while as D means record has to be deleted.

Best Answer

I would not design a system to do lookups on the fly. Instead I would design the system to validate if a given number is on the DND list at the time the number is added to your system, and weekly (right after the DND list is updated). Then you don't need to check the number each time you send an SMS.

If you follow that design pattern, you'll only need to interact with the DND table very infrequently, and at times that don't impact the speed of sending SMS messages.

If I was doing this, I'd likely use Microsoft SQL Server, and I'd consider a design similar to the one below. It should be trivial to adapt these SQL statements into whatever flavor of SQL you want to use.

The table to hold the DND rows:

CREATE TABLE dbo.DND
(
    TelephoneNo BIGINT NOT NULL
        CONSTRAINT PK_DND
        PRIMARY KEY CLUSTERED
    , PhoneType INT NOT NULL
    , ServiceAreaCode INT NOT NULL
);

The table to hold your subscribers phone numbers:

CREATE TABLE dbo.Subscribers
(
    TelephoneNo BIGINT NOT NULL
        CONSTRAINT PK_Subscribers
        PRIMARY KEY CLUSTERED
    , IsOnDND BIT NOT NULL
    , DNDLastChecked DATETIME NOT NULL
);

This query could be used to update the Subscribers table weekly, right after the DND list is updated.

UPDATE dbo.Subscribers 
SET IsOnDND = 1
    , DNDLastChecked = GETDATE()
WHERE EXISTS (
    SELECT 1
    FROM dbo.DND
    WHERE DND.TelephoneNo = Subscribers.TelephoneNo
    );

I would handle updating the DND list from the weekly change list using a scheduled SQL Server Agent Job.

The table to hold the weekly changes might look like:

CREATE TABLE dbo.DNDChanges
(
    TelephoneNo BIGINT NOT NULL
        CONSTRAINT PK_DNDChanges
        PRIMARY KEY CLUSTERED
    , Opstype CHAR(1) NOT NULL
    , PhoneType INT NOT NULL
    , ServiceAreaCode INT NOT NULL
);

The scheduled job would perform the following steps:

TRUNCATE TABLE dbo.DNDChanges;

--insert rows into the DNDChanges table using SSIS, BCP, etc

/* 
    Here we remove rows from the DND table where the number has 
    been deleted (Opstype = 'D')
*/
DELETE 
FROM dbo.DND
WHERE EXISTS (
    SELECT 1
    FROM dbo.DNDChanges
    WHERE DNDChanges.Opstype = 'D'
        AND DNDChanges.TelephoneNo = DND.TelephoneNo
    );

/*    
    This updates the PhoneType, and ServiceAreaCode from the DND 
    incremental change list.
*/
UPDATE dbo.DND
SET DND.PhoneType = DNDChanges.PhoneType
    , DND.ServiceAreaCode = DNDChanges.ServiceAreaCode
    , DND.TelephoneNo = DNDChanges.TelephoneNo
FROM dbo.DND
    INNER JOIN dbo.DNDChanges ON DND.TelephoneNo = DNDChanges.TelephoneNo;
    /* no need to add OPSType = 'A' to the WHERE clause
       since we've already removed those rows from the
       DND table */

/*
    Add new rows from the DND incremental change list
*/
INSERT INTO dbo.DND (
    TelephoneNo
    , PhoneType
    , ServiceAreaCode
    )
SELECT *
FROM dbo.DNDChanges 
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.DND
    WHERE dbo.DND.TelephoneNo = DNDChanges.TelephoneNo
    );