Assuming you are using SQL Server, you could adapt this to work:
CREATE TABLE [dbo].[IP2Location](
[ID] [int] IDENTITY(1,1) NOT NULL CONSTRAINT PK_IP2Location PRIMARY KEY CLUSTERED,
[ip] [bigint] NULL,
[country_code] [varchar](2) NULL,
[country_name] [varchar](255) NULL,
[region_name] [varchar](255) NULL,
[city_name] [varchar](255) NULL,
[latitude] [float] NULL,
[longitude] [float] NULL,
[zip_code] [varchar](255) NULL,
[time_zone] [varchar](255) NULL,
[isp] [varchar](255) NULL,
[domain] [varchar](255) NULL,
[net_speed] [varchar](255) NULL,
[idd_code] [varchar](255) NULL,
[area_code] [varchar](255) NULL,
[IP2LocationTimeStamp] [rowversion] NOT NULL -- rowversion is the current name for the deprecated timestamp type
);
/*
Create a non-clustered index on the numeric IP address to speed lookups
I indexed the country_code field here as well since our lookups use it.
I INCLUDED country_name, region_name, and time_zone since that is what we
typically request in a SELECT statement from this table.
*/
CREATE NONCLUSTERED INDEX [IX_IP2Location_IP2LocationServiceQuery] ON [dbo].[IP2Location]
(
[ip] ASC,
[country_code] ASC
)
INCLUDE ( [country_name],
[region_name],
[time_zone])
GO
/*
Create BIGINT versions of the dotted-ip field ip_address
*/
/* PARSENAME splits a four-part dotted name; part 4 is the leftmost octet.
   Assumes the table also holds the original dotted-decimal column ip_address. */
UPDATE IP2Location SET ip =
      (CAST(PARSENAME(ip_address, 4) AS BIGINT) * 0x01000000)
    + (CAST(PARSENAME(ip_address, 3) AS BIGINT) * 0x00010000)
    + (CAST(PARSENAME(ip_address, 2) AS BIGINT) * 0x00000100)
    +  CAST(PARSENAME(ip_address, 1) AS BIGINT);
GO
/*
Table-valued function returning location info for the given numeric IP address.
For example:
SELECT country_code, country_name, region_name, city_name, latitude, longitude, time_zone
FROM dbo.GetLocationFromIPv4(16777215);
*/
CREATE FUNCTION [dbo].[GetLocationFromIPv4]
(
@IP bigint
)
RETURNS @IPLoc TABLE
(
country_code varchar(2)
, country_name varchar(255)
, region_name varchar(255)
, city_name varchar(255)
, latitude float
, longitude float
, time_zone varchar(255)
, isp varchar(255)
, domain varchar(255)
, net_speed varchar(255)
, idd_code varchar(255)
, area_code varchar(255)
)
AS
BEGIN
/*
TVF to return (typically) a single record from IP2Location as quickly as possible
*/
INSERT INTO @IPLoc
SELECT country_code
, country_name
, region_name
, city_name
, latitude
, longitude
, time_zone
, isp
, domain
, net_speed
, idd_code
, area_code
FROM IP2Location
WHERE ip = @IP;
RETURN
END
GO
/*
Table-valued function returning location info for the given dotted-decimal IP address.
For example:
SELECT country_code, country_name, region_name, city_name, latitude, longitude, time_zone
FROM dbo.GetLocationFromIPv4String('192.168.0.1');
*/
CREATE FUNCTION [dbo].[GetLocationFromIPv4String]
(
@IP nvarchar(15)
)
RETURNS @IPLoc TABLE
(
country_code varchar(2)
, country_name varchar(255)
, region_name varchar(255)
, city_name varchar(255)
, latitude float
, longitude float
, time_zone varchar(255)
, isp varchar(255)
, domain varchar(255)
, net_speed varchar(255)
, idd_code varchar(255)
, area_code varchar(255)
)
AS
BEGIN
/*
TVF to return (typically) a single record from IP2Location as quickly as possible
*/
DECLARE @IPInt64 bigint;
DECLARE @n1 bigint;
DECLARE @n2 bigint;
DECLARE @n3 bigint;
DECLARE @n4 bigint;
-- PARSENAME splits a four-part dotted name; part 4 is the leftmost octet.
set @n1 = CAST(PARSENAME(@IP, 4) AS bigint);
set @n2 = CAST(PARSENAME(@IP, 3) AS bigint);
set @n3 = CAST(PARSENAME(@IP, 2) AS bigint);
set @n4 = CAST(PARSENAME(@IP, 1) AS bigint);
SET @IPInt64 = (@n1 * 0x01000000) + (@n2 * 0x00010000) + (@n3 * 0x00000100) + @n4;
INSERT INTO @IPLoc
SELECT TOP (1) country_code
, country_name
, region_name
, city_name
, latitude
, longitude
, time_zone
, isp
, domain
, net_speed
, idd_code
, area_code
FROM IP2Location
WHERE ip = @IPInt64;
RETURN
END
GO
I can think of three solutions: EAV, XML, and Sparse Columns. The last is vendor-specific and may not be useful to you.
Whichever method you choose, you may wish to consider storing the original request data in a raw format, in a table or flat file. It will make it easy to try new ways of storing the data, allow you to reload data if you discover a mistake with the way you're parsing your requests, and offer opportunities for parsing the API requests using batch processing or "big data" tools if you find that your data warehouse isn't able to efficiently deal with the data.
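As a minimal sketch of that raw-capture idea (the table and column names here are my own invention, not from the question):

```sql
-- Hypothetical raw-capture table: one row per API request, payload untouched.
CREATE TABLE dbo.ApiRequestRaw (
    RequestID   bigint IDENTITY(1,1) NOT NULL
                CONSTRAINT PK_ApiRequestRaw PRIMARY KEY CLUSTERED,
    ReceivedAt  datetime2(3) NOT NULL
                CONSTRAINT DF_ApiRequestRaw_ReceivedAt DEFAULT (SYSUTCDATETIME()),
    RawPayload  nvarchar(max) NOT NULL  -- the request exactly as received
);
```

If you later change how you parse requests, you can truncate the derived tables and reload them entirely from this one.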
EAV considerations
EAV/KVS, as you've described it above, is likely to be the most straightforward implementation.
Unfortunately it's also going to be very expensive: to get any sort of efficient queries on commonly used keys you'll need an index on the key column, which could become very fragmented, and querying for particular values would be extremely expensive.
You may be able to reduce the cost of indexing or index scans by supporting your EAV store with materialised views (many vendors support this) for querying keys or values that you care about.
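A hedged sketch of what that EAV layout might look like, with the secondary index that makes key/value lookups possible (names are illustrative):

```sql
-- Illustrative EAV layout; RequestID references whatever identifies a request.
CREATE TABLE dbo.ApiRequestEAV (
    RequestID  bigint        NOT NULL,
    AttrKey    varchar(100)  NOT NULL,
    AttrValue  nvarchar(400) NULL,
    CONSTRAINT PK_ApiRequestEAV PRIMARY KEY CLUSTERED (RequestID, AttrKey)
);
-- Supports "find all requests with this key (and value)" at the cost of
-- maintaining a second, insert-heavy index.
CREATE NONCLUSTERED INDEX IX_ApiRequestEAV_KeyValue
    ON dbo.ApiRequestEAV (AttrKey, AttrValue);
```

The clustered key makes "all attributes of one request" cheap; it's the reverse direction (all requests with one attribute) that drives the indexing cost discussed above.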
XML
Most enterprise database systems offer very mature XML handling, including validation, indexing, and sophisticated querying.
Loading the API request into the database as XML would provide one tuple per request, which logically might be a bit more palatable to you than having an unknown number of rows in an EAV table.
Whether this is efficient would depend a lot on your RDBMS vendor and your implementation.
The biggest downside is that this is probably the most complicated way of managing the data, short of string manipulation of the original request!
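To make the XML option concrete, a sketch in SQL Server terms (the element and attribute names are invented for illustration):

```sql
-- Illustrative: store each request as XML and index it for path queries.
CREATE TABLE dbo.ApiRequestXml (
    RequestID bigint IDENTITY(1,1) NOT NULL
              CONSTRAINT PK_ApiRequestXml PRIMARY KEY CLUSTERED,
    Payload   xml NOT NULL
);
CREATE PRIMARY XML INDEX PXML_ApiRequestXml_Payload
    ON dbo.ApiRequestXml (Payload);

-- Pull one parameter out of every request with an XQuery path expression.
SELECT RequestID,
       Payload.value('(/request/param[@name="user_id"]/@value)[1]',
                     'varchar(50)') AS user_id
FROM dbo.ApiRequestXml;
```

Other vendors have equivalents (Oracle XMLType, PostgreSQL xml/xpath), but the indexing and query syntax differ considerably.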
Sparse Columns / traditional tables
It's possible that you could load your data into a traditional table structure, with one column per key.
SQL Server's Sparse Columns feature is a great alternative to an EAV store. A table with Sparse Columns behaves much the same as a normal table, except that it can have up to 30,000 columns, and NULL values in sparse columns consume no space in the table.
Combining them with Filtered Indexes (another SQL Server specific feature) can provide an extremely efficient alternative to an EAV store if you're frequently querying for a couple of specific columns and/or values.
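A sketch of the combination, assuming a handful of hypothetical keys promoted to columns:

```sql
-- Illustrative sparse-column table: NULLs in SPARSE columns consume no space,
-- so most rows can leave most columns empty at little cost.
CREATE TABLE dbo.ApiRequestWide (
    RequestID  bigint IDENTITY(1,1) NOT NULL
               CONSTRAINT PK_ApiRequestWide PRIMARY KEY CLUSTERED,
    user_id    varchar(50)  SPARSE NULL,
    campaign   varchar(100) SPARSE NULL,
    referrer   varchar(400) SPARSE NULL
);
-- Filtered index: only rows that actually have a campaign value are indexed,
-- keeping the index tiny relative to the table.
CREATE NONCLUSTERED INDEX IX_ApiRequestWide_Campaign
    ON dbo.ApiRequestWide (campaign)
    WHERE campaign IS NOT NULL;
```

A query like `WHERE campaign = 'spring_sale'` can then be satisfied from that small filtered index rather than a scan.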
Using a traditional table with other vendors may be viable - IBM supports over 700 columns per table and Oracle about 1000, and features such as compression or Oracle's treatment of trailing nulls might mean that you can store your API data fairly efficiently.
The obvious downside of this approach is that as you added new keys to your API, you'd need to adjust your schema accordingly.
Best Answer
You are mixing up three orthogonal concepts: data model (star schema), workload characteristics (OLTP vs. OLAP), and physical data organisation (columnar).
Your data model has no bearing on whether column-organised tables are appropriate for you; however, data organisation must reflect the nature of your workload (i.e. queries). For example,
SELECT * FROM...
cannot possibly perform well in a columnar database.
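To illustrate with SQL Server's columnstore feature (the `FactSales` table here is hypothetical):

```sql
-- Illustrative: column-organise a fact table (SQL Server 2014+).
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;

-- Plays to columnar strengths: touches two columns, aggregates many rows,
-- so only two column segments need to be read.
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY ProductKey;

-- Works against the organisation: every column of each qualifying row
-- must be fetched and stitched back together into a full tuple.
SELECT * FROM dbo.FactSales WHERE OrderDateKey = 20240101;
```

The same star schema serves both queries; it's the shape of the workload, not the data model, that decides whether columnar storage helps.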