Any advantage to putting some columns in another table, or should I just put them in the same table?

database-design sqlite

I have a SQLite database with about 50 tables in it. The largest table contains location data and it has 6,000,000 rows.

Some of the columns in this table are:

location_contrivedkey --(unique identifier for each location; 1, 2, 3, 4, 5, etc.)
location_coordinates
location_status --(is there a building in this location)
location_rating --(1-10 based on certain factors)

I now realize I need about 30 more specific rating-type columns for each location. For example:

location_rating_crime
location_rating_weather
location_rating_tourism
location_rating_income
location_rating_jobs

Each location will have a value in each of these columns. None will be blank.

Questions

Is there any reason to put them into another table and join on the location_contrivedkey, or should I just add them all to the current table?

I read somewhere that queries get slower the more columns a table has. Is that true?

Also, is there a 2,000 column limit?

Should I be concerned about this approach?

Best Answer

If I were you, I would include the new fields within the location table - while at the same time making considerable use of Declarative Referential Integrity.

I would do something like this (I haven't included all of your new fields - you can check out the fiddle here):

Create and populate two tables, crime and weather:

CREATE TABLE crime (cr_rating VARCHAR (25) PRIMARY KEY);
INSERT INTO crime VALUES ('Low'), ('Medium'), ('High');

and

CREATE TABLE weather (weather_type VARCHAR (50) PRIMARY KEY);
INSERT INTO weather VALUES ('Scorching'), ('Sunny'), ('Mild'), ('Overcast'), ('Drizzle'), ('Rainy'), ('Stormy'), ('Hurricane');

Then create and populate a job table - see discussion below:

CREATE TABLE job (job_id INTEGER PRIMARY KEY, job_type VARCHAR (30) NOT NULL);
INSERT INTO job (job_type) VALUES ('Unknown'), ('Unemployed'), ('Blue collar'), ('White collar'), ('Manager'), ('Self-employed');


There is no need to use AUTOINCREMENT, for the reasons outlined here: in SQLite, an INTEGER PRIMARY KEY is an alias for the rowid and is assigned automatically on insert. Developers tend to reach for AUTOINCREMENT compulsively - possibly a habit carried over from MySQL, and possibly some idea that it must somehow be "better" than the simpler code above.
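As a quick illustration of the point above (a minimal Python/sqlite3 sketch against an in-memory database), an INTEGER PRIMARY KEY column is filled in automatically on insert, with no AUTOINCREMENT keyword required:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (job_id INTEGER PRIMARY KEY, job_type VARCHAR (30) NOT NULL)")

# No job_id supplied - SQLite assigns the rowid (1, 2, 3, ...) itself.
conn.executemany("INSERT INTO job (job_type) VALUES (?)",
                 [("Unknown",), ("Unemployed",), ("Blue collar",)])

print(conn.execute("SELECT job_id, job_type FROM job ORDER BY job_id").fetchall())
# → [(1, 'Unknown'), (2, 'Unemployed'), (3, 'Blue collar')]
```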

Then create and populate the main location table:

CREATE TABLE the_location
(
  loc_key INTEGER PRIMARY KEY,
  loc_coords VARCHAR (50) NOT NULL, -- not sure how you store these in your system
  loc_status INTEGER CHECK (loc_status IN (0,1)),
  loc_rating INTEGER CHECK (loc_rating BETWEEN 1 AND 10), 
  loc_crime VARCHAR (25) NOT NULL, 
  -- could use a CHECK CONSTRAINT, but use a lookup table instead
  -- note also that the types should match exactly
  loc_weather VARCHAR (50) NOT NULL,
  -- location_rating_tourism, -- same lookup table idea here (Good, Medium, Poor, Death-zone)
  -- location_rating_income, -- list of social classes table
  -- location_rating_jobs
  CONSTRAINT ct_loc_crime FOREIGN KEY (loc_crime) REFERENCES crime (cr_rating),
  CONSTRAINT ct_loc_weather FOREIGN KEY (loc_weather) REFERENCES weather (weather_type)
);

INSERT INTO the_location 
(loc_coords, loc_status, loc_rating, loc_crime, loc_weather) 
VALUES 
('coord_1', 0, 7, 'Medium', 'Rainy'), 
('coord_2', 1, 7, 'High', 'Mild');

I've left some of the other fields out, but if there is one and only one of these values associated with each location, then you can simply store it as a single field in the location table, as shown above.

However, consider the job table, corresponding to the location_rating_jobs column in your location table. Suppose each location has a certain percentage of Unknown job_types, a certain percentage of Unemployed, and so on?

In this case, you can use an Associative Entity (otherwise known as a joining or linking table) to cope with such a relationship. I would do something like the following:

CREATE TABLE location_job
( 
  lj_loc INTEGER NOT NULL,
  lj_job      INTEGER NOT NULL,
  lj_percent  INTEGER DEFAULT 0 NOT NULL,
  PRIMARY KEY (lj_loc, lj_job),
  CONSTRAINT ct_lj_loc FOREIGN KEY (lj_loc) REFERENCES the_location (loc_key),
  CONSTRAINT ct_lj_job FOREIGN KEY (lj_job) REFERENCES job(job_id)
);

and then populate it, respecting the FOREIGN KEY constraints:

INSERT INTO location_job 
VALUES 
(1, 1, 10), (1, 2, 10), (1, 3, 10), (1, 4, 10), (1, 5, 10), (1, 6, 50),
(2, 1, 30), (2, 2, 30), (2, 3, 10), (2, 4, 10), (2, 5, 10), (2, 6, 10);
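One SQLite-specific caveat when relying on those constraints: foreign keys are not enforced unless you turn them on per connection with PRAGMA foreign_keys = ON. A minimal Python/sqlite3 sketch (tables simplified - the the_location foreign key is omitted to keep it self-contained):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite!

conn.execute("CREATE TABLE job (job_id INTEGER PRIMARY KEY, job_type TEXT NOT NULL)")
conn.execute("""CREATE TABLE location_job
                (lj_loc INTEGER NOT NULL,
                 lj_job INTEGER NOT NULL REFERENCES job (job_id),
                 lj_percent INTEGER DEFAULT 0 NOT NULL,
                 PRIMARY KEY (lj_loc, lj_job))""")

conn.execute("INSERT INTO job (job_type) VALUES ('Unknown')")
conn.execute("INSERT INTO location_job VALUES (1, 1, 10)")       # OK: job 1 exists
try:
    conn.execute("INSERT INTO location_job VALUES (1, 99, 10)")  # no job 99
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # FOREIGN KEY constraint failed
```

Without the PRAGMA, the second insert would silently succeed and leave an orphan row.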

And then, in order to find out the percentage of job_type by area, I would run the following query:

SELECT 
  tl.loc_key, tl.loc_coords, 
  lj.lj_loc, lj.lj_job, lj.lj_percent, 
  j.job_id, j.job_type
FROM the_location tl
JOIN location_job lj ON tl.loc_key = lj.lj_loc
JOIN job j ON lj.lj_job = j.job_id
ORDER BY tl.loc_key, j.job_id;

Result:

loc_key  loc_coords  lj_loc  lj_job  lj_percent  job_id  job_type
1        coord_1     1       1       10          1       Unknown
1        coord_1     1       2       10          2       Unemployed
1        coord_1     1       3       10          3       Blue collar
..
.. results snipped for brevity - see fiddle
..

Obviously, in a real query you would eliminate the duplicated columns and select only the ones you need - I've kept them all here so that you can see the joins at work across the tables.

The one requirement that falls outside standard (current) SQL is enforcing that the percentages per location cannot sum to more than 100 (*) (Firebird allows subqueries in CHECK constraints, but it's about the only server I know of that does). This kind of rule is known as an ASSERTION, or "declarative multi-row constraint", and has not yet been implemented by the major vendors - you'd have to use TRIGGERs for this, or perhaps C code in SQLite.

(*) or perhaps your model would allow for those who were both Self-employed and Managers?
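A minimal sketch of the TRIGGER approach in SQLite (the trigger name tr_lj_percent is my invention, and a real schema would also need a matching BEFORE UPDATE trigger): reject any insert that would push a location's total over 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE location_job
                (lj_loc INTEGER NOT NULL,
                 lj_job INTEGER NOT NULL,
                 lj_percent INTEGER DEFAULT 0 NOT NULL,
                 PRIMARY KEY (lj_loc, lj_job))""")

# Abort the insert when the existing total for this location,
# plus the incoming row, would exceed 100.
conn.execute("""
CREATE TRIGGER tr_lj_percent
BEFORE INSERT ON location_job
WHEN (SELECT COALESCE(SUM(lj_percent), 0) FROM location_job
      WHERE lj_loc = NEW.lj_loc) + NEW.lj_percent > 100
BEGIN
  SELECT RAISE(ABORT, 'percentages for this location would exceed 100');
END""")

conn.execute("INSERT INTO location_job VALUES (1, 1, 60)")   # running total 60: OK
conn.execute("INSERT INTO location_job VALUES (1, 2, 40)")   # total 100: still OK
try:
    conn.execute("INSERT INTO location_job VALUES (1, 3, 5)")  # would be 105
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```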

If this doesn't answer your question, please let me know. p.s. welcome to the forum! :-)