MySQL Optimization – More Tables vs More Columns/Rows

Tags: mysql, python

I'm new to SQL and looking to build a database. As an example of what I'm trying to do: I want a database with about 5000 stock tickers. Each ticker will have about 35 data points, each corresponding to a specific date. All data will be INT or a small string.

I'm wondering whether it's better to have each ticker be its own table, with all of its data points inside it (about 15 columns and 20 rows), or whether I should put everything in one table, indexed by ticker symbol, with a bunch of columns.

It seems easier to make a new table for each ticker, since I can use dates as the column titles and label each row with what the data represents, but I'm not sure whether this is well optimized or would cause problems when multiple people search different tables.

Thanks!

EDIT: Here is an example of the table I'm looking to create. Is it possible to have two row indexes?

[example table image]

Best Answer

One table. Period. Full-stop.

A general rule is not to have multiple "identical" tables. It is a maintenance problem, and it is likely to be slower. (The common exception is SaaS, such as WordPress, where they simply live with lots of databases with the 'same' 20 tables.)

"35 datapoints ... dates" -- I'm confused. Either you have information about a stock or you have daily prices going back years. Are you saving only 5 weeks of data? Or what do you mean?

5K rows in one table is "tiny" as datasets go.

Does this also include stock prices for each day? If so, you get to millions or maybe billions of rows of about 6 columns. Again, I strongly recommend a single table for all prices. (Plus a table for all the tickers -- again, 5K rows of 30 columns.)

I have dealt with such datasets, so ask away. Or look around; I have answered questions like this in this forum and also on stackoverflow.com.

The prices table would have something like

stock_id MEDIUMINT UNSIGNED -- 3 bytes
dy DATE  -- 3 bytes
open, high, low, close   -- see below

The prices get tricky.

FLOAT -- ~7 significant digits, 4 bytes; maybe good enough, might have roundoff
DECIMAL -- exact, but you have to pick a precision and scale; potentially bulky
INT -- whole numbers only; not good enough for prices
DOUBLE -- 8 bytes; overkill
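
To see the roundoff concern concretely, here is a quick check you can run in any MySQL client (plain numeric literals are exact DECIMAL values; the e0 suffix forces a DOUBLE):

SELECT 0.1 + 0.2 = 0.3;        -- returns 1: exact decimal arithmetic
SELECT 0.1e0 + 0.2e0 = 0.3e0;  -- returns 0: binary floating point rounds off

If fractions of a cent never matter to you, FLOAT is probably fine; otherwise a DECIMAL sized to your price range is a common compromise.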

Indexing will be important, but first, figure out what the inserts and selects are like. These may be optimal:

PRIMARY KEY(stock_id, dy)
INDEX(dy, stock_id)
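
Putting those pieces together, a minimal sketch of the prices table (the DECIMAL(8,2) precision, the NOT NULL constraints, and the ENGINE choice are my assumptions, not part of the original answer):

CREATE TABLE prices (
    stock_id  MEDIUMINT UNSIGNED NOT NULL,  -- 3 bytes; id from the tickers table
    dy        DATE NOT NULL,                -- 3 bytes
    `open`    DECIMAL(8,2) NOT NULL,
    high      DECIMAL(8,2) NOT NULL,
    low       DECIMAL(8,2) NOT NULL,
    `close`   DECIMAL(8,2) NOT NULL,
    PRIMARY KEY (stock_id, dy),             -- one row per stock per day
    INDEX (dy, stock_id)                    -- "all stocks on one day" queries
) ENGINE=InnoDB;

With that primary key, "all history for one ticker" is a single range scan, and daily loads are simple appends.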

For one stock_id, there will be many rows. Do not use 'many' columns for an "array"; that is very clumsy. Perhaps your dates are "quarterly", as in company reports, not "daily" as in stocks?

5000 companies with 8 years of quarterly reports (4 per year), at one row per report == 160K rows in the single table. That's still a "small" table.

Plus another table with 5000 rows, one per company, with name, address, etc. That is, attributes that don't change.
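
A minimal sketch of that per-company table, under the same (assumed) naming:

CREATE TABLE tickers (
    stock_id  MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
    symbol    VARCHAR(10) NOT NULL,   -- e.g. 'AAPL'
    name      VARCHAR(100) NOT NULL,
    -- address, sector, and other attributes that rarely change
    PRIMARY KEY (stock_id),
    UNIQUE KEY (symbol)
) ENGINE=InnoDB;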

Yes, income, balance, cashflow, etc. should be separate columns in the table. Assuming those values come from a quarterly report, they would go in the 160K-row table I discussed above; see the sketch below.
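
A sketch of that quarterly table, plus the kind of query the single-table design makes cheap (the table and column names here are illustrative assumptions):

CREATE TABLE quarterly_reports (
    stock_id  MEDIUMINT UNSIGNED NOT NULL,
    qtr_end   DATE NOT NULL,          -- quarter-end date
    income    DECIMAL(14,2),
    balance   DECIMAL(14,2),
    cashflow  DECIMAL(14,2),
    -- the rest of the ~35 data points as further columns
    PRIMARY KEY (stock_id, qtr_end)
) ENGINE=InnoDB;

SELECT q.qtr_end, q.income, q.cashflow
FROM quarterly_reports AS q
JOIN tickers AS t USING (stock_id)
WHERE t.symbol = 'AAPL'
ORDER BY q.qtr_end;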

A table with a few dozen columns and a few million rows is no problem (in general). If you get to 100 columns or a billion rows, you may encounter non-trivial problems.