MySQL table design / architecture, table is too big

database-designMySQLschema

I have a MySQL DB that contains a lot of text, I'm fetching data from a website and inserting it into a table.

I'm using a SSD HD (100GB) for the DB and I'm out of space, I think that something in the table structure made it too big, I can't predict the size of all the columns so I'm using varchar\text\medium text for most of the fields. when I insert all the data to the DB I monitor the errors and when I see that a certain field is too small for the data I'm trying to insert I'm increasing the size of the field (e.g. from varchar(1000) to varchar(2000)).

until now I have about 1.8M~ rows, I think that I'm doing something wrong.

here is the structure of my table –

CREATE TABLE `PT` (
  `patID` int(11) NOT NULL,
  `Title` varchar(450) DEFAULT NULL,
  `IssueDate` date DEFAULT NULL,
  `NoFullText` tinyint(1) DEFAULT NULL,
  `Abstract` text,
  `ForeignReferences` varchar(15000) DEFAULT NULL,
  `CurrentUSClass` varchar(2200) DEFAULT NULL,
  `OtherReferences` mediumtext,
  `ForeignPrio` varchar(900) DEFAULT NULL,
  `CurrentIntlClass` varchar(3000) DEFAULT NULL,
  `AppNum` varchar(45) DEFAULT NULL,
  `AppDate` date DEFAULT NULL,
  `Assignee` varchar(300) DEFAULT NULL,
  `Inventors` varchar(1500) DEFAULT NULL,
  `RelatedUSAppData` text,
  `PrimaryExaminer` varchar(100) DEFAULT NULL,
  `AssistantExaminer` varchar(100) DEFAULT NULL,
  `AttorneyOrAgent` varchar(300) DEFAULT NULL,
  `ReferencedBy` text,
  `AssigneeName` varchar(150) DEFAULT NULL,
  `AssigneeState` varchar(80) DEFAULT NULL,
  `AssigneeCity` varchar(150) DEFAULT NULL,
  `InventorsName` varchar(800) DEFAULT NULL,
  `InventorsState` varchar(300) DEFAULT NULL,
  `InventorsCity` varchar(800) DEFAULT NULL,
  `Claims` mediumtext,
  `Description` mediumtext,
  `InsertionTime` datetime NOT NULL,
  `LastUpdatedOn` datetime NOT NULL,
  PRIMARY KEY (`patID`),
  UNIQUE KEY `patID_UNIQUE` (`patID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

what should I do? I have about 20% of the data (which means I'm going to need 350GB~ space) what is the performance impact here? should I divide the table into several tables over several HDs? I'm going to use sphinx to index and query the data in the end.

should I partition the table and use another disk for the partitioned table? will it help to create several tables out of this one? what would be the best way of doing so?

most of the queries will be SELECT, I'm going to search on about 4-5 text columns in this table (full text search + Sphinx).

Thanks!

Best Answer

Well first there is no way that all of this should ever be in one table. You need related tables for information such as Assignee and Inventor. Please read about normalization.

But in reality are you sure that mysql is the way to go for this since you seem mostly to want to search on data that is not always in the same form. Personally, for this type of thing I would look at NoSQL databases for the text parts in conjuction with a relational database for the data that is easy to describe and determine the size of.