background info a php script is feeding the mysql table.
I could do it manually, but am hoping for at least some kind of automation.
1 12342 64 0
2 12347 65 0
3 11364 66 0
4 12281 67 0
5 12342 68 0
6 12347 69 0
7 11314 70 12347
8 2922 71 0
9 2351 72 0
10 2639 73 0
11 698 74 283
12 10852 75 283
13 362 76 109
14 568 77 107
15 12347 78 19
16 9504 79 139
17 12347 80 135
18 12347 81 28
19 12342 82 22
20 12342 83 66
21 12223 84 60
22 12347 85 123
23 12347 86 316
24 12347 87 139
25 12347 88 22274
26 12347 89 84
27 12342 90 84
28 11395 91 3
29 12347 92 1
30 12347 93 0
31 12347 94 0
32 9504 95 0
33 12347 96 0
34 12347 97 0
35 9846 98 0
36 101 99 0
37 0 100 8
38 0 101 0
39 0 102 1
40 0 103 1
41 0 104 0
42 0 105 0
43 0 106 0
44 0 107 0
45 0 108 0
46 0 109 0
47 0 110 0
48 0 111 0
49 0 112 0
50 0 113 0
51 0 114 10
52 0 115 1
53 0 116 1
54 0 117 1
55 0 118 1
56 0 119 1
57 0 120 8
58 0 121 1
59 0 122 1
60 0 123 1
61 0 124 1
62 0 125 3
63 0 126 1
The first and third columns are column numbers and 2nd and fourth are total number of values in each column that are not null.
I want to split up my table into 1 table with columns that are always (12347 records) present or almost always present. Then a column indicating a template number and an index field.
Then I need to analyze the rest of the fields to see what patterns emerge to create additional tables to hold the data that is sometimes present.
Not real info just an example
For example if column 1 = "something" column 2 ="something" and etc then columns 80-90 are present.
What I want to do then is have a table with an index number and 10 columns. Then main table will have a template number and an index number. Where the template number will probably be the name of the table to look in or a numeric index of a list of names.
Is there an automatic tool to analyze the data to determine what conditions are necessary for the optional/sparsely populated fields to have data.
It took over an hours work to determine that there are 28 columns that are nearly always there, 53 with no values, and the rest somewhere in the middle.
Show create table
www_error | CREATE TABLE `www_error` (
`REQUEST_TIME_FLOAT` decimal(15,4) DEFAULT NULL,
`REQUEST_URI` int(11) DEFAULT NULL,
`REDIRECT_UNIQUE_ID` varchar(50) DEFAULT NULL,
`REDIRECT_REQUEST_METHOD` varchar(18) DEFAULT NULL,
`REDIRECT_STATUS` int(11) DEFAULT NULL,
`UNIQUE_ID` varchar(50) DEFAULT NULL,
`HTTP_HOST` varchar(33) DEFAULT NULL,
`HTTP_ACCEPT` varchar(300) DEFAULT NULL,
`HTTP_ACCEPT_LANGUAGE` varchar(120) DEFAULT NULL,
`HTTP_ACCEPT_ENCODING` varchar(60) DEFAULT NULL,
`HTTP_COOKIE` blob,
`HTTP_CONNECTION` varchar(14) DEFAULT NULL,
`HTTP_UPGRADE_INSECURE_REQUESTS` varchar(10) DEFAULT NULL,
`HTTP_CACHE_CONTROL` varchar(50) DEFAULT NULL,
`PATH` int(11) DEFAULT NULL,
`SERVER_SIGNATURE` varchar(300) DEFAULT NULL,
`SERVER_SOFTWARE` varchar(20) DEFAULT NULL,
`SERVER_NAME` varchar(33) DEFAULT NULL,
`SERVER_ADDR` varchar(30) DEFAULT NULL,
`SERVER_PORT` varchar(5) DEFAULT NULL,
`REMOTE_ADDR` bigint(40) DEFAULT NULL,
`DOCUMENT_ROOT` int(11) DEFAULT NULL,
`REQUEST_SCHEME` varchar(20) DEFAULT NULL,
`CONTEXT_PREFIX` int(11) DEFAULT NULL,
`CONTEXT_DOCUMENT_ROOT` int(11) DEFAULT NULL,
`SCRIPT_FILENAME` int(11) DEFAULT NULL,
`REMOTE_PORT` int(11) DEFAULT NULL,
`REDIRECT_URL` blob,
`GATEWAY_INTERFACE` varchar(15) DEFAULT NULL,
`SERVER_PROTOCOL` varchar(170) DEFAULT NULL,
`REQUEST_METHOD` varchar(20) DEFAULT NULL,
`QUERY_STRING` varchar(123) DEFAULT NULL,
`SCRIPT_NAME` int(11) DEFAULT NULL,
`PHP_SELF` int(11) DEFAULT NULL,
`REQUEST_TIME` int(15) unsigned DEFAULT NULL,
`HTTP_PRAGMA` varchar(10) DEFAULT NULL,
`LESSKEY` varchar(16) DEFAULT NULL,
`NNTPSERVER` varchar(4) DEFAULT NULL,
`MANPATH` varchar(59) DEFAULT NULL,
`XDG_SESSION_ID` varchar(1) DEFAULT NULL,
`HOSTNAME` varchar(4) DEFAULT NULL,
`XKEYSYMDB` varchar(28) DEFAULT NULL,
`HOST` varchar(4) DEFAULT NULL,
`TERM` varchar(5) DEFAULT NULL,
`SHELL` varchar(9) DEFAULT NULL,
`PROFILEREAD` varchar(4) DEFAULT NULL,
`HISTSIZE` varchar(4) DEFAULT NULL,
`SSH_CLIENT` varchar(22) DEFAULT NULL,
`MORE` varchar(3) DEFAULT NULL,
`SSH_TTY` varchar(10) DEFAULT NULL,
`USER` varchar(4) DEFAULT NULL,
`LS_COLORS` varchar(879) DEFAULT NULL,
`XNLSPATH` varchar(22) DEFAULT NULL,
`HOSTTYPE` varchar(6) DEFAULT NULL,
`CONFIG_SITE` varchar(40) DEFAULT NULL,
`FROM_HEADER` varchar(20) DEFAULT NULL,
`PAGER` varchar(4) DEFAULT NULL,
`CSHEDIT` varchar(5) DEFAULT NULL,
`XDG_CONFIG_DIRS` varchar(8) DEFAULT NULL,
`LIBGL_DEBUG` varchar(5) DEFAULT NULL,
`MINICOM` varchar(5) DEFAULT NULL,
`MAIL` varchar(14) DEFAULT NULL,
`CPU` varchar(6) DEFAULT NULL,
`INPUTRC` varchar(10) DEFAULT NULL,
`PWD` varchar(24) DEFAULT NULL,
`LANG` varchar(5) DEFAULT NULL,
`PYTHONSTARTUP` varchar(16) DEFAULT NULL,
`GPG_TTY` varchar(10) DEFAULT NULL,
`SHLVL` varchar(1) DEFAULT NULL,
`HTTP_USER_AGENT` bigint(20) DEFAULT NULL,
`SSH_CONNECTION` varchar(34) DEFAULT NULL,
`LESSCLOSE` varchar(18) DEFAULT NULL,
`WINDOWMANAGER` varchar(5) DEFAULT NULL,
`REDIRECT_spider` varchar(300) DEFAULT NULL,
`spider` varchar(300) DEFAULT NULL,
`CONTENT_TYPE` varchar(839) DEFAULT NULL,
`HTTP_REFERER` varchar(109) DEFAULT NULL,
`HTTP_ACCEPT_CHARSET` varchar(75) DEFAULT NULL,
`HTTP_DNT` varchar(10) DEFAULT NULL,
`HTTP_FROM` varchar(300) DEFAULT NULL,
`REDIRECT_HTTPS` varchar(10) DEFAULT NULL,
`REDIRECT_SSL_TLS_SNI` varchar(30) DEFAULT NULL,
`HTTPS` varchar(10) DEFAULT NULL,
`SSL_TLS_SNI` varchar(33) DEFAULT NULL,
`CONTENT_LENGTH` varchar(10) DEFAULT NULL,
`HTTP_KEEP_ALIVE` varchar(10) DEFAULT NULL,
`REDIRECT_QUERY_STRING` varchar(660) DEFAULT NULL,
`SERVER_ADMIN` enum('comments@mail','','comments@ecybernard.com') DEFAULT NULL,
`HTTP_DEPTH` varchar(1) DEFAULT NULL,
`HTTP_TRANSLATE` varchar(1) DEFAULT NULL,
`HTTP_IF_MODIFIED_SINCE` varchar(29) DEFAULT NULL,
`HTTP_REFER` varchar(61) DEFAULT NULL,
`LESS` varchar(8) DEFAULT NULL,
`LESS_ADVANCED_PREPROCESSOR` varchar(2) DEFAULT NULL,
`COLORTERM` varchar(1) DEFAULT NULL,
`MACHTYPE` varchar(17) DEFAULT NULL,
`QT_SYSTEM_DIR` varchar(23) DEFAULT NULL,
`OSTYPE` varchar(5) DEFAULT NULL,
`HOME` varchar(5) DEFAULT NULL,
`HTTP_TE` varchar(18) DEFAULT NULL,
`LC_CTYPE` varchar(11) DEFAULT NULL,
`HTTP_MIME_VERSION` varchar(3) DEFAULT NULL,
`HTTP_X_VERMEER_CONTENT_TYPE` varchar(32) DEFAULT NULL,
`XDG_DATA_DIRS` varchar(10) DEFAULT NULL,
`LS_OPTIONS` varchar(22) DEFAULT NULL,
`XCURSOR_THEME` varchar(3) DEFAULT NULL,
`LOGNAME` varchar(4) DEFAULT NULL,
`DBUS_SESSION_BUS_ADDRESS` varchar(25) DEFAULT NULL,
`XDG_RUNTIME_DIR` varchar(11) DEFAULT NULL,
`G_BROKEN_FILENAMES` varchar(1) DEFAULT NULL,
`CVS_RSH` varchar(3) DEFAULT NULL,
`LESSOPEN` varchar(14) DEFAULT NULL,
`_` varchar(12) DEFAULT NULL,
`HTTP_PROXY_CONNECTION` varchar(10) DEFAULT NULL,
`HTTP_X_FB_BACKGROUND_STATE` varchar(1) DEFAULT NULL,
`HTTP_X_FB_SIM_HNI` varchar(6) DEFAULT NULL,
`HTTP_X_FB_NET_HNI` varchar(6) DEFAULT NULL,
`HTTP_X_FB_HTTP_ENGINE` varchar(5) DEFAULT NULL,
`REDIRECT_ERROR_NOTES` varchar(136) DEFAULT NULL,
`HTTP_REFERRER` varchar(34) DEFAULT NULL,
`HTTP_X_REAL_IP` varchar(12) DEFAULT NULL,
`HTTP_X_FORWARDED_FOR` varchar(12) DEFAULT NULL,
`HTTP_TRUE_CLIENT_IP` varchar(12) DEFAULT NULL,
`HTTP_EXPECT` varchar(12) DEFAULT NULL,
`HTTP_CONTENT_LANGUAGE` varchar(5) DEFAULT NULL,
`HTTP_ORIGIN` varchar(20) DEFAULT NULL,
KEY `REQUEST_URI` (`REQUEST_URI`),
CONSTRAINT `www_error_ibfk_1` FOREIGN KEY (`REQUEST_URI`) REFERENCES `request` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Secondary columns with only 193 unique values across 26 columns and 4000 and counting rows.
HTTP_ACCEPT
HTTP_ACCEPT_LANGUAGE
HTTP_ACCEPT_ENCODING
HTTP_COOKIE
HTTP_UPGRADE_INSECURE_REQUESTS
HTTP_CACHE_CONTROL
SERVER_SIGNATURE
QUERY_STRING
HTTP_PRAGMA
REDIRECT_spider
spider
CONTENT_TYPE
HTTP_REFERER
HTTP_ACCEPT_CHARSET
HTTP_DNT
HTTP_FROM
REDIRECT_HTTPS
REDIRECT_SSL_TLS_SNI
HTTPS
SSL_TLS_SNI
CONTENT_LENGTH
HTTP_KEEP_ALIVE
REDIRECT_QUERY_STRING
HTTP_DEPTH
HTTP_TRANSLATE
HTTP_REFER
HTTP_PROXY_CONNECTION
Best Answer
Which of those fields do you need to filter (
WHERE
) on? Which ones do you need toGROUP BY
orORDER BY
? Make those real columns. Put the rest into aJSON
string. Then, depending on which version of MySQL/MariaDB you are using, put that into a singleTEXT
orJSON
column.As a first cut, you could make the 28 always-present fields into real columns. But I recommend you cut back from that.
JSON
is open-ended; you can add more as needed. You can leave out missing fields.No analysis needed, no
NULLs
needed. NoALTER TABLE .. ADD COLUMN ..
needed when Apache is upgraded.See also http://mysql.rjweb.org/doc.php/eav .