You must realize the tradeoffs of using CHAR vs VARCHAR
With CHAR fields, what you allocate is exactly what you get. For example, CHAR(15) allocates and stores 15 bytes, no matter how characters you place in the field. String manipulation is simple and straightforward since the size of the data field is totally predictable.
With VARCHAR fields, you get a completely different story. For example VARCHAR(15) actually allocates dynamically up to 16 bytes, up to 15 for data and, at least, 1 additional byte to store the the length of the data. If you have the string 'hello' to store that will take 6 bytes, not 5. String manipulation must always perform some form of length checking in all cases.
The tradeoff is more evident when you do two things:
1. Storing millions or billions of rows
2. Indexing columns that are either CHAR or VARCHAR
TRADEOFF #1
Obviously, VARCHAR holds the advantage since variable-length data would produce smaller rows and, thus, smaller physical files.
TRADEOFF #2
Since CHAR fields require less string manipulation because of fixed field widths, index lookups against CHAR field are on average 20% faster than that of VARCHAR fields. This is not any conjecture on my part. The book MySQL Database Design and Tuning performed something marvelous on a MyISAM table to prove this. The example in the book did something like the following:
ALTER TABLE tblname ROW_FORMAT=FIXED;
This directive forces are VARCHARs to behave as CHARs. I did this at my previous job back in 2007 and took a 300GB table and sped up index lookups by 20%, without changing anything else. It worked as published. However, it did produce a table almost double in size, but that simply goes back to tradeoff #1.
You could analyze the data being stored to see what MySQL recommends for column definition. Just run the following against any table:
SELECT * FROM tblname PROCEDURE ANALYSE();
This will traverse the entire table and recommend column definitions for every column based on the data it contains, the minimum field values, maximum field values, and so forth. Sometimes, you just have to use common sense with planning CHAR vs VARCHAR. Here is a good example:
If you are storing IP addresses, the mask for such a column is at most 15 characters (xxx.xxx.xxx.xxx). I would jump right at CHAR(15) in a heartbeat because the lengths of IP addresses will not vary all that much and the added complexity of string manipulation controlled by an additional byte. You could still do a PROCEDURE ANALYSE() against such a column. It may even recommend VARCHAR. My money would still be on CHAR over VARCHAR in this instance.
CHAR vs VARCHAR issues can be resolved only through proper planning. With great power comes great responsibility (cliche but true)
It is very dependent on the makeup of the data, how the data changes over time and what queries are run as to what the most efficient method is.
I would suggest that a filtered index is the solution that is going to be the most efficient without talking about a specific scenario.
Edit:
You should add a filtered index filtered on where optIn = 1
with the fields that are required by the query of consequence as it will use this index to generate the result and not have to refer to the main table at all. To check that it reads from just the index you can analyse the execution plan. I would expect that the query of consequence would not require all of the fields for the user record and thus the fields in the index would be a subset of the fields in the actual user record.
In short the goal is to do a little extra work on inserting the record (updating the indexes) and storing extra data so that the query has to read a significantly smaller amount of data that is in continuous blocks on the disk (assuming we are talking about traditional disk technology) than it would if it had to scan through the whole table picking about the records and fields in those records required by the query.
Best Answer
Short answer for bool vs varchar: Probably not much diff, but it may depend.
Short answer for bool that is true 90% of the time: The index is useless.
Long answer...
What percentage of the rows are "is_active"? If it is more than about 20%, the index won't be used!.
Use reasonable limits on
VARCHAR
. Even without that,'inactive'
is only 8 characters, not 255. So,WHERE status = 'inactive'
is not much slower thanWHERE is_active
.Is that the entire
WHERE
clause? I ask because Optimization requires looking at the entireWHERE
clause. For that matter, then entireSELECT
can be important.Often the query involves
In which case, this may be very beneficial, even with 90% being active:
But, again, I need to see the details before giving a definitive answer.
And, no, the poor cardinality of
is_active
is not relevant in a composite index.