This is one (of many) downsides of EAV designs.
You can't really improve the JOIN: because of the necessary complexity, a cost based optimiser won't get to the perfect plan. It finds "good enough"
Suggestions:
- don't use a view: use aggregate type queries (eg COUNT(*) = 2 if I match both height and weight)
- use a trigger to maintain a real (or sparse) table and query that
The first option scales better becauses a few indexes on the main EAV fact table can cover all queries nicely.
Views are typically not implemented for performance. And while you currently cannot implement explicit indexed views (which are just views that SQL Server maintains for you), you can certainly maintain facts manually yourself.
For example, you mention that you currently calculate "whether someone is dead or not" using three CTEs and a CASE expression (sorry, to be pedantic, it's not a statement).
Instead of referencing this set of CTEs every time the view is accessed, why not put that fact in a table (potentially along with other facts that have to be calculated per user), and calculate that periodically in the background? So maybe every 5 minutes (that is just a SWAG, you'll have to determine what's appropriate), you run a SQL Server Agent job that re-populates the table based on what it currently knows is the truth. Now the view just has to reference the table that is the output of that script, instead of calculating it over and over again while the users wait. So for example:
CREATE TABLE dbo.PersonProperties
(
PersonID INT PRIMARY KEY REFERENCES dbo.Persons(PersonID),
IsDead BIT NOT NULL DEFAULT 0
);
Now the job can simply merge that table with the results of the CTE, and then the view can include a reference to that table which simply pulls the BIT column along with a join on the PK. This should be MUCH less expensive at query time that re-evaluating all of that logic every time.
To minimize blocking (e.g. when users are accessing the view at the same time the job is running), you can implement what I call "schema switch-a-roo" and which I blogged about here:
http://www.sqlperformance.com/2012/08/t-sql-queries/t-sql-tuesday-schema-switch-a-roo
So instead of locking resources on the expensive query throughout the operation, the only blocking that happens is when the metadata switch actually takes place.
This works as long as you can afford some brief periods where the data is not accurate. You can tighten up that window to be pretty narrow, but there is always a chance that a person will die in between and for a brief moment a query will return that they are still alive. If you can't afford this, then you make it a part of the process that first introduces that fact to the database to make sure the CTE reflects that immediately and the new table also reflects it immediately.
Still not good enough? The flag a user as "dirty" the second a change for them comes in. The view can union or left join with the stale data for users that are "clean" and go after live data only for the users that are "dirty."
Best Answer
It'll sound a little stupid, but creating a separate composite non clustered index on the attributes table on product id and each column, resulting in 150+ indexes on the attributes table, will get you the ultimate speed for that query because it will support the select distinct column value by product from the attributes table. Your insert performance will be super painful.