I have a table on sql server lets say TableA.
and a another table as TableB.Both of these tables are heavily loaded.
Table A is loaded with records 35883788. It is holding a lot of duplicates on column program_id.
Table B is loaded with records 27343331. It is also holding a duplicate on column on program_id.
For some business reason we need to keep duplicate records on program_id on both the tables.
Now I am performing a left join on given table as:
select
a.*,b.date_of_birth,
datediff(year,b.date_of_birth,a.from_date)-
(case when dateadd(year,
datediff(year,b.date_of_birth,a.from_date),b.date_of_birth)
> a.from_date then 1 else 0
end) as age_final,
case when datediff(yyyy,b.date_of_birth,a.from_date) is null then ''
when datediff(yyyy,b.date_of_birth,a.from_date) <=1 then 'less than 1 year'
when datediff(yyyy,b.date_of_birth,a.from_date) <=17 then 'then 1 - 17 year'
when datediff(yyyy,b.date_of_birth,a.from_date) <=29 then 'then 19 - 29 year'
when datediff(yyyy,b.date_of_birth,a.from_date) <=39 then 'then 30 - 39 year'
when datediff(yyyy,b.date_of_birth,a.from_date) <=49 then 'then 40 - 49 year'
when datediff(yyyy,b.date_of_birth,a.from_date) <=59 then 'then 50 - 59 year'
when datediff(yyyy,b.date_of_birth,a.from_date) <=64 then 'then 60 - 64 year'
else 'More than 64 years' end as Age_band
from TableA a
left join ( select program_id,date_of_birth,max(process_date)
from TableB
group by program_id,date_of_birth) b
on a.program_id=b.program_id;
my inner query is giving me a unique set of program_id as using group by statement.
None of these tables are indexed. If I have to create an index on both the tables, what kind of index do I need to have on program_id for both the tables.
what all other things I can take care of to improve the joining performance.
need suggestion to optimize the join and indexes.
Best Answer
DDL & DML at the bottom
Standard indexes that could be created based on the information given
Running the query
The NC index on
TableA
is used sinceID
is included in the NC index and no other columns are inTableA
except theprogram_id
coluln, which is a key column of that index.But, since you are using
select a.*
you will probably have to add all these columns to the NC index onTableA
as included columns, which will increase your index size depending on the amount of columns in the table.If we add a
varchar(3)
column and populate it with a value:The index is no longer used:
Because of the additional
bla
column added:To fix this, recreate the index with the
bla
column included.Additionally
Heap tables are not ideal, you should consider adding a clustered index that makes sense and reading up on indexing.
More on the difference between heap tables and tables with a Clustered index here
Using
Select *
is not a good practice, consider naming all columns used in your select statement separately.Bad habits to kick : using SELECT *
DDL + DML used for testing