Since you are on SQL Server 2012 you can use a couple of the new windowing functions.
with C1 as
(
select T.team_id,
case
when M.winning_team_id is null then 'T'
when M.winning_team_id = T.team_id then 'W'
else 'L'
end as streak_type,
M.match_id
from FantasyMatches as M
cross apply (values(M.home_fantasy_team_id),
(M.away_fantasy_team_id)) as T(team_id)
), C2 as
(
select C1.team_id,
C1.streak_type,
C1.match_id,
lag(C1.streak_type, 1, C1.streak_type)
over(partition by C1.team_id
order by C1.match_id desc) as lag_streak_type
from C1
), C3 as
(
select C2.team_id,
C2.streak_type,
sum(case when C2.lag_streak_type = C2.streak_type then 0 else 1 end)
over(partition by C2.team_id
order by C2.match_id desc rows unbounded preceding) as streak_sum
from C2
)
select C3.team_id,
C3.streak_type,
count(*) as streak_count
from C3
where C3.streak_sum = 0
group by C3.team_id,
C3.streak_type
order by C3.team_id;
C1 calculates the streak_type ('W', 'L', or 'T') for each team and match. C2 finds the previous streak_type with lag, ordered by match_id desc. C3 generates a running sum, streak_sum, in the same descending order, which stays at 0 for as long as streak_type is the same as the previous value. The main query then counts the rows where streak_sum is 0, which are exactly the matches that make up each team's current streak.
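To see how the pieces fit together, here is a hypothetical walk-through for a single team whose four matches (made-up match_ids and results) went W, L, W, W in order of play. Ordered by match_id desc, the CTEs produce:

```sql
-- match_id  streak_type  lag_streak_type  streak_sum
--    4          W              W               0      <- newest match (lag defaults to own value)
--    3          W              W               0
--    2          L              W               1      <- change breaks the streak
--    1          W              L               2
```

Only rows 4 and 3 have streak_sum = 0, so the final result for this team is streak_type = 'W' with streak_count = 2: a current two-match winning streak.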
Let's break this question up into a few parts.
Q: I need to insert 1mm rows a day. Is that a lot?
Not really. 1mm divided by 24 hours, divided by 60 minutes, divided by 60 seconds gives you about 12 inserts per second. For a rough frame of reference, it's not unusual to see 1,000 inserts per second on typical commodity servers with no tuning.
Granted, your load won't be perfectly averaged out like that - you'll have bursts of load - but I wouldn't make database platform decisions based on less than 10k-20k inserts per second. Any platform out there will work fairly well.
Q: How should I structure the data?
Zoom out - don't think table, think databases. If you're going to be keeping this data permanently, and it's truly insert-only with no updates, then you probably want to partition the data by time at the database level. Your inserts may only go into one table in one database, but every year, create a new database (MyApp_2015) and seal the old 2014 data as read-only. You can stop backing it up (as long as you've still got one last good backup), stop doing index maintenance, statistics updates, etc.
The PHP will only ever have to know about the current database for inserts, making your design a lot easier. The archival process becomes a DBA task much later down the road as long as you go in knowing that there will be more than one database involved.
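The yearly rollover can be sketched in a couple of statements; the database names here are just examples following the MyApp_2015 pattern above:

```sql
-- Hypothetical yearly rollover script, run each January.
CREATE DATABASE MyApp_2015;
-- ...create the insert table in MyApp_2015 and point the PHP app at it...

-- Once nothing writes to last year's data any more, seal it:
ALTER DATABASE MyApp_2014 SET READ_ONLY WITH NO_WAIT;
```

Take one final full backup of MyApp_2014 right before or after sealing it, and that database never needs backup or maintenance attention again.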
If you were doing more than 1,000 inserts per second sustained, and you wanted easier performance management, then I'd also suggest building sharding into the initial design regardless of the database platform. Don't get me wrong, any modern database can handle over 1,000 inserts per second, but designing sharding in now just gives you more flexibility later on. At 12 inserts per second, it's just not worth the design/testing hassle.
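As a rough illustration of what "designing sharding in" can mean: the application routes each row to one of a fixed set of databases by some stable key. All of the names below (dbo.Accounts, account_id, the four MyApp_Shard databases) are hypothetical:

```sql
-- Hypothetical routing rule: the app computes the shard number itself
-- and connects to MyApp_Shard0 .. MyApp_Shard3 accordingly.
SELECT account_id,
       account_id % 4 AS shard_number
FROM dbo.Accounts;
```

Baking the shard key into the schema and connection logic up front is cheap; retrofitting it after the data has grown is not.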
Q: How should I do reporting?
In an ideal world, reports would not be done against the live server. Run the reports against a restored or replicated copy of the database. This does two things: it reduces load on the live server, and it validates your backups, guaranteeing that you've got your valuable data elsewhere.
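A minimal sketch of the restored-copy approach, assuming a nightly full backup; the database, logical file, and path names are all made up:

```sql
-- Hypothetical nightly refresh of a reporting copy from last night's backup.
RESTORE DATABASE MyApp_Reporting
    FROM DISK = N'D:\Backups\MyApp.bak'
    WITH MOVE N'MyApp'     TO N'E:\Data\MyApp_Reporting.mdf',
         MOVE N'MyApp_log' TO N'F:\Log\MyApp_Reporting.ldf',
         REPLACE;  -- overwrite yesterday's reporting copy
```

If this restore ever fails, you find out about a bad backup the next morning instead of during a disaster.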
Best Answer
I ended up doing a CTE with RANK/OVER:
and in the SELECT I ended with:
Thanks.