I am starting a ratings data gallery, but it is still early. I have the quadratic stuff that I was arguing with
Extrapolation is extremely tricky in this scenario. The curves are subtly colored from grayscale up to the series color, representing the curve you would get if you fit all of the data up to the most recent episode, starting at episode 6 (so the grey curve is what you would predict from the first 6 episodes).
You can see the weird effect of quadratic fitting and extrapolation that I mentioned to
Also, the way the curves were fit, the "Disregarding Premieres and Finales" curves for series 7 disregard all 3 of Asylum of the Daleks, Angels Take Manhattan, and Bells of St. John. I repeated that analysis, but it seems clear to me that the premiere/finale effect is categorically different for early seasons vs. late ones, and for the mid-season hiatus episodes. And, for example, using only episodes 1 to 6 in that last series 7 plot means using only 3 points in a model with 3 degrees of freedom, so it fits the data points exactly. I can also report all standard errors and p-values for the estimates, if anyone is interested. I myself feel that the way the extrapolation changes as new end points are added makes it clear how variable any "proof based on exponential curve" really is.
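A minimal sketch of the exact-fit problem, in plain Python. The viewer figures here are made up for illustration (they are NOT the real ratings), and `fit_quadratic` and `predict` are hypothetical helper names, not the code behind the gallery plots. With 3 points the quadratic has zero residuals, and the forecast for episode 13 swings wildly once a fourth point is added.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2; returns (a, b, c)."""
    # Build the 3x3 normal equations (X^T X) beta = X^T y.
    rows = [[1.0, x, x * x] for x in xs]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Solve by Gaussian elimination with partial pivoting.
    M = [A[i] + [b[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = M[i][3] - sum(M[i][k] * beta[k] for k in range(i + 1, 3))
        beta[i] = s / M[i][i]
    return tuple(beta)


def predict(beta, x):
    """Evaluate the fitted quadratic at x."""
    a, b, c = beta
    return a + b * x + c * x * x


# Made-up viewer figures in millions (illustration only, not real data).
episodes = [2, 3, 4, 5, 6, 7]
viewers = [7.5, 7.1, 7.3, 6.9, 7.0, 7.4]

# Three points, three parameters: the curve passes through every point.
beta3 = fit_quadratic(episodes[:3], viewers[:3])
residuals3 = [y - predict(beta3, x) for x, y in zip(episodes[:3], viewers[:3])]

# Refit as each new episode arrives and extrapolate to episode 13.
forecasts = {}
for n in range(3, len(episodes) + 1):
    beta = fit_quadratic(episodes[:n], viewers[:n])
    forecasts[n] = predict(beta, 13)
    print(f"fit on first {n} points -> episode 13 forecast: {forecasts[n]:.2f}")
```

The residuals from the 3-point fit are zero up to rounding error, which is why there is nothing left over to estimate uncertainty from.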
I did something marginally principled which was to use a local polynomial smoother (lowess curve) on the data. That is also in the gallery, and it shows the general trend that you can see from the line plots: RTD seasons seem to have a bigger upswing at the end than Moffat. It's still not a very good analysis to do for this data.
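For anyone curious what a smoother like this does under the hood, here is a stripped-down sketch: a local linear fit with tricube weights, which is the core idea of lowess but without the robustness iterations that the real thing adds. The function name `local_linear_smooth`, the `frac` default, and the data are all assumptions for illustration. In practice you would use a library routine such as R's `lowess`.

```python
def local_linear_smooth(xs, ys, frac=0.5):
    """Smooth ys by fitting a weighted straight line around each x.

    Simplified relative of lowess: tricube weights, local linear fit,
    no robustness iterations.
    """
    n = len(xs)
    k = max(2, int(round(frac * n)))  # number of neighbours per local fit
    smoothed = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest point (including x0 itself).
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0
        # Tricube weights: 1 at x0, falling to 0 at distance h.
        ws = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Value of the local line at x0.
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Made-up appreciation-style numbers for a 13-episode run (not real data).
episode = list(range(1, 14))
score = [86, 85, 87, 84, 86, 85, 83, 86, 87, 88, 87, 89, 91]

for ep, s in zip(episode, local_linear_smooth(episode, score)):
    print(ep, round(s, 2))
```

A smaller `frac` follows the points more closely and a larger one flattens the curve, so the apparent size of an end-of-season upswing depends partly on that choice.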
I also plotted Appreciation Index for all series episodes (not the specials). Poor Elton. And I looked, a tiny bit, at the effect of gaps in days between airings, which you can see in the "Viewers by Gap" plot. If an episode aired within 14 days of another episode, it's on the left. If there was a bigger gap (e.g., series premieres), it's on the right. The x axis is jittered so you can see all the points. "Journey's End" is the RTD outlier for the small gap. "Rose" is the RTD outlier for the premiere gap. By the way, if you are counting, the gap between "Rose" and the last aired episode ("Survival") is 6685 days. The next biggest gap between seasons is 1/10th that size, and also included all of Tennant's 2009/2010 specials, which I do not include in this analysis. Also, I managed to switch the colors for RTD and Moffat in those two plots. Sorry!
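The gap split itself is simple to sketch. This assumes a plain list of air dates. The dates are hypothetical, and `gaps_in_days` and `gap_group` are names made up for this illustration rather than anything from the actual analysis.

```python
from datetime import date


def gaps_in_days(air_dates):
    """Days since the previous airing, for every episode after the first."""
    ordered = sorted(air_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def gap_group(days, threshold=14):
    """Left-hand group if the episode aired within `threshold` days of the previous one."""
    return "small gap" if days <= threshold else "big gap"


# Hypothetical air dates: three weekly episodes, then a return after a hiatus.
air_dates = [date(2013, 1, 5), date(2013, 1, 12), date(2013, 1, 19), date(2013, 3, 30)]

gaps = gaps_in_days(air_dates)
groups = [gap_group(g) for g in gaps]
for g, label in zip(gaps, groups):
    print(g, label)
```

The 14-day cutoff is the one described above. Whether a gap of exactly 14 days counts as "within" is a choice the sketch makes by using `<=`.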
So, overall, Series 4 seems to be the most liked & popular of all the series. To me, based on raw numbers, it seems like series 7 is in a bit of a slump right now, but that isn't really taking into account things like iPlayer stats, the different time schedules, the air time, the competing programs, the season, and how things will go for the next few episodes. I am working on cleaning up a data set of the top 30 programs per channel per week scraped (by hand for each week, that was my Friday night, I have no life) from the BARB site. I should have that data set cleaned today, because it is the weekend and I still have no life.
The big difficulty here, if we want to talk about "statistical significance", is that there are so many parameters that could influence the data, and only 88 data points. Gonna be hard to get "the real story" with any amount of statistical significance. I will be able to examine several factors together, but I think any hypothesis tests for big multiple regressions in this case will have low power. Might be able to show that the most parsimonious models explain most of the variation in the data with casual viewer statistics. I still would not be surprised to see a bit of a downward trend in Moffat's overall season statistics based on the wild popularity of RTD season 4.
In terms of tracking fandom... I do still have eg, FF.net and Teaspoon data from up to the end of February this year. Could explore the trends of who is writing which eras over time to understand more fannish popularity. Or figure out how to do tumblr/twitter/lj stats. Lastly, anybody know where I can get budget information for the episodes?
Some strangely behaving quadratic curves




Lowess Smoother (local polynomial smoothing)

Appreciation Index

Gaps in airing

no subject
Date: 2013-05-11 06:50 pm (UTC)
Who knew stats could look so...beautiful?
*HUGS*
no subject
Date: 2013-05-11 07:03 pm (UTC)
Those quadratics are definitely interesting. From a purely personal perspective, I'm surprised that S5 has slightly flatter extrapolated curves than S6; IMO S5 is the most consistently good Moffat-era series (my favorite of all New Who, in fact), but I know my opinions don't constitute facts. (Yet.)
I haven't looked at the specific AI figures to calculate the average AI of RTD era vs. Moffat era, but the Moffat era seems more consistently liked -- although RTD had some fairly wild up and down spikes that, on average, could mean the eras roughly turn out the same. Regardless, the overall fluctuation between 80 and ~90 isn't all that wide, particularly when you consider that anything over 85 -- where most of the numbers fall -- is assumed to be doing extremely well.
The big difficulty here, if we want to talk about "statistical significance", is that there are so many parameters that could influence the data, and only 88 data points.
Yup, and that's the thing that's missing from quite a lot of what passes for statistical "analysis" of the show's ratings. However, your plots certainly seem to indicate that the show isn't exactly in danger of becoming wildly unpopular, no matter what fandom's confirmation bias suggests.
no subject
Date: 2013-05-11 07:16 pm (UTC)
Which is, I think, the main reason that the AIs basically return back to baseline instead of continuing RTD's trend. That makes sense with regard to the casual viewer who likes Daleks and Cybermen blowing each other up but won't care about something called "The Pandorica" that Moffat made up. Especially if they had to tune in two weeks ago to know what's going on. Although series 6 really was the big convoluted mystery, and it seems to be holding steady. But if you want to argue that AI is censored by a dropoff in viewership (indices stay the same but people who leave don't bother to report an AI), then that might tell a different story.
How do we compare these competing hypotheses? I kind of wish we could do post-enumeration surveys. "Did you tune in to Doctor Who and then turn it off b/c it was rubbish?" (Y/N). Barring that data, well, there are plenty of stories the current data supports.
no subject
Date: 2013-05-11 08:47 pm (UTC)
And I'm not sure what else to say, except, wow, people really didn't like Love & Monsters, did they?
no subject
Date: 2013-05-11 09:45 pm (UTC)
no subject
Date: 2013-05-12 01:24 am (UTC)
and for now, I'm going to neglect getting into the deep of your study because we both know I need to be doing other things. super funness though!