eve11: (chance)
[personal profile] eve11


I am starting a ratings data gallery, but it is still early. I have the quadratic fits that I was arguing with [livejournal.com profile] kilodalton were highly susceptible to outliers... surprise, they are highly susceptible to outliers. Also, the quadratic terms in the more stable curves, like series 5, 6, and 7, are generally not "statistically significantly different from 0". This is not to say that a linear model is a better fit. I think fitting a trend in that way to this data is not really a good idea.

Extrapolation is extremely tricky in this scenario. The curves are shaded from grey up to the full series color. Each one is the curve you would get by fitting all of the data up to a given episode, starting at episode 6 and ending with the most recent episode (so the grey curve is what you would predict from the first 6 episodes).

You can see the weird effect of quadratic fitting and extrapolation that I mentioned to [livejournal.com profile] nonelvis earlier. Because of the way the quadratics are fitted, if you include the huge dropoff from the premiere (which should be, e.g., more evidence of "Going into the toilet"), then the resulting curves actually estimate that the show will do better later on. As the curve stabilizes, they are estimating the turning point of the parabola based on outliers. That said, series 7 currently is still waiting on a set of "pick me up" episodes before the end of the series. But I am not convinced the trend is because of anything other than "the casual viewer" and the myriad reasons why they do or do not watch.
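Here is a minimal sketch of that effect in plain Python. It is not the code behind the plots: the viewer numbers are invented for illustration (a big premiere followed by a slow, steady decline), and fit_quadratic is a hypothetical helper that does ordinary least squares through the normal equations. The point is only that including the premiere turns a gently falling trend into a parabola that curves back up when extrapolated.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Normal equations (X^T X) beta = X^T y for the basis 1, x, x^2.
    rows = [[1.0, x, x * x] for x in xs]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    M = [A[i] + [v[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = M[i][3] - sum(M[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = s / M[i][i]
    return tuple(beta)


def predict(beta, x):
    a, b, c = beta
    return a + b * x + c * x * x


# INVENTED viewers (millions) for episodes 1-6: big premiere, then slow decline.
episodes = [1, 2, 3, 4, 5, 6]
viewers = [10.0, 7.6, 7.5, 7.4, 7.3, 7.2]

with_premiere = fit_quadratic(episodes, viewers)
without_premiere = fit_quadratic(episodes[1:], viewers[1:])

for label, beta in (("with premiere", with_premiere),
                    ("without premiere", without_premiere)):
    print(label, "quadratic term:", round(beta[2], 3),
          "extrapolated episode 13:", round(predict(beta, 13), 2))
```

With these made-up numbers, the fit that includes the premiere has a clearly positive quadratic term and predicts a recovery by episode 13, while the fit without it just keeps drifting down.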

Also, the way the curves were fit, the "Disregarding Premieres and Finales" curves for series 7 disregard all 3 of Asylum of the Daleks, Angels Take Manhattan, and Bells of St. John. I repeated that analysis, but it seems clear to me that the premiere/finale effect is categorically different for early seasons vs. late ones, and for the mid-season hiatus episodes. And, for example, using only episodes 1 to 6 in that last series 7 plot means using only 3 points in a model with 3 parameters, which leaves no residual degrees of freedom: the curve fits the data points exactly. I can also report all standard errors and p-values for the estimates, if anyone is interested. I myself feel that the way the extrapolation changes as new end points are added makes it clear how variable any "proof based on exponential curve" really is.
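To make the "3 points, 3 parameters" problem concrete, here is a small sketch (again plain Python, with made-up viewer numbers). When the number of points equals the number of coefficients, the least-squares quadratic is just the parabola through the points, so quadratic_through, a hypothetical helper, builds it directly with divided differences. Whatever values you feed in, the residuals are zero, so the fit tells you nothing about how noisy the data are.

```python
def quadratic_through(points):
    """Return the unique quadratic passing through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    # Newton divided differences.
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    curvature = (d12 - d01) / (x2 - x0)

    def curve(x):
        return y0 + d01 * (x - x0) + curvature * (x - x0) * (x - x1)

    return curve


# Episodes 2, 3, 4 are what is left of episodes 1-6 once the premiere and
# the two hiatus episodes are dropped. The viewer values are INVENTED.
points = [(2, 7.9), (3, 7.1), (4, 7.6)]
curve = quadratic_through(points)

residuals = [y - curve(x) for x, y in points]
residual_df = len(points) - 3  # observations minus fitted coefficients

print("residuals:", residuals)
print("residual degrees of freedom:", residual_df)
```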

I also did something marginally more principled, which was to use a local polynomial smoother (a lowess curve) on the data. That is also in the gallery, and it shows the general trend that you can see from the line plots: RTD seasons seem to have a bigger upswing at the end than Moffat's. It's still not a very good analysis to do for this data.
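For anyone curious what a lowess curve actually does, here is a stripped-down sketch in plain Python. It is a simplification, not the routine used for the gallery plot: it does a single pass of tricube-weighted local linear fits and skips the robustness iterations that a full lowess implementation adds. The function name simple_lowess, the frac value, and the viewer numbers are all assumptions for illustration.

```python
def simple_lowess(xs, ys, frac=0.5):
    """One-pass local linear smoother with tricube weights (no robust steps)."""
    n = len(xs)
    k = min(n, max(3, int(round(frac * n))))  # neighbours used per local fit
    smoothed = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point.
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        # Tricube weights, zero at and beyond the bandwidth.
        w = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0
             for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Value of the local weighted regression line at x0.
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# INVENTED viewers (millions) for a 13-episode series.
episodes = list(range(1, 14))
viewers = [9.6, 8.0, 7.9, 7.4, 7.8, 7.1, 7.3, 6.9, 7.0, 7.2, 7.5, 8.1, 8.9]

trend = simple_lowess(episodes, viewers, frac=0.5)
for ep, v, t in zip(episodes, viewers, trend):
    print(ep, v, round(t, 2))
```

Because each smoothed value only uses nearby episodes, one outlier at the premiere pulls on the first few fitted values instead of bending the whole curve, which is why this is a bit less silly than a single global quadratic.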

I also plotted Appreciation Index for all series episodes (not the specials). Poor Elton. And I looked, a tiny bit, at the effect of gaps in days between airings, which you can see in the "Viewers by Gap" plot. If an episode aired within 14 days of another episode, it's on the left. If there was a bigger gap (e.g., series premieres), it's on the right. The x axis is jittered so you can see all the points. "Journey's End" is the RTD outlier for the small gap. "Rose" is the RTD outlier for the premiere gap. By the way, if you are counting, the gap between "Rose" and the last aired episode ("Survival") is 6685 days. The next biggest gap between seasons is 1/10th that size, and it also included all of Tennant's 2009/2010 specials, which I do not include in this analysis. Also, I managed to switch the colors for RTD and Moffat in those two plots. Sorry!
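The grouping behind a plot like "Viewers by Gap" can be sketched in a few lines of Python. This is an illustration under assumptions, not the original code: the air dates are invented, "within 14 days" is read as a gap of 14 days or less, and the helper names and jitter width are hypothetical.

```python
import random
from datetime import date


def gaps_in_days(air_dates):
    """Days between each episode and the one aired before it."""
    return [(b - a).days for a, b in zip(air_dates, air_dates[1:])]


def gap_group(gap, threshold=14):
    # Assumption: "within 14 days" means a gap of 14 days or less.
    return "small gap" if gap <= threshold else "big gap"


def jittered_x(groups, width=0.2, seed=0):
    """Place groups at x=0 and x=1, nudged sideways so points do not overlap."""
    rng = random.Random(seed)
    base = {"small gap": 0.0, "big gap": 1.0}
    return [base[g] + rng.uniform(-width, width) for g in groups]


# INVENTED air dates: three weekly episodes, a hiatus, then two more.
air_dates = [date(2013, 1, 5), date(2013, 1, 12), date(2013, 1, 19),
             date(2013, 3, 30), date(2013, 4, 6)]

gaps = gaps_in_days(air_dates)
groups = [gap_group(g) for g in gaps]
xs = jittered_x(groups)

for g, grp, x in zip(gaps, groups, xs):
    print(g, grp, round(x, 3))
```

The first episode in the list has no predecessor, so it gets no gap; in the real data a series premiere would be measured against the last episode of the previous series.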

So, overall, Series 4 seems to be the most liked and most popular of all the series. To me, based on raw numbers, it seems like series 7 is in a bit of a slump right now, but that isn't really taking into account things like iPlayer stats, the different time schedules, the air time, the competing programs, the season, and how things will go for the next few episodes. I am working on cleaning up a data set of the top 30 programs per channel per week, scraped from the BARB site (by hand for each week; that was my Friday night, I have no life). I should have that data set cleaned today, because it is the weekend and I still have no life.

The big difficulty here, if we want to talk about "statistical significance", is that there are so many parameters that could influence the data, and only 88 data points. Gonna be hard to get "the real story" with any amount of statistical significance. I will be able to examine several factors together, but I think any hypothesis tests for big multiple regressions in this case will have low power. Might be able to show that the most parsimonious models explain most of the variation in the data with casual viewer statistics. I still would not be surprised to see a bit of a downward trend in Moffat's overall season statistics based on the wild popularity of RTD season 4.

In terms of tracking fandom... I do still have eg, FF.net and Teaspoon data from up to the end of February this year. Could explore the trends of who is writing which eras over time to understand more fannish popularity. Or figure out how to do tumblr/twitter/lj stats. Lastly, anybody know where I can get budget information for the episodes?




Some strangely behaving quadratic curves
seriesquadratic
seriesquadratic2
seriesquadratic3
seriesquadratic4

Lowess Smoother (local polynomial smoothing)
LowessSmoothed

Appreciation Index
AI

Gaps in airing
ViewersByGap

Date: 2013-05-11 06:50 pm (UTC)
From: [identity profile] a-phoenixdragon.livejournal.com
The second to last one looks like it is moving if you look away from it for a split...

Who knew stats could look so...beautiful?

*HUGS*

Date: 2013-05-11 07:03 pm (UTC)
nonelvis: (DW science geeks)
From: [personal profile] nonelvis
You mean, the data actually were complex and couldn't be easily analyzed to come up with a single objective truth? Gee, you don't say.

Those quadratics are definitely interesting. From a purely personal perspective, I'm surprised that S5 has slightly flatter extrapolated curves than S6; IMO S5 is the most consistently good Moffat-era series (my favorite of all New Who, in fact), but I know my opinions don't constitute facts. (Yet.)

I haven't looked at the specific AI figures to calculate the average AI of RTD era vs. Moffat era, but the Moffat era seems more consistently liked -- although RTD had some fairly wild up and down spikes that on average, could mean the eras roughly turn out the same. Regardless, the overall fluctuation between 80 to ~90 isn't all that wide, particularly when you consider that anything over 85 -- where most of the numbers fall -- is assumed to be doing extremely well.

The big difficulty here, if we want to talk about "statistical significance", is that there are so many parameters that could influence the data, and only 88 data points.

Yup, and that's the thing that's missing from quite a lot of what passes for statistical "analysis" of the show's ratings. However, your plots certainly seem to indicate that the show isn't exactly in danger of becoming wildly unpopular, no matter what fandom's confirmation bias suggests.

Date: 2013-05-11 08:47 pm (UTC)
thisbluespirit: (dw - Eleven reading knitting book)
From: [personal profile] thisbluespirit
Ooh, pretty graphs again!

And I'm not sure what else to say, except, wow, people really didn't like Love & Monsters, did they?

Date: 2013-05-12 01:24 am (UTC)
From: [identity profile] lostrack621.livejournal.com
Like I keep saying....when I am done with all of "this" (this = dissertation), I'm going to have you teach me some of your R skills because these plots are BEAUTIFUL. So, the data aren't exactly...but the figures are gorgeous. WANT. So much envy right now. In a good way. :)

and for now, I'm going to neglect getting into the deep of your study because we both know I need to be doing other things. super funness though!
