Hey DW fiction fans!
Nov. 13th, 2012 05:27 pm
Do you like Doctor Who fan fiction? Do you like telling folks about the awesome stories you find? Do you frequent A Teaspoon and an Open Mind and make liberal use of their bookmarks and favorites buttons? Well, calufrax is looking for folks who would like to sign up for a week to spotlight four to seven of their favorite fics housed on the archive. Go here to read more and sign up! And if you're not familiar with Teaspoon, the calufrax Tag List is a great place to start finding gems.
calufrax recc-ers make an effort to highlight stories across all genres and eras from the show.
Don't believe me?
There are 1854 tags in
calufrax with a label of "doctor:1" through "doctor:11". Also, on the Teaspoon main site there are 29403 stories labeled in the eras "First Doctor" through "Eleventh Doctor". Here is the data by era:
Columns are the count and proportion of tags in the archive (archive.tag, pct.archive), the count and proportion of tags in the recs (rec.tag, pct.rec), the proportion of stories in each era that were recced* (pct.era.rec), and two different expected numbers of tags for the recs in calufrax: one based on a simple random sample of the archive (rec.exp), and one for an "ideal" recs comm where each era is equally represented, so that recs are uniformly distributed across eras (unif.exp).
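For anyone who wants to check the arithmetic, here is a minimal sketch of how the derived columns could be recomputed from the two raw count columns. The variable names are my own, and I am assuming rec.exp is the rec total scaled by each era's share of the archive, and unif.exp is the rec total divided evenly across the eleven eras.

```python
# Tag counts per era (Doctors 1 through 11), copied from the table in this post.
archive_tags = [521, 490, 641, 1027, 1045, 426, 625, 894, 4568, 16784, 2382]
rec_tags = [93, 83, 127, 140, 116, 94, 121, 157, 224, 550, 149]

total_archive = sum(archive_tags)
total_recs = sum(rec_tags)

rows = []
for era, (a, r) in enumerate(zip(archive_tags, rec_tags), start=1):
    rows.append({
        "era": era,
        "pct_archive": a / total_archive,          # share of archive tags
        "pct_rec": r / total_recs,                 # share of rec tags
        "pct_era_rec": r / a,                      # fraction of the era that was recced
        # Assumption: expected recs if recs were a simple random sample of the archive.
        "rec_exp": total_recs * a / total_archive,
        # Assumption: expected recs if every era got an equal share.
        "unif_exp": total_recs / len(rec_tags),
    })

for row in rows:
    print(
        f"{row['era']:>3} {row['pct_archive']:.3f} {row['pct_rec']:.3f} "
        f"{row['pct_era_rec']:.3f} {row['rec_exp']:.0f} {row['unif_exp']:.0f}"
    )
```

The printed values should agree with the table up to rounding.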
Now, a completely fair and stratified recs community might aim for equal representation among all eras in the recs. That's an ideal, but it is not quite fair when the archive itself is not balanced. Still, some trends are evident. Eras 9 and 10 make up 72.6% of the archive but only 41.8% of the recs. In the observed frequencies of rec tags, the earlier eras, while not achieving perfect balance, are clearly over-represented relative to their frequency in the archive. They borrow from the New Who eras of Nine and Ten, which dominate the archive but are at least more balanced in the recs. I also made a picture.

Chi-square tests of the observed frequencies against two null hypotheses, balanced according to archive frequency and balanced according to uniform frequency, both resulted in p-values of effectively 0 (X^2 values of over 1000 on 10 degrees of freedom in both cases, a resounding rejection of the null). This corroborates what is obvious in the graph. Not so obvious is that even after taking out the outlier eras Nine and Ten, the chi-square test of uniformity across the remaining eras still results in a rejection (X^2 = 45 on 8 df, p = 3 x 10^-7). I think that this may be because people are forgetting the awesomeness of Two and substituting in the more recent awesomeness of Eleven**, and because people remember that Eight is very pretty.
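If you want to reproduce these tests without a stats package, here is a rough sketch using only the Python standard library. The function names are mine, not from any library. The p-value helper relies on the closed form of the chi-square upper tail that holds only when the degrees of freedom are even, which happens to cover both the 10 df and 8 df cases here.

```python
import math

# Tag counts per era (Doctors 1 through 11), copied from the table in this post.
archive_tags = [521, 490, 641, 1027, 1045, 426, 625, 894, 4568, 16784, 2382]
rec_tags = [93, 83, 127, 140, 116, 94, 121, 157, 224, 550, 149]


def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x); valid only for even, positive df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs an even, positive df")
    half = x / 2
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


n_recs = sum(rec_tags)
n_archive = sum(archive_tags)

# Null 1: recs follow the archive's era proportions.
exp_archive = [n_recs * a / n_archive for a in archive_tags]
stat_archive = chi_square_stat(rec_tags, exp_archive)

# Null 2: recs are spread evenly over all eleven eras.
exp_uniform = [n_recs / len(rec_tags)] * len(rec_tags)
stat_uniform = chi_square_stat(rec_tags, exp_uniform)

# Null 3: uniformity across the nine eras left after dropping Nine and Ten.
rest = rec_tags[:8] + rec_tags[10:]
exp_rest = [sum(rest) / len(rest)] * len(rest)
stat_rest = chi_square_stat(rest, exp_rest)
p_rest = chi_square_sf_even_df(stat_rest, len(rest) - 1)

print(f"archive null: X^2 = {stat_archive:.1f} on {len(rec_tags) - 1} df")
print(f"uniform null: X^2 = {stat_uniform:.1f} on {len(rec_tags) - 1} df")
print(f"without 9 and 10: X^2 = {stat_rest:.1f} on {len(rest) - 1} df, p = {p_rest:.1e}")
```

For odd degrees of freedom the closed form does not apply, and a stats library would be the safer route.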
It is also kind of cool how there is a little bump in the archive for 4 and 5 that is reflected somewhat in the recs pattern as well. From an exploratory look at the data, I think Three's contingent of recc-ers is quite good at getting him noticed. He also has the added bonus of covering UNIT. I can also conclude that Six's era seems to have a very large footprint, with the highest percentage of stories recc-ed of any era. This means that Six writers are awesome.***
[*Astute readers will point out that stories may have multiple tags. I agree, but I believe that is true for both the archive and the recs site. There may be a "Multi-era" effect here, but there are only ~2900 tags in Multi-Era, which is only 10% of the data set, so let's assume for now that the distribution of eras within Multi-Era is comparable to the straight tags. (Actually, I would think it would generally up the count across all Doctors, since Multi-Era would be chosen when you have lots of different eras in a single story, so imagine that the red line may not be quite as severe as it is depicted on the chart.)]
[** I am likely guilty of this and must rectify it in later stints.]
[*** Full disclosure in the interests of Science, I write Six from time to time, thus I am also awesome]
Yes, okay then. Should you feel inclined, go sign up to rec on
calufrax!
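| era | archive.tag | pct.archive | rec.tag | pct.rec | pct.era.rec | rec.exp | unif.exp |
|---|---|---|---|---|---|---|---|
| 1 | 521 | 0.018 | 93 | 0.050 | 0.179 | 33 | 169 |
| 2 | 490 | 0.017 | 83 | 0.045 | 0.169 | 31 | 169 |
| 3 | 641 | 0.022 | 127 | 0.069 | 0.198 | 40 | 169 |
| 4 | 1027 | 0.035 | 140 | 0.076 | 0.136 | 65 | 169 |
| 5 | 1045 | 0.036 | 116 | 0.063 | 0.111 | 66 | 169 |
| 6 | 426 | 0.014 | 94 | 0.051 | 0.221 | 27 | 169 |
| 7 | 625 | 0.021 | 121 | 0.065 | 0.194 | 39 | 169 |
| 8 | 894 | 0.030 | 157 | 0.085 | 0.176 | 56 | 169 |
| 9 | 4568 | 0.155 | 224 | 0.121 | 0.049 | 288 | 169 |
| 10 | 16784 | 0.571 | 550 | 0.297 | 0.033 | 1058 | 169 |
| 11 | 2382 | 0.081 | 149 | 0.080 | 0.063 | 150 | 169 |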
no subject
Date: 2012-11-14 03:41 am (UTC)
I am beginning to think I forgot most of what I learned in math stats. Like, I don't even really know what "exploratory data analysis" means. Although to be fair, I think chi-squares and all that would have been the next course in the prob/stats sequence, and I took the more generic intro stats like, two years ago and so it's kind of mindwiped. :/