Data munging
Feb. 22nd, 2013 10:38 am
Exploring the Doctor Who fanfic.net database that my friend sent me. Interesting stuff. The most popular story on there in terms of raw favorites is FrostFyre's "Man with No Name", a Tenth Doctor/Firefly crossover: 106K words in 32 chapters, with 1603 favorites and 879 reviews, published over 6 months in the second half of 2007. The second most popular by favorites is "That Which Holds the Image" by TheAngelsHaveThePhoneBox, a recent HP/DW crossover in which Harry summons a Weeping Angel boggart: 9 chapters, 40K words, recently completed after 1.5 years, with 1309 favorites and 794 reviews.
I should probably look at straight DW stories as opposed to crossovers. Luckily, my intrepid data-gathering partner has given me that variable.
One graph so far, per the interests of my data-gathering partner:
This is a bit misleading because FF.net did not start tracking completion as a field until relatively recently. Back in the olden days, I think you could mark a story as complete, but it wasn't a required field. Anyway, the percentages on those graphs are computed over 6-month bins of the raw data (the raw data are shown as blue tick marks, but there is too much of it to really see the patterns). There are likely two reasons for the jump: (1) the new series started in mid-2005, and (2) the aforementioned, more recent addition of formal completion tracking.
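For anyone wanting to reproduce the 6-month binning, here is a minimal Python sketch (my actual analysis was in R). It assumes stories are binned by publish date and that each story is reduced to a hypothetical (published_date, is_complete) pair; the function names are illustrative, not from the dataset.

```python
from collections import defaultdict
from datetime import date

def half_year_bin(d: date) -> tuple:
    """Map a date to a (year, half) bin: half 1 = Jan-Jun, half 2 = Jul-Dec."""
    return (d.year, 1 if d.month <= 6 else 2)

def percent_complete_by_bin(stories):
    """stories: iterable of (published_date, is_complete) pairs (assumed shape).
    Returns {(year, half): percent of stories in that bin marked complete}."""
    totals = defaultdict(int)
    completes = defaultdict(int)
    for published, is_complete in stories:
        b = half_year_bin(published)
        totals[b] += 1
        if is_complete:
            completes[b] += 1
    # Only bins that actually contain stories appear in the result.
    return {b: 100.0 * completes[b] / totals[b] for b in sorted(totals)}

# Made-up sample rows, just to exercise the function.
sample = [
    (date(2004, 3, 1), False),
    (date(2004, 5, 9), True),
    (date(2005, 8, 20), True),
    (date(2005, 11, 2), True),
    (date(2005, 12, 30), False),
    (date(2006, 1, 15), True),
]
result = percent_complete_by_bin(sample)
```

Half-year bins smooth out the day-to-day noise that makes the raw tick marks hard to read, at the cost of blurring exactly when a jump like the mid-2005 one happens.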
There are 29660 stories marked as complete in the DW database. Of those, 83.7% (24833) were last updated on the day they were published, i.e., they were one-shots. That leaves 16.3% (4827) of the complete stories that were updated over multiple days. Among these, the median completion time is 30 days, the mean is 99 days, and the max is 2212 days, just over 6 years.
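A minimal Python sketch of how those numbers fall out, assuming each complete story is reduced to a hypothetical (published, last_updated) pair of dates and that "completion time" means whole days between them. One-shots (0 days) are counted separately and excluded from the median/mean/max, matching the description above.

```python
from datetime import date
from statistics import mean, median

def completion_stats(complete_stories):
    """complete_stories: list of (published, last_updated) dates for stories
    marked complete (assumed shape). A one-shot is a story whose last update
    fell on its publish day; the remaining stories give the duration stats."""
    durations = [(updated - published).days for published, updated in complete_stories]
    one_shots = sum(1 for d in durations if d == 0)
    multi_day = [d for d in durations if d > 0]
    return {
        "n_complete": len(durations),
        "pct_one_shot": 100.0 * one_shots / len(durations),
        "median_days": median(multi_day),
        "mean_days": mean(multi_day),
        "max_days": max(multi_day),
    }

# Made-up rows: two one-shots plus stories finished in 30, 99 and 10 days.
sample = [
    (date(2010, 1, 1), date(2010, 1, 1)),
    (date(2010, 1, 1), date(2010, 1, 1)),
    (date(2010, 1, 1), date(2010, 1, 31)),
    (date(2010, 1, 1), date(2010, 4, 10)),
    (date(2010, 1, 1), date(2010, 1, 11)),
]
stats = completion_stats(sample)
```

The gap between a 30-day median and a 99-day mean is what you'd expect from a long right tail: a few multi-year stories (up to 2212 days) pull the mean up while barely moving the median. If the dates are stored as timestamps rather than dates, truncate them to days first, or same-day updates won't register as 0.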
Next I'm going to see whether I can discern, from recently added/updated stories, the decay rate of expected reviews as a function of publish date. I don't have the best data for that, though.
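One possible starting point, sketched in Python under an assumption I haven't tested against the data: that a story's expected review rate decays exponentially with age, r(t) = r0 * exp(-lam * t). Taking logs turns that into a straight line, log r = log r0 - lam * t, which ordinary least squares can fit. The (age_days, reviews_per_day) pairs are hypothetical; getting them would need review counts for the same stories at different ages, which is exactly the data I'm short on.

```python
import math

def fit_exponential_decay(ages, rates):
    """Least-squares fit of log(rate) = log(r0) - lam * age.
    ages: days since publish; rates: observed reviews/day (must be > 0,
    since log(0) is undefined). Returns (r0, lam)."""
    n = len(ages)
    ys = [math.log(r) for r in rates]
    mean_x = sum(ages) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic check: rates generated exactly from r0 = 5 reviews/day, lam = 0.1/day.
ages = [0, 7, 14, 30, 60]
rates = [5 * math.exp(-0.1 * a) for a in ages]
r0, lam = fit_exponential_decay(ages, rates)
half_life_days = math.log(2) / lam  # days for the review rate to halve
```

The half-life (ln 2 / lam) is probably the most readable summary. Days with zero reviews have to be dropped or binned before taking logs, which biases the fit for old, quiet stories.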
ETA: oh, hey, here is someone using the same tools I am (R and RSQLite) to examine Pokemon fics. They're using ggplot2, which I have a book on but haven't done much with. I'm a straight "plot" user myself.