eve11: (rationalreal)
[personal profile] eve11
Still churning through the Teaspoon data. I made a graph of the number of reviews vs. word count, with stories grouped into word-count categories.

ReviewsByWordCount
These are boxplots, with the mean of each distribution added as a little "+", which shows how skewed the distributions are. I also attempted to normalize for repeat reviews by counting only the unique reviewers per story. (Honestly, I saw one outlier earlier that was a 1000-word story with over 80 reviews; when I looked at it, it was actually a conversation between one lone reviewer and the author, spread over 8 pages of comments.) Note that the y-axis is on the log scale, which de-emphasizes outliers. Medians and means both increase steadily with word count: the median number of unique reviewers for shorter stories (under 1000 words) is 2, while for longer stories (over 25K words) the median is all the way up at 8 and the mean is 13.
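A minimal sketch of the unique-reviewer count and the boxplot described above. The `reviews` and `stories` tables, their column names, and the word-count break points are toy assumptions for illustration, not the actual Teaspoon schema.

```r
# Toy data (hypothetical): one row per review, and one row per story.
reviews <- data.frame(
  story_id = c(1, 1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6),
  reviewer = c("a", "a", "a", "b", "a", "c", "b", "c", "d",
               "a", "b", "c", "c", "a", "b", "c", "d", "e")
)
stories <- data.frame(id = 1:6,
                      words = c(90, 800, 1500, 12000, 30000, 60000))

# Count distinct reviewers per story, so that a long back-and-forth
# between one reviewer and the author only counts once (story 1 here).
uniq <- tapply(reviews$reviewer, reviews$story_id,
               function(r) length(unique(r)))
stories$uniquereview <- as.integer(uniq[as.character(stories$id)])

# Bin word counts into categories (illustrative break points only).
stories$lengthcat <- cut(stories$words,
                         breaks = c(0, 1000, 25000, Inf),
                         labels = c("under 1K", "1K-25K", "over 25K"))

# Median and mean of unique reviewers per category.
medians <- tapply(stories$uniquereview, stories$lengthcat, median)
means   <- tapply(stories$uniquereview, stories$lengthcat, mean)

# Boxplots on a log y-axis, with each category's mean overlaid as "+".
boxplot(uniquereview ~ lengthcat, data = stories, log = "y",
        xlab = "Word count", ylab = "Unique reviewers")
points(seq_along(means), means, pch = 3)
```

Counting with `length(unique(...))` is the simplest way to do the normalization; it treats a reviewer who comments on several chapters the same as one who comments once.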

A few outliers of note, marked from the lowest category (0-100 words):
 tables$story[which((tables$story$uniquereview >= 20) & (tables$story$lengthcat == "0-100")),]
         id                                         title author rating
266     562 "Yeah, I know what an allegory is," said Ace.     81      1
4930   9402                           Cliff, Shag, Marry?   2066      1
9799  15715               Torchwood Babiez: Missing Shoes   2255      1
10496 16552                       A Very Baby Halloweenie   2255      1
10662 16749               WHY THE MASTER PWNS EVERYTHING.   3435      2
12525 19168        Torchwood Babiez: Missing Shoes Part 2   2255      1
17859 27078        Torchwood Babiez: Missing Shoes Part 3   2255      1
18718 28298                                 Pandora's Box    647      1
22483 33603        Torchwood Babiez: Missing Shoes Part 4   2255      1
      chapters words reviews  updated published uniquereview
266          1   100      21 10/09/03  10/09/03           21
4930         1    63      31 12/29/06  12/29/06           29
9799         9    25      69 09/25/07  09/25/07           59 
10496        1     1      29 10/30/07  10/30/07           26
10662        1     1      20 11/09/07  11/09/07           20
12525        8     8      46 02/08/08  02/07/08           41
17859       12    12      29 11/11/08  11/11/08           23
18718        1   100      23 01/07/09  01/06/09           22
22483       10    10      35 10/20/09  10/20/09           24


Three of these are actual stories: ids 562, 9402, and 28298. They are all brilliant in different ways; definitely check them out. The others are actually pictures/comics, which is why the word counts are so small. I didn't look up the Torchwood Babiez ones, but I've heard of them and know they were pretty popular. The Master one (16749) is also pretty funny. It was written by Adalia Zandra, who also used images in her hilarious and slashy (non-explicit but not quite vanilla) Passing Notes (I also note she is responsible for one of the 23 stories titled "Aftermath", heh).

ETA: Ha, "Passing Notes" is actually the highest outlier in the 1K-2K category too! Do you all think that humor stories garner more reviews than other kinds of stories? I see that as a kind of trend in my stories. I should check that out. (makes graph) Nothing really jumps out except "Het" which is likely highly correlated with Ten/Rose. I should mention that the numbers in these plots count stories multiple times if the story is marked by multiple genres.
reviewsbygenre
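A small sketch of what "counting stories multiple times" means for the per-genre numbers. The data frame, the comma-separated `genres` column, and the values are all made up for illustration; the real genre field may be stored differently.

```r
# Toy stories (hypothetical), each tagged with one or more genres.
stories <- data.frame(
  id = 1:3,
  uniquereview = c(5, 2, 9),
  genres = c("Humor", "Het, Angst", "Humor, Het"),
  stringsAsFactors = FALSE
)

# Split the genre string and repeat each story once per genre,
# so a two-genre story contributes to both genre groups.
genre_list <- strsplit(stories$genres, ",\\s*")
n_genres <- lengths(genre_list)
long <- data.frame(
  id = rep(stories$id, n_genres),
  uniquereview = rep(stories$uniquereview, n_genres),
  genre = unlist(genre_list),
  stringsAsFactors = FALSE
)

# Per-genre medians computed on the expanded table.
med <- tapply(long$uniquereview, long$genre, median)
```

Because of the expansion, the genre groups are not independent samples: a popular story tagged with two genres pulls both of them up.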

And for good measure and [livejournal.com profile] aralias: Unique reviewers by category (again with the caveat that stories with multiple categories are counted multiple times)
reviewsbycategory
Median review counts for Classic series and Tenth Doctor seem to be similar. These are not yet normalized by "time spent in archive" (see below). Nine is higher than I'd think, and Eleven is lower, but again, temporal effects may adjust that.

I want to divide those review counts by some measure of time in order to account for stories that have had a longer time to accrue reviews. I could use the date of the newest review minus the publish date; I could use the average of that same value, or likely a higher percentile like the 75th or 85th.
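A quick sketch of the candidate time measures for a single story. It reads "that same value" as the lag between each review and the publish date, which is one possible interpretation; the dates are invented.

```r
# Hypothetical publish date and review dates for one story.
published <- as.Date("2007-09-25")
review_dates <- as.Date(c("2007-09-25", "2007-09-26", "2007-09-26",
                          "2007-10-01", "2008-03-01"))

# Days between publication and each review.
lag_days <- as.numeric(difftime(review_dates, published, units = "days"))

# Three candidate "time in archive" measures.
measures <- c(
  newest = max(lag_days),                             # last review minus publish date
  mean   = mean(lag_days),                            # average lag
  q75    = quantile(lag_days, 0.75, names = FALSE)    # 75th percentile of lags
)
```

In this toy case one straggler review months later dominates the "newest" measure, while the 75th percentile stays close to the initial burst. That is the kind of difference that matters when choosing what to divide by.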


**** WARNING: MATH INTERLUDE AHEAD ****
How might one formally model a story's popularity in terms of reviews? I would start with a Poisson process, which is generally what you use when you want to count arrivals of things over time. The way it is modeled: suppose that arrivals follow a Poisson process with rate lambda (the mean number of arrivals per unit time). Then for the span of time between t_1 and t_2, the number of arrivals within that span (t_1, t_2) is Poisson distributed with mean equal to lambda*(t_2 - t_1). So the more time passes, the bigger the average number of arrivals you expect, which makes sense. If you think of lambda as a flat constant function defined over time, then the mean for any segment is the area of the rectangle under that segment.
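To make the rectangle concrete, here is a small sketch with made-up numbers: a hypothetical rate of 0.5 reviews per day over a ten-day window.

```r
lambda <- 0.5            # hypothetical rate: reviews per day
t1 <- 0
t2 <- 10                 # a ten-day window

# Expected number of reviews in (t1, t2): the area of the rectangle.
mu <- lambda * (t2 - t1)

# Probability of seeing exactly 0, 1, ..., 10 reviews in the window.
probs <- dpois(0:10, mu)

# Simulated counts for many such windows should average close to mu.
set.seed(42)
sim <- rpois(10000, mu)
sim_mean <- mean(sim)
```

Doubling the window to twenty days doubles `mu`, which is the "more time, more expected arrivals" point in the paragraph above.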

As Khan Academy explains in the youtube link, a basic Poisson process assumes that the likelihood of arrivals is constant over time. I think we can all agree that isn't the case for reviews; the average reviews per day will spike at publication and update times (also likely at times recced, and hey I did just grab all of [livejournal.com profile] calufrax's history to integrate into the db and will work on that). So as a function of time, lambda is not constant. You can still get the probability of the number of arrivals in an interval of time, but now that nice "area of a rectangle" from the homogeneous case turns into "area under the curve" (i.e., the integral of lambda(t) between t_1 and t_2).
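A sketch of the "area under the curve" version. The rate function here is entirely hypothetical: a spike at publication (day 0) and another at an update (day 30), each fading out exponentially. The numbers are chosen only to make the example readable.

```r
# Hypothetical rate function, in reviews per day.
lambda <- function(t) {
  3 * exp(-0.5 * t) + ifelse(t >= 30, 2 * exp(-0.5 * (t - 30)), 0)
}

# Expected reviews between t1 and t2 = integral of lambda(t).
# The range is split at the update day so integrate() never has to
# cross the jump in the rate.
expected_reviews <- function(t1, t2, breaks = 30) {
  cuts <- sort(unique(c(t1, breaks[breaks > t1 & breaks < t2], t2)))
  sum(mapply(function(a, b) integrate(lambda, a, b)$value,
             head(cuts, -1), tail(cuts, -1)))
}

mu <- expected_reviews(0, 60)

# The review count in (0, 60) is then Poisson with mean mu, so for
# example the chance of at least 5 reviews is:
p_at_least_5 <- 1 - ppois(4, mu)
```

Nothing else changes relative to the homogeneous case: the count is still Poisson, and only the way the mean is computed differs.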

Google image search fails me for clarifying pictures of this. I could draw one in R, and perhaps will, but I'm guessing most people skipped this lj-cut anyway.

Anyway, it would be interesting to model this mean function parametrically, as a function of different properties of the story (for example, the era, the content, the rating, when it was published). Most stories have so few reviews that we'd have to categorize them somehow, grouping them together so they could borrow information from each other. I imagine that authors who appear in others' "favorite author" lists (when those others are prone to reviewing) will have a stronger spike, due to their followers getting email notifications when they update/upload stories. Exponential decay of the average expected reviews after publication/update seems reasonable. That is also doubly nice because the integral of e^x has a closed form (it is e^x), so you could model, e.g., the rate of that exponential decay as a kind of popularity metric.
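A sketch of why the closed form is convenient, assuming a single decaying spike lambda(t) = a * exp(-b * t), with t in days since publication. The parameters `a` and `b` and their values are illustrative assumptions, not fitted to anything.

```r
# Closed-form expected count between t1 and t2 for
# lambda(t) = a * exp(-b * t):
#   integral = (a / b) * (exp(-b * t1) - exp(-b * t2))
expected_closed <- function(a, b, t1, t2) {
  (a / b) * (exp(-b * t1) - exp(-b * t2))
}

a <- 3      # hypothetical initial rate: reviews per day at publication
b <- 0.5    # hypothetical decay rate: larger b means interest fades faster

closed_value  <- expected_closed(a, b, 0, 7)
numeric_value <- integrate(function(t) a * exp(-b * t), 0, 7)$value

# With no time limit, the total expected reviews tends to a / b.
total <- a / b
```

Under this parameterization `b` says how quickly interest dies off and `a / b` is the total a story can expect to collect, so either could serve as the "popularity metric" mentioned above.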

The reason I suggest this is that the shape (or not) of this curve is how I would really want to normalize stories for the accumulation of reviews over time. In the short-term exploratory analysis, I should divide the number of unique reviewers by some measure that incorporates "t_2 - t_1", where "t_2" is the time of the last review and "t_1" is the time of publication. Those mathematically inclined will see that, if the process could be considered homogeneous, it is relatively simple: you divide each story's review count by its total active time in the database, to get comparable "lambda" values that measure time-adjusted popularity. But the burstiness and decay of the process make the question of what cutoff or measure of time to divide by a bit more difficult. It becomes a question of quantiles.
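For the homogeneous simplification, the normalization is just a division. A minimal sketch with two invented stories that have the same number of unique reviewers but very different active times (all column names and dates are assumptions):

```r
# Two hypothetical stories with equal review counts.
stories <- data.frame(
  id = c("A", "B"),
  uniquereview = c(20, 20),
  published   = as.Date(c("2007-01-01", "2009-01-01")),
  last_review = as.Date(c("2007-01-21", "2009-07-20"))
)

# Active time t_2 - t_1 in days (last review minus publication).
stories$active_days <- as.numeric(
  difftime(stories$last_review, stories$published, units = "days")
)

# Estimated lambda: unique reviewers per active day.
stories$lambda_hat <- stories$uniquereview / stories$active_days
```

Story A comes out ten times "more popular" per day than story B here, which shows both the appeal of the measure and its weakness: one late review stretches the denominator a lot when the process is bursty.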

Right, that's enough math. More for me than you all, I think.
**** This concludes the math interlude! ****


I am still working through how best to categorize multi-era fics in terms of Classic vs. New eras. This question/difficulty of classification has now come up in both the Teaspoon and the FF.net data. In FF.net I went through the character lists and gave each character an associated era (with indicators for "more info needed" for characters like Rose and Sarah Jane who span several eras). I have to go through and normalize the characters for the Teaspoon data, which also includes correcting for HTML errors/truncations when the character lists in the story blurbs got too long.
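One way the character-to-era lookup could be sketched. The lookup table, the `NA` convention for "more info needed", and the `classify_era` helper are all hypothetical; the real character lists are much longer and need cleaning first.

```r
# Hypothetical lookup: character name -> era. NA marks characters that
# span several eras and need more information to resolve.
era_lookup <- c(
  "Ace"              = "Classic",
  "Donna Noble"      = "New",
  "Sarah Jane Smith" = NA,
  "Rose Tyler"       = NA
)

# Classify one story from its character list.
classify_era <- function(characters) {
  eras <- era_lookup[characters]          # unknown names also give NA
  known <- unname(unique(eras[!is.na(eras)]))
  if (length(known) > 1) return("Multi-era")
  if (length(known) == 0 || anyNA(eras)) return("More info needed")
  known
}

classify_era(c("Ace"))
classify_era(c("Ace", "Donna Noble"))
classify_era(c("Ace", "Sarah Jane Smith"))
```

Returning "More info needed" whenever an ambiguous character appears is a conservative choice: it flags the story for a manual look instead of guessing.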

ETA: Ah, forgot about cooking! My co-worker (a fellow mathematician and baker) has suggested that we celebrate Pi day (3/14) with pies. I am thinking I would like to try a gluten-free version of something like this Mexican chocolate silk pie. It seems mostly gluten free already and the crust is an easy substitute of gf ginger snaps instead of graham crackers. But why go through all that trouble to make a from-scratch chocolate pudding and then top it with Cool Whip? Might as well make the whipped cream/buttercream topping from scratch at that point too!

Date: 2013-03-02 10:56 pm (UTC)
From: [identity profile] a-phoenixdragon.livejournal.com
I tried to absorb a lot of the info - but pretty graph distracted me, lol!! Still...what I did grab boggled my mind and was yet interesting. *GLEES*

Date: 2013-03-03 02:47 am (UTC)
From: [identity profile] a-phoenixdragon.livejournal.com
I SAW!! OHHHH, LOVELOVELOVE!!

Course sex gets more reviews than anything, lol! (I exaggerate, yus). Slash and Crossovers don't get much love - poor things. And my main genres, too (besides horror), so...maybe I need to step away from my genres? *Laughs* Seriously...love the graphs - help me make sense of the math. I need...visual aids. *HEADDESK*

Date: 2013-03-03 02:54 am (UTC)
From: [identity profile] a-phoenixdragon.livejournal.com
S'a damned shame! Everyone should get love. S'why I've pretty much made that my fanfic mission, lol!! Love to all, dammit. Love. To. ALL.

*SQUISHES*

Date: 2013-03-03 03:46 am (UTC)
From: [identity profile] lostrack621.livejournal.com
DROOLS over the plots. ZOMG - I am learning R post-haste after finishing my dissertation and defending. I swear that your plots are so pretty! I am going to bother you BIG TIME. ;)

I am all for doing gluten-free baking -- I'm learning more and more about baking substitutes for gluten and eggs and some of them are DARNED delicious! My Mom's friend sent us the name of a website, Chocolate Covered Katie, that is for healthy desserts, many of which are gluten free. There are also yummy non-dessert things, too, and I cannot wait to try the healthy tater tots (made with quinoa and white beans and other nummies).

Let us know which pie you pick!

Ooh, and I want to start baking with rhubarb! Have you tried it? In college, one of my friend's moms used to send rhubarb preserves and they were DELICIOUS.

Date: 2013-03-03 04:16 am (UTC)
From: [identity profile] lostrack621.livejournal.com
Ooh, goodie - I will download that stuff tomorrow to put in the "to do after diss" folder. Thank you! :)

So, my Mom's been using rice flour and had generally good success with the bread she made for my sister, as well as some cakes, and I know she's been trying some other flour options, too. Mom made a delicious gf crust for a gf pumpkin pie - if you want, I can send you the recipe. I don't believe it was complicated...

Rhubarb is typically (normally) a mid-spring/summer thing, but because of greenhouses, you can get it all year long. I haven't seen rhubarb since November in the store, though, but I've read that frozen rhubarb chunks (no added sugar, etc) work just as well as fresh stuff. I think it absolutely would go well with blackberries - a quick google search shows "blackberry rhubarb crumble" and this and that for many entries....
Edited Date: 2013-03-03 04:16 am (UTC)

Date: 2013-03-03 01:39 pm (UTC)
thisbluespirit: (dw - Six & Peri)
From: [personal profile] thisbluespirit
Ha, I am not sure I am reading your graph correctly (I'm so ignorant, I only know very simple graphs, you know where there's these bars and these numbers, or a pie chart...) but it looks as though what I've always felt - that Six fans are more likely to leave a comment and encourage you - than other Classic Doctor fans. (Seven seems level with him, but then I've written very little Seven, strangely). But I don't know - is that what it means, the way they are? If so, I'm surprised Five has less.

Anyway, this is turning out to be v fascinating, so do keep on sharing please. :-)

Date: 2013-03-03 04:45 pm (UTC)
thisbluespirit: (dw - Six & Peri)
From: [personal profile] thisbluespirit
Thanks. :-) I'm glad I'm following at least to some degree... and ahahaha, so I am statistically right in my hitherto unsupported assertion that you always get comments on Six fic - Six fans are just more appreciative.

Anyway, forgive my dimness, but I wanted to check I was understanding things properly or not.

Six fans are the best, basically - now it's official! :lol:
