eve11: (find_x)
[personal profile] eve11
And encourage my fandom nerdiness. A friend and coworker (a software engineer) stopped by my office today asking me a statistics question. "It's not work-related" he told me. "I have a database with three variables for each record and I want to know how they are related."


Me: Numeric?
Him: Yeah
Me: Are they continuous variables? Or discrete?
Him: Okay, I will just tell you. They're follows, hits, and review counts for fanfiction stories.
Me: OMG can you actually scrape me that data???
Him: Yeah, I have a script that feeds an SQLite database and can grab everything on fanfiction.net within a category. It gets all of the info from the title blurbs. It fills in 0s when favorites and reviews don't exist.
Me: That is awesome. Can you scrape all of Doctor Who for me?
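For the curious, here is a minimal sketch of the "parse the blurb, fill in 0s, store it in SQLite" step he described. It is not his script: the stats-line format below is an assumption loosely modelled on an ff.net title blurb, and the table layout and the names `parse_stats` and `store` are hypothetical.

```python
import re
import sqlite3

# Hypothetical: matches "Label: 1,234" pairs for the three counts we care about.
# The real blurb labels and ordering may differ.
FIELD_RE = re.compile(r"(Reviews|Favs|Follows):\s*([\d,]+)")


def parse_stats(blurb: str) -> dict:
    """Return Reviews/Favs/Follows counts, using 0 for any field the blurb omits."""
    counts = {"Reviews": 0, "Favs": 0, "Follows": 0}
    for label, value in FIELD_RE.findall(blurb):
        counts[label] = int(value.replace(",", ""))  # strip thousands separators
    return counts


def store(conn: sqlite3.Connection, story_id: int, blurb: str) -> None:
    """Insert (or overwrite) one story's counts."""
    c = parse_stats(blurb)
    conn.execute(
        "INSERT OR REPLACE INTO stories (id, follows, favs, reviews) VALUES (?, ?, ?, ?)",
        (story_id, c["Follows"], c["Favs"], c["Reviews"]),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stories (id INTEGER PRIMARY KEY, follows INTEGER, favs INTEGER, reviews INTEGER)"
)
# Made-up blurbs: the second has no Reviews or Favs, so both are stored as 0.
store(conn, 1, "Rated: T - English - Chapters: 12 - Words: 45,210 - Reviews: 87 - Favs: 1,120 - Follows: 95")
store(conn, 2, "Rated: K - English - Chapters: 1 - Words: 900 - Follows: 3")
rows = conn.execute("SELECT id, follows, favs, reviews FROM stories ORDER BY id").fetchall()
print(rows)
```

One thing worth flagging for the stats side: storing a missing count as 0 makes "no reviews yet" indistinguishable from "the blurb didn't show reviews," which matters once you start correlating things.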

We know each other well enough to know we both intersect with the fanfic world. He is an avid reader of long anime-related stories and apparently his profile on ff.net has "A list of common grammatical errors." :D I have often wanted to collect the data, but my script-fu is lacking. I always stop at the "build a script to scrape the websites" step. He has the data and is stumbling at the "make pretty pictures and describe relationships" step. He's curious about relationships between follows, favorites, and time with respect to completion and popularity, etc.
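As a first pass at "describe relationships," one reasonable choice (mine, not something we've settled on) is a rank correlation: counts like follows and favorites are heavily skewed, with lots of zeros and a few huge stories, so Spearman's rho is less dominated by outliers than a plain Pearson correlation. This sketch uses only the standard library; the numbers are made up.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy numbers only, not real ff.net data.
data = {
    "follows": [95, 3, 40, 12, 0],
    "favs": [120, 0, 55, 10, 1],
    "reviews": [87, 0, 30, 15, 2],
}
for a, b in combinations(data, 2):
    print(a, b, round(spearman(data[a], data[b]), 3))
```

A scatterplot on log(1 + count) axes is a good companion to these numbers, since a single coefficient hides a lot.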

It could be the start of a beautiful friendship!

He says he can probably scrape all of Teaspoon for me, too; I showed him the format and he said, "Oh yeah, I'm familiar with that format." Also cool. What would be a little more time-consuming would be to scrape everyone's favorites and match them up so we can look at it like a big graph.
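Roughly, "matching up favorites" could mean collapsing each reader's list of favorited stories into weighted reader-to-author edges. This is a sketch under assumptions: the `favorites` and `story_author` dictionaries are hypothetical stand-ins for whatever the scraper actually produces, and skipping self-favorites is my own choice.

```python
from collections import Counter


def favorites_to_edges(favorites, story_author):
    """Collapse reader -> favorited-story lists into weighted reader -> author edges.

    favorites:    {reader: [story_id, ...]}   (hypothetical scraped favorites lists)
    story_author: {story_id: author}          (from the story database)
    Stories not in story_author (e.g. outside the scraped category) are skipped.
    """
    edges = Counter()
    for reader, stories in favorites.items():
        for sid in stories:
            author = story_author.get(sid)
            if author is not None and author != reader:  # ignore self-favorites
                edges[(reader, author)] += 1  # weight = number of that author's stories favorited
    return edges


# Made-up example data.
story_author = {1: "carol", 2: "carol", 3: "dave", 4: "alice"}
favorites = {"alice": [1, 2, 3], "bob": [1, 4, 99], "carol": [2, 3]}
edges = favorites_to_edges(favorites, story_author)
print(edges)
```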

A colleague alerted me to this web-based JavaScript visualization library called D3: http://anna.ps/talks/fel/ It might be interesting to see if we can figure out who, empirically, is the "BNF," and to see who the cross-era people are, how insular different niches are, etc.
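One crude, empirical "BNF" measure is in-degree: how many distinct readers favorited at least one of an author's stories. The sketch below computes that from weighted reader-to-author edges (a hypothetical structure, filled with toy data here) and writes the nodes/links JSON shape that D3 force-layout examples commonly load. Check the D3 docs for exactly how your layout expects node ids to be matched.

```python
import json
from collections import Counter


def in_degree(edges):
    """Distinct readers per author (each (reader, author) key counts once)."""
    deg = Counter()
    for (_reader, author) in edges:
        deg[author] += 1
    return deg


def to_node_link(edges):
    """Node-link dict: {"nodes": [{"id": ...}], "links": [{"source", "target", "value"}]}."""
    names = sorted({name for pair in edges for name in pair})
    return {
        "nodes": [{"id": n} for n in names],
        "links": [
            {"source": r, "target": a, "value": w}
            for (r, a), w in sorted(edges.items())
        ],
    }


# Toy edges: (reader, author) -> number of that author's stories the reader favorited.
edges = Counter({("alice", "carol"): 2, ("alice", "dave"): 1, ("bob", "carol"): 1,
                 ("bob", "alice"): 1, ("carol", "dave"): 1})
print(in_degree(edges).most_common(3))
print(json.dumps(to_node_link(edges), indent=2))
```

In-degree only answers "who gets favorited by the most people"; questions like cross-era reach or how insular a niche is would need community detection or overlap measures on top of this.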

Date: 2013-02-16 04:17 am (UTC)
From: [identity profile] ladymercury-10.livejournal.com
This is a very interesting experiment. It would be cool to compare information for the same stories from different sites, if you found ones that were cross-posted, to see how they fared with different audiences.

Date: 2013-02-16 04:28 am (UTC)
From: [identity profile] a-phoenixdragon.livejournal.com
Ohhh! When you do find them...let me know. This is an interesting study!!

You should do FF.Net, AO3 and Teaspoon. There are differences in hit counts, Kudos, Favs and Reviews (each site has its own tastes and followers), but all three probably hold the same numbers for certain people (or at least ratio of/type depending on design, etc).

I'm sorry...this made sense when I thought of it. UGH. *is a moron*
Edited Date: 2013-02-16 04:30 am (UTC)
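The cross-site idea above can be made concrete, assuming each site's counts are collected separately: match cross-posted stories on a normalized (author, title) key, then compare each story's percentile within its own site instead of raw numbers, since hits, kudos, favs, and reviews live on very different scales. The matching rule and all names here are illustrative assumptions, and real cross-posts may use different titles or pen names.

```python
import re
from bisect import bisect_left, bisect_right


def _norm(s: str) -> str:
    # Lowercase, drop punctuation, collapse whitespace.
    return " ".join(re.sub(r"[^\w\s]", "", s.lower()).split())


def match_key(author: str, title: str) -> tuple:
    """Crude cross-site matching key."""
    return (_norm(author), _norm(title))


def percentile(value, population):
    """Mid-rank percentile (0-100) of value within one site's sorted counts."""
    lo = bisect_left(population, value)
    hi = bisect_right(population, value)
    return 100.0 * (lo + hi) / (2 * len(population))


def compare(site_a: dict, site_b: dict) -> dict:
    """site_x maps (author, title) -> count on that site.

    Returns {key: (percentile on site A, percentile on site B)} for stories on both.
    """
    a = {match_key(*k): v for k, v in site_a.items()}
    b = {match_key(*k): v for k, v in site_b.items()}
    pop_a, pop_b = sorted(a.values()), sorted(b.values())
    return {k: (percentile(a[k], pop_a), percentile(b[k], pop_b)) for k in a.keys() & b.keys()}


# Toy numbers only: review counts on one site vs. kudos on another.
ffnet = {("Eve", "Blue Box!"): 50, ("X", "Other"): 5, ("Y", "Third"): 10, ("Z", "Fourth"): 20}
ao3 = {("eve", "blue box"): 300, ("Q", "Q story"): 100}
print(compare(ffnet, ao3))
```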

Date: 2013-02-16 01:19 pm (UTC)
thisbluespirit: (dw - Eleven reading knitting book)
From: [personal profile] thisbluespirit
Heh, sounds fascinating! :-)

Date: 2013-02-16 03:13 pm (UTC)
From: [identity profile] singeaddams.livejournal.com
Interesting! I'd like a bar-graph with the longest bar being MINE.

*fantasizes*

Date: 2013-02-16 04:08 pm (UTC)
From: [identity profile] a-phoenixdragon.livejournal.com
Oh, thank goodness!! And those are good sites to start off with...

Date: 2013-02-16 04:55 pm (UTC)
nonelvis: (DW blue TARDIS)
From: [personal profile] nonelvis
As long as your friend can scrape the data without creating undue load on Teaspoon's server, I'd be very curious to see the results.

Date: 2013-02-16 05:26 pm (UTC)
nonelvis: (DW blue TARDIS)
From: [personal profile] nonelvis
I don't know our bandwidth limits, actually, and would have to look into that. If he does things gradually, though, I doubt there'd be an issue, especially since we're serving low-bandwidth text, not graphics.

I can check on things once I'm back from Gally.

Date: 2013-02-24 04:58 pm (UTC)
nonelvis: (DW blue TARDIS)
From: [personal profile] nonelvis
Okay, I've finally checked our bandwidth limits, and in the past few months, we've never used more than about 75% of our limit. One well-behaved script scraping things gradually shouldn't have any significant impact on our bandwidth.
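"Well-behaved" and "gradually" can be as simple as enforcing a minimum gap between requests. This is a minimal sketch of such a throttle; the class name and the 5-second interval in the usage comment are assumptions for illustration, not anything agreed with the Teaspoon mods.

```python
import time


class Throttle:
    """Block so that consecutive wait() calls return at least min_interval seconds apart."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = None  # monotonic time of the previous request

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Usage sketch: `urls` and `fetch` are placeholders, not a real API.
# throttle = Throttle(min_interval=5.0)
# for url in urls:
#     throttle.wait()
#     fetch(url)
```

Using `time.monotonic()` rather than wall-clock time keeps the spacing correct even if the system clock is adjusted mid-scrape.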

ETA: if your friend still wants to go ahead with this project, I'd like to alert the rest of the mod team. Do you mind if I mention what's going on? I don't want to break your f'lock without your permission.
Edited Date: 2013-02-24 05:00 pm (UTC)

Date: 2013-02-24 07:12 pm (UTC)
nonelvis: (DW blue TARDIS)
From: [personal profile] nonelvis
Thanks! I'm curious about the data myself, and I'm pretty sure my fellow mods will be, too.

Date: 2013-02-26 01:10 am (UTC)
nonelvis: (DW science geeks)
From: [personal profile] nonelvis
Cool! Thanks for letting me know.
