Twist my arm
Feb. 15th, 2013 10:54 pm
And encourage my fandom nerdiness. A friend and coworker (a software engineer) stopped by my office today to ask me a statistics question. "It's not work-related," he told me. "I have a database with three variables for each record and I want to know how they are related."
Me: Numeric?
Him: Yeah
Me: Are they continuous variables? Or discrete?
Him: Okay, I will just tell you. They're follows, hits, and review counts for fanfiction stories.
Me: OMG can you actually scrape me that data???
Him: Yeah, I have a script that feeds a SQLite database and can grab everything on fanfic.net within a category. It gets all of the info from the title blurbs. It fills in 0s when favorites and reviews don't exist.
Me: That is awesome. Can you scrape all of Doctor Who for me?
We know each other well enough to know we both intersect with the fanfic world. He is an avid reader of long anime-related stories and apparently his profile on ff.net has "A list of common grammatical errors." :D I have often wanted to collect the data but my script-fu is lacking. I always stop at the "build a script to scrape the websites" step. He has the data and is stumbling at the "make pretty pictures and describe relationships" step. He's curious about relationships between follows, favorites and time with respect to completion and popularity, etc.
It could be the start of a beautiful friendship!
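For the "how are they related" question, a rough first pass might look like the sketch below. It assumes a SQLite table named `stories` with integer columns `follows`, `favorites` and `reviews`; those names are my guess, not his actual schema. It uses Spearman rank correlation, since counts like these tend to be heavily skewed and ranks are less sensitive to a handful of runaway hits. It is only a starting point, not a full analysis.

```python
import sqlite3
from itertools import combinations
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; fails with ZeroDivisionError if a column is constant."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman correlation is Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def count_correlations(conn):
    """Pairwise Spearman correlations for the assumed `stories` table."""
    rows = conn.execute(
        "SELECT follows, favorites, reviews FROM stories"
    ).fetchall()
    columns = dict(zip(("follows", "favorites", "reviews"), zip(*rows)))
    return {
        (a, b): spearman(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }
```

Calling `count_correlations(sqlite3.connect("fanfic.db"))` (the file name is hypothetical) would give one number between -1 and 1 for each pair of variables, which is enough to decide which scatterplots are worth drawing.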
He also says he can probably scrape all of Teaspoon for me too, as I showed him the format and he said, "Oh yeah, I'm familiar with that format." Also cool. Scraping everyone's favorites and matching them up, so we can look at it all as one big graph, would be a little more time-consuming.
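As a sketch of what "matching them up" could mean, here is one possible approach: treat two stories as connected whenever the same reader has favorited both, and count how many readers they share. The input shape (a dict from reader name to a set of story ids) and the function name are assumptions for illustration; the real scrape might look quite different.

```python
from collections import Counter
from itertools import combinations


def cofavorite_edges(favorites):
    """Count, for each pair of stories, how many readers favorited both.

    `favorites` maps a reader name to the set of story ids they favorited
    (a hypothetical layout, not the actual scraped format).
    """
    edges = Counter()
    for stories in favorites.values():
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(stories), 2):
            edges[(a, b)] += 1
    return edges
```

The resulting weighted edge list is the kind of thing a graph-drawing library can take as input. Heavily connected stories hint at shared audiences, and stories with no edges crossing to another cluster hint at insular niches.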
A colleague alerted me to a web-based JavaScript viz library called D3: http://anna.ps/talks/fel/ It might be interesting to see if we can figure out who, empirically, is the "BNF," who the cross-era people are, how insular different niches are, etc.
no subject
Date: 2013-02-16 04:17 am (UTC)
no subject
Date: 2013-02-16 04:28 am (UTC)
You should do FF.Net, AO3 and Teaspoon. There are differences in hit counts, Kudos, Favs and Reviews (each site has its own tastes and followers), but all three probably hold the same numbers for certain people (or at least ratio of/type depending on design, etc).
I'm sorry...this made sense when I thought of it. UGH. *is a moron*
no subject
Date: 2013-02-16 01:19 pm (UTC)
no subject
Date: 2013-02-16 03:13 pm (UTC)
*fantasizes*
no subject
Date: 2013-02-16 03:57 pm (UTC)
no subject
Date: 2013-02-16 03:58 pm (UTC)
no subject
Date: 2013-02-16 03:59 pm (UTC)
no subject
Date: 2013-02-16 04:01 pm (UTC)
no subject
Date: 2013-02-16 04:08 pm (UTC)
no subject
Date: 2013-02-16 04:55 pm (UTC)
no subject
Date: 2013-02-16 05:13 pm (UTC)
no subject
Date: 2013-02-16 05:26 pm (UTC)
I can check on things once I'm back from Gally.
no subject
Date: 2013-02-16 05:43 pm (UTC)
no subject
Date: 2013-02-24 04:58 pm (UTC)
ETA: if your friend still wants to go ahead with this project, I'd like to alert the rest of the mod team. Do you mind if I mention what's going on? I don't want to break your f'lock without your permission.
no subject
Date: 2013-02-24 05:25 pm (UTC)
I think actually that Teaspoon might have fewer kinks in the data than ff.net, on account of the modding for "things that are not really stories", several of which I've already come across in the ff.net data.
You can link them to this post for an example. I'm also working on a post about character listings that should go up sometime today.
no subject
Date: 2013-02-24 07:12 pm (UTC)
no subject
Date: 2013-02-25 04:10 pm (UTC)
no subject
Date: 2013-02-26 01:10 am (UTC)
no subject
Date: 2013-02-26 07:23 pm (UTC)