To continue my quest for yet more data, I started to look into the transient world of sound. I was once amazed by the automated captions on TV. They were so accurate and appeared with hardly any delay, which made me doubt they were automated, and yet we can’t have someone typing all those words out for every single program, right? Apparently, this technology has been around for a long time. If we could jot down everything people are saying, that would mean a lot more data to play with!
TV programs seem a bit too unattainable, so for…
While I was taking this course on Udemy, my dear friend Sruthi gave me a surprisingly relevant challenge to work on: scrape some immigration-related articles from two distinct news publications and compare the keywords they use. It sounded like immense fun, so here’s the progress report dedicated to her!
While it sounds like a straightforward ask, there are a couple of tricky parts. Here’s the breakdown:
First, let’s review the data we have collected from AO3 so far:
(See Part 1 for the data collection process)
In Part 2, we focused on the more structured data and left out Tags, Summary, Content (the actual fiction text), and Comments. Those are the focus of this post.
Beyond simple word frequency counts, I first explored…
With all the data collected in one place, we can now conduct all kinds of Exploratory Data Analysis (EDA). Since I did not have a fixed plan in mind for this project, I just went ahead and tried to answer various questions I’m interested in.
There are two distinct parts to this analysis. The first is an initial exploration of the fiction’s profile information and its interaction with “performance” or “reaction” statistics, the more numerical aspect. The second involves much bulkier textual data that requires a different set of skills to unpack. …
When starting this project, I had a dual purpose: getting started with web scraping and text mining, and actually drawing some insights from the fanfics I read and love.
I’m writing this primarily to document the process and questions for anyone who happens to drop in, or for my future self…
A Disciple of the Secret Lives of Data