With that, a new project was born: one to understand media coverage and accurately measure bias, misinformation and narrative pushes. My team and I started developing a new framework that covers the entire pipeline. From acquiring data from homepages to finding and tracking stories, we sketched out everything we needed to get data in a format, and with features, not available anywhere else. Over the course of several months we had to overcome a number of obstacles, which also made me understand why nobody had tried this granular approach before: this is hard! We built our own “boxer” (that’s what we call our next-gen scraper, which understands homepages and finds every article box on them), our own implementation of a clustering algorithm, our own AI to identify which articles belong to which stories and, of course, our grading system.
Triple A news
So how do you ultimately gauge media coverage? To compare coverage, understand bias and identify which stories truly dominated which outlets, you need to measure coverage far beyond a simple count of articles. So we drafted rules for an algorithm that takes multiple parameters into consideration to describe how an article is presented on a homepage. First, we developed our own grading system. It draws on dozens of data points so that it works for any kind of homepage, any article and any language, but the simplified explanation is this: the bigger the article box and the closer it is to the top of the page, the better the grade. Grades start at AAA and then go down from A to F, with some special grades for specific boxes (Z for very minor boxes, for example).
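To make the simplified rule concrete, here is a minimal Python sketch that grades a single box from its size and vertical position. It is not the real grading system, which uses dozens of data points: the scoring formula, the equal-width grade bands, the `viewport_height` and `minor_area` thresholds and the `Box`/`grade_box` names are all hypothetical, chosen only to show bigger, higher boxes earning better grades.

```python
from dataclasses import dataclass

# Grade ladder from best to worst, as named in the post (AAA, then A down to F).
GRADES = ["AAA", "A", "B", "C", "D", "E", "F"]


@dataclass
class Box:
    width: float   # box width in pixels
    height: float  # box height in pixels
    top: float     # distance from the top of the page to the box, in pixels


def grade_box(box: Box, page_width: float, page_height: float,
              viewport_height: float = 1000.0,
              minor_area: float = 5000.0) -> str:
    """Return a grade for one article box (hypothetical scoring rule)."""
    area = box.width * box.height
    if area < minor_area:
        return "Z"  # special grade for very minor boxes (threshold is an assumption)
    # Size score: box area relative to one full-width screen, capped at 1.
    size_score = min(1.0, area / (page_width * viewport_height))
    # Position score: 1 at the very top of the page, falling linearly to 0 at the bottom.
    position_score = max(0.0, 1.0 - box.top / page_height)
    score = size_score * position_score  # in [0, 1]; higher is better
    # Split [0, 1] into equal bands, one per grade; a score of 1.0 maps to AAA.
    index = min(len(GRADES) - 1, int((1.0 - score) * len(GRADES)))
    return GRADES[index]
```

For example, on a 1200×8000 px page, a full-width box filling the first screen (`Box(1200, 1000, 0)`) gets AAA, while a 50×50 px thumbnail gets Z.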
Then we used the tracking system we built to account for how long an article has been presented on a given homepage with a specific grade. Our custom algorithm combines time of exposure and grade for each box, and summing over all of an article’s boxes gives us a ranking for that article. Finally, summing the rankings of all articles in a story gives us the ranking for that story across media outlets. Now, while doing all this with my team, I also started some extensive research that led me to a realization…
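As a rough illustration of how those levels of aggregation (box, article, story) fit together, here is a minimal Python sketch. It assumes, purely for illustration, that a box’s contribution is a per-grade weight multiplied by hours of exposure; the actual combining algorithm and its weights are custom and not described here, so `GRADE_WEIGHTS`, `box_score`, `article_rankings` and `story_rankings` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical weight per grade; the real weights are not published.
GRADE_WEIGHTS = {"AAA": 10.0, "A": 7.0, "B": 5.0, "C": 3.0, "D": 2.0,
                 "E": 1.0, "F": 0.5, "Z": 0.25}


def box_score(grade: str, exposure_hours: float) -> float:
    """Combine grade and time of exposure for one box (assumed: weight * hours)."""
    return GRADE_WEIGHTS[grade] * exposure_hours


def article_rankings(observations: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Sum box scores per article.

    Each observation is (article_id, grade, exposure_hours): one box an article
    occupied on some homepage, with one grade, for some stretch of time.
    """
    rankings: Dict[str, float] = defaultdict(float)
    for article_id, grade, hours in observations:
        rankings[article_id] += box_score(grade, hours)
    return dict(rankings)


def story_rankings(article_ranks: Dict[str, float],
                   article_to_story: Dict[str, str]) -> Dict[str, float]:
    """Sum the rankings of every article assigned to each story."""
    stories: Dict[str, float] = defaultdict(float)
    for article_id, rank in article_ranks.items():
        stories[article_to_story[article_id]] += rank
    return dict(stories)


if __name__ == "__main__":
    # Hypothetical data: article "a1" leads a homepage for 6 hours, then is demoted.
    ranks = article_rankings([("a1", "AAA", 6.0), ("a1", "B", 12.0),
                              ("b7", "A", 4.0), ("c3", "Z", 24.0)])
    print(ranks)  # {'a1': 120.0, 'b7': 28.0, 'c3': 6.0}
    print(story_rankings(ranks, {"a1": "election", "b7": "election", "c3": "heatwave"}))
    # {'election': 148.0, 'heatwave': 6.0}
```

Multiplying weight by time is just the simplest rule that rewards both prominence and duration: a minor box shown all day (`c3`) still ranks far below an article that led a homepage for a few hours.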
A media intelligence gold mine
What we were building was new to media analysis, and it wasn’t just a matter of keeping media outlets honest. I spoke with huge media intelligence players, billion-dollar companies, and when I asked whether they provided this kind of granular, weighted and tracked analysis, they said it wasn’t something they were able to do. So we broke down all the media intelligence markets and spent weeks tracking down every single player we could find: not one was offering what we were working on. We then identified exactly which markets we could target, put together some financial analysis and drafted the commercial offer we will be unveiling soon.
I want Discoverage to redefine how we quantify media coverage
Whether it’s to keep media accountable or to provide corporations with essential business information they can’t access right now, my vision for Discoverage is to redefine how we quantify media coverage. Media monitoring tools are too static: they can only detect whether an article exists, not whether it was featured for 3 days as the top story on an international media outlet or wasn’t even mentioned on a blog’s main page. And yet that’s a 3 billion dollar market in the US alone! Over the next few years, I want to see the ranking Discoverage is introducing become the new way we look at media coverage: not just by the presence of some articles, but by how long and how prominently those articles are presented to the public, which for me is the only way to actually gauge media coverage.