= Likely Center
= Likely Leans Left
= Likely Leans Right
= Leans Unknown
Current Nobias Status (July 2019):
Criteria: Article-level slant (source-level slant when article-level slant is missing), plus source and author credibility
Platforms: Facebook News Feed, Google News Feed and Google Search
Publications: Political news from 400 news sources covering 80% of US news (Lexis Nexis)
Region: U.S. only
Criteria (2019): Author level Slant, Gender bias
Publications (2020): Financial news, Health news
Region: International (2021)
And now for the nerdy science behind Nobias...
How We Determine Political Slant
It is difficult to identify the bias of a news source exactly, because there is no directly observed metric of left versus right leaning. In the absence of such a metric, we use the methodology that Matthew Gentzkow and Jesse Shapiro published in Econometrica (2010), a top economics journal (henceforth Gentzkow et al.), to identify the leaning of a news source.
To estimate the “newspaper slant”, Gentzkow et al followed a two-step procedure.
1. Identify the phrases most used by Democratic or Republican members of Congress in their congressional speeches, based on the 2005 congressional records. This step separates left (Democrat) leaning from right (Republican) leaning phrases and gives a benchmark set of phrases of two or three words (bi-grams or tri-grams). For example, they identified phrases like “death tax,” “tax relief,” “personal account,” and “war on terror” as strongly Republican, and “estate tax,” “tax break,” “private account,” and “war in Iraq” as strongly Democratic. They identified 1000 such phrases as a benchmark for Democrat and Republican leaning.
2. Match these phrases with phrases used by a news source in a regression framework to generate the slant index for a newspaper.
Nobias closely follows the procedure of Gentzkow et al., with a few exceptions.
Gentzkow et al. used the 2005 congressional speeches as a benchmark, which may miss relevant new phrases that Democrats and Republicans have adopted more recently. For example, “Obamacare” was not available in 2005; Nobias believes it is likely used more often in recent times and is a relevant phrase.
We use only phrases used at least 3 times. The dataset includes the relative frequency of these phrases and the identity of the speaker. We follow a procedure similar to that of Gentzkow et al. and assign a relative weight by a regression procedure. Specifically, we run a regression of the relative frequency of these phrases used by the Congressperson on the identity of their respective political party (Democrat/Republican). We keep the phrases with statistically significant slope and intercept coefficients (at the 10% level of significance) and use the procedure of Gentzkow et al. (section 3.2 of their 2010 paper) to map the phrases to ideology. This procedure gives us the most recent phrases and assigns an ideology to each phrase. With these procedures, Nobias identified 318 additional phrases from recent (2015 to 2017) Congressional speeches.
Nobias then uses these 1,318 phrases to identify the left (Democrat) and right (Republican) leaning of news sources, based on the relative usage of these phrases in about sixty-seven thousand articles published in the four-month period July 1st, 2018 to October 31st, 2018.
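As a rough illustration of the phrase-matching step, the sketch below counts benchmark bi-grams and tri-grams in an article. It is not the Nobias implementation: the two phrase sets contain only the eight example phrases quoted above (the real benchmark has 1,318), and the tokenizer, the helper names ngrams and count_phrase_matches, and the absence of stemming are all assumptions made for the example.

```python
import re

# Example benchmark phrases quoted in the text; the full list is not reproduced here.
REPUBLICAN_PHRASES = {"death tax", "tax relief", "personal account", "war on terror"}
DEMOCRAT_PHRASES = {"estate tax", "tax break", "private account", "war in iraq"}


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_phrase_matches(text):
    """Return (K, N): Democrat matches and total Democrat + Republican matches."""
    # Assumption: lowercase word tokens; no stemming or stop-word removal.
    tokens = re.findall(r"[a-z']+", text.lower())
    candidates = ngrams(tokens, 2) + ngrams(tokens, 3)
    democrat = sum(1 for phrase in candidates if phrase in DEMOCRAT_PHRASES)
    republican = sum(1 for phrase in candidates if phrase in REPUBLICAN_PHRASES)
    return democrat, democrat + republican


article = "Critics of the estate tax call it a death tax, while the tax break debate continues."
print(count_phrase_matches(article))  # (2, 3): two Democrat phrases out of three matches
```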
How Nobias develops a relative rank of news sources:
We use the 1,318 benchmark phrases to measure the slant of each pre-processed article by taking the ratio of the number of Democrat phrases matched in the article to the total number of phrases matched (Democrat + Republican). The slant measured for each article is therefore its relative Democrat slant. For example, if 10 phrases matched in an article and 3 of them are Democrat phrases, then the (Democrat) slant of the article is 0.3. One minus the (Democrat) slant measure is the relative Republican slant.
Because an article may match zero phrases, and hence zero Democrat-leaning phrases, leading to a 0/0 problem, we use the following simple non-informative prior. We count the number of Democrat phrases that matched, K, and the total number of Democrat and Republican phrases that matched, N, and calculate the article slant as (K+alpha)/(N+alpha+beta). We set alpha=0.1 and beta=0.1 as the non-informative prior.
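The smoothed ratio can be written as a one-line function. This is a minimal sketch of the formula as stated, with alpha and beta defaulting to the 0.1 values given above; the function name article_slant and the input validation are assumptions added for the example.

```python
def article_slant(k, n, alpha=0.1, beta=0.1):
    """Democrat slant (K + alpha) / (N + alpha + beta) for one article.

    k: number of matched Democrat phrases.
    n: number of matched Democrat and Republican phrases combined.
    """
    if k < 0 or n < k:
        raise ValueError("require 0 <= k <= n")
    return (k + alpha) / (n + alpha + beta)


print(article_slant(0, 0))             # 0.5: no matched phrases, so no evidence either way
print(round(article_slant(3, 10), 3))  # 0.304: close to the raw ratio 3/10
```

With no matched phrases the prior places the article exactly at 0.5, and as N grows the estimate approaches the raw ratio K/N.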
How Nobias establishes the relative leaning of each news source:
1. If at least 90% of the articles from a news source have fewer than 6 left- or right-leaning phrases, we assume that the news source avoids favoring either conservative or liberal ideas and causes, and we identify it as Center.
2. If 85% to 90% of the articles from a news source have fewer than 6 left- or right-leaning phrases, we proceed to calculate the slant (steps 3 and 4 below) but identify the source as Center-Left or Center-Right.
3. We examine the distribution of slant for each news source (over all articles from that source) and compare it to the distribution of slant in our entire sample by taking the ratio of the median slant of the news source to the median slant of the entire sample. A ratio greater than one implies that the news source is more Democrat leaning than the entire sample during the same period. Similarly, a ratio less than one implies that the source is more Republican leaning. For example, if the median slant across all news sources is 0.6 and a particular news source has a median of 0.75, then the ratio of 1.25 classifies the news source as likely left leaning.
4. We then look at the distribution of this ratio across all news sources and classify them into three broad groups (Left (Democrat), Center and Right (Republican)) based on the quintiles of the distribution.
Note that the relative leaning of a news source is only calculated for sources with at least 30 articles published in the four-month period July 1st, 2018 to October 31st, 2018.
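The median-ratio comparison and the sparse-phrase check can be sketched as follows. This is an illustration under stated assumptions, not the production pipeline: the sample median is taken over all articles pooled across sources (including sources that later fall below the 30-article minimum), and the names source_ratios and sparse_share are hypothetical. The quintile cutoffs used for the final three-way grouping are not specified in the text, so that step is not shown.

```python
from statistics import median

MIN_ARTICLES = 30     # minimum articles per source, as stated in the text
SPARSE_THRESHOLD = 6  # "fewer than 6 left- or right-leaning phrases"


def sparse_share(phrase_counts, threshold=SPARSE_THRESHOLD):
    """Fraction of a source's articles that matched fewer than `threshold` phrases."""
    if not phrase_counts:
        raise ValueError("source has no articles")
    return sum(1 for count in phrase_counts if count < threshold) / len(phrase_counts)


def source_ratios(slants_by_source, min_articles=MIN_ARTICLES):
    """Ratio of each source's median slant to the median slant of the pooled sample.

    A ratio above 1 suggests a more Democrat-leaning source than the sample;
    a ratio below 1 suggests a more Republican-leaning one.
    """
    pooled = [s for slants in slants_by_source.values() for s in slants]
    sample_median = median(pooled)
    return {
        source: median(slants) / sample_median
        for source, slants in slants_by_source.items()
        if len(slants) >= min_articles
    }


# Toy data chosen to mirror the worked example in the text (0.75 / 0.6 = 1.25).
toy = {"A": [0.75] * 30, "B": [0.6] * 30, "C": [0.45] * 30, "D": [0.6] * 5}
ratios = source_ratios(toy)
print({source: round(r, 2) for source, r in ratios.items()})
print(sparse_share([0, 2, 7, 5, 9, 1, 3, 4, 0, 2]))  # 0.8
```

Source D is dropped from the ranking because it has fewer than 30 articles, although its articles still contribute to the pooled median in this sketch.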
Limitations and Works in Progress:
Nobias’ procedure described above is preliminary, and we are working on several ways to update our methodology. Until those improvements are in place, the limitations below may cause certain news sources to be misclassified.
A few areas which need further improvements are:
We have so far used about eighty thousand articles from the most recent six-month period in 2018 (July-Dec 2018) to classify news sources as Democrat vs. Republican leaning. This period may be unusual because it includes the midterm election period. We are in the process of extending this period as well as devising a procedure to make our ranking period-independent. We are also using various machine learning tools to update this step.
To identify initial source leaning, our method is simple and based on the comparison between the median slant of the news source during the six-month period to the median slant of all the articles in our sample in the same period. We are in the process of updating this step by using various advanced machine learning techniques and cross-validation to improve the prediction accuracy.
If you have other recommendations or feedback, please contact our data science team.
Nobias Article Slant Prediction Methodology
In this methodology, we combine the source slant, used as a prior, with an article corpus from LexisNexis to develop an article-level slant.
We first preprocess our data, which includes removing stop words, contractions, and uppercase letters, converting the text to vectors of word counts, and applying a log transformation. This process yields an initial word list of about 67,433 different words. Our method of predicting the association of a set of words with the political slant of an article is as follows:
1. We first use Amazon Mechanical Turk to label a subset of training articles as left or right leaning.
2. We then use widely used machine learning techniques to identify which words are most associated with left- or right-leaning articles and assign those words as left- and right-leaning words.
a. Specifically, we first use L1-regularized logistic regression to identify the most predictive left- or right-leaning words from the master word list. The regularization hyper-parameter plays a crucial role in this process; we use ten-fold cross-validation to choose it. This step identified 1,536 “informative” words.
b. We then use an L2-regularized regression to find the relative weights of these words in predicting whether an article has a left or right slant. We again use ten-fold cross-validation to choose the regularization parameter in this step.
For example, steps 2a and 2b identify word stems like “leftist”, “islam”, and “parenthood” as associated with more right-leaning articles, and stems like “ineq”, “auster”, and “richest” as associated with more left-leaning articles.
3. We compute the log odds ratio based on the reduced, most predictive word list for binary slant found in step 2 above. The quality of these article slant scores was evaluated using the Area Under the Receiver Operating Characteristic Curve (ROC-AUC) metric, with bootstrapping to estimate standard errors.
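To make the scoring step concrete, here is a minimal sketch of how a log-odds slant score could be computed from a fitted linear model. Everything numeric in it is hypothetical: the weights and intercept are invented for illustration (the six stems are the examples quoted above, not the 1,536-word list), and the log(1 + count) transform is an assumed form of the log transformation mentioned in the preprocessing step. Positive scores are treated as right leaning, which matches the sign of the cutoffs used for labeling.

```python
import math

# Hypothetical weights: positive means right leaning, negative means left leaning.
WORD_WEIGHTS = {
    "leftist": 0.8,
    "islam": 0.5,
    "parenthood": 0.4,
    "ineq": -0.7,
    "auster": -0.6,
    "richest": -0.5,
}
INTERCEPT = 0.0  # hypothetical


def slant_log_odds(stem_counts):
    """Log odds of 'right leaning' for an article, given its word-stem counts."""
    return INTERCEPT + sum(
        weight * math.log1p(stem_counts.get(stem, 0))  # assumed log(1 + count) transform
        for stem, weight in WORD_WEIGHTS.items()
    )


print(slant_log_odds({"leftist": 2, "islam": 1}) > 0)  # True: leans right under these weights
print(slant_log_odds({"ineq": 3, "richest": 1}) < 0)   # True: leans left under these weights
```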
Trained and evaluated on curated labeled articles, our model achieved a mean AUC of 0.906 (SE 0.009). When the trained model was also tested on separate bias labels crowd-sourced using Amazon Mechanical Turk, it yielded an AUC of 0.830 (SE 0.008).
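For readers unfamiliar with the metric, the sketch below shows one common way to compute ROC-AUC and a bootstrap standard error using only the standard library. It is a generic illustration, not the evaluation code behind the figures above: the number of resamples, the seed, and the function names roc_auc and bootstrap_auc are assumptions, and the toy labels and scores are invented.

```python
import random
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


def bootstrap_auc(labels, scores, n_boot=200, seed=0):
    """Mean AUC and its bootstrap standard error over resampled article sets."""
    roc_auc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        if 0 < sum(resampled_labels) < n:  # skip resamples that lost a class
            aucs.append(roc_auc(resampled_labels, [scores[i] for i in idx]))
    return mean(aucs), stdev(aucs)


labels = [0, 0, 1, 1]  # toy data: 1 = right leaning, 0 = left leaning
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```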
Articles with predicted slant scores above 1.0 were labeled as “center-right leaning,” and those with slant scores above 2.0 were labeled as “right leaning.” Likewise, articles with slant scores below -2.0 were labeled as “center-left,” and those with slant scores below -3.0 were labeled as “left.”
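The cutoffs above translate directly into a small labeling function. This sketch applies the stated thresholds as written, including their asymmetry between the left and right sides; the function name slant_label is hypothetical, and scores that meet none of the cutoffs are assumed to receive the “center” label.

```python
def slant_label(score):
    """Map a predicted slant score to a label using the cutoffs stated in the text."""
    if score > 2.0:
        return "right"
    if score > 1.0:
        return "center-right"
    if score < -3.0:
        return "left"
    if score < -2.0:
        return "center-left"
    return "center"  # assumption: everything between -2.0 and 1.0 inclusive


for s in (2.5, 1.5, 0.0, -1.5, -2.5, -3.5):
    print(s, slant_label(s))
```

Note that under these cutoffs a score of -1.5 is labeled “center” while a score of 1.5 is labeled “center-right”.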
Cutoffs were selected to allow type 1 error (i.e. false positive) rates of no more than 10% for “center left or right” labels and no more than 5% for “left or right” labels. On our datasets these thresholds resulted in empirical sensitivities of 52% for labels of at least “center-left/right” in either direction. On average, 36% of articles were labeled as “center-left/right” slanted in the correct direction, 16% as “left/right” leaning in the correct direction, 5% as “center-left/right” leaning in the incorrect direction, 4% as “left/right” leaning in the incorrect direction, and 39% as “center”.
What Drives Media Slant? Evidence from US Daily Newspapers, by Matthew Gentzkow and Jesse M. Shapiro, Econometrica, Vol. 78, No. 1 (January 2010), 35–71
Measuring Group Differences in High Dimensional Choices: Method and Application to Congressional Speech, by Matthew Gentzkow, Jesse M. Shapiro and Matt Taddy, NBER Working Paper 22423
How We Determine Credibility
Nobias uses editorial ratings generated by LexisNexis. These editorial ranks are applied to news sources and employers of journalists. It is a source-level categorization indicating LexisNexis' editorial ranking of the source. Note, we are adding to our source list as we expand to non-US Sources that report on US political news.
There are five LexisNexis source ranks; we use the top three ranks as follows:
LexisNexis Source Rank 1 (Nobias Credibility: Highly Credible): Top international, national, and business news sources, e.g. The New York Times, CNN, The Economist.
LexisNexis Source Rank 2 (Nobias Credibility: Very Credible): Top regional sources, e.g. Houston Chronicle, MIT Technology Review, Pharmaceutical Journal, Advertising Age.
LexisNexis Source Rank 3 (Nobias Credibility: Somewhat Credible): A broad range of news sources of good editorial quality. Includes the following types of news sources:
Industry specific news sources such as PC Magazine, World of Concrete
Country specific news sources, e.g. New York Daily News, etc.
Government department press releases, e.g. US Treasury, etc.
Dedicated sports news sources, e.g. ESPN.com, Sports Network
LexisNexis Source Rank 4 (Nobias Credibility: Unknown): Covers non-news sources and data. Includes the following types of material:
Regional US news sources, e.g. Kansas City Star, Long Beach Herald
Regional UK news sources, e.g. Falkirk Today, This is Bristol
Corporate website press pages, e.g. Oracle, EasyJet
Wire news services, e.g. Business Wire, PR Newswire via Yahoo
Political party (affiliated) websites, e.g. The White House, The Labour Party
About.com topic sites, e.g. About.com TV, About.com Golf
LexisNexis Source Rank 5 (Nobias Credibility: Unknown): Covers non-news sources and data. Includes the following types of material:
Message boards, e.g. Raging Bull, StockSelector.com
Miscellaneous consumer sources, e.g. Jokes.com, Comic Book Bin, the Onion
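The rank-to-label mapping described above is small enough to express as a lookup table. This is only a sketch of that mapping; the names CREDIBILITY_BY_RANK and nobias_credibility are hypothetical, and returning “Unknown” for a source with no LexisNexis rank is an assumption rather than documented behavior.

```python
# LexisNexis source rank -> Nobias credibility label, as listed in the text.
CREDIBILITY_BY_RANK = {
    1: "Highly Credible",
    2: "Very Credible",
    3: "Somewhat Credible",
    4: "Unknown",
    5: "Unknown",
}


def nobias_credibility(source_rank):
    """Return the credibility label for a LexisNexis source rank (1 to 5)."""
    # Assumption: unranked or unrecognized sources are treated as "Unknown".
    return CREDIBILITY_BY_RANK.get(source_rank, "Unknown")


print(nobias_credibility(1))  # Highly Credible
print(nobias_credibility(4))  # Unknown
```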
Author Credibility Score
Nobias additionally provides author ratings, which reflect whether the author has won journalistic awards (see list below) and which inherit the rank of the author's employer (generated by LexisNexis). Employers are identified using Muck Rack and LinkedIn.
List of recognized journalistic awards:
Dart Awards * David Nyhan Prize * Deadline Club Award for Business Feature * Deadline Club Award for Business Investigative Reporting * Deadline Club Award for Daniel Pearl Prize for Investigative Reporting * Deadline Club Award for Magazine Feature Reporting * Deadline Club Award for Magazine Investigative Reporting * Deadline Club Award for Magazine Personal Service * Deadline Club Award for Magazine Profile * Deadline Club Award for Newspaper or Digital Beat Reporting * Deadline Club Award for Newspaper or Digital Enterprise Reporting * Deadline Club Award for Newspaper or Digital Feature Reporting * Deadline Club Award for Newspaper or Digital Local Reporting * Deadline Club Award for Newspaper or Digital Spot News Reporting * Deadline Club Award for Opinion Writing * Deadline Club Award for Reporting by a Newspaper With Circulation Under 100,000 * Deadline Club Award for Reporting by Independent Digital Media * Deadline Club Award for Science, Technology, Medical or Environmental Reporting * Deadline Club Hall of Fame Award * Edward R. Murrow Award * Frontline Club Awards * George Polk Award for Education Reporting * George Polk Award for Financial Reporting * George Polk Award for Foreign Reporting * George Polk Award for Justice Reporting * George Polk Award for Local Reporting * George Polk Award for Magazine Reporting * George Polk Award for Medical Reporting * George Polk Award for National Reporting * George Polk Award for Political Reporting * George Polk Award for State Reporting * Goldsmith Awards * Michael Kelly Award * NewsGuild Heywood Broun Award * Pulitzer Prize for Commentary * Pulitzer Prize for Criticism * Pulitzer Prize for Editorial Writing * Pulitzer Prize for Explanatory Reporting * Pulitzer Prize for Feature Writing * Pulitzer Prize for International Reporting * Pulitzer Prize for Investigative Reporting * Pulitzer Prize for Local Reporting * Pulitzer Prize for National Reporting * Scripps Howard Business/Economics Reporting William Brewster Styles Award * Scripps Howard Commentary Award * Scripps Howard Editorial Writing Walker Stone Award * Scripps Howard Human Interest Writing Ernie Pyle Award * Scripps Howard Investigative Reporting Ursula and Gilbert Farfel Prize * Scripps Howard Washington Reporting Raymond Clapper Award * Selden Ring Award * Shahid Ethics Award * Sigma Delta Chi Awards for Excellence in Journalism (SPJ) * SPJ Award for Deadline Reporting * SPJ Award for Editorial Writing * SPJ Award for Feature Reporting * SPJ Award for Foreign Correspondence * SPJ Award for Investigative Reporting * SPJ Award for Magazine Reporting * SPJ Award for Non-Deadline Reporting * SPJ Award for Public Service in Journalism * SPJ Award for Public Service in Magazine Journalism * SPJ Award for Washington Correspondence * The Ancil Payne Award for Ethics in Journalism * The Philip Meyer Award * Thomas M. Keenan NewsGuild of New York Service Award * Toner Prize