2 Process: Topic modeling
To answer the question, we need data. The most complete source is Facebook, by far the most commonly used social media platform on the Islands, and in particular its Graph API (see the footnote). We extract data from this API for the top news pages in the country, then apply topic modeling to the headlines, captions, and posts.
Topic modeling, a technique from natural language processing and machine learning, lets us take unstructured text such as headlines, captions, and posts and discover the latent “topics” or themes underlying the corpus. For this article, we use Latent Dirichlet Allocation (LDA) (Blei, Ng, and Jordan 2003). In other words, by looking at the words used, we can define a set of topics and then assign each post to the topic it most likely belongs to. For more information, please read the technical documentation.
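To make the idea concrete, here is a minimal sketch of LDA in plain Python. It is not the code behind this post (the technical documentation has that). It fits the model by collapsed Gibbs sampling, which is one standard way to fit LDA and not necessarily the method used here. The function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the toy headlines are all assumptions made for illustration.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # n(d, k)
    topic_word = [Counter() for _ in range(n_topics)]  # n(k, w)
    topic_total = [0] * n_topics                       # n(k)
    assign = []                                        # topic of every token

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts...
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ...then resample its topic with weight proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Hypothetical, already-tokenized headlines (not from the real data set).
headlines = [
    "duterte drug war police".split(),
    "police drug list duterte".split(),
    "nba finals game warriors".split(),
    "warriors win game finals".split(),
]
doc_topic, topic_word = fit_lda(headlines, n_topics=2)

# Classify each headline into its most frequent topic.
labels = [max(range(2), key=row.__getitem__) for row in doc_topic]
for k, words in enumerate(topic_word):
    print("topic", k, [w for w, _ in words.most_common(3)])
print("labels", labels)
```

In practice you would use a tested library implementation and far more text; the sketch only shows how word co-occurrence drives both the topics and the per-post classification.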
3 News landscape of the Philippines
3.1 News ‘atlas’ of the Philippines
Once we perform the topic modeling, we can generate some pretty interesting visualizations. See here for an overview of the news landscape of the Philippines. Each group is composed of the most relevant words for that topic and a manually determined label. The distances between topics reflect their semantic distance, or basically how different the words are from one topic to another.
As you can see, there is a central mass of mainly English news on both local and international topics. In the periphery are mostly lifestyle and entertainment topics; sports news sits at the far top right, and Filipino-language news settles at the bottom.
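As a rough illustration of what "semantic distance" between topics can mean, the sketch below computes the Jensen-Shannon distance between two topic-word distributions. This measure is an assumption: the post does not say which one underlies the map. The function name `js_distance` and the toy distributions are hypothetical as well.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two word distributions.

    p and q map words to probabilities that each sum to 1. The result is
    0 for identical distributions and 1 for ones that share no words.
    """
    words = set(p) | set(q)
    # Mixture distribution halfway between p and q.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    divergence = 0.5 * kl(p) + 0.5 * kl(q)
    return math.sqrt(max(0.0, divergence))


# Hypothetical topic-word distributions.
drugs = {"drug": 0.5, "police": 0.3, "duterte": 0.2}
campaign = {"duterte": 0.5, "vote": 0.3, "campaign": 0.2}
sports = {"game": 0.6, "finals": 0.4}

print(js_distance(drugs, campaign))  # share one word, so below 1
print(js_distance(drugs, sports))    # share no words, so at the maximum
```

Pairwise distances like these can then be projected onto two dimensions to draw a map where topics with similar vocabulary sit close together.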
In the center of the mass you can see what (or who) is clearly the center of most news in 2016 - newly elected President Duterte. Spanning out from that topic are those of his policies and programs, the War on Drugs, his campaign, law enforcement, the Marcos Burial, and the drug-related charges filed against his critic Senator Leila de Lima.
To the northwest of President Duterte, you’ll see more general nationwide news - electoral process, transportation, finance and economy, and the weather.
Further northwest, you’ll see a slightly separated island that covers international news, foreign policy, and the outgoing Aquino administration, whose main focus in its final years was to secure an arbitral judgment against China over its claims in the South China Sea.
One thing to note is that Women’s Volleyball and Beauty Pageants seem to have been lumped together. Why? I have no clue.
3.2 Sample articles
But wait, how can we determine that these topic classifications actually make sense? Let’s try to pull out sample articles classified by the model and see if they can be understood.
Inspecting the headlines seems to show that the topics are well classified. For more details about how the model was trained, you are welcome to view the technical documentation.
4 Topic trends in 2016
Now that we are confident the topic classification works reasonably well, let’s take a look at the trends over time.
- Feb to Mar - a lull in the news, save for a bout of local election-related violence showing up in City and Barangay News.
- Apr - the RCBC Money Laundering Scandal topped the charts in Financial Regulation and Crime.
- May - entirely dedicated to the Election Campaign, with the final election week containing many articles about the Electoral Process.
- Jun - after the election of President Duterte, the news was filled with details of his stunning rise to power and what the new administration would do. NBA Basketball also dominated in June as the finals were taking place.
- Jul to Aug - a mixed bag, covering negotiations taking place with groups causing Internal Conflict, drug lists being released in Law Enforcement, and a fresh bout of Domestic Terrorism from the Abu Sayyaf Group.
- Sep - Technology News peaked with the release of the new iPhone 7 and Apple Watch. The War on Drugs started to come under fire from critics, with staunch critic Leila De Lima and Drug Allegations surfacing during the last week.
- Oct - Foreign Policy was a hot topic, with President Duterte pivoting to align more closely with China, raising concerns about the viability of the claim in the South China Sea.
- Nov - the Marcos Burial was top news for 4 straight weeks.
- Dec - we round out the rather tumultuous year with many articles about Relationships, Work and Family, and Science News; the annual Christmas rush also brought articles related to Transportation.
If you’re interested in viewing the trend for a particular topic, you can use this chart to view it. Simply hover over the line to highlight it and display the topic name.
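For readers who want to build a similar trend view, here is a minimal sketch of the aggregation behind it: count classified posts per topic per week, then pick the top topic in each week. The sample posts, the helper name `weekly_topic_counts`, and the use of ISO weeks are assumptions for illustration, not the post's actual pipeline.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical (publication date, assigned topic) pairs.
posts = [
    (date(2016, 11, 14), "Marcos Burial"),
    (date(2016, 11, 18), "Marcos Burial"),
    (date(2016, 11, 18), "Transportation"),
    (date(2016, 11, 21), "Marcos Burial"),
]


def weekly_topic_counts(posts):
    """Count posts per topic within each ISO (year, week)."""
    weeks = defaultdict(Counter)
    for published, topic in posts:
        year, week, _ = published.isocalendar()
        weeks[(year, week)][topic] += 1
    return weeks


weekly = weekly_topic_counts(posts)
for key in sorted(weekly):
    # The top topic of the week is the one with the most posts.
    topic, n = weekly[key].most_common(1)[0]
    print(key, topic, n)
```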
5 News page topic distribution
5.1 News page distribution
A common accusation leveled against traditional news media is bias in how much attention they give to various topics. Here we explore the topic distribution of each news page’s articles.
A few observations from the chart:
- Most major news pages have an even balance of topics
- GMA News focused a lot on Filipino Showbiz, while ABS-CBN focused on Movies & Television
- INQUIRER.net focused on Law Enforcement and the War on Drugs
- Oddly enough, ANC 24/7’s articles were mostly about Women’s Volleyball and Beauty Pageants
5.2 News page topic concentration
We know how the topics are distributed within each news page, but how do we measure news pages against each other in terms of topic concentration and potential “bias”? One way to measure concentration across different topics is the Herfindahl-Hirschman Index (HHI), the sum of the squared shares of each topic. On the scale used here, the index ranges from near 0 (articles spread evenly over many topics) to 100 (all articles on a single topic).
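The index itself is simple to compute. The sketch below assumes the 0 to 100 scale comes from multiplying the sum of squared topic shares by 100; the function name `hhi` and the example counts are hypothetical.

```python
def hhi(topic_counts):
    """Herfindahl-Hirschman Index of a page's topic mix, scaled to 0-100.

    topic_counts maps topic name to number of articles. The index is the
    sum of squared topic shares, so one dominant topic pushes it up.
    """
    total = sum(topic_counts.values())
    if total == 0:
        raise ValueError("page has no articles")
    return 100 * sum((n / total) ** 2 for n in topic_counts.values())


# Hypothetical article counts per topic for two pages.
broad_page = {"Politics": 25, "Sports": 25, "Showbiz": 25, "Weather": 25}
niche_page = {"Showbiz": 90, "Politics": 10}

print(hhi(broad_page))  # four equal topics give 25.0
print(hhi(niche_page))  # one dominant topic gives a much higher value
```

Note that with N topics the lowest possible value is 100 / N, so the floor approaches 0 only as the number of topics grows.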
The most concentrated were news pages that focused mainly on Filipino subjects (which got lumped together), but apart from that the interesting findings are:
- SunStar News, a Cebu based newspaper, focused mainly on local news, particularly in Visayas and Mindanao.
- Yahoo Philippines almost always just talked about entertainment topics, and MSN Philippines focused on lifestyle topics.
- Main news outlets, contrary to the comments posted on these pages, were the least concentrated and had the widest topic distribution.
6 Reactions to topics
If news pages are relatively well distributed in terms of topics, then how do we explain the perception of biased media? Well, Facebook’s timeline is heavily geared toward showing you topics that you have already interacted with. We can compare the number of news articles published per topic over time with the reactions to them over time:
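One simple way to frame this comparison, sketched here with made-up numbers, is to put each topic's share of published articles next to its share of reactions. A ratio above 1 means a topic draws more reactions than its volume of coverage alone would suggest. The topic names are taken from this post, but the counts and the helper name `shares` are hypothetical.

```python
def shares(counts):
    """Convert raw counts per topic into fractions of the total."""
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


# Hypothetical totals per topic (not real data).
articles = {"War on Drugs": 200, "Filipino Showbiz": 300, "Weather": 500}
reactions = {"War on Drugs": 6000, "Filipino Showbiz": 3000, "Weather": 1000}

article_share = shares(articles)
reaction_share = shares(reactions)

for topic in articles:
    # Ratio > 1: the topic is over-represented in reactions.
    ratio = reaction_share[topic] / article_share[topic]
    print(
        f"{topic}: {article_share[topic]:.0%} of articles, "
        f"{reaction_share[topic]:.0%} of reactions, ratio {ratio:.1f}"
    )
```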
7 Final remarks
Using natural language processing and topic modeling, we are able to take unstructured text and uncover latent topics in the corpus, so we can learn about the entire online news landscape in the country, not just what pops up in our newsfeeds. Some key takeaways from this exercise:
- Despite this being a politics-driven year, we are still obsessed with entertainment. Movies, Television, and Showbiz have all taken top spots in the news.
- There are definite spikes in activity during certain weeks. Some of the most newsworthy events were the Elections in May, the South China Sea dispute in July, President Duterte in May and again in June, and the Marcos Burial in November.
- Despite what Facebook commenters may say, news pages are well distributed in terms of topics, and do not solely focus on particular topics. What actually appears on people’s newsfeeds, which is dependent on what users interact with, is another story.
8 Technical documentation
In the interest of reproducible research, here are the notebooks containing the code, results, and commentary behind the post. You may download the resulting code and run it yourself, provided:
- You have the proper API keys to access the Facebook Graph API (see the footnote), and
- You agree that this is provided as-is and with no warranty.
You may also view the GitHub Repository here for the complete code and analysis that went into this post. You may also choose to collaborate with me in producing more parts to this series!
References
Footnotes
If you are looking for the source of this data, unfortunately Facebook has altered its APIs and it is no longer straightforward to extract this information, and the Terms of Service prohibit me from sharing the raw data.
Citation
@online{palanca2017,
author = {TJ Palanca},
title = {On {Facebook} {News} in the {Philippines}},
date = {2017-03-12},
url = {https://tjpalanca.com/facebook-news-topic-modeling.html},
langid = {en}
}