On Facebook News in the Philippines

Using topic modeling to find trends in online news coverage

Facebook has become one of the most visible sources of news in the Philippines, and it has arguably played a huge role in shaping Filipino views. We take a look at how the Philippine news cycle evolved by using topic modeling.
natural language processing
data visualization
social commentary

TJ Palanca


March 12, 2017

1 Motivation: Social Media in the Philippines

1.1 “I read it on Facebook”

The Philippines is the social media capital of the world. According to a Huffington Post article (Revesencio 2017), “from a global average of 4.4 hours/day, the Filipino spends an average of 6.3 hours/day online via laptop and 3.3 hours/day via mobile.” It is no surprise, then, that social media has become one of the main news sources for many Filipinos. After a tumultuous 2016 presidential election, many issues have cropped up: the newly elected President attacking the media for “biased” news, legislation introduced to address the spread of “fake news” on social media, and a campaign engineering a social media machine designed to weaponize hatred.

Philippines President Rodrigo Duterte (R) poses for a selfie during a meeting with the Filipino community in Singapore on December 16, 2016.

Since social media and the internet are so vast, how much do we really know about the Philippine news landscape online? This series explores the phenomenon by analyzing the unstructured text of news posts from all of 2016.

1.2 “Biased media”

Mainstream news outlets, many of which have a significant online presence, have come under fire for supposedly over- or under-reporting certain events, usually to the detriment of the newly elected President and other government officials. Critics of mainstream media claim that by de-emphasizing positive news and constantly posting negative news about the administration, these outlets are trying to discredit it and turn public opinion against it. This is a question we can try to answer with data, and that’s exactly what we’ll do in the first part of this series.

2 Process: Topic modeling

To answer the question we need data, and the most complete source is Facebook, by far the most commonly used social media platform in the islands, and specifically its Graph API.¹ We extract data from this API for the top news pages in the country, then apply topic modeling to the content of their headlines, captions, and posts.

Figure 1: TOPIC MODELING - A brief overview of how topic modeling is performed.

Topic modeling, in natural language processing and machine learning, is a way to take unstructured text such as headlines, captions, and posts and discover the latent “topics” or themes that underlie the corpus. For this article, we use Latent Dirichlet Allocation (LDA) (Blei, Ng, and Jordan 2003). In other words, based on the words used, we define a set of topics and then classify each post under the topic that best describes it. For more information, please read the technical documentation.
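To make the idea concrete, here is a minimal sketch of LDA fit with collapsed Gibbs sampling, written in plain Python so it runs without extra libraries. It is not the pipeline used for this post (the technical documentation covers that); the toy corpus, the `lda_gibbs` function, and the values of `alpha`, `beta`, and the iteration count are all illustrative assumptions. LDA gives every post a mixture of topics, and "classifying" a post here simply means taking the topic with the largest share.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): a topic mixture for every document and a
    word distribution (dict word -> probability) for every topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dt = [[0] * num_topics for _ in docs]
    n_tw = [defaultdict(int) for _ in range(num_topics)]
    n_t = [0] * num_topics

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dt[d][k] += 1
            n_tw[k][w] += 1
            n_t[k] += 1

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = z[d][i]
                n_dt[d][k] -= 1
                n_tw[k][w] -= 1
                n_t[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dt[d][t] + alpha) * (n_tw[t][w] + beta) / (n_t[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dt[d][k] += 1
                n_tw[k][w] += 1
                n_t[k] += 1

    # Smoothed estimates of the document-topic and topic-word distributions.
    doc_topic = [
        [(n_dt[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_tw[t][w] + beta) / (n_t[t] + V * beta) for w in vocab} for t in topics
    ]
    return doc_topic, topic_word


# Toy corpus of tokenized posts (illustrative only, not data from this post).
posts = [
    "duterte drugs police operation".split(),
    "police drugs killings duterte".split(),
    "volleyball finals pageant crown".split(),
    "pageant crown volleyball finals".split(),
]
doc_topic, topic_word = lda_gibbs(posts, num_topics=2)
# Classify each post by its dominant topic.
dominant = [max(range(2), key=lambda t: mix[t]) for mix in doc_topic]
print(dominant)
```

On a real corpus the posts would first be lowercased, stripped of stopwords, and tokenized, and a library implementation would replace this loop for speed.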

3 News landscape of the Philippines

3.1 News ‘atlas’ of the Philippines

Once we perform the topic modeling, we can generate some pretty interesting visualizations. Figure 2 gives an overview of the news landscape of the Philippines: each group shows the most relevant words for a topic, along with a manually determined label. The distance between topics reflects their semantic distance, that is, how different the words in each topic are.
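The map's distance measure isn't spelled out here. One common choice for topic maps (LDAvis, for example, defaults to it) is the Jensen-Shannon divergence between topic-word distributions, projected down to two dimensions with principal coordinate analysis. The sketch below assumes that measure and covers only the distance step; the `js_distance` helper and the three topic distributions are hypothetical.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance between two word distributions (dicts word -> prob).

    Uses natural logs, so the result lies between 0 and sqrt(ln 2).
    """
    words = set(p) | set(q)
    # Midpoint distribution between the two topics.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}

    def kl(a, b):
        # Kullback-Leibler divergence; words with zero probability contribute nothing.
        return sum(a[w] * math.log(a[w] / b[w]) for w in a if a[w] > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


# Hypothetical topic-word distributions (top words only, for illustration).
duterte = {"duterte": 0.5, "president": 0.3, "drugs": 0.2}
war_on_drugs = {"drugs": 0.5, "police": 0.3, "duterte": 0.2}
volleyball = {"volleyball": 0.6, "finals": 0.4}

print(round(js_distance(duterte, war_on_drugs), 3))  # overlapping words: closer
print(round(js_distance(duterte, volleyball), 3))    # no shared words: maximal distance
```

Topics that share vocabulary end up close together on the map, which is why policy topics cluster around the President while sports drifts to the edge.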

Figure 2: News Landscape of the Philippines (click to zoom)

As you can see, there is a central mass of mainly English news on both local and international topics. On the periphery are mostly lifestyle and entertainment topics, sports news sits on the far top right, and Filipino-language news settles at the bottom.

At the center of that mass is what (or who) was clearly the focus of most news in 2016: newly elected President Duterte. Spanning out from that topic are his policies and programs, the War on Drugs, his campaign, law enforcement, the Marcos Burial, and the drug-related charges filed against his critic, Senator Leila de Lima.

To the northwest of President Duterte is more general nationwide news: the electoral process, transportation, finance and the economy, and the weather.

Further northwest is a slightly separated island covering international news, foreign policy, and the outgoing Aquino administration, whose main focus in its final years was securing an arbitral judgement against China over its claims in the South China Sea.

One thing to note is that Women’s Volleyball and Beauty Pageants seem to have been lumped together. Why? I have no clue.

3.2 Sample articles

But wait, how can we tell whether these topic classifications actually make sense? Let’s pull out sample articles classified by the model and see whether they hold up.

Figure 3: Sample articles from the LDA classification (click to zoom)

Inspecting the headlines suggests that the topics are well classified. For more details about how the model was trained, see the technical documentation.
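One simple way to draw sample articles like those in Figure 3 is to rank posts by their share of each topic and keep the top few. A minimal sketch, assuming the model's output is a list of per-post topic mixtures (as an LDA fit produces); the `top_posts_per_topic` helper and the placeholder headlines are hypothetical, not data from this post.

```python
def top_posts_per_topic(headlines, doc_topic, n=2):
    """For each topic, return the n headlines with the highest share of that topic."""
    num_topics = len(doc_topic[0])
    samples = {}
    for t in range(num_topics):
        # Rank post indices by how strongly each post loads on topic t.
        ranked = sorted(range(len(headlines)), key=lambda d: doc_topic[d][t], reverse=True)
        samples[t] = [headlines[d] for d in ranked[:n]]
    return samples


# Hypothetical headlines and their topic mixtures (each row sums to 1).
headlines = ["headline A", "headline B", "headline C", "headline D"]
doc_topic = [
    [0.90, 0.10],
    [0.70, 0.30],
    [0.20, 0.80],
    [0.05, 0.95],
]
samples = top_posts_per_topic(headlines, doc_topic)
print(samples)
```

Ranking by topic share surfaces the posts the model is most confident about, which makes labeling easy; drawing random posts from each topic instead gives a more honest picture of the typical classification.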

5 News page topic distribution

5.1 News page distribution

A common accusation leveled against traditional news media is bias in how they report on various topics. To examine this, we look at the topic distribution of each page’s news articles.

Figure 6: BIAS AND BALANCE - Check the news distribution and hover over each bar for details (full screen)

A few observations from the chart:

  • Most major news pages have an even balance of topics
  • GMA News focused a lot on Filipino Showbiz, ABS-CBN focused on Movies & Television
  • INQUIRER.net focused on Law Enforcement and the War on Drugs
  • Oddly enough, ANC 24/7’s articles were mostly about Women’s Volleyball and Beauty Pageants
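A chart like Figure 6 can be built by assigning each post to its dominant topic and then computing, for every page, the share of its posts that fall in each topic. A minimal sketch, assuming posts arrive as `(page, dominant_topic)` pairs; the `topic_shares_by_page` helper, page names, and topic labels below are placeholders, not results from this post.

```python
from collections import Counter, defaultdict


def topic_shares_by_page(posts):
    """posts: iterable of (page, topic) pairs -> {page: {topic: share of that page's posts}}."""
    counts = defaultdict(Counter)
    for page, topic in posts:
        counts[page][topic] += 1
    shares = {}
    for page, counter in counts.items():
        total = sum(counter.values())
        shares[page] = {topic: n / total for topic, n in counter.items()}
    return shares


# Hypothetical (page, dominant topic) pairs.
posts = [
    ("Page A", "Showbiz"), ("Page A", "Showbiz"), ("Page A", "War on Drugs"),
    ("Page B", "Sports"), ("Page B", "Weather"),
]
shares = topic_shares_by_page(posts)
print(shares)
```

Normalizing by each page's own post count keeps high-volume pages from dominating the comparison.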

5.2 News page topic concentration

We know how the topics are distributed within each page, but how do we compare pages against each other in terms of topic concentration and potential “bias”? One way to measure concentration across topics is the Herfindahl-Hirschman Index (HHI), the sum of the squared shares of each topic. As used here, the index ranges from 0 to 100, with 100 meaning perfect concentration (all of a page’s posts in a single topic).
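Concretely, if a page's posts are split across N topics with shares s₁, …, s_N (fractions summing to 1), the HHI is s₁² + … + s_N². That sum runs from 1/N (posts spread evenly) to 1 (a single topic), so multiplying by 100 gives the 0 to 100 scale used above. The ×100 rescaling is an assumption about how the scale was obtained, and the `hhi` helper and example shares are illustrative.

```python
def hhi(shares):
    """Herfindahl-Hirschman Index of topic shares, rescaled to 0-100.

    `shares` may be raw post counts or fractions; they are normalized first.
    100 means every post is in a single topic; the minimum for N topics is 100 / N.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    return 100 * sum((s / total) ** 2 for s in shares)


print(hhi([10]))              # one topic only: maximum concentration
print(hhi([5, 5]))            # two equal topics
print(hhi([1, 1, 1, 1]))      # four equal topics
print(hhi([70, 10, 10, 10]))  # one dominant topic plus three minor ones
```

Since every page is scored against the same set of topics, the 100 / N floor is the same for all pages, so the scores are directly comparable.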

Figure 7: TOPIC CONCENTRATION - Hover over each bar to see the top topics for that page (full screen)

The most concentrated were news pages that focused mainly on Filipino subjects (which the model lumped together). Apart from those, the interesting findings are:

  • SunStar News, a Cebu-based newspaper, focused mainly on local news, particularly in the Visayas and Mindanao.
  • Yahoo Philippines posted almost exclusively about entertainment, and MSN Philippines focused on lifestyle topics.
  • The main news outlets, contrary to the comments posted on their pages, were the least concentrated and had the widest topic distribution.

6 Reactions to topics

If news pages cover a relatively well-distributed range of topics, how do we explain the perception of biased media? Well, Facebook’s newsfeed is heavily geared toward showing content related to topics you have already interacted with. Compare the number of news articles published per topic over time with the reactions those topics received over time:

Figure 8: REACTIONS VS ARTICLES - What news pages publish vs. what topics we interact with.
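The comparison in Figure 8 boils down to two aggregations over the same posts: how many articles each topic got in a given week, and how many reactions those articles drew. The sketch below computes each topic's share of articles and share of reactions per week, so a topic that readers engage with far more than it is covered stands out. The record layout `(week, topic, reactions)`, the `weekly_shares` helper, and the numbers are hypothetical.

```python
from collections import defaultdict


def weekly_shares(posts):
    """posts: iterable of (week, topic, reactions) tuples.

    Returns {week: {topic: (share_of_articles, share_of_reactions)}}.
    """
    articles = defaultdict(lambda: defaultdict(int))
    reactions = defaultdict(lambda: defaultdict(int))
    for week, topic, n_reactions in posts:
        articles[week][topic] += 1
        reactions[week][topic] += n_reactions

    result = {}
    for week in articles:
        total_articles = sum(articles[week].values())
        total_reactions = sum(reactions[week].values())
        result[week] = {
            topic: (
                articles[week][topic] / total_articles,
                # Avoid dividing by zero in a week where nothing got reactions.
                reactions[week][topic] / total_reactions if total_reactions else 0.0,
            )
            for topic in articles[week]
        }
    return result


# Hypothetical posts for a single week.
posts = [
    ("2016-W20", "Elections", 100),
    ("2016-W20", "Elections", 150),
    ("2016-W20", "Elections", 50),
    ("2016-W20", "Showbiz", 900),
]
shares = weekly_shares(posts)
print(shares)
```

Working with shares rather than raw counts keeps busy news weeks from swamping quiet ones when the two series are plotted together.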

7 Final remarks

Using natural language processing and topic modeling, we can take unstructured text and uncover the latent topics in the corpus, letting us learn about the entire online news landscape in the country, not just what pops up in our newsfeeds. Some key takeaways from this exercise:

  1. Despite 2016 being a politics-driven year, we are still obsessed with entertainment: Movies, Television, and Showbiz all took top spots in the news.
  2. There are definite spikes in activity during certain weeks. Some of the most newsworthy events were the Elections in May, the South China Sea dispute in July, President Duterte in May and again in June, and the Marcos Burial in November.
  3. Despite what Facebook commenters may say, news pages cover a well-distributed range of topics and do not focus solely on particular ones. What actually appears in people’s newsfeeds, which depends on what users interact with, is another story.

8 Technical documentation

In the interest of reproducible research, here are the notebooks containing the code, results, and commentary behind the post. You may download the resulting code and run it yourself, provided:

  • You have the proper API keys to access the Facebook Graph API,² and
  • You agree that this is provided as-is and with no warranty.

You may also view the GitHub repository here for the complete code and analysis that went into this post. You are also welcome to collaborate with me on more parts of this series!


Blei, David, Andrew Ng, and Michael Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3: 993–1022. http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf.
Revesencio, Jonha. 2017. “Philippines: A Digital Lifestyle Capital in the Making?” HuffPost. https://web.archive.org/web/20170928113516/http://www.huffingtonpost.com/jonha-revesencio/philippines-a-digital-lif_1_b_7199924.html.


  1. If you are looking for the source of this data: unfortunately, Facebook has since altered its APIs, so it is no longer straightforward to extract this information, and the Terms of Service prevent me from sharing the raw data.↩︎

  2. If you are looking for the source of this data: unfortunately, Facebook has since altered its APIs, so it is no longer straightforward to extract this information, and the Terms of Service prevent me from sharing the raw data.↩︎


BibTeX citation:
  author = {TJ Palanca},
  title = {On {Facebook} {News} in the {Philippines}},
  date = {2017-03-12},
  url = {https://tjpalanca.com/facebook-news-topic-modeling.html},
  langid = {en}
For attribution, please cite this work as:
TJ Palanca. 2017. “On Facebook News in the Philippines.” March 12, 2017. https://tjpalanca.com/facebook-news-topic-modeling.html.