Google News is a service that we can use to take the pulse of a popular topic. Currently, there is a presidential election happening in the United States. With this event, we have an opportunity for news data analysis. In the realm of marketing, there is a concept of Effective Frequency. This refers to how many times you need to expose people to a message or idea before they make a buy decision. In an election, the buy is a vote, and the message is which candidate to vote for. In this article, we will walk through how to use the Google News API with Python. This will allow us to capture data over time and analyze it.
The idea we will test is about media exposure for presidential candidates. Whose name will appear more often in the news in the few days leading up to the election? Granted, a higher count does not guarantee that voters were exposed to that candidate more often, because news articles are not the only source of media. It also does not ensure that the names are being used in a positive light. But it should give us some signs of the public awareness of the candidates.
The following list is a summary of the process we will follow:
A Five-Step Process
- Make sure you have Python installed
- Get an API Key
- Subscribe to the Google News API
- Use the Google News API with Python
- Chart the results
Step 1. Make sure you have Python installed
For this article, we will be using a computer running Windows to run the Python code. I installed Python version 3 by following the Python installation instructions for Windows.
But to be sure, we will save this sample code and try running it:
#!/usr/bin/env python
import sys
sys.stdout.write("hello from Python %s\n" % (sys.version,))
When I save the code as hello.py and run this on my machine I see this:
>python hello.py
hello from Python 3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)]
Python version 3.x is required to use the http.client library in the sample Python code for the Google News API.
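As a small guard, a script could check the interpreter version itself before importing http.client. The following is only a minimal sketch; the function name require_python3 is hypothetical and is not part of the sample code from the Endpoints page.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Return True on Python 3+, otherwise exit with a readable message."""
    if version_info[0] < 3:
        raise SystemExit("This script needs Python 3 because it imports http.client.")
    return True


require_python3()
import http.client  # only reached when the check above passes
```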
Step 2. Get an API Key
Once we know Python is available, we need to get an API Key. The Google News API we will be using is hosted on the RapidAPI platform. Getting a key is a simple process that is free. Go to the RapidAPI home page and use an email address or social media account to connect.
Step 3. Subscribe to the Google News API
Once you register on the RapidAPI platform, the next step is to subscribe to the Google News API. You can do that by clicking on the blue button on the endpoints page which says “Subscribe to Test”:
After subscribing, I use the Search endpoint. I would use the “Topic Headlines” endpoint, but I need a custom topic. I want to search for US Presidential Election, so I use the Search endpoint to do so.
Step 4. Use the Google News API with Python
Now that we have made sure Python is available and subscribed to an API, it’s time to use it. First we will start with the sample code on the Endpoints page for Python http.client. This uses built-in Python libraries (when you have Python version 3+ installed).
import http.client

conn = http.client.HTTPSConnection("google-news.p.rapidapi.com")

headers = {
    'x-rapidapi-host': "google-news.p.rapidapi.com",
    'x-rapidapi-key': "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    }

conn.request("GET", "/v1/search?country=US&lang=en&q=Elon%20Musk", headers=headers)

res = conn.getresponse()
data = res.read()

print(data.decode("utf-8"))
You’ll need to replace the Xs in the x-rapidapi-key header with your actual key. The key is shown on the endpoints page when you are logged in.
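One way to avoid pasting the key into the script is to read it from an environment variable. This is a sketch under assumptions: the variable name RAPIDAPI_KEY and the helper build_headers are hypothetical names chosen for illustration. Only the two header names and the host come from the sample code above.

```python
import os

API_HOST = "google-news.p.rapidapi.com"


def build_headers(env=os.environ):
    """Build the RapidAPI request headers from an environment variable.

    RAPIDAPI_KEY is an assumed variable name; set it yourself before running.
    """
    key = env.get("RAPIDAPI_KEY")
    if not key:
        raise RuntimeError("Set the RAPIDAPI_KEY environment variable first.")
    return {"x-rapidapi-host": API_HOST, "x-rapidapi-key": key}
```

The returned dictionary could then be passed as headers=build_headers() in the conn.request call. In a Windows Command Prompt, the variable can be set for the current session with set RAPIDAPI_KEY=your-key before running the script.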
The sample code pulls posts in JSON format. Next, we will use Python to populate a local summary text file with the results:
#! python
"""
File: captureNewsTrends.py
Description: This script pulls a list of items from the Google News API.
Then it loops through results to tally the frequency of key words
in the title of each news article. The counts for a given date and time
are saved into a file.
"""

# Import libraries used below
import http.client
import json
from datetime import datetime
from pathlib import Path

# This is where the generated CSV will be saved
# More information about the Path function is described at https://realpython.com/python-pathlib/
data_folder = Path("C:/Users/myUserName/Documents/")
outputFile = data_folder / "newsTrends.csv"

# datetime object containing current date and time
now = datetime.now()

# Get the date and time in the format YYYY-mm-dd H:M:S
dt_string = now.strftime("%Y-%m-%d %H:%M:%S")

# Initialize counters
trumpCnt = 0
bidenCnt = 0
jorgensenCnt = 0
articleCnt = 0

# Set to 1 to show details along the way for debugging purposes
debug = 0

# This is the URL-encoded query we will search for in the Google News API
# Spaces are replaced with %20
query = "US%20Presidential%20Election"

# Connect to the API
conn = http.client.HTTPSConnection("google-news.p.rapidapi.com")

headers = {
    'x-rapidapi-host': "google-news.p.rapidapi.com",
    'x-rapidapi-key': "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    }

conn.request("GET", "/v1/search?when=1h&country=US&lang=en&q=" + query, headers=headers)

res = conn.getresponse()
data = res.read()

# Decode the API response as UTF-8 and load it into a Python dictionary
json_dictionary = json.loads(data.decode("utf-8"))

# Loop through the list of articles in the response
for item in json_dictionary['articles']:
    # Pull the title for this article into a variable
    thisTitle = item['title']
    if debug > 0:
        print("Title:", thisTitle)

    # Get a count of keywords in the article title (case-insensitive)
    trumpCnt += thisTitle.upper().count("TRUMP")
    bidenCnt += thisTitle.upper().count("BIDEN")
    jorgensenCnt += thisTitle.upper().count("JORGENSEN")
    articleCnt += 1

# Create summary line for the CSV file
outputCSV = str(dt_string) + "," + str(articleCnt) + "," + str(trumpCnt) + "," + str(bidenCnt) + "," + str(jorgensenCnt) + "\n"

# Now append the line to the CSV file
with open(outputFile, "a", encoding="utf-8") as f:
    f.write(outputCSV)
Again, to use the code above you’ll need to replace the Xs with your RapidAPI Key. Also replace the path with a local path on your computer or server. The code pulls a list of news articles posted within the last hour. The source is Google News and the query is US Presidential Election. Then it loops through each article title retrieved. For each title, it counts occurrences of the last name of each current presidential candidate. Finally, it adds a row to a local Comma Separated Value (CSV) file.
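The two core steps of the script, encoding the query and tallying names in titles, can also be written as small functions that are easy to try without calling the API. This is a sketch, not the original script: the function names build_search_path and tally_mentions are hypothetical, and the sample response is made up in the shape the script expects (an articles list whose items have a title).

```python
import json
from urllib.parse import quote, urlencode


def build_search_path(query, when="1h", country="US", lang="en"):
    """Build the request path, percent-encoding spaces in the query as %20."""
    params = {"when": when, "country": country, "lang": lang, "q": query}
    return "/v1/search?" + urlencode(params, quote_via=quote)


def tally_mentions(titles, names):
    """Count case-insensitive occurrences of each name across all titles."""
    counts = {name: 0 for name in names}
    for title in titles:
        upper_title = title.upper()
        for name in names:
            counts[name] += upper_title.count(name.upper())
    return counts


# Made-up response body for illustration only (not real API output)
sample = json.loads(
    '{"articles": [{"title": "Trump and Biden debate"}, {"title": "Biden leads poll"}]}'
)
titles = [item["title"] for item in sample["articles"]]

print(build_search_path("US Presidential Election"))
# /v1/search?when=1h&country=US&lang=en&q=US%20Presidential%20Election
print(tally_mentions(titles, ["Trump", "Biden", "Jorgensen"]))
# {'Trump': 1, 'Biden': 2, 'Jorgensen': 0}
```

Letting urlencode do the escaping means the query can be written with plain spaces, and keeping the candidate names in a list makes it simple to track additional names.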
To get a data point every hour, I create a batch file called captureNewsTrends.bat like this:
"C:\path\to\python.exe" "C:\path\to\captureNewsTrends.py"
Then, using the Windows Task Scheduler, I set the batch file to run once per hour to populate the CSV file. Over the several days leading up to the election, the query used in this example returned fewer than 60 results for each hourly API call. If you expand the search topic or the time requested you will get more results. But this API call only returns the first 100 results. That’s why it’s important to limit the results and/or the time frame to get a smaller number of them at a time.
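Because the call returns at most 100 results, a run that comes back with 100 articles may have been cut off. A minimal sketch of a check for that follows. The names RESULT_CAP and may_be_truncated are hypothetical, and the cap of 100 is taken from the paragraph above rather than read from the API.

```python
RESULT_CAP = 100  # cap described above; an assumption, not read from the API


def may_be_truncated(articles, cap=RESULT_CAP):
    """Return True when the number of articles reaches the cap."""
    return len(articles) >= cap


# Made-up lists standing in for json_dictionary['articles']
print(may_be_truncated([{"title": "t"}] * 60))   # False
print(may_be_truncated([{"title": "t"}] * 100))  # True
```

When the check returns True, narrowing the query or the time window is one way to get back under the cap.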
Step 5. Chart the Results
Once the election is over, we can stop capturing information every hour. At that point we have a CSV file which we can chart on a graph. In Excel, the trend graph built from this data looks like this:
There are a few interesting things to note from this graph:
- It looks like Trump got slightly more exposure than Biden. But it still looks very close between those 2 candidates.
- Jorgensen did not appear in any of the news headlines. That indicates that the two-party system is still dominant. This matches the election results. No state is registering more than 2% of its votes for anyone besides Trump or Biden.
- As of this writing (2 days after the election) votes are still being counted and there is no winner yet. Even the states that have almost all their votes counted are very close between the 2 top candidates. That closeness matches the chart.
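The same CSV can also be summarized in Python to compare overall totals. The sketch below sums each column, assuming the column order written by captureNewsTrends.py (timestamp, article count, Trump, Biden, Jorgensen) and no header row. The function name total_mentions is hypothetical and the sample rows are made up for illustration; they are not the captured data.

```python
import csv
import io


def total_mentions(csv_file):
    """Sum the article and per-candidate counts over all rows of the CSV."""
    totals = {"articles": 0, "trump": 0, "biden": 0, "jorgensen": 0}
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        _timestamp, articles, trump, biden, jorgensen = row
        totals["articles"] += int(articles)
        totals["trump"] += int(trump)
        totals["biden"] += int(biden)
        totals["jorgensen"] += int(jorgensen)
    return totals


# Made-up rows in the same layout as newsTrends.csv
sample_csv = io.StringIO(
    "2020-11-01 10:00:00,50,12,10,0\n"
    "2020-11-01 11:00:00,40,8,9,0\n"
)
print(total_mentions(sample_csv))
# {'articles': 90, 'trump': 20, 'biden': 19, 'jorgensen': 0}
```

To run it on the real file, pass an open file object instead, for example open(outputFile, newline="", encoding="utf-8").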
Conclusion
In this article, we have walked through an example of using the Google News API with Python. We started by getting set up with the API and then used Python to create a CSV file with the results. Then we charted the results in a graph using Excel.
The thesis under test is whether media exposure impacts election results. This idea comes from an old marketing axiom called the “Rule of 7”. This refers to how often people need to be exposed to a message before making a purchase. In this experiment the following limitations apply:
- There are many ways to be exposed to messages that were not considered in the data analyzed. For example political signs have littered roadsides all over the country for several weeks if not months. Most of them have the last name of candidates prominent on the sign. Many have just that, though for lesser-known candidates first names are often included.
- There are many other search queries which people might use that would expose them to political messages. We used one query.
- The tone or emphasis of each article was not captured. Many of the current political news articles highlight negative aspects of candidates. The positivity of each message was not measured. This might be a good use for the Sentiment Analysis API.
- Correlation does not imply causality. Just because the exposure may correlate with vote count, it does not mean that is the cause.
With that aside, there is a strong correlation between exposure and votes. Compare the percentage of votes Biden and Trump received with the percentage of people who voted for anyone else. Personally, I only remember seeing 3 candidates on the ballot, but there are others in the race:
The winner needs to receive 270 electoral votes so it is not over yet.