Overview
The way we consume information has changed a lot. I have become quite a news junkie recently. In particular, I have been reading a lot of international news to work out which stage of Covid-19 my country is in.
To do this, I generally visit news media sites from various countries. This gave me an idea: why not collect international news about Corona in one place? And here it is. This post is about how I created the news data from News API.
News API
The data comes from the News API, which lets me access articles from leading news outlets in various countries for free. The only caveat is that a free account can hit the API only 500 times a day and gets at most 100 results for a particular query.
I tried to work within those limits so that I don't hit the API too often. I also tried to get news data from the last month, using multiple filters to retrieve as much data as possible.
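To make these limits concrete, here is a rough sketch of how a download plan could be checked against the free-tier quota before running it. The figures 500 and 100 come from the limits described above. The helper names date_windows and plan_requests, the example dates, and the idea of one request per source per date window are illustrative assumptions, not part of News API.

```python
from datetime import date, timedelta

DAILY_REQUEST_LIMIT = 500    # free-tier requests per day, as stated above
MAX_RESULTS_PER_QUERY = 100  # free-tier result cap per query, as stated above


def date_windows(start, end, days_per_window):
    """Split the inclusive range [start, end] into consecutive windows."""
    windows = []
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=days_per_window - 1), end)
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows


def plan_requests(sources, start, end, days_per_window):
    """Return one (source, from_date, to_date) tuple per planned request (hypothetical helper)."""
    return [
        (source, window_start, window_end)
        for source in sources
        for window_start, window_end in date_windows(start, end, days_per_window)
    ]


# Example dates chosen only for illustration: a 30-day span split into weekly windows.
end = date(2020, 4, 30)
start = end - timedelta(days=29)
plan = plan_requests(["the-hindu", "the-times-of-india"], start, end, days_per_window=7)

status = "within budget" if len(plan) <= DAILY_REQUEST_LIMIT else "over budget"
print(len(plan), "requests planned,", status)
print("at most", len(plan) * MAX_RESULTS_PER_QUERY, "articles")
```

Narrowing each query by source and date window is one way to get more than 100 articles in total, since the cap applies per query.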
Python has no built-in support for the News API, so I first have to install a client package, newsapi-python (pip install newsapi-python), to run my initial queries.
Now let's use this package to retrieve news data in bulk:
# Import the necessary packages
import pandas as pd
import datetime
from datetime import timedelta
from newsapi.newsapi_client import NewsApiClient

# Replace the placeholder with your own API key
newsapi = NewsApiClient(api_key='YOUR_API_KEY')
Paste in your own API key, which you get after registering on the News API site. The API mainly works by giving us access to three functions.
a) A function to get Recent News from a country:
# country='in' stands for India; use the code of whichever country you want
json_data = newsapi.get_top_headlines(language='en', country='in')
data = pd.DataFrame(json_data['articles'])
data.head()
b) A function to get “Everything” related to a query from the country. The News API documentation describes the available parameters.
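A call to this function might look like the sketch below. It assumes the get_everything method of the newsapi-python client with its q, sources, language, from_param, to, sort_by, page_size and page keyword arguments. The helper name fetch_everything, the default of 29 days and the relevancy sort order are illustrative choices rather than anything prescribed by the API. The client is passed in as an argument, so the function does not create its own.

```python
from datetime import date, timedelta


def fetch_everything(client, query, source, days_back=29, page_size=100):
    """Fetch articles matching `query` from one source over the last `days_back` days.

    `client` is assumed to be a NewsApiClient instance, or any object that
    exposes a compatible `get_everything` method.
    """
    today = date.today()
    response = client.get_everything(
        q=query,
        sources=source,
        language="en",
        from_param=str(today - timedelta(days=days_back)),  # 'YYYY-MM-DD'
        to=str(today),
        sort_by="relevancy",
        page_size=page_size,  # 100 is the per-query cap for free accounts
        page=1,
    )
    return response["articles"]
```

With the client created earlier, a call could look like fetch_everything(newsapi, 'corona', 'the-hindu'), and the returned list can be passed to pd.DataFrame in the same way as the top headlines.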
c) A function to get a list of sources from a country programmatically. We can then use these sources to pull data from the “everything” API:
def get_sources(country):
    # Ask the API for all sources in the country and keep only their ids
    sources = newsapi.get_sources(country=country)
    sources = [x['id'] for x in sources['sources']]
    return sources

sources = get_sources(country='in')
print(sources[:5])

Output: ['google-news-in', 'the-hindu', 'the-times-of-india']
I used all the functions above to get data that refreshes at a particular cadence, calling these API functions in a loop to download the data.
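As a rough illustration of such a loop (a sketch, not the original code), the function below calls get_everything once per source and flattens the results into rows. The helper name download_articles and the max_requests guard are assumptions made for this example. The field names title, url and publishedAt are the ones News API returns for each article.

```python
def download_articles(client, sources, query, max_requests=500):
    """Call get_everything once per source and collect the articles (hypothetical helper)."""
    rows = []
    for requests_made, source in enumerate(sources):
        if requests_made >= max_requests:
            break  # stay under the daily request quota
        response = client.get_everything(
            q=query, sources=source, language="en", page_size=100
        )
        for article in response.get("articles", []):
            rows.append({
                "source": source,
                "title": article.get("title"),
                "url": article.get("url"),
                "publishedAt": article.get("publishedAt"),
            })
    return rows
```

With the objects defined earlier, pd.DataFrame(download_articles(newsapi, sources, 'corona')) would give one table covering all sources.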
If you feel lost, you might want to start with the basics first. Keep in mind that, although the News API is free, rate limits might still kick in even after trying to work around them.
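Since rate limits can still kick in, one defensive option is to stop at the first failed request and keep whatever has already been downloaded. The sketch below assumes that a failed call raises an exception (the newsapi-python client is assumed to raise its own NewsAPIException for error responses). Catching a broad Exception is a simplification, and the helper name fetch_until_failure is hypothetical.

```python
def fetch_until_failure(fetch_one, sources):
    """Call fetch_one(source) for each source; stop at the first error.

    Returns the articles collected so far and the sources still left to fetch,
    so the download can be resumed later (for example, the next day).
    """
    collected = []
    remaining = list(sources)
    while remaining:
        source = remaining[0]
        try:
            articles = fetch_one(source)
        except Exception as error:  # simplification: any error stops the run
            print(f"Stopping at {source!r}: {error}")
            break
        collected.extend(articles)
        remaining.pop(0)
    return collected, remaining
```

Here fetch_one can be any function that takes a source id and returns a list of articles, such as a small wrapper around a get_everything call.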
Conclusion
Here I have extracted web-based Covid news using Python. You can use this data for many kinds of analysis, and it should be sufficient for that. Thanks to News API for saving us time.