Overview
If you need to scrape Twitter data with Python and are struggling with other packages, Twitterscraper [GitHub] is a solid option for quickly collecting a large number of historical tweets.
Social media can be a gold mine of data on consumer sentiment. Platforms such as Twitter lend themselves to holding useful information, since users post unfiltered opinions that can be retrieved with ease. Combining this with internal company data can yield insights into the general sentiment people have toward companies, products, and more.
This tutorial is meant to be a quick, straightforward introduction to scraping tweets from Twitter in Python. To give it direction, I decided to focus on scraping through a general text search.
Social media can be an incredible source of real-time updates on current events, but accessing the data often presents challenges.
Scraping Twitter Data with Twitterscraper
My task was to identify power outages using social media, with the ultimate goal of improving the ability of emergency management officials to allocate resources in real time.
I identified Twitter as the platform most likely to yield a large number of posts related to the subject, and enthusiastically set out to begin collecting tweets.
My enthusiasm was tested, however, as I ran into several stumbling blocks.
The first logical choice was Twitter's official Search API. Unfortunately, the free tier only allows access to the seven most recent days of historical tweets and is limited to roughly 18,000 tweets per 15-minute window, with further access requiring a costly subscription. Since I did not have the resources to buy one, there was no way this method would let me amass enough data to train a model.
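For reference, here is a minimal sketch of what that official route looks like using the tweepy library (newer versions call the method search_tweets; older ones call it search). The credential strings are placeholders for keys from a Twitter developer account, and the seven-day window applies no matter how many tweets you request:

import tweepy

# Placeholder credentials from a Twitter developer account
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Standard search only reaches back about seven days,
# regardless of how many tweets you ask for
for tweet in tweepy.Cursor(api.search_tweets, q="power outage", lang="en").items(100):
    print(tweet.created_at, tweet.text)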
Next, I tried the Tweetscraper package. This seemed promising, and indeed I got excellent results with it. Unfortunately for me, I overtaxed the API during my initial experimentation with the package, and my computers were locked out from further access before I could refine my query into meaningful results.
This was a major bummer, but the deadline loomed and we still had our hearts set on using Twitter. Enter Twitterscraper!
Twitterscraper doesn’t have the same built-in Python functionality that Tweetscraper does, so first I had to install the package before running my initial queries.
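Installation is a one-liner with pip (the package is published on PyPI under the name twitterscraper):

pip install twitterscraper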
Let’s do the coding
Now let’s jump into the code and use the beauty of this package to retrieve bulk Twitter data:
# Import the packages we need
from twitterscraper import query_tweets
import datetime as dt
import pandas as pd
# Provide the range of days for which I want data
begin_date = dt.date(2020, 4, 1)
end_date = dt.date(2020, 4, 3)
# Limit the number of fetched tweets and specify the desired language
# (query_tweets expects a language code such as 'en', not the full name)
limit = 1000
language = 'en'
# Extract tweets for the keyword "Covid-19"
tweets = query_tweets("Covid-19", limit=limit,
                      begindate=begin_date,
                      enddate=end_date, lang=language)
This will extract tweets for the keyword within the specified date range, up to the tweet limit and in the specified language. It will take a few minutes to pull down all the data.
# Create a DataFrame from the extracted Tweet objects
df = pd.DataFrame(t.__dict__ for t in tweets)
df.head()
The resulting DataFrame gives you 21 columns to start processing with. You can extract the desired columns for further operations, as sketched below.
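The exact column names depend on the twitterscraper version you have installed, but fields such as timestamp, text, likes, and retweets are typically among them. Here is a minimal sketch, using a hypothetical subset of columns, of trimming the frame and saving it for later analysis:

# Inspect the available columns first, since they vary by version
print(df.columns.tolist())

# Keep a hypothetical subset of columns and write them to disk
subset = df[['timestamp', 'text', 'likes', 'retweets']]
subset.to_csv('covid19_tweets.csv', index=False)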
Finally, I'll conclude by saying: thank you, Twitterscraper. You saved the day!