Twitter Data Extraction
The tidyextractors.tidytwitter submodule lets you extract user data from Twitter with minimal effort. This page will guide you through the process.
A Minimal Code Example
from tidyextractors.tidytwitter import TwitterExtractor
# Your Twitter API credentials. See below for how to get them!
credentials = {
    'access_token': '',
    'access_secret': '',
    'consumer_key': '',
    'consumer_secret': ''
}
# A list of users for data extraction.
users = ['user1','user2','user3']
# Extract Twitter data.
tx = TwitterExtractor(users, extract_tweets=True, **credentials)
# Twitter user profile data in a Pandas DataFrame
user_df = tx.users(drop_collections=True)
# User/tweet keyed Pandas DataFrame
tweet_df = tx.tweets()
Step 1: Get API Credentials
To extract data using the Twitter API, you will first need to obtain API credentials. Your API credentials contain four pieces of information:
access_token
access_secret
consumer_key
consumer_secret
To get these credentials, check out the Twitter developer documentation: https://dev.twitter.com/oauth/overview/application-owner-access-tokens
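Hard-coding secrets in a script makes them easy to leak, so one option is to read them from environment variables instead. The following is a minimal sketch and not part of tidyextractors: the helper `load_credentials` and the `TWITTER_*` variable names are hypothetical choices made for illustration. It returns a dictionary with the four keys listed above.

```python
import os

# Maps the four credential keys from this page to environment variable names.
# The variable names are hypothetical; use whatever fits your setup.
ENV_NAMES = {
    'access_token': 'TWITTER_ACCESS_TOKEN',
    'access_secret': 'TWITTER_ACCESS_SECRET',
    'consumer_key': 'TWITTER_CONSUMER_KEY',
    'consumer_secret': 'TWITTER_CONSUMER_SECRET',
}


def load_credentials(environ=None):
    """Return the credentials dict; raise KeyError if any value is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in ENV_NAMES.values() if not environ.get(name)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {key: environ[name] for key, name in ENV_NAMES.items()}
```

The result can then be unpacked into the extractor in the same way as the credentials dictionary in the minimal example, for instance `TwitterExtractor(users, **load_credentials())`.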
Step 2: Extract Data
Once you have your API credentials, you can extract user data with the TwitterExtractor:
Warning
The Twitter API enforces rate limits, so be careful when downloading large amounts of data.
For a raw report on your remaining limit, call tx._api.rate_limit_status() after extraction.
Note
As per the limit imposed by the Twitter API, only the 3,200 most recent tweets will be downloaded for each user.
from tidyextractors.tidytwitter import TwitterExtractor
credentials = {
    # Randomly generated example credentials for demonstration only
    'access_token': '985689236-R0EjHQJZLya6gb82R5g8Odb4UMwkhQy4Q2AxzBnB',
    'access_secret': 'CVuVV0LSf74PQt2HH6zt08aeumGdMvlZtKF7BbHvRmX4r',
    'consumer_key': 'F47AzSRag0KvVFG4eJYexuDqB',
    'consumer_secret': 'lovnyqIA1oKs0jI4A27VXLLSUWrKc0hnNzyTu39NWIjSiq1xxj'
}
# User names may have leading "@" but this is not required.
users = ['user1','user2','user3']
# Users' tweets are extracted by default, but this may be disabled.
tx = TwitterExtractor(users, extract_tweets=True, **credentials)
Extraction may take a while. Once it finishes, all the data is stored inside the extractor object, and you just need a bit more code to get it in your preferred format.
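The raw rate limit report mentioned in the warning above can be long, so it may help to condense it. The sketch below assumes the report is a dictionary laid out like Twitter's rate limit status response, with a `resources` key mapping resource families to endpoints, each carrying `limit` and `remaining` counts. That layout is an assumption to check against your own output, and `low_limits` is a hypothetical helper, not part of tidyextractors.

```python
def low_limits(report, threshold=0.1):
    """List (endpoint, remaining, limit) for endpoints at or below `threshold` of their quota."""
    low = []
    for family in report.get('resources', {}).values():
        for endpoint, info in family.items():
            limit, remaining = info['limit'], info['remaining']
            # Skip endpoints with a zero limit to avoid dividing by zero.
            if limit and remaining / limit <= threshold:
                low.append((endpoint, remaining, limit))
    return sorted(low)


# Hand-written sample in the assumed layout, not real API output.
sample = {
    'resources': {
        'statuses': {
            '/statuses/user_timeline': {'limit': 900, 'remaining': 45, 'reset': 0},
        },
        'users': {
            '/users/lookup': {'limit': 900, 'remaining': 880, 'reset': 0},
        },
    }
}
print(low_limits(sample))
```

With a live extractor, the same helper would be called as `low_limits(tx._api.rate_limit_status())`.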
Step 3: Get Pandas Data
Now, you can call a TwitterExtractor method to return data in a Pandas DataFrame.
# Twitter user profile data in a Pandas DataFrame
user_df = tx.users(drop_collections=True)
# User/tweet keyed Pandas DataFrame
tweet_df = tx.tweets()
Note
TwitterExtractor.users() drops columns with collections of data in cells (i.e. list, set, and dict) because “tidy data” requires only atomic values in cells.
If you don’t want data dropped, set the optional drop_collections argument to False.
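To make the idea of "collections in cells" concrete, the sketch below finds such columns in an ordinary DataFrame and drops them. It illustrates the concept only and is not the library's implementation: `collection_columns` is a hypothetical helper, and the stand-in data and its column names are invented for the example.

```python
import pandas as pd


def collection_columns(df):
    """Return the names of columns that hold any list, set, or dict value."""
    return [
        col for col in df.columns
        if df[col].map(lambda v: isinstance(v, (list, set, dict))).any()
    ]


# Stand-in data; the column names are invented for this example.
df = pd.DataFrame({
    'name': ['user1', 'user2'],
    'followers': [10, 25],
    'tags': [['a', 'b'], []],
})

# Keep only the columns whose cells are all atomic values.
tidy_df = df.drop(columns=collection_columns(df))
```

A check like this could be useful if you keep the collections and want to see which columns would otherwise have been removed.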