Finding the most liked tweets for a topic in a year
I’m nearly halfway through writing a month of daily blog posts. I wanted to write some posts covering the history of both Wikidata and Wikibase on Twitter. Being a developer, I looked for APIs, but it seems tweets are not as accessible as they once were.
This is a short write-up of my adventure, covering APIs, thoughts on scraping, and finally my working solution, albeit with a quirk or two that I can’t explain.
APIs
Twitter offers a range of free and paid-for (premium) APIs. Conveniently, there is an overview page for the search feature, which I believe is what I want.
At the time of writing this, the overview looks something like this.
Category | Product name | Supported history | Data fidelity |
---|---|---|---|
Standard | Standard Search API | 7 days | Incomplete |
Premium | Search Tweets: 30-day endpoint | 30 days | Full |
Premium | Search Tweets: Full-archive endpoint | The entire archive | Full |
Enterprise | 30-day Search API | 30 days | Full |
Enterprise | Full-archive Search API | The entire archive | Full |
Twitter API v2 | Recent search | 7 days | Full |
Twitter API v2 | Full-archive search (only available via Academic Research access) | The entire archive | Full |
There are only 3 APIs that provide access to the whole archive, and these are specifically for Premium, Enterprise or Research-based API users.
Premium API access requires “month to month contracts” and I don’t really want to go down that route. Academic research access has some requirements that I also don’t believe I meet.
Scraping
I ended up taking a look at multiple Twitter scraping options, but they all seemed broken, and they are also against the Twitter TOS! (Probably because Twitter offers paid-for services for diving into the archive.)
A quick summary:
- Twitter Scraper – Apify: Some sort of API that requires sign up that may or may not be able to do this kind of thing
- Towards data science, How to Scrape Tweets From Twitter: Blog post covering 2 solutions, one of which is noted as now broken
- taspinar/twitterscraper: Has a Dockerfile that didn’t build for me, and multiple reports in the issues of the thing no longer working
- jonbakerfish/TweetScraper: A mildly complicated installation. I don’t think I got past this step
- twintproject/twint: I was hopeful here, there was even a repository for a Dockerfile. But this also seems to currently be non-functional
Solution (UI search)
After a few hours of looking around for a programmatic solution, I gave up.
It turns out that you can search the whole Twitter archive in the UI search box.
"wikibase" since:2021-01-01 until:2021-12-31
(link)
Using this in combination with the min_faves option, you can slowly refine your search until you find just the top tweets for a given time period.
"wikibase" min_faves:50 since:2021-01-01 until:2021-12-31
(link)
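Typing these out by hand for every topic, year and threshold gets repetitive, so here is a minimal sketch of how the search strings could be generated. The function names build_query and search_url are my own, and I am assuming the web search page accepts the query in a q URL parameter; the operators themselves are just the ones used above.

```python
from datetime import date
from urllib.parse import quote, urlencode


def build_query(term, year, min_faves=None):
    """Build a search string like the ones used in this post."""
    parts = [f'"{term}"']
    if min_faves is not None:
        parts.append(f"min_faves:{min_faves}")
    # Same date window as in the post: 1 January to 31 December of the year.
    parts.append(f"since:{date(year, 1, 1).isoformat()}")
    parts.append(f"until:{date(year, 12, 31).isoformat()}")
    return " ".join(parts)


def search_url(query):
    # Assumption: the search UI reads the query from a `q` parameter.
    return "https://twitter.com/search?" + urlencode({"q": query}, quote_via=quote)


# Print one link per threshold to paste into a browser.
for threshold in (50, 70, 90):
    print(search_url(build_query("wikibase", 2021, threshold)))
```

This only produces links to open in a browser. It does not fetch or scrape anything, so the reading of results is still manual.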
And so the manual work of copying links around in order to write blog posts began!
Quirk?
When making a search for Wikibase tweets with over 50 likes in 2021, I get 4 results.
"wikibase" min_faves:50 since:2021-01-01 until:2021-12-31
(link)
Most important for describing this quirk is the tweet from the Wikidata account from Nov 2, which has 105 likes.
If I up the min_faves parameter to 90, I now get only a single tweet. It is one of the other tweets from my first search, and not this Wikidata one, which also has over 90 likes.
"wikibase" min_faves:90 since:2021-01-01 until:2021-12-31
(link)
Does Twitter have some sort of fuzzy idea of the number of faves / likes that a tweet has?
More interestingly, if I drop min_faves down to 70, where I would expect both of the above tweets to appear, the Wikidata tweet still doesn’t show up!
"wikibase" min_faves:70 since:2021-01-01 until:2021-12-31
(link)
This shows me a tweet from Lozana with 75 likes, and the @annechardo tweet with 113.
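To make the inconsistency concrete, here is a small sketch that compares what each search showed against what the like counts say it should have shown. The helper missing_results and the short labels are made up for illustration. The like counts are the ones quoted in this post, and I am assuming the single result at 90 was the @annechardo tweet.

```python
def missing_results(likes, results_by_threshold):
    """Return, per threshold, tweets with enough likes that search did not show."""
    missing = {}
    for threshold, shown in sorted(results_by_threshold.items()):
        # Every tweet at or above the threshold should, in theory, be listed.
        expected = {name for name, count in likes.items() if count >= threshold}
        missing[threshold] = sorted(expected - set(shown))
    return missing


# Like counts as read off the tweets mentioned in this post.
likes = {"wikidata": 105, "annechardo": 113, "lozana": 75}

# What each search showed (assumption: the lone result at 90 is @annechardo).
results_by_threshold = {
    70: {"lozana", "annechardo"},
    90: {"annechardo"},
}

report = missing_results(likes, results_by_threshold)
print(report)  # {70: ['wikidata'], 90: ['wikidata']}
```

The Wikidata tweet is reported as missing at both thresholds, even though its like count clears each of them.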
Perhaps this connects back to something that I read on one of the API documentation pages.
Please note that Twitter’s search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.
Twitter API docs