Finding the most liked tweets for a topic in a year

December 10, 2021 By addshore

I’m nearly halfway through writing a month of daily blog posts. I wanted to write some posts covering the history of both Wikidata and Wikibase on Twitter. Being a developer, I looked for APIs, but it seems tweets are not as accessible as they once were.

This is a short write-up of my adventure, covering APIs, thoughts on scraping, and finally my working solution, albeit with a quirk or two that I can’t explain.

APIs

Twitter offers a range of free and paid-for (premium) APIs. Conveniently, there is an overview page for the search feature, which I believe is what I want.

At the time of writing this, the overview looks something like this.

| Category | Product name | Supported history | Data fidelity |
|---|---|---|---|
| Standard | Standard Search API | 7 days | Incomplete |
| Premium | Search Tweets: 30-day endpoint | 30 days | Full |
| Premium | Search Tweets: Full-archive endpoint | The entire archive | Full |
| Enterprise | 30-day Search API | 30 days | Full |
| Enterprise | Full-archive Search API | The entire archive | Full |
| Twitter API v2 | Recent search | 7 days | Full |
| Twitter API v2 | Full-archive search (only available via Academic Research access) | The entire archive | Full |

There are only 3 APIs that provide access to the whole archive, and these are specifically for Premium, Enterprise or Research-based API users.
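As a quick sanity check on that count, here is a minimal sketch that transcribes the overview rows into Python and filters for the ones covering the entire archive. The variable names are made up for illustration; the data is just the table as it looked at the time of writing.

```python
# Rows transcribed from the overview table:
# (category, product name, supported history, data fidelity)
SEARCH_APIS = [
    ("Standard", "Standard Search API", "7 days", "Incomplete"),
    ("Premium", "Search Tweets: 30-day endpoint", "30 days", "Full"),
    ("Premium", "Search Tweets: Full-archive endpoint", "The entire archive", "Full"),
    ("Enterprise", "30-day Search API", "30 days", "Full"),
    ("Enterprise", "Full-archive Search API", "The entire archive", "Full"),
    ("Twitter API v2", "Recent search", "7 days", "Full"),
    ("Twitter API v2", "Full-archive search", "The entire archive", "Full"),
]

# Keep only the products whose supported history is the whole archive.
full_archive = [
    (category, name)
    for category, name, history, _fidelity in SEARCH_APIS
    if history == "The entire archive"
]

for category, name in full_archive:
    print(f"{category}: {name}")
```

This lists one product each for Premium, Enterprise and Twitter API v2, matching the three access routes mentioned above.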

Premium API access requires “month to month contracts” and I don’t really want to go down that route. Academic research access has some requirements that I also don’t believe I meet.

Scraping

I ended up taking a look at multiple Twitter scraping options, but they all seemed broken, and they are also against the Twitter TOS! (Probably because Twitter offers paid-for services for diving into the archive.)

A quick summary:

Solution (UI search)

After a few hours of looking around for a programmatic solution, I gave up.

It turns out that you can search the whole Twitter archive in the UI search box.

"wikibase" since:2021-01-01 until:2021-12-31 (link)

Using this in combination with the min_faves option, you can slowly refine your search until you find just the top tweets for a given time period.

"wikibase" min_faves:50 since:2021-01-01 until:2021-12-31 (link)

And so the manual work of copying links around in order to write blog posts began!
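The refinement itself is manual, but the search strings are easy to generate. Below is a minimal sketch that builds the query for a term, a date range and a min_faves threshold, and turns it into a link for the search UI. The function names are hypothetical, and the URL shape (the query passed in the `q` parameter of `https://twitter.com/search`) is an assumption about the web UI rather than a documented API.

```python
from urllib.parse import quote


def build_query(term, since, until, min_faves=None):
    """Build a search string such as:
    "wikibase" min_faves:50 since:2021-01-01 until:2021-12-31
    """
    parts = [f'"{term}"']
    if min_faves is not None:
        parts.append(f"min_faves:{min_faves}")
    parts.append(f"since:{since}")
    parts.append(f"until:{until}")
    return " ".join(parts)


def search_url(query):
    # Assumption: the UI search reads the percent-encoded query from `q`.
    return "https://twitter.com/search?q=" + quote(query, safe="")


if __name__ == "__main__":
    # Raise the threshold step by step until only the top tweets remain.
    for threshold in (10, 50, 90):
        query = build_query("wikibase", "2021-01-01", "2021-12-31", threshold)
        print(query)
        print(search_url(query))
```

Each printed link still has to be opened and read by hand; this only saves retyping the operators.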

Quirk?

When making a search for Wikibase tweets with over 50 likes in 2021, I get 4 results.

"wikibase" min_faves:50 since:2021-01-01 until:2021-12-31 (link)

The important one for describing this quirk is the tweet from the Wikidata account from Nov 2, which has 105 likes.

If I raise the min_faves parameter to 90, I now only get a single tweet. It is one of the other tweets from my first search, and not the Wikidata one, which also has over 90 likes.

"wikibase" min_faves:90 since:2021-01-01 until:2021-12-31 (link)

Does Twitter have some sort of fuzzy idea of the number of faves / likes that a tweet has?

More interestingly, if I turn min_faves down to 70, where I would expect both of the above tweets to appear, the Wikidata tweet still doesn’t show up!

"wikibase" min_faves:70 since:2021-01-01 until:2021-12-31 (link)

This shows me a tweet from Lozana with 75 likes, and the @annechardo tweet with 113.
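To make the mismatch concrete, here is a small sketch of what a strict filter would return for the like counts quoted in this post. It assumes min_faves means "at least this many likes", which is a guess about the operator and not something the Twitter docs confirm here. The fourth tweet from the min_faves:50 search is left out because its like count isn't stated.

```python
# Like counts as quoted in this post; the labels are informal.
tweets = {
    "Wikidata (Nov 2)": 105,
    "@annechardo": 113,
    "Lozana": 75,
}


def expected_results(likes_by_tweet, min_faves):
    """Tweets a strict `likes >= min_faves` filter would return."""
    return sorted(
        name for name, likes in likes_by_tweet.items() if likes >= min_faves
    )


for threshold in (50, 70, 90):
    print(threshold, expected_results(tweets, threshold))
```

Under that assumption the Wikidata tweet would be present at all three thresholds, whereas the real search only returned it at min_faves:50.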

Perhaps this connects back to something that I read on one of the API documentation pages.

Please note that Twitter’s search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.

Twitter API docs