Twitter has made good on one of CEO Elon Musk’s many promises, posting on GitHub on a Friday afternoon what it claims is the code for its tweet recommendation algorithm.
The code, posted under a GNU Affero General Public License v3.0, contains numerous insights as to what factors make a tweet more or less likely to show up in users’ timelines.
In a blog post accompanying the code release, Twitter’s engineering team (under no particular byline) notes that the system for determining the “top Tweets that ultimately show up on your device’s For You timeline” is “composed of many interconnected services and jobs.” Each time a Twitter home screen is refreshed, Twitter pulls “the best 1,500 Tweets from a pool of hundreds of millions,” the post states.
The largest source of those tweets is “In-Network Sources,” or users someone follows. The top tweets from that pile are ranked on the likelihood of a user’s engagement with that tweet’s author; the more likely, the more their tweets show up in For You. For the “Out-of-Network Sources,” those not followed by the user, Twitter says it considers tweets that attracted engagement from people the user follows, as well as tweets liked by people who like the same kinds of tweets as the user.
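The in-network step described in the post can be pictured with a small sketch. This is an illustration of the idea only, not Twitter's code: the `Candidate` type, the `engagement_prob` score, and the `rank_in_network` function are all hypothetical names, and the scores are made up.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    tweet_id: int
    author: str
    # Hypothetical score: how likely the viewer is to engage with this author.
    engagement_prob: float


def rank_in_network(candidates, following, limit):
    """Keep tweets from followed authors, highest engagement likelihood first."""
    in_network = [c for c in candidates if c.author in following]
    in_network.sort(key=lambda c: c.engagement_prob, reverse=True)
    return in_network[:limit]


candidates = [
    Candidate(1, "alice", 0.20),
    Candidate(2, "bob", 0.75),
    Candidate(3, "carol", 0.90),  # not followed, so treated as out of network
    Candidate(4, "alice", 0.55),
]
ranked = rank_in_network(candidates, following={"alice", "bob"}, limit=2)
print([c.tweet_id for c in ranked])
```

In this toy version, the tweet from the unfollowed account is dropped no matter how high its score; the out-of-network sources the post describes would need a separate path.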
Already, those who have looked through the code have spotted considerations that raise many more questions. Many have posted them, naturally, on Twitter itself.
Twitter just released source code for "the algorithm"
Oh, what file is this? Predicates for tweets on the home timeline?
Oh what is that 2nd image? pic.twitter.com/UE3dU8e3Os
— Ólafur Waage (@olafurw) March 31, 2023
Ólafur Waage, a senior software developer at Norwegian software consulting service TurtleSec, noted that inside “HomeTweetTypePredicates.scala,” some of the seeming considerations for a tweet to be a candidate for the “For You” section are:
author_is_elon
author_is_power_user
author_is_democrat
author_is_republican
Elsewhere in the code, a code comment presumably left by a Twitter engineer clarifies that those identification values are “used purely for metrics collection.” The comment reads as follows:
These author ID lists are used purely for metrics collection. We track how often we are serving Tweets from these authors and how often their tweets are being impressed by users. This helps us validate in our A/B experimentation platform that we do not ship changes that negatively impacts one group over others.
The names of the objects in question, such as “DDGStatsDemocratsFeature” or “DDGStatsElonFeature,” seem to support this interpretation, but it may not be possible to confirm that with the available code. It’s interesting that Twitter is checking and collating these variables, however. During a Twitter Spaces audio session, a Twitter engineer noted that the Democrat and Republican labels were used for metrics. Musk, who claimed he was unaware of the labels before today, suggested they should not be there.
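For readers wondering what "purely for metrics collection" could look like in practice, here is a minimal sketch under that reading. It is not taken from the released code: the `AUTHOR_GROUPS` table, its ID values, and the `record_served` function are hypothetical and exist only to show counting that never feeds back into ranking.

```python
from collections import Counter

# Hypothetical author ID lists; the real lists are not reproduced here.
AUTHOR_GROUPS = {
    "elon": {1},
    "power_user": {2, 3},
    "democrat": {4},
    "republican": {5},
}


def record_served(author_id, counters):
    """Count a served tweet against every group its author belongs to.

    Returns nothing: the counters are for reporting, not for ranking.
    """
    for group, ids in AUTHOR_GROUPS.items():
        if author_id in ids:
            counters[group] += 1


counters = Counter()
for author_id in [1, 2, 4, 4, 9]:  # made-up stream of served tweets' authors
    record_served(author_id, counters)
print(dict(counters))
```

Comparing such counts between the two arms of an A/B test is one way a team could check whether a change serves one group's tweets more or less often, which is the use the code comment describes.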
