Blue Sky: Can Twitter be owned by its users?

Twitter competitor Blue Sky is growing fast and it aims to "decentralize Twitter". We explore what that means and how it works.

Apr 12, 2023

First off: I'm a distributed systems engineer and have designed and implemented Dropbox's Peer-to-Peer protocol. I've also spent several years simulating the growth of social networks at a startup I previously founded. My own company Teleport.XYZ is working on decentralizing ridesharing, so I spend all day thinking about how to decentralize previously centralized services.

Few things are more powerful in the world than the ability to shape the narrative, censor others, and push your own messaging. Time and again, we've seen that no matter who controls speech, power corrupts.

Twitter has been at the center of the intelligentsia's struggle over the narrative. No matter who is in charge, we know that the power will be abused: Twitter's previous leadership censored political opponents, Elon censored competitors, and no doubt anyone else who gained this power would eventually succumb to wielding it too. It's as if the power to control speech were The One Ring, and no one is strong enough to resist it forever. So some of us have concluded that the only reasonable option is to throw The One Ring (Twitter) back into the volcano in which it was forged (The Internet).

So how would you go about doing this? Twitter's former CEO Jack Dorsey announced the Blue Sky project on December 11, 2019. Its aim was simple: to develop an open protocol that addresses the challenges faced by centralized platforms.

Let's dig into what moving from a centralized service to a decentralized protocol means:

A centralized service runs on computers wholly under the control of a single entity. That entity can change, monitor, amplify and bury all information. It can read your private messages. It can delete content, promote other content, and even impersonate you. It can tell you that it is doing this in your interest, and it can, as Twitter has recently done, make assurances that it is publishing its process. But in the end you have to trust that those in power won't read your private messages and that they won't censor and de-platform you because they disagree with you. If the press is indeed the fourth estate, then it seems clear that democracy can't function if we hand the power to control speech to centralized entities, be they foreign or domestic, well-intended or malicious.

A protocol, on the other hand, is an agreement for how computers talk to each other. The word has its origins in the set of forms and etiquette observed when meeting foreign and local dignitaries. Think of it as something along the lines of "First gifts are exchanged, and then you hand over your business card with both hands while bowing and looking down deferentially". Protocols are what power the internet. When your email program sends a message to an email server, it follows one: first it says hi, then it demonstrates that it's allowed to send email on behalf of the sender, next it submits the subject line and the body of the message, and finally, after receiving acknowledgement that the message was accepted, it says goodbye if it is polite.
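To make the "agreed order of steps" idea concrete, here's a minimal sketch of a toy mail server that only accepts commands in the order described above. It is loosely modeled on the shape of an email session, not on real SMTP: the class name, the command names (HELLO, AUTH, MESSAGE, GOODBYE) and the reply strings are all made up for illustration.

```python
class ToyMailServer:
    """Accepts commands only in the agreed order (hypothetical, not real SMTP)."""

    ORDER = ["HELLO", "AUTH", "MESSAGE", "GOODBYE"]

    def __init__(self):
        self.step = 0      # index of the next command we expect
        self.inbox = []    # messages accepted so far

    def handle(self, command, payload=""):
        expected = self.ORDER[self.step] if self.step < len(self.ORDER) else None
        if command != expected:
            # Breaking the etiquette gets you an error, not a delivery.
            return "ERROR: expected " + str(expected)
        if command == "MESSAGE":
            self.inbox.append(payload)
        self.step += 1
        return "OK"


server = ToyMailServer()
replies = [
    server.handle("HELLO"),
    server.handle("AUTH", "paul:secret"),
    server.handle("MESSAGE", "Subject: hi\n\nHello from the client!"),
    server.handle("GOODBYE"),
]
```

A client that skips ahead, say by sending MESSAGE before HELLO, gets an error back. That's all a protocol really is: both sides agreeing on what may be said, and when.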

To "decentralize Twitter" essentially means to transform the platform from a centralized service, controlled by a single entity, into a distributed network where control and decision-making are spread across multiple independent nodes. This is a big effort: it involves redesigning the underlying architecture and protocols to enable peer-to-peer communication, data storage, and content sharing. The challenges are substantial, but the team at Blue Sky told me they have solved them. So let's go deeper.

Blue Sky is built on something they call the AT Protocol (or @ Protocol?), which stands for Authenticated Transfer Protocol, aka ATP. What this protocol needs to define is how users create and update records of their activities (posts, comments, likes, follows). Blue Sky/ATP calls the place where this data is stored "Signed Data Repositories": signed, because each user has a cryptographic key with which to sign updates to their data repository. Each data repository is the equivalent of an account on Twitter (e.g. @justinbieber or @rihanna).

But how do users "own" these data repositories? In ATP (remember, ATP is the protocol that Blue Sky, the decentralized Twitter replacement, is built on) accounts are primarily identified by a cryptographic key. This key has a public component and a private component. This is very similar to how Bitcoin wallets have a public name/address, and a private key that's required to access the funds contained within that address. The true name of the ATP account is what's called a DID, a Decentralized Identifier, which is just a representation of the public key (there's a bit more to it, but it's best to just think of it as a public key for now).
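If public/private keys are new to you, here's a minimal sketch of what "signing a record" means. To be clear about the assumptions: this is not the signature scheme ATP uses. I've swapped in Lamport one-time signatures purely because they can be written with nothing but Python's standard library, and the function names and the sample record are made up. A Lamport key pair should only ever sign a single message, so treat this as an illustration of the idea, not as something to deploy.

```python
import hashlib
import secrets


def keygen():
    """Create a (private, public) Lamport key pair: 256 pairs of secrets and their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in private]
    return private, public


def _bits(message):
    """Return the 256 bits of SHA-256(message), most significant bit first."""
    digest = hashlib.sha256(message).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private, message):
    """Reveal one secret per message bit; only the private-key holder can do this."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message, signature):
    """Anyone holding the public key can check the revealed secrets against it."""
    if len(signature) != 256:
        return False
    return all(
        hashlib.sha256(revealed).digest() == public[i][bit]
        for i, (bit, revealed) in enumerate(zip(_bits(message), signature))
    )


private_key, public_key = keygen()
record = b'{"type": "post", "text": "Hello from my own repository"}'
signature = sign(private_key, record)
```

Here `verify(public_key, record, signature)` succeeds, while verifying a tampered record against the same signature fails. That asymmetry is the whole point: a server can store and forward your signed records and anyone can check them, but nobody without the private key can produce new ones in your name.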

For example, my DID looks like this:

did:plc:vzemfzughnwhrtvr63ur44r6

Now a DID like that wouldn't make a great username, so how do we get to user-readable usernames from here? ATP's solution is simple: Just reuse DNS (the Domain Name System), one of the core internet protocols that's already in use everywhere. What DNS already does for millions of queries per second is resolve names like "amazon.com" to IP addresses like 192.0.2.1. So why not use DNS to resolve paulbohm.com to did:plc:vzemfzughnwhrtvr63ur44r6? That's exactly what ATP does.

So where this puts us is that Blue Sky uses domain names (example.com), not emails (user@example.com), as the identifiers most users will see. It also means that you can change your username (domain name) while retaining your identity (the DID, the equivalent of an IP address) and all your followers and other information stored in Data Repositories. You can still subdivide each domain using subdomains, so anyone working for example.com could get paul.example.com and maria.example.com entries to show their affiliation.
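Here's a minimal sketch of that lookup, from domain name to DID. The assumptions: a plain dictionary stands in for real DNS, and the record layout (a TXT record named `_atproto.<handle>` whose value is `did=<the DID>`) reflects my understanding of how ATP publishes handles, so check the current spec before relying on it. The function name `resolve_handle` is hypothetical.

```python
# Stand-in for DNS TXT records. A real client would query a DNS resolver instead.
FAKE_DNS_TXT = {
    "_atproto.paulbohm.com": "did=did:plc:vzemfzughnwhrtvr63ur44r6",
}


def resolve_handle(handle, dns_txt):
    """Return the DID a handle points to, or None if no usable record exists."""
    value = dns_txt.get("_atproto." + handle.lower())
    if value is None or not value.startswith("did="):
        return None
    return value[len("did="):]


did_before = resolve_handle("paulbohm.com", FAKE_DNS_TXT)

# Changing your username just means publishing the same DID under a new name.
FAKE_DNS_TXT["_atproto.paul.example.com"] = FAKE_DNS_TXT.pop("_atproto.paulbohm.com")
did_after = resolve_handle("paul.example.com", FAKE_DNS_TXT)
```

The handle changed, the DID didn't, and followers and data hang off the DID. The old name simply stops resolving.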

I think this innovation is quite a big deal: It means that while there is still a bit of a land rush for people to get accounts on the original .bsky.social domain for OG bragging rights, the obvious solution for those who really care about their brand is to get a cool domain name. There's another reason you really want your own domain as username: It makes it easy for people to know it's you. If someone registers celebrity.bsky.social on the network, you can't be sure it's them. But if they show up with their domain name, it's a lot easier to establish if they are who they say they are.

Ok this gets us to about half-time: We've described how Blue Sky/ATP defines identities, and how users own the keys to those identities, but what about servers? Isn't Twitter run in these enormously big data centers with millions of computers crunching all the Tweets? How do you decentralize that? Glad you asked!

Blue Sky's approach to decentralization is using a paradigm called Federation. What this means is that all the data from the Data Repositories needs to be stored somewhere. And ideally not just in one place controlled by one company, but in many places controlled by many people.

So let's recap: Users own accounts which have public and private keys. The public key is public and everyone knows about it, but the private key is private: only the user has it. This also means that even the servers can't impersonate you, since they don't have your private key. The DID (Decentralized Identifier), which stands in for the public key, is the true name of the account, but the account can also have a domain name, just like websites on the internet have both domain names and IP addresses. Most of the time you never interact with the DID or public key, just with the domain names of other people. The software takes care of it for you.

The data that users sign using their private key gets stored in Data Repositories, and those Data Repositories in turn get stored on Federation Servers. These Federation Servers need to synchronize data between them so that multiple servers in the world share the same world-view. This will be accomplished through peering agreements, which means you need to talk to someone who is already part of the network to give you access to the "firehose" of everything that happens on the network. Once enough servers have a global view of all the posts, that should become easier: You won't be dependent on any single entity gatekeeping you away from the global feed of information, but you might have to pay someone for sending you all that data. It's open because you're probably going to get it, but it's probably not free forever.

There are two types of networking the Blue Sky / ATP Federation supports:

Big-World Networking: That's the firehose. Having all the data of the entire network on all the Big-World Servers. That's what you need to do if you want to crunch the data, make it searchable, and provide algorithms that help you discover new content and new people to follow. Not everyone needs to run a Big-World server, and running a Big-World server can be quite costly.

Small-World Networking: This is much more directed and downstream from Big-World: A Small-World Server only gets the information it needs to serve its clients: Posts, DMs, likes, and so on for just the people it serves, and those people its users follow.
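As a rough sketch of the difference: a Big-World server keeps everything coming off the firehose, while a Small-World server keeps only what its own users need. The `Post` type, the function name, and the sample accounts below are all hypothetical, and a real server would deal with signed repository updates rather than bare posts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str
    text: str


def small_world_filter(firehose, local_users, follows):
    """Keep only posts by local users and by the accounts those users follow."""
    wanted = set(local_users)
    for user in local_users:
        wanted |= follows.get(user, set())
    return [post for post in firehose if post.author in wanted]


firehose = [
    Post("paul.example.com", "Decentralize all the things"),
    Post("maria.example.com", "Shipping a new feature today"),
    Post("stranger.example.org", "Nobody on this server follows me"),
]
local_users = {"paul.example.com"}
follows = {"paul.example.com": {"maria.example.com"}}

kept = small_world_filter(firehose, local_users, follows)
```

Here the Small-World server keeps Paul's and Maria's posts and drops the stranger's, which is why it needs far less bandwidth and storage than a server that carries the whole firehose.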

It's a lot cheaper to operate a Small-World server than a Big-World server. And while it's not explicitly discussed, I'd argue there's also a need for Huge-World Servers, which not only see all the data but store it forever. Not every Big-World server will want to store decades' worth of data, since that can be very costly.

Now what's cool about this architecture is that you're not dependent on any single server. You can take your account (DID/account key) and move to any server you want. If it's a Big-World server, it already has a copy of your data repository. If it's a Small-World server it can ask its upstream Big-World Server for a copy and there you go.

It also means that if you don't agree with the algorithmic filtering on the server that you are on, it's incredibly easy and fast to just pack up and move. Just connect to a new server and there you go.

Now let's talk a bit about the economics of moving a beast with multiple data centers to such a federated/decentralized architecture. How much is it going to cost, optimistically, to run a Big-World server? How much is it going to cost to send all that data between multiple Big-World servers?

For that we'll need to make some assumptions, and I'd love to hear from you if you can make a better guess. This is my napkin math:

Assuming 6,000 tweets per second, an average tweet size of 3 KB, $0.05/GB bandwidth cost, $0.02/GB storage cost, no data redundancy, and decimal units throughout (1 GB = 1,000 MB = 1,000,000 KB), the daily data storage needs of Twitter might be estimated as follows:

Write throughput: 6,000 tweets/second * 3 KB/tweet = 18,000 KB/second = 18 MB/second
Storage capacity (daily): 18 MB/second * 86,400 seconds/day = 1,555,200 MB/day = 1,555.2 GB/day ≈ 1.56 TB/day
Storage capacity (monthly, with 30-day data retention): 1,555.2 GB/day * 30 days = 46,656 GB/month ≈ 46.66 TB/month
Daily cost for storage: 1,555.2 GB/day * $0.02/GB ≈ $31.10/day
Daily cost for bandwidth: 1,555.2 GB/day * $0.05/GB = $77.76/day
Total daily cost: $31.10/day (storage) + $77.76/day (bandwidth) = $108.86/day
Monthly cost for storage: $31.104/day * 30 days = $933.12/month
Monthly cost for bandwidth: $77.76/day * 30 days = $2,332.80/month
Total monthly cost: $933.12/month (storage) + $2,332.80/month (bandwidth) = $3,265.92/month
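If you'd like to plug in your own guesses, here's the same napkin math as a small Python function. It uses strictly decimal units (1 GB = 1,000,000 KB), so its output can differ by a few percent from hand-rounded figures. The function and parameter names are my own, and the defaults are just the assumptions stated above.

```python
def napkin_costs(
    tweets_per_second=6_000,
    kb_per_tweet=3,
    storage_usd_per_gb=0.02,
    bandwidth_usd_per_gb=0.05,
    retention_days=30,
):
    """Estimate monthly storage and bandwidth costs with no redundancy."""
    seconds_per_day = 86_400
    kb_per_gb = 1_000_000  # decimal units: 1 GB = 1,000,000 KB

    gb_per_day = tweets_per_second * kb_per_tweet * seconds_per_day / kb_per_gb
    stored_gb = gb_per_day * retention_days             # data kept at steady state
    storage_usd = stored_gb * storage_usd_per_gb        # per month
    bandwidth_usd = gb_per_day * retention_days * bandwidth_usd_per_gb
    return {
        "gb_per_day": gb_per_day,
        "stored_gb": stored_gb,
        "storage_usd_per_month": storage_usd,
        "bandwidth_usd_per_month": bandwidth_usd,
        "total_usd_per_month": storage_usd + bandwidth_usd,
    }


for name, value in napkin_costs().items():
    print(f"{name}: {value:,.2f}")
```

Doubling `kb_per_tweet` or `tweets_per_second` doubles every figure, so even if my guesses are off by a factor of a few, the estimate stays within the same order of magnitude.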

The hardware to handle the write speeds and give us 2x redundancy might cost around $30k. Add roughly $500 a month for electricity (~144 kWh/day), and round to a rough order of magnitude: This puts us somewhere in the $4,000-10k/month range for storing 30 days' worth of data with 2x redundancy.

That seems viable, at least to an order of magnitude, especially since other Big-World servers and downstream Small-World servers will likely pay for both data and indexing/algorithmic filtering services.

So in conclusion, as crazy as the idea of moving the Twitter behemoth to a decentralized/federated network sounds: It might actually work, and it could allow a thriving ecosystem of service providers to build on top of the protocol. Thumbs up from me!

Now, if you're wondering if something similar could be done in other industries, subscribe to my newsletter and follow along as I post more about our project to decentralize ridesharing at Teleport.XYZ.
