2 Background
2.1 The Emergence of the Fediverse
2.1.1 Interoperability
A system supports interoperability when information can be exchanged between its parts. Interoperability can have tremendous benefits because it lets the parts of a system work together even when individual components are developed or operated independently. For example, the Internet is a highly interoperable system because it allows computers and networks to communicate with each other using shared protocols. Similarly, email allows servers and accounts to pass messages to each other using shared protocols while running their own software of choice.[1]
What distinguishes federated systems from simple linking is how they handle data. Individual nodes in a federated system do not merely link to data from other nodes; they often store and replicate copies of that data. In practice, this makes such systems less vulnerable to censorship, but it also introduces new complications regarding privacy: deleting data from such a system is non-trivial.
2.1.2 Early Examples of Federated Social Websites
The Fediverse’s cultural roots are in the free software movement, which emphasizes free licensing and open source software. Early Fediverse projects largely attempted to create libre alternatives to corporate social media platforms (Mansoux and Abbing 2020, 125). For example, the GNU Social project (formerly known as StatusNet) was created as a free software approach to microblogging. Similarly, the Diaspora project launched in 2010, inspired by its founders’ shared concerns over the consolidation of information on the cloud (Nussbaum 2010). The network claimed over two hundred thousand users by November 2011 (Bielenberg et al. 2012) and was designed to be decentralized, with data stored and managed on independent servers known as “pods”.
More recently, the IndieWeb created a method for interlinking websites using shared standards such as Microformats2 to mark up semantic data (Jamieson, Yamashita, and McEwen 2022). Culturally, the IndieWeb often encouraged people to post on their own websites and then syndicate to other websites, a practice called POSSE (Publish on your Own Site, Syndicate Elsewhere). Protocols like Webmention allow IndieWeb sites to interact with each other.
2.1.3 ActivityPub
The ActivityPub (AP) protocol’s story starts from a place of fragmentation. While projects like GNU Social had small but dedicated followings, it was difficult to pass messages between servers running incompatible protocols like OStatus and Pump.io (Göndör and Küpper 2017). The ActivityPub project sought to bridge the collective of early Fediverse projects under a unifying standard and was recommended by the World Wide Web Consortium in 2018.
Although the protocols underlying the internet remain invisible to most of its users, a tremendous amount of time and effort goes into their development. Developing a protocol is challenging because it is impossible to anticipate all possible use-cases and most design decisions involve trade-offs. ActivityPub, for instance, has been criticized for its lack of optimization and its vulnerability to accidental distributed denial-of-service attacks (Das 2024). It is important to remember that the ActivityPub standard represents the specific needs of its stakeholders at the time it was designed and adopted: namely, bridging the prior Fediverse protocols under a single, unified standard.
Fragmentation remains a challenge for decentralized online social networks (DOSNs). It is hard to balance wide compatibility against the creation of new and innovative features. While ActivityPub has been largely successful at bridging the fledgling communities it was designed for, further development of DOSNs may well leave AP behind. For instance, Bluesky opted to produce its own protocol instead of adopting AP, citing issues with data portability and scalability (“AT Protocol FAQ” n.d.).
2.1.4 Mastodon
Mastodon represents the most important Fediverse project to date. But despite its millions of registered users and coverage in major media publications, the project had humble beginnings. Eugen Rochko released the first public edition of Mastodon in October 2016. In a comment on the Hacker News thread on launch, Rochko wrote about the project’s ethos: “This isn’t a startup, it’s an open-source project. Most likely the Twitters and Facebooks will win, but people should have a viable choice… Plus this is an incredibly fun project to be working on, to be quite honest” (Rochko 2016). The software soon found an audience and eclipsed the user base of other Fediverse software.
Early reporting on Mastodon often described it as an alternative to other platforms like Twitter, a framing which Zulli, Liu, and Gehl (2020) criticized.
Most Mastodon servers are small: the vast majority have fewer than 10 accounts, and many have only a single account. The distribution of accounts across servers, however, is highly skewed: the median server has 3 accounts, while the mean server has 595.
2.2 Challenges for Online Communities
2.3 Collective Action Problems
Kollock (1998, 183) defines social dilemmas as situations where individually rational behavior leaves the collective worse off. These social dilemmas have a deficient equilibrium where there is an outcome that leaves everyone better off, but no individual incentive to move toward that outcome (Kollock 1998, 185).
Collective action problems are social dilemmas that occur when self-interested individuals have no incentive to work toward a public good (Olson 1965, 2). Hardin (1968) described one such collective action problem, where people individually overexploit shared resources, as the tragedy of the commons in a Malthusian argument against population growth. More recent scholarship, however, has shown that the tragedy of the commons is not an inevitable outcome from shared resources (Ostrom 1990). Instead, people can and do work together to manage shared resources.
Kollock (1999) argues that online communities produce public goods, often in the form of knowledge with a nearly limitless potential audience. Access to free and accurate information leaves everyone better off. This idea has been the foundation of knowledge production projects like Wikipedia and open source software projects like the Linux kernel, both of which function as public goods (non-excludable and non-rivalrous).
Due to the differences in the means of production within the Fediverse, many of the challenges faced by commercial websites are transformed into collective action problems. These can create challenges for the system as a whole. For instance, several major Fediverse servers have shut down over the years due to hosting costs. The vast majority of people with accounts on the Fediverse do not directly contribute financially to their servers, though many do.
2.4 Content Moderation
All websites which rely on third-party, user-generated content must perform some form of content moderation to remain viable (Gillespie 2018). Without it, online spaces would become dominated by spam, pornography, or other unwanted content (Gillespie 2020, 330–31). Much of this work remains largely invisible by design (Roberts 2019, 14).
While all social websites must perform content moderation, approaches vary. Large, well-resourced websites like Facebook hire teams of contractors who do the bulk of the work of cleaning up the platform. Other websites built around named subcommunities, like Reddit, hand off moderation duties to unpaid volunteer community members.
The small size of the average Mastodon server affects content moderation. Despite the small size of most Mastodon servers, the average Mastodon account is on a large server: Raman et al. (2019) found the top 5% of Mastodon servers host 90.6% of Mastodon accounts and send 94.8% of the posts. This means that while the bulk of the moderation work concentrates on a few large servers, vulnerabilities and problems can come from a large set of smaller, less-resourced servers.
Nicholson, Keegan, and Fiesler (2023) characterized the written rules on a number of Mastodon servers.
2.5 Discovery
Recommender systems help people filter information to find resources relevant to some need (Ricci, Roḳaḥ, and Shapira 2022). The development of these systems as an area of formal study harkens back to information retrieval (e.g. Salton and McGill (1987)) and foundational works imagining the role of computing in human decision-making (e.g. Bush (1945)). Early work on these systems produced more effective ways of filtering and sorting documents in searches, such as the probabilistic models that motivated the creation of the Okapi BM25 relevance function (Robertson and Zaragoza 2009). Many contemporary recommendation systems use collaborative filtering, a technique which produces new recommendations for items based on the preferences of a collection of similar users (Koren, Rendle, and Bell 2022).
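The Okapi BM25 function mentioned above can be sketched in a few lines. The toy corpus, the query, and the parameter values \(k_1\) and \(b\) below are hypothetical illustrations, not drawn from any cited system.

```python
# A minimal sketch of the Okapi BM25 relevance function.
# The corpus and the parameters k1 and b are illustrative assumptions.
import math
from collections import Counter

docs = [
    "decentralized social networks federate servers".split(),
    "recommender systems filter information for users".split(),
    "federated servers exchange posts between networks".split(),
]

N = len(docs)                                  # corpus size
avgdl = sum(len(d) for d in docs) / N          # average document length
df = Counter(t for d in docs for t in set(d))  # document frequency per term

def bm25(query, doc, k1=1.5, b=0.75):
    """Score one tokenized document against a tokenized query."""
    tf = Counter(doc)  # term frequencies within the document
    score = 0.0
    for t in query:
        if t not in tf:
            continue
        # Inverse document frequency: rarer terms weigh more.
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
        # Term frequency saturates with k1; b normalizes by length.
        score += idf * tf[t] * (k1 + 1) / (
            tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
    return score

# Rank documents for a query; the best match contains both terms.
query = ["federated", "servers"]
ranked = sorted(range(N), key=lambda i: bm25(query, docs[i]), reverse=True)
```

The saturation parameter \(k_1\) caps how much repeated occurrences of a term can contribute, while \(b\) controls how strongly long documents are penalized.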
Collaborative filtering systems build on top of a user-item-rating (\(U-I-r\)) model where there is a set of users who each provide ratings for a set of items. The system then uses the ratings from other users to predict the ratings of a user for an item they have not yet rated and uses these predictions to create an ordered list of the best recommendations for the user’s needs (Ekstrand, Riedl, and Konstan 2011, 86–87). Collaborative filtering recommender systems typically produce better results as the number of users and items in the system increases; however, they must also deal with the “cold start” problem, where limited data makes recommendations unviable (Lam et al. 2008). The cold start problem has three possible facets: bootstrapping new communities, dealing with new items, and handling new users (Schafer et al. 2007, 311–12). In each case, limited data on the entity makes it impossible to find similar entities without some way of building a profile. Further, uncorrected collaborative filtering techniques often also produce a bias where more broadly popular items receive more recommendations than more obscure but possibly more relevant items (Zhu et al. 2021). Research on collaborative filtering has also shown that the quality of recommendations can be improved by using a combination of user-based and item-based collaborative filtering (Sarwar et al. 2001).
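The \(U-I-r\) prediction step can be illustrated with a minimal user-based collaborative filtering sketch. The ratings matrix, the choice of cosine similarity, and the neighborhood size \(k = 2\) are illustrative assumptions rather than the method of any cited work.

```python
# A minimal sketch of memory-based, user-based collaborative
# filtering under the user-item-rating model. The ratings matrix,
# similarity measure, and neighborhood size are hypothetical.
import numpy as np

# Rows are users, columns are items; 0 marks an unrated item.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 1],
    [1, 1, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity over the items both users have rated."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(ratings, user, item, k=2):
    """Predict a missing rating as the similarity-weighted average
    of the k most similar users who rated the item."""
    sims = [(cosine_sim(ratings[user], ratings[v]), v)
            for v in range(len(ratings))
            if v != user and ratings[v, item] > 0]
    sims.sort(reverse=True)          # most similar neighbors first
    top = sims[:k]
    num = sum(s * ratings[v, item] for s, v in top)
    den = sum(s for s, _ in top)
    return num / den if den else 0.0

# Predict user 0's rating for item 2, which they have not yet rated.
prediction = predict(R, user=0, item=2)
```

In a real deployment the matrix would be far larger and sparser, which motivates the model-based approaches discussed next.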
Although all forms of collaborative filtering use some combination of users and items, there are two main approaches: memory-based and model-based. Memory-based approaches use the entire user-item matrix to make recommendations, while model-based approaches use a reduced form of the matrix. Reduced forms are particularly useful because the user-item matrix tends to be extremely sparse; in a movie recommender system, for example, most people have not seen most of the movies in the database. Singular value decomposition (SVD) is one such dimension reduction technique, which transforms an \(m \times n\) matrix \(M\) into the form \(M = U \Sigma V^{T}\) (Paterek 2007). SVD is particularly useful for recommendation systems because it can find the latent factors which underlie the user-item matrix and use these factors to make recommendations.
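The decomposition \(M = U \Sigma V^{T}\) can be sketched with a truncated SVD over a small, hypothetical rating matrix; the matrix values and the choice of two latent factors are illustrative assumptions.

```python
# A minimal sketch of truncated SVD as a model-based dimension
# reduction over a hypothetical user-item rating matrix.
import numpy as np

M = np.array([
    [5, 4, 1, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
], dtype=float)

# Full decomposition: M = U @ diag(s) @ Vt, with the singular
# values in s sorted from largest to smallest.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Keep only the k largest singular values; the retained columns of
# U and rows of Vt act as latent user and item factors.
k = 2
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# M_k is a dense rank-2 approximation of M whose entries can be
# used to score user-item pairs that were never rated.
```

Because the two user groups in this toy matrix have clearly separated tastes, two latent factors capture most of the structure, which is why the rank-2 approximation stays close to the original matrix.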
While researchers in the recommendation system space often focus on ways to design the system to produce mathematically good results, human-computer interaction researchers also consider various human factors which contribute to the overall system. Crucially, McNee et al. argued “being accurate is not enough”: user-centric evaluations, which consider multiple aspects of the user experience, are necessary to evaluate the full system. HCI researchers have also contributed pioneering recommender systems in practice. For example, GroupLens researchers Resnick et al. (1994) created a collaborative filtering system for Usenet and later produced advances in system evaluation and the explanation of movie recommendations (Herlocker et al. 2004; Herlocker, Konstan, and Riedl 2000). Cosley et al. (2007) created a system to match people with tasks on Wikipedia to encourage more editing. This prior work shows that recommender systems can be used to help users find relevant information in a variety of contexts.
Mastodon and other decentralized online social networks are particularly vulnerable to discovery problems. As information and accounts are spread out across many different servers, location matters in a way that is not relevant on centralized social networks. At the same time, any recommendation system run on a particular server is limited to the information on that server unless some system is in place to spread recommendations across servers, e.g. using federated machine learning.
1. While email is a decentralized system by design, in practice it has become more centralized over time. For instance, Google’s Gmail service is a dominant provider of email services. Smaller email providers may struggle to get through Gmail’s spam filters, increasing the incentive to simply use Gmail. This means that in practice, Google still has a great amount of data even on people who opt not to use Google’s services (Hill 2014).