Federating the Social Web

Prospectus Presentation

Carl Colglazier

Community Data Science Collective

Northwestern University

Background

The social web’s “wicked problems” involve many stakeholders and cannot be solved without tradeoffs:

  • Content moderation1

  • Discovery2

  • Norms (Establishing and Maintaining)

Kraut, Resnick, and Kiesler3

Gillespie4

Narayanan5

Challenges in the Current Social Web Ecosystem

  • Many social web services have produced controversy because they (and their communities) have gatekeeping powers.1

  • Their operation and development follow the needs of their stakeholders.2

  • While some of these websites use volunteers, there is still a top-down structure to their operation (e.g. Reddit mods vs. admins).3

  • Alternatives exist, but network effects help entrench incumbents.4

 

Moody v. NetChoice

European Union Digital Services Act

Protocols vs. Platforms

Rather than building new protocols, the internet has grown up around privately owned, controlled platforms. These can function in ways that appear similar to the earlier protocols, but they are controlled by a single entity.1

Protocols
  • Decentralized: Power distributed across many independent implementations
  • Open: Interoperable standards allowing anyone to build compatible services
  • User-Controlled: Users choose preferred clients and implementations
  • Example: Email (SMTP, IMAP, POP3)

Platforms
  • Centralized: Power concentrated within single corporate entities
  • Closed: Walled gardens with controlled access and developer limitations
  • Company-Controlled: Company dictates features, rules, and content policies
  • Example: Facebook

The Early Social Web Pre-dated Platforms

Early social online communities such as bulletin board systems (BBS) and USENET were independently operated.1

Diagram of a centralized and decentralized network.

The Fediverse

  • A collection of social servers

  • Interoperability via (mainly) ActivityPub protocol

  • Software like Mastodon and Pleroma

Screenshot of Mastodon front-end.

Social Interoperability

While ActivityPub lets Fediverse servers interconnect technically, much of the social infrastructure of this sociotechnical system is still developing.

  • Trust & Safety: Organizations like Independent Federated Trust & Safety (IFTAS) and the Social Web Foundation help admins coordinate and grow the ecosystem.

  • Reliability: Agreements like the Mastodon Server Covenant ensure servers will give people notice before shutting down and have at least one person available in case of emergencies.1

  • Best Practices: Emerging research identifies challenges facing admins and how best to address them.2

Why Study The Fediverse?

The Fediverse faces problems similar to those of other organizations on the social web, but with different motivations and stakeholders.

Studies

  1. The effects of de-federation (inter-server sanctions)1

  2. Server recommendation system (newcomers and server choice)2

  3. Moderating the Fediverse (rules and enforcement)

Remaining Work

Figure 1: Gantt chart for the timeline and major milestones remaining.

The Effects of Group Sanctions on Activity and Toxicity

Colglazier, Carl, Nathan TeBlunthuis, and Aaron Shaw. “The Effects of Group Sanctions on Participation and Toxicity: Quasi-experimental Evidence from the Fediverse.” In Proceedings of the International AAAI Conference on Web and Social Media, vol. 18, pp. 315-328. 2024.

Fediverse servers are decentralized and autonomous.

This means while they have almost complete control over their own servers, they cannot control what happens on other servers.

Servers may decide another server is more trouble than it is worth and block it, an action called de-federation.

For the accounts that lose connections, what are the effects on activity and toxicity?

Data

Figure 2: The y-axis shows the cumulative number of blocked and blocking accounts included in our analysis over our study period.

We gathered de-federation events in which accounts that had previously interacted could no longer do so because of a new block. We then matched these accounts with synthetic controls.
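As a rough illustration of the matching step, one common way to construct a synthetic control is to find a nonnegative weighted combination of untreated donor accounts that tracks a treated account’s pre-treatment activity. The sketch below uses toy numbers and is not the study’s exact procedure.

```python
# Simplified sketch of building one synthetic control: weight donor (untreated)
# accounts so their combined pre-treatment activity approximates the treated
# account's. Toy values only; the paper's matching procedure may differ.
import numpy as np
from scipy.optimize import nnls

treated_pre = np.array([5., 8., 6., 7.])   # treated account's weekly posts pre-block
donors_pre = np.array([                     # candidate control accounts (rows)
    [4., 9., 5., 6.],
    [1., 1., 2., 1.],
    [6., 7., 7., 8.],
])

# Nonnegative least squares for donor weights, then normalize to sum to one.
weights, _ = nnls(donors_pre.T, treated_pre)
weights /= weights.sum()

synthetic_pre = donors_pre.T @ weights      # should track treated_pre closely
print(weights.round(3), synthetic_pre.round(2))
```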

De-federation reduced activity on blocked servers, but not on blocking servers

Figure 3: Visualization of activity among blocked and blocking user accounts shows an asymmetric change in activity following defederation
Group median W p
\(U_0\) -135.5 41197.5 0.000
\(C_0\) -18.0 35762.0 0.143
\(U_1\) -54.5 12413.0 0.122
\(C_1\) -53.5 12520.0 0.091
\(\Delta_0\) -39.0 39927.0 0.000
\(\Delta_1\) 3.0 10645.5 0.421
Table 1: Non-parametric tests for differences in activity before and after defederation events (summed across all weeks) find a measurable decrease in posting activity for the accounts on blocked servers compared to matched controls but no such change for accounts on blocking servers.

We found no change in toxicity

Figure 4: Median toxicity among accounts which posted each week for blocked and blocking user accounts. The median toxicity remained flat for all groups.
Group median W p
\(U_0\) -0.006 17746 0.538
\(C_0\) 0.004 14000 0.950
\(U_1\) -0.008 6514 0.619
\(C_1\) 0.001 5546 0.873
\(\Delta_0\) -0.005 17161 0.072
\(\Delta_1\) 0.000 6414 0.305
Table 2: Non-parametric difference-in-differences tests for median post toxicity before and after de-federation events. The \(W\) test statistic is the sum of the ranks of the positive differences between paired observations; the p-value is computed against the null hypothesis that the changes are zero.
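The paired tests above can be reproduced in spirit with a Wilcoxon signed-rank test on per-account differences, as in the toy sketch below. Note that SciPy’s two-sided statistic is the smaller of the two signed-rank sums, which may differ from the \(W\) convention reported in the tables.

```python
# Toy sketch of a paired non-parametric test on per-account activity,
# summed over the pre- and post-defederation weeks (values are illustrative).
import numpy as np
from scipy.stats import wilcoxon

pre = np.array([40, 55, 12, 80, 33, 25])   # posts per account before the block
post = np.array([22, 50, 10, 41, 30, 26])  # posts per account after the block

# Wilcoxon signed-rank test on the paired differences (post - pre).
stat, p = wilcoxon(post - pre)
print(stat, p)
```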

Takeaways

Previous research into group sanctions shows they can be effective in altering anti-social behavior,1 but communities do not exist in isolation and users can continue their activity off-platform.2

 

We find:

  • De-federation can be effective at reducing activity for users on blocked servers

  • No evidence of blowback on toxicity for any affected users

Any decentralized online social network will have de-federation or a similar mechanism.

Server Recommendations

Colglazier, Carl. “Do Servers Matter on Mastodon? Data-driven Design for Decentralized Social Media.” In 1st International Workshop on Decentralizing the Web (2024).

Millions of newcomers joined Mastodon after Elon Musk bought Twitter

Figure 5: Accounts in the dataset created between January 2022 and March 2023.

Many people find the Mastodon onboarding process confusing

Onboarding newcomers is essential for online communities.1

Compared to commercial social media, Mastodon onboarding is harder because newcomers need to pick a server.

Screenshot of the “Join Mastodon” server selection webpage.

Do some Mastodon servers retain newcomers better than others?

Smaller, less general servers are more likely to retain new accounts

Figure 6: Survival probabilities for accounts created during May 2023.
Term Estimate Low High p-value
Join Mastodon 0.115 0.972 1.296 0.117
General Servers 0.385 1.071 2.015 0.017
Small Server -0.245 0.664 0.922 0.003
Table 3: Coefficients for the Cox Proportional Hazard Model with Mixed Effects. The model includes a random effect for the server.
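For reference, a stripped-down version of this survival model could be fit with lifelines, as sketched below. The column names and data are illustrative, and the random effect for server is omitted because lifelines’ CoxPHFitter does not fit mixed-effects terms.

```python
# Illustrative sketch of a Cox proportional hazards model for newcomer retention.
# Toy data and column names; the study's model also includes a per-server random effect.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "weeks_active":   [4, 12, 26, 2, 26, 9, 26, 26, 7, 20],  # time to dropout or censoring
    "dropped_out":    [1, 1, 0, 1, 0, 1, 0, 0, 1, 1],        # 1 = stopped posting, 0 = censored
    "join_mastodon":  [1, 0, 1, 1, 0, 0, 1, 0, 1, 1],        # server listed on joinmastodon.org
    "general_server": [1, 1, 0, 0, 1, 1, 1, 0, 1, 0],
    "small_server":   [0, 0, 1, 0, 1, 0, 0, 1, 0, 1],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="weeks_active", event_col="dropped_out")
cph.print_summary()  # coefficients analogous to the estimates in Table 3
```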

Accounts that move between servers are more likely to move to smaller servers

Term Model A Coef. (SE) Model B Coef. (SE)
(Sum) -9.529*** (0.188) -10.268*** (0.718)
nonzero -3.577*** (0.083) -2.861*** (0.254)
Smaller server 0.709*** (0.032) 0.629*** (0.082)
Server size (outgoing) 0.686*** (0.013) 0.655*** (0.042)
Open registrations (incoming) 0.168*** (0.046) -0.250 (0.186)
Languages match 0.044 (0.065) 0.589 (0.392)
Table 4: Coefficients and standard errors (in parentheses) for Models A and B.

Our analysis suggests…

  • Accounts on large, general servers fare worse
  • Moved accounts go to smaller servers

Can we build a system that helps people find servers?

Constraints

  • Consent: servers should be able to choose whether to participate

  • Privacy: do not reveal information about individual accounts

  • Decentralization: do not concentrate data in one place

  • Openness: use shared standards and protocols

Concept

A decentralized, tag-based collaborative filtering system

  • Each server reports their top tags from the last three months

  • Learn, by comparing these reports across servers, which tags are most important for each server

  • Recommend servers based on selected tags of interest

Implementation

  • Report top hashtags used by the most accounts on each server

  • For robustness, drop hashtags used by too few accounts or servers

  • Build an \(m \times n\) server-tag matrix \(M\)

  • Normalize with Okapi BM25 TF-IDF and L2 normalization1 (see the sketch below)

  • Apply singular value decomposition (SVD) on \(M\) to create a new matrix \(M'\)

  • Match servers to selected tags using cosine similarity

Singular value decomposition visualisation
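A minimal sketch of the pipeline above, with toy counts and hypothetical parameter choices (the actual implementation may weight, filter, and truncate differently):

```python
# Tag-based server recommendation sketch: BM25-style TF-IDF weighting,
# L2 normalization, truncated SVD, and cosine-similarity matching.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import normalize
from sklearn.metrics.pairwise import cosine_similarity

# M: m x n matrix; M[i, j] = number of accounts on server i using tag j (toy values).
M = np.array([[12, 0, 3], [0, 8, 5], [7, 2, 0]], dtype=float)

# Okapi BM25-style weighting (k1 and b are the usual BM25 parameters).
k1, b = 1.2, 0.75
doc_len = M.sum(axis=1, keepdims=True)
avg_len = doc_len.mean()
df = (M > 0).sum(axis=0)                        # number of servers using each tag
idf = np.log((M.shape[0] - df + 0.5) / (df + 0.5) + 1.0)
tf = (M * (k1 + 1)) / (M + k1 * (1 - b + b * doc_len / avg_len))
W = normalize(tf * idf, norm="l2")              # L2-normalize each server row

# Low-rank projection of the server-tag matrix.
svd = TruncatedSVD(n_components=2, random_state=0)
server_vecs = svd.fit_transform(W)              # m x k server embeddings

# Rank servers for a query consisting of one selected tag of interest.
query = np.zeros(M.shape[1]); query[0] = 1.0
query_vec = svd.transform(normalize((query * idf).reshape(1, -1)))
scores = cosine_similarity(query_vec, server_vecs)[0]
print(np.argsort(-scores))                      # server indices, best match first
```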

Demo

https://carlcolglazier.com/demos/deweb2024/

Evaluation

Recommendation systems are evaluated on how well they predict user behavior.

What are we trying to predict? Out-of-sample data:

  • Train/test split

  • Posts just before our three month period

  • Accounts that move servers (see the sketch below)

Histogram comparison of three potential recommendation models
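One way to score the moved-accounts evaluation, assuming a recommend(tags) function with the interface of the sketch above (the interface and data structures are hypothetical):

```python
# Hypothetical hit-rate@k: how often does an account's destination server
# appear among the top-k servers recommended from its pre-move tags?
def hit_rate_at_k(recommend, moved, k=10):
    """moved: iterable of (tags_before_move, destination_server) pairs."""
    moved = list(moved)
    hits = sum(dest in recommend(tags)[:k] for tags, dest in moved)
    return hits / len(moved)
```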

User study

We also want to get input from potential real-world users.

To do this, I plan to run a small semi-structured interview study with a mix of users recruited from Mastodon and outside Mastodon.

Rules and Content Moderation in the Fediverse

Work forthcoming

Background

Rules and norms play an essential role in the operation of online communities.1

Commercial social web sites like Reddit have site-wide rules that apply to all communities,2 but decentralized online social networks like Mastodon do not.3

Figure 7: Rules as they appeared on Mastodon Social in 2022.

Mastodon servers tend to adopt the same rules

Figure 8

Puzzle: the Fediverse promises autonomy and independence, but servers look very similar in practice.

  • RQ1: How do rules originate and spread on the Fediverse?

  • RQ2: Why do Fediverse servers have similar rules?

Data

Figure 9: Trace data of Mastodon server rules (2021-2025)
ID Software Size Topic
FV1 Mastodon [100–1K) Regional
FV2 Mastodon [1K–10K) Language
FV3 Mastodon [10–100) Interest/Language
FV4 Mastodon [1K–10K) Interest(?)
FV5 Mastodon [10–100) Regional/Interest
FV6 Pleroma
FV7 Mastodon [100–1K)
FV8 Mastodon [100–1K)
FV9 Mastodon [1K–10K)
FV10 Mastodon [100–1K)
FV11 Mastodon [100–1K)
FV12 Mastodon [10–100)
FV13 Mastodon [100–1K) Interest
FV14 Pleroma
FV15 Pleroma Religion
FV16 Misskey
FV17 Mastodon [100–1K)
Table 5: Description of servers in interview study

Quantitative data processing

Raw Longitudinal Data → Filter Servers → Vectorize Rules with SBERT → Cluster Similar Rules
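A minimal sketch of the last two steps, with an illustrative embedding model and clustering algorithm (the study’s exact choices may differ):

```python
# Embed rule texts with SBERT and group near-duplicate rules by clustering
# their embeddings. Model name and clustering settings are illustrative.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

rules = [
    "No racism, sexism, homophobia, transphobia, xenophobia, or casteism",
    "No racism or sexism.",
    "No spam or advertising.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(rules, normalize_embeddings=True)

# Cluster rules whose embeddings are close in cosine distance (scikit-learn >= 1.2).
clusterer = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.4, metric="cosine", linkage="average"
)
labels = clusterer.fit_predict(embeddings)
print(labels)  # rules sharing a label are treated as the same underlying rule
```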

Most common rules

Rule Count
No racism, sexism, homophobia, transphobia, xenophobia, or casteism 1925
No incitement of violence or promotion of violent ideologies 1629
No harassment, dogpiling or doxxing of other users 1581
No illegal content. 1273
Sexually explicit or violent media must be marked as sensitive when posting 1220
Do not share intentionally false or misleading information 1074
Be nice. 510
No spam or advertising. 410
Don't be a dick. 405
Be excellent to each other. 264
Table 6: Most common rules on Fediverse servers. Highlighted rules were once rules on Mastodon Social.

Questions I still intend to answer with the longitudinal data

  • How many Mastodon servers have rules?

  • What predicts if a server has rules?

  • How often do rules change? (Not very often)

  • Do similar servers adopt similar rules? (e.g. can we predict rule adoption)

Initial Findings

Mastodon rules are created in a process consistent with institutional isomorphism.1

Figure 10: Diagram of three kinds of institutional isomorphism and how they shape rules in the Fediverse.

Qualitative Findings

  • Applying local rules to external posts

  • Proactive vs. reactive moderation

  • Rules as signposts

  • Rules to solicit reports

  • Network integrity

References

Abdollahpouri, Himan, Robin Burke, and Bamshad Mobasher. “Recommender Systems as Multistakeholder Environments.” In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, 347–48. UMAP ’17. New York, NY, USA: Association for Computing Machinery, 2017. doi:10.1145/3079628.3079657.
Anaobi, Ishaku Hassan, Aravindh Raman, Ignacio Castro, Haris Bin Zia, Damilola Ibosiola, and Gareth Tyson. “Will Admins Cope? Decentralized Moderation in the Fediverse.” In Proceedings of the ACM Web Conference 2023, 3109–20. WWW ’23. New York, NY, USA: Association for Computing Machinery, 2023. doi:10.1145/3543507.3583487.
Baran, P. “On Distributed Communications Networks.” IEEE Transactions on Communications Systems 12, no. 1 (March 1964): 1–9. doi:10.1109/TCOM.1964.1088883.
Chandrasekharan, Eshwar, Shagun Jhaver, Amy Bruckman, and Eric Gilbert. “Quarantined! Examining the Effects of a Community-Wide Moderation Intervention on Reddit.” ACM Transactions on Computer-Human Interaction 29, no. 4 (March 2022): 29:1–26. doi:10.1145/3490499.
Chandrasekharan, Eshwar, Umashanthi Pavalanathan, Anirudh Srinivasan, Adam Glynn, Jacob Eisenstein, and Eric Gilbert. “You Can’t Stay Here: The Efficacy of Reddit’s 2015 Ban Examined Through Hate Speech.” Proc. ACM Hum.-Comput. Interact. 1, no. CSCW (December 2017): 31:1–22. doi:10.1145/3134666.
Chandrasekharan, Eshwar, Mattia Samory, Shagun Jhaver, Hunter Charvat, Amy Bruckman, Cliff Lampe, Jacob Eisenstein, and Eric Gilbert. “The Internet’s Hidden Rules: An Empirical Study of Reddit Norm Violations at Micro, Meso, and Macro Scales.” Proceedings of the ACM on Human-Computer Interaction 2, no. CSCW (November 2018): 1–25. doi:10.1145/3274301.
Colglazier, Carl. “Do Servers Matter on Mastodon? Data-driven Design for Decentralized Social Media.” International Workshop on Decentralizing the Web 1 (2024).
Colglazier, Carl, Nathan TeBlunthuis, and Aaron Shaw. “The Effects of Group Sanctions on Participation and Toxicity: Quasi-experimental Evidence from the Fediverse.” Proceedings of the International AAAI Conference on Web and Social Media 18 (May 2024): 315–28. doi:10.1609/icwsm.v18i1.31316.
DiMaggio, Paul J., and Walter W. Powell. “The Iron Cage Revisited: Institutional Isomorphism and Collective Rationality in Organizational Fields.” American Sociological Review 48, no. 2 (1983): 147–60. doi:10.2307/2095101.
Driscoll, Kevin. The Modem World: A Prehistory of Social Media. Yale University Press, 2022.
Fiesler, Casey, Jialun “Aaron” Jiang, Joshua McCann, Kyle Frye, and Jed R. Brubaker. “Reddit Rules! Characterizing an Ecosystem of Governance.” In Proceedings of the International AAAI Conference on Web and Social Media, 72–81. Stanford, CA: AAAI, 2018.
Gillespie, Tarleton. Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. New Haven: Yale University Press, 2018.
Gillett, Rosalie, and Nicolas Suzor. “Incels on Reddit: A Study in Social Norms and Decentralised Moderation.” First Monday, June 2022. doi:10.5210/fm.v27i6.12575.
Hovenkamp, Herbert. “Antitrust and Platform Monopoly.” The Yale Law Journal 130, no. 8 (June 2021).
Kraut, Robert E., Paul Resnick, and Sara Kiesler. Building Successful Online Communities: Evidence-based Social Design. Cambridge, MA: MIT Press, 2012.
Masnick, Mike. “Masnick’s Impossibility Theorem: Content Moderation At Scale Is Impossible To Do Well.” Techdirt. https://www.techdirt.com/2019/11/20/masnicks-impossibility-theorem-content-moderation-scale-is-impossible-to-do-well/, November 2019.
———. “Protocols, Not Platforms: A Technological Approach to Free Speech.” Knight First Amendment Institute, August 2019.
Narayanan, Arvind. “Understanding Social Media Recommendation Algorithms.” Knight First Amendment Institute, March 2023.
Nicholson, Matthew N., Brian C Keegan, and Casey Fiesler. “Mastodon Rules: Characterizing Formal Rules on Popular Mastodon Instances.” In Companion Publication of the 2023 Conference on Computer Supported Cooperative Work and Social Computing, 86–90. CSCW ’23 Companion. New York, NY, USA: Association for Computing Machinery, 2023. doi:10.1145/3584931.3606970.
Pinch, Trevor J., and Wiebe E. Bijker. “The Social Construction of Facts and Artefacts: Or How the Sociology of Science and the Sociology of Technology Might Benefit Each Other.” Social Studies of Science 14, no. 3 (August 1984): 399–441. doi:10.1177/030631284014003004.
Ribeiro, Manoel Horta, Shagun Jhaver, Savvas Zannettou, Jeremy Blackburn, Gianluca Stringhini, Emiliano De Cristofaro, and Robert West. “Do Platform Migrations Compromise Content Moderation? Evidence from r/The_Donald and r/Incels.” Proceedings of the ACM on Human-Computer Interaction 5, no. CSCW2 (October 2021): 316:1–24. doi:10.1145/3476057.
Robertson, Stephen, and Hugo Zaragoza. “The Probabilistic Relevance Framework: BM25 and Beyond.” Foundations and Trends in Information Retrieval 3, no. 4 (2009): 333–89. doi:10.1561/1500000019.
Shaw, Aaron. “Centralized and Decentralized Gatekeeping in an Open Online Collective.” Politics & Society 40, no. 3 (2012): 349–88. doi:10.1177/0032329212449009.
Zittrain, Jonathan. “A History of Online Gatekeeping.” Harvard Journal of Law & Technology 19 (2005/2006): 253.