Federating the Social Web

Prospectus Presentation

Carl Colglazier

Community Data Science Collective

Northwestern University

Background

The social web’s “wicked problems” are those with many stakeholders that cannot be solved without tradeoffs:

Content moderation¹
Discovery²
Norms (Establishing and Maintaining)

Protocols vs. Platforms

Rather than building new protocols, the internet has grown up around privately owned, controlled platforms. These can function in ways that appear similar to the earlier protocols, but they are controlled by a single entity.¹

PROTOCOLS	PLATFORMS
Decentralized: Power distributed across many independent implementations	Centralized: Power concentrated within single corporate entities
Open: Interoperable standards allowing anyone to build compatible services	Closed: Walled gardens with controlled access and developer limitations
User-Controlled: Users choose preferred clients and implementations	Company-Controlled: Company dictates features, rules, and content policies
Example: Email (SMTP, IMAP, POP3)	Example: Facebook

Diagram of a centralized and decentralized network.

The Fediverse

A collection of social servers
Interoperability via (mainly) ActivityPub protocol
Software like Mastodon and Pleroma
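
To make the interoperability point concrete, here is a minimal sketch of the kind of message Fediverse servers exchange: an ActivityStreams `Create` activity wrapping a `Note`, the vocabulary ActivityPub builds on. The account URL and the helper name `make_create_note` are hypothetical, and real Mastodon or Pleroma payloads carry many more fields.

```python
import json

ACTIVITYSTREAMS_CONTEXT = "https://www.w3.org/ns/activitystreams"
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def make_create_note(actor_url: str, content: str) -> dict:
    """Build a minimal Create activity that wraps a public Note."""
    return {
        "@context": ACTIVITYSTREAMS_CONTEXT,
        "type": "Create",
        "actor": actor_url,
        "to": [PUBLIC],
        "object": {
            "type": "Note",
            "attributedTo": actor_url,
            "content": content,
            "to": [PUBLIC],
        },
    }


# Hypothetical account on a hypothetical server.
activity = make_create_note("https://social.example/users/alice", "Hello, Fediverse!")
print(json.dumps(activity, indent=2))
```

Because every compliant server understands this shared vocabulary, software such as Mastodon and Pleroma can exchange posts without sharing an operator.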

Why Study The Fediverse?

The Fediverse faces problems similar to those of other organizations on the social web, but has different motivations and stakeholders.

Studies

The effects of de-federation (inter-server sanctions)¹
Server recommendation system (newcomers and server choice)²
Moderating the Fediverse (rules and enforcement)

Remaining Work

Figure 1: Gantt chart for the timeline and major milestones remaining.

The Effects of Group Sanctions on Activity and Toxicity

Colglazier, Carl, Nathan TeBlunthuis, and Aaron Shaw. “The Effects of Group Sanctions on Participation and Toxicity: Quasi-experimental Evidence from the Fediverse.” In Proceedings of the International AAAI Conference on Web and Social Media, vol. 18, pp. 315-328. 2024.

Fediverse servers are decentralized and autonomous.

This means that while each server has almost complete control over itself, it cannot control what happens on other servers.

Servers may decide that another server is more trouble than it is worth and block it, an action called de-federation.

For the accounts that lose connections, what are the effects on activity and toxicity?

Data

Figure 2: The y-axis shows the cumulative number of blocked and blocking accounts included in our analysis over our study period.

We gathered de-federation events in which accounts that had previously interacted could no longer do so because of a new block. We then matched these accounts with synthetic controls.
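
As a rough illustration of the first step, the sketch below finds accounts on both sides of a block that interacted before it took effect. The data layout (tuples of sender, receiver, timestamp), the `user@server` handle format, and the function names are assumptions for illustration, not the study's actual pipeline; matching to synthetic controls is not shown.

```python
from datetime import datetime


def server_of(account: str) -> str:
    """Return the server part of a handle written as user@server."""
    return account.rsplit("@", 1)[1]


def affected_accounts(interactions, blocking_server, blocked_server, block_time):
    """Return (blocked_side, blocking_side) accounts that interacted
    across the two servers before the block took effect."""
    blocked_side, blocking_side = set(), set()
    for sender, receiver, when in interactions:
        if when >= block_time:
            continue  # only interactions before the block count
        if {server_of(sender), server_of(receiver)} != {blocking_server, blocked_server}:
            continue  # interaction does not cross the blocked pair
        for account in (sender, receiver):
            if server_of(account) == blocked_server:
                blocked_side.add(account)
            else:
                blocking_side.add(account)
    return blocked_side, blocking_side


# Invented example data.
interactions = [
    ("alice@a.example", "bob@b.example", datetime(2022, 1, 5)),
    ("carol@a.example", "dave@c.example", datetime(2022, 1, 6)),
    ("erin@b.example", "alice@a.example", datetime(2022, 3, 1)),
]
blocked, blocking = affected_accounts(
    interactions, "a.example", "b.example", datetime(2022, 2, 1)
)
print(sorted(blocked), sorted(blocking))
```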

De-federation reduced activity on blocked servers, but not on blocking servers

Figure 3: Visualization of activity among blocked and blocking user accounts shows an asymmetric change in activity following de-federation.

Group	median	W	p
\(U_0\)	-135.5	41197.5	0.000
\(C_0\)	-18.0	35762.0	0.143
\(U_1\)	-54.5	12413.0	0.122
\(C_1\)	-53.5	12520.0	0.091
\(\Delta_0\)	-39.0	39927.0	0.000
\(\Delta_1\)	3.0	10645.5	0.421

Table 1: Non-parametric tests for differences in activity before and after de-federation events (summed across all weeks) find a measurable decrease in posting activity for the accounts on blocked servers compared to matched controls but no such change for accounts on blocking servers.
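
The comparison in Table 1 can be sketched as a paired difference-in-differences. This sketch assumes that \(U\) denotes the change for affected accounts, \(C\) the change for their matched controls, and \(\Delta\) the per-pair difference between the two; the slides do not define the notation, and the numbers below are invented.

```python
from statistics import median


def paired_did(treated, controls):
    """treated and controls are lists of (pre, post) post counts,
    aligned so that index i in each list is one matched pair."""
    treated_change = [post - pre for pre, post in treated]
    control_change = [post - pre for pre, post in controls]
    # Difference-in-differences, one value per matched pair.
    deltas = [t - c for t, c in zip(treated_change, control_change)]
    return median(treated_change), median(control_change), median(deltas)


# Invented counts: (posts before the block, posts after the block).
treated = [(100, 40), (50, 45), (80, 20)]
controls = [(90, 85), (60, 50), (70, 72)]
u, c, delta = paired_did(treated, controls)
print(u, c, delta)
```

Working with per-pair differences keeps the comparison non-parametric: only the signs and ranks of the deltas feed into the test.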

We found no change in toxicity

Figure 4: Median toxicity among accounts which posted each week for blocked and blocking user accounts. The median toxicity remained flat for all groups.

Group	median	W	p
\(U_0\)	-0.006	17746	0.538
\(C_0\)	0.004	14000	0.950
\(U_1\)	-0.008	6514	0.619
\(C_1\)	0.001	5546	0.873
\(\Delta_0\)	-0.005	17161	0.072
\(\Delta_1\)	0.000	6414	0.305

Table 2: Non-parametric difference-in-differences for median post toxicity before and after de-federation events. The \(W\) test statistic is the sum of the ranks of the positive differences between paired observations, and the p-value tests the null hypothesis that the changes are zero.
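
The \(W\) statistic described in the caption can be computed by hand. The sketch below assumes the common convention of dropping zero differences and giving tied absolute values their average rank; the function name and input values are invented, and the p-value calculation is omitted.

```python
def signed_rank_w(differences):
    """Sum of the ranks of the positive paired differences."""
    ordered = sorted((d for d in differences if d != 0), key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Find the run of values tied on absolute value.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    return sum(rank for d, rank in zip(ordered, ranks) if d > 0)


# Invented paired differences.
w = signed_rank_w([-55, 5, -62, 12, -5, 0])
print(w)
```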

Takeaways

Previous research into group sanctions shows they can be effective in altering anti-social behavior,¹ but communities do not exist in isolation and users can continue their operations off-platform.²

We find:

De-federation can be effective at reducing activity for users on blocked servers
No evidence of blowback on toxicity for any affected users

Any decentralized online social network will have de-federation or a similar mechanism.

Server Recommendations

Colglazier, Carl. “Do Servers Matter on Mastodon? Data-driven Design for Decentralized Social Media.” In 1st International Workshop on Decentralizing the Web (2024).

Millions of newcomers joined Mastodon after Elon Musk bought Twitter

Figure 5: Accounts in the dataset created between January 2022 and March 2023.

Many people find the Mastodon onboarding process confusing

Onboarding newcomers is essential for online communities.¹

Compared to commercial social media, Mastodon onboarding is harder because newcomers need to pick a server.

Screenshot of the “Join Mastodon” server selection webpage.

Do some Mastodon servers retain newcomers better than others?

Smaller, less general servers are more likely to retain new accounts

Figure 6: Survival probabilities for accounts created during May 2023.

Term	Estimate	Low	High	p-value
Join Mastodon	0.115	0.972	1.296	0.117
General Servers	0.385	1.071	2.015	0.017
Small Server	-0.245	0.664	0.922	0.003

Table 3: Coefficients for the Cox Proportional Hazard Model with Mixed Effects. The model includes a random effect for the server.
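
Cox model coefficients are log hazard ratios, so exponentiating the Estimate column gives the multiplicative change in the hazard of an account becoming inactive. The sketch below does this for the three terms in Table 3. Each exponentiated estimate falls between the Low and High values, which suggests (as an assumption, since the slide does not say) that those columns are interval bounds on the hazard-ratio scale.

```python
import math


def hazard_ratio(coefficient: float) -> float:
    """Convert a Cox log-hazard coefficient into a hazard ratio."""
    return math.exp(coefficient)


# Estimates copied from Table 3.
coefficients = {
    "Join Mastodon": 0.115,
    "General Servers": 0.385,
    "Small Server": -0.245,
}
for term, coefficient in coefficients.items():
    print(f"{term}: hazard ratio = {hazard_ratio(coefficient):.3f}")
```

A ratio above 1 (General Servers) means accounts drop out faster; a ratio below 1 (Small Server) means they are retained longer.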

Accounts that move between servers are more likely to move to smaller servers

	Model A		Model B
	Coef.	Std. Error	Coef.	Std. Error
(Sum)	-9.529***	0.188	-10.268***	0.718
nonzero	-3.577***	0.083	-2.861***	0.254
Smaller server	0.709***	0.032	0.629***	0.082
Server size (outgoing)	0.686***	0.013	0.655***	0.042
Open registrations (incoming)	0.168***	0.046	-0.250	0.186
Languages match	0.044	0.065	0.589	0.392

Table 4

Our analysis suggests…

Accounts on large, general servers fare worse
Moved accounts go to smaller servers

Can we build a system that helps people find servers?

Constraints

Consent: servers should be able to choose whether to participate
Privacy: do not reveal information about individual accounts
Decentralization: do not concentrate data in one place
Openness: use shared standards and protocols

Concept

A decentralized, tag-based collaborative filtering system

Each server reports its top tags from the last three months
Learn from these reports and from other servers which tags are most important for each server
Recommend servers based on selected tags of interest
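
A sketch of what one server's report might look like, assuming posts have already been filtered to the three-month window. Only aggregate counts leave the server, in keeping with the privacy constraint. The function name, the threshold `min_accounts`, and the cutoff `limit` are hypothetical.

```python
from collections import defaultdict


def top_tags_report(posts, min_accounts=2, limit=3):
    """posts is an iterable of (account, [hashtags]).
    Returns [(tag, number_of_distinct_accounts)], most used first."""
    accounts_by_tag = defaultdict(set)
    for account, tags in posts:
        for tag in tags:
            accounts_by_tag[tag.lower()].add(account)
    # Drop rare tags so that no single account can be identified.
    counts = {
        tag: len(accounts)
        for tag, accounts in accounts_by_tag.items()
        if len(accounts) >= min_accounts
    }
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Invented posts from one server.
posts = [
    ("alice", ["Art", "photography"]),
    ("bob", ["art"]),
    ("carol", ["photography", "cats"]),
    ("alice", ["art"]),
    ("dave", ["art"]),
]
report = top_tags_report(posts)
print(report)
```

Counting distinct accounts rather than posts keeps one prolific poster from dominating a server's profile.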

Implementation

Report top hashtags used by the most accounts on each server
For robustness, drop hashtags used by too few accounts or servers
Build an \(m \times n\) server-tag matrix \(M\)
Normalize with Okapi BM25 TF-IDF and L2 normalization¹
Apply singular value decomposition (SVD) on \(M\) to create a new matrix \(M'\)
Match servers to selected tags using cosine similarity
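
The steps above can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the deployed system: the BM25 parameters `k1` and `b`, the power-iteration stand-in for a library SVD, the choice to project the tag query into the same latent space, and all server names and counts are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(vector):
    norm = math.sqrt(dot(vector, vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def bm25_weight(counts, k1=1.2, b=0.75):
    """BM25-style TF-IDF weighting of a server-by-tag count matrix."""
    n_servers, n_tags = len(counts), len(counts[0])
    lengths = [sum(row) for row in counts]
    avg_length = sum(lengths) / n_servers
    idf = []
    for j in range(n_tags):
        df = sum(1 for row in counts if row[j] > 0)  # servers using tag j
        idf.append(math.log(1 + (n_servers - df + 0.5) / (df + 0.5)))
    weighted = []
    for row, length in zip(counts, lengths):
        scale = k1 * (1 - b + b * length / avg_length)
        weighted.append([idf[j] * tf * (k1 + 1) / (tf + scale)
                         for j, tf in enumerate(row)])
    return weighted


def orthogonalize(vector, basis):
    for u in basis:
        projection = dot(vector, u)
        vector = [a - projection * c for a, c in zip(vector, u)]
    return vector


def top_right_singular_vectors(matrix, k, iterations=200):
    """Approximate the top-k right singular vectors of the matrix by
    power iteration with deflation (assumes k does not exceed its rank)."""
    n_tags = len(matrix[0])
    basis = []
    for c in range(k):
        start = [1.0 / (j + 1 + c) for j in range(n_tags)]  # deterministic
        v = l2_normalize(orthogonalize(start, basis))
        for _ in range(iterations):
            mv = [dot(row, v) for row in matrix]  # M v
            w = [sum(matrix[i][j] * mv[i] for i in range(len(matrix)))
                 for j in range(n_tags)]  # M^T (M v)
            w = orthogonalize(w, basis)
            if math.sqrt(dot(w, w)) < 1e-12:
                break
            v = l2_normalize(w)
        basis.append(v)
    return basis


def cosine(u, v):
    denominator = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denominator if denominator > 0 else 0.0


def recommend(servers, tags, counts, selected_tags, k=2):
    """Rank servers by cosine similarity to the selected tags in latent space."""
    matrix = [l2_normalize(row) for row in bm25_weight(counts)]
    basis = top_right_singular_vectors(matrix, k)
    query = [dot([1.0 if t in selected_tags else 0.0 for t in tags], u)
             for u in basis]
    scores = [(cosine([dot(row, u) for u in basis], query), name)
              for name, row in zip(servers, matrix)]
    return [name for _, name in sorted(scores, reverse=True)]


# Invented server-tag counts (accounts using each tag).
servers = ["art.example", "photo.example", "dev.example", "linux.example"]
tags = ["art", "photography", "python", "linux"]
counts = [[10, 6, 0, 0], [4, 12, 0, 0], [0, 0, 9, 7], [0, 1, 3, 11]]
ranking = recommend(servers, tags, counts, {"python"})
print(ranking)
```

Reducing the matrix to a few latent dimensions lets a server that never reported a selected tag still rank well when its other tags co-occur with it elsewhere.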

Singular value decomposition visualisation

Demo

https://carlcolglazier.com/demos/deweb2024/

Evaluation

Recommendation systems attempt to be predictive.

What are we trying to predict? Out-of-sample data:

Train/test split
Posts just before our three month period
Accounts that move servers
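
For the moved-accounts check, one possible score is the share of accounts whose real destination server appears among their top-k recommendations. The slides do not name a metric, so the hit-rate measure, the function name, and the data below are assumptions for illustration.

```python
def hit_rate_at_k(recommendations, destinations, k=3):
    """Share of moved accounts whose actual destination server
    appears in their top-k recommended servers."""
    hits = sum(
        1
        for account, destination in destinations.items()
        if destination in recommendations.get(account, [])[:k]
    )
    return hits / len(destinations)


# Invented ranked recommendations and observed moves.
recommendations = {
    "alice": ["art.example", "photo.example", "dev.example"],
    "bob": ["dev.example", "linux.example", "art.example"],
    "carol": ["art.example", "linux.example", "dev.example"],
}
destinations = {
    "alice": "photo.example",
    "bob": "art.example",
    "carol": "linux.example",
}
score = hit_rate_at_k(recommendations, destinations, k=2)
print(round(score, 3))
```

The same function can compare candidate models by holding `destinations` fixed and swapping in each model's `recommendations`.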

Histogram comparison of three potential recommendation models

User study

We also want to gather feedback from potential real-world users.

To do this, I plan to run a small semi-structured interview study with a mix of users recruited from Mastodon and outside Mastodon.

Rules and Content Moderation in the Fediverse

Work forthcoming

Background

Rules and norms play an essential role in the operation of online communities.¹

Commercial social web sites like Reddit have site-wide rules which apply to all communities,² but decentralized online social networks like Mastodon do not have these.³

Figure 7: Rules as they appeared on Mastodon Social in 2022.

Mastodon servers tend to adopt the same rules

Puzzle: the Fediverse promises autonomy and independence, but servers look very similar in practice.

RQ1: How do rules originate and spread on the Fediverse?
RQ2: Why do Fediverse servers have similar rules?

Data

ID	Software	Size	Topic
FV1	Mastodon	[100–1K)	Regional
FV2	Mastodon	[1K–10K)	Language
FV3	Mastodon	[10–100)	Interest/Language
FV4	Mastodon	[1K–10K)	Interest(?)
FV5	Mastodon	[10–100)	Regional/Interest
FV6	Pleroma
FV7	Mastodon	[100–1K)
FV8	Mastodon	[100–1K)
FV9	Mastodon	[1K–10K)
FV10	Mastodon	[100–1K)
FV11	Mastodon	[100–1K)
FV12	Mastodon	[10–100)
FV13	Mastodon	[100–1K)	Interest
FV14	Pleroma
FV15	Pleroma		Religion
FV16	Misskey
FV17	Mastodon	[100–1K)

Table 5: Description of servers in interview study

Quantitative data processing

Most common rules

Rule	Count
No racism, sexism, homophobia, transphobia, xenophobia, or casteism	1925
No incitement of violence or promotion of violent ideologies	1629
No harassment, dogpiling or doxxing of other users	1581
No illegal content.	1273
Sexually explicit or violent media must be marked as sensitive when posting	1220
Do not share intentionally false or misleading information	1074
Be nice.	510
No spam or advertising.	410
Don't be a dick.	405
Be excellent to each other.	264

Table 6: Most common rules on Fediverse servers. Highlighted rules were once rules on Mastodon Social.
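
A tally like Table 6 can be produced by counting each distinct rule once per server. The sketch below is illustrative only: the light text normalization (lowercasing, collapsing whitespace, dropping a trailing period) is an assumption and may differ from how the table was built, and the server names and rules are invented.

```python
from collections import Counter


def normalize(rule: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(rule.lower().split()).rstrip(".")


def most_common_rules(rules_by_server, n=3):
    counts = Counter()
    for rules in rules_by_server.values():
        # A set ensures a rule counts once per server.
        counts.update({normalize(rule) for rule in rules})
    return counts.most_common(n)


# Invented rule lists keyed by server.
rules_by_server = {
    "a.example": ["Be nice.", "No spam or advertising."],
    "b.example": ["be nice", "No illegal content."],
    "c.example": ["Be  nice.", "No spam or advertising."],
}
top_rules = most_common_rules(rules_by_server, n=2)
print(top_rules)
```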

Questions I still intend to answer with the longitudinal data

How many Mastodon servers have rules?
What predicts if a server has rules?
How often do rules change? (Not very often)
Do similar servers adopt similar rules? (e.g. can we predict rule adoption)

Initial Findings

Mastodon rules are created in a process consistent with institutional isomorphism.¹

Figure 10: Diagram of three kinds of institutional isomorphism and how they shape rules in the Fediverse.

Qualitative Findings

Applying local rules to external posts
Proactive vs. reactive moderation
Rules as signposts
Rules to solicit reports
Network integrity

References

Abdollahpouri, Himan, Robin Burke, and Bamshad Mobasher. “Recommender Systems as Multistakeholder Environments.” In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, 347–48. UMAP ’17. New York, NY, USA: Association for Computing Machinery, 2017. doi:10.1145/3079628.3079657.

Anaobi, Ishaku Hassan, Aravindh Raman, Ignacio Castro, Haris Bin Zia, Damilola Ibosiola, and Gareth Tyson. “Will Admins Cope? Decentralized Moderation in the Fediverse.” In Proceedings of the ACM Web Conference 2023, 3109–20. WWW ’23. New York, NY, USA: Association for Computing Machinery, 2023. doi:10.1145/3543507.3583487.

Baran, P. “On Distributed Communications Networks.” IEEE Transactions on Communications Systems 12, no. 1 (March 1964): 1–9. doi:10.1109/TCOM.1964.1088883.

Chandrasekharan, Eshwar, Shagun Jhaver, Amy Bruckman, and Eric Gilbert. “Quarantined! Examining the Effects of a Community-Wide Moderation Intervention on Reddit.” ACM Transactions on Computer-Human Interaction 29, no. 4 (March 2022): 29:1–26. doi:10.1145/3490499.

Chandrasekharan, Eshwar, Umashanthi Pavalanathan, Anirudh Srinivasan, Adam Glynn, Jacob Eisenstein, and Eric Gilbert. “You Can’t Stay Here: The Efficacy of Reddit’s 2015 Ban Examined Through Hate Speech.” Proc. ACM Hum.-Comput. Interact. 1, no. CSCW (December 2017): 31:1–22. doi:10.1145/3134666.

Chandrasekharan, Eshwar, Mattia Samory, Shagun Jhaver, Hunter Charvat, Amy Bruckman, Cliff Lampe, Jacob Eisenstein, and Eric Gilbert. “The Internet’s Hidden Rules: An Empirical Study of Reddit Norm Violations at Micro, Meso, and Macro Scales.” Proceedings of the ACM on Human-Computer Interaction 2, no. CSCW (November 2018): 1–25. doi:10.1145/3274301.

Colglazier, Carl. “Do Servers Matter on Mastodon? Data-driven Design for Decentralized Social Media.” International Workshop on Decentralizing the Web 1 (2024).

Colglazier, Carl, Nathan TeBlunthuis, and Aaron Shaw. “The Effects of Group Sanctions on Participation and Toxicity: Quasi-experimental Evidence from the Fediverse.” Proceedings of the International AAAI Conference on Web and Social Media 18 (May 2024): 315–28. doi:10.1609/icwsm.v18i1.31316.

DiMaggio, Paul J., and Walter W. Powell. “The Iron Cage Revisited: Institutional Isomorphism and Collective Rationality in Organizational Fields.” American Sociological Review 48, no. 2 (1983): 147–60. doi:10.2307/2095101.

Driscoll, Kevin. The Modem World: A Prehistory of Social Media. Yale University Press, 2022.

Fiesler, Casey, Jialun “Aaron” Jiang, Joshua McCann, Kyle Frye, and Jed R. Brubaker. “Reddit Rules! Characterizing an Ecosystem of Governance.” In Proceedings of the International AAAI Conference on Web and Social Media, 72–81. Stanford, CA: AAAI, 2018.

Gillespie, Tarleton. Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. New Haven: Yale University Press, 2018.

Gillett, Rosalie, and Nicolas Suzor. “Incels on Reddit: A Study in Social Norms and Decentralised Moderation.” First Monday, June 2022. doi:10.5210/fm.v27i6.12575.

Hovenkamp, Herbert. “Antitrust and Platform Monopoly.” The Yale Law Journal 130, no. 8 (June 2021).

Kraut, Robert E., Paul Resnick, and Sara Kiesler. Building Successful Online Communities: Evidence-based Social Design. Cambridge, MA: MIT Press, 2012.

Masnick, Mike. “Masnick’s Impossibility Theorem: Content Moderation At Scale Is Impossible To Do Well.” Techdirt. https://www.techdirt.com/2019/11/20/masnicks-impossibility-theorem-content-moderation-scale-is-impossible-to-do-well/, November 2019.

———. “Protocols, Not Platforms: A Technological Approach to Free Speech.” Knight First Amendment Institute, August 2019.

Narayanan, Arvind. “Understanding Social Media Recommendation Algorithms.” Knight First Amendment Institute, March 2023.

Nicholson, Matthew N., Brian C Keegan, and Casey Fiesler. “Mastodon Rules: Characterizing Formal Rules on Popular Mastodon Instances.” In Companion Publication of the 2023 Conference on Computer Supported Cooperative Work and Social Computing, 86–90. CSCW ’23 Companion. New York, NY, USA: Association for Computing Machinery, 2023. doi:10.1145/3584931.3606970.

Pinch, Trevor J., and Wiebe E. Bijker. “The Social Construction of Facts and Artefacts: Or How the Sociology of Science and the Sociology of Technology Might Benefit Each Other.” Social Studies of Science 14, no. 3 (August 1984): 399–441. doi:10.1177/030631284014003004.

Ribeiro, Manoel Horta, Shagun Jhaver, Savvas Zannettou, Jeremy Blackburn, Gianluca Stringhini, Emiliano De Cristofaro, and Robert West. “Do Platform Migrations Compromise Content Moderation? Evidence from r/The_Donald and r/Incels.” Proceedings of the ACM on Human-Computer Interaction 5, no. CSCW2 (October 2021): 316:1–24. doi:10.1145/3476057.

Robertson, Stephen, and Hugo Zaragoza. “The Probabilistic Relevance Framework: BM25 and Beyond.” Foundations and Trends in Information Retrieval 3, no. 4 (2009): 333–89. doi:10.1561/1500000019.

Shaw, Aaron. “Centralized and Decentralized Gatekeeping in an Open Online Collective.” Politics & Society 40, no. 3 (2012): 349–88. doi:10.1177/0032329212449009.

Zittrain, Jonathan. “A History of Online Gatekeeping.” Harvard Journal of Law & Technology 19 (2005/2006): 253.
