Newswise – Trolls, haters, flamers and other ugly characters are unfortunately a fact of life on much of the internet. Their ugly behavior mars social media networks and sites like Reddit and Wikipedia.
But toxic content looks different depending on its location, and identifying online toxicity is a first step in getting rid of it.
Researchers from the Institute for Software Research (ISR) in Carnegie Mellon University’s School of Computer Science recently teamed up with colleagues from Wesleyan University to make a first attempt at understanding toxicity on open source platforms such as GitHub.
“You have to know what that toxicity looks like to design tools to deal with it,” said Courtney Miller, a Ph.D. student in the ISR and lead author on the paper. “And dealing with that toxicity can lead to healthier, more inclusive, more diverse, and just better places overall.”
To better understand what toxicity looked like in the open source community, the team first collected toxic content. They used a toxicity and politeness detector developed for another platform to scan nearly 28 million posts on GitHub created between March and May 2020. The team also searched these posts for “code of conduct” — a phrase commonly used when responding to toxic content — and looked for locked or deleted issues, which can also be a sign of toxicity.
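That screening step can be pictured with a short sketch. The Python snippet below is not the team’s actual pipeline; it is a minimal illustration, with a placeholder toxicity_score function standing in for whichever detector is used and illustrative thresholds and fields, of how posts might be flagged when they score high, mention “code of conduct,” or belong to locked or deleted issues.

```python
# Minimal sketch of a screening pass like the one described above.
# NOTE: toxicity_score() is a placeholder, not the detector the researchers used;
# the threshold, word list, and fields are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Post:
    body: str            # text of the GitHub issue or comment
    issue_locked: bool   # the issue was locked by maintainers
    issue_deleted: bool  # the issue was deleted

def toxicity_score(text: str) -> float:
    """Placeholder scorer: a real pipeline would call a trained classifier."""
    rude_words = {"worst", "useless", "stupid", "garbage"}
    hits = sum(word in text.lower() for word in rude_words)
    return min(1.0, hits / 2)

def is_candidate(post: Post, threshold: float = 0.5) -> bool:
    """Flag a post for manual review if any heuristic fires."""
    return (
        toxicity_score(post.body) >= threshold
        or "code of conduct" in post.body.lower()  # common reply to toxic posts
        or post.issue_locked                       # locking can signal trouble
        or post.issue_deleted
    )

# Example: screen a small batch of posts and keep the candidates for review.
posts = [
    Post("Worst app ever. Please don't make it the worst app ever. Thanks", False, False),
    Post("Thanks for the quick fix, works great now!", False, False),
    Post("Please review our code of conduct before commenting.", True, False),
]
candidates = [p for p in posts if is_candidate(p)]
print(f"{len(candidates)} of {len(posts)} posts flagged for manual review")
```

In a sketch like this, the flagged posts are only candidates; as the researchers did, a human would still read each one before it entered the final dataset.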
Through this curation process, the team arrived at a final dataset of 100 toxic posts, which they then used to study the nature of the toxicity. Was it insulting, entitled, arrogant, trolling or unprofessional? Was it aimed at the code itself, at people, or somewhere else entirely?
“Toxicity is different in open source communities,” Miller said. “It’s more contextual, entitled, subtle, and passive-aggressive.”
Only about half of the toxic posts the team identified contained obscenities. Others came from demanding users of the software, and some from users who file a lot of issues on GitHub but otherwise contribute little. Comments that began about a project’s code turned personal. None of the posts did anything to make the open source software or the community better.
“Worst app ever. Please don’t make it the worst app ever. Thanks,” one user wrote in a post in the dataset.
The team noticed a distinct pattern in how people responded to toxicity on open source platforms. Often, the project’s developers did their best to accommodate the user or fix the issues raised in the toxic content, an effort that frequently ended in frustration.
“They wanted to give the benefit of the doubt and create a solution,” Miller said. “But that turned out to be quite onerous.”
Response to the paper has been strong and positive, Miller said. Open source developers and community members were thrilled that this investigation took place and that the behavior they had been dealing with for a long time was finally recognized.
“We’ve been hearing from developers and community members for a long time about the unfortunate and almost ingrained toxicity in open source,” Miller said. “Open source communities are a little rough around the edges. They often have terrible diversity and retention, and it’s important that we start acknowledging and addressing the toxicity there to make it a more inclusive and better place.”
Miller hopes the research will provide a foundation for more and better work in this area. Her team stopped short of building a toxicity detector for the open source community, but the groundwork has been laid.
“There’s so much work to do in this space,” Miller said. “I really hope people see this, expand it and keep the ball rolling.”
Joining Miller on the work were Daniel Klug, a systems scientist in the ISR; ISR faculty members Bogdan Vasilescu and Christian Kästner; and Sophie Cohen of Wesleyan University. The team’s paper, “Did You Miss My Comment or Something? Understanding Toxicity in Open Source Discussions,” was presented last month at the ACM/IEEE International Conference on Software Engineering in Pittsburgh, where it won a Distinguished Paper Award.