
ShhorAI Attempts To Make The Internet Safer

Aindriya Barua's ShhorAI is fighting hate speech against queer communities in an attempt to make the internet safer.

Image created with Meta AI.

“Your existence is political,” says Aindriya Barua, the founder of ShhorAI, a platform that is capable of filtering hate speech against queer communities. Barua identifies as non-binary and goes by the gender-neutral pronouns "they/them."

Barua spent a lot of their childhood drawing, which they posted online as they got older. The response was more than just unkind. It was hateful. While they were studying in Bengaluru, trying to finish their Bachelor of Technology, the trolling still hadn’t stopped. By the time they started working, the hateful messages had gotten so bad that people online ended up doxing their sister and threatening the life of her child.

Barua realised that they could leverage their expertise in Indic natural language processing to create some sort of technology that could identify hate speech and filter it out.

Aindriya Barua, founder of ShhorAI (Source: ShhorAI website)

Natural Language Processing, or NLP, is a field of computer science and artificial intelligence that leverages machine learning to teach computers how to understand and communicate in human language. Indic NLP is simply teaching a machine the languages that are spoken in India.

Building large language models in India is already a challenge, given the lack of appropriate datasets. Despite that, a few AI startups have managed to build models that can understand and respond in 12 of India’s major languages.

Barua was optimistic about finding the right datasets for what they wanted to do, but soon realised that was easier said than done. “I realised there are no specific datasets for this specific use case.”

But their effort to identify hate speech was hindered by a few factors:

1) Big Tech’s content moderation doesn’t really account for abuse in Indian languages, allowing hateful content to bypass existing filters.

2) Even where filters exist, they are alphabet-based, so abuse written with a mix of numbers, letters and symbols can often slip past them.

3) The spelling of abusive words often varies from person to person and region to region, so there is rarely a single form for a filter to flag.
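The second and third obstacles can be illustrated with a small sketch. A purely alphabet-based filter misses obfuscated spellings; a normalisation step that maps common number and symbol stand-ins back to letters catches more of them. The substitution map and word list below are hypothetical examples for illustration, not ShhorAI’s actual rules or data.

```python
# Illustrative sketch only: a toy normaliser for obfuscated spellings.
# The substitution map and blocklist are invented examples, not ShhorAI's data.

SUBSTITUTIONS = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

def normalise(token: str) -> str:
    """Lowercase and map common number/symbol stand-ins to letters."""
    return token.lower().translate(SUBSTITUTIONS)

# A naive exact-match filter would miss "stup1d" or "$TUPID";
# after normalising, both collapse to the listed spelling.
BLOCKLIST = {"stupid"}

def is_flagged(token: str) -> bool:
    return normalise(token) in BLOCKLIST

print(is_flagged("stup1d"))   # True
print(is_flagged("$TUPID"))   # True
print(is_flagged("hello"))    # False
```

Real systems need far more than a substitution table (regional spelling variants alone defeat it), which is why ShhorAI trains a model on labelled data instead of relying on word lists.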

“People may speak one language but they type online in English because keyboards are usually configured that way,” says Barua. In NLP, this phenomenon is called code-mixed language.

After realising that no such datasets exist, Barua put out a call on their social media asking for links to hate speech, whether directed at them personally or found elsewhere online. Those submissions are automatically fed into a spreadsheet, which is still being updated with new data today.

But to create a dataset that could train an AI, Barua had to manually tag each link sent to them as hate speech or not hate speech. Through a United Nations Population Fund (UNFPA) hackathon, they recruited 45 volunteers to help tag the data. So far, they’ve built a dataset containing 45,000 instances of hate speech, which Barua says is enough for an AI to learn from.
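The tagging work described above produces pairs of text and label, which is what a supervised model actually learns from. The rows below are invented examples showing the shape such a dataset might take, not ShhorAI’s real data.

```python
import csv
import io

# Hypothetical rows illustrating the shape of a binary-labelled
# hate-speech dataset; text and labels are invented, not ShhorAI's data.
raw = """text,label
"you people don't deserve rights",hate
"congrats on the new artwork!",not_hate
"go back to where you came from",hate
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Volunteers supply the label; the model later learns the mapping
# from text to label from thousands of such pairs.
labels = [r["label"] for r in rows]
print(len(rows), labels.count("hate"), labels.count("not_hate"))
```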

Current Implementation

As a proof of concept of ShhorAI for the UNFPA hackathon, Barua created a Reddit bot that flags content as hate speech in a subreddit (a forum on Reddit for discussing specific topics). “If a user is detected using hate speech, the bot gives users a warning; after three warnings, they’re banned from the subreddit.”

Banned users can still appeal the ban, which is where the human aspect of content moderation comes in. A human moderator will review the messages that caused a user to be banned. They determine whether the ban will be reversed or maintained.
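The warning, ban and appeal flow just described can be sketched as a minimal state machine. This is a simplified illustration of the logic, with invented names and structure, not the bot’s actual code.

```python
# Simplified model of the three-warning flow described in the article;
# class and method names are illustrative, not the real bot's code.

class ModerationBot:
    WARNING_LIMIT = 3

    def __init__(self):
        self.warnings = {}      # user -> number of warnings so far
        self.banned = set()
        self.appeal_queue = []  # bans awaiting human review

    def handle_hate_speech(self, user: str) -> str:
        """Warn the user; ban them once they reach the limit."""
        self.warnings[user] = self.warnings.get(user, 0) + 1
        if self.warnings[user] >= self.WARNING_LIMIT:
            self.banned.add(user)
            return "banned"
        return "warned"

    def appeal(self, user: str) -> None:
        """A banned user's messages go to a human moderator for review."""
        if user in self.banned:
            self.appeal_queue.append(user)

    def review(self, user: str, reverse: bool) -> None:
        """The human moderator decides whether the ban is reversed."""
        self.appeal_queue.remove(user)
        if reverse:
            self.banned.discard(user)
            self.warnings[user] = 0

bot = ModerationBot()
print(bot.handle_hate_speech("troll42"))  # warned
print(bot.handle_hate_speech("troll42"))  # warned
print(bot.handle_hate_speech("troll42"))  # banned
```

The human-in-the-loop step matters: the automated part only escalates, while the final decision on a contested ban stays with a person.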

Barua explains that while the Reddit bot served as a proof of concept, they don’t see ShhorAI solving hate speech against queer people through a UI-based or app-based approach. “Such an approach will never be a global solution; it won’t hold abusers accountable,” they say, adding, “All you’re doing is closing your eyes with a UI-based approach.”

They explain that ShhorAI is more useful as an application programming interface (API) that can slot into other systems. Basically, APIs are sets of rules and protocols that let different pieces of software communicate and exchange data.

The easiest way to explain it: when you look at weather data on your phone, the app is using an API to fetch the data captured by India’s weather department and display it to you.
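In an API-based setup, a client platform would send a piece of text to a moderation endpoint and act on the structured response it gets back. The request and response shapes below are purely hypothetical; ShhorAI’s actual interface is not described in the article.

```python
import json

# Hypothetical request/response shapes for a moderation API;
# all field names and values are invented for illustration.
request_body = json.dumps({
    "text": "some user comment",
    "language_hint": "code-mixed",  # e.g. romanised Hindi typed in English script
})

# What a response from such an endpoint might look like:
response_body = '{"is_hate": true, "categories": ["queerphobia"], "confidence": 0.92}'
result = json.loads(response_body)

# The host platform, not the API, decides what to do with a flag:
if result["is_hate"]:
    print("flag for moderation:", result["categories"])
```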

Expanding ShhorAI

As they worked on ShhorAI, Barua came to realise that the effort to make the internet a safer space for the queer community intersected with other issues they couldn’t ignore.

“A lot of queer people depend on crowdfunding, but when a Dalit queer person calls for crowdfunding, they get a lot more abuse; they get casteist slurs along with queer-phobic slurs,” says Barua. As a result of this realisation, they expanded the ambit of ShhorAI into eight categories, namely: queerphobia, general sexism, political hate, communal violence, casteism, racism, ableism and general violence.
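That intersection makes this a multi-label problem: a single message can carry several kinds of hate at once. The sketch below uses the eight categories from the article, but the tagging example and representation are invented for illustration.

```python
# Toy multi-label tagging: one message can fall into several of
# ShhorAI's eight categories at once. The example labels are invented.
CATEGORIES = [
    "queerphobia", "general sexism", "political hate", "communal violence",
    "casteism", "racism", "ableism", "general violence",
]

def to_vector(labels: set) -> list:
    """Binary vector over the eight categories, in CATEGORIES order."""
    return [1 if c in labels else 0 for c in CATEGORIES]

# A crowdfunding post abused with both casteist and queerphobic slurs:
tagged = to_vector({"casteism", "queerphobia"})
print(tagged)  # [1, 0, 0, 0, 1, 0, 0, 0]
```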

In fact, during the general elections held earlier this year, Barua saw an uptick in hate speech. “I figured out that most of the political hate speech had a very big intersection with communal hate speech,” says Barua. They’ve gone on to use the data gathered during the elections to further train ShhorAI.

Future For ShhorAI

“Hate speech doesn’t just happen on large social media platforms,” says Barua, pointing to dating apps like Grindr, which are popular among the LGBTQIA+ community. Apart from social media, they point to several other spaces that require a solution like ShhorAI.

Online video games are often a hotbed of racism, sexism and hate, and Barua says gaming is a space worth targeting as a business. They believe they can make a strong case for companies adopting ShhorAI to keep their users safe while also fostering a better community.

When it comes to larger social media platforms, the Tripura native says that Big Tech should approach it like they’ve done with fake news, i.e., getting help from a third party.

Companies like Meta and Google have tie-ups with various fact-checking news organisations in India in an attempt to prevent fake news from being propagated on their platforms. Why shouldn’t they do the same thing with content moderation? asks Barua. “I think for hate speech and content moderation, eventually they should come down to third parties because it isn’t possible for them to understand the cultural nuances of every country.”

For now, however, Barua wants to start small, given that their operation is completely bootstrapped. “I’m trying to get some funding from civil society bodies,” they said.

ShhorAI is being worked on entirely by Barua and 45 volunteers. They’re hoping to acquire clients like schools or offices, which often run their own intranets, since such organisations are more likely to be willing to adopt the technology.
