Community Dataset Creation

Language models need data to be created, and assembling that data is tedious and often carries many biases. Even though commercial AI models are trained on vast amounts of text, simple critical questions, popular non-Western feminist authorship, and an essential understanding of cultural multiplicities remain missing.

Collective Data

I’ve created a simple form that anyone can fill out as many times as they like! It is an attempt to collect all the burning questions and discourses that you’ve found missing from commercial AI products. I hope that through this effort, we can understand and build community-learned language models.

A question & answer set could be more factual like this:

Q: Who is Jacinta Kerketta?

A: Jacinta Kerketta (born 3 August 1983) is an Indian Hindi-language journalist, poet and activist. Her poetry and journalism discuss the Adivasi identity of youth, protests against the systemic oppression of Adivasis in India, gender-based violence, especially against women, and displacement, and question state apathy in governance. Forbes India named her on its list of India’s top 20 Self-Made Women. Her first Hindi-English bilingual poetry collection Angor was translated into German, Italian and French. Her second Hindi-English bilingual poetry collection Jadon ki Zameen was translated into English and German. Ishwar aur Bazaar is her third poetry collection.

Or like:

Q: Why can’t transcription ever be perfect?

A: Mary Bucholtz argues, ‘there is no such thing as a perfect transcription.’ Spoken language includes ‘frequent false starts, repetitions, disfluencies, overlaps, interruptions.’ Parrish explains, ‘interpretive choices are always made in the act of transcription that reflect the biases, the attitudes, and the needs of the transcriber.’ Transcription is ‘an act of forgetting.’

No submission is too simple or basic! Just be thoughtful with your submission.

What’s a question you’d love Feminist AI to know about and answer well?