Google, African institutions launch speech dataset to advance inclusive AI for 100m people


By Chinenye Anuforo

 

Google, in partnership with leading African research institutions, has launched WAXAL, a large-scale, openly accessible speech dataset aimed at expanding inclusive artificial intelligence (AI) development and giving more than 100 million Africans a voice in the digital future.

The initiative, announced in Accra, was designed to bridge a longstanding digital gap by providing foundational speech data for 21 Sub-Saharan African languages, including Hausa, Yoruba, Luganda and Acholi, languages that have historically been underrepresented in global AI systems.

Developed over three years with funding from Google, the WAXAL dataset contains 1,250 hours of transcribed natural speech alongside over 20 hours of studio-quality recordings intended for building high-fidelity synthetic voices.

Despite the growing global adoption of voice-enabled technologies, limited availability of high-quality speech data has prevented their development for most of Africa’s more than 2,000 languages, effectively excluding hundreds of millions of people from accessing digital tools in their native tongues.

The WAXAL project was created to directly address this imbalance.

“The ultimate impact of WAXAL is the empowerment of people in Africa,” said Aisha Walcott-Bryant, Head of Google Research Africa. “This dataset provides a critical foundation for students, researchers and entrepreneurs to build technology on their own terms, in their own languages, finally reaching over 100 million people.”

She added that Google expects African innovators to use the dataset to develop new educational tools, voice-enabled services and digital products capable of generating real economic opportunities across the continent.

A key principle of the initiative, according to the organisers, was ensuring that the project was built by and for African communities. Data collection was led by African academic and community-based organisations, including Makerere University, the University of Ghana, and Digital Umuganda, with technical guidance from Google experts.

Under the partnership model, the African institutions retain full ownership of the data, establishing what the partners describe as a new, more equitable framework for AI development that prioritises local leadership and shared value.

The dataset spans languages spoken across East, West and Southern Africa, including Acholi, Akan, Dagaare, Dagbani, Dholuo, Ewe, Fante, Fulani, Hausa, Igbo, Kikuyu, Lingala, Luganda, Malagasy, Shona, Swahili and Yoruba, among others.

The WAXAL dataset became publicly available on Monday, marking what stakeholders describe as a significant step toward building AI systems that better reflect Africa’s linguistic and cultural diversity.

