SaLT UNIMAS - SARAWAK LANGUAGE TECHNOLOGY

Faculty of Computer Science and Information Technology (FCSIT)
Universiti Malaysia Sarawak (UNIMAS)

ABOUT

This organization hosts speech and language datasets developed by the Sarawak Language Technology (SaLT) research group at UNIMAS FCSIT. Our focus is on low-resource languages and dialects spoken in Sarawak, Malaysia, particularly Sarawak Malay and Iban.

DATASETS

  1. sarawak-malay-asr
     Language   : Sarawak Malay (ms)
     Utterances : 1,164
     Duration   : ~1.9 hours
     Speakers   : 42
     Task       : Automatic Speech Recognition
     Link       : https://huggingface.co/datasets/SaLTUNIMAS/sarawak-malay-asr

  2. iban-speech
     Language   : Iban (iba)
     Utterances : ~2,977
     Duration   : ~8 hours
     Task       : Automatic Speech Recognition
     Link       : https://huggingface.co/datasets/SaLTUNIMAS/iban-speech
     Source     : https://github.com/sarahjuan/iban

LANGUAGES COVERED

  1. Sarawak Malay (ms)
  2. Iban (iba)

QUICK START

  pip install datasets

  # Load Sarawak Malay dataset
  from datasets import load_dataset
  ds = load_dataset("SaLTUNIMAS/sarawak-malay-asr")

  # Load Iban dataset
  ds = load_dataset("SaLTUNIMAS/iban-speech")
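The split names and column names of these datasets are not listed here, so it may help to inspect them after loading rather than guess. The sketch below is one possible way to do that. The helper names describe_splits and load_and_describe are hypothetical and not part of the datasets library. The sketch assumes load_dataset is called without a split argument and returns a mapping of split names to objects that expose num_rows and column_names.

```python
from collections.abc import Mapping


def describe_splits(dataset_dict: Mapping) -> dict:
    """Return {split_name: (num_rows, column_names)} for every split.

    Hypothetical helper: it works on any mapping whose values expose
    `num_rows` and `column_names`, as the objects returned by
    load_dataset() do when no split is requested.
    """
    return {
        name: (split.num_rows, list(split.column_names))
        for name, split in dataset_dict.items()
    }


def load_and_describe(repo_id: str) -> dict:
    """Download a dataset from the Hub and summarize its splits."""
    # Imported lazily so describe_splits() can be used without the library.
    from datasets import load_dataset

    return describe_splits(load_dataset(repo_id))
```

For example, load_and_describe("SaLTUNIMAS/sarawak-malay-asr") would download the Sarawak Malay dataset and report the row count and columns of each split. The actual split and column names depend on how each dataset was published.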

CONTACT

Faculty of Computer Science and Information Technology (FCSIT)
Universiti Malaysia Sarawak (UNIMAS)
94300 Kota Samarahan, Sarawak, Malaysia
https://www.fcsit.unimas.my/