Faculty of Computer Science and Information Technology (FCSIT)
Universiti Malaysia Sarawak (UNIMAS)
ABOUT
This organization hosts speech and language datasets developed by the Sarawak Language Technology (SaLT) research group at UNIMAS FCSIT. Our focus is on low-resource languages and dialects spoken in Sarawak, Malaysia, particularly Sarawak Malay and Iban.
DATASETS
sarawak-malay-asr
  Language   : Sarawak Malay (ms)
  Utterances : 1,164
  Duration   : ~1.9 hours
  Speakers   : 42
  Task       : Automatic Speech Recognition
  Link       : https://huggingface.co/datasets/SaLTUNIMAS/sarawak-malay-asr

iban-speech
  Language   : Iban (iba)
  Utterances : ~2,977
  Duration   : ~8 hours
  Task       : Automatic Speech Recognition
  Link       : https://huggingface.co/datasets/SaLTUNIMAS/iban-speech
  Source     : https://github.com/sarahjuan/iban
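As a rough sanity check on the figures above, the mean utterance length follows from duration divided by utterance count. The sketch below uses only the listed numbers. Because the durations are marked approximate, the results are estimates, and the helper name `mean_utterance_seconds` is illustrative rather than part of any dataset tooling.

```python
# Figures copied from the dataset listings; durations are approximate ("~").
DATASETS = {
    "sarawak-malay-asr": {"utterances": 1164, "hours": 1.9},
    "iban-speech": {"utterances": 2977, "hours": 8.0},
}


def mean_utterance_seconds(utterances, hours):
    """Average utterance length in seconds, given a count and total hours."""
    return hours * 3600 / utterances


for name, stats in DATASETS.items():
    print(f"{name}: ~{mean_utterance_seconds(**stats):.1f} s per utterance")
```

This works out to roughly 6 seconds per utterance for Sarawak Malay and roughly 10 seconds for Iban, which is worth keeping in mind when choosing maximum audio lengths for ASR training.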
LANGUAGES COVERED
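QUICK START
Install the Hugging Face datasets library:

    pip install datasets

Then load either dataset in Python:

    from datasets import load_dataset

    # Sarawak Malay
    malay_ds = load_dataset("SaLTUNIMAS/sarawak-malay-asr")

    # Iban
    iban_ds = load_dataset("SaLTUNIMAS/iban-speech")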
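Split names and column names are not documented here, so it may be useful to inspect them after loading rather than assume them. The following is a minimal sketch: `summarize_splits` and `load_and_summarize` are hypothetical helper names, and the code assumes `load_dataset` returns a mapping of split names to datasets, which is the case when no `split` argument is passed.

```python
def summarize_splits(dataset_dict):
    """Return {split_name: (num_rows, column_names)} for a DatasetDict-like mapping."""
    return {
        name: (split.num_rows, list(split.column_names))
        for name, split in dataset_dict.items()
    }


def load_and_summarize(repo_id):
    """Load a dataset from the Hub and summarize its splits."""
    # Imported here so summarize_splits stays usable without the library installed.
    from datasets import load_dataset

    return summarize_splits(load_dataset(repo_id))


# Usage (downloads the data on first call):
# for name, (rows, cols) in load_and_summarize("SaLTUNIMAS/sarawak-malay-asr").items():
#     print(name, rows, cols)
```

The same call works for "SaLTUNIMAS/iban-speech". Printing the columns first shows which field holds the audio and which holds the transcript before any training code is written.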
CONTACT
Faculty of Computer Science and Information Technology (FCSIT)
Universiti Malaysia Sarawak (UNIMAS)
94300 Kota Samarahan, Sarawak, Malaysia
https://www.fcsit.unimas.my/