Chat2Find Publishes 255M+ Token Sri Lankan Trilingual Datasets on Hugging Face and LankaData

Chat2Find has announced the public release of trilingual (Sinhala/Tamil/Singlish/Tanglish) conversational datasets totaling more than 255 million tokens through LankaData.net and the open-source platform Hugging Face, marking a key milestone in Sri Lanka’s growing AI ecosystem.

Developed by Chat2Find and released as open source, the dataset gives researchers, developers, and institutions access to high-quality, locally relevant data, the lack of which has long limited AI innovation in Sri Lanka. It is released under the MIT License and is suitable for continual pre-training (CPT) and supervised fine-tuning (SFT).



Access the dataset:

Data Corpus | Hugging Face: https://lnkd.in/gyx9fFBy
Data Conversations | Hugging Face: https://lnkd.in/gECtWQH2
LankaData | Local Repository: https://lnkd.in/g-FjKuqP

This release positions LankaData as a key hub for open AI resources in Sri Lanka, supporting the next wave of locally grounded AI development.


https://lankadata.net
