Chat2Find has announced the public release of a major 255M+ trilingual conversational dataset collection (Sinhala, Tamil, Singlish, and Tanglish) through LankaData.Net and the open-source platform Hugging Face, marking a key milestone in Sri Lanka’s growing AI ecosystem.
Developed by Chat2Find and released as open source, the dataset gives researchers, developers, and institutions access to high-quality, locally relevant data, the scarcity of which has long limited AI innovation in Sri Lanka. It is released under the MIT License and is suitable for continual pre-training (CPT) and supervised fine-tuning (SFT).

Access the dataset:
Data Corpus | Hugging Face: https://lnkd.in/gyx9fFBy
Data Conversations | Hugging Face: https://lnkd.in/gECtWQH2
LankaData | Local Repository: https://lnkd.in/g-FjKuqP
This release positions LankaData as a key hub for open AI resources in Sri Lanka, supporting the next wave of locally grounded AI development.