Publications
Research publications by Hrishikesh Terdalkar and the research group
2025
- [BHASHA ’25] Findings of the IndicGEC and IndicWG Shared Task at BHASHA 2025. Pramit Bhattacharyya, Karthika N J, Hrishikesh Terdalkar, and 4 more authors. In Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025), Dec 2025.
This overview paper presents the findings of the two shared tasks organized as part of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA) co-located with IJCNLP-AACL 2025. The shared tasks are: (1) Indic Grammar Error Correction (IndicGEC) and (2) Indic Word Grouping (IndicWG). For GEC, participants were tasked with producing grammatically correct sentences based on given input sentences in five Indian languages. For WG, participants were required to generate a word-grouped variant of a provided sentence in Hindi. The evaluation metric used for GEC was GLEU, while Exact Matching was employed for WG. A total of 14 teams participated in the final phase of Shared Task 1, and 2 teams participated in the final phase of Shared Task 2. In the IndicGEC shared task, the maximum GLEU scores obtained for Hindi, Bangla, Telugu, Tamil and Malayalam are 85.69, 95.79, 88.17, 91.57 and 96.02, respectively. The highest Exact Matching score obtained in the IndicWG shared task is 45.13%.
@inproceedings{bhattacharyya-etal-2025-findings,
  title = {Findings of the {I}ndic{GEC} and {I}ndic{WG} Shared Task at {BHASHA} 2025},
  author = {Bhattacharyya, Pramit and N J, Karthika and Terdalkar, Hrishikesh and Jagadeeshan, Manoj Balaji and Nigam, Shubham Kumar and Susmitha, Arvapalli Sai and Bhattacharya, Arnab},
  booktitle = {Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)},
  month = dec,
  year = {2025},
  address = {Mumbai, India},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.bhasha-1.12/},
  pages = {127--134},
}

- [BHASHA ’25] BHRAM-IL: A Benchmark for Hallucination Recognition and Assessment in Multiple Indian Languages. Hrishikesh Terdalkar, Kirtan Bhojani, Aryan Dongare, and 1 more author. In Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025), Dec 2025.
Large language models (LLMs) are increasingly deployed in multilingual applications but often generate plausible yet incorrect or misleading outputs, known as hallucinations. While hallucination detection has been studied extensively in English, under-resourced Indian languages remain largely unexplored. We present BHRAM-IL, a benchmark for hallucination recognition and assessment in multiple Indian languages, covering Hindi, Gujarati, Marathi, and Odia, along with English. The benchmark comprises 36,047 curated questions across nine categories spanning factual, numerical, reasoning, and linguistic tasks. We evaluate 14 state-of-the-art multilingual LLMs on a benchmark subset of 10,265 questions, analyzing cross-lingual and factual hallucinations across languages, models, scales, categories, and domains using category-specific metrics normalized to the (0,1) range. Aggregation over all categories and models yields a primary score of 0.23 and a language-corrected fuzzy score of 0.385, demonstrating the usefulness of BHRAM-IL for hallucination-focused evaluation. The dataset and the code for generation and evaluation are available on GitHub (https://github.com/sambhashana/BHRAM-IL/) and HuggingFace (https://huggingface.co/datasets/sambhashana/BHRAM-IL/) to support future research in multilingual hallucination detection and mitigation.
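The abstract mentions category-specific metrics normalized to the (0,1) range and a language-corrected fuzzy score. As a rough illustration only (this is not BHRAM-IL's actual metric, which is defined in the paper), a normalized fuzzy comparison of a model answer against a gold answer can be sketched with the standard library:

```python
from difflib import SequenceMatcher

def fuzzy_score(predicted: str, gold: str) -> float:
    """Similarity in [0, 1] between a model answer and the gold answer.
    Illustrative stand-in for a category-specific metric; not the
    paper's scoring formula."""
    return SequenceMatcher(None, predicted.casefold().strip(),
                           gold.casefold().strip()).ratio()

def aggregate(scores: list[float]) -> float:
    """Simple mean over per-question scores."""
    return sum(scores) / len(scores)

pairs = [("New Delhi", "new delhi"), ("Mumbai", "New Delhi")]
print(round(aggregate([fuzzy_score(p, g) for p, g in pairs]), 3))
```

Case-folding before comparison lets surface-form variants of a correct answer score 1.0 while unrelated answers score near 0.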
@inproceedings{terdalkar-etal-2025-bhram,
  title = {{BHRAM}-{IL}: A Benchmark for Hallucination Recognition and Assessment in Multiple {I}ndian Languages},
  author = {Terdalkar, Hrishikesh and Bhojani, Kirtan and Dongare, Aryan and Behera, Omm Aditya},
  booktitle = {Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)},
  month = dec,
  year = {2025},
  address = {Mumbai, India},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.bhasha-1.9/},
  pages = {102--116},
}
- [ACL ’25] A Case Study of Cross-Lingual Zero-Shot Generalization for Classical Languages in LLMs. V.S.D.S.Mahesh Akavarapu, Hrishikesh Terdalkar, Pramit Bhattacharyya, and 5 more authors. In Findings of the Association for Computational Linguistics: ACL 2025, Jul 2025.
Large Language Models (LLMs) have demonstrated remarkable generalization capabilities across diverse tasks and languages. In this study, we focus on natural language understanding in three classical languages—Sanskrit, Ancient Greek and Latin—to investigate the factors affecting cross-lingual zero-shot generalization. First, we explore named entity recognition and machine translation into English. While LLMs perform on par with or better than fine-tuned baselines on out-of-domain data, smaller models often struggle, especially with niche or abstract entity types. In addition, we concentrate on Sanskrit by presenting a factoid question–answering (QA) dataset and show that incorporating context via a retrieval-augmented generation approach significantly boosts performance. In contrast, we observe pronounced performance drops for smaller LLMs across these QA tasks. These results suggest model scale is an important factor influencing cross-lingual generalization. Assuming that the models used, such as GPT-4o and Llama-3.1, are not instruction fine-tuned on classical languages, our findings provide insights into how LLMs may generalize on these languages and into their consequent utility in classical studies.
@inproceedings{akavarapu-etal-2025-case,
  title = {A Case Study of Cross-Lingual Zero-Shot Generalization for Classical Languages in {LLM}s},
  author = {Akavarapu, V.S.D.S.Mahesh and Terdalkar, Hrishikesh and Bhattacharyya, Pramit and Agarwal, Shubhangi and Deulgaonkar, Vishakha and Dangarikar, Chaitali and Manna, Pralay and Bhattacharya, Arnab},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
  month = jul,
  year = {2025},
  address = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.findings-acl.141/},
  doi = {10.18653/v1/2025.findings-acl.141},
  pages = {2745--2761},
}

- [GRADES-NDA ’25] Graph Repairs with Large Language Models: An Empirical Study. Hrishikesh Terdalkar, Angela Bonifati, and Andrea Mauri. In Proceedings of the 8th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), Berlin, Germany, 2025.
Property graphs are widely used in domains such as healthcare, finance, and social networks, but they often contain errors due to inconsistencies, missing data, or schema violations. Traditional rule-based and heuristic-driven graph repair methods are limited in their adaptability, as they need to be tailored for each dataset. On the other hand, interactive human-in-the-loop approaches may become infeasible when dealing with large graphs, as the cost, in terms of both time and effort, of involving users becomes too high. Recent advancements in Large Language Models (LLMs) present new opportunities for automated graph repair by leveraging contextual reasoning and their access to real-world knowledge. We evaluate the effectiveness of six open-source LLMs in repairing property graphs, assessing repair quality, computational cost, and model-specific performance. Our experiments show that LLMs have the potential to detect and correct errors, with varying degrees of accuracy and efficiency. We discuss the strengths, limitations, and challenges of LLM-driven graph repair and outline future research directions for improving scalability and interpretability.
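The repair task evaluated in the paper delegates the fix itself to an LLM; the surrounding scaffolding can be pictured as below. The toy graph, the schema rules, and the structured repair format are invented for illustration, and the LLM call is left as a stub rather than the paper's pipeline.

```python
# Toy property graph: node id -> label and property map (invented example).
graph = {
    "n1": {"label": "Person", "props": {"name": "Alice", "age": 34}},
    "n2": {"label": "Person", "props": {"age": -5}},  # name missing, age invalid
}

def find_violations(g):
    """Rule-based detection of two illustrative constraint violations."""
    issues = []
    for nid, node in g.items():
        if node["label"] == "Person" and "name" not in node["props"]:
            issues.append((nid, "missing required property 'name'"))
        if node["props"].get("age", 0) < 0:
            issues.append((nid, "negative 'age'"))
    return issues

def repair_prompt(nid, issue, node):
    """Serialize the faulty node into a prompt; an LLM would be asked to
    reply with a structured repair operation (the call is stubbed here)."""
    return (f"Node {nid} = {node} violates: {issue}. Reply with JSON like "
            '{"op": "set_property", "node": "...", "key": "...", "value": ...}')

def apply_repair(g, op):
    """Apply a structured repair operation of the assumed format."""
    if op["op"] == "set_property":
        g[op["node"]]["props"][op["key"]] = op["value"]

# Pretend the model returned this repair for the negative-age violation:
apply_repair(graph, {"op": "set_property", "node": "n2", "key": "age", "value": 5})
```

Keeping repairs as structured operations (rather than free text) makes model outputs checkable before they are applied, which matters when assessing repair quality.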
@inproceedings{terdalkar2025repair,
  title = {Graph Repairs with Large Language Models: An Empirical Study},
  author = {Terdalkar, Hrishikesh and Bonifati, Angela and Mauri, Andrea},
  booktitle = {Proceedings of the 8th Joint Workshop on Graph Data Management Experiences \& Systems (GRADES) and Network Data Analytics (NDA)},
  year = {2025},
  publisher = {Association for Computing Machinery},
  url = {https://doi.org/10.1145/3735546.3735859},
  doi = {10.1145/3735546.3735859},
  pages = {1--10},
  location = {Berlin, Germany},
}
2024
- [PACLIC ’24] Aganittyam: Learning Tamil Grammar through Knowledge Graph based Templatized Question Answering. Mithilesh K, Amarjit Madhumalararungeethayan, Dharanish Rahul S, and 3 more authors. In Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation, Dec 2024.
In this work, we introduce a novel Grammar Question-Answering System (Aganittyam) and its associated corpus for the Dravidian language Tamil. Tamil is one of the oldest surviving languages, with a documented history spanning over 2,000 years; it is a classical language, is official in three countries including India, and is spoken by diasporic communities around the world. Learning Tamil grammar remains challenging due to the language's agglutination and complex morphology. We created a Tamil Grammar Corpus aimed at all kinds of learners and annotated it manually, since automated tools are not sufficiently accurate. This led us to create an ontology of entity types and relationship types for the corpus. We identified entities and relationships and stored the resulting triplets (subject-predicate-object) as a Knowledge Graph (KG) consisting of 63,587 entities. We also developed a framework for templatized question answering over the KG. We performed a bi-fold evaluation (query metrics and human-centric) with thorough experimentation and show that our QA system is robust, reliable, and fun to use when answering various objective questions.
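Templatized question answering over a KG pairs triples with question templates. A minimal sketch of the idea follows; the triples and the single template are invented placeholders, not drawn from the Aganittyam corpus or ontology.

```python
# Invented (subject, predicate, object) triples standing in for the
# Tamil grammar KG; the template set is likewise illustrative.
triples = [
    ("wordA", "word_class", "noun"),
    ("wordB", "word_class", "verb"),
]

TEMPLATES = {
    "word_class": ("What is the word class of '{s}'?", "{o}"),
}

def generate_qa(triples, templates):
    """Instantiate one (question, answer) pair per matching triple."""
    return [
        (templates[p][0].format(s=s), templates[p][1].format(o=o))
        for s, p, o in triples if p in templates
    ]

for question, answer in generate_qa(triples, TEMPLATES):
    print(question, "->", answer)
```

Each new relationship type in the ontology only needs one new template to become both a question generator and an answer key.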
@inproceedings{k-etal-2024-aganittyam,
  title = {Aganittyam: Learning {T}amil Grammar through Knowledge Graph based Templatized Question Answering},
  author = {K, Mithilesh and Madhumalararungeethayan, Amarjit and S, Dharanish Rahul and Balan, Abhijith and C, Oswald and Terdalkar, Hrishikesh},
  booktitle = {Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation},
  month = dec,
  year = {2024},
  address = {Tokyo, Japan},
  publisher = {Tokyo University of Foreign Studies},
  url = {https://aclanthology.org/2024.paclic-1.81/},
  pages = {838--852},
}

- [Book] Samanvaya: An Interlingua for Unity of Indian Languages. Chaitali Dangarikar, Arnab Bhattacharya, Karthika N J, and 6 more authors. Oct 2024.
An interlingua is a constructed bridge language, designed to simplify communication and analysis across diverse natural languages by identifying commonalities in structure and meaning. Samanvaya, our proposed interlingua for Indian languages, embodies this principle by unifying linguistic features shared across these languages, facilitating deeper linguistic analysis and cross-lingual understanding while preserving their unique cultural identities.
@book{samanvaya2024,
  title = {Samanvaya: An Interlingua for Unity of Indian Languages},
  author = {Dangarikar, Chaitali and Bhattacharya, Arnab and N J, Karthika and Terdalkar, Hrishikesh and Bhattacharyya, Pramit and Kulkarni, Annarao and Lakkundi, Chaitanya S and Ramakrishnan, Ganesh and V, Shivani},
  year = {2024},
  month = oct,
  publisher = {Central Sanskrit University},
  address = {India},
  isbn = {978-93-48435-21-7},
}
2023
- [NLP-OSS ’23] Antarlekhaka: A Comprehensive Tool for Multi-task Natural Language Annotation. Hrishikesh Terdalkar and Arnab Bhattacharya. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), Dec 2023.
One of the primary obstacles in the advancement of Natural Language Processing (NLP) technologies for low-resource languages is the lack of annotated datasets for training and testing machine learning models. In this paper, we present Antarlekhaka, a tool for manual annotation of a comprehensive set of tasks relevant to NLP. The tool is Unicode-compatible, language-agnostic, Web-deployable and supports distributed annotation by multiple simultaneous annotators. The system sports user-friendly interfaces for 8 categories of annotation tasks. These, in turn, enable the annotation of a considerably larger set of NLP tasks. The task categories include two linguistic tasks not handled by any other tool, namely, sentence boundary detection and deciding canonical word order, which are important tasks for text that is in the form of poetry. We propose the idea of sequential annotation based on small text units, where an annotator performs several tasks related to a single text unit before proceeding to the next unit. The research applications of the proposed mode of multi-task annotation are also discussed. Antarlekhaka outperforms other annotation tools in objective evaluation. It has also been used for two real-life annotation tasks in two different languages, namely, Sanskrit and Bengali. The tool is available at https://github.com/Antarlekhaka/code.
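The sequential annotation mode can be pictured as completing every task for one text unit before moving to the next. A tiny sketch of that ordering follows; the task names are illustrative placeholders, not Antarlekhaka's exact task list.

```python
# Sequential annotation: every task for one text unit is presented
# before the next unit (task names are illustrative placeholders).
units = ["verse 1", "verse 2"]
tasks = ["sentence boundary", "canonical word order", "lemmatization"]

def annotation_order(units, tasks):
    """Unit-major ordering of (unit, task) pairs."""
    return [(u, t) for u in units for t in tasks]

for unit, task in annotation_order(units, tasks):
    print(f"{unit}: {task}")
```

The alternative, task-major ordering would make the annotator revisit every unit once per task; unit-major ordering keeps the context of one text unit in focus across all its tasks.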
@inproceedings{terdalkar2023antarlekhaka,
  title = {Antarlekhaka: A Comprehensive Tool for Multi-task Natural Language Annotation},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  booktitle = {Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)},
  month = dec,
  year = {2023},
  address = {Singapore},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2023.nlposs-1.23},
  doi = {10.18653/v1/2023.nlposs-1.23},
  pages = {199--211},
}

- [NYCIKS ’23] Āyurjñānam: Exploring Āyurveda using Knowledge Graphs. Hrishikesh Terdalkar, Vishakha Deulgaonkar, and Arnab Bhattacharya. 2023. Presented at the National Youth Conference on Indian Knowledge Systems 2023.
Best Poster Award
The Bṛhat-Trayī, consisting of Carakasaṃhitā, Suśrutasaṃhitā, and Aṣṭāṅgahṛdaya, is an encyclopaedic reference set in Āyurveda. However, the need for simpler texts led to the emergence of the Laghu-Trayī that includes Mādhavanidāna, Śārṅgadharasaṃhitā, and Bhāvaprakāśa. Authored by Ācārya Bhāvamiśra in the 16th century CE, Bhāvaprakāśa is a comprehensive work focused on medicine. The classification system of varga in its nighaṇṭu section, Bhāvaprakāśanighaṇṭu, categorizes substances based on type, origin, and medicinal properties. This valuable resource assists practitioners and researchers in Āyurveda. We present this information in an accessible manner to promote wider utilization of this knowledge. We create a robust ontology to capture the semantic information of medicinal substances, designing user-friendly interfaces for efficient annotation and curation, perform meticulous manual annotation on Bhāvaprakāśanighaṇṭu, and construct an accurate knowledge graph from three chapters of Bhāvaprakāśanighaṇṭu. The system is accessible at https://sanskrit.iitk.ac.in/ayurveda/.
@misc{terdalkar2023ayurjnanam,
  title = {{Āyurjñānam}: Exploring {Āyurveda} using Knowledge Graphs},
  author = {Terdalkar, Hrishikesh and Deulgaonkar, Vishakha and Bhattacharya, Arnab},
  note = {Presented at the National Youth Conference on Indian Knowledge Systems 2023},
  year = {2023},
  url = {https://sanskrit.iitk.ac.in/ayurveda/},
}

- [PhD] Sanskrit Knowledge-based Systems: Annotation and Computational Tools. Hrishikesh Terdalkar. Jun 2023. Available at https://etd.iitk.ac.in:8443/jspui/handle/123456789/21176.
We address the challenges and opportunities in the development of knowledge systems for Sanskrit, with a focus on question answering. By proposing a framework for the automated construction of knowledge graphs, introducing annotation tools for ontology-driven and general-purpose tasks, and offering a diverse collection of web-interfaces, tools, and software libraries, we have made significant contributions to the field of computational Sanskrit. These contributions not only enhance the accessibility and accuracy of Sanskrit text analysis but also pave the way for further advancements in knowledge representation and language processing. Ultimately, this research contributes to the preservation, understanding, and utilization of the rich linguistic information embodied in Sanskrit texts.
@phdthesis{terdalkar2023sanskrit,
  title = {{Sanskrit Knowledge-based Systems: Annotation and Computational Tools}},
  author = {Terdalkar, Hrishikesh},
  school = {Indian Institute of Technology Kanpur},
  year = {2023},
  month = jun,
  type = {PhD Thesis},
  note = {Available at \url{https://etd.iitk.ac.in:8443/jspui/handle/123456789/21176}},
}

- [WSC ’23] Vaiyyākaraṇaḥ: A Sanskrit Grammar Bot for Telegram. Hrishikesh Terdalkar, V S D S Mahesh Akavarapu, Shubhangi Agarwal, and 1 more author. Jan 2023. Presented at the 18th World Sanskrit Conference.
Vaiyyākaraṇaḥ is a Telegram bot aimed at helping learners of Sanskrit grammar (vyākaraṇa). The salient features of Vaiyyākaraṇaḥ are: stem finder (prātipadikam), declension generation (subantāḥ), root finder (dhātuḥ), conjugation generation (tiṅantāḥ) and word segmentation (sandhisamāsau). State-of-the-art datasets, tools and technologies are used to offer these capabilities.
@misc{terdalkar2023vaiyyakarana,
  title = {{Vaiyyākaraṇaḥ}: A Sanskrit Grammar Bot for Telegram},
  author = {Terdalkar, Hrishikesh and Akavarapu, V S D S Mahesh and Agarwal, Shubhangi and Bhattacharya, Arnab},
  note = {Presented at the 18th World Sanskrit Conference},
  month = jan,
  year = {2023},
  address = {The Australian National University, Canberra, Australia},
  url = {https://t.me/vyakarana_bot},
}

- [WSC ’23] Semantic Annotation and Querying Framework based on Semi-structured Ayurvedic Text. Hrishikesh Terdalkar, Arnab Bhattacharya, Madhulika Dubey, and 2 more authors. In Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference, Jan 2023.
Knowledge bases (KB) are an important resource in a number of natural language processing (NLP) and information retrieval (IR) tasks, such as semantic search and automated question-answering. They are also useful for researchers trying to gain information from a text. Unfortunately, however, the state-of-the-art in Sanskrit NLP does not yet allow automated construction of knowledge bases due to unavailability or lack of sufficient accuracy of tools and methods. Thus, in this work, we describe our efforts on manual annotation of Sanskrit text for the purpose of knowledge graph (KG) creation. We choose the chapter Dhānyavarga from Bhāvaprakāśanighaṇṭu of the Ayurvedic text Bhāvaprakāśa for annotation. The constructed knowledge graph contains 410 entities and 764 relationships. Since Bhāvaprakāśanighaṇṭu is a technical glossary text that describes various properties of different substances, we develop an elaborate ontology to capture the semantics of the entity and relationship types present in the text. To query the knowledge graph, we design 31 query templates that cover most of the common question patterns. For both manual annotation and querying, we customize the Sangrahaka framework previously developed by us. The entire system including the dataset is available from https://sanskrit.iitk.ac.in/ayurveda. We hope that the knowledge graph that we have created through manual annotation and subsequent curation will help in the development and testing of NLP tools in the future, as well as in the study of the Bhāvaprakāśanighaṇṭu text.
@inproceedings{terdalkar2023semantic,
  title = {Semantic Annotation and Querying Framework based on Semi-structured Ayurvedic Text},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab and Dubey, Madhulika and Ramamurthy, S and Singh, Bhavna Naneria},
  booktitle = {Proceedings of the Computational {S}anskrit {\&} Digital Humanities: Selected papers presented at the 18th World {S}anskrit Conference},
  month = jan,
  year = {2023},
  address = {Canberra, Australia (Online mode)},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2023.wsc-csdh.11},
  pages = {155--173},
}

- [WSC ’23] Jñānasaṅgrahaḥ: A Collection of Computational Applications related to Sanskrit. Hrishikesh Terdalkar and Arnab Bhattacharya. Jan 2023. Presented at the 18th World Sanskrit Conference.
Jñānasaṅgrahaḥ is a web-based collection of several computational applications related to the Sanskrit language. The aim is to highlight the features of the Sanskrit language in a way that is approachable for an enthusiastic user, even if she has a limited Sanskrit background.
@misc{terdalkar2023jnanasangraha,
  title = {{Jñānasaṅgrahaḥ}: A Collection of Computational Applications related to Sanskrit},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  note = {Presented at the 18th World Sanskrit Conference},
  month = jan,
  year = {2023},
  address = {The Australian National University, Canberra, Australia},
  url = {https://sanskrit.iitk.ac.in/jnanasangraha/},
}

- [WSC ’23] PyCDSL: A Programmatic Interface to Cologne Digital Sanskrit Dictionaries. Hrishikesh Terdalkar and Arnab Bhattacharya. Jan 2023. Presented at the 18th World Sanskrit Conference.
PyCDSL is a Python library that provides a programmer-friendly interface to the Cologne Digital Sanskrit Dictionaries (CDSD). The library serves as a corpus management tool to download, update and access dictionaries from CDSD. The tool provides a command-line interface for ease of search and a programmable interface for using CDSD in computational linguistics projects written in Python 3.
@misc{terdalkar2023pycdsl,
  title = {{PyCDSL}: A Programmatic Interface to {C}ologne Digital {S}anskrit Dictionaries},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  note = {Presented at the 18th World Sanskrit Conference},
  month = jan,
  year = {2023},
  address = {The Australian National University, Canberra, Australia},
  url = {https://pypi.org/project/PyCDSL/},
}

- [WSC ’23] Chandojnanam: A Sanskrit Meter Identification and Utilization System. Hrishikesh Terdalkar and Arnab Bhattacharya. In Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference, Jan 2023.
We present Chandojñānam, a web-based Sanskrit meter (chanda) identification and utilization system. In addition to the core functionality of identifying meters, it sports a friendly user interface to display the scansion, a graphical representation of the metrical pattern. The system supports identification of meters from uploaded images by using optical character recognition (OCR) engines in the backend. It is also able to process entire text files at a time. The text can be processed in two modes, either by treating it as a list of individual lines, or as a collection of verses. When a line or a verse does not correspond exactly to a known meter, Chandojñānam is capable of finding fuzzy (i.e., approximate and close) matches based on sequence matching. This opens up the scope of meter-based correction of erroneous digital corpora. The system is available for use at https://sanskrit.iitk.ac.in/jnanasangraha/chanda/, and the source code in the form of a Python library is available at https://github.com/hrishikeshrt/chanda/.
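The fuzzy matching step can be illustrated with standard-library sequence matching over laghu/guru (L/G) syllable strings. The two patterns below follow the traditional gaṇa definitions (indravajrā: ta ta ja ga ga; upendravajrā: ja ta ja ga ga), but the matching logic is a sketch, not Chandojñānam's implementation.

```python
from difflib import get_close_matches

# Known meters as laghu (L) / guru (G) syllable patterns.
METERS = {
    "GGLGGLLGLGG": "indravajrā",    # ta ta ja ga ga
    "LGLGGLLGLGG": "upendravajrā",  # ja ta ja ga ga
}

def identify(pattern: str) -> str:
    """Exact match if possible, else the closest known pattern."""
    if pattern in METERS:
        return METERS[pattern]
    close = get_close_matches(pattern, METERS.keys(), n=1, cutoff=0.6)
    return METERS[close[0]] + " (fuzzy)" if close else "unknown"

print(identify("GGLGGLLGLGG"))  # exact match
print(identify("GGLGGLLGLGL"))  # last syllable off -> fuzzy match
```

Because a near-miss still resolves to the intended meter, the closest pattern can suggest where a digitized line deviates from it, which is the basis for meter-based corpus correction.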
@inproceedings{terdalkar2023chandojnanam,
  title = {Chandojnanam: A {S}anskrit Meter Identification and Utilization System},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  booktitle = {Proceedings of the Computational {S}anskrit {\&} Digital Humanities: Selected papers presented at the 18th World {S}anskrit Conference},
  month = jan,
  year = {2023},
  address = {Canberra, Australia (Online mode)},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2023.wsc-csdh.8},
  pages = {113--127},
}
2022
- [COLING ’22] A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit. Jivnesh Sandhan, Ashish Gupta, Hrishikesh Terdalkar, and 4 more authors. In Proceedings of the 29th International Conference on Computational Linguistics, Oct 2022.
The phenomenon of compounding is ubiquitous in Sanskrit. It serves to achieve brevity in expressing thoughts, while simultaneously enriching the lexical and structural formation of the language. In this work, we focus on the Sanskrit Compound Type Identification (SaCTI) task, where we consider the problem of identifying semantic relations between the components of a compound word. Earlier approaches rely solely on the lexical information obtained from the components and ignore the contextual and syntactic information that is most crucial for SaCTI. The SaCTI task is challenging primarily because the context-sensitive semantic relation between the compound components is encoded implicitly. Thus, we propose a novel multi-task learning architecture which incorporates the contextual information and enriches the complementary syntactic information using morphological tagging and dependency parsing as two auxiliary tasks. Experiments on the benchmark datasets for SaCTI show absolute gains of 6.1 points (accuracy) and 7.7 points (F1-score) compared to the state-of-the-art system. Further, our multi-lingual experiments demonstrate the efficacy of the proposed architecture for English and Marathi.
@inproceedings{sandhan2022compound,
  title = {A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in {S}anskrit},
  author = {Sandhan, Jivnesh and Gupta, Ashish and Terdalkar, Hrishikesh and Sandhan, Tushar and Samanta, Suvendu and Behera, Laxmidhar and Goyal, Pawan},
  booktitle = {Proceedings of the 29th International Conference on Computational Linguistics},
  month = oct,
  year = {2022},
  address = {Gyeongju, Republic of Korea},
  publisher = {International Committee on Computational Linguistics},
  url = {https://aclanthology.org/2022.coling-1.358},
  pages = {4071--4083},
}
2021
- [ESEC/FSE ’21] Sangrahaka: A Tool for Annotating and Querying Knowledge Graphs. Hrishikesh Terdalkar and Arnab Bhattacharya. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, 2021.
Best Software Award at the 57th Convocation, IIT Kanpur
We present Sangrahaka, a web-based tool for annotating entities and relationships from text corpora towards the construction of a knowledge graph and its subsequent querying using templatized natural language questions. The application is language- and corpus-agnostic, but can be tuned for the specific needs of a language or a corpus. It is freely available for download and installation. Besides having a user-friendly interface, it is fast, supports customization, and is fault-tolerant on both the client and server side. It outperforms other annotation tools in an objective evaluation metric. The framework has been successfully used in two annotation tasks.
@inproceedings{terdalkar2021sangrahaka,
  title = {Sangrahaka: A Tool for Annotating and Querying Knowledge Graphs},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  booktitle = {Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  year = {2021},
  publisher = {Association for Computing Machinery},
  url = {https://doi.org/10.1145/3468264.3473113},
  doi = {10.1145/3468264.3473113},
  pages = {1520--1524},
  location = {Athens, Greece},
}
2019
- [ISCLS ’19] Framework for Question-Answering in Sanskrit through Automated Construction of Knowledge Graphs. Hrishikesh Terdalkar and Arnab Bhattacharya. In Proceedings of the 6th International Sanskrit Computational Linguistics Symposium, Oct 2019.
Sanskrit (Saṃskṛta) enjoys one of the largest and most varied bodies of literature in the whole world. Extracting knowledge from it, however, is a challenging task due to multiple reasons, including the complexity of the language and the paucity of standard natural language processing tools. In this paper, we target the problem of building knowledge graphs for particular types of relationships from Saṃskṛta texts. We build a natural language question-answering system in Saṃskṛta that uses the knowledge graph to answer factoid questions. We design a framework for the overall system and implement two separate instances of the system on human relationships from the Mahābhārata and the Rāmāyaṇa, and one instance on synonymous relationships from Bhāvaprakāśa Nighaṇṭu, a technical text from Āyurveda. We show that about 50% of the factoid questions can be answered correctly by the system. More importantly, we analyse the shortcomings of the system in detail for each step, and discuss the possible ways forward.
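Once a question is parsed, factoid QA over a knowledge graph of (subject, relation, object) triples reduces to a lookup. A minimal sketch in the spirit of the human-relationship instance follows; the storage scheme and question pattern are illustrative, not the paper's implementation, though the epic relationships themselves are well known.

```python
# Toy KG of (subject, relation, object) triples; in the paper these are
# extracted automatically from the epics rather than hand-written.
triples = [
    ("Arjuna", "father", "Pandu"),
    ("Arjuna", "mother", "Kunti"),
]

kg = {(s, r): o for s, r, o in triples}

def answer(subject: str, relation: str) -> str:
    """Answer 'Who is the <relation> of <subject>?' by a KG lookup."""
    return kg.get((subject, relation), "unknown")

print(answer("Arjuna", "father"))
```

The hard part, reflected in the roughly 50% accuracy reported above, is not this lookup but constructing the triples and mapping a free-form Saṃskṛta question onto a (subject, relation) pair.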
@inproceedings{terdalkar2019framework,
  title = {Framework for Question-Answering in {S}anskrit through Automated Construction of Knowledge Graphs},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  booktitle = {Proceedings of the 6th International Sanskrit Computational Linguistics Symposium},
  month = oct,
  year = {2019},
  address = {IIT Kharagpur, India},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/W19-7508},
  pages = {97--116},
}

- [ISCLS ’19] KaTaPaYadi System. Hrishikesh Terdalkar and Arnab Bhattacharya. Oct 2019. Presented at the 6th International Sanskrit Computational Linguistics Symposium.
The Kaṭapayādi system of encoding numbers as words, with each digit replaced by a character, was developed in ancient India. We present a web-based system for conversion to and from the Kaṭapayādi numbering scheme. It can both decode a word into its corresponding number and encode a number into word(s).
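A toy decoder for the scheme can be sketched as below, assuming the standard consonant-to-digit assignment and the common convention that the digits of a place-value number run right to left (aṅkānāṁ vāmato gatiḥ). Conjunct-consonant handling, where only the consonant joined to a vowel carries a digit, is deliberately ignored in this simplification.

```python
# Standard Kaṭapayādi digit groups (IAST romanization):
# ka..jha = 1..9; ṭa..dha = 1..9; pa..ma = 1..5; ya..ha = 1..8;
# ña and na = 0; vowels carry no digit.
DIGITS = {}
for digit, consonants in enumerate(
        ["k ṭ p y", "kh ṭh ph r", "g ḍ b l", "gh ḍh bh v",
         "ṅ ṇ m ś", "c t ṣ", "ch th s", "j d h", "jh dh"], start=1):
    for c in consonants.split():
        DIGITS[c] = digit
DIGITS["ñ"] = DIGITS["n"] = 0

KEYS = sorted(DIGITS, key=len, reverse=True)  # match 'kh' before 'k'

def decode(word: str) -> int:
    """Collect one digit per consonant, then reverse the digit string."""
    digits, i = [], 0
    while i < len(word):
        for k in KEYS:
            if word.startswith(k, i):
                digits.append(DIGITS[k])
                i += len(k)
                break
        else:
            i += 1  # vowels and unmapped characters are skipped
    return int("".join(map(str, reversed(digits)))) if digits else 0

print(decode("gopī"))  # g=3, p=1 -> reversed -> 13
```

Encoding runs the same table in the other direction: each digit admits several consonants, which is what lets one number be expressed as many different words.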
@misc{terdalkar2019katapayadi,
  title = {{KaTaPaYadi} System},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  note = {Presented at the 6th International Sanskrit Computational Linguistics Symposium},
  month = oct,
  year = {2019},
  address = {IIT Kharagpur, India},
  url = {https://sanskrit.iitk.ac.in/jnanasangraha/sankhya/katapayaadi/},
}