Publications
Research publications by Hrishikesh Terdalkar and the research group
2025
- [BHASHA ’25] Findings of the IndicGEC and IndicWG Shared Task at BHASHA 2025. Pramit Bhattacharyya, Karthika N J, Hrishikesh Terdalkar, and 4 more authors. In Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025), Dec 2025.
This overview paper presents the findings of the two shared tasks organized as part of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA) co-located with IJCNLP-AACL 2025. The shared tasks are: (1) Indic Grammar Error Correction (IndicGEC) and (2) Indic Word Grouping (IndicWG). For GEC, participants were tasked with producing grammatically correct sentences based on given input sentences in five Indian languages. For WG, participants were required to generate a word-grouped variant of a provided sentence in Hindi. The evaluation metric used for GEC was GLEU, while Exact Matching was employed for WG. A total of 14 teams participated in the final phase of Shared Task 1, and 2 teams participated in the final phase of Shared Task 2. In the IndicGEC shared task, the maximum GLEU scores obtained for Hindi, Bangla, Telugu, Tamil and Malayalam are 85.69, 95.79, 88.17, 91.57 and 96.02, respectively. The highest Exact Matching score obtained in the IndicWG shared task is 45.13%.
@inproceedings{bhattacharyya-etal-2025-findings,
  title = {Findings of the {I}ndic{GEC} and {I}ndic{WG} Shared Task at {BHASHA} 2025},
  author = {Bhattacharyya, Pramit and N J, Karthika and Terdalkar, Hrishikesh and Jagadeeshan, Manoj Balaji and Nigam, Shubham Kumar and Susmitha, Arvapalli Sai and Bhattacharya, Arnab},
  booktitle = {Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)},
  month = dec,
  year = {2025},
  address = {Mumbai, India},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.bhasha-1.12/},
  pages = {127--134},
}

- [BHASHA ’25] BHRAM-IL: A Benchmark for Hallucination Recognition and Assessment in Multiple Indian Languages. Hrishikesh Terdalkar, Kirtan Bhojani, Aryan Dongare, and 1 more author. In Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025), Dec 2025.
Large language models (LLMs) are increasingly deployed in multilingual applications but often generate plausible yet incorrect or misleading outputs, known as hallucinations. While hallucination detection has been studied extensively in English, under-resourced Indian languages remain largely unexplored. We present BHRAM-IL, a benchmark for hallucination recognition and assessment in multiple Indian languages, covering Hindi, Gujarati, Marathi, and Odia, along with English. The benchmark comprises 36,047 curated questions across nine categories spanning factual, numerical, reasoning, and linguistic tasks. We evaluate 14 state-of-the-art multilingual LLMs on a benchmark subset of 10,265 questions, analyzing cross-lingual and factual hallucinations across languages, models, scales, categories, and domains using category-specific metrics normalized to the (0,1) range. Aggregation over all categories and models yields a primary score of 0.23 and a language-corrected fuzzy score of 0.385, demonstrating the usefulness of BHRAM-IL for hallucination-focused evaluation. The dataset and the code for generation and evaluation are available on GitHub (https://github.com/sambhashana/BHRAM-IL/) and HuggingFace (https://huggingface.co/datasets/sambhashana/BHRAM-IL/) to support future research in multilingual hallucination detection and mitigation.
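The abstract mentions category-specific metrics normalized to the (0,1) range and a language-corrected fuzzy score. As a rough illustration only (this is not BHRAM-IL's actual metric, which is defined in the paper), a normalized fuzzy comparison of a model answer against a gold answer can be sketched with the standard library:

```python
from difflib import SequenceMatcher

def fuzzy_score(predicted: str, gold: str) -> float:
    """Similarity in [0, 1] between a model answer and the gold answer.
    Illustrative stand-in for a category-specific metric; not the
    paper's scoring formula."""
    return SequenceMatcher(None, predicted.casefold().strip(),
                           gold.casefold().strip()).ratio()

def aggregate(scores: list[float]) -> float:
    """Simple mean over per-question scores."""
    return sum(scores) / len(scores)

pairs = [("New Delhi", "new delhi"), ("Mumbai", "New Delhi")]
print(round(aggregate([fuzzy_score(p, g) for p, g in pairs]), 3))
```

Case-folding before comparison lets surface-form variants of a correct answer score 1.0 while unrelated answers score near 0.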
@inproceedings{terdalkar-etal-2025-bhram,
  title = {{BHRAM}-{IL}: A Benchmark for Hallucination Recognition and Assessment in Multiple {I}ndian Languages},
  author = {Terdalkar, Hrishikesh and Bhojani, Kirtan and Dongare, Aryan and Behera, Omm Aditya},
  booktitle = {Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)},
  month = dec,
  year = {2025},
  address = {Mumbai, India},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.bhasha-1.9/},
  pages = {102--116},
}
- [ACL ’25] A Case Study of Cross-Lingual Zero-Shot Generalization for Classical Languages in LLMs. V.S.D.S.Mahesh Akavarapu, Hrishikesh Terdalkar, Pramit Bhattacharyya, and 5 more authors. In Findings of the Association for Computational Linguistics: ACL 2025, Jul 2025.
Large Language Models (LLMs) have demonstrated remarkable generalization capabilities across diverse tasks and languages. In this study, we focus on natural language understanding in three classical languages—Sanskrit, Ancient Greek and Latin—to investigate the factors affecting cross-lingual zero-shot generalization. First, we explore named entity recognition and machine translation into English. While LLMs perform on par with or better than fine-tuned baselines on out-of-domain data, smaller models often struggle, especially with niche or abstract entity types. In addition, we concentrate on Sanskrit by presenting a factoid question–answering (QA) dataset and show that incorporating context via a retrieval-augmented generation approach significantly boosts performance. In contrast, we observe pronounced performance drops for smaller LLMs across these QA tasks. These results suggest model scale is an important factor influencing cross-lingual generalization. Assuming that the models used, such as GPT-4o and Llama-3.1, are not instruction fine-tuned on classical languages, our findings provide insights into how LLMs may generalize on these languages and into their consequent utility in classical studies.
@inproceedings{akavarapu-etal-2025-case,
  title = {A Case Study of Cross-Lingual Zero-Shot Generalization for Classical Languages in {LLM}s},
  author = {Akavarapu, V.S.D.S.Mahesh and Terdalkar, Hrishikesh and Bhattacharyya, Pramit and Agarwal, Shubhangi and Deulgaonkar, Vishakha and Dangarikar, Chaitali and Manna, Pralay and Bhattacharya, Arnab},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
  month = jul,
  year = {2025},
  address = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.findings-acl.141/},
  doi = {10.18653/v1/2025.findings-acl.141},
  pages = {2745--2761},
}

- [GRADES-NDA ’25] Graph Repairs with Large Language Models: An Empirical Study. Hrishikesh Terdalkar, Angela Bonifati, and Andrea Mauri. In Proceedings of the 8th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), Berlin, Germany, 2025.
Property graphs are widely used in domains such as healthcare, finance, and social networks, but they often contain errors due to inconsistencies, missing data, or schema violations. Traditional rule-based and heuristic-driven graph repair methods are limited in their adaptability, as they need to be tailored for each dataset. On the other hand, interactive human-in-the-loop approaches may become infeasible when dealing with large graphs, as the cost, in terms of both time and effort, of involving users becomes too high. Recent advancements in Large Language Models (LLMs) present new opportunities for automated graph repair by leveraging contextual reasoning and their access to real-world knowledge. We evaluate the effectiveness of six open-source LLMs in repairing property graphs, assessing repair quality, computational cost, and model-specific performance. Our experiments show that LLMs have the potential to detect and correct errors, with varying degrees of accuracy and efficiency. We discuss the strengths, limitations, and challenges of LLM-driven graph repair and outline future research directions for improving scalability and interpretability.
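The repair task evaluated in the paper delegates the fix itself to an LLM; the surrounding scaffolding can be pictured as below. The toy graph, the schema rules, and the structured repair format are invented for illustration, and the LLM call is left as a stub rather than the paper's pipeline.

```python
# Toy property graph: node id -> label and property map (invented example).
graph = {
    "n1": {"label": "Person", "props": {"name": "Alice", "age": 34}},
    "n2": {"label": "Person", "props": {"age": -5}},  # name missing, age invalid
}

def find_violations(g):
    """Rule-based detection of two illustrative constraint violations."""
    issues = []
    for nid, node in g.items():
        if node["label"] == "Person" and "name" not in node["props"]:
            issues.append((nid, "missing required property 'name'"))
        if node["props"].get("age", 0) < 0:
            issues.append((nid, "negative 'age'"))
    return issues

def repair_prompt(nid, issue, node):
    """Serialize the faulty node into a prompt; an LLM would be asked to
    reply with a structured repair operation (the call is stubbed here)."""
    return (f"Node {nid} = {node} violates: {issue}. Reply with JSON like "
            '{"op": "set_property", "node": "...", "key": "...", "value": ...}')

def apply_repair(g, op):
    """Apply a structured repair operation of the assumed format."""
    if op["op"] == "set_property":
        g[op["node"]]["props"][op["key"]] = op["value"]

# Pretend the model returned this repair for the negative-age violation:
apply_repair(graph, {"op": "set_property", "node": "n2", "key": "age", "value": 5})
```

Keeping repairs as structured operations (rather than free text) makes model outputs checkable before they are applied, which matters when assessing repair quality.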
@inproceedings{terdalkar2025repair,
  title = {Graph Repairs with Large Language Models: An Empirical Study},
  author = {Terdalkar, Hrishikesh and Bonifati, Angela and Mauri, Andrea},
  booktitle = {Proceedings of the 8th Joint Workshop on Graph Data Management Experiences \& Systems (GRADES) and Network Data Analytics (NDA)},
  year = {2025},
  publisher = {Association for Computing Machinery},
  url = {https://doi.org/10.1145/3735546.3735859},
  doi = {10.1145/3735546.3735859},
  pages = {1--10},
  location = {Berlin, Germany},
}
2024
- [PACLIC ’24] Aganittyam: Learning Tamil Grammar through Knowledge Graph based Templatized Question Answering. Mithilesh K, Amarjit Madhumalararungeethayan, Dharanish Rahul S, and 3 more authors. In Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation, Dec 2024.
In this work, we introduce a novel Grammar Question-Answering System (Aganittyam) and its associated corpus for the Dravidian language Tamil. Tamil is one of the oldest surviving languages, with a documented history spanning over 2,000 years; it is a classical language, is official in three countries including India, and is spoken by diasporic communities around the world. Learning Tamil grammar remains challenging due to the language's agglutination and complex morphology. We created a Tamil Grammar Corpus aimed at all kinds of learners and annotated it manually, since automated tools are not sufficiently accurate. This led us to create an ontology of entity types and relationship types for the corpus. We identified entities and relationships and stored the resulting triplets (subject-predicate-object) as a Knowledge Graph (KG) consisting of 63,587 entities. We also developed a framework for templatized question answering over the KG. We performed a bi-fold evaluation (query metrics and human-centric) with thorough experimentation and show that our QA system is robust, reliable, and fun to use when answering various objective questions.
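Templatized question answering over a KG pairs triples with question templates. A minimal sketch of the idea follows; the triples and the single template are invented placeholders, not drawn from the Aganittyam corpus or ontology.

```python
# Invented (subject, predicate, object) triples standing in for the
# Tamil grammar KG; the template set is likewise illustrative.
triples = [
    ("wordA", "word_class", "noun"),
    ("wordB", "word_class", "verb"),
]

TEMPLATES = {
    "word_class": ("What is the word class of '{s}'?", "{o}"),
}

def generate_qa(triples, templates):
    """Instantiate one (question, answer) pair per matching triple."""
    return [
        (templates[p][0].format(s=s), templates[p][1].format(o=o))
        for s, p, o in triples if p in templates
    ]

for question, answer in generate_qa(triples, TEMPLATES):
    print(question, "->", answer)
```

Each new relationship type in the ontology only needs one new template to become both a question generator and an answer key.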
@inproceedings{k-etal-2024-aganittyam,
  title = {Aganittyam: Learning {T}amil Grammar through Knowledge Graph based Templatized Question Answering},
  author = {K, Mithilesh and Madhumalararungeethayan, Amarjit and S, Dharanish Rahul and Balan, Abhijith and C, Oswald and Terdalkar, Hrishikesh},
  booktitle = {Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation},
  month = dec,
  year = {2024},
  address = {Tokyo, Japan},
  publisher = {Tokyo University of Foreign Studies},
  url = {https://aclanthology.org/2024.paclic-1.81/},
  pages = {838--852},
}

- [Book] Samanvaya: An Interlingua for Unity of Indian Languages. Chaitali Dangarikar, Arnab Bhattacharya, Karthika N J, and 6 more authors. Oct 2024.
An interlingua is a constructed bridge language, designed to simplify communication and analysis across diverse natural languages by identifying commonalities in structure and meaning. Samanvaya, our proposed interlingua for Indian languages, embodies this principle by unifying linguistic features shared across these languages, facilitating deeper linguistic analysis and cross-lingual understanding while preserving their unique cultural identities.
@book{samanvaya2024,
  title = {Samanvaya: An Interlingua for Unity of Indian Languages},
  author = {Dangarikar, Chaitali and Bhattacharya, Arnab and N J, Karthika and Terdalkar, Hrishikesh and Bhattacharyya, Pramit and Kulkarni, Annarao and Lakkundi, Chaitanya S and Ramakrishnan, Ganesh and V, Shivani},
  year = {2024},
  month = oct,
  publisher = {Central Sanskrit University},
  address = {India},
  isbn = {978-93-48435-21-7},
}
2023
- [NLP-OSS ’23] Antarlekhaka: A Comprehensive Tool for Multi-task Natural Language Annotation. Hrishikesh Terdalkar and Arnab Bhattacharya. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), Dec 2023.
One of the primary obstacles in the advancement of Natural Language Processing (NLP) technologies for low-resource languages is the lack of annotated datasets for training and testing machine learning models. In this paper, we present Antarlekhaka, a tool for manual annotation of a comprehensive set of tasks relevant to NLP. The tool is Unicode-compatible, language-agnostic, Web-deployable and supports distributed annotation by multiple simultaneous annotators. The system sports user-friendly interfaces for 8 categories of annotation tasks. These, in turn, enable the annotation of a considerably larger set of NLP tasks. The task categories include two linguistic tasks not handled by any other tool, namely, sentence boundary detection and deciding canonical word order, which are important tasks for text that is in the form of poetry. We propose the idea of sequential annotation based on small text units, where an annotator performs several tasks related to a single text unit before proceeding to the next unit. The research applications of the proposed mode of multi-task annotation are also discussed. Antarlekhaka outperforms other annotation tools in objective evaluation. It has also been used for two real-life annotation tasks in two different languages, namely, Sanskrit and Bengali. The tool is available at https://github.com/Antarlekhaka/code.
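The sequential annotation mode can be pictured as completing every task for one text unit before moving to the next. A tiny sketch of that ordering follows; the task names are illustrative placeholders, not Antarlekhaka's exact task list.

```python
# Sequential annotation: every task for one text unit is presented
# before the next unit (task names are illustrative placeholders).
units = ["verse 1", "verse 2"]
tasks = ["sentence boundary", "canonical word order", "lemmatization"]

def annotation_order(units, tasks):
    """Unit-major ordering of (unit, task) pairs."""
    return [(u, t) for u in units for t in tasks]

for unit, task in annotation_order(units, tasks):
    print(f"{unit}: {task}")
```

The alternative, task-major ordering would make the annotator revisit every unit once per task; unit-major ordering keeps the context of one text unit in focus across all its tasks.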
@inproceedings{terdalkar2023antarlekhaka,
  title = {Antarlekhaka: A Comprehensive Tool for Multi-task Natural Language Annotation},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  booktitle = {Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)},
  month = dec,
  year = {2023},
  address = {Singapore},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2023.nlposs-1.23},
  doi = {10.18653/v1/2023.nlposs-1.23},
  pages = {199--211},
}

- [NYCIKS ’23] Āyurjñānam: Exploring Āyurveda using Knowledge Graphs. Hrishikesh Terdalkar, Vishakha Deulgaonkar, and Arnab Bhattacharya. 2023. Presented at the National Youth Conference on Indian Knowledge Systems 2023.
Best Poster Award
The Bṛhat-Trayī, consisting of Carakasaṃhitā, Suśrutasaṃhitā, and Aṣṭāṅgahṛdaya, is an encyclopaedic reference set in Āyurveda. However, the need for simpler texts led to the emergence of the Laghu-Trayī that includes Mādhavanidāna, Śārṅgadharasaṃhitā, and Bhāvaprakāśa. Authored by Ācārya Bhāvamiśra in the 16th century CE, Bhāvaprakāśa is a comprehensive work focused on medicine. The classification system of varga in its nighaṇṭu section, Bhāvaprakāśanighaṇṭu, categorizes substances based on type, origin, and medicinal properties. This valuable resource assists practitioners and researchers in Āyurveda. We present this information in an accessible manner to promote wider utilization of this knowledge. We create a robust ontology to capture the semantic information of medicinal substances, designing user-friendly interfaces for efficient annotation and curation, perform meticulous manual annotation on Bhāvaprakāśanighaṇṭu, and construct an accurate knowledge graph from three chapters of Bhāvaprakāśanighaṇṭu. The system is accessible at https://sanskrit.iitk.ac.in/ayurveda/.
@misc{terdalkar2023ayurjnanam,
  title = {{Āyurjñānam}: Exploring {Āyurveda} using Knowledge Graphs},
  author = {Terdalkar, Hrishikesh and Deulgaonkar, Vishakha and Bhattacharya, Arnab},
  note = {Presented at the National Youth Conference on Indian Knowledge Systems 2023},
  year = {2023},
  url = {https://sanskrit.iitk.ac.in/ayurveda/},
}

- [PhD] Sanskrit Knowledge-based Systems: Annotation and Computational Tools. Hrishikesh Terdalkar. Jun 2023. Available at https://etd.iitk.ac.in:8443/jspui/handle/123456789/21176.
We address the challenges and opportunities in the development of knowledge systems for Sanskrit, with a focus on question answering. By proposing a framework for the automated construction of knowledge graphs, introducing annotation tools for ontology-driven and general-purpose tasks, and offering a diverse collection of web-interfaces, tools, and software libraries, we have made significant contributions to the field of computational Sanskrit. These contributions not only enhance the accessibility and accuracy of Sanskrit text analysis but also pave the way for further advancements in knowledge representation and language processing. Ultimately, this research contributes to the preservation, understanding, and utilization of the rich linguistic information embodied in Sanskrit texts.
@phdthesis{terdalkar2023sanskrit,
  title = {{Sanskrit Knowledge-based Systems: Annotation and Computational Tools}},
  author = {Terdalkar, Hrishikesh},
  school = {Indian Institute of Technology Kanpur},
  year = {2023},
  month = jun,
  type = {PhD Thesis},
  note = {Available at \url{https://etd.iitk.ac.in:8443/jspui/handle/123456789/21176}},
}

- [WSC ’23] Vaiyyākaraṇaḥ: A Sanskrit Grammar Bot for Telegram. Hrishikesh Terdalkar, V S D S Mahesh Akavarapu, Shubhangi Agarwal, and 1 more author. Jan 2023. Presented at the 18th World Sanskrit Conference.
Vaiyyākaraṇaḥ is a Telegram bot aimed at helping learners of Sanskrit grammar (vyākaraṇa). The salient features of Vaiyyākaraṇaḥ are: stem finder (prātipadikam), declension generation (subantāḥ), root finder (dhātuḥ), conjugation generation (tiṅantāḥ) and word segmentation (sandhisamāsau). State-of-the-art datasets, tools and technologies are used to offer these capabilities.
@misc{terdalkar2023vaiyyakarana,
  title = {{Vaiyyākaraṇaḥ}: A Sanskrit Grammar Bot for Telegram},
  author = {Terdalkar, Hrishikesh and Akavarapu, V S D S Mahesh and Agarwal, Shubhangi and Bhattacharya, Arnab},
  note = {Presented at the 18th World Sanskrit Conference},
  month = jan,
  year = {2023},
  address = {The Australian National University, Canberra, Australia},
  url = {https://t.me/vyakarana_bot},
}

- [WSC ’23] Semantic Annotation and Querying Framework based on Semi-structured Ayurvedic Text. Hrishikesh Terdalkar, Arnab Bhattacharya, Madhulika Dubey, and 2 more authors. In Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference, Jan 2023.
Knowledge bases (KB) are an important resource in a number of natural language processing (NLP) and information retrieval (IR) tasks, such as semantic search and automated question-answering. They are also useful for researchers trying to gain information from a text. Unfortunately, however, the state-of-the-art in Sanskrit NLP does not yet allow automated construction of knowledge bases due to unavailability or lack of sufficient accuracy of tools and methods. Thus, in this work, we describe our efforts on manual annotation of Sanskrit text for the purpose of knowledge graph (KG) creation. We choose the chapter Dhānyavarga from Bhāvaprakāśanighaṇṭu of the Ayurvedic text Bhāvaprakāśa for annotation. The constructed knowledge graph contains 410 entities and 764 relationships. Since Bhāvaprakāśanighaṇṭu is a technical glossary text that describes various properties of different substances, we develop an elaborate ontology to capture the semantics of the entity and relationship types present in the text. To query the knowledge graph, we design 31 query templates that cover most of the common question patterns. For both manual annotation and querying, we customize the Sangrahaka framework previously developed by us. The entire system including the dataset is available from https://sanskrit.iitk.ac.in/ayurveda. We hope that the knowledge graph that we have created through manual annotation and subsequent curation will help in the development and testing of NLP tools in the future, as well as in the study of the Bhāvaprakāśanighaṇṭu text.
@inproceedings{terdalkar2023semantic,
  title = {Semantic Annotation and Querying Framework based on Semi-structured Ayurvedic Text},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab and Dubey, Madhulika and Ramamurthy, S and Singh, Bhavna Naneria},
  booktitle = {Proceedings of the Computational {S}anskrit {\&} Digital Humanities: Selected papers presented at the 18th World {S}anskrit Conference},
  month = jan,
  year = {2023},
  address = {Canberra, Australia (Online mode)},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2023.wsc-csdh.11},
  pages = {155--173},
}

- [WSC ’23] Jñānasaṅgrahaḥ: A Collection of Computational Applications related to Sanskrit. Hrishikesh Terdalkar and Arnab Bhattacharya. Jan 2023. Presented at the 18th World Sanskrit Conference.
Jñānasaṅgrahaḥ is a web-based collection of several computational applications related to the Sanskrit language. The aim is to highlight the features of the Sanskrit language in a way that is approachable for an enthusiastic user, even if she has a limited Sanskrit background.
@misc{terdalkar2023jnanasangraha,
  title = {{Jñānasaṅgrahaḥ}: A Collection of Computational Applications related to Sanskrit},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  note = {Presented at the 18th World Sanskrit Conference},
  month = jan,
  year = {2023},
  address = {The Australian National University, Canberra, Australia},
  url = {https://sanskrit.iitk.ac.in/jnanasangraha/},
}

- [WSC ’23] PyCDSL: A Programmatic Interface to Cologne Digital Sanskrit Dictionaries. Hrishikesh Terdalkar and Arnab Bhattacharya. Jan 2023. Presented at the 18th World Sanskrit Conference.
PyCDSL is a Python library that provides a programmer-friendly interface to the Cologne Digital Sanskrit Dictionaries (CDSD). The library serves as a corpus management tool to download, update and access dictionaries from CDSD. The tool provides a command-line interface for ease of search and a programmable interface for using CDSD in computational linguistics projects written in Python 3.
@misc{terdalkar2023pycdsl,
  title = {{PyCDSL}: A Programmatic Interface to {C}ologne Digital {S}anskrit Dictionaries},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  note = {Presented at the 18th World Sanskrit Conference},
  month = jan,
  year = {2023},
  address = {The Australian National University, Canberra, Australia},
  url = {https://pypi.org/project/PyCDSL/},
}

- [WSC ’23] Chandojnanam: A Sanskrit Meter Identification and Utilization System. Hrishikesh Terdalkar and Arnab Bhattacharya. In Proceedings of the Computational Sanskrit & Digital Humanities: Selected papers presented at the 18th World Sanskrit Conference, Jan 2023.
We present Chandojñānam, a web-based Sanskrit meter (chanda) identification and utilization system. In addition to the core functionality of identifying meters, it sports a friendly user interface to display the scansion, a graphical representation of the metrical pattern. The system supports identification of meters from uploaded images by using optical character recognition (OCR) engines in the backend. It is also able to process entire text files at a time. The text can be processed in two modes, either by treating it as a list of individual lines, or as a collection of verses. When a line or a verse does not correspond exactly to a known meter, Chandojñānam is capable of finding fuzzy (i.e., approximate and close) matches based on sequence matching. This opens up the scope of meter-based correction of erroneous digital corpora. The system is available for use at https://sanskrit.iitk.ac.in/jnanasangraha/chanda/, and the source code in the form of a Python library is available at https://github.com/hrishikeshrt/chanda/.
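The fuzzy matching step can be illustrated with standard-library sequence matching over laghu/guru (L/G) syllable strings. The two patterns below follow the traditional gaṇa definitions (indravajrā: ta ta ja ga ga; upendravajrā: ja ta ja ga ga), but the matching logic is a sketch, not Chandojñānam's implementation.

```python
from difflib import get_close_matches

# Known meters as laghu (L) / guru (G) syllable patterns.
METERS = {
    "GGLGGLLGLGG": "indravajrā",    # ta ta ja ga ga
    "LGLGGLLGLGG": "upendravajrā",  # ja ta ja ga ga
}

def identify(pattern: str) -> str:
    """Exact match if possible, else the closest known pattern."""
    if pattern in METERS:
        return METERS[pattern]
    close = get_close_matches(pattern, METERS.keys(), n=1, cutoff=0.6)
    return METERS[close[0]] + " (fuzzy)" if close else "unknown"

print(identify("GGLGGLLGLGG"))  # exact match
print(identify("GGLGGLLGLGL"))  # last syllable off -> fuzzy match
```

Because a near-miss still resolves to the intended meter, the closest pattern can suggest where a digitized line deviates from it, which is the basis for meter-based corpus correction.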
@inproceedings{terdalkar2023chandojnanam,
  title = {Chandojnanam: A {S}anskrit Meter Identification and Utilization System},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  booktitle = {Proceedings of the Computational {S}anskrit {\&} Digital Humanities: Selected papers presented at the 18th World {S}anskrit Conference},
  month = jan,
  year = {2023},
  address = {Canberra, Australia (Online mode)},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2023.wsc-csdh.8},
  pages = {113--127},
}
2022
- [COLING ’22] A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit. Jivnesh Sandhan, Ashish Gupta, Hrishikesh Terdalkar, and 4 more authors. In Proceedings of the 29th International Conference on Computational Linguistics, Oct 2022.
The phenomenon of compounding is ubiquitous in Sanskrit. It serves to achieve brevity in expressing thoughts, while simultaneously enriching the lexical and structural formation of the language. In this work, we focus on the Sanskrit Compound Type Identification (SaCTI) task, where we consider the problem of identifying semantic relations between the components of a compound word. Earlier approaches rely solely on the lexical information obtained from the components and ignore the contextual and syntactic information that is most crucial for SaCTI. The SaCTI task is challenging primarily because the context-sensitive semantic relation between the compound components is encoded implicitly. Thus, we propose a novel multi-task learning architecture which incorporates the contextual information and enriches the complementary syntactic information using morphological tagging and dependency parsing as two auxiliary tasks. Experiments on the benchmark datasets for SaCTI show absolute gains of 6.1 points (accuracy) and 7.7 points (F1-score) compared to the state-of-the-art system. Further, our multi-lingual experiments demonstrate the efficacy of the proposed architecture for English and Marathi.
@inproceedings{sandhan2022compound,
  title = {A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in {S}anskrit},
  author = {Sandhan, Jivnesh and Gupta, Ashish and Terdalkar, Hrishikesh and Sandhan, Tushar and Samanta, Suvendu and Behera, Laxmidhar and Goyal, Pawan},
  booktitle = {Proceedings of the 29th International Conference on Computational Linguistics},
  month = oct,
  year = {2022},
  address = {Gyeongju, Republic of Korea},
  publisher = {International Committee on Computational Linguistics},
  url = {https://aclanthology.org/2022.coling-1.358},
  pages = {4071--4083},
}
2021
- [ESEC/FSE ’21] Sangrahaka: A Tool for Annotating and Querying Knowledge Graphs. Hrishikesh Terdalkar and Arnab Bhattacharya. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, 2021.
Best Software Award at the 57th Convocation, IIT Kanpur
We present Sangrahaka, a web-based tool for annotating entities and relationships from text corpora towards the construction of a knowledge graph and its subsequent querying using templatized natural language questions. The application is language- and corpus-agnostic, but can be tuned for the specific needs of a language or a corpus. It is freely available for download and installation. Besides having a user-friendly interface, it is fast, supports customization, and is fault-tolerant on both the client and server side. It outperforms other annotation tools in an objective evaluation metric. The framework has been successfully used in two annotation tasks.
@inproceedings{terdalkar2021sangrahaka,
  title = {Sangrahaka: A Tool for Annotating and Querying Knowledge Graphs},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  booktitle = {Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  year = {2021},
  publisher = {Association for Computing Machinery},
  url = {https://doi.org/10.1145/3468264.3473113},
  doi = {10.1145/3468264.3473113},
  pages = {1520--1524},
  location = {Athens, Greece},
}
2019
- [ISCLS ’19] Framework for Question-Answering in Sanskrit through Automated Construction of Knowledge Graphs. Hrishikesh Terdalkar and Arnab Bhattacharya. In Proceedings of the 6th International Sanskrit Computational Linguistics Symposium, Oct 2019.
Sanskrit (Saṃskṛta) enjoys one of the largest and most varied bodies of literature in the whole world. Extracting knowledge from it, however, is a challenging task due to multiple reasons, including the complexity of the language and the paucity of standard natural language processing tools. In this paper, we target the problem of building knowledge graphs for particular types of relationships from Saṃskṛta texts. We build a natural language question-answering system in Saṃskṛta that uses the knowledge graph to answer factoid questions. We design a framework for the overall system and implement two separate instances of the system on human relationships from the Mahābhārata and the Rāmāyaṇa, and one instance on synonymous relationships from Bhāvaprakāśa Nighaṇṭu, a technical text from Āyurveda. We show that about 50% of the factoid questions can be answered correctly by the system. More importantly, we analyse the shortcomings of the system in detail for each step, and discuss the possible ways forward.
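Once a question is parsed, factoid QA over a knowledge graph of (subject, relation, object) triples reduces to a lookup. A minimal sketch in the spirit of the human-relationship instance follows; the storage scheme and question pattern are illustrative, not the paper's implementation, though the epic relationships themselves are well known.

```python
# Toy KG of (subject, relation, object) triples; in the paper these are
# extracted automatically from the epics rather than hand-written.
triples = [
    ("Arjuna", "father", "Pandu"),
    ("Arjuna", "mother", "Kunti"),
]

kg = {(s, r): o for s, r, o in triples}

def answer(subject: str, relation: str) -> str:
    """Answer 'Who is the <relation> of <subject>?' by a KG lookup."""
    return kg.get((subject, relation), "unknown")

print(answer("Arjuna", "father"))
```

The hard part, reflected in the roughly 50% accuracy reported above, is not this lookup but constructing the triples and mapping a free-form Saṃskṛta question onto a (subject, relation) pair.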
@inproceedings{terdalkar2019framework,
  title = {Framework for Question-Answering in {S}anskrit through Automated Construction of Knowledge Graphs},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  booktitle = {Proceedings of the 6th International Sanskrit Computational Linguistics Symposium},
  month = oct,
  year = {2019},
  address = {IIT Kharagpur, India},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/W19-7508},
  pages = {97--116},
}

- [ISCLS ’19] KaTaPaYadi System. Hrishikesh Terdalkar and Arnab Bhattacharya. Oct 2019. Presented at the 6th International Sanskrit Computational Linguistics Symposium.
The Kaṭapayādi system of encoding numbers as words, with each digit replaced by a character, was developed in ancient India. We present a web-based system for conversion to and from the Kaṭapayādi numbering scheme. It can both decode a word into its corresponding number and encode a number into word(s).
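A toy decoder for the scheme can be sketched as below, assuming the standard consonant-to-digit assignment and the common convention that the digits of a place-value number run right to left (aṅkānāṁ vāmato gatiḥ). Conjunct-consonant handling, where only the consonant joined to a vowel carries a digit, is deliberately ignored in this simplification.

```python
# Standard Kaṭapayādi digit groups (IAST romanization):
# ka..jha = 1..9; ṭa..dha = 1..9; pa..ma = 1..5; ya..ha = 1..8;
# ña and na = 0; vowels carry no digit.
DIGITS = {}
for digit, consonants in enumerate(
        ["k ṭ p y", "kh ṭh ph r", "g ḍ b l", "gh ḍh bh v",
         "ṅ ṇ m ś", "c t ṣ", "ch th s", "j d h", "jh dh"], start=1):
    for c in consonants.split():
        DIGITS[c] = digit
DIGITS["ñ"] = DIGITS["n"] = 0

KEYS = sorted(DIGITS, key=len, reverse=True)  # match 'kh' before 'k'

def decode(word: str) -> int:
    """Collect one digit per consonant, then reverse the digit string."""
    digits, i = [], 0
    while i < len(word):
        for k in KEYS:
            if word.startswith(k, i):
                digits.append(DIGITS[k])
                i += len(k)
                break
        else:
            i += 1  # vowels and unmapped characters are skipped
    return int("".join(map(str, reversed(digits)))) if digits else 0

print(decode("gopī"))  # g=3, p=1 -> reversed -> 13
```

Encoding runs the same table in the other direction: each digit admits several consonants, which is what lets one number be expressed as many different words.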
@misc{terdalkar2019katapayadi,
  title = {{KaTaPaYadi} System},
  author = {Terdalkar, Hrishikesh and Bhattacharya, Arnab},
  note = {Presented at the 6th International Sanskrit Computational Linguistics Symposium},
  month = oct,
  year = {2019},
  address = {IIT Kharagpur, India},
  url = {https://sanskrit.iitk.ac.in/jnanasangraha/sankhya/katapayaadi/},
}