The workshop aims to introduce an interdisciplinary audience from the humanities, sciences & engineering to the techniques of computational epigraphy for decoding undeciphered scripts.
The lectures and tutorials will provide invaluable training to anyone interested in applying algorithmic methods to acquire, process and analyze data related to inscriptions in order to eventually reconstruct the underlying language and writing system.
IIT Madras
IIT Kharagpur
Ikigai, Boston
IIT Kharagpur
University of Calcutta, Kolkata
University of Calcutta, Kolkata
Ex-TIFR, Mumbai
TIFR, Mumbai
IMSc Chennai
Independent Scholar, Chennai
IMSc Chennai
iCEL & IMSc Chennai
iCEL & IMSc Chennai
iCEL & IMSc, Chennai
AI has captured the popular imagination for several decades, and recently it appears to be fulfilling its promise. Over these decades, many useful techniques have been developed in the field for solving hard problems. In this talk I will give a very quick overview of the roots of AI and the intuition behind its recent successes.
Talk slides
I will discuss various ways we can enhance the performance of language generation through pretraining techniques.
Talk slides
This talk will be a very brief primer on the architecture of present-day LLMs, followed by two digital humanities applications, viz., the efficacy of LLMs in (a) online hate speech identification and (b) native language identification.
Talk slides
We will introduce Markov chains and hidden Markov models (HMMs), which are widely used in computational linguistics as well as bioinformatics. We will briefly discuss "profile HMMs", which are used to describe protein families but may have applications in linguistics and epigraphy as well.
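As a rough illustration of the first idea in this abstract, the sketch below estimates first-order Markov transition probabilities from a handful of sign sequences. The function name `transition_probabilities` and the toy sequences are hypothetical, chosen only for illustration; they are not taken from the talk or from any real corpus.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next sign | current sign) from a list of sign sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each adjacent pair (current sign, following sign).
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        # Normalise counts so each row of the transition table sums to 1.
        probs[current] = {sign: n / total for sign, n in followers.items()}
    return probs


# Hypothetical toy "inscriptions"; each letter stands for one sign.
toy_inscriptions = ["ABAC", "ABAB", "CAB"]
probs = transition_probabilities(toy_inscriptions)
print(probs["A"])
```

A hidden Markov model adds unobserved states on top of such a chain; that step is not shown here.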
Talk slides
Using a few examples of real and fictional inscriptions, I will discuss how one can approach the problem of deciphering unknown writing.
Talk slides Lecture handout 1 Lecture handout 2
This talk traces the historical development of artificial intelligence, beginning with the foundational concepts and early milestones. It briefly explores techniques such as linear models, decision trees, and random forests, illustrating their applications in classification, regression, and pattern recognition. The discussion then shifts to the emergence of neural networks, delving into their architecture, training methods, and their role in revolutionizing various AI domains.
Talk slides
An introduction to the possible origins of writing and the different types of writing systems seen across history, viz., ideographic, syllabic, logosyllabic, alphabetic, etc.
Talk slides
This talk will provide context for the inscriptions of the Indus Civilization (2500-1900 BCE), whose script is possibly the only remaining major undeciphered writing system.
Talk slides
I will show how to use simple Python programs to extract inscription data from various sites on the web, which can be useful for building and analysing epigraphic databases.
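A minimal sketch of what such an extraction step might look like, using only Python's standard-library `html.parser`. The class name `TableCellParser`, the table layout, and the sample records are all hypothetical; the actual sites and tools covered in the talk are not specified here.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page fragment standing in for a downloaded catalogue page.
sample_html = """
<table>
  <tr><th>ID</th><th>Site</th><th>Text</th></tr>
  <tr><td>001</td><td>Site A</td><td>X Y Z</td></tr>
  <tr><td>002</td><td>Site B</td><td>Y Z</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(sample_html)
header, *records = parser.rows
inscriptions = [dict(zip(header, record)) for record in records]
print(inscriptions)
```

In practice the HTML string would come from a downloaded page (for example via `urllib.request.urlopen`), subject to each site's terms of use.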
Lecture Handout
Building upon the foundations laid in the previous talk, this presentation goes deeper into generative models such as GANs and diffusion models. Then we will take a deeper look at transformer models and large language models. We will examine the self-attention mechanism that underpins transformers, their applications in natural language processing, and the capabilities of models like GPT and BERT. The talk also explores the rapidly evolving field of generative AI, including text, image, and audio generation, and its potential implications.
Talk slides
In this talk we shall explore how probabilistic models have shaped our understanding of linguistic sequences, focusing in particular on Markov's work that prefigures that of Claude Shannon on the entropy of written English, as well as the work of Zipf on the frequency of words and the principle of least effort.
Talk slides
If we look closely enough, it becomes apparent that networks are everywhere, and it turns out that a wide range of physical, biological and social interactions can be elegantly encapsulated in terms of network descriptions. I will provide a general overview of network science - a field of study focused on extracting information encoded by such networks - and will demonstrate some associated concepts in an interactive tutorial session using the network visualization software Gephi.
Talk slides
This talk surveys the contributions of four major Indus text corpora and concordances, namely Hunter (1934), Mahadevan (1977), Parpola (1973-2010), and Wells (1998-present), towards the textual and contextual analysis of the Indus script.
Talk slides
The Indus script has defied decipherment. The absence of a concrete understanding regarding its structure poses challenges in objectively assessing any purported decipherments. To address this gap, we have employed diverse computational techniques to analyze the structure of the Indus script. Our research aims to uncover patterns within Indus writing and investigate its fundamental principles without presupposing its content. In this presentation, I will provide an overview of our computational investigations into the Indus script.
Talk slides
We will describe some basic string-matching algorithms, introduce the evolution of DNA sequences and phylogenetics, and then look at applications of the same ideas in linguistics.
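One technique commonly shared between sequence comparison in biology and word comparison in linguistics is the Levenshtein edit distance. The sketch below is offered as an assumption about the kind of algorithm such a talk might cover, not as a summary of its content; the function name `edit_distance` and the example strings are illustrative only.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


# Works the same way on DNA-like strings and on words.
print(edit_distance("GATTACA", "GACTATA"))
print(edit_distance("kitten", "sitting"))
```

Pairwise distances of this kind can serve as input to tree-building methods, whether the sequences are genes or cognate words.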
Talk slides
The grammar of the Harappan Script is now fairly well understood. However, all interpretative models of the Harappan Script have fallen short in their consistency with its grammar. The cultural context of the Indus writing is also well understood. In the present talk, we will discuss the miniatures that were used by the Harappans to express themselves in a variety of ways. We will then discuss the larger issues of the time evolution of the Harappan Civilisation and look at the possible scenarios of how it evolved and changed. We will conclude by discussing possible new avenues of research on the Harappan Script that can be pursued to gain better insights.
Talk slides
This is the concluding part of the talk whose first part was given yesterday, focusing on how Shannon quantified information (measured in units of "bits") using the concept of entropy and how it applies to language. We will also look at Zipf's law of abbreviation and the principle of least effort that he proposed to explain it.
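For readers unfamiliar with the quantity mentioned above, Shannon entropy for a symbol distribution with probabilities p_i is H = -Σ p_i log2(p_i), measured in bits per symbol. The sketch below computes it from observed symbol frequencies; the function name `shannon_entropy` and the example strings are illustrative assumptions, not material from the talk.

```python
from collections import Counter
from math import log2


def shannon_entropy(symbols):
    """Entropy in bits per symbol, estimated from observed symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed symbols.
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Two equally likely symbols carry one bit each; four carry two bits each.
print(shannon_entropy("AABB"))
print(shannon_entropy("ABCD"))
```

This is a single-symbol (unigram) estimate; estimates that condition on preceding symbols require counting longer blocks.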
Talk slides
This study uncovers a universal pattern in language, revealing asymmetric sign distribution at word boundaries, and applies this insight to deduce the writing direction of undeciphered Indus inscriptions, showing its utility in archaeological decipherment.
Talk slides Lecture Handout
Writing in historical India, almost coterminous with the use of Brāhmī script throughout the subcontinent, witnessed an essentially chequered trajectory of evolution from the third century BCE till the advent of proto-regional scripts in the seventh-eighth centuries. This talk is aimed at taking a tour through this route, focusing on factors behind and processes leading to these varying lines of development.
Talk slides
This talk will be a reading session on the most widely used proto-regional script of northern India, called Siddhamātr̥kā.
Talk slides
The study of Indian epigraphy was in its infancy in the late 18th century. The initial phase began with the discovery of inscriptions and their publication by European scholars, apparently with the help of Indian pundits. They appeared in journals like Asiatic Researches and Journal of the Asiatic Society (both published by the Asiatic Society, Calcutta, established in 1784).
The foundation of research in Indian epigraphy was thus laid, and in this process of building interest in Indian epigraphy, the name of Charles Wilkins (1749-1836) stands out most prominently. Wilkins was among the earliest scholars who deciphered inscriptions at a period when those epigraphic records were unintelligible to others. When Wilkins began his study there was no idea of epigraphs as sources of history. None could read the script. How Wilkins managed to decipher the inscriptions of the sixth, ninth and tenth centuries is still a wonder. This talk will focus on the method of handling epigraphic data in the initial phase of Indian epigraphic studies (18th – 19th centuries).
The study introduces a segmentation method to analyze linguistic sequences, revealing a universal pattern of recurring word segments across languages. It suggests inherent cognitive or phonotactic constraints in language processing and word composition.
Talk slides
The Institute of Mathematical Sciences
CIT Campus, Taramani, Chennai 600113, India
Phone: 044-22543301
Email: icel.chennai@gmail.com