Workshop on
Computational Epigraphy

  • Venue: Ramanujan Auditorium, IMSc
  • Dates: March 25-30, 2024

About The Workshop

The workshop aims to introduce an interdisciplinary audience from the humanities, sciences & engineering to the techniques of computational epigraphy for decoding undeciphered scripts.

The lectures and tutorials will provide invaluable training to anyone interested in applying algorithmic methods to acquire, process and analyze data related to inscriptions in order to eventually reconstruct the underlying language and writing system.

  • Origins of writing
  • Types of writing systems
  • Deciphering ancient scripts
  • Data science & digital humanities
  • Machine learning fundamentals
  • Network science tools & computational algorithms
  • Indus civilization inscriptions
  • Databases, problems and prospects
Organizers
Md. Izhar Ashraf (IMSc Computational Epigraphy Lab)
Sitabhra Sinha (IMSc Computational Epigraphy Lab)

Speakers

Niloy Ganguly

IIT Kharagpur

Farhat Habib

Ikigai, Boston

Animesh Mukherjee

IIT Kharagpur

Sayantani Pal

University of Calcutta, Kolkata

Rajat Sanyal

University of Calcutta, Kolkata

Mayank Vahia

Ex-TIFR, Mumbai

Nisha Yadav

TIFR, Mumbai

Rahul Siddharthan

IMSc Chennai

C Subramanian

Independent Scholar, Chennai

Shakti N Menon

IMSc Chennai

Nandini Mitra

iCEL & IMSc Chennai

Md. Izhar Ashraf

iCEL & IMSc Chennai

Sitabhra Sinha

iCEL & IMSc Chennai

Event Schedule

AI has captured the popular imagination for several decades, and recently it appears to be fulfilling its promise. Over that time, many useful techniques have been developed in the field for solving hard problems. In this talk I will give a very quick overview of the roots of AI and the intuition behind its recent successes.

Talk slides

I will discuss various ways we can enhance the performance of language generation through pretraining techniques.

Talk slides

This talk will be a very brief primer on the architecture of present-day LLMs, followed by two digital humanities applications, viz., the efficacy of LLMs in (a) online hate speech identification and (b) native language identification.

Talk slides

We will introduce Markov chains and hidden Markov models (HMMs) which are widely used in computational linguistics as well as bioinformatics. We will briefly discuss "profile HMMs" which are used to describe protein families but can perhaps have applications in linguistics and epigraphy as well.
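As a minimal sketch of the first idea in this abstract (not taken from the talk itself), a first-order Markov chain over signs can be estimated by counting adjacent pairs in a set of sequences. The function name `transition_probabilities` and the toy corpus of sign labels are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities P(next | current)."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        # Count each adjacent pair (current sign, next sign).
        for current, nxt in zip(seq, seq[1:]):
            pair_counts[current][nxt] += 1

    probs = {}
    for current, counter in pair_counts.items():
        total = sum(counter.values())
        # Normalise the counts for each current sign so they sum to 1.
        probs[current] = {nxt: n / total for nxt, n in counter.items()}
    return probs


# Toy "inscriptions": each is a sequence of hypothetical sign labels.
toy_corpus = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C"],
    ["A", "C"],
]

probs = transition_probabilities(toy_corpus)
print(probs["A"])
```

Signs that only ever occur at the end of a sequence (here "C" and "D") have no outgoing transitions and so do not appear as keys. A hidden Markov model adds an unobserved state sequence on top of this and needs a dedicated fitting procedure, which this sketch does not attempt.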

Talk slides

Using a few examples of real and fictional inscriptions, I will discuss how one can approach the problem of deciphering unknown writing.

Talk slides, Lecture handout 1, Lecture handout 2

This talk traces the historical development of artificial intelligence, beginning with the foundational concepts and early milestones. It briefly explores techniques such as linear models, decision trees, and random forests, illustrating their applications in classification, regression, and pattern recognition. The discussion then shifts to the emergence of neural networks, delving into their architecture, training methods, and their role in revolutionizing various AI domains.

Talk slides

An introduction to the possible origins of writing and the different types of writing systems seen across history, viz., ideographic, syllabic, logosyllabic, alphabetic, etc.

Talk slides

This talk will provide context for the inscriptions of the Indus Civilization (2500-1900 BCE), possibly the only major writing system that remains undeciphered.

Talk slides

I will show how to use simple Python programs to extract inscription data from various sites on the web, which can be useful for building and analysing epigraphic databases.
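The kind of extraction described here might look like the following sketch, which uses only the Python standard library. It is not the code from the lecture handout: the class name `TableRowParser`, the sample page, its table layout, and the record fields are all hypothetical, and the HTML is an in-memory string so that the example runs without network access.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells, e.g. the header row
                self.rows.append(self._row)
            self._row = None


# Hypothetical page content; a real site will have its own layout.
SAMPLE_PAGE = """
<table>
  <tr><th>ID</th><th>Site</th><th>Signs</th></tr>
  <tr><td>X-001</td><td>Site One</td><td>A B C</td></tr>
  <tr><td>X-002</td><td>Site Two</td><td>B D</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_PAGE)
parser.close()

records = [
    {"id": row[0], "site": row[1], "signs": row[2].split()}
    for row in parser.rows
]
print(records)
```

For a live page, the HTML string could instead be obtained with `urllib.request.urlopen(url).read().decode()`, subject to the site's terms of use. The parsing rules would need to be adapted to each site's actual markup.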

Lecture Handout

Building upon the foundations laid in the previous talk, this presentation goes deeper into generative models such as GANs and diffusion models. Then we will take a deeper look at transformer models and large language models. We will examine the self-attention mechanism that underpins transformers, their applications in natural language processing, and the capabilities of models like GPT and BERT. The talk also explores the rapidly evolving field of generative AI, including text, image, and audio generation, and its potential implications.

Talk slides

In this talk we shall explore how probabilistic models have shaped our understanding of linguistic sequences, focusing in particular on Markov's work that prefigures that of Claude Shannon on the entropy of written English, as well as the work of Zipf on the frequency of words and the principle of least effort.

Talk slides

If we look closely enough, it becomes apparent that networks are everywhere, and it turns out that a wide range of physical, biological and social interactions can be elegantly encapsulated in terms of network descriptions. I will provide a general overview of network science - a field of study focused on extracting information encoded by such networks - and will demonstrate some associated concepts in an interactive tutorial session using the network visualization software Gephi.

Talk slides

This talk reviews the contributions of four major Indus text corpora and concordances, namely Hunter (1934), Mahadevan (1977), Parpola (1973-2010) and Wells (1998-present), to the textual and contextual analysis of the Indus script.

Talk slides

The Indus script has defied decipherment. The absence of a concrete understanding regarding its structure poses challenges in objectively assessing any purported decipherments. To address this gap, we have employed diverse computational techniques to analyze the structure of the Indus script. Our research aims to uncover patterns within Indus writing and investigate its fundamental principles without presupposing its content. In this presentation, I will provide an overview of our computational investigations into the Indus script.

Talk slides

We will describe some basic string-matching algorithms, and introduce evolution of DNA sequence and phylogenetics, then look at applications of the same ideas in linguistics.
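The abstract does not name specific algorithms. As one illustrative choice, assumed here and not drawn from the talk, the edit (Levenshtein) distance counts the minimum number of insertions, deletions and substitutions needed to turn one sequence into another. The same kind of measure can be applied to DNA strings, words, or lists of sign labels.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    # previous[j] is the distance between the items of `a` processed so far
    # (initially none) and the first j items of `b`.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # turning the first i items of `a` into nothing costs i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(
                previous[j] + 1,         # delete x
                current[j - 1] + 1,      # insert y
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance(["A", "B", "C"], ["A", "C"]))  # 1
```

The function keeps only two rows of the dynamic-programming table, so memory use grows with the length of the second sequence, not with the product of both lengths.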

Talk slides

The grammar of the Harappan Script is now fairly well understood. However, all interpretative models of the Harappan Script have fallen short in their consistency with its grammar. The cultural context of the Indus writing is also well understood. In the present talk, we will discuss the miniatures that were used by the Harappans to express themselves in a variety of ways. We will then discuss the larger issues of the time evolution of the Harappan Civilisation and look at the possible scenarios of how it evolved and changed. We will then summarise by discussing possible new avenues of research on the Harappan Script that can be pursued to gain better insights.

Talk slides

This is the concluding part of the talk whose first part was yesterday, focusing on how Shannon quantified information (measured by the unit of a "bit") using the concept of entropy and how it applies to language. We will also look at Zipf's law of abbreviation and the principle of least effort that he proposed to explain it.
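For reference, Shannon's entropy of a distribution with probabilities $p_i$ is $H = -\sum_i p_i \log_2 p_i$, measured in bits. The sketch below estimates it from the empirical symbol frequencies of a sequence. The function name `shannon_entropy` and the toy strings are assumptions for illustration, not material from the talk.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Entropy in bits per symbol of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # Sum -p * log2(p) over the observed symbols.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


uniform = shannon_entropy("abcd")   # four equally frequent symbols
skewed = shannon_entropy("aaab")    # one symbol dominates
constant = shannon_entropy("aaaa")  # no uncertainty at all

print(uniform, skewed, constant)
```

Four equally frequent symbols give exactly 2 bits per symbol, and the value falls towards 0 as the distribution becomes more predictable. This single-symbol estimate ignores dependencies between neighbouring symbols, which matter in real language.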

Talk slides

This study uncovers a universal pattern in language, revealing asymmetric sign distribution at word boundaries, and applies this insight to deduce the writing direction of undeciphered Indus inscriptions, showing its utility in archaeological decipherment.
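One simple way such a boundary asymmetry could be quantified is to compare how concentrated the signs are at the first and last positions of each sequence. This is offered purely as an illustration and not as the method used in the study. The entropy-based measure, the function names, and the toy sequences below are all assumptions.

```python
import math
from collections import Counter


def entropy_bits(items):
    """Entropy in bits of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def boundary_entropies(sequences):
    """Return (entropy of first-position signs, entropy of last-position signs)."""
    firsts = [seq[0] for seq in sequences if seq]
    lasts = [seq[-1] for seq in sequences if seq]
    return entropy_bits(firsts), entropy_bits(lasts)


# Toy sequences of hypothetical sign labels: varied beginnings, one common ending.
toy = [
    ["A", "B", "X"],
    ["C", "D", "X"],
    ["E", "B", "X"],
    ["F", "X"],
]

first_h, last_h = boundary_entropies(toy)
print(first_h, last_h)

# Reading every sequence in the opposite direction swaps the two values.
reversed_first_h, reversed_last_h = boundary_entropies([seq[::-1] for seq in toy])
print(reversed_first_h, reversed_last_h)
```

Because the two boundary values swap when the reading direction is reversed, a consistent difference between them carries information about direction. Which end is expected to be more concentrated in real writing systems is an empirical question that this sketch does not settle.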

Talk slides, Lecture Handout

Writing in historical India, almost coterminous with the use of Brāhmī script throughout the subcontinent, witnessed an essentially chequered trajectory of evolution from the third century BCE till the advent of proto-regional scripts in the seventh-eighth centuries. This talk is aimed at taking a tour through this route, focusing on factors behind and processes leading to these varying lines of development.

Talk slides

This talk will be a reading session on the most widely used proto-regional script of northern India, called Siddhamātr̥kā.

Talk slides

The study of Indian epigraphy was in its infancy in the late 18th century. The initial phase began with the discovery of inscriptions and their publication by European scholars, apparently with the help of Indian pundits. They appeared in journals like Asiatic Researches and Journal of the Asiatic Society (both published by the Asiatic Society, Calcutta, established in 1784).
The foundation of research in Indian epigraphy was thus laid, and in this process of building interest in the field, the name of Charles Wilkins (1749-1836) stands out most prominently. Wilkins was among the earliest scholars to decipher inscriptions at a time when those epigraphic records were unintelligible to others. When Wilkins began his study, epigraphs were not yet regarded as sources of history, and no one could read the script. How Wilkins managed to decipher inscriptions of the sixth, ninth and tenth centuries is still a wonder. This talk will focus on the methods of handling epigraphic data in the initial phase of Indian epigraphic studies (18th-19th centuries).

Talk slides

The study introduces a segmentation method to analyze linguistic sequences, revealing a universal pattern of recurring word segments across languages. It suggests inherent cognitive or phonotactic constraints in language processing and word composition.

Talk slides

Contact

The Institute of Mathematical Sciences
CIT Campus, Taramani, Chennai 600113, India
Phone: 044-22543301
Email: icel.chennai@gmail.com