Arun Waves

October 8, 2016

Indus Valley Civilization – The Script (2)- literature survey

In the last few weeks I have surveyed numerous articles related to the Indus Valley Civilization and its undeciphered script. These are great for gaining knowledge about the topic, but not much use for serious scientific analysis using my latest tool set ... Machine Learning.

Here is a compendium (pics are from the related links, so all credit goes to the authors of those links/articles):

1] Awesome pics:


2] Archaeological Survey of India’s original publications: Memoir #77 in

3] Read up on the history, not just pics of script:

4] More pics:


5] Ancient writing systems:

6] Book with a catalogue of pics:


For some real papers:




Here is some concise, relevant information:

  1. There are approximately 400 unique symbols
  2. Average length of an inscription is 5 to 6 symbols
  3. Maximum length of inscription is 26 symbols
  4. There is strong evidence that the direction of writing was from right to left
  5. Most of the symbols are unique and do not repeat
  6. There are no known cultural, political, religious, or linguistic ties to anything today, so no help from there
  7. No bilingual text has been discovered, like the Rosetta Stone, which helped crack the case of Egyptian hieroglyphs

August 7, 2016

Indus Valley Civilization – The Script

One of the few things that impressed me in my high school History class was the chapter that dealt with the Indus Valley Civilization, a 4,000-year-old network of cities located in what is today the general area of the India-Pakistan border. What impressed me was not how old it was but the evidence of advanced drainage and sanitary systems, broad roads that were straight and ran North-South/East-West, and buildings that were deliberately laid out in a systematic pattern. I did not know the technical term at that time, but here was clear evidence of Urban planning in a city that was built 4,000 years ago!!!


Indus Valley Civilization Script

The next thing that caught my attention was the numerous seals and tablets that contained interesting symbols, sometimes accompanied by human and animal motifs. And guess what, this script has not been deciphered yet!!! I can’t help but dramatize the whole thing ... imagine a person, 4,000 years ago, picks up a hard and sharp tool and carefully inscribes these symbols, which meant something to him just like these words mean something to you. The seal gets used for many years, and then one day someone lays it down one last time and no one touches it for thousands of years. What blows my mind is that 4,000 years later, people anywhere on this planet can see this inscription from the comfort of their homes, using technology that would be like magic to the person who inscribed this seal. And yet none of the 7 billion people can understand what was inscribed on that seal 4,000 years ago.


What does it say?

Anyhow, snapping out of the dramatization, why did I awaken this dormant knowledge and mystery? A few months ago, during my routine nocturnal excursions, or more accurately meanderings, on the web, I came across an article in Nature which reported on rare recent work done by Bryan Wells, who is an archaeologist, epigrapher and geographer. He holds a PhD in anthropology from Harvard University and works at a university in Germany. I call it ‘rare recent’ work since most of the work on the Indus Valley script is decades old and I believe the trail has gone cold. Around the same time, I was taking a serious interest in the field of Machine Learning and Artificial Intelligence, especially the Neural Network part which leads to Deep Learning. Well, the nocturnal brain, in its calm and meditative state, made a link between the two and I got hooked.

November 8, 2015

Big Data Analysis

Filed under: Data Analysis — Arun @ 1:36 am

My first attempt at Big Data Analysis!!! Disclaimer: it is ‘Big’ for what I have done so far, not ‘Big’ compared to industry standards. Although the data had only 30,000 rows with 4 columns, it exposed me to certain caveats that one needs to watch for while doing Data Analysis: for example, the importance of prepping the data, the need to understand the data before drawing any conclusions, and the risks of applying inappropriate statistical methods, like finding the average for a distribution with two peaks.
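To illustrate the two-peak caveat, here is a minimal Python sketch. The numbers are made up for illustration (they are not from the chat archive), and the actual analysis was done in Excel, not with this code. It shows how the average of a two-peaked distribution can land in a region where no real data point sits.

```python
from statistics import mean

# Hypothetical message lengths: one cluster of short replies
# and one cluster of long forwards (values invented for illustration).
short_msgs = [8, 10, 12, 9, 11]
long_msgs = [190, 200, 210, 205, 195]
lengths = short_msgs + long_msgs

overall = mean(lengths)

# Distance from the overall mean to the closest actual data point.
# A large value means the "average message" resembles no real message.
nearest_gap = min(abs(x - overall) for x in lengths)

print(f"mean of all lengths:   {overall}")
print(f"mean of short cluster: {mean(short_msgs)}")
print(f"mean of long cluster:  {mean(long_msgs)}")
print(f"gap between overall mean and nearest real value: {nearest_gap}")
```

Reporting one average per cluster, or simply plotting the histogram first, describes this kind of data far better than a single mean.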

The data is a WhatsApp chat archive of about 11 months from an active group. Excel was used for the analysis, with the exception of the Word Cloud, for which I used an online tool. This analysis also showed the importance of quantitative evidence and empirical data over what you feel or ‘think’; many of my assumptions were shattered 🙂

Here is what I found, enjoy (feel free to click the images to see a bigger version):

  1. Total number of messages over a period of 11 months (Nov 2014 to Sept 2015): 29,752 messages
  2. Unique number of members: 35
  3. Total number of media (pics, video, audio) files: 3,079 files
  4. Approximately every 10th msg is a pic/video/audio file
  5. People have typed messages of every length (no. of characters) from 1 to 256. The shortest length that no one has typed is 257 characters. Beyond that point the distribution becomes sparse.
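The "every 10th message" figure and the "shortest length no one has typed" can be checked with a few lines of Python. This is only a sketch: the totals are the ones quoted in the list above, while `shortest_missing_length` and the toy data are hypothetical names and values introduced here for illustration.

```python
def shortest_missing_length(lengths):
    """Return the smallest positive length that never occurs in `lengths`."""
    seen = set(lengths)
    n = 1
    while n in seen:
        n += 1
    return n


# Totals quoted in the list above.
total_messages = 29_752
media_files = 3_079

# How many messages there are per media file, on average.
messages_per_media = total_messages / media_files
print(f"one media file per {messages_per_media:.1f} messages")

# Toy data (invented): lengths 1, 2, 3 occur, 4 does not.
toy_lengths = [1, 2, 3, 5, 2, 1, 9]
print(shortest_missing_length(toy_lengths))
```

Applied to the real column of message lengths, the same function would return 257 if every length from 1 to 256 is present.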

6. Top message contributor …………. Flavin, the most talkative person in the group, followed by Mahalakshmi, Sriroopa.

Top Contributors

7. Top media contributor ... Mahalakshmi loves to forward pics/videos, followed by SK, Mahadevan.

Media Contributors

8. Message distribution over the span of a week: surprizeeeeee, they are evenly distributed over all 7 days of the week, with a slight reduction over the weekend 🙂 A very consistent group.

Messages per Day of the Week
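For anyone who wants to reproduce a per-weekday count outside Excel, here is a small Python sketch. It assumes the message timestamps have already been parsed into `datetime` objects; the function name and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def messages_per_weekday(timestamps):
    """Count messages for each day of the week (Monday first)."""
    counts = Counter(ts.weekday() for ts in timestamps)
    return {name: counts.get(i, 0) for i, name in enumerate(DAY_NAMES)}


# Invented sample: two messages on a Monday, one on a Saturday.
sample = [
    datetime(2014, 11, 24, 9, 30),
    datetime(2014, 11, 24, 21, 0),
    datetime(2014, 11, 29, 8, 15),
]
per_day = messages_per_weekday(sample)
print(per_day)
```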

9. What is the peak traffic time and when is the lull? The X-axis is in US time; the labels are in India time. People love to chat around 9:30 pm India time!! Why is there a dip at 12:30 pm India time?

Messages per hr of the day
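The US-time versus India-time mismatch on the chart can be avoided by converting timestamps before bucketing them by hour. The sketch below is an assumption-laden illustration: the post does not say which US time zone the export used, so `ASSUMED_US` is a hypothetical fixed UTC-5 offset that ignores daylight saving. India Standard Time is a fixed UTC+5:30 with no daylight saving, so a fixed offset is accurate there.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# India Standard Time: fixed UTC+5:30, no daylight saving.
IST = timezone(timedelta(hours=5, minutes=30))

# ASSUMPTION: the export is in a US zone at a fixed UTC-5 offset.
# The real zone is not stated in the post and may observe DST.
ASSUMED_US = timezone(timedelta(hours=-5))


def hourly_counts_in_ist(timestamps):
    """Count messages per hour of day after converting to India time."""
    return Counter(ts.astimezone(IST).hour for ts in timestamps)


# Invented sample timestamps, tagged with the assumed US offset.
sample = [
    datetime(2015, 1, 10, 11, 0, tzinfo=ASSUMED_US),   # 21:30 in India
    datetime(2015, 1, 10, 11, 20, tzinfo=ASSUMED_US),  # 21:50 in India
    datetime(2015, 1, 10, 2, 0, tzinfo=ASSUMED_US),    # 12:30 in India
]
hourly = hourly_counts_in_ist(sample)
print(sorted(hourly.items()))
```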

10. All top contributors, except Flavin, don’t care if it is a weekday or the weekend, they just type away, but Flavin throttles down over the weekend.

Top contributors’ performance over the span of a week

11. My favorite, the Word Cloud, a visual representation of word usage, the bigger the word, the more frequently it is used. Our group loves exchanging pleasantries ‘good morning friends’ and for some reason the word ‘Flavin’. Note: this Word Cloud excludes common words like ‘the’, ‘a’, ‘and’, ‘is’ etc.

Word Cloud
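The counting behind a word cloud is simple enough to sketch in Python. This is not how the online tool works internally (that is unknown); it is just a plain word-frequency count that skips common words. The stop-word set contains only the four examples named above, and the sample messages are invented.

```python
import re
from collections import Counter

# Only the examples named in the post; a real stop-word list is much longer.
STOP_WORDS = {"the", "a", "and", "is"}


def word_frequencies(messages, stop_words=STOP_WORDS):
    """Count lowercase words across all messages, skipping stop words."""
    counts = Counter()
    for msg in messages:
        words = re.findall(r"[a-z']+", msg.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


# Invented sample messages.
sample = ["Good morning friends", "good morning Flavin", "The weather is good"]
freqs = word_frequencies(sample)
top = freqs.most_common(2)
print(top)
```

Passing an empty set as `stop_words` gives the all-inclusive count instead.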

12. All-inclusive list of frequently used words, without any exclusions. ‘Flavin’ is the only name that makes it into this list. Why does everyone mention him by name?

Most commonly used words

13. We also love our emoticons. Ignore the 16th bar.

Message sizes

14. Longest message award goes to ……………….. Mahadevan. How long you ask? 23,412 characters, one giant forwarded msg on 24th Nov, 2014. Do click the image to see it in its full glory.

Message length distribution

<<< THE END >>>

It took me about 20 hrs to do this analysis; for any more data analysis, please be ready to pay me 😎
