Can data science help in smart governance? We often hear about big data and how it seemingly controls everything around us. Often, its influence on our lives is opaque and invisible. Data informs and even dictates our decisions, from mundane activities such as what we watch, read and listen to, how we shop for groceries and which route we take, to career and medical decisions, credit applications and social interactions.

Governments across the world are increasingly using such citizen-generated data to improve the efficiency of government processes. Telangana, India’s youngest state, has tapped into the big data available within government and is actively using analytics to meet its stated e-governance objectives. The goal is four-fold: increasing revenue, focused subsidy delivery, smart law enforcement and document-free governance.

This blog elaborates on one such initiative – “How to detect second vehicle registration”. The information presented here is from an impressive talk I attended at the GartnerDA Summit in Mumbai, delivered by Mr. G T Venkateshwar Rao IRS, Commissioner ESD & Special Commissioner e Governance, Government of Telangana, supplemented with information from public sources. Full profile of Mr. Rao can be found here.

The Problem

 

On 2nd January 2008, a law was enacted to levy an additional 2% life tax on every second personal vehicle registered: 14% for a second vehicle versus 12% for the first. The objective was to reduce traffic congestion on the roads, which in turn helps curtail pollution levels.
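
To put the two rates side by side, here is a minimal arithmetic sketch. It assumes the life tax is a flat percentage of the vehicle price, which is a simplification made for illustration; the function name and the example price are hypothetical and not from the talk.

```python
# Rates quoted above: 12% for a first vehicle, 14% for a second one.
FIRST_VEHICLE_RATE_PERCENT = 12
SECOND_VEHICLE_RATE_PERCENT = 14


def life_tax(price_rupees, is_second_vehicle):
    """Life tax in rupees, assuming a flat percentage of the price (an assumption)."""
    if is_second_vehicle:
        rate = SECOND_VEHICLE_RATE_PERCENT
    else:
        rate = FIRST_VEHICLE_RATE_PERCENT
    return price_rupees * rate // 100


# Hypothetical vehicle priced at 10 lakh rupees.
price = 1_000_000
extra_tax = life_tax(price, True) - life_tax(price, False)
print(extra_tax)  # 20000
```

Under this assumption, every second vehicle wrongly declared as a first vehicle costs the state 2% of that vehicle's price.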

The state had not been able to enforce the law effectively. Past vehicle registrations were not linked to any ID such as Aadhaar or PAN, and the details buyers provided were often misleading, making it impossible to check whether the buyer of a new vehicle already owned another vehicle. Hence, the twin objectives of the legislation, reducing congestion on the roads and bumping up revenues, remained largely unmet.

During registration of a vehicle, a citizen was expected to give a self-declaration stating that it was their first vehicle. The only tool available to the assessing officer processing the application was a “Search Solution”. The process worked as follows:

  1. The search took the citizen’s Name, Father’s Name, Date of Birth and residential address as inputs.
  2. The search operation took 4 to 5 minutes on average to show results.
  3. The results shown ran into the hundreds.
  4. This forced the assessing officer to perform manual verification.
  5. That caused arguments between applicant and officer, leading to the officer deciding it was the first vehicle registration.
  6. The average transaction time was 9 to 10 minutes per vehicle registration.
  7. End result – Many instances of tax evasion, as ownership of existing vehicles was not detected.

The Search Problem

The “Search Solution” had limitations in identifying different patterns in how the Name and Address data was entered in each record.

Name

A name could be entered in multiple ways. E.g. my full legal name is Sundeep Reddy Mallu. While filling in the form, I could get creative in the following ways:

  1. Spelling – Sandeep Reddy Malu
  2. Abbreviation – Sundeep R M or Sundeep R Mallu
  3. Sequence Variation – Mallu Sundeep Reddy or R M Sundeep
  4. Additions/ Deletions – Sundeep Mallu
  5. Splitting or Combining – Sundeepreddy Mallu or ReddyMallu Sundeep
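
To make these variations concrete, here is a minimal sketch of an order-insensitive, typo-tolerant name comparison. It is not the algorithm the department deployed; the function names, the prefix rule and the 0.8 threshold are assumptions chosen for illustration. It uses only the Python standard library.

```python
import re
from difflib import SequenceMatcher


def name_tokens(name):
    """Lowercase the name, drop punctuation and return its tokens in sorted order."""
    cleaned = re.sub(r"[^a-z\s]", " ", name.lower())
    return sorted(cleaned.split())


def token_matches(a, b, threshold=0.8):
    """Tokens match if one is a prefix of the other (initials, combined
    words) or if their spellings are close enough."""
    if a.startswith(b) or b.startswith(a):
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold


def name_similarity(left, right):
    """Fraction of tokens in the shorter name that find a partner in the longer one."""
    short, long_ = sorted((name_tokens(left), name_tokens(right)), key=len)
    if not short:
        return 0.0
    remaining = list(long_)
    hits = 0
    for tok in short:
        for cand in remaining:
            if token_matches(tok, cand):
                remaining.remove(cand)  # each token can be paired only once
                hits += 1
                break
    return hits / len(short)


full_name = "Sundeep Reddy Mallu"
for variant in ["Sandeep Reddy Malu", "Sundeep R M", "Mallu Sundeep Reddy",
                "Sundeep Mallu", "Sundeepreddy Mallu", "ReddyMallu Sundeep"]:
    print(variant, name_similarity(full_name, variant))
```

Sorting the tokens takes care of sequence variation, the prefix rule covers abbreviations and combined words, and the spelling ratio absorbs small typos. The prefix rule is deliberately loose and would produce false matches on real data, so a production system would need stricter rules and more fields than the name alone.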

Address

An address can be entered in different ways. With no prescribed standard format, it’s a lot trickier.

E.g. H.No 9/2, Road No: 10, Durga bhavani nagar, New Santosh Nagar, Hyderabad, Telangana 500059 is the complete address.

While filling in the form, I could get creative in the following ways:

  1. No House Number: Road No: 10, Durga bhavani nagar, New Santosh Nagar, Hyderabad, Telangana 500059
  2. No Street Name: H.No 9/2, Durga bhavani nagar, New Santosh Nagar, Hyderabad, Telangana 500059
  3. Skip Locality: H.No 9/2, Road No: 10, Hyderabad, Telangana 500059
  4. Skip Zipcode: H.No 9/2, Road No: 10, Durga bhavani nagar, New Santosh Nagar, Hyderabad, Telangana
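
One simple way to compare addresses with missing parts is to treat each address as a set of tokens and measure how much of the shorter one is contained in the longer one (the overlap coefficient). The sketch below is only an illustration of that idea, not the deployed solution; the filler-token list and function names are assumptions, and the second address in the usage example is made up.

```python
import re

# Hypothetical list of filler tokens that carry little identifying information.
FILLER_TOKENS = {"h", "no", "hno", "road", "rd"}


def address_tokens(address):
    """Lowercase, keep letters, digits and '/', and drop filler tokens."""
    cleaned = re.sub(r"[^a-z0-9/]+", " ", address.lower())
    return {tok for tok in cleaned.split() if tok not in FILLER_TOKENS}


def address_overlap(left, right):
    """Overlap coefficient: shared tokens divided by the size of the smaller set."""
    a, b = address_tokens(left), address_tokens(right)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


full = ("H.No 9/2, Road No: 10, Durga bhavani nagar, New Santosh Nagar, "
        "Hyderabad, Telangana 500059")
skipped_locality = "H.No 9/2, Road No: 10, Hyderabad, Telangana 500059"
other = "Flat 4, Road No: 3, Banjara Hills, Hyderabad, Telangana 500034"  # made up

print(address_overlap(full, skipped_locality))  # 1.0
print(address_overlap(full, other))             # 0.25
```

Because any subset scores 1.0, a very short address such as just "Hyderabad, Telangana" would match everything in the city. This kind of score is only useful in combination with other fields.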

Date format

Mixing up the day and the month in the date format – mm/dd/yyyy vs dd/mm/yyyy
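
A tolerant comparison can treat an ambiguous date as a small set of candidates rather than a single value. The following sketch, with hypothetical function names, parses a string under both formats and considers two entries compatible if any interpretation coincides. It relies only on `datetime.strptime` from the standard library.

```python
from datetime import datetime

DATE_FORMATS = ("%d/%m/%Y", "%m/%d/%Y")


def candidate_dates(text):
    """Return every calendar date the string could mean under either format."""
    found = set()
    for fmt in DATE_FORMATS:
        try:
            found.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # not a valid date under this format
    return found


def dates_may_match(left, right):
    """True if the two strings share at least one possible interpretation."""
    return bool(candidate_dates(left) & candidate_dates(right))


print(len(candidate_dates("02/01/2008")))           # 2 (2 January or 1 February)
print(dates_may_match("25/12/1980", "12/25/1980"))  # True
print(dates_may_match("25/12/1980", "24/12/1980"))  # False
```

Dates whose day is greater than 12 are unambiguous, so the ambiguity only affects part of the data.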

Big Data Problem

The Search Solution was up against a classic big data problem, with its 3 V’s:

  1. Volume – There are over 1 crore (10 million) registered vehicles on the road
  2. Velocity – 5000 to 6000 new registrations each day across the state. Mind you, this is during working hours and not the entire day.
  3. Variety – There is no defined standard for how a Name or Address has to be entered

The New Search Algorithm

Mr. Rao and the team at the Transportation Department approached the problem of identifying second vehicle registrations in two steps. First, as part of a greater initiative, a data lake was built that combined information from silos in 20 different government agencies. The aim was to have a 360-degree view of every citizen using big data and entity resolution, a technique that helps remove duplicate records without depending on any unique ID like PAN or Aadhaar. This was named the Integrated People Information Hub software.
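
To illustrate what entity resolution means in practice, here is a toy sketch that links records from different sources into one cluster per person without any shared unique ID. The records, field names, matching rule and 0.85 threshold are all invented for illustration; the talk did not describe the internals of the actual software.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Made-up records from three hypothetical sources; all values are placeholders.
records = [
    {"id": "transport-1", "name": "Sundeep Reddy Mallu", "dob": "1980-01-01", "phone": "9000000001"},
    {"id": "voter-7", "name": "Sandeep Reddy Malu", "dob": "1980-01-01", "phone": None},
    {"id": "utility-3", "name": "S R Mallu", "dob": None, "phone": "9000000001"},
    {"id": "voter-9", "name": "Ramesh Kumar", "dob": "1975-11-02", "phone": "9000000002"},
]


def same_person(a, b):
    """Assumed rule: same phone, or same date of birth plus a similar name."""
    if a["phone"] and a["phone"] == b["phone"]:
        return True
    if a["dob"] and a["dob"] == b["dob"]:
        ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
        return ratio >= 0.85
    return False


def resolve(recs):
    """Group record ids into clusters using union-find over matching pairs."""
    parent = list(range(len(recs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(recs)), 2):
        if same_person(recs[i], recs[j]):
            parent[find(i)] = find(j)

    clusters = {}
    for i, rec in enumerate(recs):
        clusters.setdefault(find(i), []).append(rec["id"])
    return sorted(clusters.values())


print(resolve(records))
# [['transport-1', 'voter-7', 'utility-3'], ['voter-9']]
```

Note that `voter-7` and `utility-3` share no field directly; they end up together because both match `transport-1`. Comparing every pair is quadratic, so at the scale of a crore of records a real system would first narrow down candidate pairs, for example by grouping on a cheap key.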

A new software algorithm was then designed to cross-check the vehicle owner’s name, father’s name, date of birth, residential address, mobile number and Aadhaar number, to find whether any other vehicle exists under any of these names or numbers. Before the new algorithm was built, a detailed specification was created of what it should do:

  1. High Precision – Should have high matching accuracy and never miss a valid match (in information-retrieval terms, this requirement is high recall).
  2. No False Positives – High negative-match accuracy. Return as few false positives as possible.
  3. Quick Response – 3 second SLA for search response
  4. Rank results – Rank the top 4 or 5 best matches
  5. Dynamic Rules – Ability to define multiple rule combinations for identifying matches
  6. Handle Variations – Handle all variations in Name, Address and Date of Birth values
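
The "Dynamic Rules" and "Rank results" requirements can be pictured as a table of rule combinations, each with a weight, and a top-N selection over the scored candidates. The sketch below is an assumption about how such a design could look, not the vendor's implementation; the rules, weights, field names and sample records are hypothetical, and fields are compared by exact equality to keep the example short.

```python
import heapq

# Hypothetical rule table: every field in a rule must agree for the rule to fire.
RULES = [
    (("aadhaar",), 1.0),
    (("name", "father_name", "dob"), 0.8),
    (("name", "address"), 0.6),
]


def score(query, candidate, rules=RULES):
    """Return the weight of the strongest rule whose fields all agree."""
    best = 0.0
    for fields, weight in rules:
        if all(query.get(f) and query.get(f) == candidate.get(f) for f in fields):
            best = max(best, weight)
    return best


def top_matches(query, candidates, limit=5):
    """Rank candidates by score and keep the best `limit` non-zero matches."""
    scored = [(score(query, c), c["id"]) for c in candidates]
    return heapq.nlargest(limit, [s for s in scored if s[0] > 0])


query = {"name": "sundeep reddy mallu", "father_name": "f", "dob": "1980-01-01",
         "address": "addr-1", "aadhaar": "0000-0000-0001"}
existing = [
    {"id": "V1", "aadhaar": "0000-0000-0001"},
    {"id": "V2", "name": "sundeep reddy mallu", "father_name": "f", "dob": "1980-01-01"},
    {"id": "V3", "name": "sundeep reddy mallu", "address": "addr-1"},
    {"id": "V4", "name": "ramesh kumar", "address": "addr-9"},
]

print(top_matches(query, existing))
# [(1.0, 'V1'), (0.8, 'V2'), (0.6, 'V3')]
```

Keeping the rules in data rather than in code is one way to let new combinations be added without changing the search logic. In a real system the equality check would be replaced by per-field comparisons that tolerate the name, address and date variations described earlier.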

Execution & Results

 

A tender was floated soliciting bids for implementing the new search algorithm. The shortlisted software vendors had to build a prototype of their solution in a 3-week period and prove their model’s accuracy against 40 lakh (4 million) voter ID records from the state. The implementation and deployment were completed within a month of the contract award.

There has been a significant increase in the detection of second vehicles since the new search algorithm was rolled out. Less time spent on search translates to faster completion of vehicle registration and fewer arguments between citizen and assessing officer. Below is the summary of results from a comparable time period.

The answer to the question we asked in the title – It’s a resounding YES.

The new search algorithm now fulfils the first goal of the law, an increase in revenue, but doesn’t address the reduction in traffic yet :-). That is for a different Big Data problem. We wish to see more such successful uses of Big Data & Analytics by Government.

It was our pleasure to meet Mr. Rao after the chat for a lively conversation.

Please let us know what you think