This is an analysis of the voters list for the entire city of Bangalore. The database holds 4 years' worth of voter records, with over 6 million records per year. For each voter, the database captures the following fields:
Field | Description |
---|---|
AC[0-9]+ | A combination of the Assembly Constituency (AC) code and the part number |
[0-9]+ | A 3-digit code |
[0-9]+ | Serial number |
[A-Z]+[0-9]+ | Voter ID number, a.k.a. EPIC number |
[A-Z]+ | Voter's name |
[FM] | Voter's gender |
[0-9]+ | Voter's age |
[A-Z]+ | Name of the voter's relative |
[FH] | Relationship with the relative |
[A-Z] | Status: added, deleted, moved, etc. |
R cannot load the full data set and crashes with a memory limit message. So, as a first step, we split each year's data into separate AC-wise data files. For the purpose of this experiment, we take the voter list released in January 2013 as our reference voter list.
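The split can be done without ever holding a full year in memory by streaming the file one row at a time. The following is a minimal Python sketch of that idea, not the code used for this analysis. It assumes the raw data is a header-less CSV whose first column is the combined AC and part number (for example `AC157024`), and that the AC code is the first five characters of that value. The function name `split_by_ac` and the output file naming are hypothetical.

```python
import csv
import os


def split_by_ac(src_path, out_dir):
    """Stream src_path row by row and write each row to <out_dir>/<AC>.csv.

    Assumption: column 0 looks like "AC157024", where "AC157" is the AC
    code and "024" is the part number.
    Returns a dict mapping AC code -> number of rows written.
    """
    os.makedirs(out_dir, exist_ok=True)
    writers = {}  # AC code -> (file handle, csv writer)
    counts = {}
    try:
        with open(src_path, newline="", encoding="utf-8") as src:
            for row in csv.reader(src):
                if not row:
                    continue  # skip blank lines
                ac = row[0][:5]
                if ac not in writers:
                    path = os.path.join(out_dir, ac + ".csv")
                    handle = open(path, "w", newline="", encoding="utf-8")
                    writers[ac] = (handle, csv.writer(handle))
                writers[ac][1].writerow(row)
                counts[ac] = counts.get(ac, 0) + 1
    finally:
        # Close every per-AC file even if reading fails part-way.
        for handle, _ in writers.values():
            handle.close()
    return counts
```

Only one row is in memory at any time, and the number of open files is bounded by the number of ACs, so each resulting file should be small enough to load on its own.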
To validate the robustness of the dataset we can try to answer the following questions.
Institutions like IIM, IISc, and medical and engineering colleges have a large number of outstation students living in hostels. Under ECI rules these students are eligible for a voter ID. The question we are trying to answer is: did they apply for and get their voter IDs? Since professors also live on campus, we assume a professor-to-student ratio of 1:20. Many professors will also have their children living with them, so some of those children may ‘leak’ into the student category. We check the part-wise ratio and compare it with the ratio for the rest of the AC to determine whether IISc polling booths can be identified.
minage | maxage | type |
---|---|---|
20 | 27 | Students |

type | part_max | max | part_min | min |
---|---|---|---|---|
Students | AC157024 | 80 | AC157027 | 7.097 |
Looking up the parts with a high percentage of youth relative to citizens above 27 (AC157024 = 78% and AC157025 = 46%) gives us MSR Hostel. So we did manage to find an educational institution, but not the one we expected to find.
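The part-versus-rest-of-AC comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original computation: the share is taken as the percentage of all voters in a part who fall in the 20 to 27 age band, the AC code is assumed to be the first five characters of the part ID, and the function name is hypothetical.

```python
from collections import defaultdict


def compare_parts_with_rest_of_ac(voters, min_age=20, max_age=27):
    """Return {part: (share_in_part, share_in_rest_of_ac)} in percent.

    voters: iterable of (part_id, age) pairs, e.g. ("AC157024", 21).
    Assumption: the AC code is the first five characters of part_id.
    The second value is None when the AC has no other parts.
    """
    part_total, part_band = defaultdict(int), defaultdict(int)
    ac_total, ac_band = defaultdict(int), defaultdict(int)
    for part, age in voters:
        ac = part[:5]
        in_band = 1 if min_age <= age <= max_age else 0
        part_total[part] += 1
        part_band[part] += in_band
        ac_total[ac] += 1
        ac_band[ac] += in_band

    result = {}
    for part, total in part_total.items():
        ac = part[:5]
        # Everything in the same AC except this part.
        rest_total = ac_total[ac] - total
        rest_band = ac_band[ac] - part_band[part]
        part_pct = 100.0 * part_band[part] / total
        rest_pct = 100.0 * rest_band / rest_total if rest_total else None
        result[part] = (part_pct, rest_pct)
    return result
```

A part whose share is far above the share for the rest of its AC is a candidate hostel or campus booth and can then be looked up by hand.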
We define a youth center as a place with a significant population between 18 and 25, and a senior citizen facility as a place with a significant population above 58.
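These two definitions can be turned into a per-part share and then summarized per AC. The sketch below is in Python and is only an illustration: the age cut-offs come from the definitions above, while the function names, the input shape and the rule that the AC code is the first five characters of the part ID are assumptions. The summary keys mirror the `part_max`, `max`, `part_min` and `min` columns used in this post.

```python
def is_youth(age):
    return 18 <= age <= 25


def is_senior(age):
    return age > 58


def share_by_part(voters, in_group):
    """Return {part: percentage of voters for which in_group(age) is true}.

    voters: iterable of (part_id, age) pairs.
    """
    total, hits = {}, {}
    for part, age in voters:
        total[part] = total.get(part, 0) + 1
        hits[part] = hits.get(part, 0) + (1 if in_group(age) else 0)
    return {part: 100.0 * hits[part] / total[part] for part in total}


def extreme_parts_per_ac(shares):
    """For each AC, find the parts with the highest and lowest share.

    Assumption: the AC code is the first five characters of the part ID.
    """
    by_ac = {}
    for part, share in shares.items():
        by_ac.setdefault(part[:5], []).append((share, part))
    summary = {}
    for ac, items in by_ac.items():
        hi_share, hi_part = max(items)
        lo_share, lo_part = min(items)
        summary[ac] = {
            "part_max": hi_part, "max": hi_share,
            "part_min": lo_part, "min": lo_share,
        }
    return summary
```

Sorting the summary by `max` or by `min` then gives top and bottom lists of the kind shown in this section.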
The top 5 ACs by concentration of youth in one of their parts are:
part_max | max |
---|---|
AC176289 | 37.14 |
AC160097 | 37.91 |
AC152081 | 42.61 |
AC172092 | 49.15 |
AC157024 | 80.49 |
The bottom 5 ACs, with the lowest concentration of youth in one of their parts, are:
part_min | min |
---|---|
AC166191 | 0.6452 |
AC161005 | 1.2500 |
AC160135 | 1.3889 |
AC169133 | 2.0997 |
AC164003 | 2.2388 |
Similarly, there are some areas where senior citizens form a major part of the voters and some areas with almost no senior citizens.
The top 5 ACs by concentration of senior citizens in one of their parts are:
part_max | max |
---|---|
AC159177 | 35.21 |
AC158099 | 35.29 |
AC161148 | 39.08 |
AC157092 | 41.79 |
AC160159 | 42.08 |
The bottom 5 ACs, with the lowest concentration of senior citizens in one of their parts, are:
part_min | min |
---|---|
AC174172 | 0.6494 |
AC176174 | 1.5365 |
AC155243 | 1.7214 |
AC175155 | 2.4010 |
AC172205 | 2.4590 |
Since marital status is easier to detect for females (the relationship field shows whether the listed relative is the husband), we check the age of marriage.
## "","x"
## "1","ac_maritial_status.csv"
We see that, depending on the AC, 12-25% of females are married at the age of 18 (it is surprising that not a single AC is at 0% at 18). The share grows exponentially until it peaks at around 37. Around 90 we again start seeing a relation other than husband being used.
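A married-share-by-age curve like the one described above could be computed along these lines. This is a Python sketch under assumptions, not the original code: it treats a relationship code of `H` as a proxy for "married", takes gender `F` as female, and expects each voter as a `(gender, age, relation)` tuple. The function name is hypothetical. To get one curve per AC, the same function would be run on each AC-wise file.

```python
from collections import defaultdict


def married_share_by_age(voters):
    """Return {age: percentage of women whose listed relative is a husband}.

    voters: iterable of (gender, age, relation) tuples.
    Assumption: relation == "H" is used as a proxy for being married.
    """
    total = defaultdict(int)
    married = defaultdict(int)
    for gender, age, relation in voters:
        if gender != "F":
            continue  # only women are considered here
        total[age] += 1
        if relation == "H":
            married[age] += 1
    return {age: 100.0 * married[age] / total[age] for age in sorted(total)}
```

The proxy undercounts married women who are still listed under their father's name, so the curve is better read as a lower bound.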
Checking the male:female ratio, we see that at the age of 18 there are 10% more males than females. The gap falls to nearly 0% by the age of 30, after which it stabilizes at 5% more males until the age of 60-70. From the age of 90+ the male % starts falling and reaches 0% by the age of 100. Past 100 the % goes negative, but this may be due to incorrect record keeping by the EC, or it may be a statistical anomaly caused by the low voter count in this age range.
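The text does not spell out how the "% more males" figure is calculated. One plausible reading is the male surplus relative to the female count at each age, that is, 100 × (M − F) / F. The Python sketch below implements that reading. Both the formula and the function name are assumptions.

```python
from collections import Counter


def male_excess_by_age(voters):
    """Return {age: 100 * (males - females) / females}.

    voters: iterable of (gender, age) pairs with gender "M" or "F".
    Assumed formula; the value is None where there are no women at an age.
    """
    males, females = Counter(), Counter()
    for gender, age in voters:
        if gender == "M":
            males[age] += 1
        elif gender == "F":
            females[age] += 1
    excess = {}
    for age in sorted(set(males) | set(females)):
        f = females[age]
        excess[age] = 100.0 * (males[age] - f) / f if f else None
    return excess
```

With very few voters at an age, a single record can swing the value by a large amount, which fits the caution about the 100+ range.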
The typical childbearing age of a female is up to 32. We assume that women have their first child by 27 and their second by 30-32 years.
So subtracting 27 from the age of all women above 27 gives the approximate age of the first child, and subtracting 31 gives the approximate age of the second child. We use this data to identify ACs which require pre-schools, schools, and PU colleges. We project the requirement for the next 3 years.
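The estimate can be sketched in Python as below. The offsets of 27 and 31 come from the text. The age bands for each category are not given in this post, so the values in `BANDS` are placeholders chosen only for illustration, and the function names are hypothetical.

```python
from collections import Counter

FIRST_CHILD_OFFSET = 27   # from the text
SECOND_CHILD_OFFSET = 31  # from the text

# Assumed age bands (inclusive); the post does not state the real cut-offs.
BANDS = {"Preschool": (3, 5), "School": (6, 15), "PU": (16, 17)}


def estimated_child_ages(mother_age):
    """Approximate ages of the first and second child for one woman."""
    offsets = (FIRST_CHILD_OFFSET, SECOND_CHILD_OFFSET)
    return [mother_age - off for off in offsets if mother_age - off > 0]


def demand_by_category(women_ages, years_ahead=0):
    """Count estimated children per category, years_ahead years from now."""
    counts = Counter()
    for age in women_ages:
        for child_age in estimated_child_ages(age):
            projected = child_age + years_ahead
            for name, (low, high) in BANDS.items():
                if low <= projected <= high:
                    counts[name] += 1
    return dict(counts)
```

Calling `demand_by_category` on the women's ages of one AC with `years_ahead` set to 0, 1, 2 and 3 gives a rough projection; ranking ACs by the resulting counts then yields lists like the ones that follow.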
The top 5 ACs which need to invest in pre-schools are:
AC | category | count | Standard.Deviation |
---|---|---|---|
AC175 | Preschool | 30210 | 0.8295 |
AC151 | Preschool | 30929 | 0.8326 |
AC155 | Preschool | 31036 | 0.8404 |
AC174 | Preschool | 33867 | 0.8311 |
AC176 | Preschool | 41515 | 0.8365 |
The top 5 ACs which need to invest in PU colleges are:
AC | category | count | Standard.Deviation |
---|---|---|---|
AC174 | PU | 16923 | 0.7950 |
AC154 | PU | 16997 | 0.7961 |
AC152 | PU | 17124 | 0.7985 |
AC153 | PU | 17252 | 0.8063 |
AC176 | PU | 21066 | 0.7985 |