Child Abuse Database
by Roedy Green ©1996-2009 Canadian Mind Products
This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested
student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object,
specifications, file layouts or anything else useful to implementing this project. Everything I have to say to help you
with this project is written below. I am not prepared to help you implement it; I have too many other projects of my
own. I do contract work for a living, which could include writing a program such as this. However, I don’t do people’s
homework for them. That just robs them of an education.
You have my full permission to implement this project in any way you please and
to keep all the profits from your endeavor.
The program would track suspicious incidents that are not conclusive in themselves but that, viewed in totality, show a
picture of imminent danger.
This is the brainchild of Alexander Investigations, run by Glen Morrison, an ex-cop and private investigator. By tying
together reports based on numerous small details which indicate they may refer to the same person, you can build a more
complete picture of a suspect.
Imagine computing a correlation matrix of every report against every other report by how many points of similarity they
had. You might then be able to group reports into related families. It would greatly help an investigator if he or she
had to look only at one family of reports at a time. The computer might then report the largest families as ones most
worthy of deeper investigation.
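As a rough illustration of the family idea, here is a minimal sketch. It assumes each report has already been boiled
down to a set of encoded trait strings, that a point of similarity is simply a shared trait, and that two reports belong
to the same family when they share at least a threshold number of traits, directly or through a chain of other reports.
The class name ReportFamilies, the trait strings and the threshold are all hypothetical. A real system would use the
weighted scoring described later instead of a plain count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: group reports into families by points of similarity. */
final class ReportFamilies {

    /** Convenience for building one report's set of encoded traits. */
    static Set<String> traits(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    /** Points of similarity: how many encoded traits two reports share. */
    static int similarity(Set<String> a, Set<String> b) {
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        return shared.size();
    }

    /** Scores every report against every other report. */
    static int[][] similarityMatrix(List<Set<String>> reports) {
        int n = reports.size();
        int[][] m = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                m[i][j] = m[j][i] = similarity(reports.get(i), reports.get(j));
            }
        }
        return m;
    }

    /** Union-find root lookup with path halving. */
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /** Links reports scoring at least threshold; returns families of report indexes, largest first. */
    static List<List<Integer>> families(int[][] m, int threshold) {
        int n = m.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (m[i][j] >= threshold) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }
        Map<Integer, List<Integer>> byRoot = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            byRoot.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        List<List<Integer>> result = new ArrayList<>(byRoot.values());
        result.sort((x, y) -> Integer.compare(y.size(), x.size()));
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> reports = Arrays.asList(
            traits("red hair", "beard", "blue van"),
            traits("red hair", "beard", "baseball cap"),
            traits("beard", "blue van", "glasses"),
            traits("black hair", "tattoo"));
        // prints [[0, 1, 2], [3]]
        System.out.println(families(similarityMatrix(reports), 2));
    }
}
```

Note that reports 1 and 2 share only one trait, yet land in the same family because each is close to report 0. That
chaining is a property of this particular grouping rule. Whether it is desirable is something you would find out by
trying it on sample reports.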
If a very serious incident were reported, such as an attempted abduction, an investigator could ask to see all reports
closely related to that crucial one. One of them might contain something very valuable like a license number or very
detailed physical description. Perhaps one of the suspect’s frequent haunts would become clear. From those aggregate
reports, the investigator might be able to put together a picture of the suspect’s movements over time, and use
that to help gather more information.
The main intent is to deal with anonymous sexual predators, but the system might also work to report incidents of
physical abuse by parents. By seeing a large number of reports of a pattern of abuse from many different witnesses,
authorities might be more willing to act in a case where the parents are good liars and pass off the results of abuse as
accident proneness.
Confidentiality is very important. The whole project smacks of "Big Brother is watching you". It treads on
moral thin ice. Most of the reports in the database will be completely innocent. A father may be reported as lurking at
a playground when he was just waiting for his daughter to finish her swimming lesson. Ideally some group independent of
the police would run the database. The database would never be stored on a machine with an Internet connection and it
would need to be stored in an encrypted format. Since sexual child abusers may move from city to city, ideally there
should be ways of consolidating databases or exchanging information between databases for different cities. You might
permit reporting suspicious events via the Internet, but you would not want the main database online.
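One way the encrypted storage requirement might be approached is sketched below using the standard javax.crypto classes
with AES in GCM mode. The class name ReportVault, the 128-bit key size and the layout of the stored bytes (IV followed
by ciphertext) are assumptions made for illustration. The hard part, namely where the key lives and who is allowed to
use it, is not addressed here at all.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch: encrypt one serialized report before it is written to disk. */
final class ReportVault {
    private static final int IV_BYTES = 12;   // IV length commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Generates a fresh AES key; the key size is an assumption. */
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    /** Returns a random IV followed by the ciphertext and authentication tag. */
    static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // a new IV for every record
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] sealed = cipher.doFinal(plain);
        byte[] stored = new byte[IV_BYTES + sealed.length];
        System.arraycopy(iv, 0, stored, 0, IV_BYTES);
        System.arraycopy(sealed, 0, stored, IV_BYTES, sealed.length);
        return stored;
    }

    /** Reverses encrypt; throws if the stored bytes were altered. */
    static byte[] decrypt(SecretKey key, byte[] stored) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
        return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        byte[] stored = encrypt(key, "blue van, partial plate".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, stored), StandardCharsets.UTF_8));
    }
}
```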
What sort of data might you potentially record about a suspicious incident? Here is a very rough guide of the sorts of
fields that would be needed. To refine the fields and multiple choice value encodings, you need to try encoding a few
hundred sample suspicious event reports.
Incident Report
- date/time called in.
- date/time of suspicious event.
- name of reporter (optional).
- gender of reporter.
- phone, address, contact information of reporter (optional).
- type of location of suspicious occurrence: playground, school, park, residence, other.
- street address of suspicious occurrence.
- grid location (e.g. latitude, longitude). Ideally this could be deduced automatically from a street address.
- what happened that made you suspicious (encodings needed).
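The incident fields above might map onto a Java class along the following lines. This is only a sketch. The names
IncidentReport and LocationType, the choice of which fields go in the constructor, and the rule that an event cannot
postdate the call reporting it are my assumptions, not a worked-out file layout.

```java
import java.time.LocalDateTime;
import java.util.Collections;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

/** Location types from the field list; the enum itself is hypothetical. */
enum LocationType { PLAYGROUND, SCHOOL, PARK, RESIDENCE, OTHER }

/** Hypothetical holder for one incident report. */
final class IncidentReport {
    final LocalDateTime calledIn;
    final LocalDateTime occurred;
    final String reporterGender;
    final LocationType locationType;
    final String streetAddress;
    final Set<String> reasonKeywords; // "what happened" encodings, still to be designed
    private String reporterName;      // optional: reporters may stay anonymous
    private String reporterContact;   // optional
    private Double latitude;          // null until the address has been geocoded
    private Double longitude;

    IncidentReport(LocalDateTime calledIn, LocalDateTime occurred, String reporterGender,
                   LocationType locationType, String streetAddress, Set<String> reasonKeywords) {
        if (occurred.isAfter(calledIn)) { // assumed sanity check
            throw new IllegalArgumentException("event cannot postdate the call reporting it");
        }
        this.calledIn = calledIn;
        this.occurred = occurred;
        this.reporterGender = reporterGender;
        this.locationType = locationType;
        this.streetAddress = streetAddress;
        this.reasonKeywords = Collections.unmodifiableSet(new TreeSet<>(reasonKeywords));
    }

    /** Records the optional reporter details; either argument may be null. */
    IncidentReport reporter(String name, String contact) {
        this.reporterName = name;
        this.reporterContact = contact;
        return this;
    }

    /** Records the grid location once it is known. */
    IncidentReport grid(double lat, double lon) {
        if (lat < -90 || lat > 90 || lon < -180 || lon > 180) {
            throw new IllegalArgumentException("latitude/longitude out of range");
        }
        this.latitude = lat;
        this.longitude = lon;
        return this;
    }

    Optional<String> reporterName() { return Optional.ofNullable(reporterName); }

    Optional<String> reporterContact() { return Optional.ofNullable(reporterContact); }

    boolean hasGrid() { return latitude != null && longitude != null; }

    public static void main(String[] args) {
        LocalDateTime event = LocalDateTime.of(2009, 5, 1, 15, 30);
        IncidentReport report = new IncidentReport(event.plusHours(1), event, "female",
                LocationType.PLAYGROUND, "hypothetical street address", Collections.singleton("loitering"));
        System.out.println(report.reporterName().orElse("anonymous") + " " + report.hasGrid());
    }
}
```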
Person Report
- name of suspect. Will not normally be known, but a first name or nickname might.
- Contact information (phone, address, hangout) for the suspect. Will not generally be known.
- height: work in either English or metric units. Internally works in metric.
- weight: skinny, slim, average, heavyset, obese, athletic.
- race: Caucasian, black, Hispanic, Asian, Indo.
- country of origin: Australia, New Zealand, USA, Germany, France, England,… See the list of country codes.
- glasses:.
- facial hair: clean shaven, mustache, beard.
- head hair: skinhead, bald, short, long, shoulder length, shoulder plus.
- hair type: curly, straight, wavy.
- hair colour: brown, black, blond, red, dyed.
- eyes: brown, blue, green, hazel.
- tattoo: yes, no, description: (keyword encodings needed, make up new ones on the fly).
- distinguishing features: (keyword encodings needed, make up new ones on the fly).
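The height field, entered in either system of units but held internally in metric, could be handled by a small value
class. The sketch below is one possible shape. The class name Height, its methods and the choice of centimetres as the
internal unit are assumptions.

```java
/** Hypothetical sketch: accept height in either unit system, store centimetres. */
final class Height {
    private static final double CM_PER_INCH = 2.54;

    final double centimetres;

    private Height(double centimetres) {
        if (centimetres <= 0) {
            throw new IllegalArgumentException("height must be positive");
        }
        this.centimetres = centimetres;
    }

    static Height ofCentimetres(double centimetres) {
        return new Height(centimetres);
    }

    static Height ofFeetAndInches(int feet, double inches) {
        return new Height((feet * 12 + inches) * CM_PER_INCH);
    }

    /** Formats the stored metric value as feet and inches, rounded to the nearest inch. */
    String toImperial() {
        long totalInches = Math.round(centimetres / CM_PER_INCH);
        return (totalInches / 12) + "'" + (totalInches % 12) + "\"";
    }

    public static void main(String[] args) {
        Height entered = Height.ofFeetAndInches(5, 10);
        System.out.println(entered.centimetres + " cm = " + entered.toImperial());
    }
}
```

Since witnesses estimate height only roughly, the matching logic would presumably compare heights with a tolerance
rather than for equality.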
Clothing Report
- jacket: blue, black, yellow, red, orange, green, indigo, light, dark.
- pants: long, short, blue, black, yellow, red, orange, green, indigo, light, dark.
- shirt: short sleeves, long sleeves, muscle shirt, blue, black, yellow, red, orange, green, indigo, light, dark.
- watch: on left or right hand.
- rings: count.
- gloves: leather, knit.
- hat: baseball cap, colours, toque, cowboy hat, military, police.
Vehicle Report
- license plate number, with province/state/country. See the list of country codes.
- type of car: van, car, big, small, American, European, Japanese.
- colour: light, dark, red, blue, green, brown, black, white, yellow, purple, two-tone.
- damage: any combination of: front, back, left, right, windows.
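A field such as damage, which allows any combination of values, fits naturally into a Java EnumSet. The sketch below
also packs the set into a bit mask so that it could live in a single integer database column. The names Damage and
VehicleDamage and the bit mask storage are assumptions for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** Damage areas from the field list; the enum itself is hypothetical. */
enum Damage { FRONT, BACK, LEFT, RIGHT, WINDOWS }

/** Hypothetical helpers for a multiple-choice field that allows any combination. */
final class VehicleDamage {

    /** Damage areas that both reports mention. */
    static Set<Damage> shared(Set<Damage> a, Set<Damage> b) {
        EnumSet<Damage> common = EnumSet.noneOf(Damage.class);
        common.addAll(a);
        common.retainAll(b);
        return common;
    }

    /** Packs the set into one int. Relies on enum order, so never reorder Damage. */
    static int toBits(Set<Damage> areas) {
        int bits = 0;
        for (Damage d : areas) {
            bits |= 1 << d.ordinal();
        }
        return bits;
    }

    /** Rebuilds the set from its bit mask. */
    static Set<Damage> fromBits(int bits) {
        EnumSet<Damage> areas = EnumSet.noneOf(Damage.class);
        for (Damage d : Damage.values()) {
            if ((bits & (1 << d.ordinal())) != 0) {
                areas.add(d);
            }
        }
        return areas;
    }

    public static void main(String[] args) {
        Set<Damage> first = EnumSet.of(Damage.FRONT, Damage.LEFT, Damage.WINDOWS);
        Set<Damage> second = EnumSet.of(Damage.BACK, Damage.LEFT, Damage.WINDOWS);
        // prints [LEFT, WINDOWS]
        System.out.println(shared(first, second));
    }
}
```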
Matching logic would be very similar to that used in an Abundance program I wrote for the Humane Society to match lost
cats with their owners. For example, a match of red hair against blond is considered a partial match, better than
red against black, but not as good as red against red. Some sorts of match are more important than others. You can build
up a points/weighting scheme. In the Abundance Cats program I did most of the calculations for cascaded weighting and
weight normalization at compile time, and moved as much of the matrix calculation as possible out of the inner
loops. This made the matching computation very fast. The databases could potentially get very large, so you will need to
rely on SQL and various background batch processes. Use SQL for coarse match filtering, then weighted point scoring
for fine matching.
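The fine matching step might look something like the sketch below. It is not the Abundance code, which I cannot
reproduce here. It assumes three hypothetical traits, raw weights that are normalized once when the scorer is built
(the Java analogue of doing that work ahead of the inner loop), and an invented closeness of 0.5 for red hair against
blond. The SQL coarse filter that would select the candidate pairs is not shown.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical subset of the traits a person report might carry. */
enum Trait { HAIR_COLOUR, EYES, FACIAL_HAIR }

/** Hypothetical sketch: weighted point scoring with partial matches. */
final class MatchScorer {
    private final Map<Trait, Double> weights = new EnumMap<>(Trait.class);

    /** Normalizes the raw weights once so that they sum to 1. */
    MatchScorer(Map<Trait, Double> rawWeights) {
        double total = 0;
        for (double w : rawWeights.values()) {
            total += w;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        for (Map.Entry<Trait, Double> e : rawWeights.entrySet()) {
            weights.put(e.getKey(), e.getValue() / total);
        }
    }

    private static boolean pair(String a, String b, String x, String y) {
        return (a.equals(x) && b.equals(y)) || (a.equals(y) && b.equals(x));
    }

    /** Closeness of two values of one trait, from 0 (no match) to 1 (exact match). */
    static double closeness(Trait trait, String a, String b) {
        if (a == null || b == null) {
            return 0.0; // an unknown value contributes nothing
        }
        if (a.equals(b)) {
            return 1.0;
        }
        if (trait == Trait.HAIR_COLOUR && pair(a, b, "red", "blond")) {
            return 0.5; // partial match; the value 0.5 is an assumption
        }
        return 0.0;
    }

    /** Weighted score between 0 and 1 for two reports. */
    double score(Map<Trait, String> x, Map<Trait, String> y) {
        double sum = 0;
        for (Map.Entry<Trait, Double> e : weights.entrySet()) {
            Trait t = e.getKey();
            sum += e.getValue() * closeness(t, x.get(t), y.get(t));
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Trait, Double> raw = new EnumMap<>(Trait.class);
        raw.put(Trait.HAIR_COLOUR, 2.0);
        raw.put(Trait.EYES, 1.0);
        raw.put(Trait.FACIAL_HAIR, 1.0);
        Map<Trait, String> first = new EnumMap<>(Trait.class);
        first.put(Trait.HAIR_COLOUR, "red");
        first.put(Trait.EYES, "blue");
        first.put(Trait.FACIAL_HAIR, "beard");
        Map<Trait, String> second = new EnumMap<>(Trait.class);
        second.put(Trait.HAIR_COLOUR, "blond");
        second.put(Trait.EYES, "blue");
        second.put(Trait.FACIAL_HAIR, "clean shaven");
        // prints 0.5
        System.out.println(new MatchScorer(raw).score(first, second));
    }
}
```

In the example, hair colour carries half the total weight and scores a partial 0.5, eyes carry a quarter and match
exactly, and facial hair does not match, giving 0.25 + 0.25 + 0 = 0.5.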