Child Abuse Database
by Roedy Green ©1996-2008 Canadian Mind Products
This essay is about a suggested student project in Java programming. It gives a rough overview of how the project might work; it does not describe an actual, complete program. I have no source, object code, specifications, file layouts or anything else useful for implementing this project. Everything I have to say to help you with this project is written below. I am not prepared to help you implement it; I have too many other projects of my own.
I do contract work for a living, which could include writing a program such as this. However, I don’t do people’s homework for them. That just robs them of an education.
You have my full permission to implement this project any way you please.
The database tracks suspicious incidents that are not conclusive in themselves but, viewed in totality, show a picture of imminent danger. It is the brainchild of Alexander Investigations, run by Glen Morrison, an ex-cop and private investigator. By tying together reports whose numerous small details indicate they may refer to the same person, you can build a more complete picture of a suspect.
Imagine computing a correlation matrix that scores every report against every other report by how many points of similarity they share. You might then be able to group reports into related families. It would greatly help an investigator if he or she had to look at only one family of reports at a time. The computer might then list the largest families as the ones most worthy of deeper investigation.
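The grouping idea can be sketched in Java. This is a minimal sketch, assuming each report has already been reduced to a set of feature tokens such as "hair=red" (a hypothetical encoding, not a settled file layout). It counts the shared tokens for every pair of reports, treats any pair at or above a threshold as linked, and collects the linked groups with a union-find structure. The class and method names are illustrative only.

```java
import java.util.*;

/** Hypothetical sketch: group reports into families by pairwise similarity. */
class ReportFamilies {

    /** Counts the feature tokens shared by every pair of reports. */
    static int[][] similarityMatrix(List<Set<String>> reports) {
        int n = reports.size();
        int[][] sim = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                Set<String> common = new HashSet<>(reports.get(i));
                common.retainAll(reports.get(j));
                sim[i][j] = sim[j][i] = common.size(); // the matrix is symmetric
            }
        }
        return sim;
    }

    /** Union-find root lookup with path halving. */
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /** Links reports whose similarity reaches threshold; returns families, largest first. */
    static List<List<Integer>> families(int[][] sim, int threshold) {
        int n = sim.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (sim[i][j] >= threshold)
                    parent[find(parent, i)] = find(parent, j); // merge the two families
        Map<Integer, List<Integer>> groups = new HashMap<>();
        for (int i = 0; i < n; i++)
            groups.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        List<List<Integer>> result = new ArrayList<>(groups.values());
        result.sort((a, b) -> b.size() - a.size()); // largest families first
        return result;
    }
}
```

Grouping this way is single linkage: one strong link is enough to merge two families, so loosely related reports can chain into one large family. A stricter linking rule is a design choice worth experimenting with. Comparing every pair is O(n²), which is one reason to filter candidates coarsely first.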
If a very serious incident were reported, such as an attempted abduction, an investigator could ask to see all reports closely related to that crucial one. One of them might contain something very valuable, like a license number or a very detailed physical description. Perhaps one of the suspect’s frequent haunts would become clear. From the aggregated reports, the investigator might be able to piece together the suspect’s movements over time and use that to help gather more information.
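Asking for all reports related to one crucial report needs only one row of the similarity matrix. Here is a minimal sketch under the same hypothetical token-set encoding: it scores every other report against the seed report and returns those sharing at least minShared tokens, best match first. The names RelatedReports, relatedTo and minShared are illustrative assumptions.

```java
import java.util.*;

/** Hypothetical sketch: list the reports most similar to one crucial report. */
class RelatedReports {

    /** Number of feature tokens the two reports have in common. */
    static int shared(Set<String> a, Set<String> b) {
        int count = 0;
        for (String token : a) if (b.contains(token)) count++;
        return count;
    }

    /** Indices of reports sharing at least minShared tokens with the seed, best first. */
    static List<Integer> relatedTo(int seed, List<Set<String>> reports, int minShared) {
        Set<String> key = reports.get(seed);
        int[] scores = new int[reports.size()];
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < reports.size(); i++) {
            if (i == seed) continue; // do not report the seed against itself
            scores[i] = shared(key, reports.get(i));
            if (scores[i] >= minShared) hits.add(i);
        }
        // Stable sort: equal scores keep their original report order.
        hits.sort((a, b) -> Integer.compare(scores[b], scores[a]));
        return hits;
    }
}
```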
The main intent is to deal with anonymous sexual predators, but the system might
also work to report incidents of physically abusive parents. By seeing a large
number of reports of a pattern of abuse from many different witnesses,
authorities might be more willing to act in a case where the parents are good
liars and pass off the results of abuse as accident proneness.
Confidentiality is very important. The whole project smacks of "Big Brother is watching you". It skates on thin moral ice. Most of the reports in the database will be completely innocent. A father may be reported as lurking at a playground when he was just waiting for his daughter to finish her swimming lesson. Ideally, some group independent of the police would run the database. The database should never be stored on a machine with an Internet connection, and it would need to be stored in an encrypted format. Since child sex abusers may move from city to city, there should ideally be ways of consolidating databases or exchanging information between the databases of different cities. You might permit reporting suspicious events via the Internet, but you would not want the main database online.
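Encrypting each record before it is written to disk can be done with the standard javax.crypto API. This is a minimal sketch, assuming AES in GCM mode, which also detects tampering; the essay only says the data must be encrypted, so the cipher choice and the class name RecordCipher are my assumptions. Key storage is deliberately left out: the key must not sit alongside the encrypted data.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical helper: encrypt one report record at a time with AES-GCM. */
class RecordCipher {
    private static final int IV_BYTES = 12;   // standard GCM nonce length
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        return gen.generateKey();
    }

    /** Returns a fresh random IV followed by the ciphertext and tag. */
    static byte[] encrypt(SecretKey key, String record) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // never reuse an IV with the same key
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] body = cipher.doFinal(record.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[IV_BYTES + body.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(body, 0, out, IV_BYTES, body.length);
        return out;
    }

    /** Reverses encrypt; throws AEADBadTagException if the blob was altered. */
    static String decrypt(SecretKey key, byte[] blob) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
        byte[] plain = cipher.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
        return new String(plain, StandardCharsets.UTF_8);
    }
}
```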
What sort of data might you record about a suspicious incident? Here is a very rough guide to the sorts of fields that would be needed. To refine the fields and the multiple-choice value encodings, try encoding a few hundred sample suspicious-event reports.
Incident Report
- date/time called in.
- date/time of suspicious event.
- name of reporter (optional).
- gender of reporter.
- phone, address, contact information of reporter (optional).
- type of location of suspicious occurrence: playground, school, park, residence, other.
- street address of suspicious occurrence.
- grid location (e.g. latitude and longitude). Ideally this would be deduced automatically from the street address.
- what happened that aroused suspicion (encodings needed).
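One way to hold an incident report in Java is an immutable record (Java 16 or later). The field names, the use of Optional for the optional reporter details, and the sanity checks in the constructor are my assumptions, not a settled layout; the suspicion codes are plain strings because their encodings are still to be designed.

```java
import java.time.ZonedDateTime;
import java.util.Optional;
import java.util.Set;

/** Hypothetical field layout for one incident report; all names are illustrative. */
class IncidentModel {
    enum LocationType { PLAYGROUND, SCHOOL, PARK, RESIDENCE, OTHER }

    record IncidentReport(
            ZonedDateTime calledIn,           // when the report was phoned in
            ZonedDateTime occurred,           // when the suspicious event happened
            Optional<String> reporterName,    // reporters may stay anonymous
            String reporterGender,
            Optional<String> reporterContact,
            LocationType locationType,
            String streetAddress,
            double latitude,                  // ideally geocoded from streetAddress
            double longitude,
            Set<String> suspicionCodes) {     // encodings still to be designed

        IncidentReport {
            if (occurred.isAfter(calledIn))
                throw new IllegalArgumentException("event cannot happen after the call");
            if (latitude < -90 || latitude > 90 || longitude < -180 || longitude > 180)
                throw new IllegalArgumentException("grid location out of range");
            suspicionCodes = Set.copyOf(suspicionCodes); // defensive, immutable copy
        }
    }
}
```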
Person Report
- name of suspect. Will not normally be known, but a first name or nickname might be.
- contact information (phone, address, hangout) for the suspect. Will not generally be known.
- height: entered in either English or metric units; stored internally in metric.
- weight: skinny, slim, average, heavyset, obese, athletic.
- race: Caucasian, black, Hispanic, Asian, Indo.
- country of origin: Australia, New Zealand, USA, Germany, France, England,… See the list of country codes.
- glasses.
- facial hair: clean shaven, mustache, beard.
- head hair: skinhead, bald, short, long, shoulder length, shoulder plus.
- hair type: curly, straight, wavy.
- hair colour: brown, black, blond, red, dyed.
- eyes: brown, blue, green, hazel.
- tattoo: yes, no, description (keyword encodings needed, make up new ones on the fly).
- distinguishing features: (keyword encodings needed, make up new ones on the fly).
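For the height field, the program could accept feet and inches but always store whole centimetres, converting back only for display. A minimal sketch; the class name Height and the rounding to whole centimetres and whole inches are assumptions.

```java
/** Hypothetical helper: accept heights in feet and inches, store centimetres. */
class Height {
    static final double CM_PER_INCH = 2.54; // exact by definition

    /** Converts a witness's feet-and-inches estimate to whole centimetres. */
    static int fromImperial(int feet, int inches) {
        return (int) Math.round((feet * 12 + inches) * CM_PER_INCH);
    }

    /** Formats stored centimetres back as feet and inches, e.g. 5'10". */
    static String toImperial(int cm) {
        int totalInches = (int) Math.round(cm / CM_PER_INCH);
        return totalInches / 12 + "'" + totalInches % 12 + "\"";
    }
}
```

For example, 5'10" is stored as 178 cm and displays again as 5'10".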
Clothing Report
- jacket: blue, black, yellow, red, orange, green, indigo, light, dark.
- pants: long, short, blue, black, yellow, red, orange, green, indigo, light, dark.
- shirt: short sleeves, long sleeves, muscle shirt, blue, black, yellow, red, orange, green, indigo, light, dark.
- watch: on left or right wrist.
- rings: count.
- gloves: leather, knit.
- hat: baseball cap, colours, toque, cowboy hat, military, police.
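The jacket, pants and shirt fields share one colour vocabulary, which mixes hues with the vaguer words "light" and "dark". One witness's "dark jacket" and another's "black jacket" should not count as a contradiction. Here is a minimal sketch of a shared enum with a compatibility test; which hues count as light or dark is entirely my assumption and would need tuning against real reports.

```java
import java.util.EnumSet;

/** Hypothetical colour vocabulary shared by the jacket, pants and shirt fields. */
enum Colour {
    BLUE, BLACK, YELLOW, RED, ORANGE, GREEN, INDIGO, LIGHT, DARK;

    // Hypothetical groupings: hues a witness might loosely call "dark" or "light".
    private static final EnumSet<Colour> DARK_HUES = EnumSet.of(BLACK, INDIGO, BLUE, GREEN, RED);
    private static final EnumSet<Colour> LIGHT_HUES = EnumSet.of(YELLOW, ORANGE, BLUE, GREEN);

    /** True if the two colour words could describe the same garment. */
    boolean compatible(Colour other) {
        if (this == other) return true;
        if (this == DARK || other == DARK) return DARK_HUES.contains(this == DARK ? other : this);
        if (this == LIGHT || other == LIGHT) return LIGHT_HUES.contains(this == LIGHT ? other : this);
        return false; // two different specific hues
    }
}
```

In point scoring, a compatible but non-identical pair could earn partial points rather than full ones.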
Vehicle Report
- license plate number, with province/state/country. See the list of country codes.
- type of car: van, car, big, small, American, European, Japanese.
- colour: light, dark, red, blue, green, brown, black, white, yellow, purple, two-tone.
- damage: any combination of: front, back, left, right, windows.
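Because damage may be any combination of locations, a Java EnumSet fits it naturally, and the set packs into a single integer for storage in one database column. A minimal sketch; the bit-mask layout and the use of the Jaccard index (shared locations divided by all locations either witness named) as the comparison are my assumptions.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical encoding of the Vehicle Report damage field. */
class VehicleDamage {
    enum Damage { FRONT, BACK, LEFT, RIGHT, WINDOWS }

    /** Packs a damage combination into an int bit mask, one bit per location. */
    static int toMask(Set<Damage> damage) {
        int mask = 0;
        for (Damage d : damage) mask |= 1 << d.ordinal();
        return mask;
    }

    /** Unpacks a bit mask produced by toMask. */
    static EnumSet<Damage> fromMask(int mask) {
        EnumSet<Damage> set = EnumSet.noneOf(Damage.class);
        for (Damage d : Damage.values())
            if ((mask & (1 << d.ordinal())) != 0) set.add(d);
        return set;
    }

    /** Jaccard index of two damage reports, from 0 (no overlap) to 1 (identical). */
    static double overlap(Set<Damage> a, Set<Damage> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0; // nothing reported: no evidence either way
        EnumSet<Damage> union = EnumSet.noneOf(Damage.class);
        union.addAll(a);
        union.addAll(b);
        EnumSet<Damage> both = EnumSet.noneOf(Damage.class);
        both.addAll(a);
        both.retainAll(b);
        return (double) both.size() / union.size();
    }
}
```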
Matching logic would be very similar to that used in an Abundance program I wrote for the Humane Society to match lost cats with their owners. For example, a match of red hair against blond is considered a partial match: better than red against black, but not as good as red against red. Some kinds of match are more important than others, so you can build up a points/weighting scheme. In the Abundance Cats program, I did most of the calculations for cascaded weighting and normalized the weights at compile time, moving as much of the matrix calculation as possible out of the inner loops. This made the matching computation very fast. The databases could grow very large, so you will need to rely on SQL and background batch processing: use SQL for coarse match filtering, then weighted point scoring for fine matching.
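The same ideas can be sketched in Java, though this is not the Abundance Cats code, which is not available. This minimal sketch assumes each report is encoded as an int[] of enum ordinals, one per field, with -1 for a field the witness did not give. Each field has a closeness table (1.0 for an exact match, less for a partial one), and the constructor folds the normalized field weights into those tables once, so the inner scoring loop does only lookups and additions. It uses a single level of weighting rather than cascaded weights, and the closeness values in HAIR_CLOSENESS are made up for illustration.

```java
import java.util.Arrays;

/** Hypothetical encoding of the hair colour field from the Person Report. */
enum HairColour { BROWN, BLACK, BLOND, RED, DYED }

/** Minimal sketch of weighted point scoring with partial matches. */
class WeightedMatcher {
    static final int UNKNOWN = -1; // field not given by the witness

    // Hypothetical closeness values in [0, 1]; row and column order follow HairColour.
    static final double[][] HAIR_CLOSENESS = {
        //  BROWN BLACK BLOND  RED   DYED
        {   1.0,  0.6,  0.4,  0.4,  0.2 }, // BROWN
        {   0.6,  1.0,  0.0,  0.1,  0.2 }, // BLACK
        {   0.4,  0.0,  1.0,  0.5,  0.2 }, // BLOND
        {   0.4,  0.1,  0.5,  1.0,  0.2 }, // RED
        {   0.2,  0.2,  0.2,  0.2,  1.0 }, // DYED
    };

    // scaled[field][a][b] = normalized weight of field * closeness of values a and b.
    private final double[][][] scaled;

    /** Normalizes the raw weights to sum to 1 and folds them into the tables up front. */
    WeightedMatcher(double[] rawWeights, double[][][] closeness) {
        double total = Arrays.stream(rawWeights).sum();
        scaled = new double[closeness.length][][];
        for (int f = 0; f < closeness.length; f++) {
            double w = rawWeights[f] / total;
            int n = closeness[f].length;
            scaled[f] = new double[n][n];
            for (int a = 0; a < n; a++)
                for (int b = 0; b < n; b++)
                    scaled[f][a][b] = w * closeness[f][a][b];
        }
    }

    /** Inner loop: table lookups and additions only. Unknown fields add nothing. */
    double score(int[] x, int[] y) {
        double s = 0;
        for (int f = 0; f < scaled.length; f++)
            if (x[f] != UNKNOWN && y[f] != UNKNOWN)
                s += scaled[f][x[f]][y[f]];
        return s;
    }
}
```

With hair colour weighted 1 and a yes/no glasses field weighted 3, two reports that agree on glasses score 1.0 for red against red, 0.875 for red against blond and 0.775 for red against black. In a full system, the SQL coarse filter (for example by city and date range) would choose which candidate pairs ever reach score.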