Fluffiness of File Formats


This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned, and the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully define the end point, or a series of ever more difficult versions of this project, and to research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

Please do not email me about this project without reading the disclaimer above.

The intent of this project is to compare how space-efficient various file formats are. I think you will discover XML (eXtensible Markup Language) is absurdly fluffy compared with the alternatives. This is a project suitable for a rank beginner.

To implement it, you create an array of sample Objects in RAM (Random Access Memory). The sample class would have a variety of data types, e.g. some ints, some longs, some Dates, some Strings.
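A minimal sketch of what such a sample class might look like. The class name SampleRecord, its field names and the makeSamples helper are all invented here for illustration; pick whatever mix of fields resembles your own data.

```java
import java.io.Serializable;
import java.util.Date;

// Hypothetical sample class: the names are assumptions, not part of any spec.
class SampleRecord implements Serializable {
    private static final long serialVersionUID = 1L;

    final int id;
    final long balanceInCents;
    final Date created;
    final String name;

    SampleRecord(int id, long balanceInCents, Date created, String name) {
        this.id = id;
        this.balanceInCents = balanceInCents;
        this.created = created;
        this.name = name;
    }

    // Builds predictable test data so that every run writes the same bytes.
    static SampleRecord[] makeSamples(int count) {
        SampleRecord[] samples = new SampleRecord[count];
        for (int i = 0; i < count; i++) {
            samples[i] = new SampleRecord(
                    i,
                    i * 1_000_000_007L,
                    new Date(i * 86_400_000L), // one day apart, starting at the epoch
                    "customer" + i);
        }
        return samples;
    }

    public static void main(String[] args) {
        SampleRecord[] samples = makeSamples(1000);
        System.out.println(samples.length + " sample records created");
    }
}
```

Implementing Serializable is only needed for the ObjectOutputStream format; the other formats write the fields one at a time.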

You then write this data out in various formats: ObjectOutputStream, DataOutputStream, FileWriter, CSV (Comma-Separated Values), XML, ASN.1 (Abstract Syntax Notation One) etc.
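Here is a rough sketch of three of those writers. To keep it short it writes into in-memory buffers rather than files (swap in a FileOutputStream for the real project), and the Row class and method names are made up for the example. The CSV writer is deliberately naive: it does no quoting, so it only works for names without commas or quotes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStreamWriter;
import java.io.Serializable;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Date;

class FormatWriters {
    // Hypothetical sample row, invented for this sketch.
    static class Row implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final long amount;
        final Date when;
        final String name;

        Row(int id, long amount, Date when, String name) {
            this.id = id;
            this.amount = amount;
            this.when = when;
            this.name = name;
        }
    }

    // Raw binary fields: 4 + 8 + 8 bytes, plus a 2-byte length and the name.
    static byte[] asDataStream(Row[] rows) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Row r : rows) {
                out.writeInt(r.id);
                out.writeLong(r.amount);
                out.writeLong(r.when.getTime());
                out.writeUTF(r.name);
            }
        }
        return bytes.toByteArray();
    }

    // Java serialization: the whole array in one call, class descriptors included.
    static byte[] asObjectStream(Row[] rows) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(rows);
        }
        return bytes.toByteArray();
    }

    // Naive CSV: one line per row, no quoting or escaping.
    static byte[] asCsv(Row[] rows) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            for (Row r : rows) {
                out.write(r.id + "," + r.amount + "," + r.when.getTime() + "," + r.name + "\n");
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Row[] rows = { new Row(1, 2L, new Date(3L), "abc"), new Row(4, 5L, new Date(6L), "de") };
        System.out.println("DataOutputStream:   " + asDataStream(rows).length + " bytes");
        System.out.println("ObjectOutputStream: " + asObjectStream(rows).length + " bytes");
        System.out.println("CSV:                " + asCsv(rows).length + " bytes");
    }
}
```

With only two rows the fixed overhead of ObjectOutputStream dominates; a fair comparison needs a few thousand rows.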

Then you measure the size of each of the output files and report on just how fluffy each format is.
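One possible way to do the measuring, sketched below. It defines "fluffiness" as each file's size divided by the size of the smallest file, which is my own assumption about a sensible metric; the class name FluffReport and the method name are likewise invented.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

class FluffReport {
    // Maps each format label to (its file size / smallest file size).
    // A ratio of 1.0 marks the most compact format. If the smallest
    // file is empty the ratios are meaningless and reported as NaN.
    static Map<String, Double> fluffiness(Map<String, Path> files) throws IOException {
        Map<String, Long> sizes = new LinkedHashMap<>();
        long smallest = Long.MAX_VALUE;
        for (Map.Entry<String, Path> e : files.entrySet()) {
            long size = Files.size(e.getValue());
            sizes.put(e.getKey(), size);
            smallest = Math.min(smallest, size);
        }
        Map<String, Double> ratios = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            double ratio = smallest == 0 ? Double.NaN : (double) e.getValue() / smallest;
            ratios.put(e.getKey(), ratio);
        }
        return ratios;
    }

    // Usage: java FluffReport.java label1 file1 label2 file2 ...
    public static void main(String[] args) throws IOException {
        Map<String, Path> files = new LinkedHashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            files.put(args[i], Paths.get(args[i + 1]));
        }
        for (Map.Entry<String, Double> e : fluffiness(files).entrySet()) {
            System.out.printf("%-20s %6.2f times the smallest%n", e.getKey(), e.getValue());
        }
    }
}
```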

You don’t need to write much code. The FileIO Amanuensis will generate code for ObjectOutputStream, DataOutputStream and FileWriter. For CSV, see CSVWriter. For XML, see XML built-in classes. For ASN.1 see ASN.1.

For a simpler project, leave out XML and ASN.1. For a more difficult project, add extra formats, e.g. SQL (Structured Query Language) databases from various vendors.

People can then plop their own data structures into your framework to see the difference in their own projects. Leave instructions in the code, and structure the code to make this easy.
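One way to structure the code for this, offered only as a sketch: put each format behind a small interface with a type parameter, so a user supplies their own type and one writer per format. The names FluffFramework, Format and measure are hypothetical, and Integer stands in for the user's own class.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FluffFramework {
    // Hypothetical plug-in point: one implementation per file format.
    interface Format<T> {
        String name();
        void write(List<T> items, OutputStream out) throws IOException;
    }

    // Runs every format over the same data and records the bytes produced.
    static <T> Map<String, Integer> measure(List<T> items, List<Format<T>> formats)
            throws IOException {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (Format<T> format : formats) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            format.write(items, bytes);
            sizes.put(format.name(), bytes.size());
        }
        return sizes;
    }

    // Two example formats for the stand-in type Integer.
    static final Format<Integer> BINARY = new Format<Integer>() {
        public String name() { return "DataOutputStream"; }
        public void write(List<Integer> items, OutputStream out) throws IOException {
            DataOutputStream data = new DataOutputStream(out);
            for (int i : items) {
                data.writeInt(i); // always 4 bytes
            }
            data.flush();
        }
    };

    static final Format<Integer> TEXT = new Format<Integer>() {
        public String name() { return "text lines"; }
        public void write(List<Integer> items, OutputStream out) throws IOException {
            for (int i : items) {
                out.write((i + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
    };

    public static void main(String[] args) throws IOException {
        List<Integer> data = Arrays.asList(1, 22, 333);
        List<Format<Integer>> formats = Arrays.asList(BINARY, TEXT);
        System.out.println(measure(data, formats));
    }
}
```

To try a different data structure, a user replaces Integer with their own class and writes one Format implementation per file format; measure itself does not change.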

