This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project.
This project outline is not like the artificial, tidy little problems you are spoon-fed in school, when all the facts you need are included, nothing extraneous is mentioned, the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems where it is up to you to fully the define the end point, or a series of ever more difficult versions of this project, and research the information yourself to solve them.
Everything I have to say to help you with this project is written below. I am not prepared to help you implement it; or give you any additional materials. I have too many other projects of my own.
Though I am a programmer, I don’t do people’s homework for them. That just robs them of an education.
You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.
Please do not email me about this project without reading the disclaimer above.To write an efficient disk defragger, model how you tidy your apartment, better still, how Martha Stewart tidies her house.
A defragger is a program that rearranges files on hard disk to speed up access. See defragger in the Java & Internet Glossary. You would want the defragger to put files in order by last access date so that the rarely used files are near the center (end) of the disk and the most used ones are near the outer edge (beginning). You want each file in one contiguous lump. Ideally, you want to move frequently used system files like the directories, free space tables, and system swap file as well. Within the i-node/MFT table you want to internally defrag as well, putting the directories together, and ordering by last access date, within that by directory. You can seek to optimise either file open time or find-file if you don’t have indexes to rapidly search.
A defragger may also do other tidying such as consolidating free space fragments, removing deleted dummy directory entries, sorting directories, and collecting files from the same directory together when their access timestamps are on the same day.
There may also be quick variants of the defragger that just seek to quickly clean up the files with the most fragmentation.
The defragger comes in five pieces:
Because undebugged defraggers trash hard disks, you will want to write a defragger simulator that is platform independent. You may want to write platform dependent data-gatherers, and ask your friends to send you copies of their messy disk layouts — not the data, just the current file positions and fragmentation so you can test your program on real-world situations.
In later debugging stages, you will want a large hard disk with a free partition you can trash and recreate with a partition copier.
The simulator lets various people try coding the algorithm safely without trashing any disks or getting involved in the low level details of defragging. You can then plug in the most efficient defragger.
Your simulator’s scorekeeper should reward doing I/O in large contiguous lumps. It should reward keeping disk arm motions to a minimum both in number and length of leap. It should, as best as possible, simulate true elapsed time.
I have noticed that most of the I/O of defragging with the OS (Operating System) interface is not moving the file bodies, but rather updating the directory to point to the new location. So I suspect an algorithm that only moved files whose destination is already free will be faster, even if the arms jump all over. You want very hard to avoid moving a file to a temporary resting place.
How do you defrag when there are other tasks using the files as you work?
Ideally, you should be able to specify a long list of moves to make, and NT would do them in one or more sweeps across the disk, handling perhaps dozens of files per sweep, depending on how much RAM (Random Access Memory) it has available. The only restriction is that all the target clusters need to be free before you submit the list. You would not be allowed to do a list of moves that requires an internal ordering of the moves to free up space before the rest of the list of moves can complete. When you need to do that, you must break the list in two and submit them separately.
If you watch Raxco PerfectDisk defragger’s display as it uses the NT interface, it can drive you nuts. Nothing appears to be happening for 5 minutes at a stretch. It behaves like some bored barber endlessly picking away at your hair, cutting individual strands, shortening by never more than a millimeter per snip. Then, all of a sudden, there is a noticeable little burst of activity as its moves a large file, which can be done in reasonably efficient chunks. Then it goes back to its nitting.
Imagine the disk arms are like a bus picking up passengers and dropping them off as it sweeps back and forth across the disk moving clusters to their new locations. The clusters being moved by read/write are like passengers. Here are some rules of thumb.
Your algorithm can be adapted to work with any sort of partition. You could produce versions for various types of Unix partitions or even mainframe partitions as well. Your product might then sell for thousands of dollars per copy. You must have a way of locking the entire disk for your exclusive access, and there must be a way to do low level physical i/o.
You might try simulating the algorithms used in the commercial defraggers such as Norton SpeedDisk for Windows9X, and NT, Executive Software Diskeeper, Raxco PerfectDisk or PowerQuest PartitionMagic. Then if your algorithms are substantially faster, you might be able to persuade them to reissue their products using your algorithms.
You might use your algorithms in proprietary firmware inside a hard disk to handle defragging/marthaing that was transparent to the operating system, thus creating a seemingly super-fast drive. It would appear to have almost 0 seek time. It might be implemented internally as two separate drives that spend their lives mirroring each other while simultaneously defragging to the other drive. By having two sets of heads, either on the same set of surfaces on separate sets, you would get away with much less head motion. The software can then be as messy as it likes. The hardware will clean up after it. In fact, the irony will be, defragging such a smart hard disk with software will cause the system to temporarily slow down! The hard drive internally is totally unaware of the file and directory structure the operating system is using. It works purely with a giant list of physical clusters. Because everything happens inside the hard disk, there is absolutely no problem with the disk being used simultaneously with being defragged.
What if you are a poor student with no access to such firmware? You can write a simulator that can be used to compute just how much faster such a smart disk would be in real life. Then perhaps you could persuade one of the high end SCSI (Small Computer System Interface) hard disk makers to build a prototype. They might be interested. It is extremely difficult to build a hard disk even a tiny bit faster than your competitors. If you can do it, you can charge a big premium.
You might build a realistic prototype using a PCI (Peripheral Component Interconnect) KVM (Kilobyte Virtual Machine) card that has hard disk support. You then need to write a device driver for your card that makes it look like an ordinary disk drive. The KVM runs your Martha software.
If you had an OS that would let you reserve additional physical space for a file beyond what is written, then you could monitor file growth rate, and allocate additional space so that a file would stay in one fragment until the next defrag. I don’t think Windows lets you do this. You could do it in the old IBM (International Business Machines) OS/360. The problem with Windows is every log file instantly fragments again after a defrag.
You could also write a defragger than required a spare empty drive to copy the data to, tidying as it went sorting by last-access-date, then copy back, always in the background. The OS would be tricked into looking for a file on one of the two disks, treating the pair as if it were one logical disk. Even a file in the process of being moved could be simultaneously used by the actual apps, with writes always going to the target.
available on the web at:
optional Replicator mirror
Please email your feedback for publication, letters to the editor, errors, omissions, typos, formatting errors, ambiguities, unclear wording, broken/redirected link reports, suggestions to improve this page or comments to Roedy Green : . If you want your message, your name or email kept confidential, not considered for public posting, please explicitly specify that. Unless you state otherwise, I will treat your message as a letter to the editor that I may or may not publish in the feedback section. After that, it will be too late to retract it. If you disagree with something I said, especially when sending an ad-hominem attack, a rant composed mainly of obscenities or a death threat, please quote the offending passage and cite the web page where you found it, tell me why you think it is wrong, and, if possible, provide some supporting evidence. I can’t very well fix erroneous or ambiguous text if I can’t find it.
Your face IP:[22.214.171.124]
|Feedback||You are visitor number 17,423.|