The flaws affect motherboards from ASUSTeK, AT&T (American Telephone & Telegraph), DEC (Digital Equipment Corporation), Dell, Gateway, Intel, Micron, NEC (Nippon Electric Corporation), Zeos and others. Since Intel makes so many of the motherboards sold under other brand names, the flaws affect many PCI machines, both 486 and Pentium.
The flaws show up most frequently when you run a true multitasking operating system such as OS/2 Warp or NT. They also show up under Windows For WorkGroups in 32-bit mode during tape or floppy backup and restore. In theory the flaws could do damage under DOS (Disk Operating System), DESQview, Windows and Windows For WorkGroups in 16-bit mode, but so far there have been no damage reports. Windows-95 contains code to bypass the flaws.
The RZ-1000 has two flaws. The CMD-640 has those same two flaws plus three others. To make matters worse, most motherboard manufacturers using these two flawed chips connected them up incorrectly. There are software bypasses for these flaws. However, the Warp fix for the CMD-640 reduces disk performance by 15 to 50%. The RZ-1000 fix has negligible impact on disk I/O, though it can slow down background processes.
I would advise buying new hardware to bypass the CMD-640 flaws and living with the software fixes for the RZ-1000 flaws.
This corruption happens when you are simultaneously using your EIDE or IDE hard disk and some other device, most commonly the floppy drive or mag tape backup.
The same sorts of problem may occur when reading from a CD-ROM (Compact Disc — Read Only Memory) drive attached to an EIDE port.
Not only does this corruption occur, but it occurs quietly, often going unnoticed.
If the system crashes, you usually put the blame on the operating system software, or the application. It might actually be a faulty RZ-1000 or CMD-640 EIDE controller chip nailing you.
When a directory becomes corrupted, you may not notice it until the damage is irreparable. If a spreadsheet application reads a comma-delimited ASCII (American Standard Code for Information Interchange) file, it may simply miss a few bytes in a number. That error may go unnoticed and cascade through the rest of the spreadsheet.
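To make the spreadsheet example concrete, here is a minimal Python sketch, assuming two bytes silently vanish from the middle of a comma-delimited record (the record and the offset are hypothetical). The damaged record still has the same number of fields and still parses as a number, which is exactly why nobody notices.

```python
import csv
import io

def drop_bytes(data: bytes, offset: int, count: int = 2) -> bytes:
    """Return `data` with `count` bytes silently removed at `offset`."""
    return data[:offset] + data[offset + count:]

def parse(record: bytes) -> list:
    """Parse a single comma-delimited ASCII record into its fields."""
    return next(csv.reader(io.StringIO(record.decode("ascii"))))

original = b"widget,1234.56,789\n"
damaged = drop_bytes(original, offset=9)   # hypothetical position: loses "34"

print(parse(original))  # ['widget', '1234.56', '789']
print(parse(damaged))   # ['widget', '12.56', '789']
```

Nothing in the damaged record looks wrong to a program: 12.56 is a perfectly valid price.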
If you have had unexplained crashes in OS/2, you have probably experienced the problem and should make a thorough check for hidden corruption. Remember that the bug may only slightly alter your data and the corruption may not be obvious.
Keep in mind that not every problem is the RZ-1000’s or the CMD-640’s fault. Overheating, unrelated hardware faults and design flaws, or software bugs can cause similar symptoms. DMA (Direct Memory Access) channel conflicts also cause similar symptoms. Happily, EIDEtest and CDTest can unmask all manner of simultaneous I/O faults.
Unfortunately, correcting the problem only stops further file corruption. It will not clean up the existing damage to your files. Right now, the focus is on bypassing the flaws. Preventing further corruption is child’s play compared with the nightmare of trying to track down all the existing random errors in files. Even backups from day one may be corrupt. If you have either of the flawed chips, you will probably never be able to completely eliminate the effects of past corruption.
PCI machines with Intel BIOSes (Basic Input Output Systems) that run only DOS, DESQview, Windows 3.1 or Windows-95 are safe. If you have a non-Intel BIOS, run only DOS, DESQview, Windows 3.1 or Windows-95, and never use the fast mode simultaneous disk I/O feature for floppy or tape backup/restore, you are safe.
You still might want to test your machine. The tests will also unmask similar problems that have other causes.
Scot did most of the initial work documenting the first RZ-1000 flaw. He wrote a program called IOtest that can detect the flaws if:
Scot originally called his test program DMAtest because he erroneously thought simultaneous DMA was the sole culprit. Do not confuse PowerQuest DMAtest with Gazelle’s DMAtest which only tests if the floppy drive will work happily simultaneously with the hard disk.
The world needed an easier-to-use test that would run under DESQview, Windows, Windows For WorkGroups, Windows 95, NT and OS/2. So I wrote EIDEtest to test for the flaws without requiring you to create a special partition or buy Warp OS/2. I also wrote CDTest to test for the flaws when you have an EIDE CD-ROM drive.
You can also get both programs from me by snail mail.
If these tests fail, it proves you have a serious problem, but not necessarily that you have the RZ-1000 or CMD-640 chip.
If the tests pass, you still may have a problem since, especially under DOS, DESQview and Windows, the flaws may show up only very rarely. If you run the tests under Windows-95, they will always pass, even if you have a defective chip, because the operating system already bypasses the flaws. If you suspect trouble, run the tests several times.
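EIDEtest’s own source is not reproduced here. As a rough outline of the general write-then-verify idea behind this kind of test, here is a hypothetical Python sketch: it writes a file whose every block has known, block-specific contents, then reads it back with several concurrent threads for several rounds, recording any block that does not match. All names and sizes are assumptions. On a modern machine the reads are likely served from the operating system’s file cache, so a pass here says nothing about a 1995 EIDE controller; treat it only as an illustration of the approach.

```python
import hashlib
import os
import tempfile
import threading

BLOCK = 512    # bytes per block, chosen to match a disk sector (assumption)
BLOCKS = 256   # hypothetical test file size: 256 blocks = 128 KiB

def pattern(index: int) -> bytes:
    """Block contents derived from the block number, so a shifted,
    shortened or overwritten block cannot match by accident."""
    seed = hashlib.sha256(index.to_bytes(4, "little")).digest()  # 32 bytes
    return seed * (BLOCK // len(seed))

def write_test_file(path: str) -> None:
    with open(path, "wb") as f:
        for i in range(BLOCKS):
            f.write(pattern(i))

def verify(path: str, errors: list) -> None:
    """Read every block back and record the index of any mismatch."""
    with open(path, "rb") as f:
        for i in range(BLOCKS):
            if f.read(BLOCK) != pattern(i):
                errors.append(i)  # list.append is atomic in CPython

def run_test(readers: int = 4, rounds: int = 3) -> list:
    """Write the file once, then verify it with several concurrent
    readers, repeated for several rounds. Returns bad block indices."""
    errors: list = []
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_test_file(path)
        for _ in range(rounds):
            threads = [threading.Thread(target=verify, args=(path, errors))
                       for _ in range(readers)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
    finally:
        os.remove(path)
    return errors

if __name__ == "__main__":
    bad = run_test()
    print("PASS" if not bad else f"FAIL: {len(bad)} bad blocks")
```

The `rounds` parameter reflects the advice above: because the flaws show up only rarely, one clean pass proves little.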
The Warp disk driver IBM1S506.ADD with the /V switch will tell you if you have the RZ-1000 or CMD-640 chip.
Intel has written a new test, called CtrlTest.exe, that looks directly for either of the two faulty chips; however, it is filed under its old name, RZTest.exe.
The Windows-95 Control Panel will also report on the EIDE controller chip.
|Machine||EIDE controller chip||Reported by|
|Acculogic VL Paddleboard||CMD-640||Mark Lord (tentative)|
|Acer Power P75||CMD-640||John Harvey, Beta Machinery Calgary|
|ACMA P590||?||Bob Smith|
|AST Bravo MS-T P/75||CMD-640||Mike Coplien|
|ASUSTeK PCI/I P54SP4||CMD-640||Marco Trunzer, Maurice Schekkerman, Mike Coplien, Robert Schultz, Thomas L. Kusterer|
|AT&T Globalyst 590||RZ-1000||Brian Myrick|
|AT&T Globalyst 600||RZ-1000||Brian Myrick|
|AT&T Globalyst 630||CMD-640||Mike Coplien|
|CMD CSA-62101Kx VL2 IDE paddleboard||CMD-640B||George Voros|
|Compaq Presario||CMD-640||Walter Wu|
|Compaq Prolinea||CMD-640||Walter Wu|
|DEC Celebris 590||CMD-640||Fred Thomsen|
|DEC Starion 700I||CMD-640||Mike Coplien|
|DEC Venturis 466||CMD-640||Mike Coplien|
|DEC Venturis 560||CMD-640||Fred Thomsen|
|Dell Dimension XPS P100||RZ-1000||Scot Llewelyn|
|Dell Dimension XPS P75||RZ-1000||Steve Ertman|
|Dell Dimension XPS P90||RZ-1000||Dong Chen (D_Chen@netcom.com), Larry Lai, Lawrence Rounds, Mike Griggs, Mike Heath, Moira Watson, Nathaniel Beck @weber.ucsd.edu, Wijadi Jodi|
|Dell Optiplex 575||CMD-640||Mike Coplien|
|Dell Optiplex XM 590||CMD-640||Aron Eisenpress|
|Dell XPS-133c||neither||Blake Scholl|
|EliteGroup S154P-AIO||CMD-640||Ulf Volz|
|EliteGroup UM8810P-AIO||CMD-640||Bodo Huckestein (bh@thp.Uni-Koeln.DE), Guy Kapteijns (W.Kapteijns@kub.nl)|
|(Intel Premiere ATLX)||CMD-640||Detlef Meier, Rogier van Wanroij|
|Escom P60I||CMD-640||Tim Schofield|
|Escom P90||RZ-1000||Karl Knoflach, Xav@mantra01.demon.co.uk|
|Gateway 2000 P5-60, Intel Mercury Rev 3||RZ-1000||Angus Black, Gary Farr, Daron Davis, Jerry Lynch, Keith Patterson, Rick Gregory, Roy L. Smith|
|Gateway 2000 P5-66||RZ-1000||Randy Nerwick|
|Gateway 2000 P5-90||RZ-1000||Alan Murphy, Roy L. Smith|
|CMD-640||Yacov Jegher|
|HP Vectra 590||CMD-640||Javier Vizcaino|
|Intel Hendrix||CMD-640||Clif Purkiser, Intel Corp|
|Intel Insight P5-60, Premiere PCI II Baby AT, Neptune Chipset||RZ-1000||Jim Arnone|
|Intel Plato 90||RZ-1000||Adrian Teo, Alain Rassel (Alain.Rassel@restena.lu), Chris Norman, Clif Purkiser (Intel Corp), Kevin Chua, Kevin T. Van Maren, Kim Hvarre, Martin Kogelbauer, Rick Nelson, Richard Techmanski|
|Intel Premiere||RZ-1000||Clif Purkiser, Intel Corp|
|Intel Premiere LPX||CMD-640||Clif Purkiser, Intel Corp|
|Intel Premiere MM||CMD-640||Clif Purkiser, Intel Corp|
|Intel Robin LC||CMD-640||Clif Purkiser, Intel Corp|
|Knowledgebase P90 laptop||CMD-640||Andy Longton|
|Micron P75||CMD-640||Leroy Latta|
|Micron P5-90||CMD-640||Primary fails, secondary is OK. Eric Johnson, Jim Short, Mike Coplien|
|Micronics M54Pi||CMD-640||Adam Haar|
|Midwest Micro P90||CMD-640|||
|NEC Image P90||CMD-640||Mike Coplien|
|Packard Bell Legend 100CD||CMD-640||James Treworgy|
|PCI-EIDE local clone, Phoenix BIOS 4.04, ALI chipset||CMD-640|||
|Quantex P5/90 PM-2||RZ-1000||Jay Schamus|
|S1366 PCI EIDE paddleboard||CMD-640B||Ross Fleming|
|Scandic UMC VIO8810A||CMD-640B||Daniel Spangberg|
|Soyo SY-4SA2 486 prior to B5||?||Jeffrey Hurwit|
|Tagram SQ-588||CMD-640||Kurt Krasinski|
|Unknown 486 DX||SMC37650||Eric Stephen Mountain|
|Unknown 90 MHz||?||Andreas, Carol Lim|
|Viglen P90 (Intel Plato)||RZ-1000||Phil Buckley|
|Vobis||RZ-1000||Thomas Wagner|
|Vobis 4886DX2-66||CMD-640||Guy Kapteijns (W.Kapteijns@kub.nl)|
|Zenon P90||RZ-1000||Aria Novianto|
|ZEOS Pantera||RZ-1000||Paul Whitelock (paulw9DDFL3r.DDI@netcom.com)|
|Arsys P200-PCI||Triton/sis||Robert Aboud|
|ASUSTek PCI/I-P54TP4||Triton||Roedy Green|
|Dell Dimension XPS P90c||?||Note: older versions of this board were flawed. Dave Nuttall|
|Intel Zappa||Triton||Ron McGlade|
|Micronics 486 VLB||?||Bob Meredith|
|Seanix||Opti Viper||Bill Unruh|
|Soyo SY-4SA2 486/B5||SYS||Jeffrey Hurwit|
Whatever method you use to bypass the flaws, retest with EIDEtest and CDTest afterwards to be sure your fix worked and you caught all the problems.
The first thing to do is to re-install your operating system and all your application programs. This will replace any damaged EXE and DLL (Dynamic Link Library) files.
Catching errors in your data files is more difficult. Keep your eyes peeled for any improbable spreadsheet results. You may have to hire a programmer to write you some comb programs to sniff through your databases, looking for suspicious values.
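A comb program need not be elaborate. Here is a minimal Python sketch, assuming your data sits in comma-delimited files with a fixed number of fields and one numeric column whose plausible range you know; the column index and range are hypothetical parameters you would set per file. It flags records with the wrong field count, numbers that do not parse, and values outside the expected range. It cannot prove a file is clean; it only narrows down where to look.

```python
import csv
import io

def comb(text: str, fields: int, col: int, low: float, high: float):
    """Yield (line_number, reason) for each suspicious record."""
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != fields:
            yield lineno, f"expected {fields} fields, found {len(row)}"
            continue
        try:
            value = float(row[col])
        except ValueError:
            yield lineno, f"non-numeric value {row[col]!r}"
            continue
        if not low <= value <= high:
            yield lineno, f"value {value} outside [{low}, {high}]"

# Hypothetical sample: line 3 lost two digits, line 4 has a garbled byte.
sample = "widget,1234.56\ngadget,1299.00\ngizmo,12.56\nsprocket,13x9.5\n"
for lineno, reason in comb(sample, fields=2, col=1, low=1000, high=2000):
    print(lineno, reason)
```

A range check like this is the only hope for catching damage such as `12.56`, which is syntactically valid but improbable.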
If you routinely use the verify feature of Lotus Magellan for DOS, it can detect changes to files that should not have changed. This may help you uncover some of the damage. The flaws are not polite enough to redate the files they corrupt.
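Because the corrupted files keep their old dates, only a content check will catch them. The same idea as Magellan’s verify can be sketched in Python, assuming you saved a manifest of SHA-256 digests and modification times before the damage occurred; the manifest format here is hypothetical and has nothing to do with Magellan’s own. Files whose contents changed while their timestamps did not are the prime suspects.

```python
import hashlib
import os
import tempfile

def digest(path: str) -> str:
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def snapshot(paths) -> dict:
    """Map each path to (modification time in ns, content digest)."""
    return {p: (os.stat(p).st_mtime_ns, digest(p)) for p in paths}

def silent_changes(old: dict, new: dict) -> list:
    """Paths whose contents changed although their timestamp did not."""
    return sorted(p for p in old
                  if p in new
                  and old[p][0] == new[p][0]
                  and old[p][1] != new[p][1])

def demo() -> list:
    """Corrupt a file in place, restore its timestamp, and detect it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ledger.csv")
        with open(path, "wb") as f:
            f.write(b"widget,1234.56,789\n")
        before = snapshot([path])
        st = os.stat(path)
        with open(path, "r+b") as f:   # overwrite two digits, same length
            f.seek(9)
            f.write(b"00")
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))  # keep old date
        return [os.path.basename(p)
                for p in silent_changes(before, snapshot([path]))]

if __name__ == "__main__":
    print(demo())  # ['ledger.csv']
```

Of course, a manifest only helps if it was taken before the corruption started, which for most owners of these machines is already too late.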
If you have backups from before the time you bought the faulty machine, you can restore them and re-key everything.
Most people will not be so fortunate. All their backups will also be corrupt.
Most people with flaws will just have to put up with random errors dotting their data files ever after.
|Operating System||Work Around|
SCO Unix 3.1+
|- No problems reported.|
|No problems reported so far. If you do have trouble:
|Windows For WorkGroups|
|Windows NT 3.1|
|Windows NT 3.5|
|OS/2 Warp 3||Apply Fixpack 10; it contains all the special fixes.
If for some reason you are unwilling to apply Fixpack 10, you can do the following:
|Linux||All current Linux kernels have a workaround that can be compiled in (you may have to compile your own kernel, though).
For older versions:
PC-Tech manufactured the faulty RZ-1000 EIDE controller chip used in many PCI motherboards. PC-Tech is a subsidiary of ZEOS, the clonemaker. In turn, Micron Electronics owns ZEOS. PC-Tech has offices just down the street from ZEOS in Minnesota. Intel bought the chips from PC-Tech, and in turn many clone makers bought motherboards from Intel. Other motherboard manufacturers also used the faulty chips. In a similar way, Intel and other companies also used the CMD-640 chip from the CMD Technology Corporation of Irvine, California.
PC-Tech, Intel and the clone makers all failed to test their designs properly. The software makers did not test their software on enough machines to reveal the problem before releasing it.
Even worse, in some motherboard designs, Intel used the CMD-640 chip. This goof was inexcusable, since the chip, by deliberate design, is incapable of simultaneous I/O.
How did the flawed CMD-640 and RZ-1000 chips slip through Quality Assurance testing? My guess is that no one did real-world testing; technicians tested only under laboratory conditions, using simple operating systems like DOS. They might have ignored flaws that happened only sporadically, blaming them on a faulty chip rather than a faulty design. It is very hard to catch a flaw that only manifests rarely.
CMD, PC-Tech, Intel and Microsoft have known how to bypass these problems for quite some time. IBM was aware there was a problem but was unaware of the solution. For obvious reasons, these companies were reluctant to inform the public of the danger of ongoing subtle corruption.
No one who understood the RZ-1000 and CMD-640 flaws publicised their findings. If PC-Tech, Intel and Microsoft had not been so secretive, they could have averted the damage. Perhaps they were silent because the flaws primarily hurt the customers of a competitor, IBM.
The collective damage done by withholding information about the flaws is huge, certainly many millions of dollars for those large companies whose backups are corrupt as well. It will be interesting to see if anyone launches a damage lawsuit against CMD, PC-Tech, Intel or Microsoft. If they do, it might make both hardware and software makers more careful about releasing improperly tested products.
IBM is not totally innocent either. According to Massimiliano Vispi, on 1994-06-17 IBM posted a document:
Sam Detweiler of IBM explained that this referred only to the RZ-1000 problem of losing the trailing two bytes. IBM was not aware of the concurrent floppy problem with prefetch at that time.
Discussions with Intel and PC-Tech led IBM to believe that rewriting the interrupt handler to avoid reading the IDE status register recursively would solve the problem. PC-Tech never did explain the precise failure mechanism.
IBM says the CMD-640 problem also appeared in 1994-10 with the Vobis systems. CMD did not inform IBM of the problem.
Prefetch also affected the CMD chips (640, 640A and 640B). CMD built their own driver based on IBM code to handle the serialization problem. They did not fix the prefetch problem in their driver so it appears they too were unaware of it at this time.
There is potential here for some massive lawsuits. No wonder the companies who knew about the flaws have been so tight-lipped. Think of the damage if Boeing or GM (General Motors) had its plans for coming products stored on flawed machines. Literally, these flaws could cause plane crashes.
Intel literature on the RZ-1000 and CMD-640 only refers to (3). Intel cannot very well speak for (1) and (2) where the PCI EIDE controller design was out of their control, even though these machines bear the "Intel Inside" logo.
Intel does not make this distinction clear in their literature.
According to Intel, "This problem is a consequence of the RZ-1000’s inability to fully compensate for all the implications of running an IDE hard disk as an extension of the PCI bus, instead of running as an extension of the AT bus which it was originally designed to do."
Intel would have us believe the problems are not flaws per se, but rather a limitation that the programmers forgot to take into consideration.
The truth is grey. UART (Universal Asynchronous Receiver/Transmitter) chips have similar flaws, and programmers have gradually learned to code around them. We don’t insist that all serial COM port hardware be recalled. We now tend to blame a programmer if he does not bypass the known UART flaws.
Given that software workarounds are now possible, the primary blame for any perpetuation of the problem shifts to the software authors.
However, there are many other EIDE chip designs that do not have this limitation. Since the chips are supposedly generic implementations of the ATA (Advanced Technology Attachment) interface standard, I cannot so lightly excuse these flaws.
Now that the OS/2 fixes are out, the pressure to set things right will dwindle. Since DOS, Windows in 16-bit mode and Windows-95 are immune, little pressure to correct the problem is likely to come from those camps.
The motherboard manufacturer has five options:
Intel has already set the precedent by offering to replace defective Pentiums, even though software can bypass its divide flaw. The RZ-1000 flaws are far more serious and the CMD-640 flaws are even more serious still.
Keeping this under wraps is going to be hard for the clone builders. Brooke Crothers of Infoworld did several stories based on my compilations. I have been in contact with Jerry Pournelle of Byte. I sent email to John Dvorak. Even Dean Takahashi of the San Jose Mercury Daily News did a story. In the 1995-11 editions, a 1000-word abridged version of this essay appeared across Canada in The Computer Paper and Toronto Computes. The stonewall is coming tumbling down. As one individual pointed out, "I read your postings on the Internet, and see them the next day quoted in my daily newspaper."
IBM confirmed the RZ-1000 has two different flaws:
IBM confirmed the CMD-640 has five different flaws:
I have read about ten conflicting explanations from authorities on the cause of the problems. Much of the confusion comes because there are so many different flaws, all generating similar symptoms. I based the following explanations on postings from Sam Detweiler of IBM’s Warp Device Driver section.
The RZ-1000 and CMD-640 both have the prefetch flaw and the floppy status flaw. The CMD-640 has three additional flaws. I will focus on the three most important.
Data moves from the hard disk to RAM (Random Access Memory) via a bit bucket brigade. The RZ-1000 grabs data 16 bits at a time from a buffer in the integrated controller in the hard disk and hands it off 32 bits at a time to the PCI bus. The CPU sits in a tight loop grabbing data from the PCI bus and storing it in RAM. In prefetch mode, the RZ-1000 keeps ahead of the CPU, requesting two 16-bit chunks from the hard disk in order to have a 32-bit chunk ready when the CPU asks.
When you disable the prefetch buffer, you turn off the parallelism and run in a degraded lock-step mode. In degraded mode, the RZ-1000 waits until the CPU asks for a 32-bit chunk. Then it puts the CPU on hold while it asks the hard disk for two 16-bit chunks. It glues them together and puts them on the PCI bus and allows the CPU to continue.
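The difference between the two modes can be modelled in a few lines of Python. This is a toy model, not the RZ-1000’s actual logic: the disk is a list of 16-bit words, each pair is glued into one 32-bit value (low word first, assuming the little-endian order of x86), and the model counts how often the CPU must be held waiting for the disk. With prefetch, the next 32-bit value is already assembled when the CPU asks; in lock-step mode every request stalls.

```python
def glue(low: int, high: int) -> int:
    """Combine two 16-bit words into one 32-bit value, low word first."""
    return (high << 16) | low

def transfer(words, prefetch: bool):
    """Return (dwords, stalls) for moving 16-bit disk words to the CPU."""
    if len(words) % 2:
        raise ValueError("need an even number of 16-bit words")
    dwords, stalls = [], 0
    ready = None  # 32-bit value already waiting in the prefetch buffer
    for i in range(0, len(words), 2):
        if prefetch and ready is not None:
            value = ready          # fetched ahead of time: no wait
        else:
            stalls += 1            # CPU held while two words are fetched
            value = glue(words[i], words[i + 1])
        dwords.append(value)
        # With prefetch, fetch the next pair before the CPU asks for it.
        if prefetch and i + 3 < len(words):
            ready = glue(words[i + 2], words[i + 3])
        else:
            ready = None
    return dwords, stalls

words = [0x5678, 0x1234, 0xBEEF, 0xDEAD, 0x0001, 0x0000, 0xFFFF, 0xFFFF]
print([hex(d) for d in transfer(words, prefetch=True)[0]])
print(transfer(words, prefetch=True)[1], transfer(words, prefetch=False)[1])
```

In this model both modes deliver identical data; only the number of stalls differs, which is why turning prefetch off costs some speed rather than correctness.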
I advise all but the most dedicated technophiles to skip the next paragraph.
If the RZ-1000 is running with prefetch enabled, it erroneously considers a sector read complete as soon as it has grabbed the last 16 bits from the hard disk and stuffed them into the prefetch FIFO (First In First Out) buffer. It should not consider the read complete until the CPU has stuffed all the data into RAM. The RZ-1000 then starts to read the next sector. Suppose the current read is interrupted, or delayed by simultaneous DMA from some unrelated device, before the CPU has taken the last two bytes from the FIFO, and the next sector is prefetched into the FIFO before the current data transfer completes. Then the chip erroneously signals yet another Data Available Interrupt. Because OS/2 has already signalled EOI (End Of Interrupt) to the PIC (Programmable Interrupt Controller) and enabled interrupts, it recurses into the disk driver interrupt handler. The driver then reads the status register. Unfortunately, because of a cheap design shortcut, the FIFO is used for both data and status. The CPU reads the data in front of the status as if it were the status. This causes the interrupted data transfer to later read the following status as if it were data, resulting in corruption. Both the RZ-1000 and CMD-640 fail in exactly the same way.
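For those who did not skip it, the failure just described can be reduced to a toy Python model. It is a simplification under stated assumptions, not the chip’s real logic: one FIFO holds a sector’s 16-bit data words with the status queued behind them, and a spurious interrupt makes the recursive handler "read status" while the last data word is still waiting. `STATUS` is an arbitrary marker value, not a real register encoding.

```python
from collections import deque

STATUS = 0x50  # arbitrary marker standing in for the status value

def read_sector(words, spurious_interrupt_at=None):
    """Toy model of one FIFO shared by data and status.

    words: the 16-bit data words of one sector.
    spurious_interrupt_at: index of the word the transfer is about to
        read when a spurious interrupt makes the handler read the
        status (None means no spurious interrupt).
    Returns (words received by the transfer, value the handler
    took as status).
    """
    fifo = deque(words)
    fifo.append(STATUS)                   # status queued behind the data
    received, handler_status = [], None
    for i in range(len(words)):
        if i == spurious_interrupt_at:
            # The recursive handler "reads status" but gets the data word
            # waiting at the front of the FIFO.
            handler_status = fifo.popleft()
        # The interrupted transfer resumes; at the end it reads the
        # status as if it were data.
        received.append(fifo.popleft())
    return received, handler_status

print(read_sector([1, 2, 3, 4]))
print(read_sector([1, 2, 3, 4], spurious_interrupt_at=3))
```

With the spurious interrupt, the handler swallows the last two bytes of data and the transfer records the status in their place, exactly the trailing two-byte loss described above.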
There are two software techniques to bypass this flaw:
This flaw is the result of an incredible chain of blunders.
The original interface design blunder, dating back to MFM (Modified Frequency Modulation) controllers (the predecessors of IDE), was using different bits of the same I/O port, 3F7, for two unrelated purposes: detecting the floppy changeline and reporting hard disk status. Modern EIDE controllers are no longer supposed to do this, but some chips carry on the old tradition and provide legacy logic. Motherboard manufacturers then often blunder by attaching the floppy changeline to the EIDE controller. This way both the EIDE controller and the floppy controller think they are in charge of reporting floppy changeline status. On top of that, the designers of the RZ-1000 and CMD-640 chips both blundered by trying to save a little silicon, using the same registers to store both hard disk status and data.
For the insatiably curious, here is precisely how the corruption occurs. Simultaneous I/Os to both the hard disk and the floppy disk are running. The floppy controller generates an I/O complete interrupt. The floppy driver then checks the floppy status. Part of reading floppy status is checking the changeline bit, contained in the ambiguous port 3F7.
If the motherboard manufacturer goofed and hooked up the floppy changeline to the EIDE controller, the RZ-1000 erroneously responds to the floppy status request. It is in charge of the hard disk, not the floppy; it is the floppy controller’s job to respond. The RZ-1000 feeds two data bytes from its FIFO out as floppy status. Those data were supposed to go to the hard disk driver. Thus the chip loses two bytes from the hard disk transfer, corrupting data. Turning off prefetch also solves this problem. Unlike the first flaw, which simultaneous I/O of any kind can trigger, this one is triggered only by simultaneous floppy I/O.
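Here is a small Python sketch of how one byte read from port 3F7 is shared between the two devices, assuming the PC/AT layout in which bit 7 carries the floppy changeline and bits 0 to 6 come from the hard disk side. The meaning of the low bits varied between controllers, so the sketch only separates them rather than decoding them. It shows why the read is ambiguous: two different controllers each own part of the same byte.

```python
CHANGELINE_BIT = 0x80   # bit 7: floppy disk changeline (PC/AT layout, assumed)
DISK_BITS = 0x7F        # bits 0-6: driven by the hard disk side (assumed)

def split_3f7(value: int) -> tuple[bool, int]:
    """Split a byte read from port 3F7 into (floppy changed?, disk bits)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("port 3F7 returns a single byte")
    return bool(value & CHANGELINE_BIT), value & DISK_BITS

print(split_3f7(0xA5))
```

The floppy driver only cares about bit 7, but it has no way to know which chip actually drove the byte it got back.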
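The same two-byte loss can be shown with a toy Python model, under stated assumptions: the hard disk transfer drains a byte FIFO, and on a miswired board a floppy status read is answered by the EIDE chip with the next two bytes from that FIFO. The function name and parameters are hypothetical. A correctly wired board leaves the transfer intact.

```python
from collections import deque

def disk_transfer(data: bytes, poll_at: int, miswired: bool):
    """Toy model of a hard disk transfer drained from the EIDE FIFO.

    When the transfer reaches byte offset `poll_at`, the floppy driver
    reads its status from port 3F7. On a miswired board the EIDE chip
    answers from its FIFO, handing two data bytes to the floppy driver.
    Returns (bytes the hard disk driver received, bytes the floppy
    driver took as status).
    """
    fifo = deque(data)
    received = bytearray()
    floppy_status = b""
    offset = 0
    while fifo:
        if offset == poll_at and miswired:
            count = min(2, len(fifo))
            floppy_status = bytes(fifo.popleft() for _ in range(count))
        if fifo:
            received.append(fifo.popleft())
        offset += 1
    return bytes(received), floppy_status

print(disk_transfer(b"ABCDEFGH", poll_at=3, miswired=False))
print(disk_transfer(b"ABCDEFGH", poll_at=3, miswired=True))
```

On the miswired board the floppy driver receives `DE` as its "status" and the hard disk driver never sees those two bytes.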
Simultaneous I/O speed is the reason we put two EIDE devices on separate channels, both as masters, rather than making one a master and one a slave on the same channel.
IBM has a bypass for this blunder. When it detects a CMD-640, Warp never schedules more than one I/O at a time through that chip, reducing the operating system to DOS-like performance. Independent experiments show the degradation from using the CMD fix is 15 to 50%.
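Warp’s actual driver code is not shown here, but the serialization strategy itself is easy to sketch. In this hypothetical Python model, a single lock stands for the CMD-640: every I/O request must hold it, so requests issued by concurrent threads still execute one at a time. The model records the peak number of I/Os in flight to show that it never exceeds one.

```python
import threading
import time

class SerializedController:
    """Hypothetical model of a driver that allows only one I/O at a time."""

    def __init__(self):
        self._lock = threading.Lock()        # held for the whole I/O
        self._count_lock = threading.Lock()  # protects the counters
        self.in_flight = 0
        self.peak = 0

    def do_io(self, duration: float) -> None:
        with self._lock:                     # wait until the chip is idle
            with self._count_lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            time.sleep(duration)             # stand-in for the transfer
            with self._count_lock:
                self.in_flight -= 1

def run(requests: int = 8, duration: float = 0.01) -> int:
    """Issue concurrent requests and return the peak I/Os in flight."""
    ctl = SerializedController()
    threads = [threading.Thread(target=ctl.do_io, args=(duration,))
               for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ctl.peak

print(run())
```

Because the lock forces the requests to take turns, eight 10 ms requests need at least 80 ms in total instead of overlapping. That lost overlap is where the slowdown comes from; the real cost depends on the workload.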
There are six kinds of I/O used in PCs (Personal Computers).
In a true multi-tasking system, such as OS/2, the CPU goes off and works on behalf of applications when the port is busy and trusts an interrupt to bring it back when the device needs more service. It schedules several I/Os simultaneously. In contrast, DOS and Windows never do more than one I/O at a time. Further, under DOS/Windows the CPU idles while waiting for its single I/O to complete rather than working on applications.
If you don’t want to install the entire Fixpack 10, you can install these Warp bypasses for the RZ-1000 and CMD-640 flaws. Warning: this file has been updated several times without changing the name, so make sure you get the most recent version. The installation instructions are tricky. Follow them carefully.
CMD fixes for various operating systems (CMD-640 chip). Expand with PkUnZip -d 640X_USR.403
Information on the Premiere/PCI II motherboard, commonly referred to as 'Plato', can be obtained from Intel’s Faxback Service at 800-525-3019 or 503-264-6835 in the US or +44(0)1793-496646 in the UK. Press option 2 for "components, boards, platforms and tools for OEMs (Original Equipment Manufacturers) and developers" and follow the prompts. Request a 'SYSTEMs' catalog. From this, you can reference documents and their associated FAXBACK document numbers.
You can upgrade to the latest BIOS version to see if that resolves your motherboard issues.
You may also want to call the Intel Technical Support line at 1-800-628-8686 for help with your processor issues.