Y2K-Status.Org - A Comprehensive Resource for Year 2000 Status

Patrick Bossert & Mike Echlin discuss RTCs

Draft - Not for Quotation

[PB] - Patrick Bossert

[ME] - Mike Echlin, President

[ME] About your Crouch Echlin page:

They [C-E] argue that PCs do not take more than 244us to read, but a BIOS patch which introduces extra instructions may result in this not being the case. It is a quite plausible failure, until you do a few calculations. We figure that on a 20Mhz PC (v.slow) you would need to introduce more than 500 instructions to cause the problem. - from Patrick's Crouch-Echlin Effect page.


[ME] You say "It is a quite plausible failure, until you do a few calculations."

Most people who do speed calculations and timing measurements of RTC access on PCs do so after boot. But the time and date are read from the RTC during POST, when, as you know, the computer is in a very different state than after boot.

[PB] You are absolutely correct in this matter - we used 20MHz for our calculations, whereas the clock rate we should have been working from is clearly 8MHz. This means that only 200 instructions or thereabouts could make a very big difference to the stability of the clock data being read. It also means that the real machine speed is almost immaterial. I have to agree with you.
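The revised figure can be checked with a little arithmetic. The sketch below is illustrative only: the cost per instruction is not stated in the discussion and is back-derived here from the original "20 MHz, 500 instructions" estimate, and the function names are hypothetical.

```python
WINDOW_US = 244  # safe read window quoted in the discussion, in microseconds


def cycles_in_window(clock_mhz, window_us=WINDOW_US):
    """Clock cycles elapsed during the window (MHz x microseconds = cycles)."""
    return clock_mhz * window_us


def instruction_budget(clock_mhz, cycles_per_instruction):
    """Instructions that fit in the window at a given average cost each."""
    return cycles_in_window(clock_mhz) / cycles_per_instruction


# Assumption: average cost per instruction implied by the earlier estimate
# of 500 instructions at 20 MHz (4880 cycles / 500, a little under 10).
assumed_cpi = cycles_in_window(20) / 500

budget_at_8mhz = instruction_budget(8, assumed_cpi)
print(round(budget_at_8mhz))  # roughly 200, in line with the figure above
```

Because the budget scales linearly with the clock rate, dropping from 20 MHz to 8 MHz cuts it to 40% of the original estimate, whatever per-instruction cost is assumed.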

[PB] The C-E effect can only appear if the extra instructions are executed in between certain RTC registers being read. If the extra calculation is performed at the end of the register read cycle then it is perfectly safe. My area of expertise is in embedded systems, where I have designed a number of different industrial process and security controllers, and I have never seen a case of embedded systems code where the C-E effect could occur, as the registers are always read together. I have also seen code excerpts from a large number of systems tested with the Delta-T Probe and they confirm this view. I have used the following timing schematic (your email viewer needs to display this in a fixed-width font like Courier New for the bits to line up) to illustrate the problem:

            <-----244us----------->
------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 'OK' flag
-----------------------------------XXXXXXXXXX Data
Reads: ^ S  M  H  D  M  Y  W
= All data is read OK

[PB] The question then is: where are the extra instructions added? We have not seen code from a single embedded system which does not read the RTC registers consecutively and only then do any calculations on the date values. Most RTCs store their registers in consecutive locations, and any code written to read them will usually read the smallest unit (i.e. seconds or tenths of seconds) first. I have seen a number of pieces of code which read a register value from the RTC and only go on to read the next register if the value has reached its wrap-around point, e.g. if seconds reach 00 then the code reads minutes, and so on. The following scenario, based on the date window being calculated in the middle of the RTC register read sequence, is very unlikely in my opinion, due to the order in which the registers are read (even for a PC BIOS), but would clearly cause the C-E effect to occur.

            <-----244us----------->
------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 'OK' flag
-----------------------------------XXXXXXXXXX Data
Reads: ^ D  M  Y <extra code S  M  H  W
= Hour and day of Week may be wrongly read
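The difference between the two schematics can be mimicked with a toy timeline. This is a minimal sketch, not real BIOS or controller code: the 5 us cost per register read and the 300 us cost of the extra windowing code are invented numbers chosen only to make the effect visible, and `unsafe_reads` is a hypothetical helper. It flags every read that lands after the 244 us window; which of those reads actually return bad data depends on the chip.

```python
SAFE_WINDOW_US = 244  # window quoted in the discussion


def unsafe_reads(sequence, read_cost_us, start_us=0.0):
    """Return the registers whose read finishes after the safe window.

    `sequence` mixes register names (str) with extra delays in
    microseconds (numbers), in the order the code executes them.
    """
    elapsed = start_us
    late = []
    for step in sequence:
        if isinstance(step, str):
            elapsed += read_cost_us
            if elapsed > SAFE_WINDOW_US:
                late.append(step)
        else:
            elapsed += step  # extra code runs, no register is read
    return late


EXTRA_CODE_US = 300.0  # assumption: cost of the date-windowing code
READ_COST_US = 5.0     # assumption: cost of one register read

# All registers read together, windowing done afterwards.
reads_first = unsafe_reads(
    ["sec", "min", "hour", "day", "month", "year", "weekday", EXTRA_CODE_US],
    READ_COST_US,
)

# Windowing done part-way through the read sequence.
windowing_midway = unsafe_reads(
    ["day", "month", "year", EXTRA_CODE_US, "sec", "min", "hour", "weekday"],
    READ_COST_US,
)

print(reads_first)       # []
print(windowing_midway)  # ['sec', 'min', 'hour', 'weekday']
```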

[PB] This is not really applicable to embedded systems, as they tend to have rather simpler peripherals, but could the extra code section on a PC be due to an NMI or DMA being serviced? (I guess the BIOS would disable IRQs during the RTC access.) An NMI or DMA transfer will interrupt the clock-reading process and return at a later time (assuming a long burst of activity), when the clock may well be unstable:

            <-----244us----------->
------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 'OK' flag
-----------------------------------XXXXXXXXXX Data
Reads: ^ S  M <DMA from HardDisk H  D  M  Y  W
= Hour and day of Week may be wrongly read

[PB] This would, of course, mean that the problem would be equally likely to occur both pre- and post-millennium. I suspect it is a combination of both effects that may be the real reason for the C-E effect.

[ME] Yes, newer computers are faster than older ones. This makes the effect occur less often, but does not eliminate it in newer computers. (We had originally thought as you do, that faster machines would be fast enough that the effect would not happen, and we calculated that the cut-off would be 66 MHz, but we have seen, and have had numerous reports of, much faster machines showing this effect.)

[PB] I agree that the speed is actually not a significant factor, certainly where the BIOS ROM sits on an ISA bus.

[ME] (Besides a non-buffered RTC, the other ingredient needed is accessing software or firmware with some type of error that allows it to access the RTC while the data is bad. Compaq has identified and confirmed one type of problem that can cause this: a BIOS that ignores the UIP (update-in-progress) bit. There may be many problems that allow the same symptoms to happen.)

[PB] Ignoring the UIP bit is clearly just bad programming. You would expect a PC BIOS which does this to have the occasional glitch in reading the clock on start-up. This makes the problem equally likely to occur pre- and post-millennium.
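For comparison, a read routine that does honour the UIP bit might look like the sketch below. It is a common defensive pattern, not code taken from any BIOS discussed here. It assumes an MC146818-style register layout (status register A at index 0x0A with UIP in bit 7, time registers at 0x00 to 0x09); `read_register` is a hypothetical callable standing in for the hardware access, and `FakeRtc` exists only so the example can run without hardware.

```python
REG_A = 0x0A     # status register A on MC146818-style RTCs
UIP_MASK = 0x80  # bit 7 of register A: update in progress
TIME_REGS = {"sec": 0x00, "min": 0x02, "hour": 0x04, "weekday": 0x06,
             "day": 0x07, "month": 0x08, "year": 0x09}


def snapshot(read_register, max_polls=10000):
    """Wait for UIP to clear, then read all time registers back to back."""
    for _ in range(max_polls):
        if not read_register(REG_A) & UIP_MASK:
            break
    else:
        raise TimeoutError("UIP never cleared")
    return {name: read_register(index) for name, index in TIME_REGS.items()}


def read_clock(read_register, max_tries=5):
    """Accept a reading only when two consecutive snapshots agree."""
    previous = snapshot(read_register)
    for _ in range(max_tries):
        current = snapshot(read_register)
        if current == previous:
            return current
        previous = current
    raise RuntimeError("RTC did not settle")


class FakeRtc:
    """Stand-in for hardware: reports UIP set for the first few polls."""

    def __init__(self, busy_polls=3):
        self.busy_polls = busy_polls
        # BCD values for 23:59:59 on 31.12.99 (illustrative data only)
        self.regs = {0x00: 0x59, 0x02: 0x59, 0x04: 0x23, 0x06: 0x06,
                     0x07: 0x31, 0x08: 0x12, 0x09: 0x99}

    def read(self, index):
        if index == REG_A:
            if self.busy_polls > 0:
                self.busy_polls -= 1
                return UIP_MASK
            return 0x00
        return self.regs[index]


rtc = FakeRtc()
stable = read_clock(rtc.read)
```

Reading twice and comparing is an extra safeguard on top of the UIP check: it also catches the case where something (an NMI, say) stretches the read past the safe window after UIP was seen clear.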

[ME] You state "Embedded systems use a variety of different real time clock chip types which lessens the possible problem, and are in almost all cases unlikely to be affected for the above reason." If the RTC is non-buffered (and most are not buffered), there is a chance the machine will be affected on random access of the RTC. This is not just a PC problem. Our research has mainly been on PCs for two reasons: ease of testing and availability. But we have reports and test results from other architectures.

[PB] I agree that any non-buffered RTC may give rise to the same problem, but the way embedded systems are programmed to use their RTCs has, in all cases that we have seen to date, avoided the problem by implementing date windowing code after all the clock registers have been read. We have been able to see this by using the Delta-T Probe to capture the code using the RTC register access as a trigger. We have even looked at a few (very) old BIOS routines in early PCs to see how they read the RTC, and we found that they always read all the clock registers together.

[PB] Another factor is that a significant percentage of embedded systems use serial RTCs, which by definition produce stable data.

[ME] You also say, "Most embedded systems are rarely, if ever, switched off." This is largely true of the manufacturing world, yes. But not all embeds are in manufacturing. Medical embeds are switched on and off all the time. The embeds in automobiles (and some car companies use 286s in their cars) are switched on each time you start the car. The embeds I design at my "real" job are for data acquisition and analysis in nuclear plants and nuclear facilities. They are also switched on and off daily, even several times a day. These are just some examples of embeds that are not on continuously.

[PB] This is true enough, and when you look closely you find that most embedded systems do not just read the RTC on start-up like a PC. They poll it all the time, or get the time on the basis of an interrupt which notifies the processor that the time has changed. Even assuming the code was written in such a manner as to be unstable as a result of extra date windowing code being executed, any glitches would be momentary at most, and would not persist. My experience has been that date windows are never implemented mid-way through the register reading process, so the problem never surfaces. There is clearly an application for the Delta-T Probe in examining the code to verify this if it is a cause for concern in a particular system.

[ME] Yes, we will see this effect in embeds, maybe even more than in desktop machines, because embeds are generally based on tried and true designs, and they have a longer life span than desktops do. These two reasons combined assure us that there will be a lot more embeds out there that are older, slower and more likely to have a non-buffered RTC, and so more likely to be affected, and more likely to show the effects more often.

[PB] We have yet to see a single instance of the C-E effect occurring in embedded systems, but I cannot rule out the possibility, for the above reasons. If something is so critical that momentary clock variations may halt a process, then with the Delta-T Probe we have the tool to test it and find out whether it is an issue.

[ME] I hope this helps clarify our position. Please comment on this, as I am sure you can help us understand this issue. - Mike.



[ME] The Crouch Echlin effect, detailed

The Crouch Echlin effect is a random jump in date and time that occurs at random boots of affected computers and embedded systems, only after rollover to year 2000. It can also be accompanied by a loss of some hardware and CMOS settings. The systems that show the Crouch Echlin effect have one thing in common: a real time clock (RTC)/CMOS that, if accessed during the once-a-second update cycle, gives bad data to the BIOS POST date/time routine.

Our testing has shown 4 things:

1: The effect only happens post rollover to Y2K. This was shown by the repeated pre/post 2 week cycles of testing.

2: The effect is characterized by random time/date jumps occurring only at startup of the computer.

Further investigation showed that the BIOS code that reads these chips has three paths to follow depending on the value of the Century Byte stored at register 0x32 of the CMOS memory.

If the value is anything but BCD19 or BCD20, it is treated as an error value, and the BIOS date is then set to an error value such as "01.03.1980".

If the value is BCD19, the code follows one logic path, and if it is BCD20 it follows a different path.

This difference in logic path between BCD19 and BCD20 is the only difference in the code the computer follows during start-up when everything else is held constant (as it was in our testing). This points directly to the change in logic path as the only possible difference that would allow the effect to happen.

It is my theory that this difference in logic path changes the time taken to read the RTC enough that the BIOS is still reading the RTC when the chip enters its update mode. If the RTC is not buffered, the value read at that time is not reliable and can cause the effect to occur.

3: The one thing in common with these computers is a non-buffered Real Time Clock. (By observation, and correlation of those observations)

We discovered that just before and while the RTC is changing from one second to the next it displays a flag to tell the user "do not read from me now."

During our testing a pattern came to light. Those computers that showed no errors from the RTC during the time that the update flag was set to high also did not show any signs of the effect.

We learned that these RTCs have a double register buffer, allowing them to update their time internally, while not showing any of the errors that the non-buffered RTCs showed while updating.

4: When the time/date jumps and is wrong at the OS clock level, the RTC still has the correct time. (By analysis of machines when they were affected by the effect.)
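The three century-byte paths described in the findings above can be sketched as follows. This is only an illustration of the branching, not the actual BIOS code, which is not reproduced in this discussion; the function names are hypothetical, and the error path simply returns `None` where a BIOS would fall back to its error date.

```python
CENTURY_REGISTER = 0x32  # CMOS location of the century byte, as stated above


def bcd_to_int(value):
    """Decode one packed-BCD byte, e.g. 0x19 -> 19."""
    return (value >> 4) * 10 + (value & 0x0F)


def century_path(century_byte):
    """Name the path taken for a given century byte."""
    if century_byte == 0x19:
        return "1900s"
    if century_byte == 0x20:
        return "2000s"
    return "error"


def full_year(century_byte, year_byte):
    """Combine century and year bytes, or return None on the error path."""
    path = century_path(century_byte)
    if path == "error":
        return None  # a BIOS would set its error date here
    return bcd_to_int(century_byte) * 100 + bcd_to_int(year_byte)


print(full_year(0x19, 0x99))  # 1999
print(full_year(0x20, 0x00))  # 2000
print(full_year(0x00, 0x00))  # None
```

In this toy version both valid paths cost the same. The theory above is that in the affected BIOSes the BCD20 path takes a different amount of time, and that difference is what pushes the read into the update cycle.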


Copyright 1988-2012  Richard Collins, All Rights Reserved