Patrick Bossert & Mike Echlin discuss RTCs
Draft - Not for Quotation
[PB] - Patrick Bossert
[ME] - Mike Echlin, President
[ME] About your Crouch Echlin page:
They [C-E] argue that PCs do not take more than 244us to read, but a BIOS patch which introduces extra instructions may result in this not being the case. It is a quite plausible failure, until you do a few calculations. We figure that on a 20Mhz PC (v.slow) you would need to introduce more than 500 instructions to cause the problem. - from Patrick's Crouch-Echlin Effect page.
[ME] You say "It is a quite plausible failure, until you do a few calculations."
Most people who do speed calculations and speed timing on PCs for RTC access do so after
boot. But the time/date is read from the RTC during POST, which as you know has the
computer in a much different state than after BOOT:
[PB] You are absolutely correct in this matter - we used 20MHz for our calculations,
whereas the clock rate we should have been working from is clearly 8MHz. This means that
only 200 instructions or thereabouts could make a very big difference to the stability of
the clock data being read. It also means that the real machine speed is almost immaterial.
I have to agree with you.
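The arithmetic behind these two figures can be sketched roughly as follows. The average cost of 10 clock cycles per instruction is purely an illustrative assumption (it is not stated anywhere in this exchange); it is used here only because it roughly reproduces both the "more than 500" and the "200 or thereabouts" estimates.

```python
# Rough instruction budget inside the 244 us window before the RTC update.
# CYCLES_PER_INSTRUCTION is an illustrative assumption, not a measured value.
WINDOW_US = 244
CYCLES_PER_INSTRUCTION = 10  # assumed average, including slow bus accesses


def instruction_budget(clock_mhz, cycles_per_instruction=CYCLES_PER_INSTRUCTION):
    """Return roughly how many instructions fit into the 244 us window."""
    cycles = WINDOW_US * clock_mhz  # microseconds * cycles per microsecond
    return cycles // cycles_per_instruction


for mhz in (20, 8):
    print(f"{mhz} MHz: about {instruction_budget(mhz)} instructions")
```

Under that assumption the budget shrinks from just under 500 instructions at 20MHz to just under 200 at 8MHz, which is why the effective clock rate during POST matters more than the rated speed of the machine.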
[PB] The C-E effect can only appear if the extra instructions are executed in between
certain RTC registers being read. If the extra calculation is performed at the end of the
register read cycle then it is perfectly safe. My area of expertise is in embedded
systems, where I have designed a number of different industrial process and security
controllers, and I have never seen a case of
embedded systems code where C-E effect could occur, as the registers are always read
together. I have also seen code excerpts from a large number of systems tested with the
Delta-T Probe and they confirm this view. I have used the following timing schematic (your
email viewer needs to display this in a fixed-width font like Courier New for the bits to
line up) to illustrate the problem:
<-----244us----------->
------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 'OK' flag
-----------------------------------XXXXXXXXXX Data
Reads: ^ S M H D M Y W
= All data is read OK
[PB] The question then is: where are the extra instructions added? We have not seen code
from one single embedded system which does not read RTC registers consecutively and only
then do any calculations on the date values. Most RTCs store their registers in
consecutive locations, and any code written to read them will usually read the smallest
(i.e. seconds or tenths of seconds) first. I have seen a number of bits of code which read
a register value from the RTC, and only go on to read the next register if the value has
reached wrap-around point. e.g. if seconds reach 00 then the code reads minutes etc. The
following scenario based on the date window being calculated in the middle of the RTC
register read sequence is very unlikely in my opinion, due to the order in which the
registers are read (even for a PC BIOS), but would clearly cause the C-E effect to occur.
<-----244us-----------
------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 'OK' flag
-----------------------------------XXXXXXXXXX Data
Reads: ^ D M Y <extra code> S M H W
= Hour and day of Week may be wrongly read
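The difference between the two read orders in the diagrams above can be sketched as a small timeline model. The 2 us cost per register read and the 300 us of extra code are assumed values chosen only for illustration, and "Mo" is used for the month register to tell it apart from minutes. The model flags every read that completes after the 244 us window as at risk; it does not try to predict which registers would actually return bad values.

```python
WINDOW_US = 244  # stable time after the 'OK' flag check, as in the diagrams


def unsafe_reads(sequence, read_us=2.0):
    """Return the registers whose read completes after the 244 us window.

    `sequence` is a list of steps: a register name (str) costs `read_us`
    (an assumed figure), while a number stands for extra code that takes
    that many microseconds.
    """
    elapsed = 0.0
    late = []
    for step in sequence:
        if isinstance(step, str):
            elapsed += read_us
            if elapsed > WINDOW_US:
                late.append(step)
        else:
            elapsed += step
    return late


# Registers read together, date windowing (300 us, assumed) afterwards.
print(unsafe_reads(["S", "M", "H", "D", "Mo", "Y", "W", 300]))  # []
# The same work, but with the extra code in the middle of the sequence.
print(unsafe_reads(["D", "Mo", "Y", 300, "S", "M", "H", "W"]))  # ['S', 'M', 'H', 'W']
```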
[PB] This is not really applicable to embedded systems, as they tend to have rather
simpler peripherals, but could the extra code section on a PC be due to an NMI or DMA
being serviced? (I guess the BIOS would disable IRQs during the RTC access.) An NMI or DMA
transfer will interrupt the clock reading process and, assuming a long bit of activity,
return at a later time when the clock may well be unstable:
<-----244us-----------
------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 'OK' flag
-----------------------------------XXXXXXXXXX Data
Reads: ^ S M <DMA from HardDisk> H D M Y W
= Hour and day of Week may be wrongly read
[PB] This would, of course, mean that the problem would be equally likely to occur both
pre- and post-millennium. I suspect it is a combination of both effects that may be the
real reason for the C-E effect.
[ME] Yes, newer computers are faster than older ones. This makes the effect occur less
often, but does not eliminate it in newer computers. (We had originally thought as you do,
that faster machines would be fast enough that the effect would not happen, and we
calculated that the cut-off would be 66 MHz, but we have seen, and have had numerous
reports of, machines of much faster speeds showing this effect.)
[PB] I agree that the speed is actually not a significant factor, certainly where the BIOS
ROM sits on an ISA bus.
[ME] (Besides a non-buffered RTC, the other ingredient needed is that the accessing
software or firmware has some type of error that allows it to access the RTC while the
data is bad. Compaq has identified and confirmed one type of problem that can cause this:
a BIOS that ignores the UIP bit. There may be many problems that allow the same symptoms
to happen.)
[PB] Ignoring the UIP bit is clearly just bad programming. You would expect a PC BIOS
which does this to have the occasional glitch in reading the clock on start-up. This makes
the problem equally likely to occur pre- and post-millennium.
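For illustration, a read routine that respects the UIP bit might look like the sketch below. On MC146818-style RTCs the UIP flag is bit 7 of status register A (index 0x0A), and the time registers sit at the indexes listed. The `FakeRTC` class is a hypothetical stand-in for the hardware so that the sketch can run anywhere; this is not the code of any particular BIOS.

```python
REG_A = 0x0A     # status register A on MC146818-style RTCs
UIP_MASK = 0x80  # bit 7 of register A: update in progress
# seconds, minutes, hours, day of week, day of month, month, year
TIME_REGS = (0x00, 0x02, 0x04, 0x06, 0x07, 0x08, 0x09)


class FakeRTC:
    """Hypothetical stand-in for the chip: UIP reads as set for a few polls."""

    def __init__(self, busy_polls=3):
        self.busy_polls = busy_polls
        # BCD values for 23:59:59 on 31.12.99 (sample data only)
        self.cells = {0x00: 0x59, 0x02: 0x59, 0x04: 0x23, 0x06: 0x06,
                      0x07: 0x31, 0x08: 0x12, 0x09: 0x99}

    def read(self, reg):
        if reg == REG_A:
            if self.busy_polls > 0:
                self.busy_polls -= 1
                return UIP_MASK
            return 0x00
        return self.cells[reg]


def read_clock(rtc, max_polls=1000):
    """Wait for UIP to clear, then read every time register back to back."""
    for _ in range(max_polls):
        if (rtc.read(REG_A) & UIP_MASK) == 0:
            return [rtc.read(reg) for reg in TIME_REGS]
    raise TimeoutError("RTC stayed busy")


print([hex(v) for v in read_clock(FakeRTC())])
```

A routine that skips the UIP check would go straight to the list comprehension, which is where an occasional bad read on a non-buffered part could creep in.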
[ME] You state "Embedded systems use a variety of different real time clock chip types
which lessens the possible problem, and are in almost all cases unlikely to be affected
for the above reason." If the RTC is non-buffered (and most are), there is a chance the
machine will be affected on random access of the RTC. This is not just a PC problem. Our
research has mainly been on PCs for two reasons: ease of testing and availability. But we
have reports and test results from other architectures.
[PB] I agree that any non-buffered RTC may give rise to the same problem, but the way
embedded systems are programmed to use their RTCs has, in all cases that we have seen to
date, avoided the problem by implementing date windowing code after all the clock
registers have been read. We have been able to see this by using the Delta-T Probe to
capture the code using the RTC register access as a trigger. We have even looked at a few
(very) old BIOS routines in early PCs to see how they read the RTC, and we found that they
always read all the clock registers together.
[PB] Another factor is that a significant percentage of embedded systems use serial RTCs,
which by definition produce stable data.
[ME] You also say, "Most embedded systems are rarely, if ever, switched off." This is
largely true of the manufacturing world, yes. But not all embeds are in manufacturing.
Medical embeds are switched on and off all the time. The embeds in automobiles (and some
car companies use 286s for their cars) are switched on each time you start the car. The
embeds I design at my "real" job are for data acquisition and analysis in nuclear plants
and nuclear facilities. They are also switched on and off daily, even multiple times
daily. These are just some examples of embeds that are not on continuously.
[PB] This is true enough, and when you look closely you find that most embedded systems do
not just read the RTC on start-up like a PC, they poll it all the time, or get the time on
the basis of an interrupt which notifies the processor that the time has changed. Assuming
the code was written in such a manner as to be unstable as a result of extra date
windowing code being executed, any glitches would be momentary at most, and would not
persist. My experience has been that date windows are never implemented mid-way through
the register reading process, so the problem never surfaces. There is clearly an application of
the Delta-T Probe to look at the code to verify this if it is cause for concern in a
particular system.
[ME] Yes, we will see this effect in embeds, maybe even more than in desktop machines,
because embeds are generally based on tried and true designs, and they have a longer life
span than desktops do. These two reasons combined assure us that there will be a lot more
embeds out there that are older, slower and more likely to have a non-buffered RTC, and so
more likely to be affected, and more likely to show the effects more often.
[PB] We have yet to see a single instance of the C-E effect occurring in embedded
systems, but for the above reasons I cannot rule out the possibility that it could. If
something is so critical that momentary clock variations may halt a process, then with the
Delta-T Probe we have the tool to test it and find out whether it is an issue.
[ME] I hope this helps clear up our position. Please comment on this as I am sure you can
help us understand this issue. - Mike.
[ME] The Crouch Echlin effect, detailed
The Crouch Echlin effect is a random jump in date and time that occurs at random boots of
affected computers and embedded systems, only after rollover to year 2000. It can also be
accompanied by a loss of some hardware and CMOS settings. The systems that show the Crouch
Echlin effect have one thing in common: a real time clock (RTC)/CMOS that, if accessed
during the once-a-second update cycle, gives bad data to the BIOS POST date/time routine.
Our testing has shown 4 things:
1: The effect only happens post rollover to Y2K. This was shown by the repeated pre/post 2 week cycles of testing.
2: The effect is characterized by random time/date jumps occurring only at startup of the computer.
Further investigation showed that the BIOS code that reads these chips has three paths to follow depending on the value of the Century Byte stored at register 0x32 of the CMOS memory.
If the value is anything but BCD19 or BCD20, it is treated as an error, and the BIOS date is set to an error value such as "01.03.1980".
If the value is BCD19, the code follows one logic path, and if it is BCD20 it follows a different path.
This difference in logic path between BCD19 and BCD20 is the only difference in the code that the computer follows during the whole start-up sequence if nothing else is changed (as was the case in our testing). This points directly to the change in logic path as the only possible difference that would allow this effect to happen.
It is my theory that this difference in logic path changes the amount of time used to read the RTC enough that the BIOS can still be reading the RTC while it is in its update mode. If the RTC is not buffered, the value read from it at that time is not reliable, and this can cause the effect to occur.
3: The one thing in common with these computers is a non-buffered Real Time Clock. (By observation, and correlation of those observations)
We discovered that just before and while the RTC is changing from one second to the next it displays a flag to tell the user "do not read from me now."
During our testing a pattern came to light. Those computers that showed no errors from the RTC during the time that the update flag was set to high also did not show any signs of the effect.
We learned that these RTCs have a double register buffer, allowing them to update their time internally, while not showing any of the errors that the non-buffered RTCs showed while updating.
4: When the time/date jumps and is wrong at the OS clock level, the RTC still has the correct time. (By analysis of machines when they were affected by the effect.)
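The three BIOS paths keyed on the century byte, described under item 2, can be sketched as follows. The function name `post_date_path`, the path labels, and the use of a plain dictionary for CMOS contents are all hypothetical; the sketch only shows the shape of the branching, not the code of any real BIOS. Register 0x32 is the century byte location named in the text, and 0x09 is the year register on MC146818-style parts.

```python
CENTURY_REG = 0x32  # century byte location named in the text
YEAR_REG = 0x09     # two-digit BCD year on MC146818-style RTCs


def bcd_to_int(value):
    """Decode one packed-BCD byte, e.g. 0x19 -> 19."""
    return (value >> 4) * 10 + (value & 0x0F)


def post_date_path(cmos):
    """Pick one of the three paths and return (path label, four-digit year).

    `cmos` maps register index to byte value (a stand-in for real CMOS reads).
    """
    century = cmos[CENTURY_REG]
    if century == 0x19:
        return "path-19", 1900 + bcd_to_int(cmos[YEAR_REG])
    if century == 0x20:
        # Per the theory in the text, this branch would be the one that
        # takes a different amount of time during the RTC read.
        return "path-20", 2000 + bcd_to_int(cmos[YEAR_REG])
    return "error", 1980  # fallback date, as in the "01.03.1980" example


print(post_date_path({CENTURY_REG: 0x19, YEAR_REG: 0x99}))  # ('path-19', 1999)
print(post_date_path({CENTURY_REG: 0x20, YEAR_REG: 0x00}))  # ('path-20', 2000)
```

Because only the century value selects the branch, a pre/post rollover test that changes nothing else isolates this one difference, which is the reasoning behind item 2.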
Copyright 1988-2012 Richard Collins, All Rights Reserved