
Mansoor says that dream is still a good 20 years away because it depends on better data, a reliable communications network and computer programs capable of making decisions based on the data.
Meanwhile, electricity demand keeps growing as consumers buy more computers, air conditioners and rechargeable handhelds.
In the ten years since August 2003, the North American power industry has
invested billions of dollars upgrading computer systems, training control
room staff, and cutting down vegetation along power lines.
Reliability standards that were once voluntary have become mandatory.
New synchrophasors are being rolled out that will provide updates about the state of the grid several times a second, giving control room staff greater situational awareness.
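To make the idea concrete, here is a minimal sketch, in Python, of the kind of check a monitoring tool might run on streaming synchrophasor (PMU) measurements: it watches the phase-angle separation between two buses, a widening separation being one common sign of grid stress. The bus names, the 30-degree limit and the PhasorSample structure are all hypothetical; real phasor data concentrators and alarm limits are far more elaborate.

```python
from dataclasses import dataclass

@dataclass
class PhasorSample:
    """One GPS-time-stamped synchrophasor (PMU) measurement for a single bus."""
    timestamp: float      # seconds, GPS-synchronised
    bus: str              # bus identifier
    magnitude_kv: float   # voltage magnitude in kV
    angle_deg: float      # voltage phase angle against a common reference, in degrees

def angle_separation(a: PhasorSample, b: PhasorSample) -> float:
    """Phase-angle difference between two buses, wrapped into [-180, 180) degrees."""
    return (a.angle_deg - b.angle_deg + 180.0) % 360.0 - 180.0

# Hypothetical alarm limit; real limits come from system-specific studies.
ALERT_DEG = 30.0

def check_separation(a: PhasorSample, b: PhasorSample) -> None:
    sep = angle_separation(a, b)
    if abs(sep) > ALERT_DEG:
        print(f"ALERT: {a.bus}-{b.bus} separation {sep:.1f} deg exceeds {ALERT_DEG} deg")

# Samples like these would arrive many times per second from each PMU.
check_separation(
    PhasorSample(timestamp=0.0, bus="Bus-A", magnitude_kv=345.0, angle_deg=12.4),
    PhasorSample(timestamp=0.0, bus="Bus-B", magnitude_kv=345.0, angle_deg=-21.7),
)
```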
Operators must assess the worst-case scenario, usually the loss of the largest generator or transmission line on the system, and plan how to meet it (the “N-1 criterion”). If it happens, they must be able to bring the system back to a safe operating condition within 30 minutes and start planning to meet the next worst-case scenario.
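As a rough illustration of the N-1 idea (not a real planning study, which would involve full power-flow simulations), the sketch below drops each transmission element in turn and checks whether the remaining, made-up ratings can still carry the required transfer.

```python
# Toy N-1 screen with made-up numbers: drop each line in turn and check that the
# surviving lines can still carry the required transfer within their ratings.
lines_mw = {
    "Line-A": 800,   # hypothetical thermal ratings in MW
    "Line-B": 700,
    "Line-C": 600,
}
required_transfer_mw = 1500

def n_minus_1_violations(ratings, demand_mw):
    """Return (lost element, MW shortfall) for every single-element loss that
    would leave the system unable to carry the demand."""
    violations = []
    for lost in ratings:
        remaining = sum(r for name, r in ratings.items() if name != lost)
        if remaining < demand_mw:
            violations.append((lost, demand_mw - remaining))
    return violations

for lost, shortfall in n_minus_1_violations(lines_mw, required_transfer_mw):
    print(f"Loss of {lost} leaves a {shortfall} MW shortfall - re-dispatch before it happens")
```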
To cope with emergencies, area controllers can order more generation to
come on line or seek help from neighbouring areas by requesting
transmission loading relief.
Grid managers can cut power to customers on interruptible supply contracts and request voluntary conservation. But if all else fails, controllers are expected to start disconnecting blocks of customers to protect the rest. From a reliability perspective, it is better for a few customers to suffer a power cut than to risk a cascading power failure across the network.
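The escalation order described above can be pictured as a simple ladder. The sketch below is only illustrative, with invented reserve figures, and glosses over everything that makes real operations hard (timing, location, transmission limits).

```python
def respond_to_shortfall(shortfall_mw, spare_generation_mw, neighbour_relief_mw,
                         interruptible_mw):
    """Walk the escalation ladder described above and report how much firm load,
    if any, would still have to be shed. Numbers and ordering are illustrative."""
    ladder = [
        ("bring additional generation on line", spare_generation_mw),
        ("request transmission loading relief / neighbour support", neighbour_relief_mw),
        ("interrupt interruptible customers and ask for conservation", interruptible_mw),
    ]
    remaining = shortfall_mw
    for action, relief_mw in ladder:
        if remaining <= 0:
            break
        used = min(remaining, relief_mw)
        print(f"{action}: {used} MW")
        remaining -= used
    if remaining > 0:
        print(f"last resort - shed {remaining} MW of firm load to protect the rest of the grid")
    return remaining

# Hypothetical emergency: 1,200 MW short, with limited help available at each step.
respond_to_shortfall(1200, spare_generation_mw=400, neighbour_relief_mw=500,
                     interruptible_mw=150)
```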
In its final report on the causes of the blackout, the U.S.-Canada Power
System Outage Task Force identified poor vegetation management,
computer failures, inadequate training and lack of real-time situational
awareness of grid conditions as the main factors behind the disaster.
FirstEnergy was harshly criticised, but the task force identified institutional
failures across the industry, particularly in setting and enforcing reliability
standards, and coordinating across the grid. No fewer than 46
recommendations were made to prevent the blackout recurring (“Final
Report on the August 14, 2003 Blackout” April 2004).

In the end, it turned out that a computer snafu actually played a significant
role in the cascading blackout - though it had nothing to do with viruses or
cyber terrorists. A silent failure of the alarm function in FirstEnergy's
computerized Energy Management System (EMS) is listed in the final
report as one of the direct causes of a blackout that eventually cut off
electricity to 50 million people in eight states and Canada.
"Without a functioning alarm system, the [FirstEnergy] control area
operators failed to detect the tripping of electrical facilities essential to
maintain the security of their control area," reads the report. "Unaware of
the loss of alarms and a limited EMS, they made no alternate
arrangements to monitor the system."
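One common defence against exactly this kind of silent death is an independent heartbeat check on the alarm subsystem, so that a stalled alarm processor itself triggers a warning through a separate channel. The sketch below is a generic illustration, not FirstEnergy's EMS; every class name and threshold here is hypothetical.

```python
import time

class AlarmProcessor:
    """Stand-in for an EMS alarm subsystem; a healthy one records a heartbeat
    on every processing cycle."""
    def __init__(self):
        self.last_heartbeat = time.monotonic()

    def heartbeat(self):
        self.last_heartbeat = time.monotonic()

def watchdog_check(alarm: AlarmProcessor, max_silence_s: float = 30.0) -> bool:
    """Run from a separate process or host: if the alarm subsystem has gone
    quiet, tell operators through an independent channel that they are blind."""
    silent_for = time.monotonic() - alarm.last_heartbeat
    if silent_for > max_silence_s:
        print(f"WATCHDOG: alarm processor silent for {silent_for:.0f}s - "
              "fall back to manual monitoring and call support")
        return False
    return True

# Usage: the EMS calls alarm.heartbeat() each cycle, and a scheduler runs
# watchdog_check(alarm) every few seconds.
```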
With the FirstEnergy control room blind to events, operators failed to take actions that might have kept the local failures from cascading across the grid.

"We spent a considerable amount of time analyzing that, trying to


understand if it was a software problem, or if - like some had speculated -
something different had happened."
A backup server kicked in, but it also failed. By the time FirstEnergy
operators figured out what was going on and restarted the necessary
systems, hours had passed, and it was too late.
The company did everything it could, says Unum. "We test exhaustively,
we test with third parties, and we had in excess of three million online
operational hours in which nothing had ever exercised that bug," says
Unum. "I'm not sure that more testing would have revealed that.
Unfortunately, that's kind of the nature of software... you may never find the
problem. I don't think that's unique to control systems or any particular
vendor software."
"Code is so complex, that there are always going to be some things that,
no matter how hard you test, you're not going to catch," he says. "If we see
a system that's behaving abnormally well, we should probably be
suspicious, rather than assuming that it's behaving abnormally well."
But Peter Neumann, principal scientist at SRI International and moderator
of the Risks Digest, says that the root problem is that makers of critical
systems aren't availing themselves of a large body of academic research
into how to make software bulletproof.
"We keep having these things happen again and again, and we're not
learning from our mistakes," says Neumann. "There are many possible
problems that can cause massive failures, but they require a certain
discipline in the development of software, and in its operation and
administration, that we don't seem to find. ... If you go way back to the
AT&T collapse of 1990, that was a little software flaw that propagated
across the AT&T network. If you go ten years before that you have the
ARPAnet collapse."
"Whether it's a race condition, or a bug in a recovery process as in the
AT&T case, there's this idea that you can build things that need to be totally
robust without really thinking through the design and implementation and all
of the things that might go wrong," Neumann says.
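Neumann's mention of race conditions is worth unpacking, because it also echoes Unum's point about bugs that testing never exercises. The toy example below shows a classic check-then-act race; the sleep() call artificially widens a timing window that, in a real system, might open only once in millions of operating hours. It is a generic illustration, not the actual FirstEnergy defect.

```python
import threading
import time

free_slots = 1                  # shared state: room left in a pending-alarm buffer
lock = threading.Lock()

def claim_slot_unsafely():
    global free_slots
    if free_slots > 0:          # check...
        time.sleep(0.01)        # ...the other thread runs here, so the check goes stale
        free_slots -= 1         # ...then act: both threads claim the same last slot

t1 = threading.Thread(target=claim_slot_unsafely)
t2 = threading.Thread(target=claim_slot_unsafely)
t1.start(); t2.start(); t1.join(); t2.join()
print("free_slots =", free_slots)   # usually -1: the 'never below zero' invariant breaks

def claim_slot_safely():
    """The fix: make the check and the update a single atomic step."""
    global free_slots
    with lock:
        if free_slots > 0:
            free_slots -= 1
```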
Among the recommendations, the task force says cyber security standards
established by the North American Electric Reliability Council, the industry
group responsible for keeping electricity flowing, should be vigorously
enforced. Joe Weiss, a control system cyber security consultant at KEMA,
and one of the authors of the NERC standards, says that's a good start.
""The NERC cyber security standards are very basic standards," says
Weiss. "They provide a minimum basis for due diligence."
But so far, it seems software failure has had more of an effect on the power
grid than computer intrusion. Nevertheless, both Weiss and EPRI's Kropp
believe that the final report is right to place more emphasis on
cybersecurity than software reliability. "You don't try to look for something
that's going to occur very, very, very infrequently," says Weiss. "Essentially,
a blackout like this was something like that. There are other issues that are
higher probability that need to be addressed."
The causes of the blackout described here did not result from inanimate
events, such as “the alarm processor failed” or “a tree contacted a power
line.” Rather, the causes of the blackout were rooted in deficiencies
resulting from decisions, actions, and the failure to act of the individuals,
groups, and organizations involved. These causes were preventable prior
to August 14 and are correctable. Simply put — blaming a tree for
contacting a line serves no useful purpose. The responsibility lies with the
organizations and persons charged with establishing and implementing an
effective vegetation management program to maintain safe clearances
between vegetation and energized conductors.
