I recently bought a top-of-the-line clock radio to replace the aging one I’d
had for years. When I unpacked it, I began looking for the slot to insert a
battery to keep the clock running during a power outage. To my surprise,
there was no such slot. Looking over the instruction manual that came with
the clock, I learned that my new clock radio had a ride-through memory that
would last up to one minute in the “unlikely” event of a power
outage.

This unexpected find raised a number of questions in my inquiring engineering
mind. Why only one minute? Why no battery that could keep my clock accurate
for hours? Could I rely on this design to ensure I wouldn’t miss important
business meetings?

After a little thought, I remembered the work I had done years earlier on outage
durations. Indeed, the vast majority of outages to the end customer are momentary.
Perhaps the designers of the radio got it right. They knew the data and designed
to it. Or perhaps it was just cheaper to put in a one-minute circuit instead
of a battery slot.

The experience of examining my new clock radio highlighted a point I have frequently
told people about our electric system. A reliable electricity supply has become
an expectation. We expect our clock radio will wake us because we expect that
the electricity that enables it will be uninterrupted.

Indeed, the electrical system of generation, transmission, and distribution
in the United States is actually quite reliable. Built
up over the decades since early in the last century, it is one of the enduring
elements of the New Deal. It is the foundation for commerce and everyday living.
Economists look at energy usage as a proxy for the degree of development of
a society. We expect energy reliability issues in Third World countries, but
not in the US.

This reality makes the blackout of August 2003 all the more remarkable and startling.
Within hours of the event, politicos and pundits were on TV speculating about
the cause. The blackout was the result of an aging and failing grid. Deregulation
was at fault. A computer worm had invaded the control computers. It was an
al Qaeda cyber attack. It would take billions of dollars of investment to fix.
The grid needed to be rebuilt.

We now have the luxury of time and good analysis in regard to the blackout.
The interim report on causes of the blackout by the US-Canada Power System
Outage Task Force outlined three key causes:

  • Inadequate situational awareness;
  • Inadequate reliability coordination diagnostic support; and
  • Inadequate tree trimming.

Not to over-simplify, but the report says trees downed three key transmission
paths. Some of the computer systems designed to detect this were also
down. Thus the operators responsible for taking remedial action were in
the dark (so to speak); they lacked the processes and information needed to
isolate the problem before a cascade of outages required the failsafe process
of system protection (i.e., blackout) to take over.

So which grid failed? Was it the electric grid? Was it the human grid of
decision and control? Or was it the computer grid controlling the electrical
grid? Of course, it was all three. And all three require rebuilding
if additional events of this nature are to be prevented. The good news is
that the actions required to rebuild all three are known.

The Transmission Infrastructure

The facts in regard to the need to invest in rebuilding the electric
transmission grid are incontestable. The infrastructure of transmission
has endured a dramatic level of under-investment for years, if not decades.
Energy demand continues to grow, and the introduction of competitive electric
markets has exacerbated the situation by placing demands on the transmission
network never envisioned while it was being designed and built. This fact is
not only known, it has been known for some time and is well documented.
Consider the following finding in the 1998 (yes, 1998) NERC Reliability
Assessment:

“Very few bulk transmission additions planned. Only 6,588 miles
of new transmission (230 kV and above) are planned throughout North America
over the next 10 years. This is significantly lower than the additions that
had been
planned five years ago. The majority of the proposed transmission projects
are for local system support. As the demand on the transmission system continues
to rise, the ability to deliver remote resources to load centers will deteriorate.
New transmission limitations will appear in different and unexpected locations
as the generation patterns shift to accommodate market-driven energy transactions
and new independent generators. Delivering energy to deficient areas may become
more difficult.”

The lack of investment in the transmission infrastructure has as its root cause
the uncertainty in how the cost of such investment will be recovered. The process
of deregulation has caused such uncertainty. It will take tens of billions
of dollars of investment to replace and expand the network required. That much
money does not come easily if investors don’t see how they are going
to get their investment back.

And we know what to do. Every ISO/RTO has a grid planning function, and all
have completed the analysis of the issues in their jurisdiction and determined
what should be done. The issue is who will make this investment and how it
will be paid for.

The Human Grid

The term “grid” is commonly understood to mean the thousands of
miles of transmission infrastructure that connects generation with load. But
an equally important grid is the system of organizational and human processes
that coordinate and control the transmission grid.

Prior to deregulation, the process of control of transmission was undertaken
by utilities via their control centers. The process and protocols of this control
were governed by the 10 coordinating councils under the auspices of NERC. Extremely
detailed and well-understood sets of procedures and standards were developed
and monitored by NERC.

Under deregulation, in the jurisdictions where ISO/RTOs were implemented, the
ISOs took over the role of grid control and security coordination. These
ISOs assumed the coordination role under the procedures of NERC. Deregulation
also added to the role of grid control the function of operating competitive
wholesale energy markets. ISO/RTOs implemented forward energy markets, real-time
balancing markets, and other support markets for ancillary services required
to provide appropriate reserve to the grid operators.

The ISO/RTO role in grid management evolved from different points depending
on the historical operating construct of the region. In some cases, the ISO
simply took over an already running tight energy pool operation. In other cases,
ISOs were established through the consolidation of multiple control areas into
one control area. In each case, the operational elements of the grid were expanded
or more tightly formalized under the ISO structure. But the basic process of
control area control changed little, although it was performed over ever-larger
geographical and operational areas.

The market side of the ISO operation was new. Although there was a general
framework that was consistent among the ISOs, each ISO/RTO built a market structure
that was different from its neighbors'. In some cases the fundamental underpinnings
of the market design were radically different. This fact has led to a number
of problems that are primarily economic in nature. The so-called “seams issue” is
currently an important concern among various ISO/RTOs and FERC, which regulates
the various ISO markets. Indeed, FERC’s Standard Market Design (SMD)
was an attempt to standardize market design across ISO/RTOs in an effort to
address the seams issue, among other things. Of course, SMD has been stuck
in a political quagmire for years now, and the likelihood of the full vision
of SMD being implemented soon is remote.

A number of ISO/RTOs have been working relatively collaboratively in the recent
past to solve the “seams issue” with technology. The notion in
play was to standardize and share market information in real time across ISO/RTO
market boundaries. Some functions envisioned were as aggressive as creating
a common market portal that could tie multiple divergent market transactions
across multiple market constructs into a single transaction (see Figure 1).

Interestingly, prior to the blackout, the idea to share real-time operational
information across the ISO/RTO seams also was introduced. The blackout gave
rise to a heightened interest in this notion as a way to more effectively counter
a number of the issues that were seen as contributing to the scope of the blackout.
Prior to this, the seams issue had been for the most part viewed as only a
market issue. The operational implications of market transactions were simply
handled through the interchange scheduling process, completed the day before
or in real time, as they had been for decades.

However, the reality is that today’s markets and operations are much
more dynamically linked than the operating processes of ISOs reflect. Just
as the seams issue is a market issue, it is also an operations issue. And just
as market participants need cross-ISO/RTO market information, the ISO operators
need cross-ISO/RTO real-time operations information to effectively run the
grid.

This point was driven home to me during a recent meeting at an ISO. The topic
of discussion was the development of a Rosetta Stone to translate terms used
in operations into terms used in markets. Such a translation was needed because
the physical model of the network did not correspond to the commercial model
of the network. Buses and other elements that were singular in the real world
were consolidated in the commercial world and, interestingly, vice versa.
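
For readers who think in code, the kind of translation discussed at that
meeting can be pictured as a two-way, many-to-many lookup. The sketch below
is purely illustrative: the class name, the method names, and the bus and
node identifiers are hypothetical and do not come from any ISO's actual
system. It only shows how several physical buses might roll up into one
commercial node while one physical bus might also appear as several
commercial nodes.

```python
from collections import defaultdict


class ModelTranslator:
    """Hypothetical two-way lookup between physical and commercial models."""

    def __init__(self):
        # Each physical bus may map to several commercial nodes, and
        # each commercial node may map to several physical buses.
        self._bus_to_nodes = defaultdict(set)
        self._node_to_buses = defaultdict(set)

    def link(self, bus, node):
        """Record that a physical bus corresponds to a commercial node."""
        self._bus_to_nodes[bus].add(node)
        self._node_to_buses[node].add(bus)

    def commercial_nodes(self, bus):
        """Return the commercial nodes for a physical bus (sorted)."""
        return sorted(self._bus_to_nodes.get(bus, ()))

    def physical_buses(self, node):
        """Return the physical buses behind a commercial node (sorted)."""
        return sorted(self._node_to_buses.get(node, ()))


# Illustrative identifiers only; these are invented for the example.
translator = ModelTranslator()
translator.link("BUS_101", "HUB_A")   # two buses consolidated
translator.link("BUS_102", "HUB_A")   # into one commercial hub
translator.link("BUS_200", "NODE_X")  # one bus split into
translator.link("BUS_200", "NODE_Y")  # two commercial nodes
```

An operator-facing tool could ask `translator.physical_buses("HUB_A")` to see
which real-world equipment sits behind a market location, while a market
system could ask `translator.commercial_nodes("BUS_200")` for the reverse.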

Because operations are dynamically linked to the market, the operations
view of the big picture in real time must be refined
and expanded significantly if the grid is going to work effectively and reliably
in the future.

The Computer Grid

The third grid that needs to be rebuilt is the grid of computers and telemetry
that controls today’s electrical system. During the blackout, a number
of these systems either failed or were unavailable.

As ISO/RTOs have been established, they have relied to a large extent on the embedded
infrastructure of control. Although nearly all have implemented new systems
for the ISO/RTO itself, the technology in the field that performs the
monitoring is largely what was in place before deregulation.

This inherited control infrastructure suffers from the same issues as the transmission
infrastructure it controls. It was designed to monitor and control an infrastructure
not designed with deregulation in mind. With the lack of investment in transmission
infrastructure already outlined, a parallel under-investment in control systems
has occurred. And computer control technology has been revolutionized
many times over in the decade or more since these transmission systems
were put into service.

With much of the computer control infrastructure aging and technologically
obsolete by any current standard, meeting the demand for ever-more real-time
information for both market participants and operators has become a significant
problem. It is clear that rebuilding this computer control grid is as critical
as rebuilding the transmission grid.

‘Robust, Yet Fragile’

In the interim report, the authors point to a fascinating body of work by
professors Carlson and Doyle. In their paper titled “Complexity and Robustness,” they
outline the idea of complex systems being “robust, yet fragile.” Carlson
and Doyle go on to explain that complex systems are “robust to what
is common or anticipated but potentially fragile to what is rare or unanticipated
and also to flaws in design, manufacturing, or maintenance.”

This notion of complexity gives us excellent context to understand how a
system that was designed and built to be so reliable to support “electrification” of
the United States can be so fragile in the new dynamics of deregulation. It
is the fragile nature of the complex system we see emerging because the human
and computer control elements of the grid, in addition to the grid itself,
did not anticipate the new dynamics of today’s energy markets. Thus
it is clear that rebuilding the grid is not simply a matter of building new
and better transmission. The grids of human and computer systems required to
support the reliability of the transmission grid are equally important to the
grid rebuilding process.