College of Engineering (CoE)
Computer Organization (0102240)
Group Project
CLO 3, 5, 6
First Semester (2025/2026)
Due Date: 24/11/2025
 # | Student ID | Student Name
 1 |            |
 2 |            |
 3 |            |
 4 |            |
Instructions:
1. Each group can have 2 to 4 students.
2. The report should be 5–10 pages only, in Times New Roman font, size 12.
3. Provide screenshots and figures in your report.
4. Use references and citations in your report. Use APA or IEEE standard citation formats.
5. The report will be checked for plagiarism using the appropriate software.
MARIE (‘Machine Architecture that is Really Intuitive and Easy’) is a machine architecture and assembly language from The Essentials of Computer Organization and Architecture. MarieSim is an environment within which you can write your own programs and watch how they would run on a real “von Neumann architecture” computer system (Null & Lobur, 2003). MarieSim is a simple simulator for the MANO machine architecture, consisting of memory (to store programs and data) and a CPU (consisting of an ALU and several registers). It has all the functional components necessary to be a real working computer.
In this project, each group is required to write a comprehensive report about MarieSim that should include:
1. Introduction
2. Overview of the MARIE Simulator
3. Key Features and Components
4. The MANO Computer and MARIE Simulator
5. Data and Instructions
6. Installation
7. Case study: implement and discuss the following programs:
   a. A program that adds the contents of an array of 100 numbers.
   b. A program that performs multiplication of 8-bit numbers.
   c. A program that implements OR and XOR logical operations.
   d. A program with a subroutine that moves a block of data.
8. Conclusion
“When you wish to produce a result by means of an instrument, do not
allow yourself to complicate it.”
—Leonardo da Vinci

CHAPTER 4
MARIE: An Introduction to a Simple Computer

4.1 INTRODUCTION
Designing a computer nowadays is a job for a computer engineer with plenty of
training. It is impossible in an introductory textbook such as this (and in an
introductory course in computer organization and architecture) to present everything necessary to design and build a working computer such as those we can buy
today. However, in this chapter, we first look at a very simple computer called
MARIE: a Machine Architecture that is Really Intuitive and Easy. We then provide brief overviews of Intel and MIPS machines, two popular architectures
reflecting the CISC and RISC design philosophies. The objective of this chapter
is to give you an understanding of how a computer functions. We have, therefore,
kept the architecture as uncomplicated as possible, following the advice in the
opening quote by Leonardo da Vinci.
4.2 CPU BASICS AND ORGANIZATION
From our studies in Chapter 2 (data representation) we know that a computer
must manipulate binary-coded data. We also know from Chapter 3 that memory is
used to store both data and program instructions (also in binary). Somehow, the
program must be executed and the data must be processed correctly. The central
processing unit (CPU) is responsible for fetching program instructions, decoding each instruction that is fetched, and performing the indicated sequence of
operations on the correct data. To understand how computers work, you must first
become familiar with their various components and the interaction among these
components. To introduce the simple architecture in the next section, we first
examine, in general, the microarchitecture that exists at the control level of modern computers.
All computers have a CPU that can be divided into two pieces. The first is the
datapath, which is a network of storage units (registers) and arithmetic and logic
units (for performing various operations on data) connected by buses (capable of
moving data from place to place) where the timing is controlled by clocks. The
second CPU component is the control unit, a module responsible for sequencing
operations and making sure the correct data are where they need to be at the correct time. Together, these components perform the tasks of the CPU: fetching
instructions, decoding them, and finally performing the indicated sequence of
operations. The performance of a machine is directly affected by the design of the
datapath and the control unit. Therefore, we cover these components of the CPU
in detail in the following sections.
4.2.1 The Registers
Registers are used in computer systems as places to store a wide variety of data,
such as addresses, program counters, or data necessary for program execution.
Put simply, a register is a hardware device that stores binary data. Registers are
located on the processor so information can be accessed very quickly. We saw in
Chapter 3 that D flip-flops can be used to implement registers. One D flip-flop is
equivalent to a 1-bit register, so a collection of D flip-flops is necessary to store
multi-bit values. For example, to build a 16-bit register, we need to connect 16 D
flip-flops together. We saw in our binary counter figure from Chapter 3 that these
collections of flip-flops must be clocked to work in unison. At each pulse of the
clock, input enters the register and cannot be changed (and thus is stored) until
the clock pulses again.
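To picture this clocked behavior, here is a small illustrative sketch (not from the text; the class name Register16 is invented): the register's input can change freely, but the stored value only updates when the clock ticks, just as a bank of 16 D flip-flops sharing one clock would.

```python
# A minimal sketch of a clocked 16-bit register: the input D can change at
# any time, but the stored value Q only updates on a clock tick, mimicking
# sixteen D flip-flops driven by the same clock.

class Register16:
    def __init__(self):
        self.d = 0      # input lines (can change freely between ticks)
        self.q = 0      # stored value (visible at the outputs)

    def set_input(self, value):
        self.d = value & 0xFFFF     # keep only 16 bits

    def tick(self):
        self.q = self.d             # latch the input on the clock pulse


reg = Register16()
reg.set_input(0x1234)
print(hex(reg.q))   # 0x0    -- nothing is stored until the clock ticks
reg.tick()
print(hex(reg.q))   # 0x1234 -- value latched on the tick
```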
Data processing on a computer is usually done on fixed-size binary words
stored in registers. Therefore, most computers have registers of a certain size.
Common sizes include 16, 32, and 64 bits. The number of registers in a machine
varies from architecture to architecture, but is typically a power of 2, with 16 and
32 being most common. Registers contain data, addresses, or control information.
Some registers are specified as “special purpose” and may contain only data, only
addresses, or only control information. Other registers are more generic and may
hold data, addresses, and control information at various times.
Information is written to registers, read from registers, and transferred from
register to register. Registers are not addressed in the same way memory is
addressed (recall that each memory word has a unique binary address beginning
with location 0). Registers are addressed and manipulated by the control unit itself.
In modern computer systems, there are many types of specialized registers:
registers to store information, registers to shift values, registers to compare values, and registers that count. There are “scratchpad” registers that store temporary
values, index registers to control program looping, stack pointer registers to manage stacks of information for processes, status (or flag) registers to hold the status
or mode of operation (such as overflow, carry, or zero conditions), and general
purpose registers that are the registers available to the programmer. Most computers have register sets, and each set is used in a specific way. For example, the
Pentium architecture has a data register set and an address register set. Certain
architectures have very large sets of registers that can be used in quite novel ways
to speed up execution of instructions. (We discuss this topic when we cover
advanced architectures in Chapter 9.)
4.2.2 The ALU
The arithmetic logic unit (ALU) carries out the logic operations (such as comparisons) and arithmetic operations (such as add or multiply) required during the
program execution. You saw an example of a simple ALU in Chapter 3. Generally
an ALU has two data inputs and one data output. Operations performed in the
ALU often affect bits in the status register (bits are set to indicate actions such
as whether an overflow has occurred). The ALU knows which operations to perform because it is controlled by signals from the control unit.
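As a rough illustration of a two-input, one-output ALU that sets status bits, consider the sketch below. It is a simplified model with invented operation names and flags, not MARIE's actual ALU or control signals.

```python
# Minimal sketch (not MARIE's actual datapath) of a two-input, one-output ALU
# working on 16-bit two's complement values and reporting status bits.

def to_signed16(pattern):
    """Interpret a 16-bit pattern as a two's complement integer."""
    pattern &= 0xFFFF
    return pattern - 0x10000 if pattern & 0x8000 else pattern

def alu(op, a, b):
    """Return (16-bit result pattern, status flags) for two 16-bit inputs."""
    sa, sb = to_signed16(a), to_signed16(b)
    if op == "add":
        true_result = sa + sb
    elif op == "subt":
        true_result = sa - sb
    else:
        raise ValueError(f"operation not modeled in this sketch: {op}")
    result = true_result & 0xFFFF                        # wrap to 16 bits
    flags = {
        "zero": result == 0,
        "negative": bool(result & 0x8000),
        "overflow": not -32768 <= true_result <= 32767,  # true value did not fit
    }
    return result, flags

print(alu("add", 0x7FFF, 1))   # 32767 + 1 overflows 16-bit two's complement
```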
4.2.3 The Control Unit
The control unit is the “policeman” or “traffic manager” of the CPU. It monitors
the execution of all instructions and the transfer of all information. The control
unit extracts instructions from memory, decodes these instructions, making sure
data are in the right place at the right time, tells the ALU which registers to use,
services interrupts, and turns on the correct circuitry in the ALU for the execution
of the desired operation. The control unit uses a program counter register to find
the next instruction for execution and a status register to keep track of overflows,
carries, borrows, and the like. Section 4.13 covers the control unit in more detail.
4.3 THE BUS
The CPU communicates with the other components via a bus. A bus is a set of
wires that acts as a shared but common datapath to connect multiple subsystems
within the system. It consists of multiple lines, allowing the parallel movement of
bits. Buses are low cost but very versatile, and they make it easy to connect new
devices to each other and to the system. At any one time, only one device (be it a
register, the ALU, memory, or some other component) may use the bus. However,
this sharing often results in a communications bottleneck. The speed of the bus is
affected by its length as well as by the number of devices sharing it. Quite often,
devices are divided into master and slave categories; a master device is one that
initiates actions and a slave is one that responds to requests by a master.
A bus can be point-to-point, connecting two specific components (as seen in
Figure 4.1a) or it can be a common pathway that connects a number of devices,
requiring these devices to share the bus (referred to as a multipoint bus and shown in Figure 4.1b).

FIGURE 4.1 a) Point-to-Point Buses; b) Multipoint Buses
Because of this sharing, the bus protocol (set of usage rules) is very important. Figure 4.2 shows a typical bus consisting of data lines, address lines, control
lines, and power lines. Often the lines of a bus dedicated to moving data are
called the data bus. These data lines contain the actual information that must be
moved from one location to another. Control lines indicate which device has permission to use the bus and for what purpose (reading or writing from memory or
from an input/output [I/O] device, for example). Control lines also transfer
acknowledgments for bus requests, interrupts, and clock synchronization signals.
Address lines indicate the location (e.g., in memory) that the data should be
either read from or written to. The power lines provide the electrical power necessary. Typical bus transactions include sending an address (for a read or write),
transferring data from memory to a register (a memory read), and transferring
data to the memory from a register (a memory write). In addition, buses are used
for I/O reads and writes from peripheral devices. Each type of transfer occurs
within a bus cycle, the time between two ticks of the bus clock.
Because of the different types of information buses transport and the various
devices that use the buses, buses themselves have been divided into different
types. Processor-memory buses are short, high-speed buses that are closely
matched to the memory system on the machine to maximize the bandwidth
(transfer of data) and are usually design specific. I/O buses are typically longer
than processor-memory buses and allow for many types of devices with varying
bandwidths. These buses are compatible with many different architectures. A
backplane bus (Figure 4.3) is actually built into the chassis of the machine and
connects the processor, the I/O devices, and the memory (so all devices share one
bus). Many computers have a hierarchy of buses, so it is not uncommon to have
two buses (e.g., a processor-memory bus and an I/O bus) or more in the same system. High-performance systems often use all three types of buses.

FIGURE 4.2 The Components of a Typical Bus (power, address, data, and control lines connecting the CPU, main memory, and the I/O subsystem)

FIGURE 4.3 Backplane Bus (interface cards sharing one system bus)
Personal computers have their own terminology when it comes to buses. They
have an internal bus (called the system bus) that connects the CPU, memory, and all
other internal components. External buses (sometimes referred to as expansion
buses) connect external devices, peripherals, expansion slots, and I/O ports to the
rest of the computer. Most PCs also have local buses, data buses that connect a
peripheral device directly to the CPU. These high-speed buses can be used to connect only a limited number of similar devices. Expansion buses are slower but allow
for more generic connectivity. Chapter 7 deals with these topics in great detail.
Buses are physically little more than bunches of wires, but they have specific
standards for connectors, timing, and signaling specifications and exact protocols
for use. Synchronous buses are clocked, and things happen only at the clock
ticks (a sequence of events is controlled by the clock). Every device is synchronized by the rate at which the clock ticks, or the clock rate. The bus cycle time
mentioned is the reciprocal of the bus clock rate. For example, if the bus clock
rate is 133 MHz, then the length of the bus cycle is 1/133,000,000 or
7.52 nanoseconds (ns). Because the clock controls the transactions, any clock
skew (drift in the clock) has the potential to cause problems, implying that the
bus must be kept as short as possible so the clock drift cannot get overly large. In
addition, the bus cycle time must not be shorter than the length of time it takes
information to traverse the bus. The length of the bus, therefore, imposes restrictions on both the bus clock rate and the bus cycle time.
With asynchronous buses, control lines coordinate the operations, and a complex handshaking protocol must be used to enforce timing. To read a word of data
from memory, for example, the protocol would require steps similar to the following:
1. ReqREAD: This bus control line is activated and the data memory address is put
on the appropriate bus lines at the same time.
2. ReadyDATA: This control line is asserted when the memory system has put the
required data on the data lines for the bus.
3. ACK: This control line is used to indicate that the ReqREAD or the ReadyDATA
has been acknowledged.
Using a protocol instead of the clock to coordinate transactions means that
asynchronous buses scale better with technology and can support a wider variety
of devices.
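The three handshake steps can be pictured as a short exchange between the requester and the memory module. The sketch below is only a toy sequence model with invented function names and print statements standing in for control lines; it is not how any real bus is implemented.

```python
# Toy model of the asynchronous read handshake described above:
# ReqREAD with the address, ReadyDATA from memory, then ACK.

memory = {0x00D: 42}            # a tiny pretend memory

def memory_module(address):
    """Memory side: on ReqREAD it asserts ReadyDATA and drives the data lines."""
    data = memory[address]
    print(f"memory: ReadyDATA asserted, data lines = {data}")
    return data

def cpu_read(address):
    print(f"cpu:    ReqREAD asserted, address lines = {address:#05x}")  # step 1
    data = memory_module(address)                                       # step 2
    print("cpu:    ACK asserted, handshake complete")                   # step 3
    return data

value = cpu_read(0x00D)
print("value read:", value)
```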
To use a bus, a device must reserve it, because only one device can use the bus
at a time. As mentioned, bus masters are devices that are allowed to initiate transfer of information (control bus), and bus slaves are modules that are activated by a
master and respond to requests to read and write data (so only masters can reserve
the bus). Both follow a communications protocol to use the bus, working within
very specific timing requirements. In a very simple system (such as the one we
present in the next section), the processor is the only device allowed to become a
bus master. This is good in terms of avoiding chaos, but bad because the processor
now is involved in every transaction that uses the bus.
In systems with more than one master device, bus arbitration is required.
Bus arbitration schemes must provide priority to certain master devices and, at
the same time, make sure lower priority devices are not starved out. Bus arbitration schemes fall into four categories:
1. Daisy chain arbitration: This scheme uses a “grant bus” control line that is
passed down the bus from the highest priority device to the lowest priority device.
(Fairness is not ensured, and it is possible that low-priority devices are “starved
out” and never allowed to use the bus.) This scheme is simple but not fair.
2. Centralized parallel arbitration: Each device has a request control line to the
bus and a centralized arbiter selects who gets the bus. Bottlenecks can result
using this type of arbitration.
3. Distributed arbitration using self-selection: This scheme is similar to centralized
arbitration but instead of a central authority selecting who gets the bus, the devices
themselves determine who has highest priority and who should get the bus.
4. Distributed arbitration using collision detection: Each device is allowed to
make a request for the bus. If the bus detects any collisions (multiple simultaneous requests), the device must make another request. (Ethernet uses this type
of arbitration.)
Chapter 7 contains more detailed information on buses and their protocols.
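To make the first two schemes concrete, the sketch below models the effect both a daisy chain and a simple centralized arbiter produce: a fixed-priority grant. The function name and device numbering are invented for illustration.

```python
# Illustrative fixed-priority arbitration (the effect produced by a daisy
# chain or by a simple centralized arbiter): request lines are ordered from
# highest to lowest priority, and the highest-priority requester wins the
# bus. A low-priority device can be starved if higher-priority devices keep
# requesting.

def grant_bus(requests):
    """Return the index of the granted device, or None if the bus is idle."""
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device
    return None

# Devices 0..3, highest priority first; devices 1 and 3 both want the bus.
print(grant_bus([False, True, False, True]))    # -> 1 (device 3 must wait)
print(grant_bus([False, False, False, False]))  # -> None (no requests)
```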
4.4 CLOCKS
Every computer contains an internal clock that regulates how quickly instructions
can be executed. The clock also synchronizes all of the components in the system. As the clock ticks, it sets the pace for everything that happens in the system,
much like a metronome or a symphony conductor. The CPU uses this clock to
regulate its progress, checking the otherwise unpredictable speed of the digital
logic gates. The CPU requires a fixed number of clock ticks to execute each
instruction. Therefore, instruction performance is often measured in clock
cycles—the time between clock ticks—instead of seconds. The clock frequency
(sometimes called the clock rate or clock speed) is measured in megahertz (MHz)
or gigahertz (GHz), as we saw in Chapter 1. The clock cycle time (or clock
period) is simply the reciprocal of the clock frequency. For example, an 800 MHz
machine has a clock cycle time of 1/800,000,000 or 1.25ns. If a machine has a
2ns cycle time, then it is a 500 MHz machine.
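The reciprocal relationship between clock rate and cycle time can be checked with a couple of lines of arithmetic (an illustrative sketch, not from the text):

```python
# Clock period is the reciprocal of clock frequency (and vice versa), as in
# the 800 MHz and 2 ns examples above.

def period_ns(frequency_hz):
    return 1.0 / frequency_hz * 1e9      # seconds -> nanoseconds

def frequency_mhz(period_s):
    return 1.0 / period_s / 1e6          # Hz -> MHz

print(period_ns(800e6))        # 1.25   (an 800 MHz machine -> 1.25 ns cycle)
print(frequency_mhz(2e-9))     # 500.0  (a 2 ns cycle time -> 500 MHz machine)
```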
Most machines are synchronous: there is a master clock signal, which ticks
(changing from 0 to 1 to 0 and so on) at regular intervals. Registers must wait for the
clock to tick before new data can be loaded. It seems reasonable to assume that if we
speed up the clock, the machine will run faster. However, there are limits on how
short we can make the clock cycles. When the clock ticks and new data are loaded
into the registers, the register outputs are likely to change. These changed output
values must propagate through all the circuits in the machine until they reach the
input of the next set of registers, where they are stored. The clock cycle must be long
enough to allow these changes to reach the next set of registers. If the clock cycle is
too short, we could end up with some values not reaching the registers. This would
result in an inconsistent state in our machine, which is definitely something we must
avoid. Therefore, the minimum clock cycle time must be at least as great as the
maximum propagation delay of the circuit, from each set of register outputs to register inputs. What if we “shorten” the distance between registers to shorten the propagation delay? We could do this by adding registers between the output registers and
the corresponding input registers. But recall that registers cannot change values until
the clock ticks, so we have, in effect, increased the number of clock cycles. For
example, an instruction that would require two clock cycles might now require three
or four (or more, depending on where we locate the additional registers).
Most machine instructions require one or two clock cycles, but some can take
35 or more. We present the following formula to relate seconds to cycles:
CPU time = seconds/program = (instructions/program) × (average cycles/instruction) × (seconds/cycle)
It is important to note that the architecture of a machine has a large effect on its
performance. Two machines with the same clock speed do not necessarily execute instructions in the same number of cycles. For example, a multiply operation
on an older Intel 286 machine required 20 clock cycles, but on a new Pentium, a
multiply operation can be done in 1 clock cycle, which implies the newer
machine would be 20 times faster than the 286, even if they both had the same
internal system clock. In general, multiplication requires more time than addition,
floating-point operations require more cycles than integer ones, and accessing
memory takes longer than accessing registers.
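Plugging illustrative numbers into this formula shows the effect described above: two machines with the same clock but different cycles per instruction differ in CPU time by exactly the ratio of their cycle counts. The values below are made up for the example.

```python
# CPU time = (instructions / program) x (average cycles / instruction)
#            x (seconds / cycle).  Illustrative values only.

def cpu_time(instruction_count, avg_cycles_per_instruction, clock_hz):
    return instruction_count * avg_cycles_per_instruction / clock_hz

same_clock = 100e6                        # both machines run at 100 MHz
program = 1_000_000                       # one million instructions

slow = cpu_time(program, 20, same_clock)  # e.g., 20 cycles per operation
fast = cpu_time(program, 1, same_clock)   # e.g., 1 cycle per operation
print(slow, fast, slow / fast)            # 0.2 s, 0.01 s, 20x faster
```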
Generally, when we mention the clock, we are referring to the system clock,
or the master clock that regulates the CPU and other components. However, certain buses also have their own clocks. Bus clocks are usually slower than CPU
clocks, causing bottleneck problems.
System components have defined performance bounds, indicating the maximum time required for the components to perform their functions. Manufacturers
guarantee their components will run within these bounds in the most extreme circumstances. When we connect all of the components together serially, where one
component must complete its task before another can function properly, it is
important to be aware of these performance bounds so we are able to synchronize
the components properly. However, many people push the bounds of certain system components in an attempt to improve system performance. Overclocking is
one method people use to achieve this goal.
Although many components are potential candidates, one of the most popular
components for overclocking is the CPU. The basic idea is to run the CPU at
clock and/or bus speeds above the upper bound specified by the manufacturer.
Although this can increase system performance, one must be careful not to create
system timing faults or, worse yet, overheat the CPU. The system bus can also be
overclocked, which results in overclocking the various components that communicate via the bus. Overclocking the system bus can provide considerable performance improvements, but can also damage the components that use the bus or
cause them to perform unreliably.
4.5 THE INPUT/OUTPUT SUBSYSTEM
Input and output (I/O) devices allow us to communicate with the computer system. I/O is the transfer of data between primary memory and various I/O peripherals. Input devices such as keyboards, mice, card readers, scanners, voice
recognition systems, and touch screens allow us to enter data into the computer.
Output devices such as monitors, printers, plotters, and speakers allow us to get
information from the computer.
These devices are not connected directly to the CPU. Instead, there is an
interface that handles the data transfers. This interface converts the system bus
signals to and from a format that is acceptable to the given device. The CPU communicates to these external devices via I/O registers. This exchange of data is performed in two ways. In memory-mapped I/O, the registers in the interface
appear in the computer’s memory map and there is no real difference between
accessing memory and accessing an I/O device. Clearly, this is advantageous
from the perspective of speed, but it uses up memory space in the system. With
instruction-based I/O, the CPU has specialized instructions that perform the
input and output. Although this does not use memory space, it requires specific
I/O instructions, which implies it can be used only by CPUs that can execute these
specific instructions. Interrupts play a very important part in I/O, because they are
an efficient way to notify the CPU that input or output is available for use. We
explore these I/O methods in detail in Chapter 7.
4.6 MEMORY ORGANIZATION AND ADDRESSING
We saw an example of a rather small memory in Chapter 3. However, we have not
yet discussed in detail how memory is laid out and how it is addressed. It is important that you have a good understanding of these concepts before we continue.
You can envision memory as a matrix of bits. Each row, implemented by a
register, has a length typically equivalent to the addressable unit size of the
machine. Each register (more commonly referred to as a memory location) has a
unique address; memory addresses usually start at zero and progress upward. Figure 4.4 illustrates this concept.
An address is typically represented by an unsigned integer. Recall from
Chapter 2 that four bits are a nibble and eight bits are a byte. Normally, memory
is byte addressable, which means that each individual byte has a unique address.
Some machines may have a word size that is larger than a single byte. For example, a computer might handle 32-bit words (which means it can manipulate 32
bits at a time through various instructions and it uses 32-bit registers), but still
employ a byte-addressable architecture. In this situation, when a word uses multiple bytes, the byte with the lowest address determines the address of the entire
word. It is also possible that a computer might be word addressable, which
means each word (not necessarily each byte) has its own address, but most current machines are byte addressable (even though they have 32-bit or larger
words). A memory address is typically stored in a single machine word.
If all this talk about machines using byte addressing with words of different
sizes has you somewhat confused, the following analogy may help. Memory is
similar to a street full of apartment buildings. Each building (word) has multiple
apartments (bytes), and each apartment has its own address. All of the apartments
are numbered sequentially (addressed), from 0 to the total number of apartments
in the complex minus one. The buildings themselves serve to group the apartments. In computers, words do the same thing. Words are the basic unit of size
used in various instructions. For example, you may read a word from or write a
word to memory, even on a byte-addressable machine.

FIGURE 4.4 a) N 8-Bit Memory Locations; b) M 16-Bit Memory Locations
If an architecture is byte addressable, and the instruction set architecture word is
larger than 1 byte, the issue of alignment must be addressed. For example, if we
wish to read a 32-bit word on a byte-addressable machine, we must make sure that
(1) the word is stored on a natural alignment boundary, and (2) the access starts on
that boundary. This is accomplished, in the case of 32-bit words, by requiring the
address to be a multiple of 4. Some architectures allow certain instructions to perform unaligned accesses, where the desired address does not have to start on a natural boundary.
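For 32-bit words on a byte-addressable machine, the natural-alignment rule amounts to checking that the byte address is a multiple of 4, as in this small sketch (function name invented):

```python
# Natural alignment check on a byte-addressable machine: a 32-bit (4-byte)
# word is aligned when its starting byte address is a multiple of 4.

def is_aligned(byte_address, word_size_bytes=4):
    return byte_address % word_size_bytes == 0

print(is_aligned(8))    # True  -- 8 is a multiple of 4
print(is_aligned(10))   # False -- an access starting here would be unaligned
```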
Memory is built from random access memory (RAM) chips. (We cover memory in detail in Chapter 6.) Memory is often referred to using the notation length × width (L × W). For example, 4M × 8 means the memory is 4M long (it has 4M = 2^2 × 2^20 = 2^22 items) and each item is 8 bits wide (which means that each item is a byte). To address this memory (assuming byte addressing), we need to be able to uniquely identify 2^22 different items, which means we need 2^22 different addresses. Because addresses are unsigned binary numbers, we need to count from 0 to (2^22 − 1) in binary. How many bits does this require? Well, to count
from 0 to 3 in binary (for a total of four items), we need 2 bits. To count from 0 to
7 in binary (for a total of eight items), we need 3 bits. To count from 0 to 15 in
binary (for a total of 16 items), we need 4 bits. Do you see a pattern emerging
here? Can you fill in the missing value for Table 4.1?
The correct answer to the missing table entry is 5 bits. The number of bits
required for our 4M memory is 22. Since most memories are byte addressable, we
say we need N bits to uniquely address each byte. In general, if a computer has 2^N addressable units of memory, it requires N bits to uniquely address each unit.
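The pattern behind Table 4.1 is the base-2 logarithm: 2^N addressable units need N address bits. A short sketch (helper name invented) reproduces the chapter's numbers:

```python
# 2^N addressable units require N address bits: N = log2(units).

def address_bits(addressable_units):
    return addressable_units.bit_length() - 1   # exact for powers of two

print(address_bits(32))           # 5  -- the missing entry in Table 4.1
print(address_bits(4 * 2**20))    # 22 -- the 4M x 8 memory, byte addressable
print(address_bits(2 * 2**20))    # 21 -- the same memory with 16-bit words,
                                  #       word addressable
```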
To better illustrate the difference between words and bytes, suppose the
4M × 8 memory referred to in the previous example were word addressable
instead of byte addressable and each word were 16 bits long. There are 2^22 unique
bytes, which implies there are 2^22 ÷ 2 = 2^21 total words, which would require 21,
not 22, bits per address. Each word would require two bytes, but we express the
address of the entire word by using the lower byte address.
While most memory is byte addressable and 8 bits wide, memory can vary in
width. For example, a 2K × 16 memory holds 2^11 = 2048 16-bit items. This type
of memory is typically used on a word-addressable architecture with 16-bit words.
Main memory is usually larger than one RAM chip. Consequently, these chips
are combined into a single memory of the desired size. For example, suppose you
need to build a 32K × 8 byte-addressable memory and all you have are 2K × 8
RAM chips. You could connect 16 rows of chips together as shown in Figure 4.5.
TABLE 4.1 Calculating the Address Bits Required

Total Items            2    4    8    16   32
Total as a Power of 2  2^1  2^2  2^3  2^4  2^5
Number of Bits         1    2    3    4    ??
FIGURE 4.5 Memory as a Collection of RAM Chips (sixteen 2K × 8 chips, Row 0 through Row 15)
Each chip addresses 2K bytes. Addresses for this memory must have 15 bits
(there are 32K = 2^5 × 2^10 = 2^15 bytes to access). But each chip requires only 11 address
lines (each chip holds only 2^11 bytes). In this situation, a decoder is needed to
decode either the leftmost or rightmost 4 bits of the address to determine which
chip holds the desired data. Once the proper chip has been located, the remaining
11 bits are used to determine the offset on that chip. Whether we use the 4 leftmost or 4 rightmost bits depends on how the memory is interleaved.
A single memory module causes sequentialization of access (only one memory access can be performed at a time). Memory interleaving, which splits memory across multiple memory modules (or banks), can be used to help relieve this.
With low-order interleaving, the low-order bits of the address are used to select
the bank; in high-order interleaving, the high-order bits of the address are used.
Suppose we have a byte-addressable memory consisting of 8 modules of 4
bytes each, for a total of 32 bytes of memory. We need 5 bits to uniquely identify
each byte. Three of these bits are used to determine the module (we have 2^3 = 8
modules), and the remaining two are used to determine the offset within that
module. High-order interleaving, the most intuitive organization, distributes the
addresses so that each module contains consecutive addresses, as we see with the
32 addresses in Figure 4.6. Module 0 contains the data stored at addresses 0, 1, 2,
and 3; module 1 contains the data stored at addresses 4, 5, 6, and 7; and so on.
FIGURE 4.6 High-Order Memory Interleaving (module 0 holds addresses 0–3, module 1 holds 4–7, ..., module 7 holds 28–31)

FIGURE 4.7 Low-Order Memory Interleaving (module 0 holds addresses 0, 8, 16, 24; module 1 holds 1, 9, 17, 25; ...; module 7 holds 7, 15, 23, 31)
Consider address 3, which in binary (using our required 5 bits), is 00011. Highorder interleaving uses the leftmost three bits (000) to determine the module (so
the data at address 3 is in module 0). The remaining two bits (11) tell us that the
desired data is at offset 3 (binary 11 is decimal 3), the last address in module 0.
Low-order interleaved memory places consecutive addresses of memory in
different memory modules. Figure 4.7 shows low-order interleaving on 32
addresses. In this figure, we see that module 0 now contains the data stored at
addresses 0, 8, 16, and 24. To locate address 3 (00011), low-order interleaving
uses the rightmost 3 bits to determine the module (which points us to module 3),
and the remaining two bits, 00, tell us to look at offset zero within that module. If
you check module 3 in Figure 4.7, this is precisely where we find address 3.
With the appropriate buses using low-order interleaving, a read or write using
one module can be started before a read or write using another module actually
completes (reads and writes can be overlapped). For example, if an array of
length 4 is stored in the above example of memory using high-order interleaving
(stored at addresses 0, 1, 2, and 3), we are forced to access each array element
sequentially, as the entire array is stored in one module. If, however, low-order
interleaving is used (and the array is stored in modules 0, 1, 2, and 3 at offset 0 in
each), we can access the array elements in parallel because each array element is
in a different module.
Let’s return to the memory shown in Figure 4.5, a 32K × 8 memory consisting of 16 chips (modules) of size 2K × 8 each. Memory is 32K = 2^5 × 2^10 = 2^15
addressable units (in this case, bytes), which means we need 15 bits for each
address. Each chip holds 2K = 2^11 bytes, so 11 bits are used to determine the offset on the chip. There are 16 = 2^4 chips, so we need 4 bits to determine the chip.
Consider the address 001000000100111. Using high-order interleaving, we use
the 4 leftmost bits to determine the chip, and the remaining 11 as the offset:
15-bit address:  0010 | 00000100111
                 chip (4 bits) | offset on chip (11 bits)
The data at address 001000000100111 is stored on chip 2 (binary 0010) at offset 39
(binary 00000100111). If we use low-order interleaving, the rightmost 4 bits are used to
determine the chip:
15-bit address:  00100000010 | 0111
                 offset on chip (11 bits) | chip (4 bits)
So the data, using low-order interleaving, is stored on chip 7 (binary 0111) at offset 258
(binary 00100000010).
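Both splits of the 15-bit address can be reproduced directly; the sketch below (helper names invented) checks the chip and offset values worked out above for the 32K × 8 memory built from sixteen 2K × 8 chips.

```python
# Split a 15-bit address into chip number and on-chip offset for the
# 32K x 8 memory built from sixteen 2K x 8 chips (4 chip bits, 11 offset bits).

def high_order_split(address):
    return address >> 11, address & 0x7FF        # leftmost 4 bits select the chip

def low_order_split(address):
    return address & 0xF, address >> 4           # rightmost 4 bits select the chip

addr = 0b001000000100111
print(high_order_split(addr))   # (2, 39)  -> chip 2, offset 39
print(low_order_split(addr))    # (7, 258) -> chip 7, offset 258
```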
Although low-order interleaving allows for concurrent access of data stored
sequentially in memory (such as an array or the instructions in a program), high-order interleaving is more intuitive. Therefore, for the remainder of the book, we
assume high-order interleaving is being used.
The memory concepts we have covered are very important and appear in various places in the remaining chapters, in particular in Chapter 6, which discusses
memory in detail. The key concepts to focus on are: (1) Memory addresses are
unsigned binary values (although we often view them as hex values because it is
easier), and (2) The number of items to be addressed determines the number of
bits that occur in the address. Although we could always use more bits for the
address than required, that is seldom done because minimization is an important
concept in computer design.
4.7 INTERRUPTS
We have introduced the basic hardware information required for a solid understanding of computer architecture: the CPU, buses, control unit, registers, clocks,
I/O, and memory. However, there is one more concept we need to cover that deals
with how these components interact with the processor: Interrupts are events
that alter (or interrupt) the normal flow of execution in the system. An interrupt
can be triggered for a variety of reasons, including:
• I/O requests
• Arithmetic errors (e.g., division by 0)
• Arithmetic underflow or overflow
• Hardware malfunction (e.g., memory parity error)
• User-defined break points (such as when debugging a program)
• Page faults (this is covered in detail in Chapter 6)
• Invalid instructions (usually resulting from pointer issues)
• Miscellaneous
The actions performed for each of these types of interrupts (called interrupt
handling) are very different. Telling the CPU that an I/O request has finished is
much different from terminating a program because of division by 0. But these
actions are both handled by interrupts because they require a change in the normal flow of the program’s execution.
An interrupt can be initiated by the user or the system, can be maskable (disabled or ignored) or nonmaskable (a high priority interrupt that cannot be disabled and must be acknowledged), can occur within or between instructions, may
be synchronous (occurs at the same place every time a program is executed) or
asynchronous (occurs unexpectedly), and can result in the program terminating or
continuing execution once the interrupt is handled. Interrupts are covered in more
detail in Section 4.9.2 and in Chapter 7.
Now that we have given a general overview of the components necessary for
a computer system to function, we proceed by introducing a simple, yet functional, architecture to illustrate these concepts.
4.8 MARIE
MARIE, a Machine Architecture that is Really Intuitive and Easy, is a simple
architecture consisting of memory (to store programs and data) and a CPU (consisting of an ALU and several registers). It has all the functional components
necessary to be a real working computer. MARIE will help to illustrate the concepts in this and the preceding three chapters. We describe MARIE’s architecture
in the following sections.
4.8.1 The Architecture
MARIE has the following characteristics:
• Binary, two’s complement
• Stored program, fixed word length
• Word (but not byte) addressable
• 4K words of main memory (this implies 12 bits per address)
• 16-bit data (words have 16 bits)
• 16-bit instructions, 4 for the opcode and 12 for the address
• A 16-bit accumulator (AC)
• A 16-bit instruction register (IR)
• A 16-bit memory buffer register (MBR)
• A 12-bit program counter (PC)
• A 12-bit memory address register (MAR)
• An 8-bit input register
• An 8-bit output register
Figure 4.8 shows the architecture for MARIE.
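The register widths listed above can be collected into a small state container. The sketch below is illustrative only (the class and helper names are invented, and it is not MarieSim's implementation), but the widths and the 4K-word memory follow MARIE's specification.

```python
# Illustrative container for MARIE's registers and memory. The class and
# helper names are invented; the widths (16-bit data, 12-bit addresses,
# 8-bit I/O registers) and the 4K-word memory follow the list above.

class MarieState:
    def __init__(self):
        self.memory = [0] * 4096      # 4K words of 16 bits each
        self.AC = 0                   # 16-bit accumulator
        self.IR = 0                   # 16-bit instruction register
        self.MBR = 0                  # 16-bit memory buffer register
        self.PC = 0                   # 12-bit program counter
        self.MAR = 0                  # 12-bit memory address register
        self.InREG = 0                # 8-bit input register
        self.OutREG = 0               # 8-bit output register

    def write_pc(self, value):
        self.PC = value & 0xFFF       # 12 bits: addresses 0 .. 4095

    def write_ac(self, value):
        self.AC = value & 0xFFFF      # 16 bits: one MARIE word
```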
Before we continue, we need to stress one important point about memory. In
Chapter 3, we presented a simple memory built using D flip-flops. We emphasize
again that each location in memory has a unique address (represented in binary)
and each location can hold a value. These notions of the address versus what is
actually stored at that address tend to be confusing. To help avoid confusion,
visualize a post office. There are post office boxes with various “addresses” or
numbers. Inside the post office box, there is mail. To get the mail, the number of
the post office box must be known. The same is true for data or instructions that
need to be fetched from memory. The contents of any memory address are manipulated by specifying the address of that memory location. We shall see that there
are many different ways to specify this address.
FIGURE 4.8 MARIE’s Architecture (the CPU, consisting of the ALU, AC, IR, PC, MAR, MBR, InREG, OutREG, and control unit, connected to main memory at addresses 0 through 4K–1)

4.8.2 Registers and Buses
Registers are storage locations within the CPU (as illustrated in Figure 4.8). The
ALU portion of the CPU performs all of the processing (arithmetic operations,
logic decisions, etc.). The registers are used for very specific purposes when programs are executing: They hold values for temporary storage, data that is being
manipulated in some way, or results of simple calculations. Many times, registers
are referenced implicitly in an instruction, as we see when we describe the
instruction set for MARIE in Section 4.8.3.
In MARIE, there are seven registers, as follows:
• AC: The accumulator, which holds data values. This is a general-purpose
register and holds data that the CPU needs to process. Most computers today
have multiple general-purpose registers.
• MAR: The memory address register, which holds the memory address of the
data being referenced.
• MBR: The memory buffer register, which holds either the data just read
from memory or the data ready to be written to memory.
• PC: The program counter, which holds the address of the next instruction to
be executed in the program.
• IR: The instruction register, which holds the next instruction to be executed.
• InREG: The input register, which holds data from the input device.
• OutREG: The output register, which holds data for the output device.
The MAR, MBR, PC, and IR hold very specific information and cannot be
used for anything other than their stated purposes. For example, we could not
store an arbitrary data value from memory in the PC. We must use the MBR or
the AC to store this arbitrary value. In addition, there is a status or flag register that holds information indicating various conditions, such as an overflow
in the ALU. However, for clarity, we do not include that register explicitly in
any figures.
MARIE is a very simple computer with a limited register set. Modern
CPUs have multiple general-purpose registers, often called user-visible registers, that perform functions similar to those of the AC. Today’s computers also
have additional registers; for example, some computers have registers that
shift data values and other registers that, if taken as a set, can be treated as a
list of values.
MARIE cannot transfer data or instructions into or out of registers without a
bus. In MARIE, we assume a common bus scheme. Each device connected to
the bus has a number, and before the device can use the bus, it must be set to
that identifying number. We also have some pathways to speed up execution.
We have a communication path between the MAR and memory (the MAR provides the inputs to the address lines for memory so the CPU knows where in
memory to read or write), and a separate path from the MBR to the AC. There is
also a special path from the MBR to the ALU to allow the data in the MBR to be
used in arithmetic operations. Information can also flow from the AC through
the ALU and back into the AC without being put on the common bus. The
advantage gained using these additional pathways is that information can be put
on the common bus in the same clock cycle in which data are put on these other
pathways, allowing these events to take place in parallel. Figure 4.9 shows the
datapath (the path that information follows) in MARIE.

FIGURE 4.9 The Datapath in MARIE (devices on the 16-bit common bus are numbered: 0 Main Memory, 1 MAR, 2 PC, 3 MBR, 4 AC, 5 InREG, 6 OutREG, 7 IR)
4.8.3 Instruction Set Architecture
MARIE has a very simple, yet powerful, instruction set. The instruction set
architecture (ISA) of a machine specifies the instructions that the computer can
perform and the format for each instruction. The ISA is essentially an interface
between the software and the hardware. Some ISAs include hundreds of instructions. We mentioned previously that each instruction for MARIE consists of 16
bits. The most significant 4 bits, bits 12 through 15, make up the opcode that
specifies the instruction to be executed (which allows for a total of 16 instructions). The least significant 12 bits, bits 0 through 11, form an address, which
allows for a maximum memory address of 2^12 − 1. The instruction format for
MARIE is shown in Figure 4.10.
Most ISAs consist of instructions for processing data, moving data, and controlling the execution sequence of the program. MARIE’s instruction set consists
of the instructions shown in Table 4.2.
The Load instruction allows us to move data from memory into the CPU (via
the MBR and the AC). All data (which includes anything that is not an instruction)
from memory must move first into the MBR and then into either the AC or the
ALU; there are no other options in this architecture. Notice that the Load instruction does not have to name the AC as the final destination; this register is implicit
in the instruction. Other instructions reference the AC register in a similar fashion.
The Store instruction allows us to move data from the CPU back to memory. The
Add and Subt instructions add and subtract, respectively, the data value found at
address X to or from the value in the AC. The data located at address X is copied
into the MBR where it is held until the arithmetic operation is executed. Input and
Output allow MARIE to communicate with the outside world.
FIGURE 4.10 MARIE’s Instruction Format (bits 15–12: opcode; bits 11–0: address)

TABLE 4.2 MARIE’s Instruction Set

Bin   Hex  Instruction  Meaning
0001  1    Load X       Load the contents of address X into AC.
0010  2    Store X      Store the contents of AC at address X.
0011  3    Add X        Add the contents of address X to AC and store the result in AC.
0100  4    Subt X       Subtract the contents of address X from AC and store the result in AC.
0101  5    Input        Input a value from the keyboard into AC.
0110  6    Output       Output the value in AC to the display.
0111  7    Halt         Terminate the program.
1000  8    Skipcond     Skip the next instruction on condition.
1001  9    Jump X       Load the value of X into PC.
Input and output are complicated operations. In modern computers, input and
output are done using ASCII bytes. This means that if you type in the number 32
on the keyboard as input, it is actually read in as the ASCII character “3” followed by “2.” These two characters must be converted to the numeric value 32
before they are stored in the AC. Because we are focusing on how a computer
works, we are going to assume that a value input from the keyboard is “automatically” converted correctly. We are glossing over a very important concept: How
does the computer know whether an I/O value is to be treated as numeric or
ASCII, if everything that is input or output is actually ASCII? The answer is that
the computer knows through the context of how the value is used. In MARIE, we
assume numeric input and output only. We also allow values to be input as decimal and assume there is a “magic conversion” to the actual binary values that are
stored. In reality, these are issues that must be addressed if a computer is to work
properly.
The Halt command causes the current program execution to terminate. The
Skipcond instruction allows us to perform conditional branching (as is done with
“while” loops or “if” statements). When the Skipcond instruction is executed, the
value stored in the AC must be inspected. Two of the address bits (let’s assume
we always use the two address bits closest to the opcode field, bits 10 and 11)
specify the condition to be tested. If the two address bits are 00, this translates to
“skip if the AC is negative.” If the two address bits are 01 (bit eleven is 0 and bit
ten is 1), this translates to “skip if the AC is equal to 0.” Finally, if the two
address bits are 10 (or 2), this translates to “skip if the AC is greater than 0.” By
“skip” we simply mean jump over the next instruction. This is accomplished by
incrementing the PC by 1, essentially ignoring the following instruction, which is
never fetched. The Jump instruction, an unconditional branch, also affects the PC.
This instruction causes the contents of the PC to be replaced with the value of X,
which is the address of the next instruction to fetch.
We wish to keep the architecture and the instruction set as simple as possible and
yet convey the information necessary to understand how a computer works. Therefore, we have omitted several useful instructions. However, you will see shortly that
this instruction set is still quite powerful. Once you gain familiarity with how the
machine works, we will extend the instruction set to make programming easier.
Let’s examine the instruction format used in MARIE. Suppose we have the
following 16-bit instruction:
Bit:     15 14 13 12 | 11 10  9  8  7  6  5  4  3  2  1  0
          0  0  0  1 |  0  0  0  0  0  0  0  0  0  0  1  1
             opcode  |               address
The leftmost four bits indicate the opcode, or the instruction to be executed.
0001 is binary for 1, which represents the Load instruction. The remaining 12 bits
indicate the address of the value we are loading, which is address 3 in main memory. This instruction causes the data value found in main memory, address 3, to be
copied into the AC. Consider another instruction:
Bit:     15 14 13 12 | 11 10  9  8  7  6  5  4  3  2  1  0
          0  0  1  1 |  0  0  0  0  0  0  0  0  1  1  0  1
             opcode  |               address
The leftmost four bits, 0011, are equal to 3, which is the Add instruction. The
address bits indicate address 00D in hex (or 13 decimal). We go to main memory,
get the data value at address 00D, and add this value to the AC. The value in the
AC would then change to reflect this sum. One more example follows:
Bit:     15 14 13 12 | 11 10  9  8  7  6  5  4  3  2  1  0
          1  0  0  0 |  1  0  0  0  0  0  0  0  0  0  0  0
             opcode  |               address
The opcode for this instruction represents the Skipcond instruction. Bits ten
and eleven (read left to right, or bit eleven followed by bit ten) are 10, indicating
a value of 2. This implies a “skip if AC greater than 0.” If the value in the AC is
less than or equal to zero, this instruction is ignored and we simply go on to the
next instruction. If the value in the AC is greater than zero, this instruction causes
the PC to be incremented by 1, thus causing the instruction immediately following this instruction in the program to be ignored (keep this in mind as you read
the following section on the instruction cycle).
These examples bring up an interesting point. We will be writing programs
using this limited instruction set. Would you rather write a program using the
commands Load, Add, and Halt, or their binary equivalents 0001, 0011, and
0111? Most people would rather use the instruction name, or mnemonic, for the
instruction, instead of the binary value for the instruction. Our binary instructions
are called machine instructions. The corresponding mnemonic instructions are
what we refer to as assembly language instructions. There is a one-to-one correspondence between assembly language and machine instructions. When we type
in an assembly language program (i.e., using the instructions listed in Table 4.2),
we need an assembler to convert it to its binary equivalent. We discuss assemblers in Section 4.11.
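Because the correspondence between mnemonics and machine instructions is one-to-one, a toy assembler is easy to sketch. The snippet below (illustrative only, covering just the nine opcodes in Table 4.2) packs a mnemonic and a 12-bit address into a 16-bit MARIE word and unpacks it again, reproducing the three encodings worked out above.

```python
# Tiny illustrative assembler/disassembler for the nine instructions in
# Table 4.2: opcode in bits 15-12, address in bits 11-0.

OPCODES = {"Load": 0x1, "Store": 0x2, "Add": 0x3, "Subt": 0x4,
           "Input": 0x5, "Output": 0x6, "Halt": 0x7,
           "Skipcond": 0x8, "Jump": 0x9}
MNEMONICS = {code: name for name, code in OPCODES.items()}

def assemble(mnemonic, address=0):
    return (OPCODES[mnemonic] << 12) | (address & 0xFFF)

def disassemble(word):
    opcode, address = word >> 12, word & 0xFFF
    return f"{MNEMONICS[opcode]} {address:03X}"

print(f"{assemble('Load', 0x003):016b}")    # 0001000000000011  (Load 3)
print(f"{assemble('Add', 0x00D):016b}")     # 0011000000001101  (Add 00D)
print(disassemble(0b1000100000000000))      # Skipcond 800 (skip if AC > 0)
```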
4.8.4 Register Transfer Notation
We have seen that digital systems consist of many components, including arithmetic logic units, registers, memory, decoders, and control units. These units are
interconnected by buses to allow information to flow through the system. The
instruction set presented for MARIE in the preceding sections constitutes a set of
machine-level instructions used by these components to execute a program. Each
instruction appears to be very simplistic; however, if you examine what actually
happens at the component level, each instruction involves multiple operations.
For example, the Load instruction loads the contents of the given memory location
into the AC register. But, if we observe what is happening at the component level,
we see that multiple “mini-instructions” are being executed. First, the address from
the instruction must be loaded into the MAR. Then the data in memory at this location must be loaded into the MBR. Then the MBR must be loaded into the AC.
These mini-instructions are called microoperations and specify the elementary
operations that can be performed on data stored in registers.
The symbolic notation used to describe the behavior of microoperations is
called register transfer notation (RTN) or register transfer language (RTL).
We use the notation M[X] to indicate the actual data stored at location X in memory, and ← to indicate a transfer of information. In reality, a transfer from one
register to another always involves a transfer onto the bus from the source register, and then a transfer off the bus into the destination register. However, for the
sake of clarity, we do not include these bus transfers, assuming that you understand that the bus must be used for data transfer.
We now present the register transfer notation for each of the instructions in
the ISA for MARIE.
Load X
Recall that this instruction loads the contents of memory location X into the AC.
However, the address X must first be placed into the MAR. Then the data at location M[MAR] (or address X) is moved into the MBR. Finally, this data is placed
in the AC.
MAR ← X
MBR ← M[MAR]
AC ← MBR
Because the IR must use the bus to copy the value of X into the MAR, before
the data at location X can be placed into the MBR, this operation requires two bus
cycles. Therefore, these two operations are on separate lines to indicate they cannot
occur during the same cycle. However, because we have a special connection
between the MBR and the AC, the transfer of the data from the MBR to the AC can
occur immediately after the data is put into the MBR, without waiting for the bus.
Store X
This instruction stores the contents of the AC in memory location X:
MAR ← X, MBR ← AC
M[MAR] ← MBR
Add X
The data value stored at address X is added to the AC. This can be accomplished
as follows:
MAR ← X
MBR ← M[MAR]
AC ← AC + MBR
Subt X
Similar to Add, this instruction subtracts the value stored at address X from the
accumulator and places the result back in the AC:
MAR ← X
MBR ← M[MAR]
AC ← AC − MBR
Input
Any input from the input device is first routed into the InREG. Then the data is
transferred into the AC.
AC ← InREG
Output
This instruction causes the contents of the AC to be placed into the OutREG,
where it is eventually sent to the output device.
OutREG ← AC
Halt
No operations are performed on registers; the machine simply ceases execution of
the program.
Skipcond
Recall that this instruction uses the bits in positions 10 and 11 in the address field
to determine what comparison to perform on the AC. Depending on this bit combination, the AC is checked to see whether it is negative, equal to 0, or greater
than 0. If the given condition is true, then the next instruction is skipped. This is
performed by incrementing the PC register by 1.
If IR[11–10] = 00 then          {if bits 10 and 11 in the IR are both 0}
    If AC < 0 then PC ← PC + 1
else if IR[11–10] = 01 then     {if bit 11 = 0 and bit 10 = 1}
    If AC = 0 then PC ← PC + 1
else if IR[11–10] = 10 then     {if bit 11 = 1 and bit 10 = 0}
    If AC > 0 then PC ← PC + 1
If the bits in positions ten and eleven are both ones, an error condition
results. However, an additional condition could also be defined using these
bit values.
Jump X
This instruction causes an unconditional branch to the given address, X. Therefore, to execute this instruction, X must be loaded into the PC.
PC ← X
In reality, the lower or least significant 12 bits of the instruction register (or
IR[11–0]) reflect the value of X. So this transfer is more accurately depicted as:
PC ← IR[11–0]
However, we feel that the notation PC ← X is easier to understand and relate
to the actual instructions, so we use this instead.
Register transfer notation is a symbolic means of expressing what is happening in the system when a given instruction is executing. RTN is sensitive to the
datapath, in that if multiple microoperations must share the bus, they must be
executed in a sequential fashion, one following the other.
4.9
INSTRUCTION PROCESSING
Now that we have a basic language with which to communicate ideas to our computer, we need to discuss exactly how a specific program is executed. All computers follow a basic machine cycle: the fetch, decode, and execute cycle.
4.9.1
The Fetch–Decode–Execute Cycle
The fetch–decode–execute cycle represents the steps that a computer follows to
run a program. The CPU fetches an instruction (transfers it from main memory to
the instruction register), decodes it (determines the opcode and fetches any data
necessary to carry out the instruction), and executes it (performs the operation[s]
indicated by the instruction). Notice that a large part of this cycle is spent copying data from one location to another. When a program is initially loaded, the
address of the first instruction must be placed in the PC. The steps in this cycle,
which take place in specific clock cycles, are listed below. Note that Steps 1 and
2 make up the fetch phase, Step 3 makes up the decode phase, and Step 4 is the
execute phase.
1. Copy the contents of the PC to the MAR: MAR ← PC.
2. Go to main memory and fetch the instruction found at the address in the MAR,
placing this instruction in the IR; increment PC by 1 (PC now points to the next
instruction in the program): IR ← M[MAR] and then PC ← PC+1. (Note: Because
MARIE is word addressable, the PC is incremented by 1, which results in the
next word’s address occupying the PC. If MARIE were byte addressable, the
PC would need to be incremented by 2 to point to the address of the next
instruction, because each instruction would require 2 bytes. On a byte-addressable machine with 32-bit words, the PC would need to be incremented by 4.)
3. Copy the rightmost 12 bits of the IR into the MAR; decode the leftmost 4 bits
to determine the opcode, MAR ← IR[11–0], and decode IR[15–12].
4. If necessary, use the address in the MAR to go to memory to get data, placing
the data in the MBR (and possibly the AC), and then execute the instruction
MBR ← M[MAR] and execute the actual instruction.
This cycle is illustrated in the flowchart in Figure 4.11.
Note that computers today, even with large instruction sets, long instructions,
and huge memories, can execute millions of these fetch–decode–execute cycles
in the blink of an eye.
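The four steps above can be sketched as a small Python loop for a word-addressable, MARIE-like machine; this is our own illustrative skeleton (only the Load and Halt opcodes are decoded), not the MarieSim source.

# Skeleton of the fetch-decode-execute cycle (illustrative sketch).
memory = [0] * 4096
memory[0x100] = 0x1104                 # Load 104
memory[0x101] = 0x7000                 # Halt
memory[0x104] = 0x0023

pc, ac, mar, mbr, ir = 0x100, 0, 0, 0, 0
running = True
while running:
    mar = pc                           # Step 1: MAR ← PC
    ir = memory[mar]                   # Step 2: IR ← M[MAR] ...
    pc += 1                            # ... and PC ← PC + 1 (word addressable)
    opcode = (ir >> 12) & 0xF          # Step 3: decode IR[15-12] ...
    mar = ir & 0x0FFF                  # ... and MAR ← IR[11-0]
    if opcode == 0x1:                  # Step 4: execute (Load X)
        mbr = memory[mar]
        ac = mbr
    elif opcode == 0x7:                # Halt
        running = False

print(hex(ac))                         # 0x23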
4.9.2
Interrupts and the Instruction Cycle
All computers provide a means for the normal fetch–decode–execute cycle to be
interrupted. These interruptions may be necessary for many reasons, including a
program error (such as division by 0, arithmetic overflow, stack overflow, or
attempting to access a protected area of memory); a hardware error (such as a
memory parity error or power failure); an I/O completion (which happens when a
disk read is requested and the data transfer is complete); a user interrupt (such as
hitting Ctrl-C or Ctrl-Break to stop a program); or an interrupt from a timer set by
the operating system (such as is necessary when allocating virtual memory or
performing certain bookkeeping functions). All of these have something in common: they interrupt the normal flow of the fetch–decode–execute cycle and tell
the computer to stop what it is currently doing and go do something else. They
are, naturally, called interrupts.
The speed with which a computer processes interrupts plays a key role in
determining the computer’s overall performance. Hardware interrupts can be
generated by any peripheral on the system, including memory, the hard drive,
the keyboard, the mouse, or even the modem.

FIGURE 4.11: The Fetch–Decode–Execute Cycle (flowchart: copy the PC to the MAR; copy the contents of memory at address MAR to the IR and increment the PC by 1; decode the instruction and place bits IR[11–0] in the MAR; if the instruction requires an operand, copy the contents of memory at address MAR to the MBR; execute the instruction).

Instead of using interrupts, processors could poll hardware devices on a regular basis to see if they need anything done. However, this would waste CPU time, as the answer would more
often than not be “no.” Interrupts are nice because they let the CPU know the
device needs attention at a particular moment without requiring the CPU to constantly monitor the device. Suppose you need specific information that a friend
has promised to acquire for you. You have two choices: call the friend on a regular schedule (polling) and waste his or her time and yours if the information is
not ready, or wait for a phone call from your friend once the information has
been acquired. You may be in the middle of a conversation with someone else
when the phone call “interrupts” you, but the latter approach is by far the more
efficient way to handle the exchange of information.
Computers also employ software interrupts (also called traps or
exceptions) used by various software applications. Modern computers support
both software and hardware interrupts by using interrupt handlers. These handlers are simply routines (procedures) that are executed when their respective
interrupts are detected. The interrupts, along with their associated interrupt service routines (ISRs), are stored in an interrupt vector table.
How do interrupts fit into the fetch–decode–execute cycle? The CPU finishes
execution of the current instruction and checks, at the beginning of every
fetch–decode–execute cycle, to see if an interrupt has been generated, as shown
in Figure 4.12. Once the CPU acknowledges the interrupt, it must then process
the interrupt.
The details of the “Process the Interrupt” block are given in Figure 4.13. This
process, which is the same regardless of what type of interrupt has been invoked,
begins with the CPU detecting the interrupt signal. Before doing anything else,
the system suspends whatever process is executing by saving the program’s state
and variable information. The device ID or interrupt request number of the
device causing the interrupt is then used as an index into the interrupt vector
table, which is kept in very low memory. The address of the interrupt service
routine (known as its address vector) is retrieved and placed into the program
counter, and execution resumes (the fetch–decode–execute cycle begins again)
within the service routine. After the interrupt service has completed, the system
restores the information it saved from the program that was running when the
interrupt occurred, and program execution may resume—unless another interrupt
is detected, whereupon the interrupt is serviced as described.
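A minimal Python sketch of this check-then-dispatch pattern, assuming a hypothetical vector_table mapping device IDs to service routines (illustrative only, not how any real operating system names these structures):

# Sketch of interrupt checking at the top of each cycle and dispatch through
# an interrupt vector table (names such as vector_table are illustrative).
pending_interrupts = []                # device IDs raised by "hardware"

def keyboard_isr():
    print("servicing keyboard interrupt")

vector_table = {1: keyboard_isr}       # device ID -> interrupt service routine

def machine_step(pc):
    if pending_interrupts:             # check before each fetch
        device_id = pending_interrupts.pop(0)
        saved_pc = pc                  # save the running program's state
        vector_table[device_id]()      # place the ISR address in the PC, branch
        pc = saved_pc                  # restore the state and resume the program
    else:
        print("fetch-decode-execute at", hex(pc))
        pc += 1
    return pc

pc = 0x100
pending_interrupts.append(1)
pc = machine_step(pc)                  # services the keyboard interrupt
pc = machine_step(pc)                  # then the normal cycle continues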
It is possible to suspend processing of noncritical interrupts by using a special interrupt mask bit found in the flag register. This is called interrupt masking, and interrupts that can be suspended are called maskable interrupts. Nonmaskable interrupts cannot be suspended, because suspending them could leave the system in an unstable or unpredictable state.
Assembly languages provide specific instructions for working with hardware and software interrupts.

FIGURE 4.12: The Fetch–Decode–Execute Cycle with Interrupt Checking (flowchart: at the top of each cycle the CPU asks, "Has an interrupt been issued?"; if yes, it processes the interrupt, and if no, it performs the normal fetch–decode–execute cycle).

FIGURE 4.13: Processing an Interrupt (flowchart: interrupt signal detected; save variables and registers; look up the ISR address in the interrupt vector table; place the ISR address in the PC; branch to the ISR; perform the work specific to the interrupt; restore the saved variables and registers; return and branch to the top of the fetch–decode–execute cycle).

When writing assembly language programs, one of the
most common tasks is dealing with I/O through software interrupts (see Chapter
7 for additional information on interrupt-driven I/O). Indeed, one of the more
complicated functions for the novice assembly language programmer is reading
input and writing output, specifically because this must be done using interrupts.
MARIE simplifies the I/O process for the programmer by avoiding the use of
interrupts for I/O.
4.9.3
MARIE’s I/O
I/O processing is one of the most challenging aspects of computer system design
and programming. Our model is necessarily simplified, and we provide it at this
point only to complete MARIE’s functionality.
MARIE has two registers to handle input and output. One, the input register,
holds data being transferred from an input device into the computer; the other, the
output register, holds information ready to be sent to an output device. The timing
used by these two registers is very important. For example, if you are entering input
from the keyboard and type very fast, the computer must be able to read each character that is put into the input register. If another character is entered into that register before the computer has a chance to process the current character, the current
character is lost. It is more likely, because the processor is very fast and keyboard
input is very slow, that the processor might read the same character from the input
register multiple times. We must avoid both of these situations.
To get around problems like these, MARIE employs a modified type of programmed I/O (discussed in Chapter 7) that places all I/O under the direct control
of the programmer. MARIE’s output action is simply a matter of placing a value
into the OutREG. This register can be read by an output controller that sends it to
an appropriate output device, such as a terminal display, printer, or disk. For
input, MARIE, being the simplest of simple systems, places the CPU into a wait
state until a character is entered into the InREG. The InREG is then copied to the
accumulator for subsequent processing as directed by the programmer. We
observe that this model provides no concurrency. The machine is essentially idle
while waiting for input. Chapter 7 explains other approaches to I/O that make
more efficient use of machine resources.
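A rough Python sketch of this programmed-I/O behavior (the function names are our own; MARIE itself expresses these actions as the single Input and Output instructions):

# Sketch of MARIE-style programmed I/O: the CPU idles until a character reaches
# InREG, then copies it to the AC (illustrative only; not the MarieSim code).
def input_instruction():
    in_reg = ord((input("? ") or "\0")[0])  # wait state until a key arrives
    ac = in_reg                             # AC ← InREG
    return ac

def output_instruction(ac):
    out_reg = ac                            # OutREG ← AC
    print(chr(out_reg), end="")             # controller sends it to the display

output_instruction(input_instruction())     # echo one character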
4.10
A SIMPLE PROGRAM
We now present a simple program written for MARIE. In Section 4.12, we present several additional examples to illustrate the power of this minimal architecture. It can even be used to run programs with procedures, various looping
constructs, and different selection options.
Our first program adds two numbers together (both of which are found in
main memory), storing the sum in memory. (We forgo I/O for now.)
Hex          Instruction    Binary Contents of      Hex Contents
Address                     Memory Address          of Memory
100          Load 104       0001000100000100        1104
101          Add 105        0011000100000101        3105
102          Store 106      0010000100000110        2106
103          Halt           0111000000000000        7000
104          0023           0000000000100011        0023
105          FFE9           1111111111101001        FFE9
106          0000           0000000000000000        0000

TABLE 4.3   A Program to Add Two Numbers
Table 4.3 lists an assembly language program to do this, along with its corresponding machine-language program. The list of instructions under the Instruction column constitutes the actual assembly language program. We know that the
fetch–decode–execute cycle starts by fetching the first instruction of the program,
which it finds by loading the PC with the address of the first instruction when the
program is loaded for execution. For simplicity, let’s assume our programs in
MARIE are always loaded starting at address 100 (in hex).
The list of instructions under the Binary Contents of Memory Address column constitutes the actual machine language program. It is often easier for
humans to read hexadecimal as opposed to binary, so the actual contents of memory are displayed in hexadecimal.
This program loads 0023 (hex), or decimal value 35, into the AC. It then adds the
hex value FFE9 (decimal −23) that it finds at address 105. This results in a value
of 12 in the AC. The Store instruction stores this value at memory location 106.
When the program is done, the binary contents of location 106 change to
0000000000001100, which is hex 000C, or decimal 12. Figure 4.14 indicates the
contents of the registers as the program executes.
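The arithmetic can be checked with a few lines of Python, treating the AC as a 16-bit two's complement register (a quick illustration, not part of the simulator):

# Quick check of the arithmetic in Table 4.3 using 16-bit two's complement.
a = 0x0023                       # decimal 35
b = 0xFFE9                       # decimal -23 in 16-bit two's complement
total = (a + b) & 0xFFFF         # keep only 16 bits, as the AC would
print(f"{total:04X}", total)     # 000C 12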
The last RTN instruction in part c places the sum at the proper memory location. The statement “decode IR[15–12]” simply means the instruction must be
decoded to determine what is to be done. This decoding can be done in software
(using a microprogram) or in hardware (using hardwired circuits). These two concepts are covered in more detail in Section 4.13.
Note that there is a one-to-one correspondence between the assembly language and the machine-language instructions. This makes it easy to convert
assembly language into machine code. Using the instruction tables given in this
chapter, you should be able to hand assemble any of our example programs. For
this reason, we look at only the assembly language code from this point on.
Before we present more programming examples, however, a discussion of the
assembly process is in order.
FIGURE 4.14   A Trace of the Program to Add Two Numbers

(a) Load 104
Step               RTN                          PC    IR    MAR   MBR   AC
(initial values)                                100   ----  ----  ----  ----
Fetch              MAR ← PC                     100   ----  100   ----  ----
                   IR ← M[MAR], PC ← PC + 1     101   1104  100   ----  ----
Decode             MAR ← IR[11–0]               101   1104  104   ----  ----
                   (Decode IR[15–12])           101   1104  104   ----  ----
Get operand        MBR ← M[MAR]                 101   1104  104   0023  ----
Execute            AC ← MBR                     101   1104  104   0023  0023

(b) Add 105
Step               RTN                          PC    IR    MAR   MBR   AC
(initial values)                                101   1104  104   0023  0023
Fetch              MAR ← PC                     101   1104  101   0023  0023
                   IR ← M[MAR], PC ← PC + 1     102   3105  101   0023  0023
Decode             MAR ← IR[11–0]               102   3105  105   0023  0023
                   (Decode IR[15–12])           102   3105  105   0023  0023
Get operand        MBR ← M[MAR]                 102   3105  105   FFE9  0023
Execute            AC ← AC + MBR                102   3105  105   FFE9  000C

(c) Store 106
Step               RTN                          PC    IR    MAR   MBR   AC
(initial values)                                102   3105  105   FFE9  000C
Fetch              MAR ← PC                     102   3105  102   FFE9  000C
                   IR ← M[MAR], PC ← PC + 1     103   2106  102   FFE9  000C
Decode             MAR ← IR[11–0]               103   2106  106   FFE9  000C
                   (Decode IR[15–12])           103   2106  106   FFE9  000C
Get operand        (not necessary)              103   2106  106   FFE9  000C
Execute            MBR ← AC, M[MAR] ← MBR       103   2106  106   000C  000C
4.11 A DISCUSSION ON ASSEMBLERS
In the program shown in Table 4.3, it is a simple matter to convert from the
assembly language instruction Load 104, for example, to the machine language instruction 1104 (in hex). But why bother with this conversion? Why
not just write in machine code? Although it is very efficient for computers to
see these instructions as binary numbers, it is difficult for human beings to
understand and program in sequences of 0s and 1s. We prefer words and symbols over long numbers, so it seems a natural solution to devise a program
that does this simple conversion for us. This program is called an assembler.
4.11.1
What Do Assemblers Do?
An assembler’s job is to convert assembly language (using mnemonics) into
machine language (which consists entirely of binary values, or strings of 0s
and 1s). Assemblers take a programmer’s assembly language program, which
is really a symbolic representation of the binary numbers, and convert it into
binary instructions, or the machine code equivalent. The assembler reads a
source file (assembly program) and produces an object file (the machine
code).
Substituting simple alphanumeric names for the opcodes makes programming
much easier. We can also substitute labels (simple names) to identify or name particular memory addresses, making the task of writing assembly programs even
simpler. For example, in our program to add two numbers, we can use labels to
indicate the memory addresses, thus making it unnecessary to know the exact
memory address of the operands for instructions. Table 4.4 illustrates this concept.
When the address field of an instruction is a label instead of an actual physical
address, the assembler still must translate it into a real, physical address in main
memory. Most assembly languages allow for labels. Assemblers typically specify
formatting rules for their instructions, including those with labels. For example, a
label might be limited to three characters and may also be required to occur as the
first field in the instruction. MARIE requires labels to be followed by a comma.
Labels are nice for programmers. However, they make more work for the
assembler. It must make two passes through a program to do the translation.
Address        Instruction
100            Load   X
101            Add    Y
102            Store  Z
103            Halt
104     X,     0023
105     Y,     FFE9
106     Z,     0000

TABLE 4.4   An Example Using Labels
This means the assembler reads the program twice, from top to bottom each time. On the
first pass, the assembler builds a set of correspondences called a symbol table. For
the above example, it builds a table with three symbols: X, Y, and Z. Because an
assembler goes through the code from top to bottom, it cannot translate the entire
assembly language instruction into machine code in one pass; it does not know
where the data portion of the instruction is located if it is given only a label. But
after it has built the symbol table, it can make a second pass and “fill in the blanks.”
In the above program, the first pass of the assembler creates the following
symbol table:
X    104
Y    105
Z    106
It also begins to translate the instructions. After the first pass, the translated
instructions would be incomplete as follows:
1    X
3    Y
2    Z
7    000
On the second pass, the assembler uses the symbol table to fill in the addresses
and create the corresponding machine language instructions. Thus, on the second
pass, it would know that X is located at address 104, and would then substitute 104
for the X. A similar procedure would replace the Y and Z, resulting in:
1    104
3    105
2    106
7    000
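The two passes can be sketched in a few lines of Python for the labeled program of Table 4.4; the table layout, the OPCODES dictionary, and the ORIGIN constant are our own assumptions for this illustration, not the real MARIE assembler.

# Minimal two-pass assembler sketch for the labeled program above.
OPCODES = {"Load": 0x1, "Add": 0x3, "Store": 0x2, "Halt": 0x7}

source = [
    ("", "Load", "X"), ("", "Add", "Y"), ("", "Store", "Z"), ("", "Halt", ""),
    ("X", "HEX", "0023"), ("Y", "HEX", "FFE9"), ("Z", "HEX", "0000"),
]

ORIGIN = 0x100
symbols = {}                                   # pass 1: build the symbol table
for offset, (label, _, _) in enumerate(source):
    if label:
        symbols[label] = ORIGIN + offset

machine_code = []                              # pass 2: fill in the blanks
for label, mnemonic, operand in source:
    if mnemonic in OPCODES:
        address = symbols.get(operand, 0)      # 0 when there is no operand
        machine_code.append((OPCODES[mnemonic] << 12) | address)
    else:                                      # HEX directive: raw data word
        machine_code.append(int(operand, 16))

print([f"{word:04X}" for word in machine_code])
# ['1104', '3105', '2106', '7000', '0023', 'FFE9', '0000']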
Because most people are uncomfortable reading hexadecimal, most assembly
languages allow the data values stored in memory to be specified as binary, hexadecimal, or decimal. Typically, some sort of assembler directive (an instruction
specifically for the assembler that is not supposed to be translated into machine
code) is given to the assembler to specify which base is to be used to interpret the
value. We use DEC for decimal and HEX for hexadecimal in MARIE’s assembly
language. For example, we rewrite the program in Table 4.4 as shown in Table 4.5.
Instead of requiring the actual binary data value (written in HEX), we specify
a decimal value by using the directive DEC. The assembler recognizes this directive and converts the value accordingly before storing it in memory. Again, directives are not converted to machine language; they simply instruct the assembler
in some way.
Another kind of directive common to virtually every programming language
is the comment delimiter. Comment delimiters are special characters that tell the
Address        Instruction
100            Load   X
101            Add    Y
102            Store  Z
103            Halt
104     X,     DEC    35
105     Y,     DEC    -23
106     Z,     HEX    0000

TABLE 4.5   An Example Using Directives for Constants
assembler (or compiler) to ignore all text following the special character.
MARIE’s comment delimiter is a front slash (“/”), which causes all text between
the delimiter and the end of the line to be ignored.
4.11.2
Why Use Assembly Language?
Our main objective in presenting MARIE’s assembly language is to give you an
idea of how the language relates to the architecture. Understanding how to program in assembly goes a long way toward understanding the architecture (and
vice versa). Not only do you learn basic computer architecture, but you also can
learn exactly how the processor works and gain significant insight into the particular architecture on which you are programming. There are many other situations
where assembly programming is useful.
Most programmers agree that 10% of the code in a program uses approximately 90% of the CPU time. In time-critical applications, we often need to optimize this 10% of the code. Typically, the compiler handles this optimization for us.
The compiler takes a high-level language (such as C++) and converts it into assembly language (which is then converted into machine code). Compilers have been
around a long time and in most cases they do a great job. Occasionally, however,
programmers must bypass some of the restrictions found in high-level languages
and manipulate the assembly code themselves. By doing this, programmers can
make the program more efficient in terms of time (and space). This hybrid
approach (most of the program written in a high-level language, with part rewritten
in assembly) allows the programmer to take advantage of the best of both worlds.
Are there situations in which entire programs should be written in assembly
language? If the overall size of the program or response time is critical, assembly
language often becomes the language of choice. This is because compilers tend to
obscure information about the cost (in time) of various operations and programmers
often find it difficult to judge exactly how their compiled programs will perform.
Assembly language puts the programmer closer to the architecture and, thus, in
firmer control. Assembly language might actually be necessary if the programmer
wishes to accomplish certain operations not available in a high-level language.
A perfect example, in terms of both response performance and space-critical
design, is found in embedded systems. These are systems in which the computer
is integrated into a device that is typically not a computer. Embedded systems
must be reactive and often are found in time-constrained environments. These
systems are designed to perform either a single instruction or a very specific set
of instructions. Chances are you use some type of embedded system every day.
Consumer electronics (such as cameras, camcorders, cellular phones, PDAs, and
interactive games), consumer products (such as washers, microwave ovens, and
washing machines), automobiles (particularly engine control and antilock
brakes), medical instruments (such as CAT scanners and heart monitors), and
industry (for process controllers and avionics) are just a few of the examples of
where we find embedded systems.
The software for an embedded system is critical. An embedded software program must perform within very specific response parameters and is limited in the
amount of space it can consume. These are perfect applications for assembly language programming. We delve deeper into this topic in Chapter 10.
4.12
EXTENDING OUR INSTRUCTION SET
Even though MARIE’s instruction set is sufficient to write any program we wish,
there are a few instructions we can add to make programming much simpler. We
have 4 bits allocated to the opcode, which implies we can have 16 unique instructions, and we are using only 9 of them. Surely, we can make many programming
tasks much easier by adding a few well-chosen instructions to our instruction set.
Our new instructions are summarized in Table 4.6.
The JnS (jump-and-store) instruction allows us to store a pointer to a return
instruction and then proceeds to set the PC to a different instruction. This enables
us to call procedures and other subroutines, and then return to the calling point in
our code once the subroutine has finished. The Clear instruction moves all 0s
Instruction
Number (hex)   Instruction   Meaning
0              JnS X         Store the PC at address X and jump to X + 1.
A              Clear         Put all zeros in the AC.
B              AddI X        Add indirect: go to address X. Use the value at X as the
                             actual address of the data operand to add to the AC.
C              JumpI X       Jump indirect: go to address X. Use the value at X as the
                             actual address of the location to jump to.
D              LoadI X       Load indirect: go to address X. Use the value at X as the
                             actual address of the operand to load into the AC.
E              StoreI X      Store indirect: go to address X. Use the value at X as the
                             destination address for storing the value in the accumulator.

TABLE 4.6   MARIE's Extended Instruction Set
into the accumulator. This saves the machine cycles that would otherwise be
expended in loading a 0 operand from memory.
With the AddI, JumpI, LoadI, and StoreI instructions we introduce a different addressing mode. All previous instructions assume the value in the data portion of the instruction is the direct address of the operand required for the
instruction. These instructions use the indirect addressing mode. Instead of using
the value found at location X as the actual address, we use the value found in X as
a pointer to a new memory location that contains the data we wish to use in the
instruction. For example, to execute the instruction AddI 400, we first go to location 400. If we find the value 240 stored at location 400, we would go to location
240 to get the actual operand for the instruction. We have essentially allowed for
pointers in our language, giving us tremendous power to create advanced data
structures and manipulate strings. (We delve more deeply into addressing modes
in Chapter 5.)
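The difference between direct and indirect addressing can be sketched in Python; the memory contents below (a pointer at 400 to an operand at 240) follow the example in the text, while the function names and the stored value 7 are our own assumptions.

# Sketch of direct Add versus indirect AddI (illustrative only).
memory = {0x400: 0x240, 0x240: 7}   # M[400] holds a pointer to the operand

def add(x, ac):                     # Add X: the operand is at address X
    return ac + memory[x]

def addi(x, ac):                    # AddI X: M[X] is the address of the operand
    pointer = memory[x]             # first memory access fetches the pointer
    return ac + memory[pointer]     # second access fetches the actual operand

print(add(0x400, 0))                # 576 (0x240): the pointer itself was added
print(addi(0x400, 0))               # 7: the value the pointer refers to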
Our six new instructions are detailed below using register transfer notation.
JnS X
MBR ← PC
MAR ← X
M[MAR] ← MBR
MBR ← X
AC ← 1
AC ← AC + MBR
PC ← AC

Clear
AC ← 0

AddI X
MAR ← X
MBR ← M[MAR]
MAR ← MBR
MBR ← M[MAR]
AC ← AC + MBR

JumpI X
MAR ← X
MBR ← M[MAR]
PC ← MBR

LoadI X
MAR ← X
MBR ← M[MAR]
MAR ← MBR
MBR ← M[MAR]
AC ← MBR

StoreI X
MAR ← X
MBR ← M[MAR]
MAR ← MBR
MBR ← AC
M[MAR] ← MBR
Table 4.7 summarizes MARIE’s entire instruction set.
Let’s look at some examples using the full instruction set.
EXAMPLE 4.1 Here is an example using a loop to add five numbers:
Address          Instruction
100              Load     Addr      /Load address of first number to be added
101              Store    Next      /Store this address as our Next pointer
102              Load     Num       /Load the number of items to be added
103              Subt     One       /Decrement
104              Store    Ctr       /Store this value in Ctr to control looping
105    Loop,     Load     Sum       /Load the Sum into AC
106              AddI     Next      /Add the value pointed to by location Next
107              Store    Sum       /Store this sum
108              Load     Next      /Load Next
109              Add      One       /Increment by one to point to next address
10A              Store    Next      /Store in our pointer Next
10B              Load     Ctr       /Load the loop control variable
10C              Subt     One       /Subtract one from the loop control variable
10D              Store    Ctr       /Store this new value in loop control variable
10E              Skipcond 000       /If control variable < 0 then PC ← PC + 1
10F              Jump     Loop      /Otherwise, go back to Loop
110              Halt               /Terminate program
111    Addr,     Hex      117       /Address of the first number to be added
112    Next,     Hex      0         /Pointer to the next number to add
113    Num,      Dec      5         /Number of values to add
114    Sum,      Dec      0         /The running sum
115    Ctr,      Hex      0         /The loop control variable
116    One,      Dec      1         /Constant 1
117              Dec      10        /The values to be added together
118              Dec      15
119              Dec      20
11A              Dec      25
11B              Dec      30
Opcode   Instruction   RTN
0000     JnS X         MBR ← PC
                       MAR ← X
                       M[MAR] ← MBR
                       MBR ← X
                       AC ← 1
                       AC ← AC + MBR
                       PC ← AC
0001     Load X        MAR ← X
                       MBR ← M[MAR]
                       AC ← MBR
0010     Store X       MAR ← X, MBR ← AC
                       M[MAR] ← MBR
0011     Add X         MAR ← X
                       MBR ← M[MAR]
                       AC ← AC + MBR
0100     Subt X        MAR ← X
                       MBR ← M[MAR]
                       AC ← AC − MBR
0101     Input         AC ← InREG
0110     Output        OutREG ← AC
0111     Halt
1000     Skipcond      If IR[11–10] = 00 then
                           If AC < 0 then PC ← PC + 1
                       else if IR[11–10] = 01 then
                           If AC = 0 then PC ← PC + 1
                       else if IR[11–10] = 10 then
                           If AC > 0 then PC ← PC + 1
1001     Jump X        PC ← IR[11–0]
1010     Clear         AC ← 0
1011     AddI X        MAR ← X
                       MBR ← M[MAR]
                       MAR ← MBR
                       MBR ← M[MAR]
                       AC ← AC + MBR
1100     JumpI X       MAR ← X
                       MBR ← M[MAR]
                       PC ← MBR
1101     LoadI X       MAR ← X
                       MBR ← M[MAR]
                       MAR ← MBR
                       MBR ← M[MAR]
                       AC ← MBR
1110     StoreI X      MAR ← X
                       MBR ← M[MAR]
                       MAR ← MBR
                       MBR ← AC
                       M[MAR] ← MBR

TABLE 4.7   MARIE's Full Instruction Set
Example 4.2 shows how you can use the Skipcond and Jump instructions to perform selection. Although this example illustrates an if/else construct, you can easily
modify this to perform an if/then structure, or even a case (or switch) structure.
EXAMPLE 4.2 This example illustrates the use of an if/else construct to
allow for selection. In particular, it implements the following:
if X = Y then
   X = X × 2
else
   Y = Y − X;
Address          Instruction
100    If,       Load     X         /Load the first value
101              Subt     Y         /Subtract the value of Y, store result in AC
102              Skipcond 400       /If AC = 0, skip the next instruction
103              Jump     Else      /Jump to Else part if AC is not equal to 0
104    Then,     Load     X         /Reload X so it can be doubled
105              Add      X         /Double X
106              Store    X         /Store the new value
107              Jump     Endif     /Skip over the false, or else, part to end of if
108    Else,     Load     Y         /Start the else part by loading Y
109              Subt     X         /Subtract X from Y
10A              Store    Y         /Store Y − X in Y
10B    Endif,    Halt               /Terminate program (it doesn't do much!)
10C    X,        Dec      12
10D    Y,        Dec      20
EXAMPLE 4.3 This program demonstrates the use of indirect addressing to
traverse and output a string. The string is terminated with a null.
Address           Instruction
100    Getch,     LoadI    Chptr     / Load the character found at address Chptr.
101               Skipcond 400       / If AC = 0, skip next instruction.
102               Jump     Outp      / Otherwise, proceed with operation.
103               Halt
104    Outp,      Output             / Output the character.
105               Load     Chptr
106               Add      One       / Move pointer to next character.
107               Store    Chptr
108               Jump     Getch     / Process next character.
109    One,       Hex      0001
10A    Chptr,     Hex      10B       / Pointer to "current" character.
10B    String,    Dec      072       / H   / String definition starts here.
10C               Dec      101       / e
10D               Dec      108       / l
10E               Dec      108       / l
10F               Dec      111       / o
110               Dec      032       / [space]
111               Dec      119       / w
112               Dec      111       / o
113               Dec      114       / r
114               Dec      108       / l
115               Dec      100       / d
116               Dec      033       / !
117               Dec      000       / [null]
       END
Example 4.3 demonstrates the use of the LoadI and StoreI instructions by
printing a string. Readers who understand the C and C++ programming languages
will recognize the pattern: We start by declaring the memory location of the first
character of the string and read it until we find a null character. Once the LoadI
instruction places a null in the accumulator, Skipcond 400 evaluates to true, and
the Halt instruction is executed. You will notice that to process each character of
the string, we increment the “current character” pointer, Chptr, so that it points to
the next character to print.
Example 4.4 demonstrates how JnS and JumpI are used to allow for subroutines.
This program includes an END statement, another example of an assembler directive.
This statement tells the assembler where the program ends. Other potential directives
include statements to let the assembler know where to find the first program instruction, how to set up memory, and whether blocks of code are procedures.
EXAMPLE 4.4 This example illustrates the use of a simple subroutine to double any
number; it can be coded as follows.
Address           Instruction
100               Load     X         /Load the first number to be doubled
101               Store    Temp      /Use Temp as a parameter to pass value to Subr
102               JnS      Subr      /Store return address, jump to procedure
103               Store    X         /Store first number, doubled
104               Load     Y         /Load the second number to be doubled
105               Store    Temp      /Use Temp as a parameter to pass value to Subr
106               JnS      Subr      /Store return address, jump to procedure
107               Store    Y         /Store second number, doubled
108               Halt               /End program
109    X,         Dec      20
10A    Y,         Dec      48
10B    Temp,      Dec      0
10C    Subr,      Hex      0         /Store return address here
10D               Load     Temp      /Subroutine to double numbers
10E               Add      Temp
10F               JumpI    Subr
       END
Note: Line numbers in program are given for information only and are not used in the MarieSim environment.
Using MARIE’s simple instruction set, you should be able to implement any
high-level programming language construct, such as loop statements and while
statements. These are left as exercises at the end of the chapter.
4.13
A DISCUSSION ON DECODING: HARDWIRED VERSUS
MICROPROGRAMMED CONTROL
How does the control unit really function? We have done some hand waving and
simply assumed everything works as described, with a basic understanding that, for
each instruction, the control unit causes the CPU to execute a sequence of steps correctly. In reality, there must be control signals to assert lines on various digital components to make things happen as described (recall the various digital components
from Chapter 3). For example, when we perform an Add instruction in MARIE in
assembly language, we assume the addition takes place because the control signals
for the ALU are set to “add” and the result is put into the AC. The ALU has various
control lines that determine which operation to perform. The question we need to
answer is, “How do these control lines actually become asserted?”
There are two methods by which control lines can be set. The first approach,
hardwired control, directly connects the control lines to the actual machine
instructions. The instructions are divided into fields, and bits in the fields are connected to input lines that drive various digital logic components. The second
approach, microprogrammed control, employs software consisting of microinstructions that carry out an instruction’s microoperations. We look at both of these
control methods in more detail after we describe machine control in general.
4.13.1
Machine Control
In Sections 4.8 and 4.12, we provided register transfer language for each of MARIE’s
instructions. The microoperations described by the register transfer language actually
define the operation of the control unit. Each microoperation is associated with a distinctive signal pattern. The signals are fed to combinational circuits within the control
unit that carry out the logical operations appropriate to the instruction.
A schematic of MARIE’s data path is shown in Figure 4.9. We see that each
register and main memory has an address (0 through 7) along the datapath. These
addresses, in the form of signal patterns, are used by the control unit to enable the
flow of bytes through the system. For the sake of example, we define two sets of
signals: P2, P1, P0 that can enable reading from memory or a register and P5, P4,
P3 that can enable writing to a register or memory. The control lines that convey
these signals are connected to registers through combinational logic circuits.
A close-up view of the connection of MARIE’s MBR (with address 3) to the
datapath is shown in Figure 4.15. You can see how this register is enabled for
reading when signals P1 and P0 are high, and writing to the MBR is enabled when
signals P4 and P3 are high. (Note that these signals correspond to the binary string of the address of the MBR, 011.) No other signal pattern is recognized by this
register’s circuits. (The combinational logic that enables the other entities on the
datapath is left as an exercise.)
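The enable logic for the MBR can be sketched as two small Python functions that compare the read and write signal groups against the MBR's datapath address, 011 (an illustration of the scheme just described; the function names are our own):

# Sketch of the combinational enable logic for the MBR (datapath address 011).
MBR_ADDRESS = 0b011

def mbr_read_enable(p2, p1, p0):
    # MBR drives the bus only when P2 P1 P0 form the pattern 011.
    return ((p2 << 2) | (p1 << 1) | p0) == MBR_ADDRESS

def mbr_write_enable(p5, p4, p3):
    # MBR latches from the bus only when P5 P4 P3 form the pattern 011.
    return ((p5 << 2) | (p4 << 1) | p3) == MBR_ADDRESS

print(mbr_read_enable(0, 1, 1))   # True: P1 and P0 high selects the MBR
print(mbr_write_enable(1, 0, 1))  # False: 101 addresses a different register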
If you study MARIE’s instruction set, you will see that the ALU has only
three operations: add, subtract, and clear. We also need to consider the case where
the ALU is not involved in an instruction, so we’ll define “do nothing” as a fourth
ALU state.

FIGURE 4.15: Connection of MARIE's MBR to the Datapath (the MBR's 16 D flip-flops, data lines D15–D0, connect to the bus through combinational logic in the control unit; writing to the MBR is enabled by the pattern 011 on P5 P4 P3, and reading from it is enabled by the pattern 011 on P2 P1 P0).

Thus, with only four operations, MARIE's ALU can be controlled
using only two control signals that we'll call A0 and A1. These control signals and
the ALU responses are given in Table 4.8.

A1   A0   ALU Response
0    0    Do nothing
0    1    AC ← AC + MBR
1    0    AC ← AC − MBR
1    1    AC ← 0 (Clear)

TABLE 4.8   Control Signals for MARIE's ALU
A computer’s clock sequences microoperations by raising the right signals at
the right time. MARIE’s instructions vary in the number of clock cycles each
requires. The activities taking place during each clock cycle are coordinated with
signals from a cycle counter. One way of doing this is to connect the clock to a
synchronous counter, and the counter to a decoder. Suppose that the largest number of clock cycles required by any instruction is eight. Then we need a 3-bit
counter and a 3-to-8 decoder. The output of the decoder, signals T0 through T7, is
ANDed with combinational components and registers to produce the behavior
required by the instruction. If fewer than eight clock cycles are needed for an
instruction, the cycle counter reset signal, Cr, is asserted to get ready for the next
machine instruction.
To pull all of this together, consider MARIE’s Add instruction. The RTN is:
MAR ← X
MBR ← M[MAR]
AC ← AC + MBR
After the Add instruction is fetched, X is in the rightmost 12 bits of the IR and
the IR’s datapath address is 7, so we need to raise all three datapath read signals,
P2 P1 P0, to place IR bits 0 through 11 on the bus. The MAR, with an address of 1,
is activated for writing by raising only P3. Using the signals as we have just
defined them, we can now add the signal patterns to our RTN as follows:
P2P1P0P3T0:   MAR ← X
P4P3T1:       MBR ← M[MAR]
A0P1P0P5T2:   AC ← AC + MBR
CrT3:         [Reset the clock cycle counter.]
All signals, except for data signals (D0. . .D15), are assumed to be low unless
specified in the RTN. Figure 4.16 is a timing diagram that illustrates the sequence
of signal patterns just described.

FIGURE 4.16: Timing Diagram for the Microoperations of MARIE's Add Instruction (clock cycles C0–C3 across the top; one row each for the timing signals T0–T3, the datapath signals P0–P5, the ALU signal A0, and the counter reset signal Cr).

As you can see, at clock cycle C0, all signals
except P0, P1, P2, P3, and T0 are low. Enabling P0, P1, and P2 allows the IR to be
read from, and asserting P3 allows the MAR to be written to. This action occurs
only when T0 is asserted. At clock cycle C1, all signals except P3, P4, and T1 are
low. This machine state, occurring at time T1, connects the bytes read from main
memory (address zero) to the inputs on the MBR. The last microoperation of the
Add sequence occurs at clock cycle C3 (timing signal T3), when Cr resets the cycle counter and the timing signals return to T0.
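The same schedule can be written down as a small Python table, one entry per timing signal, listing the control lines raised in that cycle (our own illustration of the signal patterns given above, not a hardware description):

# Sketch of the control-signal schedule for MARIE's Add instruction.
ADD_SCHEDULE = {
    "T0": ["P2", "P1", "P0", "P3"],    # read IR[11-0] onto the bus, write MAR
    "T1": ["P4", "P3"],                # write MBR from main memory
    "T2": ["A0", "P1", "P0", "P5"],    # read MBR, ALU add, write AC
    "T3": ["Cr"],                      # reset the cycle counter
}

for t, signals in ADD_SCHEDULE.items():
    print(t, "->", " ".join(signals))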
4.13.2
Hardwired Control
There are three essential components common to all hardwired control units: the
instruction decoder, the cycle counter, and the control matrix. Depending on the
complexity of the system, specialized registers and sets of status flags may be
provided as well.

FIGURE 4.17: Hardwired Control Unit (block diagram: the instruction register feeds an instruction decoder; the decoder outputs, together with the cycle counter driven by the clock, input from the system bus such as interrupts, and input from the status/flag registers, enter the control matrix, a combinational circuit whose outputs are the control signals sent to the registers, the datapath, and the ALU).

Figure 4.17 illustrates a simplified control unit. Let us look at it
in detail.
The first essential component is the instruction decoder. Its job is to raise the
unique output signal corresponding to the opcode in the instruction register. If we
have a four-bit opcode, the instruction decoder could have as many as 16 output
signal lines. (Why?) A partial decoder for MARIE’s instruction set is shown in
Figure 4.18.
The next important component is the control unit’s cycle counter. It raises a
single, distinct timing signal, T0, T1, T2, …, Tn, for each tick of the system clock.
After Tn is reached, the counter cycles back to T0. The maximum number of
microoperations required to carry out any of the instructions in the instruction set
determines the number of distinct signals (i.e., the value of n in Tn). MARIE’s
timer needs to count only up to 7 (T0 through T6) to accommodate the JnS
instruction. (You can verify this statement with a close inspection of Table 4.7.)
FIGURE 4.18: Partial Instruction Decoder for MARIE's Instruction Set (the opcode field of the instruction register drives a decoder whose output lines correspond to instructions such as JnS, Load, Store, and Add).
The sequential logic circuit that provides a repeated series of timing signals is
called a ring counter. Figure 4.19 shows one implementation of a ring counter
using D flip-flops. Initially, all of the flip-flop inputs are low except for the input
to D0 (because of the inverted OR gate on the other outputs). Thus, in the
counter’s initial state, output T0 is energized. At the next tick of the clock, the
output of D0 goes high, causing the input of D0 to go low (because of the inverted
OR gate). T0 turns off and T1 turns on. As you can readily see, we have effectively moved a “timing bit” from D0 to D1. This bit circulates through the ring of
flip-flops until it reaches Dn, unless the ring is first reset by way of the clock reset
signal, Cr.
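A few lines of Python can simulate this circulating timing bit (an illustrative model of the ring counter's behavior, not a gate-level design):

# Sketch of a ring counter: a single timing bit circulates through n stages,
# raising exactly one timing signal per clock tick.
def ring_counter(n, ticks):
    state = [1] + [0] * (n - 1)            # initially only T0 is energized
    for _ in range(ticks):
        yield ["T%d" % i for i, bit in enumerate(state) if bit][0]
        state = [state[-1]] + state[:-1]   # shift the timing bit one stage

print(list(ring_counter(4, 6)))            # ['T0', 'T1', 'T2', 'T3', 'T0', 'T1']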
Signals from the counter and instruction decoder are combined within the
control matrix to produce the series of signals that result in the execution of
microoperations involv…