Our Services

Get 15% Discount on your First Order

[rank_math_breadcrumb]

Fundamentals of Database IT403

Description

  • You must submit two separate copies (one Word file and one PDF file) using the Assignment Template on
  • Blackboard via the allocated folder. These files must not be in compressed format.

  • It is your responsibility to check and make sure that you have uploaded both the correct files.
  • Zero mark will be given if you try to bypass the SafeAssign (e.g. misspell words, remove spaces between
  • words, hide characters, use different character sets, convert text into image or languages other than English

    or any kind of manipulation).

  • Email submission will not be accepted.
  • You are advised to make your work clear and well-presented. This includes filling your information on the cover
  • page.

  • You must use this template, failing which will result in zero mark.
  • You MUST show all your work, and text must not be converted into an image, unless specified otherwise by
  • the question.

  • Late submission will result in ZERO mark.
  • The work should be your own, copying from students or other resources will result in ZERO mark.
  • Use Times New Roman font for all your answers.
  • Textbook : Ramez Elmasri & Shamkant Navathe (2017), Fundamentals of Database Systems, Global Edition, 7th Edition, Person, ISBN-13: 978-0133970777 | ISBN-10: 0133970779.
  • College of Computing and Informatics

    Assignment 2
    Deadline: 27/11/2024 @ 23:59
    [Total Mark for this Assignment is 8]
    Student Details:
    Name: ###

    ID: ###

    CRN: ###
    Instructions:

    • You must submit two separate copies (one Word file and one PDF file) using the Assignment Template on
    Blackboard via the allocated folder. These files must not be in compressed format.

    • It is your responsibility to check and make sure that you have uploaded both the correct files.
    • Zero mark will be given if you try to bypass the SafeAssign (e.g. misspell words, remove spaces between
    words, hide characters, use different character sets, convert text into image or languages other than English
    or any kind of manipulation).

    • Email submission will not be accepted.
    • You are advised to make your work clear and well-presented. This includes filling your information on the cover
    page.

    • You must use this template, failing which will result in zero mark.
    • You MUST show all your work, and text must not be converted into an image, unless specified otherwise by
    the question.

    • Late submission will result in ZERO mark.
    • The work should be your own, copying from students or other resources will result in ZERO mark.
    • Use Times New Roman font for all your answers.
    Restricted – ‫مقيد‬

    Question One

    Pg. 01
    Learning
    Outcome(s):

    Question One

    CLO 3

    Consider the following EMPLOYEE relation from a company database. Write

    Create EntityRelationship
    model, Relational
    model, and write
    SQL queries

    2 Marks

    SQL queries to answer the following questions:
    EmpID

    Name

    Salary

    Gender

    Address

    123

    Sarah

    50000

    Female

    Jeddah

    234

    Mohamed

    45000

    Male

    Riyadh

    376

    Ahmed

    70000

    Male

    Jeddah

    498

    Maryam

    10000

    Female

    Jeddah

    555

    Muna

    30000

    Female

    Riyadh

    1- Retrieve the count of Female Employees.
    2- Retrieve the average Salary of all Employees.
    3- Delete the record of the Employee named “Muna”.
    4- Insert a new employee into the table.

    Restricted – ‫مقيد‬

    Question Two

    Pg. 02
    Learning
    Outcome(s):

    Question Two

    2 Marks

    Answer the following questions by referring to the tables below:
    CLO 3: Create
    EntityRelationship
    model, Relational
    model, and write
    SQL queries.

    COURSE
    Course_name

    Course_number

    Credit_hours

    Department

    Intro to Computer Science

    CS310

    4

    CS

    System Analysis and
    Design

    IT353

    4

    IT

    Discrete Mathematics

    MATH410

    3

    MATH

    IT403

    3

    IT

    Fundamentals of
    Database

    SECTION
    Section_ID
    80

    Course_no
    MATH410

    Semester
    Fall

    Year
    23

    Instructor
    Sara

    95
    103

    CS310
    IT353

    Fall
    Fall

    23
    24

    Ali
    Khaled

    115

    MATH410

    Fall

    24

    Ahmed

    120
    140

    CS310
    IT403

    Spring
    Fall

    24
    24

    Ali
    Eman

    a) Create a virtual table called (COURSE_ON_SEC) that summarize courses
    information along with its section and instructor, listed alphabetically by
    course name.
    b) Retrieve the section ID, course name, course number, department, and
    instructor of the courses taught by instructor “Ali” from
    COURSE_ON_SEC table.

    Restricted – ‫مقيد‬

    Question Three

    Pg. 03
    Learning
    Outcome(s):
    CLO4 – Design a
    database starting

    Question Three

    2 Marks

    You are given a table with unnormalized design. You are required to transform the
    provided table into First Normal Form (1NF), and Second Normal Form (2NF). For
    each step, provide the new table(s) and explain the changes made.

    from the
    conceptual design

    Order_ID

    Supplier_Name

    Products_Info

    Order_Date

    001

    Amazon

    P1, product101,
    product102

    01-11-2024

    002

    Noon

    P2, product201,
    product202,
    product203

    04-11-2024

    003

    Alim

    P3, product301

    10-11-2024

    to the
    implementation of
    database
    schemas

    Restricted – ‫مقيد‬

    Question Four

    Pg. 04
    Learning
    Outcome(s):
    CLO3:

    Question Four

    2 Marks

    Using join:

    write an SQL query to retrieve the names and addresses of all employees who work in
    Research departments.

    Create EntityRelationship
    model, Relational
    model, and write
    SQL queries.

    Restricted – ‫مقيد‬

    Write an SQL query to retrieve the names of all employees who have a higher salary
    than their manager. Use a self-join to compare the salaries.

    FUNDAMENTALS OF

    Database
    Systems
    SEVENTH EDITION

    This page intentionally left blank

    FUNDAMENTALS OF

    Database
    Systems
    SEVENTH EDITION

    Ramez Elmasri
    Department of Computer Science and Engineering
    The University of Texas at Arlington

    Shamkant B. Navathe
    College of Computing
    Georgia Institute of Technology

    Boston Columbus Indianapolis New York San Francisco Hoboken
    Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto
    Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

    Vice President and Editorial Director, ECS:
    Marcia J. Horton
    Acquisitions Editor: Matt Goldstein
    Editorial Assistant: Kelsey Loanes
    Marketing Managers: Bram Van Kempen, Demetrius Hall
    Marketing Assistant: Jon Bryant
    Senior Managing Editor: Scott Disanno
    Production Project Manager: Rose Kernan
    Program Manager: Carole Snyder
    Global HE Director of Vendor Sourcing
    and Procurement: Diane Hynes
    Director of Operations: Nick Sklitsis

    Operations Specialist: Maura Zaldivar-Garcia
    Cover Designer: Black Horse Designs
    Manager, Rights and Permissions: Rachel Youdelman
    Associate Project Manager, Rights and Permissions:
    Timothy Nicholls
    Full-Service Project Management: Rashmi Tickyani,
    iEnergizer Aptara®, Ltd.
    Composition: iEnergizer Aptara®, Ltd.
    Printer/Binder: Edwards Brothers Malloy
    Cover Printer: Phoenix Color/Hagerstown
    Cover Image: Micha Pawlitzki/Terra/Corbis
    Typeface: 10.5/12 Minion Pro

    Copyright © 2016, 2011, 2007 by Ramez Elmasri and Shamkant B. Navathe. All rights reserved. Manufactured
    in the United States of America. This publication is protected by Copyright and permissions should be obtained
    from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any
    form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission(s) to
    use materials from this work, please submit a written request to Pearson Higher Education, Permissions
    Department, 221 River Street, Hoboken, NJ 07030.
    Many of the designations by manufacturers and seller to distinguish their products are claimed as trademarks.
    Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations
    have been printed in initial caps or all caps.
    The author and publisher of this book have used their best efforts in preparing this book. These efforts include
    the development, research, and testing of theories and programs to determine their effectiveness. The author and
    publisher make no warranty of any kind, expressed or implied, with regard to these programs or the
    documentation contained in this book. The author and publisher shall not be liable in any event for incidental or
    consequential damages with, or arising out of, the furnishing, performance, or use of these programs.
    Microsoft and/or its respective suppliers make no representations about the suitability of the information
    contained in the documents and related graphics published as part of the services for any purpose. All such
    documents and related graphics are provided “as is” without warranty of any kind. Microsoft and/or its respective
    suppliers hereby disclaim all warranties and conditions with regard to this information, including all warranties
    and conditions of merchantability. Whether express, implied or statutory, fitness for a particular purpose, title
    and non-infringement. In no event shall microsoft and/or its respective suppliers be liable for any special,
    indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether
    in an action of contract. Negligence or other tortious action, arising out of or in connection with the use or
    performance of information available from the services.
    The documents and related graphics contained herein could include technical inaccuracies or typographical
    errors. Changes are periodically added to the information herein. Microsoft and/or its respective suppliers may
    make improvements and/or changes in the product(s) and/or the program(s) described herein at any time.
    Partial screen shots may be viewed in full within the software version specified.
    Library of Congress Cataloging-in-Publication Data on File

    10 9 8 7 6 5 4 3 2 1
    ISBN-10:
    0-13-397077-9
    ISBN-13: 978-0-13-397077-7

    To Amalia
    and
    to Ramy, Riyad, Katrina, and Thomas
    R. E.
    To my wife Aruna for her love, support, and understanding
    and
    to Rohan, Maya, and Ayush for bringing so much joy into our lives
    S.B.N.

    This page intentionally left blank

    Preface

    T

    his book introduces the fundamental concepts
    necessary for designing, using, and implementing
    database systems and database applications. Our presentation stresses the fundamentals of database modeling and design, the languages and models provided by the
    database management systems, and database system implementation techniques.
    The book is meant to be used as a textbook for a one- or two-semester course in
    database systems at the junior, senior, or graduate level, and as a reference book. Our
    goal is to provide an in-depth and up-to-date presentation of the most important
    aspects of database systems and applications, and related technologies. We assume
    that readers are familiar with elementary programming and data-structuring concepts and that they have had some exposure to the basics of computer organization.

    New to This Edition
    The following key features have been added in the seventh edition:

    A reorganization of the chapter ordering (this was based on a survey of the
    instructors who use the textbook); however, the book is still organized so
    that the individual instructor can choose to follow the new chapter ordering
    or choose a different ordering of chapters (for example, follow the chapter
    order from the sixth edition) when presenting the materials.
    ■ There are two new chapters on recent advances in database systems and big
    data processing; one new chapter (Chapter 24) covers an introduction to the
    newer class of database systems known as NOSQL databases, and the other
    new chapter (Chapter 25) covers technologies for processing big data,
    including MapReduce and Hadoop.
    ■ The chapter on query processing and optimization has been expanded and
    reorganized into two chapters; Chapter 18 focuses on strategies and algorithms for query processing whereas Chapter 19 focuses on query optimization techniques.
    ■ A second UNIVERSITY database example has been added to the early chapters (Chapters 3 through 8) in addition to our COMPANY database example
    from the previous editions.
    ■ Many of the individual chapters have been updated to varying degrees to include
    newer techniques and methods; rather than discuss these enhancements here,
    vii

    viii

    Preface

    we will describe them later in the preface when we discuss the organization of
    the seventh edition.
    The following are key features of the book:

    A self-contained, flexible organization that can be tailored to individual
    needs; in particular, the chapters can be used in different orders depending
    on the instructor’s preference.
    ■ A companion website (
    includes data to be loaded into various types of relational databases for more
    realistic student laboratory exercises.
    ■ A dependency chart (shown later in this preface) to show which chapters
    depend on other earlier chapters; this can guide the instructor who wants to
    tailor the order of presentation of the chapters.
    ■ A collection of supplements, including a robust set of materials for instructors and students such as PowerPoint slides, figures from the text, and an
    instructor’s guide with solutions.

    Organization and Contents of the Seventh Edition
    There are some organizational changes in the seventh edition as well as improvement to the individual chapters. The book is now divided into 12 parts as follows:

    Part 1 (Chapters 1 and 2) describes the basic introductory concepts necessary for a good understanding of database models, systems, and languages.
    Chapters 1 and 2 introduce databases, typical users, and DBMS concepts,
    terminology, and architecture, as well as a discussion of the progression of
    database technologies over time and a brief history of data models. These
    chapters have been updated to introduce some of the newer technologies
    such as NOSQL systems.

    Part 2 (Chapters 3 and 4) includes the presentation on entity-relationship
    modeling and database design; however, it is important to note that instructors can cover the relational model chapters (Chapters 5 through 8) before
    Chapters 3 and 4 if that is their preferred order of presenting the course
    materials. In Chapter 3, the concepts of the Entity-Relationship (ER) model
    and ER diagrams are presented and used to illustrate conceptual database
    design. Chapter 4 shows how the basic ER model can be extended to incorporate additional modeling concepts such as subclasses, specialization, generalization, union types (categories) and inheritance, leading to the
    enhanced-ER (EER) data model and EER diagrams. The notation for the class
    diagrams of UML are also introduced in Chapters 7 and 8 as an alternative
    model and diagrammatic notation for ER/EER diagrams.

    Part 3 (Chapters 5 through 8) includes a detailed presentation on relational
    databases and SQL with some additional new material in the SQL chapters
    to cover a few SQL constructs that were not in the previous edition. Chapter 5

    Preface

    describes the basic relational model, its integrity constraints, and update
    operations. Chapter 6 describes some of the basic parts of the SQL standard
    for relational databases, including data definition, data modification operations, and simple SQL queries. Chapter 7 presents more complex SQL queries, as well as the SQL concepts of triggers, assertions, views, and schema
    modification. Chapter 8 describes the formal operations of the relational
    algebra and introduces the relational calculus. The material on SQL (Chapters 6 and 7) is presented before our presentation on relational algebra and
    calculus in Chapter 8 to allow instructors to start SQL projects early in a
    course if they wish (it is possible to cover Chapter 8 before Chapters 6 and 7
    if the instructor desires this order). The final chapter in Part 2, Chapter 9,
    covers ER- and EER-to-relational mapping, which are algorithms that can be
    used for designing a relational database schema from a conceptual ER/EER
    schema design.

    Part 4 (Chapters 10 and 11) are the chapters on database programming techniques; these chapters can be assigned as reading materials and augmented
    with materials on the particular language used in the course for programming projects (much of this documentation is readily available on the Web).
    Chapter 10 covers traditional SQL programming topics, such as embedded
    SQL, dynamic SQL, ODBC, SQLJ, JDBC, and SQL/CLI. Chapter 11 introduces
    Web database programming, using the PHP scripting language in our examples, and includes new material that discusses Java technologies for Web
    database programming.

    Part 5 (Chapters 12 and 13) covers the updated material on object-relational
    and object-oriented databases (Chapter 12) and XML (Chapter 13); both of
    these chapters now include a presentation of how the SQL standard incorporates object concepts and XML concepts into more recent versions of the
    SQL standard. Chapter 12 first introduces the concepts for object databases,
    and then shows how they have been incorporated into the SQL standard in
    order to add object capabilities to relational database systems. It then covers
    the ODMG object model standard, and its object definition and query languages. Chapter 13 covers the XML (eXtensible Markup Language) model
    and languages, and discusses how XML is related to database systems. It
    presents XML concepts and languages, and compares the XML model to
    traditional database models. We also show how data can be converted
    between the XML and relational representations, and the SQL commands
    for extracting XML documents from relational tables.

    Part 6 (Chapters 14 and 15) are the normalization and relational design
    theory chapters (we moved all the formal aspects of normalization algorithms to Chapter 15). Chapter 14 defines functional dependencies, and
    the normal forms that are based on functional dependencies. Chapter 14
    also develops a step-by-step intuitive normalization approach, and includes
    the definitions of multivalued dependencies and join dependencies.
    Chapter 15 covers normalization theory, and the formalisms, theories,

    ix

    x

    Preface

    and algorithms developed for relational database design by normalization, including the relational decomposition algorithms and the relational
    synthesis algorithms.
    ■ Part 7 (Chapters 16 and 17) contains the chapters on file organizations on
    disk (Chapter 16) and indexing of database files (Chapter 17). Chapter 16
    describes primary methods of organizing files of records on disk, including
    ordered (sorted), unordered (heap), and hashed files; both static and
    dynamic hashing techniques for disk files are covered. Chapter 16 has been
    updated to include materials on buffer management strategies for DBMSs as
    well as an overview of new storage devices and standards for files and modern storage architectures. Chapter 17 describes indexing techniques for files,
    including B-tree and B+-tree data structures and grid files, and has been
    updated with new examples and an enhanced discussion on indexing,
    including how to choose appropriate indexes and index creation during
    physical design.

    Part 8 (Chapters 18 and 19) includes the chapters on query processing algorithms (Chapter 18) and optimization techniques (Chapter 19); these two
    chapters have been updated and reorganized from the single chapter that
    covered both topics in the previous editions and include some of the newer
    techniques that are used in commercial DBMSs. Chapter 18 presents algorithms for searching for records on disk files, and for joining records from
    two files (tables), as well as for other relational operations. Chapter 18 contains new material, including a discussion of the semi-join and anti-join
    operations with examples of how they are used in query processing, as well
    as a discussion of techniques for selectivity estimation. Chapter 19 covers
    techniques for query optimization using cost estimation and heuristic rules;
    it includes new material on nested subquery optimization, use of histograms,
    physical optimization, and join ordering methods and optimization of
    typical queries in data warehouses.

    Part 9 (Chapters 20, 21, and 22) covers transaction processing concepts;
    concurrency control; and database recovery from failures. These chapters
    have been updated to include some of the newer techniques that are used
    in some commercial and open source DBMSs. Chapter 20 introduces the
    techniques needed for transaction processing systems, and defines the
    concepts of recoverability and serializability of schedules; it has a new section on buffer replacement policies for DBMSs and a new discussion on
    the concept of snapshot isolation. Chapter 21 gives an overview of the various types of concurrency control protocols, with a focus on two-phase
    locking. We also discuss timestamp ordering and optimistic concurrency
    control techniques, as well as multiple-granularity locking. Chapter 21
    includes a new presentation of concurrency control methods that are based
    on the snapshot isolation concept. Finally, Chapter 23 focuses on database
    recovery protocols, and gives an overview of the concepts and techniques
    that are used in recovery.

    Preface

    Part 10 (Chapters 23, 24, and 25) includes the chapter on distributed databases (Chapter 23), plus the two new chapters on NOSQL storage systems
    for big data (Chapter 24) and big data technologies based on Hadoop and
    MapReduce (Chapter 25). Chapter 23 introduces distributed database
    concepts, including availability and scalability, replication and fragmentation of data, maintaining data consistency among replicas, and many other
    concepts and techniques. In Chapter 24, NOSQL systems are categorized
    into four general categories with an example system in each category used
    for our examples, and the data models, operations, as well as the replication/distribution/scalability strategies of each type of NOSQL system are
    discussed and compared. In Chapter 25, the MapReduce programming
    model for distributed processing of big data is introduced, and then we
    have presentations of the Hadoop system and HDFS (Hadoop Distributed
    File System), as well as the Pig and Hive high-level interfaces, and the
    YARN architecture.
    ■ Part 11 (Chapters 26 through 29) is entitled Advanced Database Models,
    Systems, and Applications and includes the following materials: Chapter 26
    introduces several advanced data models including active databases/triggers (Section 26.1), temporal databases (Section 26.2), spatial databases (Section 26.3), multimedia databases (Section 26.4), and deductive
    databases (Section 26.5). Chapter 27 discusses information retrieval (IR)
    and Web search, and includes topics such as IR and keyword-based search,
    comparing DB with IR, retrieval models, search evaluation, and ranking
    algorithms. Chapter 28 is an introduction to data mining including overviews of various data mining methods such as associate rule mining, clustering, classification, and sequential pattern discovery. Chapter 29 is an
    overview of data warehousing including topics such as data warehousing
    models and operations, and the process of building a data warehouse.
    ■ Part 12 (Chapter 30) includes one chapter on database security, which
    includes a discussion of SQL commands for discretionary access control
    (GRANT, REVOKE), as well as mandatory security levels and models for
    including mandatory access control in relational databases, and a discussion
    of threats such as SQL injection attacks, as well as other techniques and
    methods related to data security and privacy.
    Appendix A gives a number of alternative diagrammatic notations for displaying a
    conceptual ER or EER schema. These may be substituted for the notation we use, if
    the instructor prefers. Appendix B gives some important physical parameters of
    disks. Appendix C gives an overview of the QBE graphical query language, and
    Appendixes D and E (available on the book’s Companion Website located at
    cover legacy database systems, based on
    the hierarchical and network database models. They have been used for more than
    thirty years as a basis for many commercial database applications and transactionprocessing systems.

    xi

    xii

    Preface

    Guidelines for Using This Book
    There are many different ways to teach a database course. The chapters in Parts 1
    through 7 can be used in an introductory course on database systems in the order
    that they are given or in the preferred order of individual instructors. Selected chapters and sections may be left out and the instructor can add other chapters from the
    rest of the book, depending on the emphasis of the course. At the end of the opening section of some of the book’s chapters, we list sections that are candidates for
    being left out whenever a less-detailed discussion of the topic is desired. We suggest
    covering up to Chapter 15 in an introductory database course and including selected
    parts of other chapters, depending on the background of the students and the
    desired coverage. For an emphasis on system implementation techniques, chapters
    from Parts 7, 8, and 9 should replace some of the earlier chapters.
    Chapters 3 and 4, which cover conceptual modeling using the ER and EER models,
    are important for a good conceptual understanding of databases. However, they
    may be partially covered, covered later in a course, or even left out if the emphasis
    is on DBMS implementation. Chapters 16 and 17 on file organizations and indexing
    may also be covered early, later, or even left out if the emphasis is on database models and languages. For students who have completed a course on file organization,
    parts of these chapters can be assigned as reading material or some exercises can be
    assigned as a review for these concepts.
    If the emphasis of a course is on database design, then the instructor should cover
    Chapters 3 and 4 early on, followed by the presentation of relational databases. A
    total life-cycle database design and implementation project would cover conceptual
    design (Chapters 3 and 4), relational databases (Chapters 5, 6, and 7), data model
    mapping (Chapter 9), normalization (Chapter 14), and application programs
    implementation with SQL (Chapter 10). Chapter 11 also should be covered if the
    emphasis is on Web database programming and applications. Additional documentation on the specific programming languages and RDBMS used would be required.
    The book is written so that it is possible to cover topics in various sequences. The
    following chapter dependency chart shows the major dependencies among chapters. As the diagram illustrates, it is possible to start with several different topics
    following the first two introductory chapters. Although the chart may seem complex, it is important to note that if the chapters are covered in order, the dependencies are not lost. The chart can be consulted by instructors wishing to use an
    alternative order of presentation.
    For a one-semester course based on this book, selected chapters can be assigned as
    reading material. The book also can be used for a two-semester course sequence.
    The first course, Introduction to Database Design and Database Systems, at the
    sophomore, junior, or senior level, can cover most of Chapters 1 through 15. The
    second course, Database Models and Implementation Techniques, at the senior or
    first-year graduate level, can cover most of Chapters 16 through 30. The twosemester sequence can also be designed in various other ways, depending on the
    preferences of the instructors.

    Preface

    xiii

    1, 2
    Introductory

    5
    Relational
    Model

    3, 4
    ER, EER
    Models

    6, 7
    SQL

    8
    Relational
    Algebra
    16, 17
    File Organization,
    Indexing

    9
    ER-, EER-toRelational

    20, 21, 22
    Transactions,
    CC, Recovery
    14, 15
    FD, MVD,
    Normalization

    23, 24, 25
    DDB, NOSQL,
    Big Data

    12, 13
    ODB, ORDB,
    XML

    10, 11
    DB, Web
    Programming

    26, 27
    Advanced
    Models, IR

    28, 29
    Data Mining,
    Warehousing

    18, 19
    Query Processing,
    Optimization

    Supplemental Materials
    Support material is available to qualified instructors at Pearson’s instructor
    resource center ( For access, contact your
    local Pearson representative.

    PowerPoint lecture notes and figures.
    ■ A solutions manual.

    Acknowledgments
    It is a great pleasure to acknowledge the assistance and contributions of many individuals to this effort. First, we would like to thank our editor, Matt Goldstein, for
    his guidance, encouragement, and support. We would like to acknowledge the
    excellent work of Rose Kernan for production management, Patricia Daly for a

    30
    DB
    Security

    xiv

    Preface

    thorough copy editing of the book, Martha McMaster for her diligence in proofing
    the pages, and Scott Disanno, Managing Editor of the production team. We also
    wish to thank Kelsey Loanes from Pearson for her continued help with the project,
    and reviewers Michael Doherty, Deborah Dunn, Imad Rahal, Karen Davis, Gilliean
    Lee, Leo Mark, Monisha Pulimood, Hassan Reza, Susan Vrbsky, Li Da Xu, Weining
    Zhang and Vincent Oria.
    Ramez Elmasri would like to thank Kulsawasd Jitkajornwanich, Vivek Sharma, and
    Surya Swaminathan for their help with preparing some of the material in Chapter 24. Sham Navathe would like to acknowledge the following individuals who
    helped in critically reviewing and revising various topics. Dan Forsythe and Satish
    Damle for discussion of storage systems; Rafi Ahmed for detailed re-organization
    of the material on query processing and optimization; Harish Butani, Balaji
    Palanisamy, and Prajakta Kalmegh for their help with the Hadoop and MapReduce
    technology material; Vic Ghorpadey and Nenad Jukic for revision of the Data
    Warehousing material; and finally, Frank Rietta for newer techniques in database
    security, Kunal Malhotra for various discussions, and Saurav Sahay for advances in
    information retrieval systems.
    We would like to repeat our thanks to those who have reviewed and contributed to
    previous editions of Fundamentals of Database Systems.

    First edition. Alan Apt (editor), Don Batory, Scott Downing, Dennis
    Heimbinger, Julia Hodges, Yannis Ioannidis, Jim Larson, Per-Ake Larson,
    Dennis McLeod, Rahul Patel, Nicholas Roussopoulos, David Stemple,
    Michael Stonebraker, Frank Tompa, and Kyu-Young Whang.
    ■ Second edition. Dan Joraanstad (editor), Rafi Ahmed, Antonio Albano, David
    Beech, Jose Blakeley, Panos Chrysanthis, Suzanne Dietrich, Vic Ghorpadey,
    Goetz Graefe, Eric Hanson, Junguk L. Kim, Roger King, Vram Kouramajian,
    Vijay Kumar, John Lowther, Sanjay Manchanda, Toshimi Minoura, Inderpal
    Mumick, Ed Omiecinski, Girish Pathak, Raghu Ramakrishnan, Ed Robertson,
    Eugene Sheng, David Stotts, Marianne Winslett, and Stan Zdonick.
    ■ Third edition. Maite Suarez-Rivas and Katherine Harutunian (editors);
    Suzanne Dietrich, Ed Omiecinski, Rafi Ahmed, Francois Bancilhon, Jose
    Blakeley, Rick Cattell, Ann Chervenak, David W. Embley, Henry A. Etlinger,
    Leonidas Fegaras, Dan Forsyth, Farshad Fotouhi, Michael Franklin, Sreejith
    Gopinath, Goetz Craefe, Richard Hull, Sushil Jajodia, Ramesh K. Karne,
    Harish Kotbagi, Vijay Kumar, Tarcisio Lima, Ramon A. Mata-Toledo, Jack
    McCaw, Dennis McLeod, Rokia Missaoui, Magdi Morsi, M. Narayanaswamy,
    Carlos Ordonez, Joan Peckham, Betty Salzberg, Ming-Chien Shan, Junping
    Sun, Rajshekhar Sunderraman, Aravindan Veerasamy, and Emilia E. Villareal.
    ■ Fourth edition. Maite Suarez-Rivas, Katherine Harutunian, Daniel Rausch,
    and Juliet Silveri (editors); Phil Bernhard, Zhengxin Chen, Jan Chomicki,
    Hakan Ferhatosmanoglu, Len Fisk, William Hankley, Ali R. Hurson, Vijay
    Kumar, Peretz Shoval, Jason T. L. Wang (reviewers); Ed Omiecinski (who
    contributed to Chapter 27). Contributors from the University of Texas at

    Preface

    Arlington are Jack Fu, Hyoil Han, Babak Hojabri, Charley Li, Ande Swathi,
    and Steven Wu; Contributors from Georgia Tech are Weimin Feng, Dan Forsythe, Angshuman Guin, Abrar Ul-Haque, Bin Liu, Ying Liu, Wanxia Xie,
    and Waigen Yee.
    ■ Fifth edition. Matt Goldstein and Katherine Harutunian (editors); Michelle
    Brown, Gillian Hall, Patty Mahtani, Maite Suarez-Rivas, Bethany Tidd, and
    Joyce Cosentino Wells (from Addison-Wesley); Hani Abu-Salem, Jamal R.
    Alsabbagh, Ramzi Bualuan, Soon Chung, Sumali Conlon, Hasan Davulcu,
    James Geller, Le Gruenwald, Latifur Khan, Herman Lam, Byung S. Lee,
    Donald Sanderson, Jamil Saquer, Costas Tsatsoulis, and Jack C. Wileden
    (reviewers); Raj Sunderraman (who contributed the laboratory projects);
    Salman Azar (who contributed some new exercises); Gaurav Bhatia, Fariborz Farahmand, Ying Liu, Ed Omiecinski, Nalini Polavarapu, Liora Sahar,
    Saurav Sahay, and Wanxia Xie (from Georgia Tech).
    ■ Sixth edition. Matt Goldstein (editor); Gillian Hall (production management); Rebecca Greenberg (copy editing); Jeff Holcomb, Marilyn Lloyd,
    Margaret Waples, and Chelsea Bell (from Pearson); Rafi Ahmed, Venu
    Dasigi, Neha Deodhar, Fariborz Farahmand, Hariprasad Kumar, Leo Mark,
    Ed Omiecinski, Balaji Palanisamy, Nalini Polavarapu, Parimala R. Pranesh,
    Bharath Rengarajan, Liora Sahar, Saurav Sahay, Narsi Srinivasan, and
    Wanxia Xie.
    Last, but not least, we gratefully acknowledge the support, encouragement, and
    patience of our families.
    R. E.
    S.B.N.

    xv

    This page intentionally left blank

    Contents
    Preface
    vii
    About the Authors

    xxx

    1

    ■ part
    Introduction to Databases ■
    chapter 1 Databases and Database Users

    3

    1.1 Introduction
    4
    1.2 An Example
    6
    1.3 Characteristics of the Database Approach
    10
    1.4 Actors on the Scene
    15
    1.5 Workers behind the Scene
    17
    1.6 Advantages of Using the DBMS Approach
    17
    1.7 A Brief History of Database Applications
    23
    1.8 When Not to Use a DBMS
    27
    1.9 Summary
    27
    Review Questions
    28
    Exercises
    28
    Selected Bibliography
    29

    chapter 2 Database System Concepts
    and Architecture

    31

    2.1 Data Models, Schemas, and Instances
    32
    2.2 Three-Schema Architecture and Data Independence
    36
    2.3 Database Languages and Interfaces
    38
    2.4 The Database System Environment
    42
    2.5 Centralized and Client/Server Architectures for DBMSs
    46
    2.6 Classification of Database Management Systems
    51
    2.7 Summary
    54
    Review Questions
    55
    Exercises
    55
    Selected Bibliography
    56

    xvii

    xviii

    Contents

    2

    ■ part
    Conceptual Data Modeling and Database Design ■
    chapter 3 Data Modeling Using the Entity–Relationship (ER)
    Model

    59

    3.1 Using High-Level Conceptual Data Models
    for Database Design
    60
    3.2 A Sample Database Application
    62
    3.3 Entity Types, Entity Sets, Attributes, and Keys
    63
    3.4 Relationship Types, Relationship Sets, Roles, and Structural
    Constraints
    72
    3.5 Weak Entity Types
    79
    3.6 Refining the ER Design for the COMPANY Database
    80
    3.7 ER Diagrams, Naming Conventions, and Design Issues
    81
    3.8 Example of Other Notation: UML Class Diagrams
    85
    3.9 Relationship Types of Degree Higher than Two
    88
    3.10 Another Example: A UNIVERSITY Database
    92
    3.11 Summary
    94
    Review Questions
    96
    Exercises
    96
    Laboratory Exercises
    103
    Selected Bibliography
    104

    chapter 4 The Enhanced Entity–Relationship (EER)
    Model

    107

    4.1 Subclasses, Superclasses, and Inheritance
    108
    4.2 Specialization and Generalization
    110
    4.3 Constraints and Characteristics of Specialization and Generalization
    Hierarchies
    113
    4.4 Modeling of UNION Types Using Categories
    120
    4.5 A Sample UNIVERSITY EER Schema, Design Choices, and Formal
    Definitions
    122
    4.6 Example of Other Notation: Representing Specialization and
    Generalization in UML Class Diagrams
    127
    4.7 Data Abstraction, Knowledge Representation, and Ontology
    Concepts
    128
    4.8 Summary
    135
    Review Questions
    135
    Exercises
    136
    Laboratory Exercises
    143
    Selected Bibliography
    146

    Contents

    3

    ■ part
    The Relational Data Model and SQL ■
    chapter 5 The Relational Data Model and Relational
    Database Constraints

    149

    5.1 Relational Model Concepts
    150
    5.2 Relational Model Constraints and Relational Database Schemas
    5.3 Update Operations, Transactions, and Dealing with Constraint
    Violations
    165
    5.4 Summary
    169
    Review Questions
    170
    Exercises
    170
    Selected Bibliography
    175

    chapter 6 Basic SQL

    157

    177

    6.1 SQL Data Definition and Data Types
    179
    6.2 Specifying Constraints in SQL
    184
    6.3 Basic Retrieval Queries in SQL
    187
    6.4 INSERT, DELETE, and UPDATE Statements in SQL
    6.5 Additional Features of SQL
    201
    6.6 Summary
    202
    Review Questions
    203
    Exercises
    203
    Selected Bibliography
    205

    198

    chapter 7 More SQL: Complex Queries, Triggers, Views,
    and Schema Modification

    207

    7.1 More Complex SQL Retrieval Queries
    207
    7.2 Specifying Constraints as Assertions and Actions as Triggers
    7.3 Views (Virtual Tables) in SQL
    228
    7.4 Schema Change Statements in SQL
    232
    7.5 Summary
    234
    Review Questions
    236
    Exercises
    236
    Selected Bibliography
    238

    chapter 8 The Relational Algebra and Relational Calculus
    8.1 Unary Relational Operations: SELECT and PROJECT
    8.2 Relational Algebra Operations from Set Theory
    246

    241

    225

    239

    xix

    xx

    Contents

    8.3 Binary Relational Operations: JOIN and DIVISION
    8.4 Additional Relational Operations
    259
    8.5 Examples of Queries in Relational Algebra
    265
    8.6 The Tuple Relational Calculus
    268
    8.7 The Domain Relational Calculus
    277
    8.8 Summary
    279
    Review Questions
    280
    Exercises
    281
    Laboratory Exercises
    286
    Selected Bibliography
    288

    251

    chapter 9 Relational Database Design by ER- and
    EER-to-Relational Mapping

    289

    9.1 Relational Database Design Using ER-to-Relational Mapping
    9.2 Mapping EER Model Constructs to Relations
    298
    9.3 Summary
    303
    Review Questions
    303
    Exercises
    303
    Laboratory Exercises
    305
    Selected Bibliography
    306

    290

    4

    ■ part
    Database Programming Techniques ■
    chapter 10 Introduction to SQL Programming
    Techniques

    309

    10.1 Overview of Database Programming Techniques and Issues
    10.2 Embedded SQL, Dynamic SQL, and SQL J
    314
    10.3 Database Programming with Function Calls and Class
    Libraries: SQL/CLI and JDBC
    326
    10.4 Database Stored Procedures and SQL/PSM
    335
    10.5 Comparing the Three Approaches
    338
    10.6 Summary
    339
    Review Questions
    340
    Exercises
    340
    Selected Bibliography
    341

    chapter 11 Web Database Programming Using PHP
    11.1 A Simple PHP Example
    344
    11.2 Overview of Basic Features of PHP

    346

    310

    343

    Contents

    11.3 Overview of PHP Database Programming
    353
    11.4 Brief Overview of Java Technologies for Database Web
    Programming
    358
    11.5 Summary
    358
    Review Questions
    359
    Exercises
    359
    Selected Bibliography
    359

    ■ part

    5

    Object, Object-Relational, and XML: Concepts, Models,
    Languages, and Standards ■
    chapter 12 Object and Object-Relational
    Databases

    363

    12.1 Overview of Object Database Concepts
    365
    12.2 Object Database Extensions to SQL
    379
    12.3 The ODMG Object Model and the Object Definition Language
    ODL
    386
    12.4 Object Database Conceptual Design
    405
    12.5 The Object Query Language OQL
    408
    12.6 Overview of the C++ Language Binding in the ODMG
    Standard
    417
    12.7 Summary
    418
    Review Questions
    420
    Exercises
    421
    Selected Bibliography
    422

    chapter 13 XML: Extensible Markup Language

    425

    13.1 Structured, Semistructured, and Unstructured Data
    426
    13.2 XML Hierarchical (Tree) Data Model
    430
    13.3 XML Documents, DTD, and XML Schema
    433
    13.4 Storing and Extracting XML Documents
    from Databases
    442
    13.5 XML Languages
    443
    13.6 Extracting XML Documents from Relational Databases
    447
    13.7 XML/SQL: SQL Functions for Creating XML Data
    453
    13.8 Summary
    455
    Review Questions
    456
    Exercises
    456
    Selected Bibliography
    456

    xxi

    xxii

    Contents

    6

    ■ part
    Database Design Theory and Normalization ■
    chapter 14 Basics of Functional Dependencies
    and Normalization for Relational
    Databases
    459
    14.1 Informal Design Guidelines for Relation
    Schemas
    461
    14.2 Functional Dependencies
    471
    14.3 Normal Forms Based on Primary Keys
    474
    14.4 General Definitions of Second and Third Normal
    Forms
    483
    14.5 Boyce-Codd Normal Form
    487
    14.6 Multivalued Dependency and Fourth
    Normal Form
    491
    14.7 Join Dependencies and Fifth Normal Form
    494
    14.8 Summary
    495
    Review Questions
    496
    Exercises
    497
    Laboratory Exercises
    501
    Selected Bibliography
    502

    chapter 15 Relational Database Design Algorithms
    and Further Dependencies

    503

    15.1 Further Topics in Functional Dependencies: Inference Rules,
    Equivalence, and Minimal Cover
    505
    15.2 Properties of Relational Decompositions
    513
    15.3 Algorithms for Relational Database Schema
    Design
    519
    15.4 About Nulls, Dangling Tuples, and Alternative Relational
    Designs
    523
    15.5 Further Discussion of Multivalued Dependencies
    and 4NF
    527
    15.6 Other Dependencies and Normal Forms
    530
    15.7 Summary
    533
    Review Questions
    534
    Exercises
    535
    Laboratory Exercises
    536
    Selected Bibliography
    537

    Contents

    7

    ■ part
    File Structures, Hashing, Indexing, and Physical
    Database Design ■
    chapter 16 Disk Storage, Basic File Structures,
    Hashing, and Modern Storage
    Architectures
    541
    16.1 Introduction
    542
    16.2 Secondary Storage Devices
    547
    16.3 Buffering of Blocks
    556
    16.4 Placing File Records on Disk
    560
    16.5 Operations on Files
    564
    16.6 Files of Unordered Records (Heap Files)
    16.7 Files of Ordered Records (Sorted Files)
    16.8 Hashing Techniques
    572
    16.9 Other Primary File Organizations
    582
    16.10 Parallelizing Disk Access Using RAID
    Technology
    584
    16.11 Modern Storage Architectures
    588
    16.12 Summary
    592
    Review Questions
    593
    Exercises
    595
    Selected Bibliography
    598

    567
    568

    chapter 17 Indexing Structures for Files and Physical
    Database Design

    601

    17.1 Types of Single-Level Ordered Indexes
    602
    17.2 Multilevel Indexes
    613
    17.3 Dynamic Multilevel Indexes Using B-Trees
    and B+-Trees
    617
    17.4 Indexes on Multiple Keys
    631
    17.5 Other Types of Indexes
    633
    17.6 Some General Issues Concerning Indexing
    638
    17.7 Physical Database Design in Relational
    Databases
    643
    17.8 Summary
    646
    Review Questions
    647
    Exercises
    648
    Selected Bibliography
    650

    xxiii

    xxiv

    Contents

    8

    ■ part
    Query Processing and Optimization ■
    chapter 18 Strategies for Query Processing
    18.1 Translating SQL Queries into Relational Algebra
    and Other Operators
    657
    18.2 Algorithms for External Sorting
    660
    18.3 Algorithms for SELECT Operation
    663
    18.4 Implementing the JOIN Operation
    668
    18.5 Algorithms for PROJECT and Set Operations
    676
    18.6 Implementing Aggregate Operations and Different
    Types of JOINs
    678
    18.7 Combining Operations Using Pipelining
    681
    18.8 Parallel Algorithms for Query Processing
    683
    18.9 Summary
    688
    Review Questions
    688
    Exercises
    689
    Selected Bibliography
    689

    chapter 19 Query Optimization

    691

    19.1 Query Trees and Heuristics for Query
    Optimization
    692
    19.2 Choice of Query Execution Plans
    701
    19.3 Use of Selectivities in Cost-Based
    Optimization
    710
    19.4 Cost Functions for SELECT Operation
    714
    19.5 Cost Functions for the JOIN Operation
    717
    19.6 Example to Illustrate Cost-Based Query
    Optimization
    726
    19.7 Additional Issues Related to Query
    Optimization
    728
    19.8 An Example of Query Optimization in Data
    Warehouses
    731
    19.9 Overview of Query Optimization in Oracle
    733
    19.10 Semantic Query Optimization
    737
    19.11 Summary
    738
    Review Questions
    739
    Exercises
    740
    Selected Bibliography
    740

    655

    Contents

    9

    ■ part
    Transaction Processing, Concurrency Control,
    and Recovery ■
    chapter 20 Introduction to Transaction Processing
    Concepts and Theory

    745

    20.1 Introduction to Transaction Processing
    746
    20.2 Transaction and System Concepts
    753
    20.3 Desirable Properties of Transactions
    757
    20.4 Characterizing Schedules Based on Recoverability
    20.5 Characterizing Schedules Based on Serializability
    20.6 Transaction Support in SQL
    773
    20.7 Summary
    776
    Review Questions
    777
    Exercises
    777
    Selected Bibliography
    779

    chapter 21 Concurrency Control Techniques

    759
    763

    781

    21.1 Two-Phase Locking Techniques for Concurrency
    Control
    782
    21.2 Concurrency Control Based on Timestamp Ordering
    792
    21.3 Multiversion Concurrency Control Techniques
    795
    21.4 Validation (Optimistic) Techniques and Snapshot Isolation
    Concurrency Control
    798
    21.5 Granularity of Data Items and Multiple Granularity
    Locking
    800
    21.6 Using Locks for Concurrency Control in Indexes
    805
    21.7 Other Concurrency Control Issues
    806
    21.8 Summary
    807
    Review Questions
    808
    Exercises
    809
    Selected Bibliography
    810

    chapter 22 Database Recovery Techniques

    813

    22.1 Recovery Concepts
    814
    22.2 NO-UNDO/REDO Recovery Based on Deferred
    Update
    821
    22.3 Recovery Techniques Based on Immediate Update

    823

    xxv

    xxvi

    Contents

    22.4 Shadow Paging
    826
    22.5 The ARIES Recovery Algorithm
    827
    22.6 Recovery in Multidatabase Systems
    831
    22.7 Database Backup and Recovery from Catastrophic Failures
    22.8 Summary
    833
    Review Questions
    834
    Exercises
    835
    Selected Bibliography
    838

    832

    10

    ■ part
    Distributed Databases, NOSQL Systems,
    and Big Data ■
    chapter 23 Distributed Database Concepts

    841

    23.1 Distributed Database Concepts
    842
    23.2 Data Fragmentation, Replication, and Allocation Techniques for
    Distributed Database Design
    847
    23.3 Overview of Concurrency Control and Recovery in Distributed
    Databases
    854
    23.4 Overview of Transaction Management in Distributed Databases
    857
    23.5 Query Processing and Optimization in Distributed Databases
    859
    23.6 Types of Distributed Database Systems
    865
    23.7 Distributed Database Architectures
    868
    23.8 Distributed Catalog Management
    875
    23.9 Summary
    876
    Review Questions
    877
    Exercises
    878
    Selected Bibliography
    880

    chapter 24 NOSQL Databases and Big Data Storage
    Systems

    883

    24.1 Introduction to NOSQL Systems
    884
    24.2 The CAP Theorem
    888
    24.3 Document-Based NOSQL Systems and MongoDB
    24.4 NOSQL Key-Value Stores
    895
    24.5 Column-Based or Wide Column NOSQL Systems
    24.6 NOSQL Graph Databases and Neo4j
    903
    24.7 Summary
    909
    Review Questions
    909
    Selected Bibliography
    910

    890
    900

    Contents

    chapter 25 Big Data Technologies Based on MapReduce
    and Hadoop

    911

    25.1 What Is Big Data?
    914
    25.2 Introduction to MapReduce and Hadoop
    25.3 Hadoop Distributed File System (HDFS)
    25.4 MapReduce: Additional Details
    926
    25.5 Hadoop v2 alias YARN
    936
    25.6 General Discussion
    944
    25.7 Summary
    953
    Review Questions
    954
    Selected Bibliography
    956

    916
    921

    11

    ■ part
    Advanced Database Models, Systems, and
    Applications ■
    chapter 26 Enhanced Data Models: Introduction to Active,
    Temporal, Spatial, Multimedia, and Deductive
    Databases 961
    26.1 Active Database Concepts and Triggers
    963
    26.2 Temporal Database Concepts
    974
    26.3 Spatial Database Concepts
    987
    26.4 Multimedia Database Concepts
    994
    26.5 Introduction to Deductive Databases
    999
    26.6 Summary
    1012
    Review Questions
    1014
    Exercises
    1015
    Selected Bibliography
    1018

    chapter 27 Introduction to Information Retrieval
    and Web Search

    1021

    27.1 Information Retrieval (IR) Concepts
    1022
    27.2 Retrieval Models
    1029
    27.3 Types of Queries in IR Systems
    1035
    27.4 Text Preprocessing
    1037
    27.5 Inverted Indexing
    1040
    27.6 Evaluation Measures of Search Relevance
    1044
    27.7 Web Search and Analysis
    1047

    xxvii

    xxviii

    Contents

    27.8 Trends in Information Retrieval
    27.9 Summary
    1063
    Review Questions
    1064
    Selected Bibliography
    1066

    1057

    chapter 28 Data Mining Concepts

    1069

    28.1 Overview of Data Mining Technology
    1070
    28.2 Association Rules
    1073
    28.3 Classification
    1085
    28.4 Clustering
    1088
    28.5 Approaches to Other Data Mining Problems
    1091
    28.6 Applications of Data Mining
    1094
    28.7 Commercial Data Mining Tools
    1094
    28.8 Summary
    1097
    Review Questions
    1097
    Exercises
    1098
    Selected Bibliography
    1099

    chapter 29 Overview of Data Warehousing
    and OLAP

    1101

    29.1 Introduction, Definitions, and Terminology
    1102
    29.2 Characteristics of Data Warehouses
    1103
    29.3 Data Modeling for Data Warehouses
    1105
    29.4 Building a Data Warehouse
    1111
    29.5 Typical Functionality of a Data Warehouse
    1114
    29.6 Data Warehouse versus Views
    1115
    29.7 Difficulties of Implementing Data Warehouses
    1116
    29.8 Summary
    1117
    Review Questions
    1117
    Selected Bibliography
    1118

    12

    ■ part
    Additional Database Topics: Security ■
    chapter 30 Database Security

    1121

    30.1 Introduction to Database Security Issues
    1122
    30.2 Discretionary Access Control Based on Granting and Revoking
    Privileges
    1129
    30.3 Mandatory Access Control and Role-Based Access Control for
    Multilevel Security
    1134

    Contents

    30.4 SQL Injection
    1143
    30.5 Introduction to Statistical Database Security
    1146
    30.6 Introduction to Flow Control
    1147
    30.7 Encryption and Public Key Infrastructures
    1149
    30.8 Privacy Issues and Preservation
    1153
    30.9 Challenges to Maintaining Database Security
    1154
    30.10 Oracle Label-Based Security
    1155
    30.11 Summary
    1158
    Review Questions
    1159
    Exercises
    1160
    Selected Bibliography
    1161

    appendix A Alternative Diagrammatic Notations for ER
    Models

    1163

    appendix B Parameters of Disks

    1167

    appendix C Overview of the QBE Language

    1171

    C.1 Basic Retrievals in QBE
    1171
    C.2 Grouping, Aggregation, and Database Modification in QBE

    appendix

    D Overview of the Hierarchical Data Model
    (located on the Companion Website at

    appendix

    E Overview of the Network Data Model
    (located on the Companion Website at

    Selected Bibliography
    Index

    1215

    1179

    1175

    xxix

    About the Authors
    Ramez Elmasri is a professor and the associate chairperson of the Department of
    Computer Science and Engineering at the University of Texas at Arlington. He has
    over 140 refereed research publications, and has supervised 16 PhD students and
    over 100 MS students. His research has covered many areas of database management and big data, including conceptual modeling and data integration, query
    languages and indexing techniques, temporal and spatio-temporal databases, bioinformatics databases, data collection from sensor networks, and mining/analysis
    of spatial and spatio-temporal data. He has worked as a consultant to various companies, including Digital, Honeywell, Hewlett Packard, and Action Technologies,
    as well as consulting with law firms on patents. He was the Program Chair of the
    1993 International Conference on Conceptual Modeling (ER conference) and program vice-chair of the 1994 IEEE International Conference on Data Engineering.
    He has served on the ER conference steering committee and has been on the program committees of many conferences. He has given several tutorials at the VLDB,
    ICDE, and ER conferences. He also co-authored the book “Operating Systems: A
    Spiral Approach” (McGraw-Hill, 2009) with Gil Carrick and David Levine. Elmasri
    is a recipient of the UTA College of Engineering Outstanding Teaching Award in
    1999. He holds a BS degree in Engineering from Alexandria University, and MS
    and PhD degrees in Computer Science from Stanford University.
    Shamkant B. Navathe is a professor and the founder of the database research group
    at the College of Computing, Georgia Institute of Technology, Atlanta. He has
    worked with IBM and Siemens in their research divisions and has been a consultant
    to various companies including Digital, Computer Corporation of America,
    Hewlett Packard, Equifax, and Persistent Systems. He was the General Co-chairman
    of the 1996 International VLDB (Very Large Data Base) conference in Bombay,
    India. He was also program co-chair of ACM SIGMOD 1985 International Conference and General Co-chair of the IFIP WG 2.6 Data Semantics Workshop in 1995.
    He has served on the VLDB foundation and has been on the steering committees of
    several conferences. He has been an associate editor of a number of journals
    including ACM Computing Surveys, and IEEE Transactions on Knowledge and
    Data Engineering. He also co-authored the book “Conceptual Design: An Entity
    Relationship Approach” (Addison Wesley, 1992) with Carlo Batini and Stefano
    Ceri. Navathe is a fellow of the Association for Computing Machinery (ACM) and
    recipient of the IEEE TCDE Computer Science, Engineering and Education Impact
    award in 2015. Navathe holds a PhD from the University of Michigan and has over
    150 refereed publications in journals and conferences.

    xxx

    part

    1

    Introduction
    to Databases

    This page intentionally left blank

    chapter

    1

    Databases and
    Database Users

    D

    atabases and database systems are an essential
    component of life in modern society: most of us
    encounter several activities every day that involve some interaction with a database.
    For example, if we go to the bank to deposit or withdraw funds, if we make a hotel
    or airline reservation, if we access a computerized library catalog to search for a
    bibliographic item, or if we purchase something online—such as a book, toy, or
    computer—chances are that our activities will involve someone or some computer
    program accessing a database. Even purchasing items at a supermarket often automatically updates the database that holds the inventory of grocery items.
    These interactions are examples of what we may call traditional database
    applications, in which most of the information that is stored and accessed is either
    textual or numeric. In the past few years, advances in technology have led to exciting
    new applications of database systems. The proliferation of social media Web sites,
    such as Facebook, Twitter, and Flickr, among many others, has required the creation of huge databases that store nontraditional data, such as posts, tweets,
    images, and video clips. New types of database systems, often referred to as big data
    storage systems, or NOSQL systems, have been created to manage data for social
    media applications. These types of systems are also used by companies such as
    Google, Amazon, and Yahoo, to manage the data required in their Web search
    engines, as well as to provide cloud storage, whereby users are provided with storage capabilities on the Web for managing all types of data including documents,
    programs, images, videos and emails. We will give an overview of these new types
    of database systems in Chapter 24.
    We now mention some other applications of databases. The wide availability of
    photo and video technology on cellphones and other devices has made it possible to
    3

    4

    Chapter 1 Databases and Database Users

    store images, audio clips, and video streams digitally. These types of files are becoming an important component of multimedia databases. Geographic information
    systems (GISs) can store and analyze maps, weather data, and satellite images.
    Data warehouses and online analytical processing (OLAP) systems are used in
    many companies to extract and analyze useful business information from very large
    databases to support decision making. Real-time and active database technology
    is used to control industrial and manufacturing processes. And database search
    techniques are being applied to the World Wide Web to improve the search for
    information that is needed by users browsing the Internet.
    To understand the fundamentals of database technology, however, we must start
    from the basics of traditional database applications. In Section 1.1 we start by defining a database, and then we explain other basic terms. In Section 1.2, we provide a
    simple UNIVERSITY database example to illustrate our discussion. Section 1.3
    describes some of the main characteristics of database systems, and Sections 1.4
    and 1.5 categorize the types of personnel whose jobs involve using and interacting
    with database systems. Sections 1.6, 1.7, and 1.8 offer a more thorough discussion
    of the various capabilities provided by database systems and discuss some typical
    database applications. Section 1.9 summarizes the chapter.
    The reader who desires a quick introduction to database systems can study
    Sections 1.1 through 1.5, then skip or browse through Sections 1.6 through 1.8 and
    go on to Chapter 2.

    1.1 Introduction
    Databases and database technology have had a major impact on the growing use of
    computers. It is fair to say that databases play a critical role in almost all areas where
    computers are used, including business, electronic commerce, social media, engineering, medicine, genetics, law, education, and library science. The word database
    is so commonly used that we must begin by defining what a database is. Our initial
    definition is quite general.
    A database is a collection of related data.1 By data, we mean known facts that can
    be recorded and that have implicit meaning. For example, consider the names,
    telephone numbers, and addresses of the people you know. Nowadays, this data is
    typically stored in mobile phones, which have their own simple database software.
    This data can also be recorded in an indexed address book or stored on a hard
    drive, using a personal computer and software such as Microsoft Access or Excel.
    This collection of related data with an implicit meaning is a database.
    The preceding definition of database is quite general; for example, we may consider
    the collection of words that make up this page of text to be related data and hence to
    1

    We will use the word data as both singular and plural, as is common in database literature; the context
    will determine whether it is singular or plural. In standard English, data is used for plural and datum for
    singular.

    1.1 Introduction

    constitute a database. However, the common use of the term database is usually
    more restricted. A database has the following implicit properties:

    A database represents some aspect of the real world, sometimes called the
    miniworld or the universe of discourse (UoD). Changes to the miniworld
    are reflected in the database.
    ■ A database is a logically coherent collection of data with some inherent
    meaning. A random assortment of data cannot correctly be referred to as a
    database.
    ■ A database is designed, built, and populated with data for a specific purpose.
    It has an intended group of users and some preconceived applications in
    which these users are interested.
    In other words, a database has some source from which data is derived, some degree
    of interaction with events in the real world, and an audience that is actively interested in its contents. The end users of a database may perform business transactions
    (for example, a customer buys a camera) or events may happen (for example, an
    employee has a baby) that cause the information in the database to change. In order
    for a database to be accurate and reliable at all times, it must be a true reflection of
    the miniworld that it represents; therefore, changes must be reflected in the database as soon as possible.
    A database can be of any size and complexity. For example, the list of names and
    addresses referred to earlier may consist of only a few hundred records, each with a
    simple structure. On the other hand, the computerized catalog of a large library
    may contain half a million entries organized under different categories—by primary author’s last name, by subject, by book title—with each category organized
    alphabetically. A database of even greater size and complexity would be maintained
    by a social media company such as Facebook, which has more than a billion users.
    The database has to maintain information on which users are related to one another
    as friends, the postings of each user, which users are allowed to see each posting,
    and a vast amount of other types of information needed for the correct operation of
    their Web site. For such Web sites, a large number of databases are needed to keep
    track of the constantly changing information required by the social media Web site.
    An example of a large commercial database is Amazon.com. It contains data for
    over 60 million active users, and millions of books, CDs, videos, DVDs, games,
    electronics, apparel, and other items. The database occupies over 42 terabytes
    (a terabyte is 1012 bytes worth of storage) and is stored on hundreds of computers
    (called servers). Millions of visitors access Amazon.com each day and use the
    database to make purchases. The database is continually updated as new books
    and other items are added to the inventory, and stock quantities are updated as
    purchases are transacted.
    A database may be generated and maintained manually or it may be computerized. For example, a library card catalog is a database that may be created and
    maintained manually. A computerized database may be created and maintained
    either by a group of application programs written specifically for that task or by a

    5

    6

    Chapter 1 Databases and Database Users

    database management system. Of course, we are only concerned with computerized databases in this text.
    A database management system (DBMS) is a computerized system that enables
    users to create and maintain a database. The DBMS is a general-purpose software
    system that facilitates the processes of defining, constructing, manipulating, and
    sharing databases among various users and applications. Defining a database
    involves specifying the data types, structures, and constraints of the data to be
    stored in the database. The database definition or descriptive information is also
    stored by the DBMS in the form of a database catalog or dictionary; it is called
    meta-data. Constructing the database is the process of storing the data on some
    storage medium that is controlled by the DBMS. Manipulating a database includes
    functions such as querying the database to retrieve specific data, updating the database to reflect changes in the miniworld, and generating reports from the data.
    Sharing a database allows multiple users and programs to access the database
    simultaneously.
    An application program accesses the database by sending queries or requests for
    data to the DBMS. A query2 typically causes some data to be retrieved; a transaction
    may cause some data to be read and some data to be written into the database.
    Other important functions provided by the DBMS include protecting the database
    and maintaining it over a long period of time. Protection includes system protection against hardware or software malfunction (or crashes) and security protection
    against unauthorized or malicious access. A typical large database may have a life
    cycle of many years, so the DBMS must be able to maintain the database system by
    allowing the system to evolve as requirements change over time.
    It is not absolutely necessary to use general-purpose DBMS software to implement
    a computerized database. It is possible to write a customized set of programs to create and maintain the database, in effect creating a special-purpose DBMS software
    for a specific application, such as airlines reservations. In either case—whether we
    use a general-purpose DBMS or not—a considerable amount of complex software
    is deployed. In fact, most DBMSs are very complex software systems.
    To complete our initial definitions, we will call the database and DBMS software
    together a database system. Figure 1.1 illustrates some of the concepts we have
    discussed so far.

    1.2 An Example
    Let us consider a simple example that most readers may be familiar with: a
    UNIVERSITY database for maintaining information concerning students, courses,
    and grades in a university environment. Figure 1.2 shows the database structure
    and a few sample data records. The database is organized as five files, each of which
    2

    The term query, originally meaning a question or an inquiry, is sometimes loosely used for all types of
    interactions with databases, including modifying the data.

    1.2 An Example

    Users/Programmers
    Database
    System
    Application Programs/Queries

    DBMS
    Software

    Software to Process
    Queries/Programs

    Software to Access
    Stored Data

    Stored Database
    Definition
    (Meta-Data)

    Stored Database
    Figure 1.1
    A simplified database
    system environment.

    stores data records of the same type.3 The STUDENT file stores data on each student, the COURSE file stores data on each course, the SECTION file stores data on
    each section of a course, the GRADE_REPORT file stores the grades that students
    receive in the various sections they have completed, and the PREREQUISITE file
    stores the prerequisites of each course.
    To define this database, we must specify the structure of the records of each file by
    specifying the different types of data elements to be stored in each record. In
    Figure 1.2, each STUDENT record includes data to represent the student’s Name,
    Student_number, Class (such as freshman or ‘1’, sophomore or ‘2’, and so forth),
    and Major (such as mathematics or ‘MATH’ and computer science or ‘CS’); each
    COURSE record includes data to represent the Course_name, Course_number,
    Credit_hours, and Department (the department that offers the course), and so
    on. We must also specify a data type for each data element within a record. For
    example, we can specify that Name of STUDENT is a string of alphabetic characters,
    Student_number of STUDENT is an integer, and Grade of GRADE_REPORT is a
    3

    We use the term file informally here. At a conceptual level, a file is a collection of records that may or
    may not be ordered.

    7

    8

    Chapter 1 Databases and Database Users

    STUDENT
    Name

    Student_number

    Class

    Major

    Smith

    17

    1

    CS

    Brown

    8

    2

    CS

    COURSE
    Course_name

    Course_number

    Credit_hours

    Department

    Intro to Computer Science

    CS1310

    4

    CS

    Data Structures

    CS3320

    4

    CS

    Discrete Mathematics

    MATH2410

    3

    MATH

    Database

    CS3380

    3

    CS

    SECTION
    Section_identifier

    Course_number

    Semester

    Year

    Instructor

    85

    MATH2410

    Fall

    07

    King

    92

    CS1310

    Fall

    07

    Anderson

    102

    CS3320

    Spring

    08

    Knuth

    112

    MATH2410

    Fall

    08

    Chang

    119

    CS1310

    Fall

    08

    Anderson

    135

    CS3380

    Fall

    08

    Stone

    GRADE_REPORT
    Student_number

    Section_identifier

    Grade

    17

    112

    B

    17

    119

    C

    8

    85

    A

    8

    92

    A

    8

    102

    B

    8

    135

    A

    PREREQUISITE
    Course_number
    Figure 1.2
    A database that stores
    student and course
    information.

    Prerequisite_number

    CS3380

    CS3320

    CS3380

    MATH2410

    CS3320

    CS1310

    1.2 An Example

    single character from the set {‘A’, ‘B’, ‘C’, ‘D’, ‘F’, ‘I’}. We may also use a coding
    scheme to represent the values of a data item. For example, in Figure 1.2 we represent the Class of a STUDENT as 1 for freshman, 2 for sophomore, 3 for junior,
    4 for senior, and 5 for graduate student.
    To construct the UNIVERSITY database, we store data to represent each student,
    course, section, grade report, and prerequisite as a record in the appropriate file.
    Notice that records in the various files may be related. For example, the record for
    Smith in the STUDENT file is related to two records in the GRADE_REPORT file that
    specify Smith’s grades in two sections. Similarly, each record in the PREREQUISITE
    file relates two course records: one representing the course and the other representing the prerequisite. Most medium-size and large databases include many types of
    records and have many relationships among the records.
    Database manipulation involves querying and updating. Examples of queries are as
    follows:

    Retrieve the transcript—a list of all courses and grades—of ‘Smith’
    List the names of students who took the section of the ‘Database’ course
    offered in fall 2008 and their grades in that section
    ■ List the prerequisites of the ‘Database’ course

    Examples of updates include the following:

    Change the class of ‘Smith’ to sophomore
    ■ Create a new section for the ‘Database’ course for this semester
    ■ Enter a grade of ‘A’ for ‘Smith’ in the ‘Database’ section of last semester
    These informal queries and updates must be specified precisely in the query language of the DBMS before they can be processed.
    At this stage, it is useful to describe the database as part of a larger undertaking
    known as an information system within an organization. The Information Technology (IT) department within an organization designs and maintains an information system consisting of various computers, storage systems, application software,
    and databases. Design of a new application for an existing database or design of a
    brand new database starts off with a phase called requirements specification and
    analysis. These requirements are documented in detail and transformed into a
    conceptual design that can be represented and manipulated using some computerized tools so that it can be easily maintained, modified, and transformed into a
    database implementation. (We will introduce a model called the Entity-Relationship model in Chapter 3 that is used for this purpose.) The design is then translated
    to a logical design that can be expressed in a data model implemented in a commercial DBMS. (Various types of DBMSs are discussed throughout the text, with an
    emphasis on relational DBMSs in Chapters 5 through 9.)
    The final stage is physical design, during which further specifications are provided for
    storing and accessing the database. The database design is implemented, populated
    with actual data, and continuously maintained to reflect the state of the miniworld.

    9

    10

    Chapter 1 Databases and Database Users

    1.3 Characteristics of the Database Approach
    A number of characteristics distinguish the database approach from the much
    older approach of writing customized programs to access data stored in files. In
    traditional file processing, each user defines and implements the files needed for a
    specific software application as part of programming the application. For example,
    one user, the grade reporting office, may keep files on students and their grades.
    Programs to print a student’s transcript and to enter new grades are implemented
    as part of the application. A second user, the accounting office, may keep track of
    students’ fees and their payments. Although both users are interested in data about
    students, each user maintains separate files—and programs to manipulate these
    files—because each requires some data not available from the other user’s files.
    This redundancy in defining and storing data results in wasted storage space and
    in redundant efforts to maintain common up-to-date data.
    In the database approach, a single repository maintains data that is defined once
    and then accessed by various users repeatedly through queries, transactions, and
    application programs. The main characteristics of the database approach versus the
    file-processing approach are the following:

    Self-describing nature of a database system
    ■ Insulation between programs and data, and data abstraction
    ■ Support of multiple views of the data
    ■ Sharing of data and multiuser transaction processing
    We describe each of these characteristics in a separate section. We will discuss additional characteristics of database systems in Sections 1.6 through 1.8.

    1.3.1 Self-Describing Nature of a Database System
    A fundamental characteristic of the database approach is that the database system
    contains not only the database itself but also a complete definition or description of
    the database structure and constraints. This definition is stored in the DBMS catalog, which contains information such as the structure of each file, the type and storage format of each data item, and various constraints on the data. The information
    stored in the catalog is called meta-data, and it describes the structure of the primary database (Figure 1.1). It is important to note that some newer types of database systems, known as NOSQL systems, do not require meta-data. Rather the data
    is stored as self-describing data that includes the data item names and data values
    together in one structure (see Chapter 24).
    The catalog is used by the DBMS software and also by database users who need
    information about the database structure. A general-purpose DBMS software
    package is not written for a specific database application. Therefore, it must refer
    to the catalog to know the structure of the files in a specific database, such as the
    type and format of data it will access. The DBMS software must work equally well
    with any number of database applications—for example, a university database, a

    1.3 Characteristics of the Database Approach

    11

    banking database, or a company database—as long as the database definition is
    stored in the catalog.
    In traditional file processing, data definition is typically part of the application programs themselves. Hence, these programs are constrained to work with only one
    specific database, whose structure is declared in the application programs. For
    example, an application program written in C++ may have struct or class declarations. Whereas file-processing software can access only specific databases, DBMS
    software can access diverse databases by extracting the database definitions from
    the catalog and using these definitions.
    For the example shown in Figure 1.2, the DBMS catalog will store the definitions of
    all the files shown. Figure 1.3 shows some entries in a database catalog. Whenever a
    request is made to access, say, the Name of a STUDENT record, the DBMS software
    refers to the catalog to determine the structure of the STUDENT file and the position
    and size of the Name data item within a STUDENT record. By contrast, in a typical
    file-processing application, the file structure and, in the extreme case, the exact
    location of Name within a STUDENT record are already coded within each program
    that accesses this data item.

    Figure 1.3
    An example of a
    database catalog for
    the database in
    Figure 1.2.

    RELATIONS
    Relation_name

    No_of_columns

    STUDENT

    4

    COURSE

    4

    SECTION

    5

    GRADE_REPORT

    3

    PREREQUISITE

    2

    COLUMNS
    Column_name

    Data_type

    Belongs_to_relation

    Name

    Character (30)

    STUDENT

    Student_number

    Character (4)

    STUDENT

    Class

    Integer (1)

    STUDENT

    Major

    Major_type

    STUDENT

    Course_name

    Character (10)

    COURSE

    Course_number

    XXXXNNNN

    COURSE

    ….

    ….

    …..

    ….

    ….

    …..

    ….

    ….

    …..

    Prerequisite_number

    XXXXNNNN

    PREREQUISITE

    Note: Major_type is defined as an enumerated type with all known majors.
    XXXXNNNN is used to define a type with four alphabetic characters followed by four numeric digits.

    12

    Chapter 1 Databases and Database Users

    1.3.2 Insulation between Programs and Data,
    and Data Abstraction
    In traditional file processing, the structure of data files is embedded in the application programs, so any changes to the structure of a file may require changing all
    programs that access that file. By contrast, DBMS access programs do not require
    such changes in most cases. The structure of data files is stored in the DBMS catalog separately from the access programs. We call this property program-data
    independence.
    For example, a file access program may be written in such a way that it can access
    only STUDENT records of the structure shown in Figure 1.4. If we want to add
    another piece of data to each STUDENT record, say the Birth_date, such a program
    will no longer work and must be changed. By contrast, in a DBMS environment, we
    only need to change the description of STUDENT records in the catalog (Figure 1.3)
    to reflect the inclusion of the new data item Birth_date; no programs are changed.
    The next time a DBMS program refers to the catalog, the new structure of
    STUDENT records will be accessed and used.
    In some types of database systems, such as object-oriented and object-relational
    systems (see Chapter 12), users can define operations on data as part of the database
    definitions. An operation (also called a function or method) is specified in two
    parts. The interface (or signature) of an operation includes the operation name and
    the data types of its arguments (or parameters). The implementation (or method) of
    the operation is specified separately and can be changed without affecting the interface. User application programs can operate on the data by invoking these operations through their names and arguments, regardless of how the operations are
    implemented. This may be termed program-operation independence.
    The characteristic that allows program-data independence and program-operation
    independence is called data abstraction. A DBMS provides users with a conceptual
    representation of data that does not include many of the details of how the data is
    stored or how the operations are implemented. Informally, a data model is a type of
    data abstraction that is used to provide this conceptual representation. The data
    model uses logical concepts, such as objects, their properties, and their interrelationships, that may be easier for most users to understand than computer storage
    concepts. Hence, the data model hides storage and implementation details that are
    not of interest to most database users.
    Looking at the example in Figures 1.2 and 1.3, the internal implementation of the
    STUDENT file may be defined by its record length—the number of characters
    (bytes) in each record—and each data item may be specified by its starting byte
    within a record and its length in bytes. The STUDENT record would thus be represented as shown in Figure 1.4. But a typical database user is not concerned with the
    location of each data item within a record or its length; rather, the user is concerned
    that when a reference is made to Name of STUDENT, the correct value is returned.
    A conceptual representation of the STUDENT records is shown in Figure 1.2. Many
    other details of file storage organization—such as the access paths specified on a

    1.3 Characteristics of the Database Approach

    Starting Position in Record

    Length in Characters (bytes)

    Name

    Data Item Name

    1

    30

    Student_number

    31

    4

    Class

    35

    1

    Major

    36

    4

    13

    Figure 1.4
    Internal storage format
    for a STUDENT record,
    based on the database
    catalog in Figure 1.3.

    file—can be hidden from database users by the DBMS; we discuss storage details in
    Chapters 16 and 17.
    In the database approach, the detailed structure and organization of each file are
    stored in the catalog. Database users and application programs refer to the conceptual representation of the files, and the DBMS extracts the details of file storage
    from the catalog when these are needed by the DBMS file access modules. Many
    data models can be used to provide this data abstraction to database users. A major
    part of this text is devoted to presenting various data models and the concepts they
    use to abstract the representation of data.
    In object-oriented and object-relational databases, the abstraction process includes
    not only the data structure but also the operations on the data. These operations
    provide an abstraction of miniworld activities commonly understood by the users.
    For example, an operation CALCULATE_GPA can be applied to a STUDENT object
    to calculate the grade point average. Such operations can be invoked by the user
    queries or application programs without having to know the details of how the
    operations are implemented.

    1.3.3 Support of Multiple Views of the Data
    A database typically has many types of users, each of whom may require a different
    perspective or view of the database. A view may be a subset of the database or it may
    contain virtual data that is derived from the database files but is not explicitly stored.
    Some users may not need to be aware of whether the data they refer to is stored or
    derived. A multiuser DBMS whose users have a variety of distinct applications must
    provide facilities for defining multiple views. For example, one user of the database
    of Figure 1.2 may be interested only in accessing and printing the transcript of each
    student; the view for this user is shown in Figure 1.5(a). A second user, who is interested only in checking that students have taken all the prerequisites of each course
    for which the student registers, may require the view shown in Figure 1.5(b).

    1.3.4 Sharing of Data and Multiuser Transaction Processing
    A multiuser DBMS, as its name implies, must allow multiple users to access the
    database at the same time. This is essential if data for multiple applications is to be
    integrated and maintained in a single database. The DBMS must include concurrency
    control software to ensure that several users trying to update the same data

    14

    Chapter 1 Databases and Database Users

    TRANSCRIPT
    Student_name
    Smith

    Brown
    (a)

    Student_transcript
    Course_number

    Grade

    Semester

    Year

    Section_id

    CS1310

    C

    Fall

    08

    119

    MATH2410

    B

    Fall

    08

    112

    MATH2410

    A

    Fall

    07

    85

    CS1310

    A

    Fall

    07

    92

    CS3320

    B

    Spring

    08

    102

    CS3380

    A

    Fall

    08

    135

    COURSE_PREREQUISITES
    Course_name

    (b)

    Course_number

    Database

    CS3380

    Data Structures

    CS3320

    Prerequisites
    CS3320
    MATH2410
    CS1310

    Figure 1.5
    Two views derived from the database in Figure 1.2. (a) The TRANSCRIPT view.
    (b) The COURSE_PREREQUISITES view.

    do so in a controlled manner so that the result of the updates is correct. For example, when several reservation agents try to assign a seat on an airline flight, the
    DBMS should ensure that each seat can be accessed by only one agent at a time for
    assignment to a passenger. These types of applications are generally called online
    transaction processing (OLTP) applications. A fundamental role of multiuser
    DBMS software is to ensure that concurrent transactions operate correctly and
    efficiently.
    The concept of a transaction has become central to many database applications. A
    transaction is an executing program or process that includes one or more database
    accesses, such as reading or updating of database records. Each transaction is supposed to execute a logically correct database access if executed in its entirety without interference from other transactions. The DBMS must enforce several
    transaction properties. The isolation property ensures that each transaction
    appears to execute in isolation from other transactions, even though hundreds of
    transactions may be executing concurrently. The atomicity property ensures that
    either all the database operations in a transaction are executed or none are. We discuss transactions in detail in Part 9.
    The preceding characteristics are important in distinguishing a DBMS from traditional file-processing software. In Section 1.6 we discuss additional features that
    characterize a DBMS. First, however, we categorize the different types of people
    who work in a database system environment.

    1.4 Actors on the Scene

    1.4 Actors on the Scene
    For a small personal database, such as the list of addresses discussed in Section 1.1,
    one person typically defines, constructs, and manipulates the database, and there is
    no sharing. However, in large organizations, many people are involved in the
    design, use, and maintenance of a large database with hundreds or thousands of
    users. In this section we identify the people whose jobs involve the day-to-day use
    of a large database; we call them the actors on the scene. In Section 1.5 we consider
    people who may be called workers behind the scene—those who work to maintain
    the database system environment but who are not actively interested in the database contents as part of their daily job.

    1.4.1 Database Administrators
    In any organization where many people use the same resources, there is a need for
    a chief administrator to oversee and manage these resources. In a database environment, the primary resource is the database itself, and the secondary resource is the
    DBMS and related software. Administering these resources is the responsibility of
    the database administrator (DBA). The DBA is responsible for authorizing access
    to the database, coordinating and monitoring its use, and acquiring software and
    hardware resources as needed. The DBA is accountable for problems such as security breaches and poor system response time. In large organizations, the DBA is
    assisted by a staff that carries out these functions.

    1.4.2 Database Designers
    Database designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. These
    tasks are mostly undertaken before the database is actually implemented and populated with data. It is the responsibility of database designers to communicate with
    all prospective database users in order to understand their requirements and to create a design that meets these requirements. In many cases, the designers are on the
    staff of the DBA and may be assigned other staff responsibilities after the database
    design is completed. Database designers typically interact with each potential group
    of users and develop views of the database that meet the data and processing
    requirements of these groups. Each view is then analyzed and integrated with the
    views of other user groups. The final database design must be capable of supporting
    the requirements of all user groups.

    1.4.3 End Users
    End users are the people whose jobs require access to the database for querying,
    updating, and generating reports; the database primarily exists for their use. There
    are several categories of end users:

    Casual end users occasionally access the database, but they may need different information each time. They use a sophisticated database query interface

    15

    16

    Chapter 1 Databases and Database Users

    to specify their requests and are typically middle- or high-level managers or
    other occasional browsers.
    ■ Naive or parametric end users make up a sizable portion of database
    end users. Their main job function revolves around constantly querying
    and updating the database, using standard types of queries and updates—
    called canned transactions—that have been carefully programmed and
    tested. Many of these tasks are now available as mobile apps for use with
    mobile devices. The tasks that such users perform are varied. A few
    examples are:
    Bank customers and tellers check account balances and post withdrawals
    and deposits.
    Reservation agents or customers for airlines, hotels, and car rental companies check availability for a given request and make reservations.
    Employees at receiving stations for shipping companies enter package
    identifications via bar codes and descriptive information through buttons
    to update a central database of received and in-transit packages.
    Social media users post and read items on social media Web sites.
    ■ Sophisticated end users include engineers, scientists, business analysts, and
    others who thoroughly familiarize themselves with the facilities of the DBMS
    in order to implement their own applications to meet their complex requirements.
    ■ Standalone users maintain personal databases by using ready-made program packages that provide easy-to-use menu-based or graphics-based
    interfaces. An example is the user of a financial software package that stores
    a variety of personal financial data.
    A typical DBMS provides multiple facilities to access a database. Naive end users
    need to learn very little about the facilities provided by the DBMS; they simply have
    to understand the user interfaces of the mobile apps or standard transactions
    designed and implemented for their use. Casual users learn only a few facilities that
    they may use repeatedly. Sophisticated users try to learn most of the DBMS facilities
    in order to achieve their complex requirements. Standalone users typically become
    very proficient in using a specific software package.

    1.4.4 System Analysts and Application Programmers
    (Software Engineers)
    System analysts determine the requirements of end users, especially naive and
    parametric end users, and develop specifications for standard canned transactions
    that meet these requirements. Application programmers implement these specifications as programs; then they test, debug, document, and maintain these canned
    transactions. Such analysts and programmers—commonly referred to as software
    developers or software engineers—should be familiar with the full range of capabilities provided by the DBMS to accomplish their tasks.

    1.6 Advantages of Using the DBMS Approach

    1.5 Workers behind the Scene
    In addition to those who design, use, and administer a database, others are associated with the design, development, and operation of the DBMS software and system
    environment. These persons are typically not interested in the database content
    itself. We call them the workers behind the scene, and they include the following
    categories:

    DBMS system designers and implementers design and implement the
    DBMS modules and interfaces as a software package. A DBMS is a very
    complex software system that consists of many components, or modules,
    including modules for implementing the catalog, query language processing, interface processing, accessing and buffering data, controlling concurrency, and handling data recovery and security. The DBMS must interface
    with other system software, such as the operating system and compilers for
    various programming languages.
    ■ Tool developers design and implement tools—the software packages that
    facilitate database modeling and design, database system design, and
    improved performance. Tools are optional packages that are often purchased separately. They include packages for database design, performance
    monitoring, natural language or graphical interfaces, prototyping, simulation, and test data generation. In many cases, independent software vendors
    develop and market these tools.
    ■ Operators and maintenance personnel (system administration personnel)
    are responsible for the actual running and maintenance of the hardware and
    software environment for the database system.
    Although these categories of workers behind the scene are instrumental in making
    the database system available to end users, they typically do not use the database
    contents for their own purposes.

    1.6 Advantages of Using the DBMS Approach
    In this section we discuss some additional advantages of using a DBMS and the
    capabilities that a good DBMS should possess. These capabilities are in addition to
    the four main characteristics discussed in Section 1.3. The DBA must utilize these
    capabilities to accomplish a variety of objectives related to the design, administration, and use of a large multiuser database.

    1.6.1 Controlling Redundancy
    In traditional software development utilizing file processing, every user group
    maintains its own files for handling its data-processing applications. For example,
    consider the UNIVERSITY database example of Section 1.2; here, two groups of
    users might be the course registration personnel and the accounting office. In the
    traditional approach, each group independently keeps files on students. The

    17

    18

    Chapter 1 Databases and Database Users

    accounting office keeps data on registration and related billing information,
    whereas the registration office keeps track of student courses and grades. Other
    groups may further duplicate some or all of the same data in their own files.
    This redundancy in storing the same data multiple times leads to several problems.
    First, there is the need to perform a single logical update—such as entering data on
    a new student—multiple times: once for each file where student data is recorded.
    This leads to duplication of effort. Second, storage space is wasted when the same
    data is stored repeatedly, and this problem may be serious for large databases.
    Third, files that represent the same data may become inconsistent. This may happen
    because an update is applied to some of the files but not to others. Even if an
    update—such as adding a new student—is applied to all the appropriate files, the
    data concerning the student may still be inconsistent because the updates are applied
    independently by each user group. For example, one user group may enter a student’s birth date erroneously as ‘JAN-19-1988’, whereas the other user groups may
    enter the correct value of ‘JAN-29-1988’.
    In the database approach, the views of different user groups are integrated during
    database design. Ideally, we should have a database design that stores each logical
    data item—such as a student’s name or birth date—in only one place in the database. This is known as data normalization, and it ensures consistency and saves
    storage space (data normalization is described in Part 6 of the text).
    However, in practice, it is sometimes necessary to use controlled redundancy to
    improve the performance of queries. For example, we may store Student_name and
    Course_number redundantly in a GRADE_REPORT file (Figure 1.6(a)) because
    whenever we retrieve a GRADE_REPORT record, we want to retrieve the student
    name and course number along with the grade, student number, and section identifier. By placing all the data together, we do not have to search multiple files to collect this data. This is known as denormalization. In such cases, the DBMS should

    Figure 1.6
    Redundant storage
    of Student_name
    and Course_name in
    GRADE_REPORT.
    (a) Consistent data.
    (b) Inconsistent
    record.

    GRADE_REPORT

    (a)

    Student_number

    Student_name

    Section_identifier Course_number

    Grade

    17

    Smith

    112

    MATH2410

    B

    17

    Smith

    119

    CS1310

    C

    8

    Brown

    85

    MATH2410

    A

    8

    Brown

    92

    CS1310

    A

    8

    Brown

    102

    CS3320

    B

    8

    Brown

    135

    CS3380

    A

    GRADE_REPORT

    (b)

    Student_number

    Student_name

    17

    Brown

    Section_identifier Course_number
    112

    MATH2410

    Grade
    B

    1.6 Advantages of Using the DBMS Approach

    have the capability to control this redundancy in order to prohibit inconsistencies among the files. This may be done by automatically checking that the
    Student_name–Student_number values in any GRADE_REPORT record in Figure 1.6(a) match one of the Name–Student_number values of a STUDENT record (Figure 1.2). Similarly, the Section_identifier–Course_number values in GRADE_REPORT
    can be checked against SECTION records. Such checks can be specified to the DBMS
    during database design and automatically enforced by the DBMS whenever the
    GRADE_REPORT file is updated. Figure 1.6(b) shows a GRADE_REPORT record that
    is inconsistent with the STUDENT file in Figure 1.2; this kind of error may be entered
    if the redundancy is not controlled. Can you tell which part is inconsistent?

    1.6.2 Restricting Unauthorized Access
    When multiple users share a large database, it is likely that most users will not be
    authorized to access all information in the database. For example, financial data
    such as salaries and bonuses is often considered confidential, and only authorized persons are allowed to access such data. In addition, some users may only
    be permitted to retrieve data, whereas others are allowed to retrieve and update.
    Hence, the type of access operation—retrieval or update—must also be controlled. Typically, users or user groups are given account numbers protected by
    passwords, which they can use to gain access to the database. A DBMS should
    provide a security and authorization subsystem, which the DBA uses to create
    accounts and to specify account restrictions. Then, the DBMS should enforce
    these restrictions automatically. Notice that we can apply similar controls to the
    DBMS software. For example, only the DBA’s staff may be allowed to use certain
    privileged software, such as the software for creating new accounts. Similarly,
    parametric users may be allowed to access the database only through the predefined apps or canned transactions developed for their use. We discuss database security and authorization in Chapter 30.

    1.6.3 Providing Persistent Storage for Program Objects
    Databases can be used to provide persistent storage for program objects and data
    structures. This is one of the main reasons for object-oriented database systems
    (see Chapter 12). Programming languages typically have complex data structures,
    such as structs or class definitions in C++ or Java. The values of program variables
    or objects are discarded once a program terminates, unless the programmer explicitly stores them in permanent files, which often involves converting these complex
    structures into a format suitable for file storage. When the need arises to read this
    data once more, the programmer must convert from the file format to the program
    variable or object structure. Object-oriented database systems are compatible with
    programming languages such as C++ and Java, and the DBMS software automatically performs any necessary conversions. Hence, a complex object in C++
    can be stored permanently in an object-oriented DBMS. Such an object is said to
    be persistent, since it survives the termination of program execution and can
    later be directly retrieved by another program.

    19

    20

    Chapter 1 Databases and Database Users

    The persistent storage of program objects and data structures is an important function of database systems. Traditional database systems often suffered from the socalled impedance mismatch problem, since the data structures provided by the
    DBMS were incompatible with the programming language’s data structures.
    Object-oriented database systems typically offer data structure compatibility with
    one or more object-oriented programming languages.

    1.6.4 Providing Storage Structures and Search
    Techniques for Efficient Query Processing
    Database systems must provide capabilities for efficiently executing queries and
    updates. Because the database is typically stored on disk, the DBMS must provide
    specialized data structures and search techniques to speed up disk search for the
    desired records. Auxiliary files called indexes are often used for this purpose.
    Indexes are typically based on tree data structures or hash data structures that are
    suitably modified for disk search. In order to process the database records needed
    by a particular query, those records must be copied from disk to main memory.
    Therefore, the DBMS often has a buffering or caching module that maintains parts
    of the database in main memory buffers. In general, the operating system is responsible for disk-to-memory buffering. However, because data buffering is crucial to
    the DBMS performance, most DBMSs do their own data buffering.
    The query processing and optimization module of the DBMS is responsible for
    choosing an efficient query execution plan for each query based on the existing
    storage structures. The choice of which indexes to create and maintain is part of
    physical database design and tuning, which is one of the responsibilities of the DBA
    staff. We discuss query processing and optimization in Part 8 of the text.

    1.6.5 Providing Backup and Recovery
    A DBMS must provide facilities for recovering from hardware or software failures.
    The backup and recovery subsystem of the DBMS is responsible for recovery. For
    example, if the computer system fails in the middle of a complex update transaction, the recovery subsystem is responsible for making sure that the database is
    restored to the state it was in before the transaction started executing. Disk backup
    is also necessary in case of a catastrophic disk failure. We discuss recovery and
    backup in Chapter 22.

    1.6.6 Providing Multiple User Interfaces
    Because many types of users with varying levels of technical knowledge use a database, a DBMS should provide a variety of user interfaces. These include apps for
    mobile users, query languages for casual users, programming language interfaces
    for application programmers, forms and command codes for parametric users,
    and menu-driven interfaces and natural language interfaces for standalone users.
    Both forms-style interfaces and menu-driven interfaces are commonly known as

    1.6 Advantages of Using the DBMS Approach

    graphical user interfaces (GUIs). Many specialized languages and environments
    exist for specifying GUIs. Capabilities for providing Web GUI interfaces to a
    database—or Web-enabling a database—are also quite common.

    1.6.7 Representing Complex Relationships among Data
    A database may include numerous varieties of data that are interrelated in many
    ways. Consider the example shown in Figure 1.2. The record for ‘Brown’ in the
    STUDENT file is related to four records in the GRADE_REPORT file. Similarly,
    each section record is related to one course record and to a number of
    GRADE_REPORT records—one for each student who completed that section. A
    DBMS must have the capability to represent a variety of complex relationships
    among the data, to define new relationships as they arise, and to retrieve and
    update related data easily and efficiently.

    1.6.8 Enforcing Integrity Constraints
    Most database applications have certain integrity constraints that must hold for
    the data. A DBMS should provide capabilities for defining and enforcing these
    constraints. The simplest type of integrity constraint involves specifying a data
    type for each data item. For example, in Figure 1.3, we specified that the value of
    the Class data item within each STUDENT record must be a one-digit integer and
    that the value of Name must be a string of no more than 30 alphabetic characters.
    To restrict the value of Class between 1 and 5 would be an additional constraint
    that is not shown in the current catalog. A more complex type of constraint that
    frequently occurs involves specifying that a record in one file must be related to
    records in other files. For example, in Figure 1.2, we can specify that every section
    record must be related to a course record. This is known as a referential integrity
    constraint. Another type of constraint specifies uniqueness on data item values,
    such as every course record must have a unique value for Course_number. This is
    known as a key or uniqueness constraint. These constraints are derived from the
    meaning or semantics of the data and of the miniworld it represents. It is the
    responsibility of the database designers to identify integrity constraints during
    database design. Some constraints can be specified to the DBMS and automatically
    enforced. Other constraints may have to be checked by update programs or at the
    time of data entry. For typical large applications, it is customary to call such constraints business rules.
    A data item may be entered erroneously and still satisfy the specified integrity constraints. For example, if a student receives a grade of ‘A’ but a grade of ‘C’ is entered
    in the database, the DB…
    Purchase answer to see full
    attachment

    Share This Post

    Email
    WhatsApp
    Facebook
    Twitter
    LinkedIn
    Pinterest
    Reddit

    Order a Similar Paper and get 15% Discount on your First Order

    Related Questions

    Ct. Q.

    Description 1 Research on Diabetic Foot Disease Name of Student Institutional Affiliation Course Date 2 Research on Diabetic Foot Disease Introduction A severe and complex consequence of diabetes mellitus known as diabetic foot disease (DFD) includes infections, foot ulcers, peripheral neuropathy, and peripheral arterial disease. DFD can have profound effects

    Management of technology 325

    Description Please answer it as its written in the file Chapter Ten Organizing for Innovation Organizing for Innovation at Google Google was founded in 1998 by two Stanford Ph.D. students, Sergey Brin and Larry Page, who had developed a formula for rank ordering random search results by relevancy. Their formula

    Management Question

    Description Follow the instructions and the number of words for each question in the assignment file. Similarity and copying and pasting are prohibited. The grade will be zero. Add sources. ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية اإللكترونية‬ Kingdom of Saudi Arabia Ministry of Education Saudi Electronic University College of

    mgt402 Entrepreneurship and small bus

    Description Please solve it as it is written on the file Case 1 Panda Sunglasses How Should a Start-Up Business with a Social Mission Market Its Sunglasses with Bamboo Frames? V incent Ko showed his entrepreneurial potential in high school in Rockville, Maryland, when, as a young hockey player, he

    dis

    Description 2 dis Last week, you completed a personal development plan. For this week’s discussion, please consider how you would use employee development plans as a motivational tool in your organization. Consider how data from personal development plans could be used track organizational compliance to accreditation and credentialing standards. How

    CTA – Engaging the Organization in the Change Process

    Description Critical Thinking Assignment Module 12: Engaging the Organization in the Change Process As we move through this module, consider the ramifications related to unethical conduct that some companies have experienced. Also consider how that conduct could have been avoided by applying some of the lessons learned in this module.

    Algorithms & Data Structures Question

    Description Saudi Electronic University Health Sciences Collage Master of Healthcare Administration HCM 600 Research Project Examining The Long-Term Interventions Effects of Telepsychiatry on Chronic Mental Health Conditions in Saudi Arabia: Systematic Review A Research Project Submitted in Partial Fulfillment of the Requirements for the Degree (MSc of Healthcare Administration) Presented

    373 ASS 60

    Description SEE College of Health Sciences Department of Public Health PHC 241 Fall 2021 Paper assignment rubric Thinking Content Presentation Criteria Proficiency 2 Some Proficiency 1.75 Limited Proficiency 1.50 No Proficiency 1 The purpose and focus are clear and consistent Punctuation, grammar, spelling, and mechanics are appropriate Information and evidence

    Management Question

    Description I have as assignment please make sure there is no plagiarism ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية اإللكترونية‬ Kingdom of Saudi Arabia Ministry of Education Saudi Electronic University College of Administrative and Financial Sciences Assignment-3 MGT324-Public Management Due Date: 30/11/2024 @ 23:59 Course Name: Public Management Course Code:

    Management Question

    Description I have as assignment please make sure there is no plagiarism ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية اإللكترونية‬ Kingdom of Saudi Arabia Ministry of Education Saudi Electronic University College of Administrative and Financial Sciences Assignment 3 Management of Technology (MGT 325) Deadline: 30/11/2024 @ 23:59 Course Name: Management

    Management Question

    Description I have as assignment please make sure there is no plagiarism ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية اإللكترونية‬ Kingdom of Saudi Arabia Ministry of Education Saudi Electronic University College of Administrative and Financial Sciences Assignment 3 Strategic Management (MGT 401) Due Date: 30/11/2024 @ 23:59 Course Name: Strategic

    Management Question

    Description I want to solve the attached assignment, and please follow the instructions described on the main page, and I hope that there will be no similar version of another solution, and I want the solution in detail, not in short ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية اإللكترونية‬ Kingdom

    mng403.badeer

    Description Hello, I hope you pay attention. I want correct and perfect work. I want all the questions to be solved correctly and completely without plagiarism. I emphasize this important point. Any percentage of plagiarism will lead to the cancellation of the work. I want a correct solution with references

    periodic report

    Description I did a product journey map The secret customer A correction plan for a product website An internal campaign to raise awareness about the customer experience and its importance Participated with the research and marketing department to work on a survey to measure the satisfaction of internal employees Trained

    Management Question

    Description ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية اإللكترونية‬ Kingdom of Saudi Arabia Ministry of Education Saudi Electronic University College of Administrative and Financial Sciences Assignment 3 MGT101 (1st Term 2024-2025) Deadline: 30/11/2024 @ 23:59 (To be released to students on BB in Week 10) Course Name: Principles of Management

    Finance Question

    Description I want to solve the attached assignment and please follow the instructions described on the main page of the duty and I hope that there are no copies or similarities and if I want the solution to be short, I want details ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية

    Mgt430 internship

    Description I’m working on my final report and presentation for my internship I need support. All details are attached please follow requirement. The report should be submitted within two weeks after you finish your Co-op training Program. In addition, the report should be approximately 3000 – 4000, single –spaced and

    Turning Around Cote Construction Company Case Study

    Description Read the “Turning Around Cote Construction Company” found at the end of Chapter 9 and follow these steps before answering the case study questions. In order to answer the case study questions you will apply the Change Path Model from Chapter 9 to the Cote Construction Company case. A