Our Services

Get 15% Discount on your First Order

[rank_math_breadcrumb]

Fundamentals of Database IT403

Description

  • You must submit two separate copies (one Word file and one PDF file) using the Assignment Template on
  • Blackboard via the allocated folder. These files must not be in compressed format.

  • It is your responsibility to check and make sure that you have uploaded both the correct files.
  • Zero mark will be given if you try to bypass the SafeAssign (e.g. misspell words, remove spaces between
  • words, hide characters, use different character sets, convert text into image or languages other than English

    or any kind of manipulation).

  • Email submission will not be accepted.
  • You are advised to make your work clear and well-presented. This includes filling your information on the cover
  • page.

  • You must use this template, failing which will result in zero mark.
  • You MUST show all your work, and text must not be converted into an image, unless specified otherwise by
  • the question.

  • Late submission will result in ZERO mark.
  • The work should be your own, copying from students or other resources will result in ZERO mark.
  • Use Times New Roman font for all your answers.
  • Textbook : Ramez Elmasri & Shamkant Navathe (2017), Fundamentals of Database Systems, Global Edition, 7th Edition, Person, ISBN-13: 978-0133970777 | ISBN-10: 0133970779.
  • ‫الجامعة السعودية االلكترونية‬
    ‫الجامعة السعودية االلكترونية‬

    ‫‪26/12/2021‬‬

    College of Computing and Informatics
    Data Science Pre-Master Program
    IT244
    Introduction to Database

    IT244
    Introduction to Database
    Week 1
    Orientation and Introduction

    Contents

    1. Understand the Major Objective of Saudi Vision 2030
    2. Demonstrate the Saudi Digital Library and searching for
    knowledge
    3. Databases and Database Users
    4. Introduction to NOSQL

    Weekly Learning Outcomes

    1. Understand the Major Objective of Saudi Vision 2030.
    2. Demonstrate the Saudi Digital Library and searching for
    knowledge.
    3. Explain the basic concepts of Database.

    Required Reading
    1.
    2.

    Chapter 1: The Complete chapter
    Chapter 24: Introduction to NOSQL
    (Fundamentals of Database Systems, Global Edition, 7th Edition (2017) by Ramez
    Elmasri & Shamkant Navathe)

    Recommended Reading
    Saudi Digital Library:
    Introduction to Database Management Systems:

    This Presentation is mainly dependent on the textbook: Fundamentals of Database Systems, Global Edition, 7th Edition (2017) by Ramez Elmasri & Shamkant Navathe

    • Saudi Vision 2030

    Weekly
    Learning
    Outcomes

    Saudi Vision 2030
    • Under the leadership of the Custodian of the Two Holy Mosques,
    Vision 2030 was launched, a roadmap drawn up by His Royal
    Highness the Crown Prince, to harness the strengths God
    bestowed upon us – our strategic position, investment power and
    place at the center of Arab and Islamic worlds. The full attention
    of the Kingdom, and our Leadership, is on harnessing our
    potential to achieve our ambitions.

    Reference:

    Saudi Vision 2030
    • Vision 2030 Draws on The Nation’s Intrinsic Strengths
    1. Saudi Arabia is the land of the Two Holy Mosques which positions the
    Kingdom at the heart of the Arab and Islamic worlds
    2. Saudi Arabia is using its investment power to create a more diverse
    and sustainable economy
    3. The Kingdom is using its strategic location to build its role as an
    integral driver of international trade and to connect three continents:
    Africa, Asia and Europe

    Reference:

    • Saudi Digital Library

    Weekly
    Learning
    Outcomes

    Saudi Digital Library
    • Saudi Digital Library, is the largest academic gathering of
    information sources in the Arab world, with more than (310،000)
    scientific reference, covering all academic disciplines, and the
    continuous updating of the content in this; thus achieving huge
    accumulation cognitive in the long run. Library has contracted
    with more than 300 global publisher. The library won the award
    for the Arab Federation for Libraries and Information ‘know’ for
    outstanding projects in the Arab world in 2010.

    Reference:

    Saudi Digital Library
    • Advantages:
    • One central management, manages this huge content, and constantly
    updated.
    • Common share for the benefit of, any University would benefit other
    universities that are now available to the other, in any scientific field.
    • Enhance the status of universities when evaluating, for Academic
    Accreditation, and through sources rich, modern, and publish the best
    Global Publishers.
    • Bridging the gap between Saudi universities, where emerging
    universities can get the same service, you get major Saudi universities.

    Reference:

    • Databases and Database Users

    Chapter 1 Outline
    Types of Databases and Database Applications
     Basic Definitions
     Typical DBMS Functionality
     Example of a Database (UNIVERSITY)
     Main Characteristics of the Database Approach
     Types of Database Users
     Advantages of Using the Database Approach
     When Not to Use Databases
     Introduction to NOSQL

    Basic Definitions(1)

    Data
     Known facts that can be recorded and have an implicit meaning.

    Database
     A collection of related data in a DBMS.

    Example: the names, telephone numbers, and addresses of the people you know

    Example: the list of names and addresses, and computerized catalog of a large library

    Defining a database
     Involves specifying the data types, structures, and constraints of the data to be stored in the
    database.
    Meta-data
     The database definition or descriptive information is also stored by the DBMS in the form of
    a database catalog or dictionary.
    Database Management System (DBMS)
     A computerized system that enables users to create and maintain a database. It is a generalpurpose software system that facilitates the processes of defining, constructing,
    manipulating, and sharing databases among various users and applications.
    Database System
     The database and DBMS software together; Sometimes, the application programs and
    interfaces are also included.

    Basic Definitions(2)

    Manipulating a database
     Includes querying the database to retrieve specific data, updating the database, and
    generating reports from the data.
    Sharing a database
     Allows multiple users and programs to access the database simultaneously.
    Application program
     Accesses the database by sending queries or requests for data to the DBMS.
    Query
     A query causes some data to be retrieved from the database.
    Transaction
     May cause some data to be read from and some data to be written into the database.
    Protection
     May includes system protection against hardware or software malfunction (or crashes) and
    security protection against unauthorized or malicious access.
    Maintenance
     A typical large database has a life cycle of many years, so the DBMS must be allowing the
    system to evolve as requirements change over time.

    Simplified database system environment

    Implicit Properties of a Database
    A database represents some aspect of the real world, called
    the miniworld or the universe of discourse (UoD). Changes to
    the miniworld are reflected in the database.
     A database is a logically coherent collection of data with some
    inherent meaning. A random assortment of data cannot
    correctly be referred to as a database.
     A database is designed, built, and populated with data for a
    specific purpose. It has an intended group of users and some
    preconceived applications in which these users are interested.

    Example of a Database UNIVERSITY
    Application(1)

    Mini-world for the example:

    Part of a UNIVERSITY environment.

    Some mini-world entities:

    INSTRUCTORs
     STUDENTs
     DEPARTMENTs
     COURSEs
     SECTIONs (of COURSEs)
    Some mini-world relationships:





    SECTIONs are of specific COURSEs
    STUDENTs take SECTIONs
    COURSEs have prerequisite COURSEs
    INSTRUCTORs teach SECTIONs
    COURSEs are offered by DEPARTMENTs
    STUDENTs major in DEPARTMENTs

    Note: The above entities and relationships are typically expressed in the ENTITYRELATIONSHIP data model

    Example of a Database UNIVERSITY
    Application(2)

    Example of a Database UNIVERSITY
    Application(3)

    Typical DBMS Functionality
    Define a particular database in terms of its data types, structures, and
    constraints
     Construct or Load the initial database contents on a secondary storage
    medium
     Manipulating the database:

    Retrieval: Querying, generating reports
     Modification: Insertions, deletions and updates to its content
     Accessing/changing the database through Web applications

    Processing and Sharing by a set of concurrent users and application
    programs
     Protection or Security measures to prevent unauthorized access
     “Active” processing to take internal actions on data
     Presentation and Visualization of data
     Maintaining the database and associated programs over its lifetime

    Main Characteristics of the Database
    Approach(1)

    Self-describing nature of a database system:
    A DBMS catalog stores the description of a particular database
     The description is called meta-data
     This allows the DBMS software to be integrated with different database
    applications

    Insulation between programs and data:

    Allows changing data structures and data storage organization without having to
    change the DBMS access programs.

    Accomplished through data abstraction
     A data model is used to hide storage details and present the users with a conceptual
    view of the database.

    Programs refer to the data model constructs rather than data storage details Called

    program-data independence.

    Main Characteristics of the Database Approach (2)

    Support of multiple views of the data:

    Each user may see a different view of the database, which
    describes only the data of interest to that user.
     Sharing of data and multi-user transaction processing:

    Allowing a set of user transactions to access and update the
    database concurrently (at the same time).
     Concurrency control within the DBMS guarantees that each
    transaction is correctly executed or aborted
     Recovery subsystem ensures each completed transaction has its
    effect permanently recorded in the database
     OLTP (Online Transaction Processing) is a major part of database
    applications (allows hundreds of concurrent transactions to
    execute per second)

    Example of meta-date in a simplified database
    catalog

    Types of Database Users (Actors on the
    scene)
    Database administrators:
     Responsible for authorizing/controlling access to the database; coordinating
    and monitoring its use; acquiring software and hardware resources; and
    monitoring efficiency of operations. The DBA is accountable for security
    breaches and poor system response time.
     Database Designers:
     Responsible for defining database structure, constraints, and transactions;
    communicate with users to understand their needs.
     End-users: Use the database for queries, reports, and updating the database
    content. Can be categorized into:
     Casual end-users: access database occasionally when needed
     Naïve (or Parametric) end-users: largest section of end-user population.

    Sophisticated end-users:

    Use previously implemented and tested programs (called “canned transactions”) to
    access/update the database. Examples are bank-tellers or hotel reservation clerks or sales clerks.
    These include business analysts, scientists, engineers, etc. Many use tools of software packages
    that work closely with the stored database.

    Stand-alone end-users:

    Mostly maintain personal databases using ready-to-use packaged applications.

    Types of Database Applications

    Traditional Applications:

    Numeric and Textual Databases in Business Applications

    More Recent Applications:
    Multimedia Databases (images, videos, voice, etc.)
     Geographic Information Systems (GIS)
     Data Warehouses
     Real-time and Active Databases
     Many other applications

    Advantages of Using the Database
    Approach
    Controlling redundancy in data storage and in development and
    maintenance efforts.
     Restricting unauthorized access to data.
     Providing persistent storage for program Objects
     Providing Storage Structures (e.g. indexes) for efficient Query
    Processing
     Providing backup and recovery services.
     Providing multiple interfaces to different classes of users.
     Representing complex relationships among data.
     Enforcing integrity constraints on the database.
     Permitting inferencing and actions using rules and triggers
     Allowing multiple “views” of the same data

    Additional Implications of Using the Database
    Approach

    Potential for enforcing standards:

    Reduced application development time:

    Storage structures may evolve to improve performance, or because of
    new requirements.

    Availability of up-to-date information:

    The time needed to add each new application is reduced.

    Flexibility to change data storage structures:

    Crucial for the success of database applications in large organizations.
    Standards refer to data item names, display formats, screens, report
    structures, meta-data, etc.

    Extremely important for on-line transaction systems such as airline, hotel, car
    reservations.

    Economies of scale:

    Wasteful overlap of resources and personnel can be avoided by consolidating
    data and applications across departments.

    Historical Development of Database
    Technology

    Early Database Applications using Hierarchical and Network Systems:

    Relational Model based Systems:

    Relational model was introduced in 1970, and heavily researched and experimented with
    at IBM Research and several universities. Relational DBMS Products emerged in the early
    1980s and now exist on almost all types of computers, from small personal computers to
    large servers.

    Object-oriented and emerging applications:

    Starting in the mid-1960s and continuing through the 1970s and 1980s. Were based on
    three main paradigms: hierarchical systems, network model–based systems, and inverted
    file systems.

    Object Databases were introduced in late 1980s and early 1990s. Their use has not taken
    off much. Many relational DBMSs have incorporated object database concepts, leading
    to a new category called object-relational databases (ORDBs)
    Extended relational systems add further capabilities (e.g. for multimedia data, XML,
    spatial, and other data types)

    Data on the Web and E-commerce Applications:

    Starting in the 1990s, e-commerce emerged as a major application on the Web. The
    critical information on e-commerce Web pages is dynamically extracted data from
    DBMSs, such as flight information, product prices, and product availability.
    The eXtended Markup Language (XML) is one standard for interchanging data among
    various types of databases and Web pages.

    Extending Database Capabilities

    New functionality is being added to DBMSs in the following
    areas:
     Scientific Applications
     XML (eXtensible Markup Language)
     Image Storage and Management
     Audio and Video Data Management
     Data Warehousing and Data Mining
     Spatial Data Management and Geographic Information
    Systems
     Time Series and Historical Data Management
     Collecting and fusing data from distributed sensors

    When not to use a DBMS

    Main inhibitors (costs) of using a DBMS:
    High initial investment and possible need for additional hardware.
     Overhead for providing generality, security, concurrency control, recovery,
    and other functions.

    When a DBMS may be unnecessary:
    If the database and applications are simple, well defined, and not expected
    to change.
     If there are stringent real-time requirements that may not be met because
    of DBMS overhead.
     If access to data by multiple users is not required.

    When no DBMS may suffice:
    If the database system is not able to handle the complexity of data because of
    modeling limitations
     If the database users need special operations not supported by the DBMS
     When DBMS overhead makes it impossible to achieve the needed application
    performance

    • Introduction to NOSQL

    Adopted from slides and/or materials by P. Hoekstra, J. Lu, A. Lakshman, P. Malik, J. Lin, R. Sunderraman, T. Ivarsson, J. Pokorny, N. Lynch, S. Gilbert, J. Widom, R. Jin, P. McFadin, C. Nakhli, and R. Ho

    Background
    • Relational databases are suitable for conventional business applications
    • In the first decade of the twenty-first century, the proliferation of social
    media Web sites, large e-commerce companies, Web search indexes, and
    cloud storage/backup led to a surge in the amount of data stored on large
    databases and massive servers.
    • Employing DBMS/RDBMS to web-based application create problems
    • New types of database systems were necessary to manage these huge
    databases with fast search and retrieval as well as reliable and safe storage of
    nontraditional types of data, such as social media posts and tweets.
    • Web-based applications require more than the DBMS/RDBMS can provide
    • Explosion of social media sites with very large data size
    • Many cloud-based applications provide simple storage solution

    35

    Issues with scaling up
    • Best way to provide ACID and rich query model is to have the dataset
    on a single machine
    • Limits to scaling up (or vertical scaling: make a “single” machine
    more powerful)  dataset is just too big!
    • Scaling out (or horizontal scaling: adding more smaller/cheaper
    servers) is a better choice
    • Different approaches for horizontal scaling (multi-node database):
    • Master/Slave
    • Sharding (partitioning)

    36

    Scaling out RDBMS
    • Master/Slave
    • All writes are written to the master
    • All reads performed against the replicated slave databases
    • Critical reads may be incorrect as writes may not have been propagated down
    • Large datasets can pose problems as master needs to duplicate data to slaves

    • Sharding (Partitioning)
    • Scales well for both reads and writes
    • Not transparent, application needs to be partition-aware
    • Can no longer have relationships/joins across partitions
    • Loss of referential integrity across shards

    • Other Ways
    • Multi-Master replication
    • INSERT only, not UPDATES/DELETES
    • No JOINs, thereby reducing query time
    • This involves de-normalizing data
    • In-memory databases

    37

    NOSQL (1)
    • The Name:
    • Stands for Not Only SQL
    • The term NOSQL was introduced by Carl Strozzi in 1998 to name his file-based
    database
    • It was again re-introduced by Eric Evans when an event was organized to
    discuss open source distributed databases
    • Eric states that “… but the whole point of seeking alternatives is that you need
    to solve a problem that relational databases are a bad fit for. …”

    38

    NOSQL(2)
    • Key features (also advantages):
    • Non-relational
    • Don’t require schema
    • Data are replicated to multiple
    nodes (so, identical & fault-tolerant)
    and can be partitioned:
    • Down nodes easily replaced
    • No single point of failure

    • Horizontal scalable
    • Cheap, easy to implement
    (open-source)
    • Massive write performance
    • Fast key-value access

    39

    NOSQL(3)
    • Disadvantages:
    • Don’t fully support relational features
    • No join, group by, order by operations (except within partitions)
    • No referential integrity constraints across partitions

    • No declarative query language (e.g., SQL)  more programming
    • Relaxed ACID (see CAP theorem)  fewer guarantees
    • No easy integration with other applications that support SQL

    40

    NOSQL categories
    1. Key-value
    • Example: DynamoDB, Voldermort, Scalaris

    2. Document-based
    • Example: MongoDB, CouchDB

    3. Column-based
    • Example: BigTable, Cassandra, Hbased

    4. Graph-based
    • Example: Neo4J, InfoGrid

    • “No-schema” is a common characteristics of most NOSQL storage
    systems
    • Provide “flexible” data types
    41

    Key-value
    • Focus on scaling to huge amounts of data
    • Designed to handle massive load

    • Based on Amazon’s dynamo paper
    • Data model: (global) collection of Key-value pairs
    • Dynamo ring partitioning and replication
    • Example: (DynamoDB)
    • items having one or more attributes (name, value)
    • An attribute can be single-valued or multi-valued like set.
    • items are combined into a table

    • Basic API access:
    • get(key): extract the value given a key
    • put(key, value): create or update the value given its key
    • delete(key): remove the key and its associated value
    • execute(key, operation, parameters): invoke an operation to the value (given its key) which is a
    special data structure (e.g. List, Set, Map …. etc)
    42

    Key-value
    Pros:
    • Very fast
    • Very scalable (horizontally distributed to nodes based on key)
    • Simple data model

    • Eventual consistency
    • Fault-tolerance

    Cons:
    • Can’t model more complex data structure such as objects

    43

    Document-based
    • Can model more complex objects
    • Inspired by Lotus Notes
    • Data model: collection of documents
    • Document: JSON (JavaScript Object Notation is a data model, key-value
    pairs, which supports objects, records, structs, lists, array, maps, dates,
    Boolean with nesting), XML, other semi-structured formats.
    • Example: (MongoDB) document
    • {Name:”Jaroslav”,
    Address:”Malostranske nám. 25, 118 00 Praha 1”,
    Grandchildren: {Claire: “7”, Barbara: “6”, “Magda: “3”, “Kirsten: “1”, “Otis: “3”, Richard:
    “1“}
    Phones: [ “123-456-7890”, “234-567-8963” ]
    }
    44

    Document-based
    Name

    MongoDB

    Couchbase

    Producer

    10gen

    Couchbase1

    Data model

    object-structured
    documents stored in
    collections;
    each object has a primary
    key called ObjectId
    document as a list of
    named (structured) items
    (JSON document)

    Querying

    manipulations with objects in
    collections (find object or
    objects via simple selections
    and logical expressions,
    delete, update,)
    by key and key range, views
    via Javascript and
    MapReduce

    45

    Column-based
    • Based on Google’s BigTable paper
    • Like column oriented relational databases (store data in column order) but with a
    twist
    • Tables similarly to RDBMS, but handle semi-structured
    • Data model:
    • Collection of Column Families
    • Column family = (key, value) where value = set of related columns (standard, super)
    • indexed by row key, column key and timestamp

    allow key-value pairs to be stored (and retrieved on key) in a massively parallel system
    storing principle: big hashed distributed tables
    properties: partitioning (horizontally and/or vertically), high availability etc.
    completely transparent to application

    * Better: extendible records

    46

    Column-based
    • One column family can have variable
    numbers of columns
    • Cells within a column family are sorted “physically”
    • Very sparse, most cells have null values
    • Comparison: RDBMS vs column-based NOSQL
    • Query on multiple tables
    • RDBMS: must fetch data from several places on disk and glue together
    • Column-based NOSQL: only fetch column families of those columns that are required
    by a query (all columns in a column family are stored together on the disk, so
    multiple rows can be retrieved in one read operation  data locality)

    • Example: (Cassandra column family–timestamps removed for simplicity)
    UserProfile = {
    Cassandra = { emailAddress:”[email protected]” , age:”20”}
    TerryCho = { emailAddress:”[email protected]” , gender:”male”}

    Cath = { emailAddress:”[email protected]” , age:”20”,gender:”female”,address:”Seoul”}
    }
    47

    Column-based
    Name

    Producer

    BigTable

    Google

    HBase

    Apache

    Hypertable

    Data model
    set of couples (key, {value})

    groups of columns (a BigTable
    clone)
    Hypertable like BigTable

    CASSANDRA Apache
    (originally
    Facebook)
    PNUTS
    Yahoo

    columns, groups of columns
    corresponding to a key
    (supercolumns)
    (hashed or ordered) tables,
    typed arrays, flexible schema

    Querying
    selection (by combination of
    row, column, and time stamp
    ranges)
    JRUBY IRB-based shell
    (similar to SQL)
    HQL (Hypertext Query
    Language)
    simple selections on key,
    range queries, column or
    columns ranges
    selection and projection from a
    single table (retrieve an
    arbitrary single record by
    primary key, range queries,
    complex predicates, ordering,
    top-k)
    48

    Graph-based
    • Focus on modeling the structure of data (interconnectivity)
    • Scales to the complexity of data
    • Inspired by mathematical Graph Theory (G=(E,V))
    • Data model:
    • (Property Graph) nodes and edges

    • Nodes may have properties (including ID)
    • Edges may have labels or roles

    • Key-value pairs on both

    • Interfaces and query languages vary
    • Single-step vs path expressions vs full recursion
    • Example:
    • Neo4j, FlockDB, Pregel, InfoGrid …

    49

    Conclusion
    • NOSQL database cover only a part of data-intensive cloud applications
    (mainly Web applications)
    • Problems with cloud computing:
    • SaaS (Software as a Service or on-demand software) applications require enterpriselevel functionality, including ACID transactions, security, and other features
    associated with commercial RDBMS technology, i.e. NOSQL should not be the only
    option in the cloud

    • Hybrid solutions:
    • Voldemort with MySQL as one of storage backend
    • deal with NOSQL data as semi-structured data
     integrating RDBMS and NOSQL via SQL/XML

    50

    Main Reference
    1.
    2.

    Chapter 1: The Complete chapter
    Chapter 24: Introduction to NOSQL
    (Fundamentals of Database Systems, Global Edition, 7th Edition (2017) by Ramez
    Elmasri & Shamkant Navathe)

    Additional References
    Saudi Digital Library:

    This Presentation is mainly dependent on the textbook: Fundamentals of Database Systems, Global Edition, 7th Edition (2017) by Ramez Elmasri & Shamkant Navathe

    Thank You

    ‫الجامعة السعودية االلكترونية‬
    ‫الجامعة السعودية االلكترونية‬

    ‫‪26/12/2021‬‬

    College of Computing and Informatics
    Data Science Pre-Master Program
    IT244
    Introduction to Database

    IT244
    Introduction to Database
    Week 2
    Database Architecture and Classification

    Contents

    1. Database System Concepts and Architecture
    2. Classification of Database Management Systems

    Weekly Learning Outcomes

    1. Explain Database System Concepts and Architecture.
    2. Explain types of Databases.

    Required Reading

    1. Chapter 2: Database System Concepts and
    Architecture
    2. Chapter 2: Classification of Database Management
    Systems
    (Fundamentals of Database Systems, Global Edition,
    Recommended
    Reading
    7th Edition (2017) by Ramez Elmasri & Shamkant
    Database
    System Concepts and Architecture:
    Navathe)

    NoSQL databases:

    This Presentation is mainly dependent on the textbook: Fundamentals of Database Systems, Global Edition, 7th Edition (2017) by Ramez Elmasri & Shamkant Navathe

    • Database System Concepts and Architecture

    Data Models
    • Data Model:
    – A set of concepts to describe the structure of a database, the
    operations for manipulating the data, and the constraints that the data
    should follow.

    • Data Model Structure and Constraints:
    – Data Model constructs define the database structure
    – Data model constructs often include: data elements and their data
    types (often called attributes); grouping of related elements into
    entities (also called objects or records or tuples); and relationships
    among entities
    – Constraints specify restrictions on the stored data; the data that
    satisfies the constraints is called valid data

    • Data Model Operations:
    – These operations are used for specifying database retrievals and
    updates by referring to the constructs of the data model.
    – Operations on the data model may include basic model
    operations (e.g. generic insert, delete, update) and user-defined
    operations (e.g. compute_student_gpa, update_inventory)

    Categories of Data Models
    • Conceptual (high-level, semantic) data models:
    – Provide concepts that are close to the way many users perceive
    data.
    • (Also called entity-based or object-based data models.)

    • Physical (low-level, internal) data models:
    – Provide concepts that describe details of how data is stored in
    the computer. These are usually specified in an ad-hoc manner
    through DBMS design and administration manuals

    • Implementation (representational) data models:
    – Provide concepts that fall between the above two, used by many
    commercial DBMS implementations (e.g. relational data models
    used in many commercial systems).

    Database Schema versus Database
    State(1)
    • Database Schema:
    – The description of a database.
    – Includes descriptions of the database structure, relationships, data types, and
    constraints

    • Schema Diagram:
    – An illustrative display of a database schema

    • Schema Construct:
    – A component of the schema or an object in the schema, e.g., STUDENT,
    COURSE, Name

    • Database State:
    – The actual data stored in a database at a particular moment in time. This
    includes the collection of all the data in the database.
    – Also called a database instance, occurrence, or snapshot.

    Database Schema vs. Database State(2)
    • Initial Database State:
    – Refers to the database state when it is initially loaded into
    the system.

    • Valid State:
    – A state that satisfies the structure and constraints of the
    database.

    • Distinction
    – The database schema changes very infrequently.
    – The database state changes every time the database is
    updated.

    • Schema is also called intension.
    • State is also called extension.

    Example of a Database Schema

    Three-Schema Architecture(1)
    • Proposed to support DBMS characteristics of:
    – Program-data independence.
    – Support of multiple views of the data.

    • Not explicitly used in commercial DBMS products, but has
    been useful in explaining database system organization
    • Defines DBMS schemas at three levels:
    – Internal schema at the internal level to describe physical storage
    structures and access paths (e.g. indexes).
    • Typically uses a physical data model.

    – Conceptual schema at the conceptual level to describe the structure
    and constraints for the whole database for a community of users. Uses
    an implementation (or a conceptual) data model.

    – External schemas at the external level to describe the various user
    views.
    • Usually uses the same data model as the conceptual schema.

    The three-schema architecture (2)

    Three-Schema Architecture (3)
    • Mappings among schema levels are needed to
    transform requests and data.
    – Users and programs refer to an external schema, and are
    mapped by the DBMS to the internal schema for execution.
    – Data extracted from the internal DBMS level is reformatted
    to match the user’s external view (e.g. formatting the
    results of an SQL query for display as a Web page)

    Data Independence
    • Logical Data Independence:
    – The capacity to change the conceptual schema without having to
    change the external schemas and their associated application
    programs.

    • Physical Data Independence:
    – The capacity to change the internal schema without having to change
    the conceptual schema.
    – For example, the internal schema may be changed when certain file
    structures are reorganized or new indexes are created to improve
    database performance

    • When a schema at a lower level is changed, only the mappings
    between this schema and higher-level schemas need to be
    changed in a DBMS that fully supports data independence.
    • The higher-level schemas themselves are unchanged.
    – Hence, the application programs need not be changed since they refer
    to the external schemas.

    DBMS Languages(DDL)
    • Data Definition Language (DDL):
    – Used by the DBA and database designers to specify the
    conceptual schema of a database.
    – In many DBMSs, the DDL is also used to define internal
    and external schemas (views).
    – Theoretically, separate storage definition language
    (SDL) and view definition language (VDL) can used to
    define internal and external schemas. In practice:
    • SDL is typically realized via DBMS commands provided to the
    DBA and database designers
    • VDL is typically part of the same language as DDL

    DBMS Languages (DML)
    • Data Manipulation Language (DML):
    – Used to specify database retrievals and updates
    – DML commands (data sublanguage) can be embedded in a generalpurpose programming language (host language), such as COBOL, C,
    C++, or Java
    • A library of functions can also be provided to access the DBMS from a
    programming language

    – Alternatively, stand-alone DML commands can be applied directly
    (called a query language).

    • Types of DML:

    – High-Level
    Declarative
    (Set-oriented,
    Non-procedural)
    Languages, such as the relational language SQL

    • Specify “what” data to retrieve rather then “how” to retrieve it
    • May be used in a standalone way or may be embedded in a programming
    language

    – Low Level or Procedural (Record-at-a-time) Languages:
    • Must be embedded in a programming language
    • Need programming language constructs such as looping

    DBMS Interfaces
    • Stand-alone query language interfaces
    – Example: Typing SQL queries directly through the DBMS
    interactive SQL interface (e.g. SQL*Plus in ORACLE)

    • Programmer interfaces for embedding DML in
    programming languages
    • User-friendly interfaces (often Web-based)
    – Menu-based, forms-based, graphics-based, etc.

    DBMS Programming Language
    Interfaces
    • Programmer interfaces for embedding DML in a
    programming language:
    – Embedded Approach: e.g. embedded SQL (for C, C++,
    etc.), SQLJ (for Java)
    – Procedure Call Approach: e.g. JDBC for Java, ODBC for
    other programming languages
    – Database Programming Language Approach: e.g.
    ORACLE has PL/SQL, a programming language based
    on SQL; language incorporates SQL and its data types
    as integral components

    User-Friendly and Web-based DBMS Interfaces
    • Menu-based, popular for browsing on the web
    • Forms-based, designed for naïve users
    • Graphics-based
    • Supports Point and Click, Drag and Drop, etc.

    • Natural language: requests in written English
    • Combinations of the above:
    • For example, both menus and forms used extensively in Web
    database interfaces

    Other DBMS Interfaces
    • Speech as Input and Output
    • Web Browser as an interface
    • Parametric interfaces, e.g., bank tellers using function
    keys.
    • Interfaces for the DBA:
    • Creating user accounts, granting authorizations
    • Setting system parameters
    • Changing schemas or storage structures/access paths (physical
    database)

    Database System Utilities
    • To perform certain functions such as:
    – Loading data stored in files into a database; includes data
    conversion tools.
    – Backing up the database periodically on tape.
    – Reorganizing database file structures.
    – Report generation utilities.
    – Performance monitoring utilities.
    – Other functions, such as sorting, user monitoring, data
    compression, etc.

    Other Tools
    • Data dictionary/repository:
    – Used to store schema descriptions and other information such as
    design decisions, application program descriptions, user
    information, usage standards, etc.
    – Active data dictionary is accessed by DBMS software and
    users/DBA.
    – Passive data dictionary is accessed by users/DBA only.

    • Application Development Environments and CASE
    (Computer-aided software engineering) tools often have
    a database design component
    • Examples:
    – PowerBuilder (Sybase)
    – JBuilder (Borland)
    – JDeveloper 10G (Oracle)

    Typical DBMS Component Modules

    DBMS Architectures
    • Centralized DBMS Architecture:

    – Combines everything into single computer system, including:
    DBMS software, hardware, application programs, and user
    interface processing software.
    – User can still connect through a remote terminal – however, all
    processing is done at centralized site (computer).

    • Basic 2-tier Client-Server Architecture: Specialized Server
    nodes with Specialized functions
    – Print server
    – File server
    – DBMS server
    – Web server
    – Email server

    • Client nodes can access the specialized servers as needed

    A Physical Centralized Architecture

    Logical two-tier client server architecture

    DBMS Server
    • Provides database query and transaction services to the clients
    • Relational DBMS servers are often called SQL servers, query
    servers, or transaction servers
    • Applications running on clients utilize an Application Program
    Interface (API) to access server databases via standard
    interface such as:
    – ODBC: Open Database Connectivity standard
    – JDBC: for Java programming access

    • Client and server must install appropriate client module and
    server module software for ODBC or JDBC

    Client nodes
    • Provide appropriate interfaces through a client
    software module to access and utilize the various
    server resources.
    • Clients may be PCs or Workstations (or even diskless
    machines) with the client software installed.
    • Connected to the servers via a network.
    – LAN: local area network
    – wireless network

    Two Tier Client-Server DBMS
    Architecture
    • A program running at a client may connect to several
    DBMSs (also called data sources).
    • In general, data sources can be files or other nonDBMS software that manages data.
    • Client focuses on user interface interactions and only
    accesses database when needed.
    • In some cases (e.g. some object DBMSs), more
    functionality is transferred to clients (e.g. data
    dictionary functions, optimization and recovery
    across multiple servers, etc.)

    Three Tier Client-Server DBMS
    Architecture
    • Common for Web applications
    • Third intermediate layer (middle tier) called Application Server
    or Web Server:
    – Stores the web connectivity software and the business logic part
    of the application
    – Accesses and updates data on the database server
    – Acts like a conduit for sending partially processed data between
    the database server and the client.

    • Three-tier Architecture Can Enhance Security:
    – Database server only accessible via middle tier
    – Clients cannot directly access database server

    Three-tier client-server architecture

    • Classification of Database Management Systems

    Classification of DBMSs

    Based on the data model used
    – Traditional: Relational, Hierarchical, Network.
    – Emerging: Object-oriented, Object-relational, NOSQL, Key-value.

    Based on allowed users
    – Single-user (typically used with personal computers) vs. Multi-user (most DBMSs).
    – Centralized (uses a single computer with one database) vs. Distributed (uses
    multiple computers, multiple databases)

    Based on number of sites
    – A DBMS is centralized if the data is stored at a single computer site.
    – A distributed DBMS (DDBMS) can have the actual database and DBMS software
    distributed over many sites connected by a computer network.

    Based on type of DBMS software
    – Homogeneous DDBMSs use the same DBMS software at all the sites.
    – Heterogeneous DDBMSs can use different DBMS software at each site.
    This may further lead to federated DBMS.

    Big data systems (NOSQL systems)



    Key-value model
    Document-based model
    Graph-based model
    Column-based model

    Cost considerations for DBMSs
    • Cost Range: from free open-source systems to configurations
    costing millions of dollars
    • Examples of free relational DBMSs: MySQL, PostgreSQL, others
    • Commercial DBMSs offer additional specialized modules, e.g.
    time-series module, spatial data module, document module,
    XML module
    – These offer additional specialized functionality when purchased
    separately
    – Sometimes called cartridges (e.g., in Oracle) or blades

    • Different licensing options: site license, maximum number of
    concurrent users (seat license), single user, etc.

    History of Data Models(1)
    1. Network Model
    2. Hierarchical Model
    3. Relational Model
    4. Object-oriented Data Models
    5. Object-Relational Models
    • Network Model:

    – The first network DBMS was implemented by Honeywell in 196465 (IDS System).
    – Adopted heavily due to the standard support by CODASYL
    (Conference on Data Systems Languages) (CODASYL – DBTG report
    of 1971).
    – Later implemented in a large variety of systems – IDMS (Cullinet now Computer Associates), DMS 1100 (Unisys), IMAGE (H.P.
    (Hewlett-Packard)), VAX -DBMS (Digital Equipment Corp., next
    COMPAQ, now H.P.).

    Example of Network Model Schema

    Network Model
    • Advantages:
    – Can model complex relationships among records and represents
    semantics of add/delete on the relationships.
    – Can handle most situations for modeling using record types and
    relationship types.
    – Language is navigational; uses constructs like FIND, FIND
    member, FIND owner, FIND NEXT within set, GET, etc.
    • Programmers can do optimal navigation through the database.

    • Disadvantages:
    – Navigational and procedural nature of processing requires
    programming access
    – Intermixes storage structures with conceptual modeling
    relationships
    – Database contains a complex array of pointers that thread
    through a set of records.
    • Little scope for automated “query optimization”

    Hierarchical Model
    • Hierarchical Data Model:
    – Initially implemented in a joint effort by IBM and North American Rockwell
    around 1965. Resulted in the IMS family of systems.
    – IBM’s IMS product had a very large customer base worldwide
    – Hierarchical model was formalized based on the IMS system
    – Other systems based on this model: System 2k (SAS inc.)

    • Advantages:

    – Can implement certain tasks very efficiently
    – Easy to store hierarchically organized data, e.g., organization (“org”) charts

    • Disadvantages:

    – Navigational and procedural nature of processing
    – Difficult to store databases where multiple relationships exist among the
    data records
    – Little scope for “query optimization” by system (programmer must
    optimize the programs)
    – Language is procedural: Uses constructs like GET, GET UNIQUE, GET NEXT,
    GET NEXT WITHIN PARENT, etc.

    Relational Model
    • Relational Model:
    – Proposed in 1970 by E.F. Codd (IBM), first commercial systems in
    early 1980s.
    – Now in many commercial products (e.g. DB2, ORACLE, MS SQL
    Server, SYBASE, INFORMIX).
    – Several free open source implementations, e.g. MySQL,
    PostgreSQL
    – Currently most dominant for developing database applications.
    – SQL relational standards: SQL-89 (SQL1), SQL-92 (SQL2), SQL-99,
    SQL3, SQL-2008
    – Chapters 3 through 6 describe this model in detail

    Object-oriented Models
    • Object-oriented Data Models:
    – Allow databases to be used seamlessly with object-oriented
    programming languages.
    – Can store persistent objects created in O-O Programming
    Languages such as C++ (e.g., in OBJECTSTORE or VERSANT), and
    Smalltalk (e.g., in GEMSTONE).
    – Other experimental systems include O2, ORION (at MCC – then
    ITASCA), IRIS (at H.P.- used in Open OODB).
    – Object Database Standard: ODMG-93, ODMG-version 2.0,
    ODMG-version 3.0.

    Object-Relational Models
    • Object-Relational Models:
    – Relational systems incorporated concepts from object
    databases leading to object-relational.
    – Exemplified in the latest versions of Oracle-10i, DB2, and
    SQL Server and other DBMSs.
    – Standards started in SQL-99 and enhanced in SQL-2008.

    Main Reference

    1. Chapter 2: Database System Concepts and
    Architecture
    2. Chapter 2: Classification of Database Management
    Systems
    (Fundamentals of Database Systems, Global Edition,
    Additional
    References
    7th Edition (2017) by Ramez Elmasri & Shamkant
    Navathe)

    This Presentation is mainly dependent on the textbook: Fundamentals of Database Systems, Global Edition, 7th Edition (2017) by Ramez Elmasri & Shamkant Navathe

    Thank You

    ‫الجامعة السعودية االلكترونية‬
    ‫الجامعة السعودية االلكترونية‬

    ‫‪26/12/2021‬‬

    College of Computing and Informatics
    Data Science Pre-Master Program
    IT244
    Introduction to Database

    IT244
    Introduction to Database
    Week 3
    Relational Model

    Contents

    1. The Relational Data Model and Relational Database
    Constraints
    2. The Relational Algebra

    Weekly Learning Outcomes

    1. Create a Relational model of a Database.
    2. Describe Relational Algebra.

    Required Reading

    1. Chapter 5: The Relational Data Model and Relational
    Database Constraints
    2. Chapter 8: The Relational Algebra
    (Fundamentals of Database Systems, Global Edition,
    7th Edition (2017) by Ramez Elmasri & Shamkant
    Navathe)
    Recommended Reading
    Relational Model and Relational Algebra:

    This Presentation is mainly dependent on the textbook: Fundamentals of Database Systems, Global Edition, 7th Edition (2017) by Ramez Elmasri & Shamkant Navathe

    • The Relational Data Model and Relational Database
    Constraints

    Relational Model Concepts and its Origin
    • The formal relational Model of Data is based on the concept of a
    Relation
    – Has a formal mathematical foundation provided by set theory and first
    order predicate logic

    • In practice, there is a standard model based on SQL (Structured
    Query Language)
    • There are several important differences between the formal model
    and the practical model, as we shall see
    • The model was first proposed by Dr. E.F. Codd of IBM Research in
    1970 in the following paper:
    – “A Relational Model for Large Shared Data Banks,” Communications of
    the ACM, June 1970

    • The above paper caused a major revolution in the field of database
    management
    • Dr. Codd earned the coveted ACM Turing Award in 1981

    Informal Definitions (1)
    • Informally, a relation looks like a table of values (see Figure 3.1 on next
    slide).
    • A relation contains a set of rows.
    • The data elements in each row represent certain facts that correspond to
    a real-world entity or relationship
    – In the formal model, rows are called tuples
    • Each column has a column header that gives an indication of the meaning
    of the data items in that column
    – In the formal model, the column header is called an attribute name (or just
    attribute)

    Informal Definitions (2)
    • Key of a Relation:
    – Each row (tuple) in the table is uniquely identified by the
    value of a particular attribute (or several attributes
    together)
    • Called the key of the relation

    – In the STUDENT relation, SSN is the key
    – If no attributes posses this uniqueness property, a new
    attribute can be added to the relation to assign unique
    row-id values (e.g. unique sequential numbers) to
    identify the rows in a relation
    • Called artificial key or surrogate key

    Formal Definitions – Relation Schema
    • Relation Schema (or description) of a Relation:
    – Denoted by R(A1, A2, …, An)
    – R is the name of the relation
    – The attributes of the relation are A1, A2, …, An
    – n is the cardinality of the relation

    • Example:
    CUSTOMER (Cust-id, Cust-name, Address, Phone#)
    – CUSTOMER is the relation name
    – The CUSTOMER relation schema (or just relation) has four
    attributes: Cust-id, Cust-name, Address, Phone#

    • Each attribute has a domain or a set of valid values.
    – For example, the domain of Cust-id can be 6 digit numbers.

    Formal Definitions – Tuple
    • A tuple is an ordered set of values (enclosed in angled
    brackets ‘’)
    • Each value is derived from an appropriate domain.
    • A row in the CUSTOMER relation is a 4-tuple and would consist
    of four values, for example:

    – Called a 4-tuple because it has 4 values
    – In general, a particular relation will have n-tuples, where n is the
    number of attributes for the relation

    • A relation is a set of such tuples (rows)

    Formal Definitions – Domain
    • A domain of values can have a logical definition:
    – Example: “USA_phone_numbers” are the set of 10 digit phone numbers
    valid in the U.S.

    • A domain also has a data-type or a format defined for it.
    – The USA_phone_numbers may have a format: (ddd)ddd-dddd where each
    d is a decimal digit.

    – Dates have various formats such as year, month, date formatted as
    yyyy-mm-dd, or as dd:mm:yyyy etc.
    • The attribute name designates the role played by a domain in a relation:
    – Used to interpret the meaning of the data elements corresponding to
    that attribute
    – Example: The domain Date may be used to define two attributes “Invoicedate” and “Payment-date” with different meanings (roles)

    Formal Definitions – State of a Relation
    • Formally, a relation state r(R) is a subset of the Cartesian product
    of the domains of its attributes
    – Each domain contains the set of all possible values the attribute can
    take.
    – The Cartesian product contains all possible tuples from the attribute
    domains
    – The relations state r(R) is the subset of tuples that represent valid
    information in the mini-world at a particular time

    • Formally (see Figure 3.1),
    – Given relation schema R(A1, A2, ………., An)
    – Relation state r(R)  dom(A1) X dom(A2) X ….X dom(An)

    • r(R): is a specific state (or “instance” or “population”) of relation R
    – this is a set of tuples (rows) in the relation at a particular moment
    in time
    – r(R) = {t1, t2, …, tn} where each ti is an n-tuple
    – ti = where each vj element-of dom(Aj)

    Formal Definitions – Example
    • Let R(A1, A2) be a relation schema:
    – Let dom(A1) = {0, 1}
    – Let dom(A2) = {a, b, c}

    • Then: The Cartesian product dom(A1) X dom(A2) contains all
    possible tuples from these domains:
    { , , , , , }

    • The relation state r(R)  dom(A1) X dom(A2)
    • For example: One possible state r(R) could be { , ,
    }
    – This state has three 2-tuples: , ,

    Relation Definitions Summary
    Informal Terms

    Formal Terms

    Table

    Relation

    Column Header

    Attribute

    All possible Column Values or Data Type

    Domain

    Row

    Tuple

    Table Definition

    Schema of a Relation

    Populated Table

    State of the Relation

    Characteristics of a Relation (1)
    • Ordering of tuples in a relation r(R):
    – The tuples are not considered to be ordered, because a relation is a set of
    tuples

    • Ordering of attributes in a relation schema R (and of values within
    each tuple):
    – The attributes in R(A1, A2, …, An) and the values in each t= are
    considered to be ordered
    – However, a more general definition of relation does not require attribute ordering
    – In this case, a tuple t = { , …, } is an unordered set of n
    pairs – one pair for each of the relation attributes (see Figure 3.3)

    Characteristics of Relations (2)
    • Values in a tuple:
    – All values are considered atomic (indivisible).
    – Each value must be from the domain of the attribute for that
    column
    • If tuple t = is a tuple (row) in the relation state r of R(A1,
    A2, …, An)
    • Then each vi must be a value from dom(Ai)

    – A special null value is used to represent values that are unknown
    or inapplicable to certain tuples.

    • Notation:
    – We refer to component values of a tuple t by:
    • t[Ai] or t.Ai
    • This is the value vi of attribute Ai for tuple t

    – Similarly, t[Au, Av, …, Aw] refers to the subtuple of t containing
    the values of attributes Au, Av, …, Aw, respectively in t

    Relational Integrity Constraints
    • Constraints are conditions that must hold on all valid relation
    states.
    • Constraints are derived from the mini-world semantics
    • There are three main types of built-in constraints in the
    relational model:
    – Key constraints
    – Entity integrity constraints
    – Referential integrity constraints

    • Another implicit constraint is the domain constraint
    – Every value in a tuple must be from the domain of its attribute
    (or it could be null, if allowed for that attribute)

    Key Constraints (1)
    • Superkey SK of R:
    – Is a set of attributes SK of R with the following condition:
    • No two tuples in any valid relation state r(R) will have the same
    value for SK
    • That is, for any two distinct tuples t1 and t2 in r(R), t1.SK  t2.SK
    • This condition must hold in any valid state r(R)

    • Key (also called Candidate key) K of R:
    – Is a “minimal” superkey
    – Formally, a key K is a superkey such that removal of any
    attribute from K results in a set of attributes that is not a
    superkey (or key) any more (does not possess the superkey
    uniqueness property)
    – Hence, a superkey with one attribute is always a key

    Key Constraints (2)
    • Example: Consider the CAR relation schema:
    – CAR(State, Reg#, SerialNo, Make, Model, Year)
    – CAR has two keys (determined from the mini-world constraints):
    • Key1 = {State, Reg#}
    • Key2 = {SerialNo}

    – Both are also superkeys of CAR
    – However, {SerialNo, Make} is a superkey but not a key.

    • In general:
    – Any key is a superkey (but not vice versa)
    – Any set of attributes that includes a key is a superkey
    – A minimal superkey is also a key

    Key Constraints (3)
    • If a relation has several keys, they are called candidate keys;
    one is chosen to be the primary key; the others are called
    unique (or secondary) keys
    – The primary key attributes are underlined.

    • Example: Consider the CAR relation schema:

    – CAR(State, Reg#, SerialNo, Make, Model, Year)
    – We choose License_number (which contains (State, Reg#)
    together) as the primary key – see Figure 3.4

    • The primary key value is used to uniquely identify each tuple in
    a relation
    – Provides the tuple identity
    – Also used to reference the tuple from other tuples

    • General rule: Choose the smallest-sized candidate key (in
    bytes) as primary key

    – Not always applicable – choice is sometimes subjective (as in
    Figure 3.4 – see next slide)

    Relational Database Schema
    • Relational Database Schema:
    – A set S of relation schemas that belong to the same database.
    – S is the name of the whole database schema
    – S = {R1, R2, …, Rn}
    – R1, R2, …, Rn are the names of the individual relation schemas within the
    database S
    – Figure 3.5 shows a COMPANY database schema with 6 relation schemas

    Example of Relational Database State
    • Next slide show an example of a COMPANY database state
    (Figure 3.6)
    – Each relation has a set of tuples
    • The tuples in each table satisfy key and other constraints
    • If all constraints are satisfied by a database state, it is called a valid
    state

    – The database state changes to another state whenever the
    tuples in any relation are changed via insertions, deletions,
    or updates

    Entity Integrity Constraint
    • Entity Integrity:
    – The primary key attributes PK of each relation schema R in
    S cannot have null values in any tuple of r(R).
    • This is because primary key values are used to identify the
    individual tuples.
    • t.PK  null for any tuple t in r(R)
    • If PK has several attributes, null is not allowed in any of these
    attributes

    – Note: Other attributes of R may be also be constrained to
    disallow null values (called NOT NULL constraint), even
    though they are not members of the primary key.

    Referential Integrity Constraint (1)
    • A constraint involving two relations
    – The previous constraints (key, entity integrity) involve a single
    relation.

    • Used to specify a relationship among tuples in two relations:
    – The referencing relation and the referenced relation.

    • Tuples in the referencing relation R1 have attributes FK (called
    foreign key attributes) that reference the primary key
    attributes PK of the referenced relation R2.
    – A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1.FK = t2.PK

    • Referential integrity can be displayed as a directed arc from
    R1.FK to R2.PK – see Figure 3.7

    Referential Integrity (or foreign key) Constraint
    (2)
    • Statement of the constraint
    – For a particular database state, the value of the foreign key
    attribute (or attributes) FK in each tuple of the referencing
    relation R1 can be either:
    • (1) An existing primary key (PK) value of a tuple in the referenced
    relation R2, or
    • (2) a null.

    • In case (2), the FK in R1 should not be a part of its
    own primary key, and cannot have the NOT NULL
    constraint.

    Other Types of Constraints
    • Semantic Integrity Constraints:
    – cannot be expressed by the built-in model constraints
    – Example: “the max. no. of hours per employee for all
    projects he or she works on is 56 hrs per week”

    • A constraint specification language can be used to
    express these
    • SQL has TRIGGERS and ASSERTIONS to express some
    of these constraints

    Operations to Modify Relations (1)
    • Each relation will have many tuples in its current relation state
    • The relational database state is a union of all the individual relation
    states at a particular time
    • Whenever the database is changed, a new state arises
    • Basic operations for changing the database:
    – INSERT new tuples in a relation
    – DELETE existing tuples from a relation
    – UPDATE attribute values of existing tuples

    • Integrity constraints should not be violated by the update
    operations.
    • Several update operations may have to be grouped together into a
    transaction.

    Operations to Modify Relations (2)
    • Updates may propagate to cause other updates
    automatically. This may be necessary to maintain
    integrity constraints.
    • In case of integrity violation, several actions can be
    taken:
    – Cancel the operation that causes the violation (RESTRICT or
    REJECT option)
    – Perform the operation but inform the user of the violation
    – Trigger additional updates so the violation is corrected
    (CASCADE option, SET NULL option)
    – Execute a user-specified error-correction routine

    INSERT operation
    • INSERT one or more new tuples into a relation
    • INSERT may violate any of the constraints:
    – Domain constraint:
    • if one of the attribute values provided for a new tuple is not of the
    specified attribute domain

    – Key constraint:
    • if the value of a key attribute in a new tuple already exists in
    another tuple in the relation

    – Referential integrity:
    • if a foreign key value in a new tuple references a primary key value
    that does not exist in the referenced relation

    – Entity integrity:
    • if the primary key value is null in a new tuple

    DELETE operation
    • DELETE one or more existing tuples from a relation
    • DELETE may violate only referential integrity:
    – If the primary key value of the tuple being deleted is referenced
    from other tuples in the database
    • Can be remedied by several actions: RESTRICT, CASCADE, SET NULL
    – RESTRICT option: reject the deletion
    – CASCADE option: propagate the deletion by automatically deleting
    the referencing tuples
    – SET NULL option: set the foreign keys of the referencing tuples to
    NULL (the foreign keys cannot have NOT NULL constraint)

    – One of the above options must be specified during database
    design for each referential integrity (foreign key) constraint

    UPDATE operation
    • UPDATE modifies the values of attributes in one or more
    existing tuples in a relation
    • UPDATE may violate domain constraint and NOT NULL
    constraint on an attribute being modified
    • Other constraints may also be violated:
    – Updating the primary key (PK):
    • Similar to a DELETE followed by an INSERT
    • Need to specify similar options to DELETE
    • The CASCADE option propagates the new value of PK to the foreign
    keys of the referencing tuples automatically
    – Updating a foreign key (FK) may violate referential integrity

    – Updating an ordinary attribute (neither PK nor FK):
    • Can only violate domain or NOT NULL constraints

    • Relational Algebra

    Relational Algebra Overview (1)
    • Relational algebra is the basic set of operations for the
    theoretical relational model
    • SQL is the standard language for practical relational
    databases
    • Relational algebra operations are used in DBMS query
    optimization
    – SQL is converted to relational operations for optimization and
    processing

    • Input to a relational algebra operation is one or more
    relations
    • These operations can specify basic retrieval requests (or
    queries)

    Relational Algebra Overview (2)
    • The result of an operation is a new relation, derived
    from the input relations
    – This property makes the algebra “closed” (i.e. all objects in
    relational algebra are relations)

    • A query will typically require multiple operations
    – Result of one operation can be further manipulated using
    additional operations of the same algebra

    • A sequence of relational algebra operations forms a
    relational algebra expression
    – Final result is a relation that represents the result of a
    database query (or retrieval request)

    Relational Algebra Overview (3)
    • Relational Algebra consists of several groups of operations
    – Unary Relational Operations – one input table
    • SELECT (symbol: σ (sigma))
    • PROJECT (symbol:  (pi))
    • RENAME (symbol:  (rho))

    – Binary Relational Algebra Operations From Set Theory – two input tables
    • UNION ( ꓴ ), INTERSECTION ( ꓵ ), DIFFERENCE (or MINUS, – )
    • CARTESIAN PRODUCT ( x )

    – Binary Relational Operations – two input tables
    • JOIN (several variations of JOIN exist)
    • DIVISION

    – Additional Relational Operations
    • OUTER JOINS, OUTER UNION
    • AGGREGATE FUNCTIONS (These compute summary of information, E.g. SUM,
    COUNT, AVG, MIN, MAX)
    • Other operations

    SELECT operation (1)
    • The SELECT operation (denoted by σ (sigma)) selects a subset of the tuples
    from a relation based on a selection condition.
    – The selection condition acts as a filter

    – Keeps only those tuples that satisfy the qualifying condition
    – Tuples satisfying the condition are selected whereas the other
    tuples are discarded (filtered out)
    – Important Note: The SELECT operation is different from the
    SELECT-clause of SQL
    • Examples:
    – Select the EMPLOYEE tuples whose department number is 4:
    σ DNO = 4 (EMPLOYEE)
    – Select the employee tuples whose salary is greater than $30,000:
    σ SALARY > 30,000 (EMPLOYEE)

    SELECT operation (2)
    – The general form of select operation is
    σ (R), where
    • the symbol σ (sigma) is used to denote the select operator
    • the selection condition is a Boolean (conditional) expression
    specified on the attributes of relation R
    • tuples that make the condition true are selected
    – appear in the result of the operation

    • tuples that make the condition false are filtered out
    – discarded from the result of the operation

    SELECT operation (3)
    • SELECT Operation Properties
    – SELECT operation σ (R) produces a relation S that has
    the same schema (same attributes) as R
    – SELECT σ is commutative:
    • σ (σ (R)) = σ (σ (R))
    – Because of commutativity property, a cascade (sequence) of
    SELECT operations may be applied in any order:
    • σ (σ (σ (R)) = σ (σ (σ ( R)))
    – A cascade of SELECT operations may be replaced by a single
    selection with a conjunction of all the conditions:
    • σ (σ (σ (R)) = σ AND AND (R)))
    – The number of tuples in the result of a SELECT is less than (or
    equal to) the number of tuples in the input relation R

    PROJECT operation (1)
    • PROJECT Operation is denoted by  (pi symbol)
    • Keeps certain columns (attributes) from a relation
    and discards the other columns.
    – PROJECT creates a vertical partitioning
    • The list of specified columns (attributes) is kept in each tuple
    • The other attributes in each tuple are discarded

    • Example: List each employee’s first and last name
    and salary:
     LNAME, FNAME, SALARY(EMPLOYEE)

    PROJECT operation (2)
    • General form of project operation is:
     (R), where

    –  (pi) is the symbol used to represent the project operation
    – is the desired list of attributes from relation R.

    • The operation removes any duplicate tuples
    – This is because the result of the project operation must be a set of tuples
    • Mathematical sets do not allow duplicate elements.

    • PROJECT Operation Properties
    – Number of tuples in result of projection  (R) less or equal to the
    number of tuples in R
    • If the list of projected attributes includes a key of R, then number of tuples in
    result of PROJECT is equal to the number of tuples in R

    – PROJECT is not commutative
     ( (R) ) =  (R) as long as contains the attributes in

    Relational Algebra Expressions
    • To apply several relational algebra operations
    – Either we write the operations as a single relational
    algebra expression by nesting the operations in
    parentheses, or
    – We apply one operation at a time and create intermediate
    result relations.

    • In the latter case, we must give names to the
    relations that hold the intermediate results.

    Single expression versus sequence of
    relational operations (Example)
    • To retrieve the first name, last name, and salary of all
    employees who work in department number 5, we must
    apply a select and a project operation
    • Single relational algebra expression:
    –  FNAME, LNAME, SALARY(σ DNO=5(EMPLOYEE))

    • OR sequence of operations, giving a name to each
    intermediate relation:
    – DEP5_EMPS  σ DNO=5(EMPLOYEE)
    – RESULT   FNAME, LNAME, SALARY (DEP5_EMPS)

    RENAME operations (1)
    • The RENAME operator is denoted by  (rho)
    • Used to rename the attributes of a relation or the
    relation name or both
    – Useful when a query requires multiple operations
    – Necessary in some cases (see JOIN operation later)

    • The general RENAME operation  can be expressed by
    any of the following forms:
     S(B1, B2, …, Bn)(R) changes both:

     the relation name to S, and
     the column (attribute) names to B1, B1, …, Bn

     S(R) changes:

     the relation name only to S
     (B1, B2, …, Bn )(R) changes:
    • the column (attribute) names only to B1, B1, …..Bn

    RENAME operation (2)
    • For convenience, we also use shorthand for renaming an
    intermediate relation:
    – If we write:
    • R(FN, LN, SAL)  πFNAME, LNAME, SALARY(DEP5_EMPS)
    • R will rename FNAME to FN, LNAME to LN, and SALARY to SAL
    • If we write:
    • R(F, M, L, S, B, A, SX, SAL, SU, D)  EMPLOYEE
    • Then EMPLOYEE is renamed to R and the 10 attributes of EMPLOYEE are
    renamed to F, M, L, S, B, A, SX, SAL, SU, D, respectively

    Example
    • Next slide (Figure 6.2) shows the result of the following two
    expressions when applied to the database state in Figure 3.6
    • Single relational algebra expression (Fig 6.2(a)):
    – πFNAME, LNAME, SALARY(σDNO=5(EMPLOYEE))

    • Sequence of operations, giving a name to each intermediate
    relation (Fig 6.2(b)):
    – TEMP  σDNO=5(EMPLOYEE)
    – R  πFNAME, LNAME, SALARY(TEMP)

    UNION Operation (1)
    • UNION Operation
    – Binary operation, denoted by ꓴ
    – The result of R ꓴ S, is a relation that includes all tuples that
    are either in R or in S or in both R and S
    – Duplicate tuples are eliminated
    – The two operand relations R and S must be “type
    compatible” (or UNION compatible)
    • R and S must have same number of attributes
    • Each pair of corresponding attributes must be type
    compatible (have same or compatible domains)

    UNION operation (2)
    • Example:
    – To retrieve the social security numbers of all employees who either
    work in department 5 (RESULT1 below) or directly supervise an
    employee who works in department 5 (RESULT2 below)
    – We can use the UNION operation as follows:



    DEP5_EMPS  σDNO=5 (EMPLOYEE)
    RESULT1  σ SSN(DEP5_EMPS)
    RESULT2(SSN)  πSUPERSSN(DEP5_EMPS)
    RESULT  RESULT1 ꓴ RESULT2

    – The union operation produces the tuples that are in either RESULT1
    or RESULT2 or both (see Fig 6.3)

    Type Compatibility
    • Type Compatibility of operands is required for the binary set
    operation UNION ꓴ, (also for INTERSECTION ꓴ, and SET
    DIFFERENCE –, see next slides)
    • R1(A1, A2, …, An) and R2(B1, B2, …, Bn) are type compatible if:
    – they have the same number of attributes, and
    – the domains of corresponding attributes are type compatible (i.e.
    dom(Ai)=dom(Bi) for i=1, 2, …, n).

    • The resulting relation for R1 ꓴ R2 (also for R1 ꓴ R2, or R1 – R2,
    see next slides) has the same attribute names as the first
    operand relation R1 (by convention)

    INTERSECTION operation
    • INTERSECTION is denoted by ꓵ
    • The result of the operation R ꓵ S, is a relation that
    includes all tuples that are in both R and S

    – The attribute names in the result will be the
    same as the attribute names in R
    • The two operand relations R and S must be “type
    compatible”

    SET DIFFERENCE operation
    • SET DIFFERENCE (also called MINUS or EXCEPT) is
    denoted by –
    • The result of R – S, is a relation that includes all tuples
    that are in R but not in S

    – The attribute names in the result will be the
    same as the attribute names in R
    • The two operand relations R and S must be “type
    compatible”

    Set Operations in SQL
    • Figure 6.4 (next slide) shows some examples of set
    operations on relations
    • SQL has UNION, INTERSECT, EXCEPT operations

    – These operations work with sets of tuples
    (duplicate tuples are eliminated)
    – In addition, SQL has UNION ALL, INTERSECT
    ALL, EXCEPT ALL for multisets (duplicates are
    allowed)

    Some properties of UNION,
    INTERSECT, and DIFFERENCE
    • Both union and intersection are commutative operations;
    that is
    – R ꓴ S = S ꓴ R, and R ꓵ S = S ꓵ R

    • Both union and intersection can be treated as n-ary
    operations applicable to any number of relations as both are
    associative operations; that is
    – R ꓴ (S ꓴ T) = (R ꓴ S) ꓴ T
    – (R ꓵ S) ꓵ T = R ꓵ (S ꓵ T)

    • The minus operation is not commutative; that is, in general
    – R–S≠S–R

    CARTESIAN PRODUCT operation (1)
    • CARTESIAN (or CROSS) PRODUCT Operation
    – Different from the other three set operations
    – Used to combine tuples from two relations in a combinatorial
    fashion.
    – Denoted by R(A1, A2, . . ., An) x S(B1, B2, . . ., Bm)
    – Result is a relation Q with degree n + m attributes:
    • Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.

    – The resulting relation state has one tuple for each combination
    of tuples—one from R and one from S.
    – Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS
    tuples, then R x S will have nR * nS tuples.
    – R and S do NOT have to be “type compatible”

    CARTESIAN PRODUCT operation (2)
    • Generally, CROSS PRODUCT is not a meaningful
    operation
    – Can become meaningful when followed by other
    operations

    • Example (not meaningful):
    – FEMALE_EMPS  σSEX=‘F’(EMPLOYEE)
    – EMPNAMES  πFNAME, LNAME, SSN (FEMALE_EMPS)
    – EMP_DEPENDENTS  EMPNAMES x DEPENDENT

    • EMP_DEPENDENTS will contain every combination of tuples
    from EMPNAMES and DEPENDENT
    – whether or not the tuples are actually related

    CARTESIAN PRODUCT operation (3)
    • To keep only combinations where the DEPENDENT is
    related to the EMPLOYEE by the condition ESSN=SSN, we
    add a SELECT operation

    • Example (meaningful, see Figure 6.5, next two
    slides):
    – FEMALE_EMPS  σSEX=‘F’(EMPLOYEE)
    – EMPNAMES  πFNAME, LNAME, SSN (FEMALE_EMPS)
    – EMP_DEPENDENTS  EMPNAMES x DEPENDENT
    – ACTUAL_DEPS  σSSN=ESSN(EMP_DEPENDENTS)
    – RESULT  πFNAME, LNAME, DEPENDENT_NAME (ACTUAL_DEPS)

    • RESULT will now contain the name of female employees and
    their dependents

    JOIN operation (also called INNER JOIN)
    • JOIN Operation (denoted by ⋈ (bowtie symbol))
    – The sequence of CARTESIAN PRODECT followed by SELECT can be
    used to identify and select related tuples from two relations
    – The JOIN operation combines this sequence into a single
    operation
    – JOIN is very important for any relational database with more
    than a single relation, because it allows to combine related
    tuples from various relations
    – The general form of a join operation on two relations R(A1, A2, . .
    ., An) and S(B1, B2, . . ., Bm) is:
    • R ⋈S
    – where R and S can be base relations or any relations that result
    from general relational algebra expressions.

    JOIN operation (2)
    • Example: Suppose that we want to retrieve the name of the
    manager of each department.
    – To get the manager’s name, we need to combine each
    DEPARTMENT tuple with the EMPLOYEE tuple whose SSN value
    matches the MGRSSN value in the department tuple.
    – We do this by using the join ⋈ operation
    – DEPT_MGR  DEPARTMENT ⋈MGRSSN=SSN EMPLOYEE

    • MGRSSN=SSN is called the join condition
    – Combines each department record with the employee who
    manages the department
    – The join condition can also be specified as
    – DEPARTMENT.MGRSSN= EMPLOYEE.SSN

    – Result in Figure 6.6 (next slide)

    Properties of JOIN (1)
    • Consider the following JOIN operation:
    – R(A1, A2, . . ., An) ⋈R.Ai=S.BjS(B1, B2, . . ., Bm)

    – Result is a relation Q with degree n + m attributes:
    • Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.

    – The resulting relation state has one tuple for each combination
    of tuples – r from R and s from S, but only if they satisfy the join
    condition r.Ai=s.Bj
    – Hence, if R has nR tuples, and S has nS tuples, then the join
    result will generally have less than nR * nS tuples.
    – Only related tuples (based on the join condition) will appear in
    the result

    Properties of JOIN (2)
    • General case of JOIN operation is called a Theta-join:
    R ⋈ theta S
    • The join condition is called theta
    • Theta can be any general boolean expression on the
    attributes of R and S; for example:
    – R.Ai
    Purchase answer to see full
    attachment

    Share This Post

    Email
    WhatsApp
    Facebook
    Twitter
    LinkedIn
    Pinterest
    Reddit

    Order a Similar Paper and get 15% Discount on your First Order

    Related Questions

    Ct. Q.

    Description 1 Research on Diabetic Foot Disease Name of Student Institutional Affiliation Course Date 2 Research on Diabetic Foot Disease Introduction A severe and complex consequence of diabetes mellitus known as diabetic foot disease (DFD) includes infections, foot ulcers, peripheral neuropathy, and peripheral arterial disease. DFD can have profound effects

    Management of technology 325

    Description Please answer it as its written in the file Chapter Ten Organizing for Innovation Organizing for Innovation at Google Google was founded in 1998 by two Stanford Ph.D. students, Sergey Brin and Larry Page, who had developed a formula for rank ordering random search results by relevancy. Their formula

    Management Question

    Description Follow the instructions and the number of words for each question in the assignment file. Similarity and copying and pasting are prohibited. The grade will be zero. Add sources. ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية اإللكترونية‬ Kingdom of Saudi Arabia Ministry of Education Saudi Electronic University College of

    mgt402 Entrepreneurship and small bus

    Description Please solve it as it is written on the file Case 1 Panda Sunglasses How Should a Start-Up Business with a Social Mission Market Its Sunglasses with Bamboo Frames? V incent Ko showed his entrepreneurial potential in high school in Rockville, Maryland, when, as a young hockey player, he

    dis

    Description 2 dis Last week, you completed a personal development plan. For this week’s discussion, please consider how you would use employee development plans as a motivational tool in your organization. Consider how data from personal development plans could be used track organizational compliance to accreditation and credentialing standards. How

    CTA – Engaging the Organization in the Change Process

    Description Critical Thinking Assignment Module 12: Engaging the Organization in the Change Process As we move through this module, consider the ramifications related to unethical conduct that some companies have experienced. Also consider how that conduct could have been avoided by applying some of the lessons learned in this module.

    Algorithms & Data Structures Question

    Description Saudi Electronic University Health Sciences Collage Master of Healthcare Administration HCM 600 Research Project Examining The Long-Term Interventions Effects of Telepsychiatry on Chronic Mental Health Conditions in Saudi Arabia: Systematic Review A Research Project Submitted in Partial Fulfillment of the Requirements for the Degree (MSc of Healthcare Administration) Presented

    373 ASS 60

    Description SEE College of Health Sciences Department of Public Health PHC 241 Fall 2021 Paper assignment rubric Thinking Content Presentation Criteria Proficiency 2 Some Proficiency 1.75 Limited Proficiency 1.50 No Proficiency 1 The purpose and focus are clear and consistent Punctuation, grammar, spelling, and mechanics are appropriate Information and evidence

    Management Question

    Description I have as assignment please make sure there is no plagiarism ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية اإللكترونية‬ Kingdom of Saudi Arabia Ministry of Education Saudi Electronic University College of Administrative and Financial Sciences Assignment-3 MGT324-Public Management Due Date: 30/11/2024 @ 23:59 Course Name: Public Management Course Code:

    Management Question

    Description I have as assignment please make sure there is no plagiarism ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية اإللكترونية‬ Kingdom of Saudi Arabia Ministry of Education Saudi Electronic University College of Administrative and Financial Sciences Assignment 3 Management of Technology (MGT 325) Deadline: 30/11/2024 @ 23:59 Course Name: Management

    Management Question

    Description I have as assignment please make sure there is no plagiarism ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية اإللكترونية‬ Kingdom of Saudi Arabia Ministry of Education Saudi Electronic University College of Administrative and Financial Sciences Assignment 3 Strategic Management (MGT 401) Due Date: 30/11/2024 @ 23:59 Course Name: Strategic

    Management Question

    Description I want to solve the attached assignment, and please follow the instructions described on the main page, and I hope that there will be no similar version of another solution, and I want the solution in detail, not in short ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية اإللكترونية‬ Kingdom

    mng403.badeer

    Description Hello, I hope you pay attention. I want correct and perfect work. I want all the questions to be solved correctly and completely without plagiarism. I emphasize this important point. Any percentage of plagiarism will lead to the cancellation of the work. I want a correct solution with references

    periodic report

    Description I did a product journey map The secret customer A correction plan for a product website An internal campaign to raise awareness about the customer experience and its importance Participated with the research and marketing department to work on a survey to measure the satisfaction of internal employees Trained

    Management Question

    Description ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية اإللكترونية‬ Kingdom of Saudi Arabia Ministry of Education Saudi Electronic University College of Administrative and Financial Sciences Assignment 3 MGT101 (1st Term 2024-2025) Deadline: 30/11/2024 @ 23:59 (To be released to students on BB in Week 10) Course Name: Principles of Management

    Finance Question

    Description I want to solve the attached assignment and please follow the instructions described on the main page of the duty and I hope that there are no copies or similarities and if I want the solution to be short, I want details ‫المملكة العربية السعودية‬ ‫وزارة التعليم‬ ‫الجامعة السعودية

    Mgt430 internship

    Description I’m working on my final report and presentation for my internship I need support. All details are attached please follow requirement. The report should be submitted within two weeks after you finish your Co-op training Program. In addition, the report should be approximately 3000 – 4000, single –spaced and