# Data Development Process-Continued

#### Union

Union is a binary operation that takes two relations, R and S, as input and produces an output relation that contains all tuples that are either in R or S, or both R and S. Duplicate tuples are filtered out.

R and S are relationships, and t is a tuple variable.

R and S must be compatible with one other. To be union-compatible, the relationships must have the same number of attributes and the same domains.

It’s worth noting that the attribute names in both relations don’t have to be identical. The first relation’s characteristics are present in the result.

**For Example**

#### Set Difference

Set difference is a binary operation that takes two r elations R and S as input and pro duces an output relation that contains all the tuples of R that are not in S.

When R and S are relations, t is a tuple variable.

Take note of the following:

- R – S S – R
- The union of R and S must be possible.

**For Example**

#### Intersection

Intersection is a binary operation that takes two relations, R and S, as input and outputs a relation that contains all tuples in both R and S.

t is a tuple variable, and R, S are relations.

The union of R and S must be possible.

Also note that, *R * *S *= *R *– (*R *– *S*) = *S *– (*S *– *R*).

**For Example,**

#### Cartesian product

The Cartesian product of two degree k1 relations R and S is:

R S is a degree (k1 + k2) relation that consists of all (k1 + k2)-tuples, where each tuple is a concatenation of one R tuple and one S tuple.

|R| * |S| is the cardinality of R x S.

The cross product is another name for the Cartesian product.

**For Example**

You might be interested in: Introduction to Database Systems

#### Join

Theta join is a Cartesian product derivative. We only pick a subset of those tuples from R and S that meet a specific condition on F: instead of collecting all combinations of tuples from R and S, we only select a subset of those tuples that satisfy a particular condition on F:

- The variables R and S are relations, while t is a tuple variable.
- F(t) is a formula that is defined as a selection formula.

**For Example**

#### Types of EquiJoin

The ϴ-Join is a general join in the sense that it accepts any expression in condition F. More specialized joins, on the other hand, are frequently employed.

In formula F, an equijoin only contains the equality opertor (=).

WorksOn ⨝ *WorksOn.pn**o** **= Proj.pno *Proj

The equijoin of R and S over a set of attributes common to both R and S is called a natural join over two relations R and S, represented by R ⨝ S.

- It gets rid of the join characteristics’ “excess copies.”
- In both relations, the characteristics must have the same name.

**Example of Equijoin:**

**Example of Natural Join**

#### Outer Joins

Outer joins are used when performing a join “loses” some of the relations’ tuples. Dangling tuples are what they’re called.

There are three different types of outer joins:

1. **Left outer join** – The output comprises all R tuples that correspond to S tuples. If a tuple in R matches a tuple in S that does not exist, the tuple is included in the final result and padded with nulls for S’s attributes.

2. **Right outer join** – The result includes all S tuples that match R tuples. If a tuple in S matches a tuple in R that does not exist, the tuple is included in the final result and padded with nulls for R’s attributes.

3. **Full outer join** – Whether or not R and S have a matching tuple in the other relation, all R and S tuples are included in the result.

**Example of Right Outer Join**

#### Semi-Join and Outer-Join

A semi-join between two tables produces rows from the first table that have one or more matches in the second table.

In SQL, semi-joins are utilised in EXISTS and in CONSTRUCTS.

An anti-join between two tables returns rows from the first table where there are no matches in the second.

NOT EXISTS, NOT IN, and FOR ALL all employ anti-joins.

The counterpart of semi-join is anti-join: R ⊳S = R – R ⋉ S

**Semi-Join Example**

**Anti-Join Example**

#### Aside: Division Operator

There is also a division operator that is used to determine if all possible combinations of a connection exist.

E.g. Return a list of all workers who work on all of ‘John Smith’s’ projects

We won’t spend any time on the division operator because it isn’t a basic operator and isn’t frequently utilised.

Note that:

#### Combining Operators

Relational algebra operations can be combined in one expression by string operator:

When employee ‘J. Doe’ has worked on a project for more than 16 months, return his eno, pno, and duration.

Operations can also be combined by storing intermediate outcomes in temporary relation variables.

The assignment operator ¬ will be used to indicate that the outcome of an operation will be assigned to a temporary relation.

#### Rename Operator

When assigning a result, renaming is an option:

result(EmployeeNum, ProjectNum, Duration) ¬ P* _{eno,pno,dur} *(

*empwo*)

Alternatively, you may use the rename operator (rho)

#### Operator Precedence

The relati onal operat ors, like math ematic al operat ors, take precedence. From highest to lowest, the following operators take precedence:

- unary operators – s, P, r
- Cartesian product and joins – X, ⨝
- intersection and division
- union and a set

The order of operations can be changed using parentheses.

Because there is no general agreement on operator precedence, we always utilize parentheses and the argument for both unary and binary operations.

Also Read: Database Development Process