
SQL

- Data warehouse: central repository /rɪˈpɒzɪt(ə)ri/ of data; the data is basically
stored in databases and tables.
- Database: an organized collection of data for rapid search and retrieval /rɪˈtriːvl/.
- Database management system (DBMS): system software for creating and
managing databases.
- DBMS types:
o Relational Database Management System: the data is stored in tables
containing rows and columns. Relationships are established through
Primary /ˈprʌɪm(ə)ri/ and Foreign /ˈfɒrɪn/ keys.
 Advantages: handles many queries and transactions; ACID compliance.
 Disadvantages: not suited to storing complex or very large data; can be costly.
o Non-Relational Database Management System: provides a mechanism for
storage and retrieval of data that is modeled /ˈmɒd(ə)l/ in means other
than the tabular relations used in relational databases.
 Advantages: handles unstructured and semi-structured data; flexible and efficient.
- Data integrity is the overall completeness /kəmˈpliːtnəs/, accuracy /ˈakjʊrəsi/
and consistency /kənˈsɪst(ə)nsi/ of data.
This can be indicated by the absence /ˈabs(ə)ns/ of alteration /ɔːltəˈreɪʃ(ə)n/
between two instances or between two updates of a data record, meaning data
is intact /ɪnˈtakt/ and unchanged
o Entity /ˈɛntɪti/ Integrity: based on primary keys, which must be unique and
not null.
o Referential /ˌrɛfəˈrɛnʃ(ə)l/ Integrity: based on foreign keys. A foreign key
value has two valid states: it refers to an existing primary key value of
another table, or it is NULL (no relationship).
o Domain /də(ʊ)ˈmeɪn/ Integrity: every column value must belong to the
column's defined domain (data type, format, range of allowed values).
The concept of data integrity ensures that all data in a database can be traced
and connected to the other data. This ensures that everything is recoverable
/rɪˈkʌvərəbl/ and searchable /ˈsəːtʃəb(ə)l/
- SQL command types:
o Data Manipulation /məˌnɪpjʊˈleɪʃ(ə)n/ Language (DML): commands for
inserting, deleting, updating, and selecting data from the database.
o Data Definition Language (DDL): commands for creating tables, defining
relationships, and controlling structural aspects.
o Data Control Language (DCL): commands that define access rights (e.g., GRANT, REVOKE).
- PostgreSQL: uses a client/server model; clients connect to the server across a
network via TCP/IP (over a LAN or the Internet).
- Query: the act of retrieving information from a database.
- A transaction is a very small unit of a program, and it may contain several
low-level tasks.
- A transaction in a database system must maintain Atomicity, Consistency,
Isolation, and Durability
o Atomicity: a transaction is all or nothing. The database must be in the state
either before the transaction executed or after it completed; if the transaction
fails, the database returns to the state before it started.
o Consistency: the database must remain in a consistent state after any
transaction. No transaction should have any adverse effect on the data residing
in the database.
o Isolation: when more than one transaction is being executed simultaneously
and in parallel, no transaction affects the others; each behaves as if it were
running alone.
o Durability: makes sure that transactions which have been committed will
be stored permanently.
- Transaction states:
o Active: the transaction is being executed.
o Partially Committed: the transaction has executed its final operation.
o Failed: a check made by the database recovery system has failed.
o Aborted: the recovery manager rolls back all its write operations on the
database to bring the database back to its original state. The transaction
can then be restarted or killed.
o Committed: a transaction executes all its operations successfully, its
effects are now permanently established on the database system.
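- Transactions in PostgreSQL: BEGIN starts a transaction; COMMIT makes its changes
permanent and visible to other sessions; SAVEPOINT marks a point inside the
transaction; ROLLBACK undoes the changes of the current transaction (ROLLBACK TO
SAVEPOINT undoes only the changes made after that savepoint).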
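The transaction commands above can be tried with a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; SQLite accepts BEGIN, SAVEPOINT, ROLLBACK TO SAVEPOINT and COMMIT with the same meaning in this example. The accounts table and its values are hypothetical.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT,
# so the transaction statements below run exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")

conn.execute("BEGIN")
# Transfer 30 from alice to bob.
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
conn.execute("SAVEPOINT before_bonus")
# A later change inside the same transaction that we decide to undo.
conn.execute("UPDATE accounts SET balance = balance + 1000 WHERE name = 'bob'")
conn.execute("ROLLBACK TO SAVEPOINT before_bonus")  # undoes only the bonus
conn.execute("COMMIT")                              # the transfer becomes permanent

balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(balances)  # {'alice': 70, 'bob': 80}
```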
- Denormalization is a database optimization technique in which we add
redundant data to one or more tables to avoid costly joins. It is applied after normalization.
o Retrieving data is faster and simpler.
o Updates and inserts are more expensive.
o Need more storage
o Data may be inconsistent
- Entities: an entity is an object that exists; its data can be stored in a table.
- Relationships: Relation or link between entities
o One to one
o One to many
o Many to one
o Self-reference
- Operator:
o Arithmetic /ˌarɪθˈmɛtɪk/ operators
o Logical /ˈlɒdʒɪk(ə)l/ operators
o Comparison /kəmˈparɪs(ə)n/ operators
- Primary key: a column or a group of columns used to identify a row uniquely in
a table. A primary key constraint is a combination of a NOT NULL constraint and a
UNIQUE constraint. (Set it in CREATE TABLE, or add it with ALTER TABLE xxx ADD
PRIMARY KEY (...); remove it with ALTER TABLE xxx DROP CONSTRAINT; the default
constraint name is table-name_pkey; a named, multi-column key is written
CONSTRAINT constraint_name PRIMARY KEY (column1, column2).)
- Foreign key: a column or group of columns in a relational database table that provides
a link between data in two tables; its values refer to a key that uniquely identifies
a row in another table. (Define it with REFERENCES table(columns), or with
ALTER TABLE __ ADD CONSTRAINT __ FOREIGN KEY (...) REFERENCES table(...).)
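A minimal sketch of primary and foreign key constraints, using Python's sqlite3 module as a stand-in for PostgreSQL. One assumption to keep in mind: SQLite enforces foreign keys only after PRAGMA foreign_keys = ON and words its error messages differently, but the rules shown are the same. The customers and orders tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on; PostgreSQL always does.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER,
        name        TEXT NOT NULL,
        CONSTRAINT customers_pkey PRIMARY KEY (customer_id)
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id)
    )""")
conn.execute("INSERT INTO customers VALUES (1, 'Ann')")
conn.execute("INSERT INTO orders VALUES (10, 1)")     # refers to an existing customer
conn.execute("INSERT INTO orders VALUES (11, NULL)")  # NULL foreign key: no relationship


def insert_error(sql):
    """Run an INSERT; return None on success, or the constraint error message."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


dup_pk = insert_error("INSERT INTO customers VALUES (1, 'Bob')")  # duplicate primary key
bad_fk = insert_error("INSERT INTO orders VALUES (12, 99)")       # customer 99 does not exist
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(dup_pk)
print(bad_fk)
```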
- Available privileges /ˈprɪvɪlɪdʒ/ in SQL:
o System privilege: the right to perform one or more actions at the system
level, including administrative actions.
o Object privilege: the right to perform actions on a specific object (e.g., SELECT on a table).
- Difference between primary key and foreign key:
o A primary key cannot accept NULL values, while a foreign key can accept
multiple NULL values.
o A table can have only one primary key, while it can have one or more
foreign keys.
o A primary key uniquely identifies a record in the table, while a foreign key is a
field in a table that refers to the primary key of another table.
o In SQL Server, the primary key is by default a clustered index, and the data in the
table is physically organized in the order of the clustered index (PostgreSQL does not
maintain clustered indexes; its primary key is an ordinary B-tree index). A foreign key
does not automatically create an index.
- Unique keys (unique constraint): a unique constraint can be used to ensure rows
are unique within a table. A table may have several sets of columns that
you want to be unique.
o There can be multiple unique keys defined on a table.
o In SQL Server, unique keys result in NONCLUSTERED unique indexes by default.
o One or more columns make up a unique key.
o The column may be NULL; SQL Server allows only one NULL per column, while
PostgreSQL by default allows multiple NULLs.
o A unique constraint can be referenced by a foreign key constraint.
- Primary key vs unique key: the difference comes from their intended use.
o A primary key's main job is to uniquely identify a row within a table; this is
called entity integrity. For a table to be truly relational, it must have a
primary key defined.
o A unique key's main job is to let you place additional uniqueness
conditions on your columns.

- Constraints: rules that limit the data that can be stored in a column or table:
NOT NULL, CHECK, DEFAULT, UNIQUE, PRIMARY KEY, FOREIGN KEY.
- Check constraint: a constraint that requires a value in a column to meet a specific
condition before it is inserted or updated. (In CREATE TABLE:
CHECK (...); later: ALTER TABLE __ ADD CONSTRAINT __ CHECK (...).)
- Not null constraint: ensures that the value of a column is not NULL. (ALTER TABLE __
ALTER COLUMN __ SET NOT NULL)
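The constraints above can be sketched with Python's sqlite3 module standing in for PostgreSQL; the products and tags tables are hypothetical. Each failed insert raises sqlite3.IntegrityError, and the last lines show that a nullable UNIQUE column accepts more than one NULL, which is also PostgreSQL's default behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        sku   TEXT NOT NULL UNIQUE,
        name  TEXT NOT NULL,
        price REAL CHECK (price > 0),
        stock INTEGER DEFAULT 0
    )""")


def insert_error(sku, name, price):
    """Insert one product; return None on success, or the constraint error message."""
    try:
        conn.execute("INSERT INTO products (sku, name, price) VALUES (?, ?, ?)",
                     (sku, name, price))
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


ok = insert_error("A1", "Pen", 1.5)
check_err = insert_error("A2", "Ink", -3)      # violates CHECK (price > 0)
null_err = insert_error("A3", None, 2.0)       # violates NOT NULL on name
dup_err = insert_error("A1", "Pen copy", 2.0)  # violates UNIQUE on sku
# stock was not supplied, so its DEFAULT (0) was used.
stock = conn.execute("SELECT stock FROM products WHERE sku = 'A1'").fetchone()[0]

# A nullable UNIQUE column accepts several NULLs in SQLite (and in PostgreSQL by default).
conn.execute("CREATE TABLE tags (label TEXT UNIQUE)")
conn.executemany("INSERT INTO tags VALUES (?)", [(None,), (None,)])
null_labels = conn.execute("SELECT COUNT(*) FROM tags WHERE label IS NULL").fetchone()[0]
```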

- Types of user-defined functions (SQL Server):
o Scalar functions
o Inline table-valued functions
o Multi-statement table-valued functions
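- Collation: a set of rules that determines how data is sorted and
compared.
o Case sensitivity: whether A and a (or B and b) are treated as different.
o Kana sensitivity: whether the two types of Japanese kana (hiragana and katakana) are treated as different.
o Width sensitivity: whether single-byte and double-byte versions of the same character are treated as different.
o Accent sensitivity: whether accented and unaccented characters (e.g., a and á) are treated as different.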
- Local and Global variables:
o Local: can be used or exist only inside the function
o Global: can be accessed throughout the program
- Auto increment in SQL: auto-increment allows a unique number to be generated
automatically when a new record is inserted into a table, usually for the primary key. In
PostgreSQL, use the SERIAL pseudo-type in CREATE TABLE (or, since PostgreSQL 10, an
identity column: GENERATED ALWAYS AS IDENTITY).
- Query: A query is a request for data or information from a database table or
combination of tables.
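- Subquery:
o A query inside another query.
o A non-correlated subquery can be executed first, once; a correlated subquery is
re-evaluated for each row of the outer query.
o Can be used with comparison operators.
o Can be nested inside a SELECT, INSERT, UPDATE, DELETE, or another subquery.
o Types:
 Correlated /ˈkɒrələt/ subquery: references columns of the outer query, so it is
not independent.
 Non-correlated subquery: independent; it can run on its own.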
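A hedged sketch contrasting the two subquery types, using Python's sqlite3 module as a stand-in for PostgreSQL; the employees table and salaries are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 'IT', 5000), ('Ben', 'IT', 3000),
        ('Cid', 'HR', 2500), ('Dee', 'HR', 1500);
""")

# Non-correlated: the inner query does not mention the outer row,
# so it can be evaluated once (company average = 3000).
above_company_avg = [r[0] for r in conn.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name""")]

# Correlated: the inner query uses e.dept from the outer row,
# so conceptually it is re-evaluated for every employee.
above_dept_avg = [r[0] for r in conn.execute("""
    SELECT e.name FROM employees AS e
    WHERE e.salary > (SELECT AVG(salary) FROM employees WHERE dept = e.dept)
    ORDER BY e.name""")]

print(above_company_avg)  # ['Ann']
print(above_dept_avg)     # ['Ann', 'Cid']
```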
- Normalization of a database: a technique of organizing the data. Normalization
is a systematic approach of decomposing tables to eliminate data
redundancy (repetition) and undesirable /ʌndɪˈzʌɪərəb(ə)l/ characteristics.
o Two purposes: eliminating redundant data, and ensuring data dependencies make
sense (data is logically stored).
o If a table is not properly normalized and has data redundancy, it wastes
storage, and the data becomes difficult to handle and update.
o Normal form:
 First normal form (1NF):
 It should only have single-valued (atomic) attributes/columns.
 Values stored in a column should be of the same kind (type).
 Each attribute/column should have a unique name.
 The order in which data is stored doesn't matter.
 Second normal form (2NF):
 It should be in the First Normal form.
 And it should not have partial dependency. (Dependency: a row can be fetched
using the primary key, i.e., the other columns depend on it.)
 Partial dependency exists when, for a composite primary
key, an attribute in the table depends only on a part of the
primary key and not on the complete primary key.
 Solution: move the columns that depend on only part of the candidate
key into another table.
 Third Normal Form (3NF):
 It is in the Second Normal form.
 And it doesn't have transitive dependency. (A transitive dependency exists when
a non-prime attribute depends on other non-prime attributes rather than depending
on the primary key.)
 Solution: move the attributes involved in the transitive dependency into a
separate table.
 Boyce-Codd Normal Form (BCNF, pronounced "bois co:d"):
 It is in Third Normal Form.
 For every dependency X → Y ("X determines Y"), X must be a
super key (if a prime attribute Y depends on a non-prime attribute X,
the table is not in BCNF).
 Solution: decompose the table to break that dependency.
 Example: R(A, B, C, D) with A → BCD, AB → CD, D → B. D is not a super
key, so D → B violates BCNF; decompose into R1(A, C, D) and R2(D, B).
 Fourth Normal Form (4NF):
 It is in Boyce-Codd Normal Form.
 It doesn't have multi-valued dependency. (A multi-valued dependency
A →→ B exists when, for a single value of A, multiple values of B exist
independently of the other columns; it requires at least 3 columns.)
 Solution: decompose the table into 2 tables.
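The BCNF rule can be checked mechanically with attribute closure: X is a super key exactly when the closure of X contains every attribute of the relation. The plain-Python sketch below (the function names are hypothetical) applies this to the R(A, B, C, D) example in the notes; it only checks the dependencies as given and does not project them onto the decomposed tables.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Non-trivial FDs whose left side is not a super key of `relation`."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)                  # skip trivial FDs
            and not set(relation) <= closure(lhs, fds)]  # lhs is not a super key


# The example from the notes: R(A, B, C, D) with A -> BCD, AB -> CD, D -> B.
R = "ABCD"
FDS = [("A", "BCD"), ("AB", "CD"), ("D", "B")]
print(bcnf_violations(R, FDS))  # [('D', 'B')]

# After decomposing: A determines all of R1(A, C, D) and D determines all of R2(D, B).
r1_keyed = set("ACD") <= closure("A", FDS)
r2_keyed = set("DB") <= closure("D", FDS)
print(r1_keyed, r2_keyed)  # True True
```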
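- PostgreSQL is a general-purpose, object-relational database management
system, described by the project itself as the most advanced open source database system.
- ILIKE (PostgreSQL) acts like LIKE but is case-insensitive.
- Bracket wildcards (SQL Server LIKE; in PostgreSQL, LIKE supports only % and _,
so use SIMILAR TO or regular expressions for these):
o [ ]: any single character within the brackets
o - : a range of characters inside the brackets, e.g., [a-c]
o ^ : any single character not in the brackets, e.g., [^a-c]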
- SQL JOIN: combines rows from two or more tables based on related columns. Six types:
o INNER JOIN selects records that have matching values in both tables.
o LEFT JOIN returns all rows from the left table, and the matched rows
from the right table.
o RIGHT JOIN returns all rows from the right table, and the matched rows
from the left table.
o FULL JOIN returns all rows from both tables, matched where possible, with
NULLs where there is no match on the other side.
o CROSS JOIN returns all records where each row from the first table is
combined with each row from the second table (i.e., the Cartesian
product of the sets of rows from the joined tables).
o NATURAL JOIN creates an implicit join based on the columns with the same
names in the joined tables.
 A natural join can be inner, left, or right. The default is inner.
 It does not require an explicit join condition (ON or USING).
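A minimal sketch of several join types with Python's sqlite3 module as a stand-in for PostgreSQL; the customers and orders data is hypothetical. RIGHT and FULL JOIN are left out because SQLite only supports them from version 3.39, so older Python builds would reject them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 'pen'), (11, 1, 'ink'), (12, 3, 'cup');
""")


def rows(sql):
    return conn.execute(sql).fetchall()


# Only customers with matching orders (Ben and the order for customer 3 drop out).
inner = rows("""SELECT c.name, o.item FROM customers c
                INNER JOIN orders o ON o.customer_id = c.customer_id
                ORDER BY o.order_id""")
# Every customer; Ben has no order, so his item is NULL (None in Python).
left = rows("""SELECT c.name, o.item FROM customers c
               LEFT JOIN orders o ON o.customer_id = c.customer_id
               ORDER BY c.customer_id, o.order_id""")
# Cartesian product: 2 customers x 3 orders.
cross = rows("SELECT COUNT(*) FROM customers CROSS JOIN orders")
# Implicitly joins on the shared column name customer_id.
natural = rows("""SELECT name, item FROM customers NATURAL JOIN orders
                  ORDER BY order_id""")
```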
- Group by: divides rows into groups and applies an aggregate function on each.
- UNION:
o UNION combines the result sets of two or more SELECT statements.
o UNION removes duplicate rows.
o UNION ALL keeps duplicate rows.
o UNION ALL is faster than UNION because it does not need to remove duplicates.
o Two rules:
 Both result sets must have the same number of columns.
 The corresponding columns must have compatible data types.
- INTERSECT returns the rows that appear in both result sets, i.e., that are returned by
both queries.
- EXCEPT returns distinct rows from the first (left) query that are not in
the output of the second (right) query.
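A small sketch of the set operators using Python's sqlite3 module; tables a and b are hypothetical single-column tables, and the final ORDER BY applies to the combined result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")


def col(sql):
    return [r[0] for r in conn.execute(sql)]


union = col("SELECT x FROM a UNION SELECT x FROM b ORDER BY x")          # duplicates removed
union_all = col("SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x")  # duplicates kept
intersect = col("SELECT x FROM a INTERSECT SELECT x FROM b ORDER BY x")  # in both
except_ = col("SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x")       # in a but not b
```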
- BETWEEN and IN:
o BETWEEN selects values within a range (both ends included).
o IN selects values that match any value in a list of values.
- Comparison operators in subquery:
o IN, ANY, ALL.
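- SQL clauses such as WHERE and HAVING limit the result set by providing a condition to the query.
- HAVING applies a condition to groups:
o It is evaluated after GROUP BY.
o It can be used without GROUP BY (the whole result is then treated as one group).
o It can only refer to grouping columns or aggregate functions.
- WHERE filters individual rows based on a specified condition, before grouping.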
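A minimal sketch of the order in which WHERE, GROUP BY and HAVING act, using Python's sqlite3 module and a hypothetical sales table. Only the GROUP BY form of HAVING is shown, since SQLite versions before 3.39 reject HAVING without GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('north', 100), ('north', 250), ('south', 40),
        ('south', 30), ('east', 500), ('west', 60);
""")

# 1. WHERE removes individual rows (amount < 50) before grouping.
# 2. GROUP BY builds one group per region and SUM() aggregates each group.
# 3. HAVING keeps only the groups whose total is above 200.
totals = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount >= 50
    GROUP BY region
    HAVING SUM(amount) > 200
    ORDER BY region""").fetchall()
print(totals)  # [('east', 500), ('north', 350)]
```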
- What is the difference between the RANK() and DENSE_RANK() functions:
o In cases where there is a “tie” (same value).
o RANK() will assign non-consecutive “ranks”.
o DENSE_RANK() will assign consecutive ranks.
o For example, consider the set {25, 25, 50, 75, 75, 100} . For such a
set, RANK() will return {1, 1, 3, 4, 4, 6} (note that the values 2 and 5
are skipped), whereas DENSE_RANK() will return {1,1,2,3,3,4} .
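The {25, 25, 50, 75, 75, 100} example can be reproduced with window functions. This sketch uses Python's sqlite3 module as a stand-in for PostgreSQL and assumes SQLite 3.25 or newer (the first version with RANK() and DENSE_RANK()); the scores table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (v INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?)",
                 [(v,) for v in (25, 25, 50, 75, 75, 100)])

ranked = conn.execute("""
    SELECT v,
           RANK()       OVER (ORDER BY v) AS rnk,   -- gaps after ties
           DENSE_RANK() OVER (ORDER BY v) AS dense  -- no gaps
    FROM scores
    ORDER BY v""").fetchall()

rank = [r[1] for r in ranked]
dense_rank = [r[2] for r in ranked]
print(rank)        # [1, 1, 3, 4, 4, 6]
print(dense_rank)  # [1, 1, 2, 3, 3, 4]
```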

- What is the difference between IN and EXISTS?
o IN:
 Works on a list result set.
 Doesn't work on subqueries that return virtual tables with multiple
columns.
 Compares every value in the result list.
 Often slower for a large subquery result set.
o EXISTS:
 Works on virtual tables.
 Is used with correlated queries.
 Stops comparing as soon as a match is found.
 Often faster for a large subquery result set (modern optimizers, including
PostgreSQL's, can often plan both as the same semi-join).
- EXISTS checks for the existence of rows returned by a subquery.
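A hedged sketch showing IN and EXISTS answering the same question (customers with at least one order) through Python's sqlite3 module; the tables are hypothetical, and no performance difference will be visible at this size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cid');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# IN: the subquery returns a one-column list; each customer_id is compared to it.
with_in = conn.execute("""
    SELECT name FROM customers
    WHERE customer_id IN (SELECT customer_id FROM orders)
    ORDER BY name""").fetchall()

# EXISTS: a correlated subquery that only asks whether a matching row exists,
# so the SELECT list inside it does not matter.
with_exists = conn.execute("""
    SELECT name FROM customers AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.customer_id)
    ORDER BY name""").fetchall()
```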
- The ANY and ALL operators are used with a WHERE or HAVING clause.
o ANY returns true if any of the subquery values meet the condition.
o ALL returns true if all of the subquery values meet the condition.
o For both, the subquery must return exactly one column.
- Common Table Expressions (CTEs, written with WITH) are temporary in the sense that
they only exist during the execution of the query. They are used to simplify complex
joins and subqueries.
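A minimal sketch of a plain CTE and a recursive CTE (WITH RECURSIVE, which PostgreSQL also supports), run through Python's sqlite3 module; the staff table and its reporting chain are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES (1, 'Ann', NULL), (2, 'Ben', 1), (3, 'Cid', 2), (4, 'Dee', 1);
""")

# Plain CTE: a named, temporary result set used by the main query.
team_sizes = conn.execute("""
    WITH team_size AS (
        SELECT manager_id, COUNT(*) AS n
        FROM staff
        WHERE manager_id IS NOT NULL
        GROUP BY manager_id
    )
    SELECT s.name, t.n
    FROM team_size AS t
    JOIN staff AS s ON s.id = t.manager_id
    ORDER BY s.name""").fetchall()

# Recursive CTE: walks the reporting chain from Cid up to the top manager;
# it stops when a row's manager_id is NULL and the join finds nothing.
chain = [r[0] for r in conn.execute("""
    WITH RECURSIVE up(name, manager_id, depth) AS (
        SELECT name, manager_id, 0 FROM staff WHERE name = 'Cid'
        UNION ALL
        SELECT s.name, s.manager_id, up.depth + 1
        FROM staff AS s JOIN up ON s.id = up.manager_id
    )
    SELECT name FROM up ORDER BY depth""")]
print(chain)  # ['Cid', 'Ben', 'Ann']
```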
- SQL functions use:
o To perform some calculations on the data
o To modify individual data items
o To manipulate the output
o To format dates and numbers
o To convert the data types
- Recursive stored procedure: a stored procedure that calls itself, so the same set of code can be run n times (until a terminating condition is met).
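- Difference between DELETE and TRUNCATE statements:
o DELETE
 deletes rows (those matching a WHERE condition, or all rows)
 can be rolled back
 DML command
 slower
o TRUNCATE
 deletes all the rows
 cannot be rolled back in some systems (e.g., MySQL, Oracle); in PostgreSQL it is
transaction-safe and can be rolled back
 DDL command
 faster
- Difference between DROP and TRUNCATE commands:
o DROP
 removes a table (its structure and data)
 cannot be rolled back in systems that auto-commit DDL; in PostgreSQL, DROP TABLE
inside a transaction can be rolled back
o TRUNCATE
 removes all the rows, keeping the table structure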
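A sketch of DELETE and DROP inside an explicit transaction, using Python's sqlite3 module. SQLite has no TRUNCATE statement (a DELETE without WHERE plays that role), so this only demonstrates that, as in PostgreSQL, both row deletion and DROP TABLE are undone by ROLLBACK; the logs table is hypothetical.

```python
import sqlite3

# Autocommit mode, so BEGIN and ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, msg TEXT)")
conn.executemany("INSERT INTO logs (msg) VALUES (?)", [("a",), ("b",), ("c",)])


def count():
    return conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]


conn.execute("BEGIN")
conn.execute("DELETE FROM logs WHERE msg = 'a'")  # removes only the matching row
after_where = count()                             # 2
conn.execute("DELETE FROM logs")                  # closest SQLite analogue of TRUNCATE
after_all = count()                               # 0
conn.execute("ROLLBACK")
after_rollback = count()                          # 3: both deletes were undone

# DDL is transactional in SQLite (and in PostgreSQL), so DROP TABLE can be undone too.
conn.execute("BEGIN")
conn.execute("DROP TABLE logs")
conn.execute("ROLLBACK")
still_there = count()                             # 3
```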
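- VARCHAR and CHAR:
o VARCHAR(n): variable-length with a limit; uses dynamic memory allocation.
 If you don't specify n, VARCHAR accepts strings of any length, like TEXT (PostgreSQL).
 Advantage: the database checks the length and raises an error if you insert a
longer string into the column.
o CHAR(n): fixed-length string value, blank-padded; uses static memory allocation.
(The 255-character maximum applies to MySQL; PostgreSQL allows much longer values.)
 Without n, CHAR = CHAR(1).
 In PostgreSQL, if a string is longer than n but the extra characters are all
spaces, it is truncated to n instead of raising an error.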
- Create a database:
o Db name: must be unique in the PostgreSQL database server
o OWNER: role_name of the user who will own the new database
o TEMPLATE: the name of the template database from which the new
database is created (template1 by default).
o ENCODING: specifies the character set encoding for the new database.
o LC_COLLATE: specifies a collation for the new database. The collation
specifies the sort order of strings, which affects the result of the ORDER BY
clause in the SELECT statement.
o LC_CTYPE: specifies the character classification for the new database.
o Tablespace: specifies the tablespace name for the new database.
o CONNECTION LIMIT: specifies the maximum concurrent connections to
the new database. The default is -1 i.e., unlimited. ( To set a limit on the
number of users simultaneously /ˌsɪmlˈteɪnɪəsli/ accessing the instance)
- Alter database: using ALTER DATABASE
o Rename database: RENAME TO. Before renaming:
 connect to another database, to disconnect from the database you
want to rename
 check all active connections to the database with a query on
pg_stat_activity WHERE datname = 'db'
 terminate all connections to it (pg_terminate_backend), to avoid data loss
o Change owner: OWNER TO (if the new owner role doesn't exist yet, create it first,
e.g., CREATE ROLE ... VALID UNTIL 'infinity').
o Change tablespace: SET TABLESPACE.
o DROP DATABASE: delete a database.
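- Getting the size of tables, databases, indexes, tablespaces, and values:
o Table: pg_relation_size() (or pg_total_relation_size() to include indexes);
wrap the result in pg_size_pretty() for a readable format.
o Database: pg_database_size(); indexes: pg_indexes_size();
tablespace: pg_tablespace_size(); value: pg_column_size().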
- Copy a database:
o dump the source database to a file (pg_dump).
o copy the dump file to the remote server.
o create a new database in the remote server.
o restore the dump file on the remote server (psql, or pg_restore for custom-format dumps).
- A sequence in PostgreSQL generates a sequence of integers based on a given
specification (CREATE SEQUENCE with INCREMENT, START, MINVALUE/MAXVALUE, ...;
get the next value with SELECT nextval('sequence_name')).
- Schema SQL: a schema is a namespace that contains named database objects
such as tables, views, indexes, data types, functions, and operators. A database
can contain one or multiple schemas while each schema belongs to only one
database.

o Schemas allow you to organize database objects e.g., tables into logical
groups to make them more manageable.
o Schemas enable multiple users to use one database without interfering
with each other.
o PostgreSQL automatically creates a schema named public in every new database.
- Type of Relationships
o One-one relationships (1-1 Relationship)
 Defined as a relationship between two tables where each row of one table
is associated with at most one matching row of the other.
 Implemented with a primary key and a foreign key that has a UNIQUE constraint.
o One-Many relationships (1-M Relationship)
 Defined as a relationship between two tables where a row from one
table can have multiple matching rows in another table.
 Implemented with a primary key and a foreign key.
- Trigger in SQL:
o A trigger is a special user-defined function invoked /ɪnˈvəʊk/ automatically whenever an
event associated with a table occurs. An event could be any of the
following data modifications: INSERT, UPDATE, DELETE, or TRUNCATE.
o Two main types, depending on how many times the trigger is invoked: row-level
triggers (once per affected row) and statement-level triggers (once per statement).
o Useful for:
 Implementing a data change log for a table, to keep a history of the
data.
 Preserving data or referential integrity when the application
programmers cannot be trusted to do so consistently.
 Maintaining complex data integrity rules that cannot be implemented
elsewhere except at the database level. For example, when a new
row is added to the customer table, other rows must also be
created in the bank and credit tables.
o Create a trigger:
 First, create a trigger function using the CREATE FUNCTION statement.
 Second, bind the trigger function to a table using the CREATE
TRIGGER statement.
 NEW and OLD hold the new and old state of the row.
 Timing: BEFORE, AFTER, or INSTEAD OF (INSTEAD OF is only for views).
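A minimal row-level trigger that keeps a change log, sketched with Python's sqlite3 module. This is SQLite syntax, where the trigger body is written inline; in PostgreSQL you would first write a trigger function (CREATE FUNCTION ... RETURNS trigger) and then bind it with CREATE TRIGGER. The products and price_log tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE price_log (product_id INTEGER, old_price REAL, new_price REAL);

    -- Row-level AFTER UPDATE trigger: OLD and NEW hold the row before and after.
    CREATE TRIGGER log_price_change
    AFTER UPDATE OF price ON products
    FOR EACH ROW
    WHEN OLD.price <> NEW.price
    BEGIN
        INSERT INTO price_log VALUES (OLD.id, OLD.price, NEW.price);
    END;

    INSERT INTO products VALUES (1, 'pen', 1.0), (2, 'ink', 5.0);
    UPDATE products SET price = 1.5 WHERE id = 1;  -- price changed: logged
    UPDATE products SET price = 5.0 WHERE id = 2;  -- same value: not logged
""")
history = conn.execute("SELECT * FROM price_log").fetchall()
print(history)  # [(1, 1.0, 1.5)]
```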

- View:
o A view is a database object that stores a query. A view is a logical table
that represents data of one or more underlying tables through a SELECT
statement.
o It is a virtual table.
o It consists of a subset of the data.
o Its data is not stored physically (only the query is), so it needs little storage space.
o Views are used for:
 granting users access, through a view, to only the specific data
they are authorized to see
 making complex queries simple
 ensuring data independence
 providing different views of the same data
o Materialized views store their data physically.
 A materialized view caches the result of a complex, expensive
query and lets you refresh this result periodically.
 Materialized views are useful in many cases that require fast
data access, so they are often used in data warehouses and
business intelligence applications.
 Creating one: WITH DATA loads the data at creation time; WITH NO DATA
leaves it empty until it is loaded with REFRESH MATERIALIZED VIEW.
 REFRESH MATERIALIZED VIEW CONCURRENTLY builds a temporary updated version
and applies it without blocking reads of the view (it requires a unique index
on the materialized view).
o WITH CHECK OPTION on an updatable view rejects inserts and updates through the view
that would produce rows not satisfying the view's WHERE condition.
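A minimal sketch of a regular view with Python's sqlite3 module; the employees table and it_directory view are hypothetical. SQLite has no materialized views, so the sketch only shows that a regular view re-runs its query and immediately reflects new rows, whereas a materialized view would keep its old result until REFRESH MATERIALIZED VIEW.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('Ann', 'IT', 5000), ('Ben', 'HR', 3000);

    -- The view stores only this query and exposes a subset of the data
    -- (IT staff, without the salary column).
    CREATE VIEW it_directory AS
        SELECT name, dept FROM employees WHERE dept = 'IT';
""")
before = conn.execute("SELECT * FROM it_directory").fetchall()

conn.execute("INSERT INTO employees VALUES ('Cid', 'IT', 4000)")
# The view re-runs its query, so the new base-table row shows up at once.
after = conn.execute("SELECT * FROM it_directory ORDER BY name").fetchall()
```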
- Stored procedure: a function (routine) that consists of many SQL statements
o Advantages
 supports modular programming
 reusable in many applications
 reduces the number of round trips between applications and
database servers: the application only has to issue one call to
get the result back, instead of sending multiple SQL statements and
waiting for the result between each call
 reduces network traffic and provides better security
o Disadvantage
 it executes only in the database and uses more of the database server's memory
o Block structure
 Two sections: declaration (optional) and body; each declaration and
statement is terminated by a ; (semicolon).
 Declaration /dɛkləˈreɪʃ(ə)n/: where you declare /dɪˈklɛː/ all variables
used within the body section.
 Label: used to refer to the block in an EXIT statement, or to qualify the
names of its variables.
 A subblock is a block inside the body of another block.
 If you declare a variable within a subblock with the same name as
the one in the outer block, the variable in the outer block is
hidden in the subblock.
 Access the outer variable with label.variable.
o Variable: a meaningful name for a memory location. A variable holds a
value that can be changed within the block or function.
o Constant: holds a value that cannot be changed.
o User-defined functions
 Parameter modes: IN / OUT / INOUT / VARIADIC:
 IN: passes a value into the function; it is not returned as part of the result. (Default)
 OUT: parameters are defined as part of the function
arguments list and are returned as part of the result.
 INOUT: a combination of IN and OUT: the caller passes a value in, the function
changes the argument, and the value is returned as part of the
result.
 VARIADIC: accepts a variable number of arguments, passed in as an array.
o A cursor allows us to encapsulate a query and process the result one row
at a time. We use cursors when we want to divide a large result set into
parts and process each part individually; processing it all at once may
cause a memory overflow error.
o Stored procedures support transactions (COMMIT and ROLLBACK inside the procedure,
since PostgreSQL 11); user-defined functions cannot.
o PostgreSQL allows you to extend the database functionality with user-defined
functions and stored procedures.
o Procedural code is used to define functions for triggers or custom
aggregate functions. In addition, procedural languages add many
procedural features, e.g., control structures and complex calculations.
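PL/pgSQL cursors are declared inside a function, but the same idea of reading a large result in parts can be sketched on the client side with the Python DB-API fetchmany() method. This is an analogy run against SQLite, not PL/pgSQL, and the events table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO events (id) VALUES (?)", [(i,) for i in range(1, 11)])


def process_in_batches(batch_size):
    """Read the result set `batch_size` rows at a time and sum each batch."""
    cur = conn.execute("SELECT id FROM events ORDER BY id")
    batch_sums = []
    while True:
        batch = cur.fetchmany(batch_size)  # next chunk; an empty list when exhausted
        if not batch:
            break
        batch_sums.append(sum(row[0] for row in batch))  # "process" this part
    return batch_sums


print(process_in_batches(4))  # [10, 26, 19]
```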

- Index:
o A performance tuning method: indexes are effective tools to enhance
database performance.
o Creates an entry for each indexed value.
o Makes retrieving data faster.
o Disadvantage: indexes add write and storage overheads to the database
system.
o An index is a separate data structure (e.g., a B-tree) that speeds up data
retrieval on a table at the cost of additional writes and storage to maintain
it.
- An index is created on columns of a table or view. For example, with an index on the
primary key, to find a record by a key value SQL first finds the index entry that
corresponds to the value, then uses it to locate the record; it no longer
scans all values of the primary key.
- Index types:
o B-tree: self-balancing tree with logarithmic-time lookups (the default index type).
o Hash indexes:
 Only support the = operator.
o GIN indexes: generalized inverted indexes. GIN indexes are most useful
when multiple values are stored in a single column (e.g., arrays, jsonb, full-text search).
o BRIN stands for block range indexes.
 A BRIN index is much smaller and less costly to maintain than
a B-tree index.
 Suited to very large tables.
 BRIN is often used on a column that has a linear sort order.
o GiST stands for Generalized Search Tree; used for indexing geometric data
types and full-text search.
o SP-GiST stands for space-partitioned GiST; for data that has a natural
clustering element and is not an equally balanced tree, for
example, GIS, multimedia, phone routing, and IP routing.
- Partial index: an index over only the rows of a table that satisfy a WHERE condition.
- Multicolumn index: an index defined on more than one column.
- Index (SQL Server terminology):
o Clustered index:
 Clustered indexes sort and store the data rows in the table or view
based on their key values. There can be only one clustered index
per table. (PostgreSQL has no permanently clustered index; its CLUSTER
command reorders a table only once.)
o Nonclustered index:
 Has a structure separate from the data rows. A nonclustered index
contains the nonclustered index key values, and each key value
entry has a pointer to the data row that contains the key value.
This pointer is called a row locator. For a heap, a row locator is a pointer to
the row. For a clustered table, the row locator is the clustered
index key.
o Unique index:
 Ensures the values in the indexed column (or combination of columns) are unique.
 Created automatically when a primary key or unique constraint is defined.
- Index design: indexes use storage and slow down data updates, so:
o For large tables with infrequent updates, use indexes to enhance performance.
 In a small table, scanning the table can be faster than using an index.
o For a clustered index, choose a short, unique, not-null key, usually the primary key.
o Index performance depends on uniqueness (selectivity): an index on a column with
mostly unique values is more effective than one on a column with many repeated values.
o For a composite index (2 or more columns), e.g., for WHERE A = x AND C = y, put
the more unique column first (here A is more unique than C).
o Indexes can also be created on computed columns (expression indexes in PostgreSQL).
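A sketch of how an index changes the query plan, using Python's sqlite3 module and its EXPLAIN QUERY PLAN output (PostgreSQL's equivalent is EXPLAIN, with different output). The users table, index names and data are hypothetical; the exact plan wording varies between SQLite versions, so only the index names are worth looking for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)")
conn.executemany("INSERT INTO users (email, active) VALUES (?, ?)",
                 [(f"user{i}@example.com", i % 2) for i in range(1000)])


def plan(sql):
    """Join the 'detail' column (last field) of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


q_all = "SELECT id FROM users WHERE email = 'user41@example.com'"
q_active = q_all + " AND active = 1"

no_index = plan(q_all)  # no usable index yet: a full table SCAN

# Partial index: only rows with active = 1 are indexed.
conn.execute("CREATE INDEX idx_active_email ON users (email) WHERE active = 1")
partial_used = plan(q_active)  # the query repeats active = 1, so the index applies
partial_unused = plan(q_all)   # without that condition the partial index can't be used

# Ordinary B-tree index on email.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
full_index = plan(q_all)       # now a SEARCH through idx_users_email
```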
- Differences between a stored procedure and a trigger:
o We can execute a stored procedure whenever we want (EXEC in SQL Server, CALL in
PostgreSQL), but a trigger executes only when its event occurs.
o We can call a stored procedure from inside another stored procedure, but we can't
directly call a trigger from within a trigger.
o Stored procedures can be scheduled through a job to execute at a
predefined time, but we can't schedule a trigger.
o A stored procedure can take input parameters, but we can't pass
parameters as input to a trigger.
o We can use transaction statements like BEGIN TRANSACTION, COMMIT
TRANSACTION, and ROLLBACK inside a stored procedure, but we can't use
transaction statements inside a trigger.
- Connect PostgreSQL to Python
