ANALYTICAL FUNCTIONS

ANALYTICAL FUNCTIONS

The general syntax of analytic function is:

Function(arg1,…, argn) OVER ( [PARTITION BY ] [ORDER BY ] [] )

COUNT()

SELECT empno, deptno,
COUNT(*) OVER (PARTITION BY
deptno) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30);

EMPNO     DEPTNO DEPT_COUNT
———- ———- ———-
7369         20          5
7566         20          5
7788         20          5
7902         20          5
7876         20          5
7499         30          6
7900         30          6
7844         30          6
7698         30          6
7654         30          6
7521         30          6

ROW_NUMBER

All the above three functions assign integer values to the rows depending on their order.

ROW_NUMBER( ) gives a running serial number to a partition of records. It is very useful in reporting, especially in places where different partitions have their own serial numbers.

SELECT empno, deptno, hiredate,
ROW_NUMBER( ) OVER (PARTITION BY
deptno ORDER BY hiredate
NULLS LAST) SRLNO
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, SRLNO;

EMPNO  DEPTNO HIREDATE       SRLNO
—— ——- ——— ———-
7782      10 09-JUN-81          1
7839      10 17-NOV-81          2
7934      10 23-JAN-82          3
7369      20 17-DEC-80          1
7566      20 02-APR-81          2
7902      20 03-DEC-81          3
7788      20 09-DEC-82          4
7876      20 12-JAN-83          5

RANK and DENSE_RANK

RANK and DENSE_RANK both provide rank to the records based on some column value or expression. In case of a tie of 2 records at position N, RANK declares 2 positions N and skips position N+1 and gives position N+2 to the next record. While DENSE_RANK declares 2 positions N but does not skip position N+1.

SELECT empno, deptno, sal,
RANK() OVER (PARTITION BY deptno
ORDER BY sal DESC NULLS LAST) RANK,
DENSE_RANK() OVER (PARTITION BY
deptno ORDER BY sal DESC NULLS
LAST) DENSE_RANK
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, RANK;

EMPNO  DEPTNO   SAL  RANK DENSE_RANK
—— ——- —– —– ———-
7839      10  5000     1          1
7782      10  2450     2          2
7934      10  1300     3          3
7788      20  3000     1          1
7902      20  3000     1          1
7566      20  2975     3          2
7876      20  1100     4          3
7369      20   800     5          4

LEAD and LAG

LEAD has the ability to compute an expression on the next rows (rows which are going to come after the current row) and return the value to the current row. The general syntax of LEAD is shown below:

LEAD (, , ) OVER ()

is the expression to compute from the leading row.

is the index of the leading row relative to the current row.

is a positive integer with default 1.

is the value to return if the points to a row outside the partition range.

The syntax of LAG is similar except that the offset for LAG goes into the previous rows.

SELECT deptno, empno, sal,
LEAD(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) NEXT_LOWER_SAL,
LAG(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) PREV_HIGHER_SAL
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, sal DESC;

DEPTNO  EMPNO   SAL NEXT_LOWER_SAL PREV_HIGHER_SAL
——- —— —– ————– —————
10   7839  5000           2450               0
10   7782  2450           1300            5000
10   7934  1300              0            2450
20   7788  3000           3000               0
20   7902  3000           2975            3000
20   7566  2975           1100            3000
20   7876  1100            800            2975
20   7369   800              0            1100

FIRST VALUE and LAST VALUE

The general syntax is:

FIRST_VALUE() OVER ()

The FIRST_VALUE analytic function picks the first record from the partition after doing the ORDER BY. The is computed on the columns of this first record and results are returned. The LAST_VALUE function is used in similar context except that it acts on the last record of the partition.

— How many days after the first hire of each department were the next
— employees hired?

SELECT empno, deptno, hiredate – FIRST_VALUE(hiredate)
OVER (PARTITION BY deptno ORDER BY hiredate) DAY_GAP
FROM emp
WHERE deptno IN (20, 30)
ORDER BY deptno, DAY_GAP;

EMPNO     DEPTNO    DAY_GAP
———- ———- ———-
7369         20          0
7566         20        106
7902         20        351
7788         20        722
7876         20        756
7499         30          0
7521         30          2
7698         30         70
7844         30        200
7654         30        220
7900         30        286

FIRST and LAST

The FIRST function (or more properly KEEP FIRST function) is used in a very special situation. Suppose we rank a group of record and found several records in the first rank. Now we want to apply an aggregate function on the records of the first rank. KEEP FIRST enables that.

The general syntax is:

Function( ) KEEP (DENSE_RANK FIRST ORDER BY ) OVER ()

Please note that FIRST and LAST are the only functions that deviate from the general syntax of analytic functions. They do not have the ORDER BY inside the OVER clause. Neither do they support any clause. The ranking done in FIRST and LAST is always DENSE_RANK. The query below shows the usage of FIRST function. The LAST function is used in similar context to perform computations on last ranked records.

— How each employee’s salary compare with the average salary of the first
— year hires of their department?

SELECT empno, deptno, TO_CHAR(hiredate,’YYYY’) HIRE_YR, sal,
TRUNC(
AVG(sal) KEEP (DENSE_RANK FIRST
ORDER BY TO_CHAR(hiredate,’YYYY’) )
OVER (PARTITION BY deptno)
) AVG_SAL_YR1_HIRE
FROM emp
WHERE deptno IN (20, 10)
ORDER BY deptno, empno, HIRE_YR;

EMPNO     DEPTNO HIRE        SAL AVG_SAL_YR1_HIRE
———- ———- —- ———- —————-
7782         10 1981       2450             3725
7839         10 1981       5000             3725
7934         10 1982       1300             3725
7369         20 1980        800              800
7566         20 1981       2975              800
7788         20 1982       3000              800
7876         20 1983       1100              800
7902         20 1981       3000              800

LISTAGG

String Aggregation Techniques – On occasion it is necessary to aggregate data from a number of rows into a single row, giving a list of data associated with a specific value.

LISTAGG function concatenates values of the measure_column for each GROUP based on the order_by_clause.

LISTAGG (measure_column [, ‘delimiter’])

WITHIN GROUP (order_by_clause) [OVER (query_partition_clause)]

measure_column

The column or expression whose values you wish to concatenate together in the result set. Null values in the measure_column are ignored.

Delimiter – Optional. It is the delimiter to use when separating the measure_column values when outputting the results.

Order_by_clause – It determines the order that the concatenated values (ie: measure_column) are returned.

Base Data:

DEPTNO ENAME
———- ———-
20 SMITH
30 ALLEN
30 WARD
20 JONES
30 MARTIN
30 BLAKE
10 CLARK
20 SCOTT
10 KING
30 TURNER
20 ADAMS
30 JAMES
20 FORD
10 MILLER

Desired Output:

DEPTNO EMPLOYEES
———- ————————————————–
10 CLARK,KING,MILLER
20 SMITH,FORD,ADAMS,SCOTT,JONES
30 ALLEN,BLAKE,MARTIN,TURNER,JAMES,WARD

SELECT deptno, LISTAGG(ename, ‘,’) WITHIN GROUP (ORDER BY ename) AS employees
FROM   emp
GROUP BY deptno;

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s