Tuesday, October 25, 2011

Analytic function to the rescue! Again.

My current APEX project has a requirement to show a chart on one of the pages. No big deal. Usually. Because it should represent some value over time and that value would be stored in the database....could be every second, this chart could contain 10,000's, if not 100,000's points. 
So generating the XML, transferring the XML to the browser and interpreting the XML by the chart engine....was slow...
So I had to come up with a solution to reduce the number of points, without destroying the goal of the chart. Oh, and did I mention that the value could be stored every second, but could also be every minute, hour, whatever?
The first thing I came up with was the SAMPLE clause. Never heard of it and never used it before. You can just do a SELECT * FROM EMP SAMPLE(10) and as a result, you'll get 10% of the rows of the EMP table. The only withdrawal with that was, the result could be different every time. So when refreshing a chart, the chart could look really different. Another, more minor, hiccup was, the sample size  should be hard-coded and couldn't be parametrised (another "smaller" subrequirement).
So after some research I stumbled upon an Analytic Function, that might do the trick: NTILE( number ). This function "divides an ordered data set into a number of buckets indicated by the number parameter and assigns the appropriate bucket number to each row" (quote from the documentation - couldn't say it better). So using this function, you can equally divide 100,000 records in 25 buckets - order by timestamp. And once you've done that you can easily calculate the average value per bucket. And the average timestamp as well. And just use these values to generate a more minimalistic XML document... O, and another fine thing: you can pass any number parameter to NTILE, so using the same query, generate either 10 or 10,000 points...

As a - functionally useless - example a query to generate 15 rows with the average object_id and the average create date of all_objects:
select  to_char( startdate + ( enddate - startdate )/2
               , 'YYMMDD HH24:MI:SS' ) label
,       average
from
(
select min(created)  startdate
,      max(created)  enddate
,      avg(value)    average
from
(
select object_id value
,      created
,      ntile( 15 ) over ( order by created ) as bucket
from all_objects
)
group by bucket
order by 1
)
Post a Comment