My version of SQL string to table tokenizer

This one’s a short post on a fairly random topic as unfortunately I don’t have time today to come up with anything deeper :)

I needed to come up with a delimited string to table tokenizer for an Oracle development project. There are quite a few examples out there how to do that, including Adrian Billington’s www.oracle-developer.net and the http://technology.amis.nl/blog/?p=1631.

So far the simplest solution I had seen was using a bunch of INSTR’s, SUBSTR’s and DECODE’s in a CONNECT BY loop.

However as this application is using Oracle 10g, I could use regular expressions to make the SQL even shorter:

Tanel@Sol01> define separator=":"
Tanel@Sol01> define mystring="A:BCD::EFG:H:IJKL"
Tanel@Sol01>
Tanel@Sol01> SELECT
  2             LEVEL,
  3             REGEXP_REPLACE(
  4                     REGEXP_SUBSTR( '&mystring'||'&separator', '(.*?)&separator', 1, LEVEL )
  5                     , '&separator$'
  6                     , ''
  7             ) TOKEN
  8  FROM
  9     DUAL
 10  CONNECT BY
 11     REGEXP_INSTR( '&mystring'||'&separator', '(.*?)&separator', 1, LEVEL ) > 0
 12  ORDER BY
 13     LEVEL ASC
 14  /

LEVEL TOKEN
----- ----------
    1 A
    2 BCD
    3
    4 EFG
    5 H
    6 IJKL

6 rows selected.

For anyone not yet familiar with regular expressions, I definitely recommend to get to know at least the basics – they can save you hours (or even days) of work in some cases. And the cool thing is that regular expressions are fairly standard across the development platforms too (e.g. if you know Oracle regexp, then it’s very easy to handle Java, perl or C# regexp too).

And regexp is not only a developer thing, it can be very useful for everyday production DBAs as well. For example, every decent “grep” utility has regexp support built in – this makes search for complex conditions from log or config files easier for you.

My version of SQL string to table tokenizer

Tanel Poder

2007-06-19