Using DUMP, CHR, and REPLACE character functions

task#7 The set of sentences that will be analysed consist of character strings with the following pattern:

Number#1 Word#2 Number#2 Word#3 Number#3 Word#4 Number#4 Word#5 Number#5

Substrings Number#1, Number#2, Number#3, Number#4, Number#5 need to be extracted from the text and saved into separate table columns as numbers in order to be used as parameters in SQL aggregate functions for further analysis.

The length of each sentence is unknown as well as the format of Number# and Word# substrings.

step#1 – Start the SQL Developer

Start SQL Developer

Start SQL Developer

step#2 – The collected set of sentences is saved into the table DEMO.TEXTS. Each sentence is represented as a string of VARCHAR2(256) in the column WORD_SET.

As the table TEXTS is too long , I selected 6 rows that represent a typical characters sets for this task.

The Set Of Strings

The Set Of Strings

What is strange with this set of strings is that they appear to be of the similar sizes, but their exact lengths are different. The shortest string is 29 characters long, and the longest one is 166 characters long. Obviously, the longer strings contain some other ASCII non-printable characters. How to see them?

step#3 – Apply SQL DUMP function to reveal the content of strings.

Apply the DUMP Function

Apply the DUMP Function

The Result Set shows that our strings contain ASCII characters:

CHR(10) or Line feed, ‘\n’

CHR(13) or Carriage return, ‘\r’

CHR(32) or Space

step#4 – With the simple update statement and REPLACE function, characters chr(10), chr(13), and chr(32) will be removed from all strings:

The UPDATE Statement with REPLACE Functions

The UPDATE Statement with REPLACE Functions

step#5 – All strings are clean and contain only “0-9”, “a-z”, and “A-Z” characters.

The Clean Strings

The Clean Strings

step#6 – The typical string looks like

5,219Spring28.5KSummer44.1KAutumn2,557Winter6

The following procedure called proc_extract will extract substrings with numbers and save them in separate columns as NUMBER data type.

Those columns are table TEXTS columns named

TOTAL for Number#1,

SPRING for Number#2,

SUMMER for Number#3,

AUTUMN for Number#4, and

WINTER for Number#5 substrings.

create or replace procedure proc_extract

authid current_user

is

  type type_id is table of DEMO.texts.word_id%TYPE;

  l_my_id type_id := type_id();

  type t_stats is table of DEMO.texts.word_set%TYPE;

  l_my_list t_stats := t_stats();

  l_last_id number := 0;

  l_len1 number;

  l_len2 number;

  l_my_str varchar2(256);

  l_my_str2 varchar2(256);

  l_my_i DEMO.texts.word_id%TYPE;

begin

  select word_id, word_set bulk collect into l_my_id, l_my_list

  from texts

  where word_set <> 'X';

  dbms_output.put_line(l_my_list.COUNT);

  for i in l_my_id.FIRST..l_my_id.COUNT

  loop

    l_my_i := l_my_id(i);

    l_my_str := l_my_list(i);

    l_len2 := length (l_my_str);

    if (instr(l_my_str,'Winter', 1, 1) <> 0 ) then

      l_len1 := instr(l_my_str,'Winter', 1, 1);

      l_my_str2 := substr(l_my_str, l_len1, l_len2 - l_len1 + 1);

      l_my_str2 := funct_format_num(replace(l_my_str2, 'Winter') );

      l_my_str := substr(l_my_str, 1, l_len1 - 1);

      dbms_output.put_line(l_my_str2);

      dbms_output.put_line(l_my_str);

      update texts set winter = to_number(l_my_str2) where word_id = l_my_i;

    end if;

    l_len2 := length (l_my_str);

    if (instr(l_my_str,'Autumn',1,1) <> 0 ) then

      l_len1 := instr(l_my_str, 'Autumn', 1, 1);

      l_my_str2:= substr(l_my_str, l_len1, l_len2 - l_len1 +1);

      l_my_str2 := funct_format_num(replace(l_my_str2,'Autumn'));

      l_my_str := substr(l_my_str, 1, l_len1 - 1);

      dbms_output.put_line(l_my_str2);

      dbms_output.put_line(l_my_str);

      update texts set autumn = to_number(l_my_str2) where word_id = l_my_i;

    end if;

    l_len2 := length (l_my_str);

    if (instr(l_my_str,'Summer',1,1) <> 0 ) then

      l_len1 := instr(l_my_str,'Summer',1,1);

      l_my_str2:= substr(l_my_str, l_len1 , l_len2 - l_len1 + 1);

      l_my_str2 := funct_format_num(replace(l_my_str2,'Summer'));

      l_my_str := substr(l_my_str, 1, l_len1 - 1);

      dbms_output.put_line(l_my_str2);

      dbms_output.put_line(l_my_str);

      update texts set summer = to_number(l_my_str2) where word_id = l_my_i;

    end if;

    l_len2 := length (l_my_str);

    if (instr(l_my_str,'Spring',1,1) <> 0 ) then

      l_len1 := instr(l_my_str,'Spring',1,1);

      l_my_str2:= substr(l_my_str, l_len1 , l_len2 - l_len1 + 1);

      l_my_str2 := funct_format_num(replace(l_my_str2,'Spring'));

      l_my_str := substr(l_my_str, 1, l_len1 - 1);

      dbms_output.put_line(l_my_str2);

      dbms_output.put_line(l_my_str);

      update texts set spring = to_number(l_my_str2) where word_id = l_my_i;

    end if;

    l_len2 := length (l_my_str);

    if l_len2 > 0 then

      l_my_str := funct_format_num(l_my_str);

      dbms_output.put_line(l_my_str);

      update texts set total = to_number(l_my_str) where word_id = l_my_i;

    end if;
 
  end loop;

  commit;

exception

when others then

  dbms_output.put_line(SQLERRM);

end;

The function funct_format_num will transform the existing number format “9,999”, “99.9K”, “999K”, and “99.9M” into simple “9999999” format that is easy to use.

create or replace function funct_format_num (p_in_str varchar2)

return varchar2

authid current_user

as

  l_s varchar2(256);

begin

  l_s := p_in_str;

  if (instr(p_in_str,'.') <> 0 AND instr(p_in_str,'K') <> 0 ) then

    l_s := replace(replace(l_s,'.'),'K');

    l_s := l_s || '00';

  elsif instr(p_in_str,'K') <> 0 then

    l_s := replace(l_s,'K');

    l_s := l_s || '000';

  elsif ( instr(p_in_str,'.') <> 0 AND instr(p_in_str,'M')<> 0 ) then

    l_s:= replace(replace(l_s,'.'),'M');

    l_s:= l_s || '00000';

  elsif instr(p_in_str,',') <> 0 then

    l_s := replace(l_s,',');

  end if;

  return l_s;

exception

when others then

  raise;

end;

step#7 – After execution of the procedure proc_extract, the table is populated with correct values.

The Table DEMO.TEXTS

The Table DEMO.TEXTS

step#8 – Now, it is easy to execute any SQL aggregate function on TEXTS columns

Apply Aggregate Functions

Apply Aggregate Functions

Summary

Sometimes, we do not need to write long procedures with FOR loops and analyze each table column that contains character strings. Instead, the rule “Use SQL statement first.” is the best choice. Simple UPDATE statement combined with CHR and REPLACE functions will save the day and our time and the helpful DUMP function gives us the evidence what non-printable characters are inside our character strings.

The other story is when the once clean strings that we have, should be divided to several substrings. Then, SQL functions SUBSTR, INSTR, and LENGTH should be used and appropriate subprograms should be written as well to finish the task.

Reminder

DUMP (expr, return_fmt, start_position, length)

returns a VARCHAR2 value containing the data type code, length in bytes, and internal representation of expr. The returned result is always in the database character set. [ref2 ]

CHR (n USING NCHAR_CS)

returns the character having the binary equivalent to n as a VARCHAR2 value in either the database character set or, if you specify USING NCHAR_CS, the national character set. [ref2 ]

REPLACE (char, search_string, replacement_string)

returns char with every occurrence of search_string replaced with replacement_string. If replacement_string is omitted or null, then all occurrences of search_string are removed. If search_string is null, then char is returned. [ref2 ]

Reference

[ref2] Oracle® Database SQL Language Reference 12c Release 1 (12.1) E41329-09