APEX Data Parser - How to Parse XLSX and CSV

APEX Data Parser - How to Parse XLSX and CSV?

Preview PARSE Function
This is the main parser function. It allows to parse XML, XLSX, CSV or JSON files and returns a generic table of the following structure:

Structure of parser return

LINE_NUMBER, COL001, COL002, COL003, COL004 … COL300

We have line_number and 300 other columns, having said that we can read 300 columns from the above four file formats.

Parser Function parameters.

FUNCTION parse (p_content                       IN BLOB,
                p_file_name                     IN VARCHAR2 DEFAULT NULL,
                p_file_type                     IN t_file_type DEFAULT NULL,
                p_file_profile                  IN CLOB DEFAULT NULL,
                p_detect_data_types             IN VARCHAR2 DEFAULT 'Y',
                p_decimal_char                  IN VARCHAR2 DEFAULT NULL,
                p_xlsx_sheet_name               IN VARCHAR2 DEFAULT NULL,
                p_row_selector                  IN VARCHAR2 DEFAULT NULL,
                p_csv_row_delimiter             IN VARCHAR2 DEFAULT lf,
                p_csv_col_delimiter             IN VARCHAR2 DEFAULT NULL,
                p_csv_enclosed                  IN VARCHAR2 DEFAULT '"',
                p_skip_rows                     IN PLS_INTEGER DEFAULT 0,
                p_add_headers_row               IN VARCHAR2 DEFAULT 'N',
                p_file_charset                  IN VARCHAR2 DEFAULT 'AL32UTF8',
                p_max_rows                      IN NUMBER DEFAULT NULL,
                p_return_rows                   IN NUMBER DEFAULT NULL,
                p_store_profile_to_collection   IN VARCHAR2 DEFAULT NULL)
   RETURN wwv_flow_t_parser_table   PIPELINED;
Parameter Description
P_CONTENT The file content to be parsed as a BLOB
P_FILE_NAME The name of the file; only used to derive the file type. Either P_FILE_NAME, P_FILE_TYPE or P_FILE_PROFILE
P_FILE_TYPE The type of the file to be parsed. Use this to explicitly pass the file type in. Either P_FILE_NAME, P_FILE_TYPE or P_FILE_PROFILE must be passed in.
P_FILE_PROFILE File profile to be used for parsing. The file profile might have been computed in a previous PARSE() invocation. If passed in again, the function will skip some profile detection logic and use the passed in profile - in order to improve performance.
P_DETECT_DATA_TYPES Whether to detect data types (NUMBER, DATE, TIMESTAMP) during parsing. If set to ‘Y’, the function will compute the file profile and also add data type information to it. If set to ‘N’, no data types will be detected and all columns will be VARCHAR2. Default is ‘Y’.
P_DECIMAL_CHAR Use this decimal character when trying to detect NUMBER data types. If not specified,the procedure will auto-detect the decimal character.
P_XLSX_SHEET_NAME For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found.
P_ROW_SELECTOR For JSON and XML files. Pointer to the array / list of rows within the JSON or XML file. If omitted, the function will:
For XML files: Use “//” (first tag under the root tag) as the row selector.
For JSON files: Look for a JSON array and use the first array found.
For XML files: Use “//” (first tag under the root tag) as the row selector.
For JSON files: Look for a JSON array and use the first array found.

Examples:

CREATE TABLE loader_files
(
   id             NUMBER,
   file_content   BLOB
);

How to parse XLSX?

Insert excel file, Let’s take a workbook with two worksheets.

Worksheet 1

Line No ITEMS Store
1 Apple New Delhi
2 Shampoo Noida
3 Milk Gurugram
4 Veggies Pune

Worksheet 2

Item Qty
10001 Ab1
10002 Ab2
10003 Ab3
10004 Ab4
10005 Ab5
10006 Ab6
10007 Ab7
10008 Ab8
10009 Ab9
10010 Ab10
10011 Ab11
10012 Ab12
10013 Ab13
10014 Ab14
10015 Ab15
10016 Ab16
10017 Ab17
10018 Ab18
10019 Ab19
10020 Ab20
10021 Ab21
SELECT line_number,
       col001,
       col002,
       col003
  FROM TABLE (apex_data_parser.parse (p_content           => (SELECT file_content
                                                                FROM loader_files
                                                               WHERE id = 1),
                                      p_file_name         => 'data.xlsx',
                                      p_xlsx_sheet_name   => 'sheet1.xml',
                                      p_skip_rows         => 0));

Output

LINE_NUMBER COL001 COL002 COL003
1 Line No ITEMS Store
2 1 Apple New Delhi
3 2 Shampoo Noida
4 3 Milk Gurugram
5 4 Veggies Pune

To skip the headers in the file, set p_skip_rows to 1.

SELECT line_number,
       col001,
       col002,
       col003
  FROM TABLE (apex_data_parser.parse (p_content           => (SELECT file_content
                                                                FROM loader_files
                                                               WHERE id = 1),
                                      p_file_name         => 'data.xlsx',
                                      p_xlsx_sheet_name   => 'sheet1.xml',
                                      p_skip_rows         => 1));
LINE_NUMBER COL001 COL002 COL003
1 1 Apple New Delhi
2 2 Shampoo Noida
3 3 Milk Gurugram
4 4 Veggies Pune

Data from the Second Sheet.

SELECT line_number,
       col001,
       col002,
       col003
  FROM TABLE (apex_data_parser.parse (p_content           => (SELECT file_content
                                                                FROM loader_files
                                                               WHERE id = 1),
                                      p_file_name         => 'data.xlsx',
                                      p_xlsx_sheet_name   => 'sheet2.xml',
                                      p_skip_rows         => 1));

Output

LINE_NUMBER COL001 COL002 COL003
1 Item Qty
2 10001 Ab1
3 10002 Ab2
. . . .
. . . .
20 10019 Ab19
21 10020 Ab20
22 10021 Ab21

How to get the Column List from Worksheet?

SELECT *
  FROM TABLE (
          apex_data_parser.get_columns (apex_data_parser.discover (
                                           p_content     => (SELECT file_content
                                                               FROM loader_files
                                                              WHERE id = 1),
                                           p_file_name   => 'large.xlsx')));

Data Grid

How to get the list of worksheets?

SELECT *
  FROM TABLE (apex_data_parser.get_xlsx_worksheets (
                 p_content   => (SELECT file_content
                                   FROM loader_files
                                  WHERE id = 1)));

SHEET_SEQUENCE SHEET_DISPLAY_NAME SHEET_FILE_NAME SHEET_PATH
1 A sheet1.xml worksheets/sheet1.xml
2 B sheet2.xml worksheets/sheet2.xml

How to parse CSV files?

Files data Data File

SELECT *
  FROM TABLE (apex_data_parser.parse (p_content     => (SELECT file_content
                                                          FROM loader_files
                                                         WHERE id = 2),
                                      p_file_name   => 'data.csv'));
LINE_NUMBER COL001 COL002 COL003
1 CO SO ASHISH
2 1 2 Sahay

You might like


comments powered by Disqus