APEX Data Parser – How to Parse XLSX and CSV

APEX Data Parser – How to Parse XLSX and CSV?

 PARSE Function

This is the main parser function. It allows to parse XML, XLSX, CSV or JSON files and returns a generic table of the following structure:

Structure of parser return

LINE_NUMBER, COL001, COL002, COL003, COL004 … COL300

We have line_number and 300 other columns, having said that we can read 300 columns from the above four file formats.

Parser Function parameters.

FUNCTION parse (p_content                       IN BLOB,
                p_file_name                     IN VARCHAR2 DEFAULT NULL,
                p_file_type                     IN t_file_type DEFAULT NULL,
                p_file_profile                  IN CLOB DEFAULT NULL,
                p_detect_data_types             IN VARCHAR2 DEFAULT 'Y',
                p_decimal_char                  IN VARCHAR2 DEFAULT NULL,
                p_xlsx_sheet_name               IN VARCHAR2 DEFAULT NULL,
                p_row_selector                  IN VARCHAR2 DEFAULT NULL,
                p_csv_row_delimiter             IN VARCHAR2 DEFAULT lf,
                p_csv_col_delimiter             IN VARCHAR2 DEFAULT NULL,
                p_csv_enclosed                  IN VARCHAR2 DEFAULT '"',
                p_skip_rows                     IN PLS_INTEGER DEFAULT 0,
                p_add_headers_row               IN VARCHAR2 DEFAULT 'N',
                p_file_charset                  IN VARCHAR2 DEFAULT 'AL32UTF8',
                p_max_rows                      IN NUMBER DEFAULT NULL,
                p_return_rows                   IN NUMBER DEFAULT NULL,
                p_store_profile_to_collection   IN VARCHAR2 DEFAULT NULL)
   RETURN wwv_flow_t_parser_table   PIPELINED;

ParameterDescription
P_CONTENTThe file content to be parsed as a BLOB
P_FILE_NAMEThe name of the file; only used to derive the file type. Either P_FILE_NAME, P_FILE_TYPE or P_FILE_PROFILE
P_FILE_TYPEThe type of the file to be parsed. Use this to explicitly pass the file type in. Either P_FILE_NAME, P_FILE_TYPE or P_FILE_PROFILE must be passed in.
P_FILE_PROFILEFile profile to be used for parsing. The file profile might have been computed in a previous PARSE() invocation. If passed in again, the function will skip some profile detection logic and use the passed in profile – in order to improve performance.
P_DETECT_DATA_TYPESWhether to detect data types (NUMBER, DATE, TIMESTAMP) during parsing. If set to ‘Y’, the function will compute the file profile and also add data type information to it. If set to ‘N’, no data types will be detected and all columns will be VARCHAR2. Default is ‘Y’.
P_DECIMAL_CHARUse this decimal character when trying to detect NUMBER data types. If not specified,the procedure will auto-detect the decimal character.
P_XLSX_SHEET_NAMEFor XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found.
P_ROW_SELECTORFor JSON and XML files. Pointer to the array / list of rows within the JSON or XML file. If omitted, the function will:
For XML files: Use “//” (first tag under the root tag) as the row selector.
For JSON files: Look for a JSON array and use the first array found.
For XML files: Use “//” (first tag under the root tag) as the row selector.
For JSON files: Look for a JSON array and use the first array found.

Examples:

CREATE TABLE loader_files
(
   id             NUMBER,
   file_content   BLOB
);


How to parse XLSX?

Insert excel file, Let’s take a workbook with two worksheets.

Worksheet 1

Line NoITEMSStore
1AppleNew Delhi
2ShampooNoida
3MilkGurugram
4VeggiesPune

Worksheet 2

ItemQty
10001Ab1
10002Ab2
10003Ab3
10004Ab4
10005Ab5
10006Ab6
10007Ab7
10008Ab8
10009Ab9
10010Ab10
10011Ab11
10012Ab12
10013Ab13
10014Ab14
10015Ab15
10016Ab16
10017Ab17
10018Ab18
10019Ab19
10020Ab20
10021Ab21
SELECT line_number,
       col001,
       col002,
       col003
  FROM TABLE (apex_data_parser.parse (p_content           => (SELECT file_content
                                                                FROM loader_files
                                                               WHERE id = 1),
                                      p_file_name         => 'data.xlsx',
                                      p_xlsx_sheet_name   => 'sheet1.xml',
                                      p_skip_rows         => 0));


Output

LINE_NUMBERCOL001COL002COL003
1Line NoITEMSStore
21AppleNew Delhi
32ShampooNoida
43MilkGurugram
54VeggiesPune

To skip the headers in the file, set p_skip_rows to 1.

SELECT line_number,
       col001,
       col002,
       col003
  FROM TABLE (apex_data_parser.parse (p_content           => (SELECT file_content
                                                                FROM loader_files
                                                               WHERE id = 1),
                                      p_file_name         => 'data.xlsx',
                                      p_xlsx_sheet_name   => 'sheet1.xml',
                                      p_skip_rows         => 1));

LINE_NUMBERCOL001COL002COL003
11AppleNew Delhi
22ShampooNoida
33MilkGurugram
44VeggiesPune

Data from the Second Sheet.

SELECT line_number,
       col001,
       col002,
       col003
  FROM TABLE (apex_data_parser.parse (p_content           => (SELECT file_content
                                                                FROM loader_files
                                                               WHERE id = 1),
                                      p_file_name         => 'data.xlsx',
                                      p_xlsx_sheet_name   => 'sheet2.xml',
                                      p_skip_rows         => 1));


Output

LINE_NUMBERCOL001COL002COL003
1ItemQty
210001Ab1
310002Ab2
....
....
2010019Ab19
2110020Ab20
2210021Ab21

How to get the Column List from Worksheet?

SELECT *
  FROM TABLE (
          apex_data_parser.get_columns (apex_data_parser.discover (
                                           p_content     => (SELECT file_content
                                                               FROM loader_files
                                                              WHERE id = 1),
                                           p_file_name   => 'large.xlsx')));


How to get the list of worksheets?

SELECT *
  FROM TABLE (apex_data_parser.get_xlsx_worksheets (
                 p_content   => (SELECT file_content
                                   FROM loader_files
                                  WHERE id = 1)));


SHEET_SEQUENCESHEET_DISPLAY_NAMESHEET_FILE_NAMESHEET_PATH
1Asheet1.xmlworksheets/sheet1.xml
2Bsheet2.xmlworksheets/sheet2.xml

How to parse CSV files?

Files data 

SELECT *
  FROM TABLE (apex_data_parser.parse (p_content     => (SELECT file_content
                                                          FROM loader_files
                                                         WHERE id = 2),
                                      p_file_name   => 'data.csv'));

LINE_NUMBERCOL001COL002COL003
1COSOASHISH
212Sahay

You might like