APEX Data Parser - How to Parse XLSX and CSV

APEX Data Parser – How to Parse XLSX and CSV?

PARSE Function

This is the main parser function. It allows to parse XML, XLSX, CSV or JSON files and returns a generic table of the following structure:

Structure of parser return

LINE_NUMBER, COL001, COL002, COL003, COL004 … COL300

We have line_number and 300 other columns, having said that we can read 300 columns from the above four file formats.

Parser Function parameters.

Copy Code

FUNCTION parse (p_content                       IN BLOB,
                p_file_name                     IN VARCHAR2 DEFAULT NULL,
                p_file_type                     IN t_file_type DEFAULT NULL,
                p_file_profile                  IN CLOB DEFAULT NULL,
                p_detect_data_types             IN VARCHAR2 DEFAULT 'Y',
                p_decimal_char                  IN VARCHAR2 DEFAULT NULL,
                p_xlsx_sheet_name               IN VARCHAR2 DEFAULT NULL,
                p_row_selector                  IN VARCHAR2 DEFAULT NULL,
                p_csv_row_delimiter             IN VARCHAR2 DEFAULT lf,
                p_csv_col_delimiter             IN VARCHAR2 DEFAULT NULL,
                p_csv_enclosed                  IN VARCHAR2 DEFAULT '"',
                p_skip_rows                     IN PLS_INTEGER DEFAULT 0,
                p_add_headers_row               IN VARCHAR2 DEFAULT 'N',
                p_file_charset                  IN VARCHAR2 DEFAULT 'AL32UTF8',
                p_max_rows                      IN NUMBER DEFAULT NULL,
                p_return_rows                   IN NUMBER DEFAULT NULL,
                p_store_profile_to_collection   IN VARCHAR2 DEFAULT NULL)
   RETURN wwv_flow_t_parser_table   PIPELINED;

Parameter	Description
P_CONTENT	The file content to be parsed as a BLOB
P_FILE_NAME	The name of the file; only used to derive the file type. Either P_FILE_NAME, P_FILE_TYPE or P_FILE_PROFILE
P_FILE_TYPE	The type of the file to be parsed. Use this to explicitly pass the file type in. Either P_FILE_NAME, P_FILE_TYPE or P_FILE_PROFILE must be passed in.
P_FILE_PROFILE	File profile to be used for parsing. The file profile might have been computed in a previous PARSE() invocation. If passed in again, the function will skip some profile detection logic and use the passed in profile – in order to improve performance.
P_DETECT_DATA_TYPES	Whether to detect data types (NUMBER, DATE, TIMESTAMP) during parsing. If set to ‘Y’, the function will compute the file profile and also add data type information to it. If set to ‘N’, no data types will be detected and all columns will be VARCHAR2. Default is ‘Y’.
P_DECIMAL_CHAR	Use this decimal character when trying to detect NUMBER data types. If not specified,the procedure will auto-detect the decimal character.
P_XLSX_SHEET_NAME	For XLSX workbooks. The name of the worksheet to parse. If omitted, the function uses the first worksheet found.
P_ROW_SELECTOR	For JSON and XML files. Pointer to the array / list of rows within the JSON or XML file. If omitted, the function will:
For XML files: Use “//” (first tag under the root tag) as the row selector.
For JSON files: Look for a JSON array and use the first array found.
For XML files: Use “//” (first tag under the root tag) as the row selector.
For JSON files: Look for a JSON array and use the first array found.

Examples:

Copy Code

CREATE TABLE loader_files
(
   id             NUMBER,
   file_content   BLOB
);

How to parse XLSX?

Insert excel file, Let’s take a workbook with two worksheets.

Worksheet 1

Line No	ITEMS	Store
1	Apple	New Delhi
2	Shampoo	Noida
3	Milk	Gurugram
4	Veggies	Pune

Worksheet 2

Item	Qty
10001	Ab1
10002	Ab2
10003	Ab3
10004	Ab4
10005	Ab5
10006	Ab6
10007	Ab7
10008	Ab8
10009	Ab9
10010	Ab10
10011	Ab11
10012	Ab12
10013	Ab13
10014	Ab14
10015	Ab15
10016	Ab16
10017	Ab17
10018	Ab18
10019	Ab19
10020	Ab20
10021	Ab21

Copy Code

SELECT line_number,
       col001,
       col002,
       col003
  FROM TABLE (apex_data_parser.parse (p_content           => (SELECT file_content
                                                                FROM loader_files
                                                               WHERE id = 1),
                                      p_file_name         => 'data.xlsx',
                                      p_xlsx_sheet_name   => 'sheet1.xml',
                                      p_skip_rows         => 0));

Output

LINE_NUMBER	COL001	COL002	COL003
1	Line No	ITEMS	Store
2	1	Apple	New Delhi
3	2	Shampoo	Noida
4	3	Milk	Gurugram
5	4	Veggies	Pune

To skip the headers in the file, set p_skip_rows to 1.

Copy Code

SELECT line_number,
       col001,
       col002,
       col003
  FROM TABLE (apex_data_parser.parse (p_content           => (SELECT file_content
                                                                FROM loader_files
                                                               WHERE id = 1),
                                      p_file_name         => 'data.xlsx',
                                      p_xlsx_sheet_name   => 'sheet1.xml',
                                      p_skip_rows         => 1));

LINE_NUMBER	COL001	COL002	COL003
1	1	Apple	New Delhi
2	2	Shampoo	Noida
3	3	Milk	Gurugram
4	4	Veggies	Pune

Data from the Second Sheet.

Copy Code

SELECT line_number,
       col001,
       col002,
       col003
  FROM TABLE (apex_data_parser.parse (p_content           => (SELECT file_content
                                                                FROM loader_files
                                                               WHERE id = 1),
                                      p_file_name         => 'data.xlsx',
                                      p_xlsx_sheet_name   => 'sheet2.xml',
                                      p_skip_rows         => 1));

Output

LINE_NUMBER	COL001	COL002	COL003
1	Item	Qty
2	10001	Ab1
3	10002	Ab2
.	.	.	.
.	.	.	.
20	10019	Ab19
21	10020	Ab20
22	10021	Ab21

How to get the Column List from Worksheet?

Copy Code

SELECT *
  FROM TABLE (
          apex_data_parser.get_columns (apex_data_parser.discover (
                                           p_content     => (SELECT file_content
                                                               FROM loader_files
                                                              WHERE id = 1),
                                           p_file_name   => 'large.xlsx')));

How to get the list of worksheets?

Copy Code

SELECT *
  FROM TABLE (apex_data_parser.get_xlsx_worksheets (
                 p_content   => (SELECT file_content
                                   FROM loader_files
                                  WHERE id = 1)));

SHEET_SEQUENCE	SHEET_DISPLAY_NAME	SHEET_FILE_NAME	SHEET_PATH
1	A	sheet1.xml	worksheets/sheet1.xml
2	B	sheet2.xml	worksheets/sheet2.xml

How to parse CSV files?

Files data

Copy Code

SELECT *
  FROM TABLE (apex_data_parser.parse (p_content     => (SELECT file_content
                                                          FROM loader_files
                                                         WHERE id = 2),
                                      p_file_name   => 'data.csv'));

LINE_NUMBER	COL001	COL002	COL003
1	CO	SO	ASHISH
2	1	2	Sahay

APEX Data Parser – How to Parse XLSX and CSV?

PARSE Function

How to parse XLSX?

Output

To skip the headers in the file, set p_skip_rows to 1.

Output

How to get the Column List from Worksheet?

How to get the list of worksheets?

How to parse CSV files?

You might like