In my previous blogs , we have seen how easy is to parse the excel and CSV files using APEX_DATA_PARSER. All options are declarative and easy to use. I have been using this utility for a long time but the use involved with small files. Recently I needed to process and parse a file of nearly 140 MB size, which consists of around 40 columns and half a million rows.
SELECT *
FROM TABLE (apex_data_parser.parse
(p_content => (SELECT
file_content
FROM
loader_files
WHERE id = 2),
p_file_name => 'data.csv'));
I was using the above query to parse the CSV files data. As I mentioned file has 40 columns and half a million rows which makes it a 140 MB in size. Fetching data from file and inserting into the target table was taking huge time, to be precise it was taking 27 hours to complete. Horrible right! In my investigation I found, if the files size is more than 50 MB then it starts taking longer time to parse. Then I started searching for the best solution in this situation. I found below.
- The APEX_DATA_PARSER package has indeed a 50MB “threshold”.
- For BLOBs smaller than 50MB, APEX creates a new, cached(!), temporary LOB, copies the data over and does all parsing on that temporary LOB. So all LOB operations happen in memory.
- If the BLOB is larger than 50MB, no temporary LOB is created and the LOB operations of APEX_DATA_PARSER are performed on the LOB Locator which has been passed in. In your case, it’s the LOB locator for your column > A41_TEMP_FILES.BLOB_CONTENT. Thus all the LOB operations happen in disk, which makes it slower.
- APEX is built to run on shared instances with many workspaces and parallel users as well – so this threshold is to protect the (shared) SGA and database memory.
- However, developers working with the APEX_DATA_PARSER package can always create a cached temporary LOB themselves and pass that to APEX_DATA_PARSER.PARSE. For this use the DBMS_LOB.CREATETEMPORARY procedure with P_CACHE => TRUE to create the temporary LOB, then DBMS_LOB.COPY to copy the data from the uploaded BLOB to the temporary BLOB and then pass the temporary BLOB to APEX_DATA_PARSER.PARSE.
E.g.
CREATE OR REPLACE FUNCTION create_cached_blob (p_file IN BLOB)
RETURN BLOB
IS
l_blob BLOB;
BEGIN
DBMS_LOB.createtemporary (lob_loc => l_blob, cache => TRUE);
DBMS_LOB.COPY (dest_lob => l_blob,
src_lob => p_file,
amount => DBMS_LOB.lobmaxsize);
RETURN l_blob;
END create_cached_blob;
Then use this function in APEX_DATA_PARSER as follows:
apex_data_parser.parse(
p_content => create_cached_blob(
f.BLOB_CONTENT)
);
After using this I am able to load the file in 12 minutes. Great improvement isn’t it?
More Clarification I found on Connor’s blog post.
Conclusion:
Whenever you have file size more than 50 MBs, you should use the DBMS_LOB.CREATETEMPORARY with cache enabled to parse quickly. By using this we can shift LOB locator to memory instead of Disk.