I acquired some public mortgage data from www.ffiec.gov for today’s big data challenge between STATISTICA and Rapid Miner. If you would like to duplicate this challenge on your own computer, flat files can be obtained from the following link:
You will see three headings on this web page: TS & MSA Info, LAR, and File Formats & Documentation (PDFs). The data can be obtained by clicking the links under the LAR heading. Descriptions of the data sets can be found under the File Formats & Documentation (PDFs) heading. For today’s challenge I clicked on ALL under 2012. The downloaded zip file was 427 MB in size. When the zip file was unpacked, the resulting CSV file was 3.1 GB in size.
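If you want to check the unpacked size before committing disk space, a minimal Python sketch along these lines can read the sizes straight from the zip archive's directory without extracting anything. The helper names `zip_summary` and `format_gb` are my own illustrative choices, and the archive path is whatever name your download was saved under.

```python
import sys
import zipfile


def zip_summary(zip_path):
    """Return (name, compressed_bytes, uncompressed_bytes) for each archive member."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in archive.infolist()
        ]


def format_gb(num_bytes):
    """Format a byte count in decimal gigabytes (1 GB = 10**9 bytes)."""
    return f"{num_bytes / 1e9:.2f} GB"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of the downloaded zip file as the first argument.
    for name, packed, unpacked in zip_summary(sys.argv[1]):
        print(f"{name}: {format_gb(packed)} packed, {format_gb(unpacked)} unpacked")
```

Run it with the path to the downloaded zip as the only argument. Only the archive's directory is read, so it finishes quickly even for a large file.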
This is a simple challenge: Can the respective software load the mortgage data into memory?
Three different trials were conducted on the same HP Envy desktop computer (i7 processor, 12 GB of RAM) under the same environmental conditions.
The first trial was conducted with the trial version of Rapid Miner version 6. After about 15 minutes the following error message was received:
Therefore the total time to load and the resulting file size are unknown with the trial version of Rapid Miner 6.
The second trial was conducted with Rapid Miner 5.3, which works virtually the same as version 6 when importing CSV files. Approximately twenty minutes after the Read CSV operator was launched, Rapid Miner displayed the following message:
The third trial was conducted with STATISTICA version 12, which took six minutes to read in 18.7 million records. The resulting size of the STATISTICA spreadsheet (.sta) was 7.8 GB.
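For anyone repeating the challenge with a different tool, a streaming row count gives a rough baseline to compare against. The sketch below is not part of the trials above. It reads the CSV one row at a time, so memory use stays small, and it turns a byte count, a record count, and an elapsed time into per-record figures. The function names are illustrative, and whether the file has a header row is an assumption you should check against the File Formats documentation, which is why `has_header` is a parameter.

```python
import csv
import time


def count_records(csv_path, has_header=False):
    """Stream the file row by row and return the number of data records."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row if there is one
        return sum(1 for _ in reader)


def timed_count(csv_path, has_header=False):
    """Return (record_count, elapsed_seconds) for one streaming pass."""
    start = time.perf_counter()
    records = count_records(csv_path, has_header)
    return records, time.perf_counter() - start


def per_record_stats(total_bytes, records, seconds):
    """Derive average bytes per record and records per second."""
    return {
        "bytes_per_record": total_bytes / records,
        "records_per_second": records / seconds,
    }
```

Plugging in the figures reported here (3.1 GB taken as 3.1 × 10^9 bytes, 18.7 million records, six minutes), `per_record_stats(3.1e9, 18.7e6, 360)` works out to roughly 166 bytes per CSV record and roughly 52,000 records per second for the STATISTICA import.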
Here is a video capture of the three trials:
Please post your experiences loading the mortgage data set in the comments section below. I’ll post again in two weeks with another Big Data challenge between STATISTICA and Rapid Miner.