I spent the past few days collecting housing data near where I live. I have removed any identifying information and changed a few of the values, but for the most part this is a real-life data set.
I used a STATISTICA Data Miner workspace for this data mining project.
Here is the summary of the output from the Data Health Node:
The only default I changed in the Data Miner workspace was to have the GLM use 2nd-order interactions; all other options were left at their defaults. I checked each model's predictions against the known closing prices for the homes sold in the last year. The goodness-of-fit statistics show that the GLM does the best job of predicting closing prices from the input data.
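If you want to run the same kind of comparison outside of STATISTICA, here is a minimal sketch in plain Python. It is not what the Data Miner workspace computes internally; it just scores each model's predictions against known closing prices with three common goodness-of-fit measures (RMSE, MAE, and R²). The function name `goodness_of_fit`, the model labels, and all of the numbers are hypothetical placeholders, not values from the attached data.

```python
import math


def goodness_of_fit(actual, predicted):
    """Return RMSE, MAE, and R^2 for paired actual/predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R^2 is undefined when every actual value is identical
        "r2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
    }


# Hypothetical placeholder numbers, NOT taken from the housing data set.
closing_prices = [200_000, 250_000, 310_000, 180_000]
predictions = {
    "model_a": [205_000, 245_000, 300_000, 185_000],
    "model_b": [220_000, 230_000, 330_000, 160_000],
}

scores = {
    name: goodness_of_fit(closing_prices, preds)
    for name, preds in predictions.items()
}

# Pick the model with the lowest RMSE as the "best" one.
best = min(scores, key=lambda name: scores[name]["rmse"])

for name, s in scores.items():
    print(f"{name}: RMSE={s['rmse']:.0f}  MAE={s['mae']:.0f}  R2={s['r2']:.3f}")
print("lowest RMSE:", best)
```

To try it on the real data, replace `closing_prices` with the known closing prices and put each model's predictions in the `predictions` dictionary. Ranking by RMSE is just one reasonable choice; MAE is less sensitive to a few badly mispriced homes.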
I would be interested to see whether someone can do better with the data provided. Let me know how it goes. I would also like to see what the analysis predicts for my current home's value. 🙂
I hope you find the attached data interesting and have some fun data mining with it. Have a great weekend and I hope to hear some feedback on your experience using STATISTICA for this project.