DM Answers

Data Mining with STATISTICA


Extending STATISTICA Text Mining Capabilities using R Integration

12th of April, 2014

Last summer I kicked off my blog with a series of posts about text mining PubMed journal articles using STATISTICA Text Miner.  In the fall of 2013, a reader asked about adding phrases to the text analysis.  STATISTICA allows phrases to be specified.  The problem is that there is no […]

Revisiting Predictive Quality Control

2nd of March, 2014

I introduced a method for predictive quality control in my October 6, 2013 blog post.  I am going to revisit this topic and refine the method by using Predictor Screening instead of Feature Selection.  The Predictor Screening output includes the R-square value, which is useful in evaluating the quality of the results. […]

My Linear Model for the NBA data posted in the last blog entry…

17th of December, 2013

I’ll get to the model for the NBA data in a minute.  First of all, please allow me a small rant.  Sometimes the simple things are the best.  I have found this to be true with data mining models.  Some might ask why build a linear model when it is so much more sexy to […]

R Integration with STATISTICA: Heat Maps

20th of November, 2013

Today I am going to graph some data from the current NBA season using a heat map.  Heat maps are not a standard graph in STATISTICA, so we need to use the R Integration functionality to create one.  This first requires R to be installed on your computer.  You […]

Multivariate SPC

5th of November, 2013

I would like to continue talking about Statistical Process Control, but this time I would like to focus on the case where there are multiple correlated metrics being monitored.  Should each of the metrics be monitored separately or would it make more sense to consider them together in a multivariate analysis?  I hope to answer […]

Predictive Quality Control

6th of October, 2013

Today I am shifting gears to a manufacturing data set I was able to acquire.  The information is obscured so you won’t know where the data came from, but it should still be interesting.  I am attaching the data here for those who would like to duplicate the analysis or dig deeper.  I would be […]

Demand Forecasting Data Set

28th of September, 2013

I had a request for the data set I used for the demand forecasting blog post.  Here it is in .xlsx format: Store_Data_modified_2.  If you do use the data set, I would appreciate any feedback you can provide on my demand forecasting blog post.  Enjoy!

Predicting Home Prices

22nd of September, 2013

I spent the past few days collecting some housing data near where I live.  I have taken out any identifying information and changed a few of the values, but for the most part this is a real-life data set: House Closing Prices Sept 2013_subset3.  I used a STATISTICA Data Miner workspace for this data […]

STATISTICA Data Health Node

7th of September, 2013

I mentioned in my last post on demand forecasting that I had used a new feature in STATISTICA called the Data Health Node.  Today I would like to go into more detail about the usefulness of this new feature.  I think you will find it quite impressive, as I did.  If you have any questions, […]

Importing Open Source Data in XML format using Yahoo Query Language

6th of July, 2013

I am excited to share an interesting source of open data.  The data set discussed today can be accessed using Yahoo Query Language (YQL) and the XML import techniques I have been talking about recently.  The first step is to create a URL that will capture the data.  This can be done on the YQL […]