DM Answers

Data Mining with STATISTICA

Archive: Data Mining

Extending STATISTICA Text Mining Capabilities using R Integration

12th of April, 2014

Last summer I kicked off my blog with a series of posts about text mining PubMed journal articles using STATISTICA Text Miner.  In the fall of 2013, a reader asked a question about adding phrases to the text analysis.  STATISTICA allows phrases to be specified.  The problem is that there is no […]

Revisiting Predictive Quality Control

2nd of March, 2014

I introduced a method for predictive quality control in my October 6, 2013 blog post.  I am going to revisit this topic and refine the method by using Predictor Screening instead of Feature Selection.  The Predictor Screening output includes the R-square value, which is useful in evaluating the quality of the results. […]

Improved performance over Regular Expressions using Split String

15th of February, 2014

When employing the CRISP-DM data mining model, a good portion of the time is spent in the data preparation phase.  Last week I talked about how to pick off values from a string using regular expressions.  This matters to me right now because it applies to a work-related project. […]

Regular Expressions in STATISTICA

9th of February, 2014

For the last two weeks I have been working in my free time on implementing regular expressions (regex) in STATISTICA.  If you are not familiar with regex, they can be very useful for searching through text strings and pulling out the information you want for data mining.  For instance, I could have a string that looks like […]

My Linear Model for the NBA data posted in the last blog entry…

17th of December, 2013

I’ll get to the model for the NBA data in a minute.  First of all, please allow me a small rant.  Sometimes the simple things are the best.  I have found this to be true with data mining models.  Some might ask, why build a linear model when it is so much more sexy to […]

Building a Linear Model in STATISTICA for NBA Team Basketball Data

10th of December, 2013

I have lost my voice, so I am not going to produce a video this time around.  I would like to throw out a challenge problem.  I will discuss the results in a blog post next weekend.  I collected some team statistics for the current NBA basketball season from the ESPN website that was current on […]

Predictive Quality Control

6th of October, 2013

Today I am shifting gears to a manufacturing data set I was able to acquire.  The information is obscured so you won’t know where the data came from, but it should still be interesting.  I am attaching the data here for those who would like to duplicate the analysis or dig deeper.  I would be […]

Demand Forecasting Data Set

28th of September, 2013

I had a request for the data set I used for the demand forecasting blog post.  Here it is in .xlsx format: Store_Data_modified_2.  If you do use the data set, I would appreciate any feedback you can provide on my demand forecasting blog post.  Enjoy!

Predicting Home Prices

22nd of September, 2013

I spent the past few days collecting some housing data near where I live.  I have taken out any identifying information and changed a few of the values, but for the most part this is a real-life data set: House Closing Prices Sept 2013_subset3.  I used a STATISTICA Data Miner workspace for this data […]

STATISTICA Data Health Node

7th of September, 2013

I mentioned in my last post on demand forecasting that I had used a new feature in STATISTICA called the Data Health Node.  Today I would like to go into more detail about the usefulness of this new feature.  I think you will find it quite impressive, as I did.  If you have any questions, […]