DM Answers

Data Mining with STATISTICA

Toby Barrus

What got me here?

20th of November, 2016

I attended a screening of Most Likely to Succeed this past Tuesday at a local high school.  I left the film indifferent.  I wasn’t inspired by the film, but I didn’t hate it either.  The main idea of the film is that education is outdated, and it shows a school called High Tech High which is trying […]

Sleep Survey Drawing Winner

12th of September, 2016

Congratulations to Thomas Rust, who won the drawing for the sleep survey I have been conducting over the past two weeks. Feel free to continue to post responses to the survey.  If I receive enough responses, I might consider doing another drawing. Thanks to all those who participated.

Creating a corpus from transactional data

10th of May, 2015

First off, I would like to say Happy Mother’s Day to all the mothers out there.  I would especially like to honor my wife, my mom, and my mom’s mom.  These are great women that I owe a lot to.  I would not be the man I am today without their positive influence in my […]

Year Hiatus

26th of April, 2015

About a year ago I took a new job with Zions Bancorporation as a Data Scientist.  The group I work with does not currently use STATISTICA.  However, there is an openness to giving STATISTICA a chance.  I do not currently have a license for my home computer, so I have not been able to blog […]

Extending STATISTICA Text Mining Capabilities using R Integration

12th of April, 2014

Last summer I kicked off my blog with a series of posts about text mining PubMed journal articles using STATISTICA Text Miner.  A reader asked a question in the fall of 2013 about adding phrases to the text analysis.  STATISTICA allows phrases to be specified.  The problem is that there is no […]

Predictive Quality Control Summary

29th of March, 2014

Okay, so I lied!  There will be another predictive quality control blog post.  I decided there needed to be one more post that summarized everything I have presented over the last few months concerning predictive quality control.  I created a PowerPoint presentation and recorded a YouTube video.  As always, I would be interested […]

Boosted Trees for Predictive Quality Control

18th of March, 2014

I would like to start off by congratulating StatSoft on the recent results for the Gartner Magic Quadrant for Advanced Analytics ( http://www.statsoft.com/Company/About-Us/Reviews/2014-Published-Reviews#gartner-adv-MQ2014 ).  In the last blog I revisited the quality control method I had first talked about back on October 6, 2013.  I would like to wrap things up by talking briefly about the use […]

Revisiting Predictive Quality Control

2nd of March, 2014

I introduced a method for predictive quality control in my October 6, 2013 blog post.  I am going to revisit this topic and refine the method by using Predictor Screening instead of Feature Selection.  The Predictor Screening output includes the R-square value, which is useful in evaluating the quality of the results. […]

Improved performance over Regular Expressions using Split String

15th of February, 2014

When employing the CRISP-DM data mining model, a good portion of the time is spent in the data preparation phase.  Last week I talked about how to pick off values from a string using regular expressions.  This has significance to me right now because it applies to a work-related project. […]

Regular Expressions in STATISTICA

9th of February, 2014

I have been working for the last two weeks in my free time on implementing regular expressions (regex) in STATISTICA.  If you are not familiar with regex, they can be very useful for searching through text strings and pulling out the relevant information you want for data mining.  For instance, I could have a string that looks like […]