John Elder Workshop: Core Machine Learning and Data Science Techniques


Wednesday, October 3rd, 9:00am - 5:00pm

Tickets available in cart. €595 + VAT

Intended Audience:

Practitioners interested in the true nuts and bolts of predictive modeling.

Knowledge Level:

Familiar with the basics of predictive modeling.

Predictive analytics has proven capable of enormous returns across industries – but, with so many core methods for predictive modeling, there are some tough questions that need answering.

What you will learn:

1: The tremendous value of learning from data.
2: How to create valuable predictive models for your business.
3: Best Practices by seeing their flip side: Worst Practices.

This one-day session surveys standard and advanced methods for predictive modeling.

Dr. Elder will describe the key inner workings of leading algorithms, demonstrate their performance with business case studies, compare their merits, and show you how to pick the method and tool best suited to each predictive analytics project. Methods covered include classical regression, decision trees, neural networks, ensemble methods, uplift modeling and more.

The key to successfully leveraging these methods is to avoid “worst practices”. It's all too easy to go too far in one's analysis and “torture the data until it confesses” or otherwise doom predictive models to fail where they really matter: on new situations.

Dr. Elder will share his (often humorous) stories from real-world applications, highlighting the Top 10 common, but deadly, mistakes. Come learn how to avoid these pitfalls by laughing (or gasping) at stories of barely averted disaster.

If you'd like to become a practitioner of predictive analytics, or if you already are one and would like to hone your knowledge across methods and best practices, this workshop is for you.

Course Outline:

I. Pattern Discovery: An Executive Summary

  • Data Mining or Data Dredging?
  • Computer vs. Human: Mining and Visualization
  • Example Projects from Science and Business
  • Ingredients for Success
  • Modern Modeling Algorithms
  • Bundling Models to Increase Accuracy
  • Example: Identify Bat Species

II. Getting Going

  • Technical disciplines contribute
  • Stages of an analytic project
  • Setting up the data file
  • Example project: Fraud Detection
  • Lift Charts to display model quality
  • Decision Trees to fit data

III. Clustering and Nearness

  • Commercial Products’ Algorithms
  • Unsupervised Learning
  • Clustering
  • Principal Components
  • Nearest Neighbor
  • Mahalanobis distance

IV. Neural Networks

  • Logistic (sigmoidal) transformation
  • Example

V. Re-Sampling - essential for validation

  • The danger of over-fit and over-search
  • Cross-Validation
  • Bootstrap
  • Target Shuffling
  • Example: find sweet spot for strikes in baseball

VI. Visualization

  • Projections and projection pursuit
  • Visualizing numbers, text, and links
  • Density graphs: Drug discovery application

VII. Ensembles

  • Bagging (with CART example)
  • Boosting
  • Bundling different models (with Credit Scoring example)

VIII. Top 10 Data Mining Mistakes

  • Lack data
  • Focus on Training
  • Rely on 1 technique
  • Ask the wrong question
  • Listen (only) to the data
  • Future leakage
  • Discount pesky cases
  • Extrapolate
  • Answer every inquiry
  • Sample without care
  • Believe the best model

Location:

The Tower, Trinity Technology & Enterprise Campus, Grand Canal Quay, Dublin 2