Tapping into the "folk knowledge" needed to advance machine learning applications.

Created on 2024-04-18T20:55:30-05:00

This card pertains to a resource available on the internet.

This card can also be read via Gemini.

Classification: consumes an object's features and produces a prediction about that object.

Representation: a way of encoding features and answers in a form the computer understands.

Evaluation: a function that scores the computer's output; also called the "objective function."

Optimization: a process for automatically tweaking the model's numbers to earn a better evaluation score.
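
Taken together, representation, evaluation, and optimization are the anatomy of a learner. A minimal Python sketch with made-up data (illustrative only, not from the source): the representation is a line y = w*x + b, the evaluation is mean squared error, and the optimization is plain gradient descent.

```python
# Representation: a line, y_hat = w * x + b, encoded as two numbers.
w, b = 0.0, 0.0

# Made-up data: y is roughly 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

def evaluate(w, b):
    """Evaluation ("objective function"): mean squared error."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Optimization: gradient descent automatically tweaks w and b
# toward a better evaluation score.
lr = 0.05
for _ in range(2000):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w={w:.2f}, b={b:.2f}, mse={evaluate(w, b):.4f}")  # w near 2, b near 1
```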

Generalization: how well the machine makes decisions on data it was not specifically trained on.

Validation Set: withholding some data so the computer does not see it while learning; used to estimate how well the machine performs on "real world" data it has not seen before.
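
A sketch of the split in Python, with made-up (feature, label) pairs: shuffle, then hold out 20% that the learner never trains on.

```python
import random

data = [(x, 2 * x + 1) for x in range(100)]  # made-up (feature, label) pairs
random.shuffle(data)

split = int(0.8 * len(data))
train_set = data[:split]       # the learner sees only this
validation_set = data[split:]  # stands in for unseen "real world" data

print(len(train_set), len(validation_set))  # 80 20
```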

Decision Tree: a directed acyclic graph that is stepped one node at a time. Each step evaluates a feature and decides which node comes next; a leaf at the end of the tree issues the prediction.
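
A hand-built sketch of such a tree, on made-up weather features (not from the source):

```python
# Each internal node tests one feature and routes to a child node;
# reaching a leaf (a plain string) issues the prediction.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {"feature": "windy", "branches": {"yes": "stay in", "no": "play"}},
        "rain": "stay in",
        "overcast": "play",
    },
}

def predict(node, example):
    while isinstance(node, dict):         # step one node at a time
        value = example[node["feature"]]
        node = node["branches"][value]
    return node                           # a leaf: the prediction

print(predict(tree, {"outlook": "sunny", "windy": "no"}))  # play
```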

Wolpert's No Free Lunch: no learner can outperform random guessing when averaged over a sufficiently large space of possible problems.

Overfitting: when a learned system is highly accurate on the data it was trained on but performs poorly on data in the wild.
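
A classic way to see it, sketched with synthetic data: a polynomial with as many parameters as training points memorizes them, while a straight line tends to hold up better on fresh data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 6)
y_train = x + rng.normal(0, 0.2, 6)  # true signal y = x, plus noise
y_fresh = x + rng.normal(0, 0.2, 6)  # a new noisy sample of the same signal

wiggly = np.polyfit(x, y_train, 5)   # 6 parameters for 6 points: memorized
line = np.polyfit(x, y_train, 1)     # a simpler representation

mse = lambda coeffs, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
print(mse(wiggly, y_train), mse(line, y_train))  # wiggly: ~0 on training data
print(mse(wiggly, y_fresh), mse(line, y_fresh))  # wiggly: usually worse "in the wild"
```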

Bayes and heuristics: a heuristic may be more accurate despite resting on wrong assumptions, because a fully correct analysis may require more data than is available.
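
Naive Bayes is the textbook case: its assumption that features are independent given the class is usually wrong, yet the counts it needs can be estimated from very little data. A minimal sketch on made-up examples:

```python
from collections import Counter, defaultdict

train = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "no"}, "play"),
]

class_counts = Counter(label for _, label in train)
feature_counts = defaultdict(Counter)  # per class: (feature, value) counts
for features, label in train:
    for item in features.items():
        feature_counts[label][item] += 1

def predict(features):
    def score(label):
        # P(class) times a product of P(feature | class), as if the
        # features were independent; add-one smoothing avoids zeros.
        p = class_counts[label] / len(train)
        for item in features.items():
            p *= (feature_counts[label][item] + 1) / (class_counts[label] + 2)
        return p
    return max(class_counts, key=score)

print(predict({"outlook": "sunny", "windy": "no"}))  # play
```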

Cross validation: a way of using all of the data by withholding a different subset for validation while training multiple learners.
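
A runnable sketch of 5-fold cross validation; train_fn and score_fn are hypothetical stand-ins (the "model" is just the mean label, scored by absolute error).

```python
def k_fold_score(data, k, train_fn, score_fn):
    total = 0.0
    for i in range(k):
        validation = data[i::k]  # every k-th example is withheld this round
        training = [d for j, d in enumerate(data) if j % k != i]
        model = train_fn(training)
        total += sum(score_fn(model, d) for d in validation) / len(validation)
    return total / k             # averaged over all k learners

labels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
mean_of = lambda ds: sum(ds) / len(ds)      # stand-in "training"
abs_err = lambda model, d: abs(model - d)   # stand-in "scoring"
print(k_fold_score(labels, 5, mean_of, abs_err))
```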

Curse of dimensionality: when algorithms break down as the number of input features grows.
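
One symptom, sketched below: as the dimension grows, random points become nearly equidistant, so distance-based ideas like "nearest neighbor" lose their meaning. (Illustrative only.)

```python
import math
import random

def distance_spread(dim, n=100):
    """Relative spread of pairwise distances among n random points."""
    pts = [[random.random() for _ in range(dim)] for _ in range(n)]
    dists = [math.dist(pts[i], pts[j])
             for i in range(n) for j in range(i + 1, n)]
    return (max(dists) - min(dists)) / (sum(dists) / len(dists))

for dim in (2, 10, 100, 1000):
    print(dim, round(distance_spread(dim), 3))  # spread shrinks as dim grows
```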

Feature engineering: determining, and constructing, the features that matter for making accurate predictions.
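
A tiny illustration with made-up fields: raw columns are often less predictive than quantities derived from them.

```python
from datetime import datetime

record = {"height_m": 1.75, "weight_kg": 70.0, "ts": "2023-06-01T20:15:00"}

features = {
    # Domain knowledge turns two raw columns into one informative feature.
    "bmi": record["weight_kg"] / record["height_m"] ** 2,
    # A raw timestamp is opaque; the hour of day may actually predict something.
    "hour": datetime.fromisoformat(record["ts"]).hour,
}
print(features)  # {'bmi': 22.857..., 'hour': 20}
```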

Ensemble learning: training multiple models, running each input through all of them, and using their consensus to decide which prediction to keep.
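
A sketch of consensus by majority vote, with three hypothetical stand-in "models":

```python
from collections import Counter

def majority_vote(models, example):
    votes = Counter(model(example) for model in models)
    return votes.most_common(1)[0][0]  # keep the consensus prediction

# Stand-in "models": three weak rules that disagree on some inputs.
models = [
    lambda text: "spam" if "free" in text else "ham",
    lambda text: "spam" if "!!!" in text else "ham",
    lambda text: "spam" if len(text) < 20 else "ham",
]

print(majority_vote(models, "free money!!!"))                    # spam, 3 votes
print(majority_vote(models, "see you at the meeting tomorrow"))  # ham, 3 votes
```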