The Scikit-Learn Workshop
Demystifying Machine Learning through Metaphors & Code.
Analogy: Think of Scikit-Learn (sklearn) not as code, but as a Master Carpenter’s Toolbox. It has a specific tool for every job—sawing (splitting data), measuring (accuracy), and assembling (training models).
The Landscape of Machine Learning
The “Universal” Scikit-Learn Syntax
Almost every supervised model in sklearn follows the same four steps: import, instantiate, fit, predict.
# 1. Import the class you need
from sklearn.linear_model import LinearRegression
# 2. Instantiate the model (The “Empty Box”)
model = LinearRegression()
# 3. Fit the model (The “Learning” Phase)
# X = features (study hours) as a 2-D table, y = target (test score)
model.fit(X_train, y_train)
# 4. Predict (The “Testing” Phase)
prediction = model.predict(X_new_data)
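The four steps above leave X_train, y_train, and X_new_data undefined. Here is a minimal runnable sketch that fills them in, assuming made-up study-hours data where the score is exactly 50 + 5 × hours. The data and the variable names hours, scores, and r2 are illustrative assumptions, not part of the workshop material.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: score = 50 + 5 * hours (noise-free, for illustration only)
hours = np.arange(1, 11, dtype=float).reshape(-1, 1)  # X must be 2-D: (n_samples, n_features)
scores = 50 + 5 * hours.ravel()                       # y is 1-D: one target per sample

# "Sawing": hold back 20% of the rows to check the model later
X_train, X_test, y_train, y_test = train_test_split(
    hours, scores, test_size=0.2, random_state=0
)

model = LinearRegression()    # the "Empty Box"
model.fit(X_train, y_train)   # the "Learning" phase

X_new_data = np.array([[12.0]])          # a student who studied 12 hours
prediction = model.predict(X_new_data)   # one predicted score per row
r2 = model.score(X_test, y_test)         # R^2 on the held-back rows

print(prediction, r2)
```

Because the toy data sits on a perfect line, the model should recover it almost exactly. Real data is noisy, so expect a lower score there.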
Supervised Learning
The computer learns with a “Teacher” (labeled data).
Regression
“Predicting a specific number”
📈
The Real Estate Agent
Features: Size, Location. Target: Price.
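A rough sketch of the Real Estate Agent, assuming synthetic listings where "location" is simplified to distance from the city centre in km. The pricing rule, the noise level, and every variable name here are invented for illustration. It also picks up the "measuring" tool from the toolbox, using mean_absolute_error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical listings: size in square metres, distance to centre in km
size_sqm = rng.uniform(40, 200, size=200)
distance_km = rng.uniform(1, 30, size=200)

# Assumed pricing rule plus random noise (illustration only)
price = 50_000 + 1_000 * size_sqm - 5_000 * distance_km + rng.normal(0, 5_000, size=200)

features = np.column_stack([size_sqm, distance_km])  # shape (200, 2)
X_tr, X_te, y_tr, y_te = train_test_split(features, price, test_size=0.25, random_state=0)

agent = LinearRegression()
agent.fit(X_tr, y_tr)

# "Measuring": average size of the pricing error on unseen listings
mae = mean_absolute_error(y_te, agent.predict(X_te))

print("price per extra sqm:", agent.coef_[0])
print("price per extra km: ", agent.coef_[1])
print("mean absolute error:", mae)
```

The learned coefficients should land close to the assumed 1,000 per square metre and -5,000 per km, which is a handy sanity check when you control the data.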
Classification
“Sorting into buckets”
📧
The Mail Sorter
Features: Keywords. Target: Spam or Not Spam.
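A small sketch of the Mail Sorter, assuming a six-message toy corpus written for this example. It uses CountVectorizer to turn keywords into counts and MultinomialNB as the classifier. That pairing is one common choice, not the only one. Real spam filters train on far more mail.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled mail (the "Teacher" provides the labels)
messages = [
    "win free money now",
    "free prize claim now",
    "claim your free money",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
    "project report due at noon",
]
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

# Pipeline: count keywords, then sort into buckets
sorter = make_pipeline(CountVectorizer(), MultinomialNB())
sorter.fit(messages, labels)

new_mail = ["free money prize", "meeting tomorrow noon"]
buckets = sorter.predict(new_mail)

for text, bucket in zip(new_mail, buckets):
    print(f"{text!r} -> {bucket}")
```

The pipeline keeps the same fit/predict rhythm as before. The only new piece is the step that converts text into numbers first.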
Unsupervised Learning
The computer learns without a teacher: it gets unlabeled data and finds patterns on its own.
Clustering
“Grouping similar things”
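from sklearn.cluster import KMeans
# Ask for 3 groups; n_init and random_state make the result repeatable
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(data)  # data: 2-D table of unlabeled rows
labels = kmeans.labels_  # group number assigned to each row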
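To see clustering run end to end, here is a sketch that assumes three well-separated synthetic blobs generated with make_blobs. The blob centres and variable names are chosen for illustration. Note that no labels are passed to fit.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical unlabeled data: 150 points scattered around 3 centres
data, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(data)                      # no y: the model finds groups by itself

labels = kmeans.labels_               # cluster index (0, 1 or 2) for each point
centres = kmeans.cluster_centers_     # one (x, y) centre per cluster

# A new point is assigned to the nearest learned centre
new_label = kmeans.predict(np.array([[10.0, 10.0]]))

print(centres)
print("new point goes to cluster", new_label[0])
```

The cluster numbers themselves are arbitrary, so cluster 0 in one run may be cluster 2 in another. Only the grouping carries meaning.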
Dimensionality Reduction
“Simplifying the complexity”
from sklearn.decomposition import PCA
# Keep only the 2 directions that capture the most variance
pca = PCA(n_components=2)
clean_data = pca.fit_transform(messy_data)  # shape: (n_rows, 2)
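A sketch of what "simplifying" means in practice, assuming synthetic data with 5 columns that are secretly driven by only 2 hidden factors. The data-generating recipe is an assumption made for this example. The explained_variance_ratio_ attribute reports how much of the original spread each kept component retains.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical messy data: 2 hidden factors mixed into 5 columns, plus a little noise
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
messy_data = hidden @ mixing + 0.01 * rng.normal(size=(200, 5))

pca = PCA(n_components=2)
clean_data = pca.fit_transform(messy_data)   # 5 columns in, 2 columns out

kept = pca.explained_variance_ratio_.sum()   # fraction of total variance retained

print(messy_data.shape, "->", clean_data.shape)
print("variance kept:", kept)
```

On data like this, two components keep nearly all of the variance because only two factors were there to begin with. On real data, check this ratio before trusting the simplified view.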