{"id":9209,"date":"2026-04-21T03:42:07","date_gmt":"2026-04-21T03:42:07","guid":{"rendered":"https:\/\/www.askpython.com\/?p=9209"},"modified":"2026-04-21T03:42:08","modified_gmt":"2026-04-21T03:42:08","slug":"python-predict-function","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/examples\/python-predict-function","title":{"rendered":"Python predict() function &#8211; All you need to know!"},"content":{"rendered":"\n<p>Every ML model, regardless of how it was trained or what framework built it, eventually does the same thing: it takes input and produces output. In Python, the model.predict() call is that interface. It looks simple. It is simple, until it isn&#8217;t.<\/p>\n\n\n\n<p>The same method name appears across scikit-learn, Keras, TensorFlow, PyTorch, XGBoost, LightGBM, and most other ML frameworks. But &#8220;predict&#8221; means slightly different things in each. The return shapes differ. The input expectations differ. The performance characteristics differ. And the ways it can fail differ too. This article covers what <code>predict()<\/code> actually does, how it behaves across the major frameworks, and the practical issues you&#8217;ll hit when running it in production.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">TLDR<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>predict() runs inference mode without updating weights<\/li><li>sklearn returns class labels; Keras with activation returns probabilities; PyTorch requires model.eval() and torch.no_grad()<\/li><li>Almost all frameworks require 2D input shape (n_samples, n_features) even for single samples<\/li><li>predict_proba() gives class probabilities in sklearn, XGBoost, and LightGBM<\/li><li>PyTorch has no built-in predict() method &#8211; call the model directly<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">What model.predict() Actually Does<\/h2>\n\n\n\n<p><code>predict()<\/code> runs the model in <strong>inference mode<\/strong>. 
It passes your input data through the forward pass and returns the model&#8217;s predictions. Unlike <code>fit()<\/code> or <code>train()<\/code>, it does not update any weights. It is a pure computation.<\/p>\n\n\n\n<p>At a high level:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>The input is preprocessed and formatted to match what the model expects<\/li><li>The model runs its forward pass<\/li><li>Raw outputs (logits, probabilities, or regression values) are returned<\/li><\/ol>\n\n\n\n<p>The critical thing to understand: some frameworks apply your final activation function inside <code>predict()<\/code> and others do <strong>not<\/strong>. Most of the confusion starts here.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">scikit-learn: The Baseline<\/h2>\n\n\n\n<p>scikit-learn has the most consistent and predictable <code>predict()<\/code> behavior. It is the reference implementation that most other frameworks loosely follow.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Classification<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.datasets import make_classification\nfrom sklearn.model_selection import train_test_split\n\nX, y = make_classification(n_samples=1000, n_features=20, random_state=42)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n\nmodel = RandomForestClassifier(n_estimators=100, random_state=42)\nmodel.fit(X_train, y_train)\n\npredictions = model.predict(X_test)\nprint(predictions.shape)  # (200,)\nprint(predictions&#x5B;:10])  # array(&#x5B;0, 1, 1, 0, 1, 0, 1, 1, 0, 0])\n\n<\/pre><\/div>\n\n\n<p><code>predict()<\/code> returns a NumPy array of <strong>class labels<\/strong>. For binary classification, these are <code>0<\/code> or <code>1<\/code>. 
For multiclass, these are the labels the model was trained on (in the order given by <code>model.classes_<\/code>), which may be integers or strings.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Getting Probabilities in sklearn<\/h3>\n\n\n\n<p>If you want probabilities, you need <code>predict_proba()<\/code>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nprobabilities = model.predict_proba(X_test)\nprint(probabilities.shape)  # (200, 2) \u2014 two classes\nprint(probabilities&#x5B;:3])\n# &#x5B;&#x5B;0.85, 0.15],\n#  &#x5B;0.12, 0.88],\n#  &#x5B;0.73, 0.27]]\n\n<\/pre><\/div>\n\n\n<p>Note that <code>predict_proba()<\/code> returns the probability for <strong>each class<\/strong>. The order matches <code>model.classes_<\/code>. If you need just the positive class probability in binary classification, use <code>predict_proba(X_test)[:, 1]<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Regression<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nfrom sklearn.ensemble import GradientBoostingRegressor\n\nmodel = GradientBoostingRegressor(n_estimators=100, random_state=42)\nmodel.fit(X_train, y_train)\n\npredictions = model.predict(X_test)\nprint(predictions.shape)  # (200,)\nprint(predictions&#x5B;:5])  # array(&#x5B;2.34, -0.87, 1.56, 3.21, 0.12])\n\n<\/pre><\/div>\n\n\n<p>For regressors, <code>predict()<\/code> returns floating-point values directly. No separate method for raw scores vs. 
final output; the returned value is always the final prediction.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">sklearn Summary<\/h3>\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Model Type<\/th><th>Return Shape<\/th><th>Return Type<\/th><\/tr><\/thead><tbody><tr><td>Classifier<\/td><td><code>(n_samples,)<\/code><\/td><td>Integer labels<\/td><\/tr><tr><td>Regressor<\/td><td><code>(n_samples,)<\/code><\/td><td>Float values<\/td><\/tr><tr><td><code>predict_proba()<\/code><\/td><td><code>(n_samples, n_classes)<\/code><\/td><td>Float probabilities<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n<h2 class=\"wp-block-heading\">Keras \/ TensorFlow: Classification Requires Sigmoid or Softmax<\/h2>\n\n\n\n<p>Keras is where most developers hit their first <code>predict()<\/code> surprise. The <code>predict()<\/code> method returns whatever your output layer produces: probabilities if the final layer has a sigmoid or softmax activation, raw <strong>logits<\/strong> if it has none.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Binary Classification<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nimport numpy as np\nfrom tensorflow.keras.models import Sequential\nfrom tensorflow.keras.layers import Dense\n\n# Binary classification model\nmodel = Sequential(&#x5B;\n    Dense(64, activation=&#039;relu&#039;, input_shape=(20,)),\n    Dense(1, activation=&#039;sigmoid&#039;)  # sigmoid on output layer\n])\n\nmodel.compile(optimizer=&#039;adam&#039;, loss=&#039;binary_crossentropy&#039;)\nmodel.fit(X_train, y_train, epochs=10, verbose=0)\n\n# The final layer applies sigmoid, so predict() returns probabilities\npredictions = model.predict(X_test, verbose=0)\nprint(predictions.shape)  # (n_samples, 1)\nprint(predictions&#x5B;:5].flatten())\n# &#x5B;0.87, 0.12, 0.65, 0.91, 0.34]\n\n<\/pre><\/div>\n\n\n<p>Here&#8217;s the gotcha: if you build a binary classification model <strong>without<\/strong> an activation function on 
the final layer (i.e., you plan to apply sigmoid manually), then <code>predict()<\/code> returns raw logits. If you use <code>activation='sigmoid'<\/code> on the final layer, <code>predict()<\/code> returns probabilities.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nfrom tensorflow.keras.losses import BinaryCrossentropy\n\n# Without sigmoid on final layer \u2014 returns logits\nmodel_logits = Sequential(&#x5B;\n    Dense(64, activation=&#039;relu&#039;, input_shape=(20,)),\n    Dense(1)  # linear output \u2014 raw logits\n])\nmodel_logits.compile(optimizer=&#039;adam&#039;,\n                     loss=BinaryCrossentropy(from_logits=True))  # the loss must know it receives logits\nmodel_logits.fit(X_train, y_train, epochs=10, verbose=0)\n\nraw_output = model_logits.predict(X_test, verbose=0)\n# raw_output is logits, not probabilities\n# Apply sigmoid manually to convert: 1 \/ (1 + np.exp(-raw_output))\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">Multiclass Classification<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nfrom tensorflow.keras.utils import to_categorical\n\n# One-hot encode labels for multiclass (assumes y contains the labels 0, 1, 2)\ny_train_cat = to_categorical(y_train, num_classes=3)\ny_test_cat = to_categorical(y_test, num_classes=3)\n\nmodel = Sequential(&#x5B;\n    Dense(64, activation=&#039;relu&#039;, input_shape=(20,)),\n    Dense(3, activation=&#039;softmax&#039;)  # softmax for multiclass\n])\nmodel.compile(optimizer=&#039;adam&#039;, loss=&#039;categorical_crossentropy&#039;)\nmodel.fit(X_train, y_train_cat, epochs=10, verbose=0)\n\npredictions = model.predict(X_test, verbose=0)\nprint(predictions.shape)  # (n_samples, 3)\nprint(predictions&#x5B;:3])\n# &#x5B;&#x5B;0.05, 0.12, 0.83],\n#  &#x5B;0.71, 0.22, 0.07],\n#  &#x5B;0.33, 0.45, 0.22]]\n\n<\/pre><\/div>\n\n\n<p>With <code>softmax<\/code> on the final layer, <code>predict()<\/code> returns probabilities that sum to 1.0 across each row.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Using 
predict() with Models Without Output Activation<\/h3>\n\n\n\n<p>If you are doing custom training loops or using logits directly, you need to know how to handle raw outputs:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\n# Raw logits from a model without softmax\nlogits = model_logits.predict(X_test, verbose=0)\n\n# Convert to probabilities (for numerical stability, subtract the row max from logits first)\nprobabilities = np.exp(logits) \/ np.sum(np.exp(logits), axis=1, keepdims=True)\n# Or simply:\nfrom scipy.special import softmax\nprobabilities = softmax(logits, axis=1)\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">predict() vs predict_on_batch()<\/h3>\n\n\n\n<p>Keras <code>predict()<\/code> is designed to handle large datasets by processing in batches internally. For small datasets, this overhead can actually slow things down. Use <code>predict_on_batch()<\/code> when you know your input size and want to avoid the batch-scheduling overhead:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\n# Standard predict \u2014 handles batching internally\npredictions = model.predict(X_test, batch_size=32, verbose=1)\n\n# Manual batch processing for small data\nfor i in range(0, len(X_test), 32):\n    batch = X_test&#x5B;i:i+32]\n    batch_preds = model.predict_on_batch(batch)\n\n<\/pre><\/div>\n\n\n<p><code>predict()<\/code> with <code>verbose=1<\/code> shows a progress bar, which is useful for large datasets. <code>predict_on_batch()<\/code> has no progress output; it is a direct computation call.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">PyTorch: No Built-In predict() Method<\/h2>\n\n\n\n<p>PyTorch does <strong>not<\/strong> have a <code>model.predict()<\/code> method. 
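<\/p>\n\n\n\n<p>Calling it anyway fails with an <code>AttributeError<\/code>; a minimal sketch using a stand-in <code>nn.Linear<\/code> module shows the failure mode:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nimport torch.nn as nn\n\nmodel = nn.Linear(4, 1)  # any nn.Module behaves the same way\n\ntry:\n    model.predict\nexcept AttributeError as e:\n    print(e)  # &#039;Linear&#039; object has no attribute &#039;predict&#039;\n\n<\/pre><\/div>\n\n\n<p>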
Developers coming from sklearn or Keras trip up here.<\/p>\n\n\n\n<p>Instead, you put the model in evaluation mode and call the model directly:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nimport torch\nimport torch.nn as nn\n\nclass SimpleClassifier(nn.Module):\n    def __init__(self, input_dim):\n        super().__init__()\n        self.fc1 = nn.Linear(input_dim, 64)\n        self.fc2 = nn.Linear(64, 1)\n    \n    def forward(self, x):\n        x = torch.relu(self.fc1(x))\n        x = torch.sigmoid(self.fc2(x))\n        return x\n\nmodel = SimpleClassifier(input_dim=20)\nmodel.eval()  # Critical: set to evaluation mode\n\n# Inference\nwith torch.no_grad():\n    X_test_tensor = torch.tensor(X_test, dtype=torch.float32)\n    predictions = model(X_test_tensor)\n    \nprint(predictions.shape)  # (200, 1)\nprint(predictions&#x5B;:5].numpy().flatten())\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">The eval() Mode Matters<\/h3>\n\n\n\n<p>Dropout layers, batch normalization, and other stochastic layers behave differently in training vs. evaluation. 
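<\/p>\n\n\n\n<p>A quick way to see this: a <code>Dropout<\/code> layer in training mode randomly zeroes elements and rescales the survivors, so repeated calls on the same input disagree. A minimal sketch:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nimport torch\nimport torch.nn as nn\n\nlayer = nn.Dropout(p=0.5)  # modules start in training mode\nx = torch.ones(5)\nprint(layer(x))  # random mix of 0.0 and 2.0 (survivors scaled by 1 \/ (1 - p))\n\nlayer.eval()\nprint(layer(x))  # tensor(&#x5B;1., 1., 1., 1., 1.]) \u2014 dropout is a no-op in eval mode\n\n<\/pre><\/div>\n\n\n<p>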
Always call <code>model.eval()<\/code> before inference:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nmodel.eval()  # Disables dropout, uses running stats for BatchNorm\n\nwith torch.no_grad():  # Disables gradient computation\n    predictions = model(X_test_tensor)\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">Common PyTorch Inference Patterns<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\n# Batch inference\ndef predict_batch(model, X, batch_size=64):\n    model.eval()\n    predictions = &#x5B;]\n    with torch.no_grad():\n        for i in range(0, len(X), batch_size):\n            batch = torch.tensor(X&#x5B;i:i+batch_size], dtype=torch.float32)\n            preds = model(batch)\n            predictions.append(preds.numpy())\n    return np.concatenate(predictions)\n\n# CPU vs GPU\ndevice = torch.device(&#039;cuda&#039; if torch.cuda.is_available() else &#039;cpu&#039;)\nmodel = model.to(device)\n\nwith torch.no_grad():\n    X_test_tensor = torch.tensor(X_test, dtype=torch.float32).to(device)\n    predictions = model(X_test_tensor).cpu().numpy()\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">PyTorch with torch.compile (PyTorch 2.0+)<\/h3>\n\n\n\n<p>PyTorch 2.0 introduced <code>torch.compile()<\/code>, which JIT-compiles the model for faster inference:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nmodel = SimpleClassifier(input_dim=20)\nmodel.eval()\n\n# Compile once up front; the inference speedup varies by model and hardware\ncompiled_model = torch.compile(model)\n\nwith torch.no_grad():\n    predictions = compiled_model(X_test_tensor)\n\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">XGBoost and LightGBM: Native Gradient Boosting<\/h2>\n\n\n\n<p>XGBoost and LightGBM have their own <code>predict()<\/code> methods that behave similarly to sklearn 
but with important differences.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">XGBoost<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nimport xgboost as xgb\n\nmodel = xgb.XGBClassifier(n_estimators=100, eval_metric=&#039;logloss&#039;)\nmodel.fit(X_train, y_train)\n\n# Default: returns class predictions\npredictions = model.predict(X_test)\nprint(predictions.shape)  # (200,)\n\n# Probabilities\nprobabilities = model.predict_proba(X_test)\nprint(probabilities.shape)  # (200, 2)\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">XGBoost with output_margin and Leaf Indices<\/h3>\n\n\n\n<p>XGBoost exposes additional prediction types:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\n# Raw margin scores (before the global link function)\nraw_scores = model.predict(X_test, output_margin=True)\n\n# Leaf indices (useful for tree interpretation) \u2014 the sklearn wrapper exposes these via apply()\nleaf_indices = model.apply(X_test)\nprint(leaf_indices.shape)  # (200, n_trees) \u2014 which leaf each tree puts the sample in\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">LightGBM<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nimport lightgbm as lgb\n\nmodel = lgb.LGBMClassifier(n_estimators=100)\nmodel.fit(X_train, y_train)\n\npredictions = model.predict(X_test)\nprobabilities = model.predict_proba(X_test)\n\n# Raw scores\nraw_scores = model.predict(X_test, raw_score=True)\n\n# Leaf indices\nleaf_preds = model.predict(X_test, pred_leaf=True)\n\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">Common Pitfalls Across All Frameworks<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1. Input Shape Mismatches<\/h3>\n\n\n\n<p>The single most common error. 
<code>model.predict()<\/code> almost always expects 2D input <code>(n_samples, n_features)<\/code>, even if you are predicting a single sample.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\n# Wrong \u2014 1D array\nsingle_sample = X_test&#x5B;0]\npredictions = model.predict(single_sample)  # Shape mismatch error\n\n# Correct \u2014 2D array\nsingle_sample = X_test&#x5B;0:1]  # Shape (1, n_features)\npredictions = model.predict(single_sample)\n\n<\/pre><\/div>\n\n\n<p>Here is the tricky part: most frameworks can broadcast 1D to 2D in other contexts, but <code>predict()<\/code> is strict about shape.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. Not Setting the Model to Evaluation Mode (PyTorch)<\/h3>\n\n\n\n<p>Dropout being active during inference will randomly zero out neurons, producing different outputs every call. Always:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nmodel.eval()  # Before inference\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">3. Forgetting That predict() Returns Labels, Not Probabilities (sklearn)<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\n# Wrong assumption\nif model.predict(X_test) &gt; 0.5:  # raises ValueError \u2014 the truth value of an array is ambiguous\n    ...\n\n# Correct \u2014 for binary classification\nproba = model.predict_proba(X_test)&#x5B;:, 1]\npredictions = (proba &gt; 0.5).astype(int)\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">4. 
Keras predict() Batching Overhead for Small Inputs<\/h3>\n\n\n\n<p>For small test sets, Keras <code>predict()<\/code> can be slower than expected due to internal batch scheduling:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\n# Slow for small data \u2014 batch scheduling overhead\npredictions = model.predict(X_small, verbose=0)\n\n# Faster for small data\npredictions = model.predict_on_batch(X_small)\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">5. Ignoring the Dtype of Your Input<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\n# If your training data was float32 but inference is float64\nX_test_wrong = np.array(X_test, dtype=np.float64)\npredictions = model.predict(X_test_wrong)  # May work or may cast unexpectedly\n\n# Ensure matching dtype\nX_test_correct = np.array(X_test, dtype=np.float32)\npredictions = model.predict(X_test_correct)\n\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">6. XGBoost\/LightGBM Using Wrong Input Type After sklearn<\/h3>\n\n\n\n<p>sklearn models accept pandas DataFrames. 
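<\/p>\n\n\n\n<p>One subtlety: since sklearn 1.0, fitting on a DataFrame records the column names, and predicting on a plain ndarray afterwards emits a feature-names warning. A minimal sketch (column names here are arbitrary):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import LogisticRegression\n\nX = pd.DataFrame(np.random.rand(50, 3), columns=&#x5B;&#039;a&#039;, &#039;b&#039;, &#039;c&#039;])\ny = (X&#x5B;&#039;a&#039;] &gt; 0.5).astype(int)\n\nclf = LogisticRegression().fit(X, y)  # fitted with feature names\nclf.predict(X.to_numpy())  # UserWarning: X does not have valid feature names...\n\n<\/pre><\/div>\n\n\n<p>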
XGBoost and LightGBM often work better with their native data structures for large datasets:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nimport xgboost as xgb\n\n# DMatrix is XGBoost&#039;s native data structure \u2014 faster for large data\ndtrain = xgb.DMatrix(X_train, label=y_train)\ndtest = xgb.DMatrix(X_test)\n\nparams = {&#039;objective&#039;: &#039;binary:logistic&#039;}\nmodel = xgb.train(params, dtrain, num_boost_round=100)\npredictions = model.predict(dtest)  # Note: different API \u2014 model is Booster, not Classifier\n\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">Batch Prediction Performance<\/h2>\n\n\n\n<p>When you need to predict on large datasets, how you batch matters:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\ndef batch_predict(model, X, framework=&#039;sklearn&#039;, batch_size=1000):\n    n_samples = len(X)\n    predictions = &#x5B;]\n    \n    for start in range(0, n_samples, batch_size):\n        end = min(start + batch_size, n_samples)\n        batch = X&#x5B;start:end]\n        \n        if framework == &#039;sklearn&#039;:\n            preds = model.predict(batch)\n        elif framework == &#039;keras&#039;:\n            preds = model.predict(batch, verbose=0)\n        elif framework == &#039;pytorch&#039;:\n            with torch.no_grad():\n                batch_tensor = torch.tensor(batch, dtype=torch.float32)\n                preds = model(batch_tensor).numpy()\n        else:\n            raise ValueError(f&quot;Unknown framework: {framework}&quot;)\n        \n        predictions.append(preds)\n    \n    return np.concatenate(predictions)\n\n<\/pre><\/div>\n\n\n<p>Key points:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>sklearn: internal batching is usually sufficient, pass the whole array<\/li><li>Keras: <code>batch_size<\/code> parameter in <code>predict()<\/code> controls internal batching; set it based on your memory constraints<\/li><li>PyTorch: manual batching gives you full control<\/li><\/ul>\n\n\n\n<h2 
class=\"wp-block-heading\">What About predict_proba() and Other Variants?<\/h2>\n\n\n\n<p>Frameworks typically provide variant methods:<\/p>\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Method<\/th><th>Returns<\/th><th>Available In<\/th><\/tr><\/thead><tbody><tr><td><code>predict()<\/code><\/td><td>Class labels (sklearn) or probabilities (Keras with activation)<\/td><td>All<\/td><\/tr><tr><td><code>predict_proba()<\/code><\/td><td>Class membership probabilities<\/td><td>sklearn, XGBoost, LightGBM<\/td><\/tr><tr><td><code>predict_log_proba()<\/code><\/td><td>Log probabilities<\/td><td>sklearn<\/td><\/tr><tr><td><code>predict_on_batch()<\/code><\/td><td>Same as predict, explicit batch<\/td><td>Keras<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n<p>Use <code>predict_proba()<\/code> when you need the uncertainty of a prediction, not just the label. 
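<\/p>\n\n\n\n<p>For instance, a custom decision threshold is just a comparison against the positive-class column; a minimal sklearn sketch (the 0.3 threshold is an arbitrary example):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nimport numpy as np\nfrom sklearn.datasets import make_classification\nfrom sklearn.linear_model import LogisticRegression\n\nX, y = make_classification(n_samples=200, random_state=0)\nclf = LogisticRegression().fit(X, y)\n\nproba = clf.predict_proba(X)&#x5B;:, 1]  # positive-class probabilities\ncustom = (proba &gt;= 0.3).astype(int)  # lower threshold favors recall\ndefault = clf.predict(X)  # equivalent to a 0.5 threshold\n\n<\/pre><\/div>\n\n\n<p>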
These methods are essential for:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Threshold tuning (choosing your own classification threshold)<\/li><li>Calibrated probabilities<\/li><li>Ensemble methods that weight predictions by confidence<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Putting It Together: A Framework-Agnostic predict() Wrapper<\/h2>\n\n\n\n<p>If you are working with multiple frameworks in the same codebase, a thin wrapper can smooth over the differences:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n\nimport numpy as np\nimport torch  # needed only for the pytorch branch\n\ndef predict(model, X, framework=&#039;sklearn&#039;, proba=False):\n    X = np.asarray(X)\n    \n    if X.ndim == 1:\n        X = X.reshape(1, -1)  # Ensure 2D\n    \n    if framework == &#039;sklearn&#039;:\n        if proba:\n            return model.predict_proba(X)\n        return model.predict(X)\n    \n    elif framework == &#039;keras&#039;:\n        preds = model.predict(X, verbose=0)\n        if proba:\n            return preds\n        return (preds &gt; 0.5).astype(int).flatten()\n    \n    elif framework == &#039;pytorch&#039;:\n        model.eval()\n        with torch.no_grad():\n            X_tensor = torch.tensor(X, dtype=torch.float32)\n            preds = model(X_tensor).numpy()\n        if proba:\n            return preds\n        return (preds &gt; 0.5).astype(int).flatten()\n    \n    elif framework in (&#039;xgboost&#039;, &#039;lightgbm&#039;):\n        if proba:\n            return model.predict_proba(X)\n        return model.predict(X)\n    \n    else:\n        raise ValueError(f&quot;Unknown framework: {framework}&quot;)\n\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">The Core Principle<\/h2>\n\n\n\n<p><code>model.predict()<\/code> is a framework-specific inference call that:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Takes your preprocessed input data<\/li><li>Runs the forward pass without updating 
weights<\/li><li>Returns predictions in framework-specific format (labels, probabilities, or raw scores)<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">FAQ<\/h2>\n\n\n\n<p><strong>Q: Does model.predict() update weights?<\/strong><\/p>\n\n\n\n<p>No. predict() runs inference mode, which is a pure forward pass with no weight updates. Only fit(), train(), or an optimizer step following backward() changes model parameters.<\/p>\n\n\n\n<p><strong>Q: Why does Keras predict() return different values than sklearn?<\/strong><\/p>\n\n\n\n<p>sklearn always returns class labels (0 or 1) for classifiers. Keras returns raw values that depend on your output layer activation. If you used sigmoid, you get probabilities. If you used no activation, you get logits and must apply sigmoid manually.<\/p>\n\n\n\n<p><strong>Q: Why does my PyTorch model give different outputs every time I call it?<\/strong><\/p>\n\n\n\n<p>You probably forgot model.eval(), so dropout is still active and randomly zeroes activations on every call. Dropout and certain other layers behave differently in training mode. Always call model.eval() and use torch.no_grad() for inference.<\/p>\n\n\n\n<p><strong>Q: Can I use predict() on a single sample?<\/strong><\/p>\n\n\n\n<p>Yes, but you must pass a 2D array with shape (1, n_features), not a 1D array. Single samples need X[0:1] not X[0] for most frameworks.<\/p>\n\n\n\n<p><strong>Q: What is the difference between predict() and predict_proba()?<\/strong><\/p>\n\n\n\n<p>predict() returns class labels (for classifiers) or values (for regressors). predict_proba() returns class membership probabilities. Use predict_proba() when you need confidence scores or want to tune your own classification threshold.<\/p>\n\n\n\n<p>The surface similarity across frameworks masks important differences in return types, input shape requirements, and behavior in training vs. evaluation mode. 
Understanding these differences is what separates code that works in a notebook from code that works in production.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Every ML model, regardless of how it was trained or what framework built it, eventually does the same thing: it takes input and produces output. In Python, the model.predict() call is that interface. It looks simple. It is simple, until it isn&#8217;t. The same method name appears across scikit-learn, Keras, TensorFlow, PyTorch, XGBoost, LightGBM, and most other ML frameworks. But [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":9311,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-9209","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-examples"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/9209","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=9209"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/9209\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/9311"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=9209"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=9209"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=9209"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}"
,"templated":true}]}}