<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Jacob Bumgarner, Ph.D. on Medium]]></title>
        <description><![CDATA[Stories by Jacob Bumgarner, Ph.D. on Medium]]></description>
        <link>https://medium.com/@jacobbumgarner?source=rss-e1f3762eb90c------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*olPSM7fWhbIypPd6obscHg.jpeg</url>
            <title>Stories by Jacob Bumgarner, Ph.D. on Medium</title>
            <link>https://medium.com/@jacobbumgarner?source=rss-e1f3762eb90c------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 24 May 2026 07:35:15 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@jacobbumgarner/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Breaking it Down: K-Means Clustering]]></title>
            <link>https://medium.com/data-science/breaking-it-down-k-means-clustering-e0ef0168688d?source=rss-e1f3762eb90c------2</link>
            <guid isPermaLink="false">https://medium.com/p/e0ef0168688d</guid>
            <category><![CDATA[k-means]]></category>
            <category><![CDATA[numpy]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[clustering-algorithm]]></category>
            <category><![CDATA[scikit-learn]]></category>
            <dc:creator><![CDATA[Jacob Bumgarner, Ph.D.]]></dc:creator>
            <pubDate>Sun, 06 Nov 2022 19:19:04 GMT</pubDate>
            <atom:updated>2022-11-14T23:43:40.316Z</atom:updated>
            <content:encoded><![CDATA[<h4>Exploring and visualizing the fundamentals of K-means clustering with NumPy and scikit-learn.</h4><pre><strong>Outline:</strong><br><a href="#1791">1. What is K-Means Clustering?</a><br><a href="#f947">2. Implementing K-means from Scratch with NumPy</a><br>   <a href="#96e1">1. K-means++ Cluster Initialization</a><br>   <a href="#d0b4">2. K-Means Function Differentiation</a><br>   <a href="#ea21">3. Data Labeling and Centroid Updates</a><br>   <a href="#9b4e">4. Fitting it Together</a><br><a href="#e5af">3. K-Means for Video Keyframe Extraction: Bee Pose Estimation</a><br><a href="#aabf">4. Implementing K-means with scikit-learn</a><br><a href="#8571">5. Summary</a><br><a href="#f720">6. Resources</a></pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OE9JOSlm7yViIQMz-hsN2Q.png" /><figcaption>Article Overview</figcaption></figure><p>See my GitHub <a href="https://github.com/JacobBumgarner/learning-repo">learning-repo</a> for all of the code behind this post.</p><h3>1. What is K-Means Clustering?</h3><p>K-means clustering is an algorithm used to partition data into a user-defined number of groups, <em>k</em>. K-means is a form of unsupervised machine learning, meaning that the input data do not have labels prior to running the algorithm.</p><p>Clustering data with algorithms such as k-means is valuable primarily because it identifies distinct groups in unlabeled datasets when building data analytics pipelines. The resulting cluster labels are useful for data inspection, data interpretation, and training AI models. 
K-means and its variants are used in a variety of contexts, including:</p><ul><li><strong>Research.</strong> E.g., categorizing single-cell RNA sequencing results<a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008625">¹</a></li><li><strong>Computer Science.</strong> E.g., clustering emails for spam detection and filtering<a href="https://www.semanticscholar.org/paper/Spam-Filtering-using-K-mean-Clustering-with-Local-Sharma-Rastogi/901af90a3bf03f34064f22e3c5e39bbe6a5cf661?p2df">²</a></li><li><strong>Marketing.</strong> E.g., customer group segmentation for credit card ad targeting<a href="https://www.kaggle.com/code/muhammadshahzadkhan/bank-customer-segmentation-pca-kmeans">³</a></li></ul><h3>2. Implementing K-Means from Scratch with NumPy</h3><p>To gain a fundamental understanding of how k-means works, we will examine each step of the algorithm. We’ll do this with visual explanations and by building a model from scratch with NumPy.</p><p>The algorithm and mathematical function behind k-means are beautiful yet relatively simple. Let’s start with an overview:</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FE173z7K87PY%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DE173z7K87PY&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FE173z7K87PY%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/4886b38b13350dab674ff8a90858ff8c/href">https://medium.com/media/4886b38b13350dab674ff8a90858ff8c/href</a></iframe><p>In summary, the k-means algorithm has three steps:</p><ol><li>Assign initial cluster center (centroid) positions</li><li>Label each data point with its nearest centroid</li><li>Move each centroid to the mean position of the data points assigned to it. 
Go back to step 2 until the cluster centers converge.</li></ol><p>Let’s move on to building the model. These are the functions that we’ll need to write in order to use the algorithm:</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fcarbon.now.sh%2Fembed%3Fbg%3Drgba%2528255%252C255%252C255%252C0%2529%26t%3Dmaterial%26wt%3Dnone%26l%3Dpython%26width%3D680%26ds%3Dfalse%26dsyoff%3D20px%26dsblur%3D68px%26wc%3Dtrue%26wa%3Dtrue%26pv%3D11px%26ph%3D8px%26ln%3Dtrue%26fl%3D1%26fm%3DHack%26fs%3D10px%26lh%3D133%2525%26si%3Dfalse%26es%3D2x%26wm%3Dfalse%26code%3Dimport%252520numpy%252520as%252520np%25250A%25250A%25250Aclass%252520KMeans%25253A%25250A%252520%252520%252520%252520def%252520__init__%2528%25250A%252520%252520%252520%252520%252520%252520%252520%252520self%25252C%25250A%252520%252520%252520%252520%252520%252520%252520%252520n_clusters%25253A%252520int%252520%25253D%2525203%25252C%25250A%252520%252520%252520%252520%252520%252520%252520%252520centroid_init%25253A%252520str%252520%25253D%252520%252522kmeans%25252B%25252B%252522%25252C%25250A%252520%252520%252520%252520%252520%252520%252520%252520max_iterations%25253A%252520int%252520%25253D%2525201000%25252C%25250A%252520%252520%252520%252520%252520%252520%252520%252520verbose%25253A%252520bool%252520%25253D%252520False%25252C%25250A%252520%252520%252520%252520%2529%25253A%252520...%25250A%252520%252520%252520%252520%252522%252522%252522Construct%252520a%252520k-means%252520clustering%252520object.%252522%252522%252522%25250A%25250A%252520%252520%252520%252520def%252520fit%2528self%25252C%252520input_data%25253A%252520np.ndarray%2529%25253A%252520...%25250A%252520%252520%252520%252520%252522%252522%252522Compute%252520k-means%252520clutsering%252520for%252520the%252520input%252520data.%252522%252522%252522%25250A%25250A%252520%252520%252520%252520def%252520_init_centroids_random%2528%25250A%252520%252520%252520%252520%252520%252520%252520%252520self%25252C%252520input_data%25253A%252520np.ndarray%
25252C%252520n_centroids%25253A%252520int%25250A%252520%252520%252520%252520%2529%252520-%25253E%252520np.ndarray%25253A%252520...%25250A%252520%252520%252520%252520%252522%252522%252522Randomly%252520initialize%252520centroid%252520points.%252522%252522%252522%25250A%25250A%252520%252520%252520%252520def%252520_init_centroids_plusplus%2528%25250A%252520%252520%252520%252520%252520%252520%252520%252520self%25252C%252520input_data%25253A%252520np.ndarray%25252C%252520n_centroids%25253A%252520int%25250A%252520%252520%252520%252520%2529%252520-%25253E%252520np.ndarray%25253A%252520...%25250A%252520%252520%252520%252520%252522%252522%252522Initialize%252520centroid%252520points%252520using%252520the%252520kmeans%25252B%25252B%252520algorithm.%252522%252522%252522%25250A%25250A%252520%252520%252520%252520def%252520_compute_labels%2528%25250A%252520%252520%252520%252520%252520%252520%252520%252520self%25252C%252520input_data%25253A%252520np.ndarray%25252C%252520centroids%25253A%252520np.ndarray%25250A%252520%252520%252520%252520%2529%252520-%25253E%252520np.ndarray%25253A%252520...%25250A%252520%252520%252520%252520%252522%252522%252522Return%252520the%252520resulting%252520cluster%252520data%252520labels.%252522%252522%252522%25250A%25250A%252520%252520%252520%252520def%252520_calculate_distances%2528%25250A%252520%252520%252520%252520%252520%252520%252520%252520self%25252C%252520input_data%25253A%252520np.ndarray%25252C%252520centroids%25253A%252520np.ndarray%25250A%252520%252520%252520%252520%2529%252520-%25253E%252520np.ndarray%25253A%252520...%25250A%252520%252520%252520%252520%252522%252522%252522Calculate%252520the%252520distance%252520of%252520each%252520input%252520point%252520to%252520each%252520centroid.%252522%252522%252522%25250A%252520%252520%252520%252520%25250A%252520%252520%252520%252520def%252520_update_centroid_positions%2528%25250A%252520%252520%252520%252520%252520%252520%252520%252520self%25252C%25250A%252520%252520%252520%252520%252520%252520%252520
%252520centroids%25253A%252520np.ndarray%25252C%25250A%252520%252520%252520%252520%252520%252520%252520%252520input_data%25253A%252520np.ndarray%25252C%25250A%252520%252520%252520%252520%252520%252520%252520%252520labels%25253A%252520np.ndarray%25252C%25250A%252520%252520%252520%252520%2529%252520-%25253E%252520np.ndarray%25253A%252520...%25250A%252520%252520%252520%252520%252522%252522%252522Update%252520the%252520location%252520of%252520each%252520centroid%252520to%252520its%252520center%252520of%252520mass.%252522%252522%252522%25250A%252520%252520%252520%252520&amp;display_name=Carbon&amp;url=https%3A%2F%2Fcarbon.now.sh%2F%3Fbg%3Drgba%252528255%25252C255%25252C255%25252C0%252529%26t%3Dmaterial%26wt%3Dnone%26l%3Dpython%26width%3D680%26ds%3Dfalse%26dsyoff%3D20px%26dsblur%3D68px%26wc%3Dtrue%26wa%3Dtrue%26pv%3D11px%26ph%3D8px%26ln%3Dtrue%26fl%3D1%26fm%3DHack%26fs%3D10px%26lh%3D133%252525%26si%3Dfalse%26es%3D2x%26wm%3Dfalse%26code%3Dimport%25252520numpy%25252520as%25252520np%2525250A%2525250A%2525250Aclass%25252520KMeans%2525253A%2525250A%25252520%25252520%25252520%25252520def%25252520__init__%252528%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520self%2525252C%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520n_clusters%2525253A%25252520int%25252520%2525253D%252525203%2525252C%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520centroid_init%2525253A%25252520str%25252520%2525253D%25252520%25252522kmeans%2525252B%2525252B%25252522%2525252C%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520max_iterations%2525253A%25252520int%25252520%2525253D%252525201000%2525252C%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520verbose%2525253A%25252520bool%25252520%2525253D%25252520False%2525252C%2525250A%25252520%25252520%25252520%25252520%252529%2525253A%25252520...%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%2525
2522Construct%25252520a%25252520k-means%25252520clustering%25252520object.%25252522%25252522%25252522%2525250A%2525250A%25252520%25252520%25252520%25252520def%25252520fit%252528self%2525252C%25252520input_data%2525253A%25252520np.ndarray%252529%2525253A%25252520...%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522Compute%25252520k-means%25252520clutsering%25252520for%25252520the%25252520input%25252520data.%25252522%25252522%25252522%2525250A%2525250A%25252520%25252520%25252520%25252520def%25252520_init_centroids_random%252528%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520self%2525252C%25252520input_data%2525253A%25252520np.ndarray%2525252C%25252520n_centroids%2525253A%25252520int%2525250A%25252520%25252520%25252520%25252520%252529%25252520-%2525253E%25252520np.ndarray%2525253A%25252520...%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522Randomly%25252520initialize%25252520centroid%25252520points.%25252522%25252522%25252522%2525250A%2525250A%25252520%25252520%25252520%25252520def%25252520_init_centroids_plusplus%252528%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520self%2525252C%25252520input_data%2525253A%25252520np.ndarray%2525252C%25252520n_centroids%2525253A%25252520int%2525250A%25252520%25252520%25252520%25252520%252529%25252520-%2525253E%25252520np.ndarray%2525253A%25252520...%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522Initialize%25252520centroid%25252520points%25252520using%25252520the%25252520kmeans%2525252B%2525252B%25252520algorithm.%25252522%25252522%25252522%2525250A%2525250A%25252520%25252520%25252520%25252520def%25252520_compute_labels%252528%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520self%2525252C%25252520input_data%2525253A%25252520np.ndarray%2525252C%25252520centroids%2525253A%25252520np.ndarray%2525250A%25252520%25252520%25252520%25252520%252529%25252520-%2525253E%25252520np.n
darray%2525253A%25252520...%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522Return%25252520the%25252520resulting%25252520cluster%25252520data%25252520labels.%25252522%25252522%25252522%2525250A%2525250A%25252520%25252520%25252520%25252520def%25252520_calculate_distances%252528%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520self%2525252C%25252520input_data%2525253A%25252520np.ndarray%2525252C%25252520centroids%2525253A%25252520np.ndarray%2525250A%25252520%25252520%25252520%25252520%252529%25252520-%2525253E%25252520np.ndarray%2525253A%25252520...%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522Calculate%25252520the%25252520distance%25252520of%25252520each%25252520input%25252520point%25252520to%25252520each%25252520centroid.%25252522%25252522%25252522%2525250A%25252520%25252520%25252520%25252520%2525250A%25252520%25252520%25252520%25252520def%25252520_update_centroid_positions%252528%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520self%2525252C%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520centroids%2525253A%25252520np.ndarray%2525252C%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520input_data%2525253A%25252520np.ndarray%2525252C%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520labels%2525253A%25252520np.ndarray%2525252C%2525250A%25252520%25252520%25252520%25252520%252529%25252520-%2525253E%25252520np.ndarray%2525253A%25252520...%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522Update%25252520the%25252520location%25252520of%25252520each%25252520centroid%25252520to%25252520its%25252520center%25252520of%25252520mass.%25252522%25252522%25252522%2525250A%25252520%25252520%25252520%25252520&amp;image=https%3A%2F%2Fcarbon.now.sh%2Fstatic%2Fbrand%2Fbanner.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;scroll=auto&amp;schema=ca
rbon" width="1024" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/47a705a2a72170d24a494cd33e8533e9/href">https://medium.com/media/47a705a2a72170d24a494cd33e8533e9/href</a></iframe><h4>2.1. Cluster Initialization</h4><p>The first step of the k-means algorithm is for the user to select <em>k</em>, the number of groups that the data should be clustered into.</p><p>In the original implementation of the algorithm, once <em>k</em> was selected, the cluster centers (or <em>centroids</em>) were initialized by randomly selecting <em>k</em> of the input data points as starting positions.</p><p>This approach turned out to be quite inefficient, as the starting centroids could land close to one another by chance, which slows convergence and can produce poor clusterings. In 2006, Arthur and Vassilvitskii developed a more efficient approach to centroid initialization<a href="http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf">⁴</a>. They published their approach in 2007, calling it <strong>k-means++</strong>.</p><p>Rather than selecting every initial centroid uniformly at random, <strong>k-means++</strong> picks the first centroid at random and then samples each remaining centroid from the input points with probability proportional to its squared distance from the nearest centroid already chosen. 
Let’s visualize how it works:</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2F4qJWhvFQb9g%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D4qJWhvFQb9g&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2F4qJWhvFQb9g%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/a738d68a5b57662ba31312d7ab0401ad/href">https://medium.com/media/a738d68a5b57662ba31312d7ab0401ad/href</a></iframe><p>Now that the intuition behind k-means++ has been exposed, let’s implement the function for it:</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fcarbon.now.sh%2Fembed%3Fbg%3Drgba%2528255%252C255%252C255%252C0%2529%26t%3Dmaterial%26wt%3Dnone%26l%3Dpython%26width%3D680%26ds%3Dfalse%26dsyoff%3D20px%26dsblur%3D68px%26wc%3Dtrue%26wa%3Dtrue%26pv%3D11px%26ph%3D8px%26ln%3Dtrue%26fl%3D1%26fm%3DHack%26fs%3D10px%26lh%3D133%2525%26si%3Dfalse%26es%3D2x%26wm%3Dfalse%26code%3Ddef%252520_init_centroids_plusplus%2528%25250A%252520%252520%252520%252520self%25252C%252520input_data%25253A%252520np.ndarray%25252C%252520n_centroids%25253A%252520int%25250A%2529%252520-%25253E%252520np.ndarray%25253A%25250A%252520%252520%252520%252520%252522%252522%252522Initialize%252520centroid%252520points%252520using%252520the%252520kmeans%25252B%25252B%252520algorithm.%25250A%25250A%252520%252520%252520%252520Parameters%25250A%252520%252520%252520%252520----------%25250A%252520%252520%252520%252520input_data%252520%25253A%252520np.ndarray%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520input%252520data.%25250A%252520%252520%252520%252520n_centroids%252520%25253A%252520int%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520number%252520of%252520centroid%252520points%252520to%252520initialize%252520from%252520randomly%25250A%252520
%252520%252520%252520%252520%252520%252520%252520selected%252520points%252520in%252520the%252520input%252520dataset.%25250A%25250A%252520%252520%252520%252520Returns%25250A%252520%252520%252520%252520-------%25250A%252520%252520%252520%252520centroids%252520%25253A%252520np.ndarray%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520initialized%252520centroids%25250A%252520%252520%252520%252520%252522%252522%252522%25250A%252520%252520%252520%252520centroids%252520%25253D%252520%25255B%25255D%25250A%252520%252520%252520%252520centroid_rows%252520%25253D%252520%25255B%25255D%25250A%25250A%252520%252520%252520%252520%252523%252520randomly%252520select%252520first%252520centroid%25250A%252520%252520%252520%252520centroid_rows.append%2528np.random.choice%2528input_data.shape%25255B0%25255D%2529%2529%25250A%252520%252520%252520%252520centroids.append%2528input_data%25255Bcentroid_rows%25255B0%25255D%25255D%2529%25250A%25250A%252520%252520%252520%252520%252523%252520Select%252520other%252520centroids%25250A%252520%252520%252520%252520for%252520_%252520in%252520range%25281%25252C%252520n_centroids%2529%25253A%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252523%252520compute%252520squared%252520l2%252520of%252520input_data%252520to%252520all%252520centroids%25250A%252520%252520%252520%252520%252520%252520%252520%252520distances%252520%25253D%252520cdist%2528%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520input_data%25252C%252520np.asarray%2528centroids%2529%25252C%252520%252522sqeuclidean%252522%25250A%252520%252520%252520%252520%252520%252520%252520%252520%2529%25250A%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252523%252520get%252520min%252520distance%252520for%252520the%252520centroids%25250A%252520%252520%252520%252520%252520%252520%252520%252520distances%252520%25253D%252520distances.min%2528axis%25253D1%2529%25250A%25250A%252520%252520%252520%252520%252520%252520%252520
%252520%252523%252520select%252520a%252520new%252520centroid%252520randomly%252520based%252520on%252520the%252520probability%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252523%252520%252520%252520%252520%252520distribution.%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252523%252520previous%252520centroids%252520will%252520have%252520probabilities%252520of%2525200%252520-%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252523%252520%252520%252520%252520%252520highly%252520unlikely%252520to%252520get%252520a%252520reselection.%25250A%252520%252520%252520%252520%252520%252520%252520%252520prob_distribution%252520%25253D%252520distances%252520%25252F%252520distances.sum%2528%2529%25250A%252520%252520%252520%252520%252520%252520%252520%252520centroid_index%252520%25253D%252520np.random.choice%2528%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520input_data.shape%25255B0%25255D%25252C%252520p%25253Dprob_distribution%25250A%252520%252520%252520%252520%252520%252520%252520%252520%2529%25250A%25250A%252520%252520%252520%252520%252520%252520%252520%252520centroids.append%2528input_data%25255Bcentroid_index%25255D%2529%25250A%252520%252520%252520%252520%252520%252520%252520%252520centroid_rows.append%2528centroid_index%2529%25250A%25250A%252520%252520%252520%252520return%252520np.asarray%2528centroids%2529%25250A&amp;display_name=Carbon&amp;url=https%3A%2F%2Fcarbon.now.sh%2F%3Fbg%3Drgba%252528255%25252C255%25252C255%25252C0%252529%26t%3Dmaterial%26wt%3Dnone%26l%3Dpython%26width%3D680%26ds%3Dfalse%26dsyoff%3D20px%26dsblur%3D68px%26wc%3Dtrue%26wa%3Dtrue%26pv%3D11px%26ph%3D8px%26ln%3Dtrue%26fl%3D1%26fm%3DHack%26fs%3D10px%26lh%3D133%252525%26si%3Dfalse%26es%3D2x%26wm%3Dfalse%26code%3Ddef%25252520_init_centroids_plusplus%252528%2525250A%25252520%25252520%25252520%25252520self%2525252C%25252520input_data%2525253A%25252520np.ndarray%2525252C%25252520n_centroids%2525253A%25252520int%2525250A%2
52529%25252520-%2525253E%25252520np.ndarray%2525253A%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522Initialize%25252520centroid%25252520points%25252520using%25252520the%25252520kmeans%2525252B%2525252B%25252520algorithm.%2525250A%2525250A%25252520%25252520%25252520%25252520Parameters%2525250A%25252520%25252520%25252520%25252520----------%2525250A%25252520%25252520%25252520%25252520input_data%25252520%2525253A%25252520np.ndarray%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520input%25252520data.%2525250A%25252520%25252520%25252520%25252520n_centroids%25252520%2525253A%25252520int%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520number%25252520of%25252520centroid%25252520points%25252520to%25252520initialize%25252520from%25252520randomly%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520selected%25252520points%25252520in%25252520the%25252520input%25252520dataset.%2525250A%2525250A%25252520%25252520%25252520%25252520Returns%2525250A%25252520%25252520%25252520%25252520-------%2525250A%25252520%25252520%25252520%25252520centroids%25252520%2525253A%25252520np.ndarray%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520initialized%25252520centroids%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522%2525250A%25252520%25252520%25252520%25252520centroids%25252520%2525253D%25252520%2525255B%2525255D%2525250A%25252520%25252520%25252520%25252520centroid_rows%25252520%2525253D%25252520%2525255B%2525255D%2525250A%2525250A%25252520%25252520%25252520%25252520%25252523%25252520randomly%25252520select%25252520first%25252520centroid%2525250A%25252520%25252520%25252520%25252520centroid_rows.append%252528np.random.choice%252528input_data.shape%2525255B0%2525255D%252529%252529%2525250A%25252520%25252520%25252520%25252520centroids.append%252528input_data%2525255Bcentroid_rows%2525255B0%252525
5D%2525255D%252529%2525250A%2525250A%25252520%25252520%25252520%25252520%25252523%25252520Select%25252520other%25252520centroids%2525250A%25252520%25252520%25252520%25252520for%25252520_%25252520in%25252520range%2525281%2525252C%25252520n_centroids%252529%2525253A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252523%25252520compute%25252520squared%25252520l2%25252520of%25252520input_data%25252520to%25252520all%25252520centroids%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520distances%25252520%2525253D%25252520cdist%252528%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520input_data%2525252C%25252520np.asarray%252528centroids%252529%2525252C%25252520%25252522sqeuclidean%25252522%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%252529%2525250A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252523%25252520get%25252520min%25252520distance%25252520for%25252520the%25252520centroids%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520distances%25252520%2525253D%25252520distances.min%252528axis%2525253D1%252529%2525250A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252523%25252520select%25252520a%25252520new%25252520centroid%25252520randomly%25252520based%25252520on%25252520the%25252520probability%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252523%25252520%25252520%25252520%25252520%25252520distribution.%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252523%25252520previous%25252520centroids%25252520will%25252520have%25252520probabilities%25252520of%252525200%25252520-%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252523%25252520%25252520%25252520%25252520%25252520highly%25252520unlikely%25252520to%2
5252520get%25252520a%25252520reselection.%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520prob_distribution%25252520%2525253D%25252520distances%25252520%2525252F%25252520distances.sum%252528%252529%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520centroid_index%25252520%2525253D%25252520np.random.choice%252528%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520input_data.shape%2525255B0%2525255D%2525252C%25252520p%2525253Dprob_distribution%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%252529%2525250A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520centroids.append%252528input_data%2525255Bcentroid_index%2525255D%252529%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520centroid_rows.append%252528centroid_index%252529%2525250A%2525250A%25252520%25252520%25252520%25252520return%25252520np.asarray%252528centroids%252529%2525250A&amp;image=https%3A%2F%2Fcarbon.now.sh%2Fstatic%2Fbrand%2Fbanner.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;scroll=auto&amp;schema=carbon" width="1024" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/b20204a4e63aca8d0cfbea0bf88f4187/href">https://medium.com/media/b20204a4e63aca8d0cfbea0bf88f4187/href</a></iframe><p>Of note, rather than having to choose <em>k</em> manually, several unbiased techniques can be used to identify an optimal number. <a href="https://medium.com/u/78433997a4a">Khyati Mahendru</a> explains two of these approaches, the <strong>elbow </strong>and <strong>silhouette methods</strong> <a href="https://medium.com/analytics-vidhya/how-to-determine-the-optimal-k-for-k-means-708505d204eb">in her article</a>. It’s worth a read!</p><h4>2.2. 
Data Labeling and Centroid Updates</h4><p>Following centroid initialization, the algorithm enters an iterative process of data labeling and centroid position updates.</p><p>In each iteration, every input data point is first labeled with the index of its nearest centroid. Each centroid is then moved to the mean position of the data points in its cluster.</p><p>These two steps repeat until the label assignments and centroid positions no longer change, i.e., until the algorithm <em>converges</em>. Let’s visualize this process:</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2F2lZZ_FzlIJY%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D2lZZ_FzlIJY&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2F2lZZ_FzlIJY%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/2e7c6ebd446fb25017aae247bbfa1520/href">https://medium.com/media/2e7c6ebd446fb25017aae247bbfa1520/href</a></iframe><p>Now, let’s implement the data labeling code:</p><iframe 
src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fcarbon.now.sh%2Fembed%3Fbg%3Drgba%2528255%252C255%252C255%252C0%2529%26t%3Dmaterial%26wt%3Dnone%26l%3Dpython%26width%3D680%26ds%3Dfalse%26dsyoff%3D20px%26dsblur%3D68px%26wc%3Dtrue%26wa%3Dtrue%26pv%3D11px%26ph%3D8px%26ln%3Dtrue%26fl%3D1%26fm%3DHack%26fs%3D10px%26lh%3D133%2525%26si%3Dfalse%26es%3D2x%26wm%3Dfalse%26code%3Ddef%252520_compute_labels%2528%25250A%252520%252520%252520%252520self%25252C%252520input_data%25253A%252520np.ndarray%25252C%252520centroids%25253A%252520np.ndarray%25250A%2529%252520-%25253E%252520np.ndarray%25253A%25250A%252520%252520%252520%252520%252522%252522%252522Return%252520the%252520resulting%252520cluster%252520data%252520labels.%25250A%25250A%252520%252520%252520%252520Parameters%25250A%252520%252520%252520%252520----------%25250A%252520%252520%252520%252520input_data%252520%25253A%252520np.ndarray%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520input%252520data%252520to%252520find%252520labels%252520for.%25250A%252520%252520%252520%252520centroids%252520%25253A%252520np.ndarray%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520centroid%252520points%252520used%252520to%252520label%252520each%252520input%252520data%252520point.%25250A%25250A%252520%252520%252520%252520Returns%25250A%252520%252520%252520%252520-------%25250A%252520%252520%252520%252520data_labels%252520%25253A%252520np.ndarray%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520index%252520of%252520the%252520closest%252520centroid%252520to%252520each%252520input%252520point%25250A%252520%252520%252520%252520%252522%252522%252522%25250A%252520%252520%252520%252520%252523%252520Compute%252520the%252520distance%252520of%252520each%252520point%252520to%252520each%252520centroid%25252C%25250A%252520%252520%252520%252520%252523%252520%252520%252520assign%252520each%252520point%252520the%252520label%252520of%252520the%252520closest%252520centroid%25250
A%252520%252520%252520%252520distances%252520%25253D%252520self._calculate_distances%2528input_data%25252C%252520centroids%2529%25250A%25250A%252520%252520%252520%252520labels%252520%25253D%252520np.argmin%2528distances%25252C%252520axis%25253D0%2529%25250A%252520%252520%252520%252520return%252520labels%25250A&amp;display_name=Carbon&amp;url=https%3A%2F%2Fcarbon.now.sh%2F%3Fbg%3Drgba%252528255%25252C255%25252C255%25252C0%252529%26t%3Dmaterial%26wt%3Dnone%26l%3Dpython%26width%3D680%26ds%3Dfalse%26dsyoff%3D20px%26dsblur%3D68px%26wc%3Dtrue%26wa%3Dtrue%26pv%3D11px%26ph%3D8px%26ln%3Dtrue%26fl%3D1%26fm%3DHack%26fs%3D10px%26lh%3D133%252525%26si%3Dfalse%26es%3D2x%26wm%3Dfalse%26code%3Ddef%25252520_compute_labels%252528%2525250A%25252520%25252520%25252520%25252520self%2525252C%25252520input_data%2525253A%25252520np.ndarray%2525252C%25252520centroids%2525253A%25252520np.ndarray%2525250A%252529%25252520-%2525253E%25252520np.ndarray%2525253A%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522Return%25252520the%25252520resulting%25252520cluster%25252520data%25252520labels.%2525250A%2525250A%25252520%25252520%25252520%25252520Parameters%2525250A%25252520%25252520%25252520%25252520----------%2525250A%25252520%25252520%25252520%25252520input_data%25252520%2525253A%25252520np.ndarray%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520input%25252520data%25252520to%25252520find%25252520labels%25252520for.%2525250A%25252520%25252520%25252520%25252520centroids%25252520%2525253A%25252520np.ndarray%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520centroid%25252520points%25252520used%25252520to%25252520label%25252520each%25252520input%25252520data%25252520point.%2525250A%2525250A%25252520%25252520%25252520%25252520Returns%2525250A%25252520%25252520%25252520%25252520-------%2525250A%25252520%25252520%25252520%25252520data_labels%25252520%2525253A%25252520np.ndarray%2525250A%25252520%25252
520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520index%25252520of%25252520the%25252520closest%25252520centroid%25252520to%25252520each%25252520input%25252520point%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522%2525250A%25252520%25252520%25252520%25252520%25252523%25252520Compute%25252520the%25252520distance%25252520of%25252520each%25252520point%25252520to%25252520each%25252520centroid%2525252C%2525250A%25252520%25252520%25252520%25252520%25252523%25252520%25252520%25252520assign%25252520each%25252520point%25252520the%25252520label%25252520of%25252520the%25252520closest%25252520centroid%2525250A%25252520%25252520%25252520%25252520distances%25252520%2525253D%25252520self._calculate_distances%252528input_data%2525252C%25252520centroids%252529%2525250A%2525250A%25252520%25252520%25252520%25252520labels%25252520%2525253D%25252520np.argmin%252528distances%2525252C%25252520axis%2525253D0%252529%2525250A%25252520%25252520%25252520%25252520return%25252520labels%2525250A&amp;image=https%3A%2F%2Fcarbon.now.sh%2Fstatic%2Fbrand%2Fbanner.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;scroll=auto&amp;schema=carbon" width="1024" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/92f09baef4344e104234f7df37257c08/href">https://medium.com/media/92f09baef4344e104234f7df37257c08/href</a></iframe><p>And lastly, we’ll implement the centroid position update function:</p><iframe 
src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fcarbon.now.sh%2Fembed%3Fbg%3Drgba%2528255%252C255%252C255%252C0%2529%26t%3Dmaterial%26wt%3Dnone%26l%3Dpython%26width%3D680%26ds%3Dfalse%26dsyoff%3D20px%26dsblur%3D68px%26wc%3Dtrue%26wa%3Dtrue%26pv%3D11px%26ph%3D8px%26ln%3Dtrue%26fl%3D1%26fm%3DHack%26fs%3D10px%26lh%3D133%2525%26si%3Dfalse%26es%3D2x%26wm%3Dfalse%26code%3Ddef%252520_update_centroid_positions%2528%25250A%252520%252520%252520%252520self%25252C%25250A%252520%252520%252520%252520centroids%25253A%252520np.ndarray%25252C%25250A%252520%252520%252520%252520input_data%25253A%252520np.ndarray%25252C%25250A%252520%252520%252520%252520labels%25253A%252520np.ndarray%25252C%25250A%2529%252520-%25253E%252520np.ndarray%25253A%25250A%252520%252520%252520%252520%252522%252522%252522Update%252520the%252520location%252520of%252520each%252520centroid%252520to%252520its%252520center%252520of%252520mass.%25250A%25250A%252520%252520%252520%252520Parameters%25250A%252520%252520%252520%252520----------%25250A%252520%252520%252520%252520centroids%252520%25253A%252520np.ndarray%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520centroids%252520from%252520the%252520previous%252520iteration.%25250A%252520%252520%252520%252520input_data%252520%25253A%252520np.ndarray%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520input%252520data%252520to%252520the%252520algorithm.%25250A%252520%252520%252520%252520labels%252520%25253A%252520np.ndarray%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520labels%252520showing%252520the%252520closet%252520centroid%252520to%252520each%252520piece%252520of%25250A%252520%252520%252520%252520%252520%252520%252520%252520input%252520data.%25250A%25250A%252520%252520%252520%252520Returns%25250A%252520%252520%252520%252520-------%25250A%252520%252520%252520%252520centroids%252520%25253A%252520np.ndarray%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520update
d%252520centroids.%25250A%252520%252520%252520%252520%252522%252522%252522%25250A%252520%252520%252520%252520for%252520i%252520in%252520range%2528%25250A%252520%252520%252520%252520%252520%252520%252520%252520centroids.shape%25255B0%25255D%25250A%252520%252520%252520%252520%2529%25253A%252520%252520%252523%252520iterate%252520over%252520array%252520to%252520prevent%252520div%2525200%252520errors%25250A%252520%252520%252520%252520%252520%252520%252520%252520if%252520not%252520np.any%2528labels%252520%25253D%25253D%252520i%2529%25253A%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520continue%25250A%252520%252520%252520%252520%252520%252520%252520%252520centroids%25255Bi%25255D%252520%25253D%252520np.mean%2528input_data%25255Blabels%252520%25253D%25253D%252520i%25255D%25252C%252520axis%25253D0%2529%25250A%252520%252520%252520%252520return%252520centroids%25250A&amp;display_name=Carbon&amp;url=https%3A%2F%2Fcarbon.now.sh%2F%3Fbg%3Drgba%252528255%25252C255%25252C255%25252C0%252529%26t%3Dmaterial%26wt%3Dnone%26l%3Dpython%26width%3D680%26ds%3Dfalse%26dsyoff%3D20px%26dsblur%3D68px%26wc%3Dtrue%26wa%3Dtrue%26pv%3D11px%26ph%3D8px%26ln%3Dtrue%26fl%3D1%26fm%3DHack%26fs%3D10px%26lh%3D133%252525%26si%3Dfalse%26es%3D2x%26wm%3Dfalse%26code%3Ddef%25252520_update_centroid_positions%252528%2525250A%25252520%25252520%25252520%25252520self%2525252C%2525250A%25252520%25252520%25252520%25252520centroids%2525253A%25252520np.ndarray%2525252C%2525250A%25252520%25252520%25252520%25252520input_data%2525253A%25252520np.ndarray%2525252C%2525250A%25252520%25252520%25252520%25252520labels%2525253A%25252520np.ndarray%2525252C%2525250A%252529%25252520-%2525253E%25252520np.ndarray%2525253A%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522Update%25252520the%25252520location%25252520of%25252520each%25252520centroid%25252520to%25252520its%25252520center%25252520of%25252520mass.%2525250A%2525250A%25252520%25252520%25252520%25252520Parameters%2
525250A%25252520%25252520%25252520%25252520----------%2525250A%25252520%25252520%25252520%25252520centroids%25252520%2525253A%25252520np.ndarray%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520centroids%25252520from%25252520the%25252520previous%25252520iteration.%2525250A%25252520%25252520%25252520%25252520input_data%25252520%2525253A%25252520np.ndarray%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520input%25252520data%25252520to%25252520the%25252520algorithm.%2525250A%25252520%25252520%25252520%25252520labels%25252520%2525253A%25252520np.ndarray%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520labels%25252520showing%25252520the%25252520closet%25252520centroid%25252520to%25252520each%25252520piece%25252520of%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520input%25252520data.%2525250A%2525250A%25252520%25252520%25252520%25252520Returns%2525250A%25252520%25252520%25252520%25252520-------%2525250A%25252520%25252520%25252520%25252520centroids%25252520%2525253A%25252520np.ndarray%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520updated%25252520centroids.%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522%2525250A%25252520%25252520%25252520%25252520for%25252520i%25252520in%25252520range%252528%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520centroids.shape%2525255B0%2525255D%2525250A%25252520%25252520%25252520%25252520%252529%2525253A%25252520%25252520%25252523%25252520iterate%25252520over%25252520array%25252520to%25252520prevent%25252520div%252525200%25252520errors%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520if%25252520not%25252520np.any%252528labels%25252520%2525253D%2525253D%25252520i%252529%2525253A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520
%25252520%25252520%25252520%25252520continue%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520centroids%2525255Bi%2525255D%25252520%2525253D%25252520np.mean%252528input_data%2525255Blabels%25252520%2525253D%2525253D%25252520i%2525255D%2525252C%25252520axis%2525253D0%252529%2525250A%25252520%25252520%25252520%25252520return%25252520centroids%2525250A&amp;image=https%3A%2F%2Fcarbon.now.sh%2Fstatic%2Fbrand%2Fbanner.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;scroll=auto&amp;schema=carbon" width="1024" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/46e36b1b7ecbea176c0dd2557eb94cfd/href">https://medium.com/media/46e36b1b7ecbea176c0dd2557eb94cfd/href</a></iframe><h4>2.3. K-Means Function Differentiation</h4><p>The third step of the k-means algorithm is to update the position of the centroids. We saw that these centroids are updated to the average position of all of the cluster’s labeled points.</p><p>Updating the centroid to the average cluster position might seem intuitive, but what is the mathematical rationale behind this step? The rationale lies in the differentiation of the k-means equation.</p><p>Let’s expose this intuition by exploring an animated proof of the k-means function differentiation. 
This proof demonstrates that the positional updates are a result of the k-means equation aiming to minimize the within-group <em>variance</em>.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FuNRw-MUCjm4%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DuNRw-MUCjm4&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FuNRw-MUCjm4%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/baa4bbfc957bc0fc09ba068df761ade7/href">https://medium.com/media/baa4bbfc957bc0fc09ba068df761ade7/href</a></iframe><h4>2.4 Fitting it Together</h4><p>Now that we’ve constructed the backbone functions for our k-means model, let’s tie it together in a single fit function that will fit our model to the input data. We will also define the __init__ function here:</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fcarbon.now.sh%2Fembed%3Fbg%3Drgba%2528255%252C255%252C255%252C0%2529%26t%3Dmaterial%26wt%3Dnone%26l%3Dpython%26width%3D680%26ds%3Dfalse%26dsyoff%3D20px%26dsblur%3D68px%26wc%3Dtrue%26wa%3Dtrue%26pv%3D11px%26ph%3D8px%26ln%3Dtrue%26fl%3D1%26fm%3DHack%26fs%3D10px%26lh%3D133%2525%26si%3Dfalse%26es%3D2x%26wm%3Dfalse%26code%3Ddef%252520fit%2528self%25252C%252520input_data%25253A%252520np.ndarray%2529%25253A%25250A%252520%252520%252520%252520%252522%252522%252522Compute%252520k-means%252520clutsering%252520for%252520the%252520input%252520data.%25250A%252520%252520%252520%252520Parameters%25250A%252520%252520%252520%252520----------%25250A%252520%252520%252520%252520input_data%252520%25253A%252520np.ndarray%252520with%252520shape%252520%2528m_samples%25252C%252520n_features%2529%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520input%252520data%252520to%252520the%252520algorithm.%25250A%252520%252520%252520%252520%252522%25252
2%252522%25250A%25250A%252520%252520%252520%252520%252523%252520Prepare%252520the%252520array%25250A%252520%252520%252520%252520if%252520not%252520isinstance%2528input_data%25252C%252520np.ndarray%2529%25253A%25250A%252520%252520%252520%252520%252520%252520%252520%252520input_data%252520%25253D%252520np.array%2528input_data%2529%25250A%25250A%252520%252520%252520%252520%252523%252520Find%252520the%252520dimensionality%252520of%252520the%252520dataset%25250A%252520%252520%252520%252520if%252520not%252520input_data.ndim%252520%25253D%25253D%2525202%25253A%25250A%252520%252520%252520%252520%252520%252520%252520%252520raise%252520TypeError%2528%252522The%252520input%252520array%252520should%252520only%252520have%252520two%252520dimensions.%252522%2529%25250A%25250A%252520%252520%252520%252520if%252520self.centroid_init%252520%25253D%25253D%252520%252522kmeans%25252B%25252B%252522%25253A%25250A%252520%252520%252520%252520%252520%252520%252520%252520centroids%252520%25253D%252520self._init_centroids_plusplus%2528input_data%25252C%252520self.n_clusters%2529%25250A%252520%252520%252520%252520else%25253A%25250A%252520%252520%252520%252520%252520%252520%252520%252520centroids%252520%25253D%252520self._init_centroids_random%2528input_data%25252C%252520self.n_clusters%2529%25250A%25250A%252520%252520%252520%252520%252523%252520Keep%252520track%252520of%252520centroid%252520history%252520for%252520visualization%252520purposes%25250A%252520%252520%252520%252520self.centroid_history%252520%25253D%252520%25255Bcentroids.copy%2528%2529%25255D%25250A%25250A%252520%252520%252520%252520%252523%252520Now%252520start%252520the%252520k_means%252520algorithm.%25250A%252520%252520%252520%252520%252523%252520%252520%252520continue%252520until%252520the%252520iterations%252520are%252520maxed%252520or%252520the%252520centroids%252520converge.%25250A%252520%252520%252520%252520iteration%252520%25253D%2525200%25250A%252520%252520%252520%252520old_centroids%252520%25253D%252520centroids.copy%2528
%2529%252520%25252B%2525201%25250A%252520%252520%252520%252520while%252520%2528%25250A%252520%252520%252520%252520%252520%252520%252520%252520not%252520np.all%2528np.isclose%2528centroids%25252C%252520old_centroids%25252C%252520atol%25253D0.001%2529%2529%25250A%252520%252520%252520%252520%252520%252520%252520%252520and%252520iteration%252520%25253C%252520self.max_iterations%25250A%252520%252520%252520%252520%2529%25253A%25250A%252520%252520%252520%252520%252520%252520%252520%252520if%252520self.verbose%25253A%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520print%2528f%252522Iteration%25253A%252520%25257Biteration%25257D%252522%2529%25250A%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252523%252520Keep%252520track%252520of%252520the%252520old%252520centroids%25250A%252520%252520%252520%252520%252520%252520%252520%252520old_centroids%252520%25253D%252520centroids.copy%2528%2529%25250A%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252523%252520Identify%252520the%252520labels%252520for%252520each%252520point%25250A%252520%252520%252520%252520%252520%252520%252520%252520data_labels%252520%25253D%252520self._compute_labels%2528input_data%25252C%252520centroids%2529%25250A%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252523%252520Update%252520each%252520centroid%252520position%252520based%252520on%252520the%252520current%252520point%252520labels%25250A%252520%252520%252520%252520%252520%252520%252520%252520centroids%252520%25253D%252520self._update_centroid_locations%2528%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520%252520centroids%25252C%252520input_data%25252C%252520data_labels%25250A%252520%252520%252520%252520%252520%252520%252520%252520%2529%25250A%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252523%252520Update%252520the%252520iteration%252520and%252520old%252520centroids%25250A%252520%252520%252520%252520%252520%25252
0%252520%252520iteration%252520%25252B%25253D%2525201%25250A%25250A%252520%252520%252520%252520%252520%252520%252520%252520self.centroid_history.append%2528centroids.copy%2528%2529%2529%25250A%25250A%252520%252520%252520%252520self.labels%252520%25253D%252520self._compute_labels%2528input_data%25252C%252520centroids%2529%25250A%252520%252520%252520%252520self.cluster_centers%252520%25253D%252520self.centroid_history%25255B-1%25255D%25250A%252520%252520%252520%252520return&amp;display_name=Carbon&amp;url=https%3A%2F%2Fcarbon.now.sh%2F%3Fbg%3Drgba%252528255%25252C255%25252C255%25252C0%252529%26t%3Dmaterial%26wt%3Dnone%26l%3Dpython%26width%3D680%26ds%3Dfalse%26dsyoff%3D20px%26dsblur%3D68px%26wc%3Dtrue%26wa%3Dtrue%26pv%3D11px%26ph%3D8px%26ln%3Dtrue%26fl%3D1%26fm%3DHack%26fs%3D10px%26lh%3D133%252525%26si%3Dfalse%26es%3D2x%26wm%3Dfalse%26code%3Ddef%25252520fit%252528self%2525252C%25252520input_data%2525253A%25252520np.ndarray%252529%2525253A%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522Compute%25252520k-means%25252520clutsering%25252520for%25252520the%25252520input%25252520data.%2525250A%25252520%25252520%25252520%25252520Parameters%2525250A%25252520%25252520%25252520%25252520----------%2525250A%25252520%25252520%25252520%25252520input_data%25252520%2525253A%25252520np.ndarray%25252520with%25252520shape%25252520%252528m_samples%2525252C%25252520n_features%252529%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520input%25252520data%25252520to%25252520the%25252520algorithm.%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522%2525250A%2525250A%25252520%25252520%25252520%25252520%25252523%25252520Prepare%25252520the%25252520array%2525250A%25252520%25252520%25252520%25252520if%25252520not%25252520isinstance%252528input_data%2525252C%25252520np.ndarray%252529%2525253A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520input_data%25252520%2525253D%25252520np.array%
252528input_data%252529%2525250A%2525250A%25252520%25252520%25252520%25252520%25252523%25252520Find%25252520the%25252520dimensionality%25252520of%25252520the%25252520dataset%2525250A%25252520%25252520%25252520%25252520if%25252520not%25252520input_data.ndim%25252520%2525253D%2525253D%252525202%2525253A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520raise%25252520TypeError%252528%25252522The%25252520input%25252520array%25252520should%25252520only%25252520have%25252520two%25252520dimensions.%25252522%252529%2525250A%2525250A%25252520%25252520%25252520%25252520if%25252520self.centroid_init%25252520%2525253D%2525253D%25252520%25252522kmeans%2525252B%2525252B%25252522%2525253A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520centroids%25252520%2525253D%25252520self._init_centroids_plusplus%252528input_data%2525252C%25252520self.n_clusters%252529%2525250A%25252520%25252520%25252520%25252520else%2525253A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520centroids%25252520%2525253D%25252520self._init_centroids_random%252528input_data%2525252C%25252520self.n_clusters%252529%2525250A%2525250A%25252520%25252520%25252520%25252520%25252523%25252520Keep%25252520track%25252520of%25252520centroid%25252520history%25252520for%25252520visualization%25252520purposes%2525250A%25252520%25252520%25252520%25252520self.centroid_history%25252520%2525253D%25252520%2525255Bcentroids.copy%252528%252529%2525255D%2525250A%2525250A%25252520%25252520%25252520%25252520%25252523%25252520Now%25252520start%25252520the%25252520k_means%25252520algorithm.%2525250A%25252520%25252520%25252520%25252520%25252523%25252520%25252520%25252520continue%25252520until%25252520the%25252520iterations%25252520are%25252520maxed%25252520or%25252520the%25252520centroids%25252520converge.%2525250A%25252520%25252520%25252520%25252520iteration%25252520%2525253D%252525200%2525250A%25252520%25252520%25252520%25252520old_centroids%252525
20%2525253D%25252520centroids.copy%252528%252529%25252520%2525252B%252525201%2525250A%25252520%25252520%25252520%25252520while%25252520%252528%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520not%25252520np.all%252528np.isclose%252528centroids%2525252C%25252520old_centroids%2525252C%25252520atol%2525253D0.001%252529%252529%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520and%25252520iteration%25252520%2525253C%25252520self.max_iterations%2525250A%25252520%25252520%25252520%25252520%252529%2525253A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520if%25252520self.verbose%2525253A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520print%252528f%25252522Iteration%2525253A%25252520%2525257Biteration%2525257D%25252522%252529%2525250A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252523%25252520Keep%25252520track%25252520of%25252520the%25252520old%25252520centroids%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520old_centroids%25252520%2525253D%25252520centroids.copy%252528%252529%2525250A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252523%25252520Identify%25252520the%25252520labels%25252520for%25252520each%25252520point%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520data_labels%25252520%2525253D%25252520self._compute_labels%252528input_data%2525252C%25252520centroids%252529%2525250A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252523%25252520Update%25252520each%25252520centroid%25252520position%25252520based%25252520on%25252520the%25252520current%25252520point%25252520labels%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520centroids%25252520%2525253D%25252520self._update_centroid_locations%252528%2525250A%252525
20%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520centroids%2525252C%25252520input_data%2525252C%25252520data_labels%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%252529%2525250A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252523%25252520Update%25252520the%25252520iteration%25252520and%25252520old%25252520centroids%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520iteration%25252520%2525252B%2525253D%252525201%2525250A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520self.centroid_history.append%252528centroids.copy%252528%252529%252529%2525250A%2525250A%25252520%25252520%25252520%25252520self.labels%25252520%2525253D%25252520self._compute_labels%252528input_data%2525252C%25252520centroids%252529%2525250A%25252520%25252520%25252520%25252520self.cluster_centers%25252520%2525253D%25252520self.centroid_history%2525255B-1%2525255D%2525250A%25252520%25252520%25252520%25252520return&amp;image=https%3A%2F%2Fcarbon.now.sh%2Fstatic%2Fbrand%2Fbanner.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;scroll=auto&amp;schema=carbon" width="1024" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/27e4784240842d33c77061490127def4/href">https://medium.com/media/27e4784240842d33c77061490127def4/href</a></iframe><iframe 
src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fcarbon.now.sh%2Fembed%3Fbg%3Drgba%2528255%252C255%252C255%252C0%2529%26t%3Dmaterial%26wt%3Dnone%26l%3Dpython%26width%3D680%26ds%3Dfalse%26dsyoff%3D20px%26dsblur%3D68px%26wc%3Dtrue%26wa%3Dtrue%26pv%3D11px%26ph%3D8px%26ln%3Dtrue%26fl%3D1%26fm%3DHack%26fs%3D10px%26lh%3D133%2525%26si%3Dfalse%26es%3D2x%26wm%3Dfalse%26code%3Ddef%252520__init__%2528%25250A%252520%252520%252520%252520self%25252C%25250A%252520%252520%252520%252520n_clusters%25253A%252520int%252520%25253D%2525203%25252C%25250A%252520%252520%252520%252520centroid_init%25253A%252520str%252520%25253D%252520%252522kmeans%25252B%25252B%252522%25252C%25250A%252520%252520%252520%252520max_iterations%25253A%252520int%252520%25253D%2525201000%25252C%25250A%252520%252520%252520%252520verbose%25253A%252520bool%252520%25253D%252520False%25252C%25250A%2529%25253A%25250A%252520%252520%252520%252520%252522%252522%252522Construct%252520a%252520k-means%252520clustering%252520object.%25250A%252520%252520%252520%252520Can%252520be%252520fit%252520to%252520input%252520data%25252C%252520return%252520data%252520labels%25252C%252520and%252520return%252520centroid%25250A%252520%252520%252520%252520positions.%25250A%252520%252520%252520%252520Parameters%25250A%252520%252520%252520%252520----------%25250A%252520%252520%252520%252520n_clusters%252520%25253A%252520int%25252C%252520optional%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520number%252520of%252520clusters%252520to%252520compute.%252520Default%252520is%2525203.%25250A%252520%252520%252520%252520centroid_init%252520%25253A%252520str%25252C%252520optional%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520method%252520used%252520to%252520initialized%252520the%252520centroids.%252520Must%252520be%252520either%25250A%252520%252520%252520%252520%252520%252520%252520%252520%252522kmeans%25252B%25252B%252522%252520or%252520%252522random%252522.%252520Default%252520is%252520
%252522kmeans%25252B%25252B%252522.%25250A%252520%252520%252520%252520max_iterations%252520%25253A%252520int%25252C%252520optional%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520maximum%252520%252520number%252520of%252520iterations%252520that%252520the%252520algorithm%252520will%252520run%25250A%252520%252520%252520%252520%252520%252520%252520%252520prior%252520to%252520ending%252520the%252520convergence.%252520Default%252520is%2525201000.%25250A%252520%252520%252520%252520verbose%252520%25253A%252520bool%25252C%252520optional%25250A%252520%252520%252520%252520%252520%252520%252520%252520Whether%252520to%252520print%252520the%252520status%252520of%252520the%252520algorithm.%25250A%252520%252520%252520%252520%252520%252520%252520%252520Default%252520is%252520False.%25250A%252520%252520%252520%252520Attributes%25250A%252520%252520%252520%252520----------%25250A%252520%252520%252520%252520cluster_centers%252520%25253A%252520np.ndarray%252520of%252520shape%252520%2528m_clusters%25252C%252520n_feature%2529%25250A%252520%252520%252520%252520%252520%252520%252520%252520The%252520centers%252520to%252520the%252520fitted%252520clusters%25250A%252520%252520%252520%252520labels%252520%25253A%252520np.ndarray%252520of%252520shape%252520%2528n_samples%25252C%2529%25250A%252520%252520%252520%252520%252520%252520%252520%252520An%252520array%252520containing%2525200-index%252520labels%252520for%252520each%252520input%252520data%252520point.%25250A%252520%252520%252520%252520%252522%252522%252522%25250A%252520%252520%252520%252520%252523%252520First%252520initialize%252520random%252520location%252520for%252520the%252520clusters%25250A%252520%252520%252520%252520if%252520centroid_init.lower%2528%2529%252520not%252520in%252520%25255B%252522kmeans%25252B%25252B%252522%25252C%252520%252522random%252522%25255D%25253A%25250A%252520%252520%252520%252520%252520%252520%252520%252520raise%252520ValueError%2528%252522centroid_init%252520should%252520be%252520either%2525
20%2527kmeans%25252B%25252B%2527%252520or%252520%2527random%2527%252522%2529%25250A%252520%252520%252520%252520self.centroid_init%252520%25253D%252520centroid_init%25250A%25250A%252520%252520%252520%252520self.n_clusters%252520%25253D%252520n_clusters%25250A%252520%252520%252520%252520self.max_iterations%252520%25253D%252520max_iterations%25250A%252520%252520%252520%252520self.verbose%252520%25253D%252520verbose%25250A%252520%252520%252520%252520return&amp;display_name=Carbon&amp;url=https%3A%2F%2Fcarbon.now.sh%2F%3Fbg%3Drgba%252528255%25252C255%25252C255%25252C0%252529%26t%3Dmaterial%26wt%3Dnone%26l%3Dpython%26width%3D680%26ds%3Dfalse%26dsyoff%3D20px%26dsblur%3D68px%26wc%3Dtrue%26wa%3Dtrue%26pv%3D11px%26ph%3D8px%26ln%3Dtrue%26fl%3D1%26fm%3DHack%26fs%3D10px%26lh%3D133%252525%26si%3Dfalse%26es%3D2x%26wm%3Dfalse%26code%3Ddef%25252520__init__%252528%2525250A%25252520%25252520%25252520%25252520self%2525252C%2525250A%25252520%25252520%25252520%25252520n_clusters%2525253A%25252520int%25252520%2525253D%252525203%2525252C%2525250A%25252520%25252520%25252520%25252520centroid_init%2525253A%25252520str%25252520%2525253D%25252520%25252522kmeans%2525252B%2525252B%25252522%2525252C%2525250A%25252520%25252520%25252520%25252520max_iterations%2525253A%25252520int%25252520%2525253D%252525201000%2525252C%2525250A%25252520%25252520%25252520%25252520verbose%2525253A%25252520bool%25252520%2525253D%25252520False%2525252C%2525250A%252529%2525253A%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522Construct%25252520a%25252520k-means%25252520clustering%25252520object.%2525250A%25252520%25252520%25252520%25252520Can%25252520be%25252520fit%25252520to%25252520input%25252520data%2525252C%25252520return%25252520data%25252520labels%2525252C%25252520and%25252520return%25252520centroid%2525250A%25252520%25252520%25252520%25252520positions.%2525250A%25252520%25252520%25252520%25252520Parameters%2525250A%25252520%25252520%25252520%25252520----------%2525250A%25252520%25252520%25252
520%25252520n_clusters%25252520%2525253A%25252520int%2525252C%25252520optional%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520number%25252520of%25252520clusters%25252520to%25252520compute.%25252520Default%25252520is%252525203.%2525250A%25252520%25252520%25252520%25252520centroid_init%25252520%2525253A%25252520str%2525252C%25252520optional%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520method%25252520used%25252520to%25252520initialized%25252520the%25252520centroids.%25252520Must%25252520be%25252520either%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252522kmeans%2525252B%2525252B%25252522%25252520or%25252520%25252522random%25252522.%25252520Default%25252520is%25252520%25252522kmeans%2525252B%2525252B%25252522.%2525250A%25252520%25252520%25252520%25252520max_iterations%25252520%2525253A%25252520int%2525252C%25252520optional%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520maximum%25252520%25252520number%25252520of%25252520iterations%25252520that%25252520the%25252520algorithm%25252520will%25252520run%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520prior%25252520to%25252520ending%25252520the%25252520convergence.%25252520Default%25252520is%252525201000.%2525250A%25252520%25252520%25252520%25252520verbose%25252520%2525253A%25252520bool%2525252C%25252520optional%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520Whether%25252520to%25252520print%25252520the%25252520status%25252520of%25252520the%25252520algorithm.%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520Default%25252520is%25252520False.%2525250A%25252520%25252520%25252520%25252520Attributes%2525250A%25252520%25252520%25252520%25252520----------%2525250A%25252520%25252520%25252520%25252520cluster_centers%25252520%2525253A%25252520np.ndarray%25252520of%25252520
shape%25252520%252528m_clusters%2525252C%25252520n_feature%252529%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520The%25252520centers%25252520to%25252520the%25252520fitted%25252520clusters%2525250A%25252520%25252520%25252520%25252520labels%25252520%2525253A%25252520np.ndarray%25252520of%25252520shape%25252520%252528n_samples%2525252C%252529%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520An%25252520array%25252520containing%252525200-index%25252520labels%25252520for%25252520each%25252520input%25252520data%25252520point.%2525250A%25252520%25252520%25252520%25252520%25252522%25252522%25252522%2525250A%25252520%25252520%25252520%25252520%25252523%25252520First%25252520initialize%25252520random%25252520location%25252520for%25252520the%25252520clusters%2525250A%25252520%25252520%25252520%25252520if%25252520centroid_init.lower%252528%252529%25252520not%25252520in%25252520%2525255B%25252522kmeans%2525252B%2525252B%25252522%2525252C%25252520%25252522random%25252522%2525255D%2525253A%2525250A%25252520%25252520%25252520%25252520%25252520%25252520%25252520%25252520raise%25252520ValueError%252528%25252522centroid_init%25252520should%25252520be%25252520either%25252520%252527kmeans%2525252B%2525252B%252527%25252520or%25252520%252527random%252527%25252522%252529%2525250A%25252520%25252520%25252520%25252520self.centroid_init%25252520%2525253D%25252520centroid_init%2525250A%2525250A%25252520%25252520%25252520%25252520self.n_clusters%25252520%2525253D%25252520n_clusters%2525250A%25252520%25252520%25252520%25252520self.max_iterations%25252520%2525253D%25252520max_iterations%2525250A%25252520%25252520%25252520%25252520self.verbose%25252520%2525253D%25252520verbose%2525250A%25252520%25252520%25252520%25252520return&amp;image=https%3A%2F%2Fcarbon.now.sh%2Fstatic%2Fbrand%2Fbanner.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;scroll=auto&amp;schema=carbon" width="1024" height="480" frameborder="0" 
scrolling="no"><a href="https://medium.com/media/c28d9bd5476b210f690fdad0f3f0328a/href">https://medium.com/media/c28d9bd5476b210f690fdad0f3f0328a/href</a></iframe><p>Now we can put the model to use with my walkthrough notebook <a href="https://github.com/JacobBumgarner/learning-repo/blob/main/k_means/k_means_walkthrough.ipynb">found here</a>. This notebook uses synthetically generated data (shown in the videos above) to demonstrate the functionality of our newly written <a href="https://github.com/JacobBumgarner/learning-repo/blob/main/k_means/k_means.py">k_means.py</a> code.</p><h3>3. K-Means for Video Keyframe Extraction: Bee Pose Estimation</h3><p>Wonderful — we’ve worked our way through the construction of a k-means model entirely from scratch. Rather than just tossing that code aside, let’s use it in an example scenario.</p><p>Over the past few years, there have been impressive advancements in the neuroscience &amp; DL research communities that have enabled highly accurate and automated animal behavioral tracking and analysis*. The frameworks used in this research domain implement a variety of convolutional neural network architectures. The models also lean heavily on transfer learning to reduce the amount of training data that researchers need to generate. Two popular examples of these frameworks include <a href="https://github.com/DeepLabCut/DeepLabCut">DeepLabCut</a> and <a href="https://github.com/talmolab/sleap">SLEAP</a>.</p><blockquote>* <em>Side note: this subdomain is commonly dubbed </em><strong>computational neuroethology</strong></blockquote><p>To train models for automated tracking of specific points on animals, researchers typically have to manually label 100–150 unique frames from their behavioral videos. 
All things considered, this is a pretty small number that enables automated tracking of <em>indefinitely</em> <em>long</em> behavioral videos!</p><p>However, when labeling these training frames, researchers must make sure that the frames are as <em>unique</em> as possible from one another. It would be unproductive to label only the first 5 seconds of a single video when hours and hours of recordings exist. This is because the behavior and body states of the animals in the first 5 seconds will likely not accurately represent the features of the entire video dataset. As such, the model would not be trained to effectively recognize a variety of features.</p><p>So what does this have to do with k-means? Rather than having to manually identify unique keyframes from the videos, algorithms such as k-means can be implemented to automatically cluster the video frames into unique groups, and frames can then be sampled from each group. Let’s visualize how this works:</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FjeOwQiIjsdw%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DjeOwQiIjsdw&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FjeOwQiIjsdw%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/da6990d02ad7b15f66f7369e6ccc6581/href">https://medium.com/media/da6990d02ad7b15f66f7369e6ccc6581/href</a></iframe><p>To get a hands-on understanding of this process, you can follow along with the code used to isolate these frames with <a href="https://github.com/JacobBumgarner/learning-repo/blob/main/k_means/kmeans_frame_selection_walkthrough.ipynb">my walkthrough notebook</a>.</p><h3>4. Implementing K-means with scikit-learn</h3><p>In the real world, one should generally avoid relying on self-written implementations of algorithms unless necessary. 
Instead, we should rely on carefully and efficiently designed frameworks that are maintained by expert paid and volunteer contributors.</p><p>In this instance, let’s see how easy it is to implement k-means with scikit-learn. The documentation for this class can be found <a href="https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html">here</a>.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c565e8e679c6c028aea4f6a6ab11f5e6/href">https://medium.com/media/c565e8e679c6c028aea4f6a6ab11f5e6/href</a></iframe><p>The scikit-learn implementation of the model initialization and the fitting is very similar to ours (not a coincidence!), but we got to skip writing ~250 lines of the <a href="https://github.com/JacobBumgarner/learning-repo/blob/main/k_means/k_means.py">k_means.py</a> code. Moreover, scikit-learn uses <a href="https://github.com/scikit-learn/scikit-learn/blob/60f16feaadaca28f9a1cc68d2f406201860d27e8/sklearn/cluster/_k_means_lloyd.pyx#L186-L190">optimized BLAS routines</a> for k-means that make its implementation <em>much</em> faster than ours.</p><p>Long story short: learning from scratch is invaluable, but working from scratch isn’t.</p><h3>5. Summary</h3><p>In this post, we explored the fundamentals of the math and intuition behind the k-means algorithm. We built a k-means model from scratch using NumPy, used it to extract unique keyframes from an animal behavior video, and learned how to implement k-means with scikit-learn.</p><p>I hope this article was valuable for you! Feel free to reach out to me with any comments, ideas, or questions.</p><h3>6. Resources</h3><pre><strong>References:<br></strong><a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008625">1. Hicks SC, Liu R, Ni Y, Purdom E, Risso D (2021). mbkmeans: Fast clustering for single cell data using mini-batch <em>k</em>-means. 
PLoS Comput Biol 17(1): e1008625.</a><br><a href="https://www.semanticscholar.org/paper/Spam-Filtering-using-K-mean-Clustering-with-Local-Sharma-Rastogi/901af90a3bf03f34064f22e3c5e39bbe6a5cf661?p2df">2. Sharma A, Rastogi V (2014). Spam Filtering using K mean Clustering with Local Feature Selection Classifier. Int J Comput Appl 108: 35-39.</a><br><a href="https://www.kaggle.com/code/muhammadshahzadkhan/bank-customer-segmentation-pca-kmeans">3. Muhammad Shahzad, Bank Customer Segmentation (PCA-KMeans)</a><br><a href="http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf">4. Arthur D, Vassilvitskii S (2006). k-means++: The Advantages of Careful Seeding. <em>Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms</em>. Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp. 1027–1035</a></pre><pre><strong>Educational Resources:</strong><br>- <a href="https://developers.google.com/machine-learning/clustering">Google Machine Learning: Clustering</a><br>- <a href="http://cs229.stanford.edu/notes2020spring/cs229-notes7a.pdf">Andrew Ng, CS229 Lecture Notes, K-Means</a><br>- <a href="https://stanford.edu/~cpiech/cs221/handouts/kmeans.html">Chris Piech, K-Means</a></pre><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e0ef0168688d" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-science/breaking-it-down-k-means-clustering-e0ef0168688d">Breaking it Down: K-Means Clustering</a> was originally published in <a href="https://medium.com/data-science">TDS Archive</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Breaking it Down: Logistic Regression]]></title>
            <link>https://medium.com/data-science/breaking-it-down-logistic-regression-e5c3f1450bd?source=rss-e1f3762eb90c------2</link>
            <guid isPermaLink="false">https://medium.com/p/e5c3f1450bd</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[deep-dives]]></category>
            <category><![CDATA[numpy]]></category>
            <category><![CDATA[tensorflow]]></category>
            <category><![CDATA[logistic-regression]]></category>
            <dc:creator><![CDATA[Jacob Bumgarner, Ph.D.]]></dc:creator>
            <pubDate>Fri, 19 Aug 2022 08:54:07 GMT</pubDate>
            <atom:updated>2025-04-12T14:47:24.017Z</atom:updated>
            <content:encoded><![CDATA[<h4>Exploring the fundamentals of logistic regression with NumPy, TensorFlow, and the UCI Heart Disease Dataset</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*z7EZfRPFHzY0NJN7YfdEYw.png" /><figcaption>Logistic Regression Overview. Image by Author.</figcaption></figure><pre><strong>Outline:<br></strong>1. <a href="#6a7a">What is Logistic Regression?<br></a>2. <a href="#c9d3">Breaking Down Logistic Regression</a><br>   1. <a href="#9cd1">Linear Transformation</a><br>   2. <a href="#bd84">Sigmoid Activation</a><br>   3. <a href="#43d1">Cross-Entropy Loss Function</a><br>   4. <a href="#cee3">Gradient Descent</a><br>   5. <a href="#3930">Fitting the Model</a><br>3. <a href="#6f98">Learning by Example with the UCI Heart Disease Dataset</a><br>4. <a href="#84da">Training and Testing Our Classifier<br></a>5. <a href="#e0fa">Implementing Logistic Regression with TensorFlow</a><br>6. Summary<br>7. <a href="#df98">Notes and Resources</a></pre><h3>1. What is Logistic Regression?</h3><p><a href="https://en.wikipedia.org/wiki/Logistic_regression">Logistic regression</a> is a supervised machine learning algorithm that creates classification labels for sets of input data (<a href="https://web.stanford.edu/~jurafsky/slp3/5.pdf">1</a>, <a href="https://see.stanford.edu/materials/aimlcs229/cs229-notes1.pdf">2</a>). 
Logistic regression (logit) models are used in a variety of contexts, including <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7785709/">healthcare</a>, <a href="https://www.researchgate.net/publication/263933254_Association_between_light_exposure_at_night_and_insomnia_in_the_general_elderly_population_The_HEIJO-KYO_cohort">research</a>, and <a href="http://www.computerscijournal.org/vol10no1/churn-analysis-in-telecommunication-using-logistic-regression/">business</a> analytics.</p><p>Understanding the logic behind logistic regression can provide strong foundational insight into the basics of deep learning.</p><p>In this article, we’ll break down logistic regression to gain a fundamental understanding of the concept. To do this, we will:</p><ol><li>Explore the fundamental components of logistic regression and build a model from scratch with NumPy</li><li>Train our model on the UCI Heart Disease Dataset to predict whether adults have heart disease based on their input health data</li><li>Build a ‘formal’ logit model with TensorFlow</li></ol><p>You can follow the code in this post with my walkthrough <a href="https://github.com/JacobBumgarner/learning-repo/blob/main/logistic_regression/logistic_regression_walkthrough.ipynb">Jupyter Notebook</a> and <a href="https://github.com/JacobBumgarner/learning-repo/blob/main/logistic_regression/logistic_regression.py">Python</a> script files in my <a href="https://github.com/JacobBumgarner/learning-repo">GitHub</a> learning-repo.</p><h3>2. Breaking Down Logistic Regression</h3><p>Logistic regression models create probabilistic labels for sets of input data. These labels are often binary (yes/no).</p><p>Let’s work through an example to highlight the major aspects of logistic regression, and then we’ll start our deep dive:</p><blockquote>Imagine that we have a logit model that’s been trained to predict if someone has diabetes. 
The input data to the model are a person’s <strong>age</strong>, <strong>height</strong>, <strong>weight</strong>, and <strong>blood glucose</strong>. To make its prediction, the model will transform these input data using the <strong>logistic</strong> <strong>function</strong>. The output of this function will be a probabilistic label between <strong>0</strong> and <strong>1</strong>. The closer this label is to <strong>1</strong>, the greater the model’s confidence that the person <strong>has diabetes</strong>, and vice versa.</blockquote><blockquote>Importantly: to create classification labels, our diabetes logit model first had to <strong>learn</strong> how to weigh the importance of each piece of input data. It’s probable that someone’s <strong>blood glucose</strong> should be weighted more heavily than their <strong>height</strong> for predicting diabetes. This learning occurred using a set of labeled training data and gradient descent. The learned information is stored in the model in the form of <strong>Weights</strong> and <strong>bias</strong> parameter values used in the logistic function.</blockquote><p>This example provided a satellite-view outline of what logistic regression models do and how they work. 
We’re now ready for our deep dive.</p><p><strong>To start our deep dive, let’s break down the core component of logistic regression: the logistic function.</strong></p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FOyCYrKYM96s%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DOyCYrKYM96s&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FOyCYrKYM96s%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/4365eb8af5099983c338aef43187d6fc/href">https://medium.com/media/4365eb8af5099983c338aef43187d6fc/href</a></iframe><p>Rather than just learning from reading alone, we’ll build our own logit model from scratch with NumPy. This will be the model’s outline:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/02e1d2b4b2005e20c3bf027cef4f3f37/href">https://medium.com/media/02e1d2b4b2005e20c3bf027cef4f3f37/href</a></iframe><p>In sections <strong>2.1</strong> and <strong>2.2</strong>, we’ll implement the linear and sigmoid transformation functions.</p><p>In <strong>2.3 </strong>we’ll define the <strong>cross-entropy cost function </strong>to tell the model when its predictions are ‘good’ and ‘bad’. In section <strong>2.4</strong> we’ll help the model learn its parameters via gradient descent.</p><p>Finally, in section <strong>2.5</strong>, we’ll tie all of these functions together.</p><h4>2.1 Linear Transformation</h4><p>As we saw above, the logistic function first applies a <strong>linear transformation</strong> to the input data using its learned parameters: the <strong>Weights</strong> and <strong>bias</strong>.</p><p>The <strong>Weights</strong> (<strong><em>W</em></strong>) parameters indicate how important each piece of input data is to the classification. 
The closer an individual weight is to <strong>0</strong>, the less important the corresponding piece of data is to the classification. The dot product of the <strong>Weights</strong> vector and input data <strong>X</strong> reduces each sample to a single scalar that we can place onto a number line.</p><blockquote>For example, if we’re trying to predict whether someone is tired based on their height and the hours they’ve spent awake, the weight for that person’s height would be very close to zero.</blockquote><p>The <strong>bias</strong> (<strong><em>b</em></strong>) parameter is used to shift this scalar along the number line, relative to the line’s decision boundary (<strong>0</strong>).</p><p><strong>Let’s visualize how the linear component of the logistic function uses its learned weights and bias to transform input data from the UCI Heart Disease Dataset.</strong></p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FRb4mooLXnx4%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DRb4mooLXnx4&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FRb4mooLXnx4%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/ca23c8527f2cc6451f6b9ce55944d4dc/href">https://medium.com/media/ca23c8527f2cc6451f6b9ce55944d4dc/href</a></iframe><p>We’re now ready to start populating our model’s functions. To start, we need to initialize our model with its <strong>Weights</strong> and <strong>bias</strong> parameters. The <strong>Weights</strong> parameter will be an (n, 1) shaped array, where n is equal to the number of features in the input data. The <strong>bias</strong> parameter is a scalar. 
Both parameters will be initialized to <strong><em>0.</em></strong></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/6e421c2bc93f6cf513a52d7940400e53/href">https://medium.com/media/6e421c2bc93f6cf513a52d7940400e53/href</a></iframe><p>Next, we can populate the function to compute the linear portion of the logistic function.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/debec00e634955f0ae75d2466abbd3f4/href">https://medium.com/media/debec00e634955f0ae75d2466abbd3f4/href</a></iframe><h4>2.2 Sigmoid Activation</h4><p>Logistic models create probabilistic labels (<strong><em>ŷ</em></strong>) by applying the <strong>sigmoid function</strong> to the output data from the logistic function’s linear transformation. The sigmoid function is useful to create probabilities from input data because it squishes input data to produce values between <strong><em>0</em></strong> and <strong><em>1</em></strong>.</p><blockquote>The sigmoid function is the inverse of the logit function, hence the name, logistic regression.</blockquote><p>To create binary labels from the output of the sigmoid function, we define our decision boundary to be <strong><em>0.5</em></strong>. 
This means that if <strong><em>ŷ ≥ 0.5</em></strong>, we say the label is <em>positive</em>, and when <strong><em>ŷ &lt; 0.5</em></strong>, we say the label is <em>negative</em>.</p><p><strong>Let’s visualize how the sigmoid function transforms the input data from the linear component of the logistic function.</strong></p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FUQVN75vDdKs%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DUQVN75vDdKs&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FUQVN75vDdKs%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/47ad3516133187481f9bda682b79e915/href">https://medium.com/media/47ad3516133187481f9bda682b79e915/href</a></iframe><p>Now, let’s implement this function into our model.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/11021212882c0c33a2cf5cf039bd788e/href">https://medium.com/media/11021212882c0c33a2cf5cf039bd788e/href</a></iframe><h4>2.3 Cross-Entropy Cost Function</h4><p>To teach our model how to optimize its <strong><em>Weights</em></strong> and <strong><em>bias </em></strong>parameters, we will feed in training data. However, for the model to <em>learn</em> optimal parameters, it must know how to tell if its parameters did a ‘good’ or ‘bad’ job at producing probabilistic labels.</p><p>This ‘goodness’ factor, or the difference between the probability label and the ground-truth label, is called the <em>loss</em> for individual samples. 
We operationally say that losses should be <em>high</em> if the parameters did a bad job at predicting the label and <em>low</em> if they did a good job.</p><p>The losses across the training data are then averaged to create a <em>cost</em>.</p><p>The function that has been adopted for logistic regression is the <strong>Cross-Entropy Cost Function</strong>. In the function below, <strong><em>Y</em></strong> is the ground-truth label, and <strong><em>A</em></strong> is our probabilistic label.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/637/1*R7QU_qx5Lir8Baukay1sew@2x.png" /><figcaption>Cross-Entropy Cost Function</figcaption></figure><p>Notice that the function changes based on whether <strong><em>y</em></strong> is <strong><em>1</em></strong> or <strong><em>0</em></strong>.</p><ul><li>When <strong><em>y = 1</em></strong>, the function computes the negative <strong><em>log</em></strong> of the probabilistic label. If the prediction is correct, the <em>loss</em> will be <strong><em>0</em></strong> (i.e., <strong><em>log(1) = 0</em></strong>). If it’s incorrect, the loss will get larger and larger as the prediction approaches <strong><em>0</em></strong>.</li><li>When <strong><em>y = 0</em></strong>, the function subtracts the probabilistic label from <strong><em>1</em></strong> and then computes the negative <strong><em>log</em></strong> of the result. 
This subtraction keeps the loss <em>low</em> for correct predictions and <em>high</em> for incorrect predictions.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/460/1*MS74vmgcqyNm1WE2VaLPmw@2x.png" /><figcaption>Cross-Entropy Cases for 1 and 0 Ground-Truth Labels</figcaption></figure><p>Let’s now populate our function to compute the cross-entropy cost for an input data array.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/554742ecd29657ed1972d97a68d71b66/href">https://medium.com/media/554742ecd29657ed1972d97a68d71b66/href</a></iframe><h4>2.4 Gradient Descent</h4><p>Now that we can compute the cost of the model, we must use the cost to ‘tune’ the model’s parameters via gradient descent. If you need a refresher on gradient descent, check out my <a href="https://pub.towardsai.net/breaking-it-down-gradient-descent-b94c124f1dfd"><em>Breaking it Down: Gradient Descent</em></a> post.</p><p>Let’s create a fake scenario: imagine that we are training a model to predict if an adult is tired. Our fake model only gets two input features: height and hours spent awake. To accurately predict if an adult is tired, the model should probably develop a very small weight for the height feature, and a much larger weight for the hours spent awake feature.</p><p>Gradient descent will step these parameters <em>down</em> their gradient such that their new values will produce smaller costs. Remember, gradient descent minimizes the output of a function. We can visualize our imaginary example below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*XADsxfdOrrIWT8_Ql-8gWQ.gif" /><figcaption>Example Gradient Descent</figcaption></figure><p>To compute the gradient of the cost function w.r.t. 
the <strong><em>Weights</em></strong> and the <strong><em>bias</em></strong>, we’ll have to apply the <a href="https://www.khanacademy.org/math/ap-calculus-ab/ab-differentiation-2-new/ab-3-1a/a/chain-rule-review">chain rule</a>. To find the gradients of our parameters, we’ll differentiate the cost function and the sigmoid function and multiply the two derivatives together. We’ll then differentiate the linear function w.r.t. the <strong><em>Weights</em></strong> and <strong><em>bias</em></strong> parameters separately.</p><p><strong>Let’s explore a visual proof of partial differentiation for logistic regression:</strong></p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FUMlMXtTrJe4%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DUMlMXtTrJe4&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FUMlMXtTrJe4%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/bcfd636b5010a13249fb74374b32aea9/href">https://medium.com/media/bcfd636b5010a13249fb74374b32aea9/href</a></iframe><p>Let’s implement these simplified equations to compute the average gradients for each parameter across the training examples.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f1276e7c7dc99893dd4056b3a0636440/href">https://medium.com/media/f1276e7c7dc99893dd4056b3a0636440/href</a></iframe><h4>2.5 Fitting the Model</h4><p>Finally, we’ve constructed all of the necessary components for our model, so now we need to integrate them. We’ll create a function that is compatible with both <em>batch</em> and <em>mini-batch</em> gradient descent.</p><ul><li>In <em>batch gradient descent</em>, every training sample is used in each update of the model’s parameters.</li><li>In <em>mini-batch gradient descent</em>, a random portion of the training samples is selected for each update of the parameters. 
Mini-batch selection isn’t that important here, but it’s extremely useful when training data are too large to fit into GPU memory or RAM.</li></ul><p>As a reminder, fitting the model is a three-step iterative process:</p><ol><li>Apply a linear transformation to the input data with the <strong><em>Weights</em></strong> and <strong><em>bias</em></strong>.</li><li>Apply the non-linear sigmoid transformation to obtain a probabilistic label.</li><li>Compute the gradients of the cost function w.r.t. <strong><em>W</em></strong> and <strong><em>b</em></strong> and step these parameters down their gradients.</li></ol><p>Let’s build the function!</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a79540be0ee75dd2904b945cb00a2b55/href">https://medium.com/media/a79540be0ee75dd2904b945cb00a2b55/href</a></iframe><h3>3. Learning by Example with the UCI Heart Disease Dataset</h3><p>To make sure we’re not just creating a model in isolation, let’s train the model with an example human dataset. In the context of clinical health, the model we’ll train could improve physician awareness of patient health risks.</p><p>Let’s learn by example with the <a href="https://archive.ics.uci.edu/ml/datasets/heart+disease">UCI Heart Disease Dataset</a>.</p><p>The dataset contains <strong>13</strong> features about the cardiac and physical health of adult patients. Each sample is also labeled to indicate whether the subject <em>does</em> or <em>does not</em> have heart disease.</p><p>To start, we’ll load the dataset, inspect it for missing data, and examine our feature columns. 
Importantly, the labels are reversed in this dataset (i.e., 1=no disease, 0=disease), so we’ll have to fix that.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/4c79bef9ebd6bc87cef41da49f2d5765/href">https://medium.com/media/4c79bef9ebd6bc87cef41da49f2d5765/href</a></iframe><pre>Number of subjects: 303<br>Percentage of subjects diagnosed with heart disease:  45.54%<br>Number of NaN values in the dataset: 0</pre><p>Let’s also visualize the features. I’ve created custom figures, but see my <a href="https://gist.github.com/JacobBumgarner/48cdb6c374d14dac83c5a933baac267f">gist here</a> to create your own with Seaborn.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5RKs6Q3HUQ5GRyyAaWJXhg.png" /></figure><p>From our inspection, we can conclude that there are no obvious missing features. We can also see that there are some stark group separations in several of the features, including age (age), exercise-induced angina (exang), chest pain (cp), and ECG shapes during exercise (oldpeak &amp; slope). These data should work well for training a logit model!</p><p>To conclude this section, we’ll finish preparing the dataset. First, we’ll do a 75/25 split on the data to create <a href="https://towardsdatascience.com/train-test-split-and-cross-validation-in-python-80b61beca4b6">train and test sets</a>. Then we’ll standardize* the continuous features listed below.</p><pre>to_standardize = [&quot;age&quot;, &quot;trestbps&quot;, &quot;chol&quot;, &quot;thalach&quot;, &quot;oldpeak&quot;]</pre><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/19fb8d5a7bf729bf027a034c7bac46d7/href">https://medium.com/media/19fb8d5a7bf729bf027a034c7bac46d7/href</a></iframe><blockquote>*You don’t have to standardize data for logit models unless you’re running some form of regularization. I do it here just as a best practice.</blockquote><h3>4. 
Training and Testing Our Classifier</h3><p>Now that we’ve built the model and prepared our dataset, let’s train our model to predict health labels.</p><p>We’ll instantiate the model, train it with our x_train and y_train data, and test it with the x_test and y_test data.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/229cbafe2e882a6e9719ff68aaac256b/href">https://medium.com/media/229cbafe2e882a6e9719ff68aaac256b/href</a></iframe><pre>Final model cost: 0.36<br>Model test prediction accuracy: 86.84%</pre><p>And there we have it: a test set accuracy of <strong>86.8%</strong>. This is much better than a 50% random chance, and for such a simple model, the accuracy is quite high.</p><p>To inspect things a bit more closely, let’s visualize the model’s behavior during its training. On the top row, we can see the model’s cost and accuracy during its training. Then on the bottom row, we can see how the <strong><em>Weights</em></strong> and <strong><em>bias</em></strong> parameters change during training (my favorite part!).</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FlnhE25hB87I%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DlnhE25hB87I&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FlnhE25hB87I%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="640" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/73490b39b3589ae3cff01b700a04f303/href">https://medium.com/media/73490b39b3589ae3cff01b700a04f303/href</a></iframe><h3>5. Implementing Logistic Regression with TensorFlow</h3><p>In the real world, it’s not best practice to build your own model when you need to use one. 
Instead, we can rely on powerful and well-designed open-source packages like TensorFlow, PyTorch, or scikit-learn for our ML/DL needs.</p><p>Below, let’s see how simple it is to build a logit model with TensorFlow and compare its training/test results to our own. We’ll prepare the data, create a single-layer, single-unit model with a sigmoid activation, and compile it with a binary cross-entropy loss function. Lastly, we’ll fit and evaluate the model.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/4f5fb50fd570deb89543f84b55ab19f7/href">https://medium.com/media/4f5fb50fd570deb89543f84b55ab19f7/href</a></iframe><pre>Epoch 5000/5000<br>1/1 [==============================] - 0s 3ms/step - loss: 0.3464 - accuracy: 0.8634</pre><pre>Test Set Accuracy:<br>1/1 [==============================] - 0s 191ms/step - loss: 0.3788 - accuracy: 0.8553<br>[0.3788422644138336, 0.8552631735801697]</pre><p>From this, we can see that the model’s final training cost was 0.35 (compared to our 0.36), and the test set accuracy was <strong>85.5%</strong>, very similar to our result above. There are a few minor differences under the hood, but the model performances are very similar.</p><p>Importantly, the TensorFlow model was built, trained, and tested in fewer than 25 lines of code, as opposed to our 200+ lines of code in the logit_model.py script.</p><h3>6. Summary</h3><p>In this post, we’ve explored all of the individual aspects of logistic regression. We started the post by building a model from scratch with NumPy. We first implemented the linear and sigmoid transformations, then implemented the binary cross-entropy loss function, and finally created a fitting function to train our model with input data.</p><p>To understand the purpose of logistic regression, we then trained our NumPy model on the UCI Heart Disease Dataset to predict heart disease in patients. 
We saw that the simple model had an 86% prediction accuracy, which is pretty impressive.</p><p>Finally, after taking the time to learn and understand these fundamentals, we then saw how easy it was to build a logit model with TensorFlow.</p><p>In sum, logistic regression is a useful algorithm for predictive analysis. Understanding this model is a powerful first step on the road to studying deep learning.</p><p>Well, that’s a wrap! If you’ve made it this far, thanks for reading. I hope that this post helped you gain some valuable insight into the fundamentals of logistic regression.</p><h3>7. Notes and Resources</h3><p>Below are a few questions that I had when initially learning about logistic regression. Maybe they’ll be interesting to you too!</p><blockquote><strong><em>Q1:</em></strong><em> Isn’t a logistic regression model basically just a single unit of a neural network?</em></blockquote><blockquote><strong><em>A1:</em></strong><em> Effectively, yes. We can think of logistic regression models as single-layer, single-unit neural networks. </em><a href="https://sebastianraschka.com/faq/docs/logisticregr-neuralnet.html"><em>Sebastian Raschka</em></a><em> provides some nice insight into why this is so. 
Many neural networks use sigmoid activation functions to generate unit outputs, just as logistic regression does.</em></blockquote><blockquote><strong><em>Q2:</em></strong><em> What do we mean by </em>logistic<em>?</em></blockquote><blockquote><strong><em>A2:</em></strong><em> The ‘logistic’ of logistic regression comes from the fact that the model uses the inverse of the </em><a href="https://towardsdatascience.com/understanding-logistic-regression-9b02c2aec102">logit</a><em> function, aka the sigmoid function.</em></blockquote><pre>Resources<br>- <a href="https://www.google.com/search?client=safari&amp;rls=en&amp;q=uci+heart+disease+dataset&amp;ie=UTF-8&amp;oe=UTF-8">UCI Heart Disease Dataset</a><br>- <a href="https://web.stanford.edu/~jurafsky/slp3/5.pdf">Speech and Language Processing. Daniel Jurafsky &amp; James H. Martin.</a><br>- <a href="https://see.stanford.edu/materials/aimlcs229/cs229-notes1.pdf">CS229 Lecture notes, Andrew Ng</a><br>- <a href="https://github.com/3b1b/manim">Manim, 3Blue1Brown</a></pre><p><strong><em>All images unless otherwise noted are by the author.</em></strong></p><hr><p><a href="https://medium.com/data-science/breaking-it-down-logistic-regression-e5c3f1450bd">Breaking it Down: Logistic Regression</a> was originally published in <a href="https://medium.com/data-science">TDS Archive</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Breaking it Down: Gradient Descent]]></title>
            <link>https://pub.towardsai.net/breaking-it-down-gradient-descent-b94c124f1dfd?source=rss-e1f3762eb90c------2</link>
            <guid isPermaLink="false">https://medium.com/p/b94c124f1dfd</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[differentiation]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[gradient-descent]]></category>
            <dc:creator><![CDATA[Jacob Bumgarner, Ph.D.]]></dc:creator>
            <pubDate>Mon, 25 Jul 2022 20:26:23 GMT</pubDate>
            <atom:updated>2025-04-12T16:22:04.957Z</atom:updated>
            <content:encoded><![CDATA[<h4>Exploring and visualizing the mathematical fundamentals of gradient descent from scratch with <a href="https://github.com/JacobBumgarner/grad-descent-visualizer">Grad-Descent-Visualizer</a>.</h4><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2F0Vx0MZCUfKM%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D0Vx0MZCUfKM&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2F0Vx0MZCUfKM%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="640" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/669c8fa61ea6412bb136741b54b0ac27/href">https://medium.com/media/669c8fa61ea6412bb136741b54b0ac27/href</a></iframe><pre><strong>Outline</strong><em><br></em>1. <a href="#d9c4">What is Gradient Descent?</a><br>2. <a href="#e501">Breaking Down Gradient Descent</a><br>  2.1 <a href="#f7ef">Computing the Gradient</a><br>  2.2 <a href="#eec4">Descending the Gradient</a><br>3. <a href="#e4a7">Visualizing Multivariate Descents with Grad-Descent-Visualizer</a><br>  3.1 <a href="#bfec">Descent Montage</a><br>4. <a href="#991f">Conclusion: Contextualizing Gradient Descent<br></a>5. <a href="#2918">Resources</a></pre><h4>1. What is Gradient Descent?</h4><p>Gradient descent is an optimization algorithm that is used to improve the performance of deep/machine learning models. Over a repeated series of training steps, gradient descent identifies optimal parameter values that minimize the output of a cost function.</p><p>In the next two sections of this post, we’ll step down from this satellite-view description and break down gradient descent into something a bit easier to understand. 
We will also visualize the gradient descent of various test functions with my Python package, <a href="https://github.com/JacobBumgarner/grad-descent-visualizer">grad-descent-visualizer</a>.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2Fu1aZgqeS43U%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Du1aZgqeS43U&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2Fu1aZgqeS43U%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="640" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/901068de98325afb8fc30c0346c9f0d4/href">https://medium.com/media/901068de98325afb8fc30c0346c9f0d4/href</a></iframe><h4>2. Breaking Down Gradient Descent</h4><p>To gain an intuitive understanding of gradient descent, let’s first ignore machine learning and deep learning. Let’s instead start with a simple function:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/102/1*v2_KNtzOml4VYs0bR1-6SA@2x.png" /><figcaption>A simple univariate function</figcaption></figure><p>The goal in gradient descent is to find the <em>minimum</em> of a function, that is, the lowest possible output value of that function. This means that given our above function <strong><em>f(x)</em></strong>, the goal of gradient descent will be to find the value of <strong><em>x </em></strong>that leads the output of <strong><em>f(x) </em></strong>to approach <strong><em>0</em></strong>. 
By visualizing this function (below), it’s easy to see that <strong><em>x = 0 </em></strong>produces the minimum of <strong><em>f(x)</em></strong>.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2Fz5xa7zEJSVU%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dz5xa7zEJSVU&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2Fz5xa7zEJSVU%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="640" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/1869e2f95cc200da0e1a148a51563efb/href">https://medium.com/media/1869e2f95cc200da0e1a148a51563efb/href</a></iframe><p>The key question in gradient descent is this: if we initialize <strong><em>x</em></strong> to some random number, say <strong><em>x = 1.8</em></strong>, is there some way to <em>automatically</em> update <strong><em>x</em></strong> so that it eventually produces the minimal output of the function? Indeed, we can automatically find this minimal output with a two-step process:</p><ol><li>Find the <em>slope</em> of the function at the point where our input parameter <strong><em>x</em></strong> sits.</li><li>Update our input parameter <strong><em>x</em></strong> by stepping it <em>down</em> the gradient.</li></ol><p>In our simple gradient descent algorithm, this two-step process is repeated over and over until the output of our function stabilizes at a minimum, or reaches a defined gradient tolerance level. Of note, other more efficient descent algorithms take different approaches (e.g., <a href="https://en.wikipedia.org/wiki/Stochastic_gradient_descent#RMSProp">RMSProp</a>, <a href="https://en.wikipedia.org/wiki/Stochastic_gradient_descent#AdaGrad">AdaGrad</a>, <a href="https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam">Adam</a>).</p><h4>2.1. 
Computing the gradient</h4><p>To find the slope (or <em>gradient, </em>hence gradient descent) of the function <strong><em>f(x)</em></strong> at any value of <strong><em>x</em></strong>, we can differentiate* the function. Differentiating the example function is straightforward with the power rule (below), providing us with: <strong><em>f’(x) = 2x</em></strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/148/1*VZ6zfiMt9PEVGtVWrjuysQ@2x.png" /><figcaption>Power Rule</figcaption></figure><p>Using our starting point <strong><em>x = 1.8</em></strong>, we find our starting gradient of <strong><em>x</em></strong> (<strong><em>dx</em></strong>) to be <strong><em>dx = 3.6</em></strong>.</p><p>Let’s write a simple function in Python to automatically compute the derivative at any input value for<strong> <em>f(x) = x²</em></strong>.</p><blockquote>*I’d strongly recommend checking out <a href="https://www.youtube.com/watch?v=9vKqVkMQHKk&amp;list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr&amp;index=2&amp;t=2s">3Blue1Brown’s video</a> to intuitively understand differentiation. The differentiation of this sample function from first principles can be seen <a href="https://socratic.org/questions/how-you-you-find-the-derivative-f-x-x-2-using-first-principles">here</a>.</blockquote><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/36d794a90d21f550724cac98fd94bb2a/href">https://medium.com/media/36d794a90d21f550724cac98fd94bb2a/href</a></iframe><pre>Gradient at x = 1.8: dx = 3.6</pre><h4>2.2. Descending the gradient</h4><p>Once we find the gradient of the starting point, we want to update our input parameter so that it steps <em>down</em> this gradient. Repeating this will move the output of the function toward its minimum.</p><p>To move a variable down its gradient, we can simply subtract the gradient from the input parameter. 
However, if you’ve looked closely, you may have noticed that subtracting the entire gradient from the input parameter <strong><em>x=1.8</em></strong> would cause it to bounce back and forth indefinitely between <strong><em>1.8</em></strong> and <strong><em>-1.8</em></strong>, preventing it from ever coming close to <strong><em>0</em></strong>.</p><p>Instead, we can define a <strong><em>Learning Rate = 0.1</em></strong>. We’ll scale <strong><em>dx</em></strong> with this learning rate before subtracting it from <strong><em>x</em></strong>. By tuning the learning rate, we can create ‘smoother’ descents. Large learning rates produce large jumps along the function, and small learning rates lead to small steps along the function.</p><p>Lastly, we’ll eventually have to stop the gradient descent. Otherwise, the algorithm would continue endlessly as the gradient approaches 0. For this example, we’ll simply stop the descent once the magnitude of <strong><em>dx </em></strong>is less than <strong><em>0.01</em></strong>. In your own IDE, you can alter the learning_rate and tolerance parameters to see how the iterations and the output of <strong><em>x</em></strong> vary.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/40a69e5304d24f1c052528096c16dc37/href">https://medium.com/media/40a69e5304d24f1c052528096c16dc37/href</a></iframe><pre>Function minimum found in 27 iterations. X = 0.00</pre><p>As seen in the video above, our starting value of <strong><em>x = 1.8</em></strong> was able to automatically be updated to <strong><em>x = 0.0</em></strong> through the iterative process of gradient descent.</p><h3>3. Visualizing Multivariate Descents with Grad-Descent-Visualizer</h3><p>Hopefully, this univariate example provided some foundational insight into what gradient descent actually does. 
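</p><p>As a recap before moving on, the univariate descent from Section 2 can be sketched in a few lines of Python. This is a minimal sketch and not the code from the embedded gists: the function names gradient and descend and the exact loop structure are assumptions made for illustration. The function <strong><em>f(x) = x²</em></strong>, the starting point of 1.8, the learning rate of 0.1, and the tolerance of 0.01 come from the text above.</p>

```python
def gradient(x: float) -> float:
    """Derivative of f(x) = x**2, from the power rule: f'(x) = 2x."""
    return 2 * x


def descend(x: float, learning_rate: float = 0.1, tolerance: float = 0.01):
    """Step x down its gradient until the gradient magnitude drops below tolerance.

    Hypothetical helper for illustration; returns the final x and the number
    of update steps that were taken.
    """
    iterations = 0
    dx = gradient(x)
    while abs(dx) >= tolerance:
        x -= learning_rate * dx  # scale the gradient before subtracting it
        dx = gradient(x)
        iterations += 1
    return x, iterations


x_final, n_iterations = descend(1.8)
print(f"Function minimum found in {n_iterations} iterations. X = {x_final:.2f}")
```

<p>Each update multiplies <strong><em>x</em></strong> by 0.8 (that is, 1 - 0.1 × 2), so the gradient shrinks geometrically. Under these assumptions its magnitude first falls below 0.01 after 27 updates, consistent with the output reported in Section 2.2.</p><p>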
Now let’s expand to the context of multivariate functions.</p><p>We’ll first visualize a gradient descent of <a href="https://en.wikipedia.org/wiki/Himmelblau%27s_function">Himmelblau’s function</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/517/1*dBriX5IO0vhJvzvgUzN7eA@2x.png" /><figcaption>Himmelblau’s Function</figcaption></figure><p>There are a few key differences in the descent of multivariate functions.</p><p>First, we need to compute <em>partial</em> derivatives to update each parameter. In Himmelblau’s function, the gradient of <strong><em>x</em></strong> depends on <strong><em>y</em></strong> (the sums containing both variables are squared, requiring the <a href="https://g.co/kgs/8bwVeF">chain rule</a>). This means that the partial derivative with respect to <strong><em>x</em></strong> will contain <strong><em>y</em></strong> and vice versa.</p><p>Second, you may have noticed that there was only one minimum in the simple function from Section 2. In reality, there may be many unknown local minima in our cost functions. This means that the local minimum that our parameters find will depend on their starting positions and the behavior of the gradient descent algorithm.</p><p>To visualize the descent of this landscape, we’re going to initialize our starting parameters as <strong><em>x = -0.4</em></strong> and <strong><em>y = -0.65</em></strong>. 
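</p><p>To make the partial derivatives concrete, here is a minimal NumPy sketch of a descent on Himmelblau’s function from this starting point. It is not the code used by grad-descent-visualizer: the function names, the learning rate of 0.01, the gradient tolerance, and the iteration cap are all assumptions chosen for illustration. Only the function itself and the starting values come from the text.</p>

```python
import numpy as np


def himmelblau(x, y):
    """Himmelblau's function: f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2."""
    return (x**2 + y - 11) ** 2 + (x + y**2 - 7) ** 2


def himmelblau_gradient(x, y):
    """Partial derivatives from the chain rule; note that each one contains both x and y."""
    a = x**2 + y - 11
    b = x + y**2 - 7
    df_dx = 4 * x * a + 2 * b
    df_dy = 2 * a + 4 * y * b
    return np.array([df_dx, df_dy])


def descend(start, learning_rate=0.01, tolerance=1e-6, max_iterations=10_000):
    """Hypothetical helper: plain gradient descent with a gradient-norm stopping rule."""
    point = np.array(start, dtype=float)
    for iteration in range(max_iterations):
        grad = himmelblau_gradient(*point)
        if np.linalg.norm(grad) < tolerance:
            return point, iteration
        point = point - learning_rate * grad  # step both parameters down their gradients
    return point, max_iterations


final_point, n_iterations = descend((-0.4, -0.65))
print(f"Stopped after {n_iterations} iterations at {final_point}")
print(f"f(x, y) at that point: {himmelblau(*final_point):.2e}")
```

<p>Which of the function’s minima the sketch settles in depends on the starting point and on the assumed learning rate, so changing either may produce a different result.</p><p>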
We can then watch the descent of each parameter in its own dimension and a 2D descent, sliced by the position of the opposite parameter.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FTu4GLg0aGog%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DTu4GLg0aGog&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FTu4GLg0aGog%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="640" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/69524201d50ad8104f51a6486474f450/href">https://medium.com/media/69524201d50ad8104f51a6486474f450/href</a></iframe><p>For greater context, let’s visualize the descent of the same point in 3D using my <a href="https://github.com/JacobBumgarner/grad-descent-visualizer">grad-descent-visualizer</a> package created with the help of <a href="https://github.com/pyvista/pyvista">PyVista</a>.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FCXTeCHAAmBM%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DCXTeCHAAmBM&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FCXTeCHAAmBM%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="640" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/46d226db6ba00b312525ac68805e5a0a/href">https://medium.com/media/46d226db6ba00b312525ac68805e5a0a/href</a></iframe><h4>3.1 Descent Montage</h4><p>Now let’s visualize the descent of some more <a href="https://www.sfu.ca/~ssurjano/optimization.html">test functions</a>! 
We’ll place a grid of points across each of these functions and watch how the points move as they descend whatever gradient they are sitting on.</p><p>The <a href="https://www.sfu.ca/~ssurjano/spheref.html">Sphere Function</a>.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2Fty_lrxZk4cA%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dty_lrxZk4cA&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2Fty_lrxZk4cA%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="640" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/ca19aa708038e76712ae6dafd2815038/href">https://medium.com/media/ca19aa708038e76712ae6dafd2815038/href</a></iframe><p>The <a href="https://www.sfu.ca/~ssurjano/griewank.html">Griewank Function</a>.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FLxus_GwYUSE%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DLxus_GwYUSE&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FLxus_GwYUSE%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="640" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/25bcc4f3f59cb6b8c9508af32a39b3cc/href">https://medium.com/media/25bcc4f3f59cb6b8c9508af32a39b3cc/href</a></iframe><p>The <a href="https://www.sfu.ca/~ssurjano/camel6.html">Six-Hump Camel Function</a>. 
Notice the many local minima of the function.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2F0Vx0MZCUfKM%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D0Vx0MZCUfKM&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2F0Vx0MZCUfKM%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="640" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/669c8fa61ea6412bb136741b54b0ac27/href">https://medium.com/media/669c8fa61ea6412bb136741b54b0ac27/href</a></iframe><p>Let’s re-visualize a gridded descent of the <a href="https://en.wikipedia.org/wiki/Himmelblau%27s_function">Himmelblau Function</a>. Notice how different parameter initializations lead to different minima.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FUlXZ76MXI3g%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DUlXZ76MXI3g&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FUlXZ76MXI3g%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="640" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/b46d8cfd4759e0e25cadcd428c3212e8/href">https://medium.com/media/b46d8cfd4759e0e25cadcd428c3212e8/href</a></iframe><p>And lastly, the <a href="https://www.sfu.ca/~ssurjano/easom.html">Easom Function</a>. 
Notice how many points sit still because they are initialized on a flat gradient.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2F1HcQXu3UXyQ%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D1HcQXu3UXyQ&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2F1HcQXu3UXyQ%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="640" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/d5482a3729bd59bfba8eeb9dfc6b9b27/href">https://medium.com/media/d5482a3729bd59bfba8eeb9dfc6b9b27/href</a></iframe><h4>4. Conclusion: Contextualizing Gradient Descent</h4><p>So far, we’ve worked through gradient descent with a univariate function and have visualized the descent of several multivariate functions. In reality, modern deep learning models have <strong><em>vastly</em></strong> more parameters than the functions that we’ve examined.</p><p>For example, BLOOM, the large language model released in 2022 by the Hugging Face-led BigScience collaboration, has <em>176 billion</em> parameters. The chained functions used in this model are also more complicated than our test functions.</p><p>However, it’s important to realize that the <em>foundations</em> of what we’ve learned about gradient descent still apply. During each iteration of training of any deep learning model, the gradient of every parameter is calculated. This gradient is then averaged across the training examples, scaled by the learning rate, and subtracted from the parameters so that they ‘step down’ their gradients, pushing them to produce a minimal output from the model’s cost function.</p><p>Thanks for reading!</p><h4>5. 
Resources</h4><pre>- <a href="https://github.com/JacobBumgarner/grad-descent-visualizer">Grad-Descent-Visualizer</a><br>- <a href="https://www.youtube.com/c/3blue1brown">3Blue1Brown</a><br>  - <a href="https://www.youtube.com/watch?v=IHZwWFHWa-w">Gradient Descent</a><br>  - <a href="https://www.youtube.com/watch?v=9vKqVkMQHKk&amp;t=10s">Derivatives</a><br>- <a href="https://www.sfu.ca/~ssurjano/optimization.html">Simon Fraser University: Test Functions for Optimization</a><br>- <a href="https://docs.pyvista.org">PyVista</a><br>- <a href="http://neuralnetworksanddeeplearning.com/chap1.html">Michael Nielsen&#39;s Neural Networks and Deep Learning</a></pre><hr><p><a href="https://pub.towardsai.net/breaking-it-down-gradient-descent-b94c124f1dfd">Breaking it Down: Gradient Descent</a> was originally published in <a href="https://pub.towardsai.net">Towards AI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[NYT Mini Crossword —  Group Competition Analysis]]></title>
            <link>https://medium.com/@jacobbumgarner/nyt-mini-crossword-group-competition-analysis-e9cd23f6d1b1?source=rss-e1f3762eb90c------2</link>
            <guid isPermaLink="false">https://medium.com/p/e9cd23f6d1b1</guid>
            <category><![CDATA[analysis]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[sqlite3]]></category>
            <dc:creator><![CDATA[Jacob Bumgarner, Ph.D.]]></dc:creator>
            <pubDate>Sun, 03 Oct 2021 23:31:36 GMT</pubDate>
            <atom:updated>2021-10-19T21:52:06.588Z</atom:updated>
            <content:encoded><![CDATA[<h3>NYT Mini Crossword — Automated Competition Analysis</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*87j598Lu6aYn6m1QVs7QcA.jpeg" /><figcaption>Typical group chat banter</figcaption></figure><p>Almost every day over the past two years, a group of friends and I have raced each other to get the fastest solve time on the daily NYT Mini crossword puzzles. These mini puzzles are found in 5x5 to 6x6 grid formats with 10 to 12 clues. Each day, we send our solve times and associated trash talk to a group chat.</p><p>It’s pretty obvious who always wins in our group (hint… it’s not me), but I’ve always wanted to formally visualize our solve times across extended periods of play.</p><p>Before I started learning how to program, we used to manually type all of our times into Excel each month to visualize scores. However, after learning to use Python, I knew that I had to figure out how to automate extracting these times from our group chat for easy analysis. Below I detail the methods that I used to filter solve times from our group chat, analyze solves, and export data for easy plotting.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/935/1*T9qT713p3m-3lFTMp7lXVw.jpeg" /><figcaption>Placements and solve times from September, 2021</figcaption></figure><h4>Extracting Texts</h4><p>To start, I use iOS/MacOS, so all of my texts are sent using iMessage. Apple locally stores all iMessage texts in an SQLite database called ‘chat.db’ in the hidden user Library folder. 
Prior to working on this project, I was unfamiliar with SQLite, so I had to learn some of the syntax!</p><p>To scrape all of the texts from our group chat from a specific period, I wound up using this lovely command:</p><pre>SELECT text, handle_id, date FROM message MSG <br>     INNER JOIN chat_message_join CMJ <br>     ON CMJ.message_id = MSG.ROWID <br>     INNER JOIN chat <br>     ON chat.ROWID = CMJ.chat_id <br>     WHERE (chat.display_name = &quot;Double Dash 🥊&quot; AND date &gt; {start_time} AND date &lt; {end_time}) <br>     ORDER BY MSG.date ASC;</pre><p>Basically, from every text in our group chat, I want to get the text string, the text sender (handle_id), and the send date. The chat.db stores all messages without explicitly stating what group chat they’re from. To link each message with the chat it was sent in, we have to use the “chat_message_join” table. Then, because I want to find texts using our group chat’s string name (rather than its “chat_id”), I have to link chat_message_join with the actual chat table, where I can explicitly filter texts to only come from our crossword chat. I also limit the texts to specific date windows and sort them chronologically. To facilitate automation, I use the Python sqlite3 module to make this request.</p><h4>Filtering Times</h4><p>After dealing with this mess of a database query, I’m left with a ton of texts. These texts contain solve times, trash talk, and oftentimes solve time typo corrections.</p><p>In brief, I end up using the Python re module (regex) to keep only the texts that contain solve times (i.e., #:## format). 
If multiple times are sent from the same person within a single day, I assume the most recent one is the correct time and use that one.</p><pre>results = re.findall(r&quot;\d:\d\d&quot;, text)</pre><p>After I separate the solve times from the banter, I store each solve in a Score class, with text, solver, and date information.</p><h4>Analyzing &amp; Exporting Data</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/620/1*itxeMUcOpYUjhA0RoN0L2w.gif" /><figcaption>Analysis portion of the code</figcaption></figure><p>After a bit more organization and time filtering, I’m left with raw data that I can analyze. I’m particularly interested in the effect of time of day on solves, so I keep close track of that. At the moment, the program finds placements, average solve wins/losses, average win/loss time of day, and averages of solves by weekdays.</p><p>Lastly, I export the data using the csv package. So far, I’ve just been visualizing these data with GraphPad, but one day I may try to create an R script that will generate these graphs for me automatically.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yrRvuid4jtgTXh0XEQrzUw.png" /><figcaption>Terminal output</figcaption></figure><p>For now, the code I wrote to extract and analyze times can be found at my GitHub page below:</p><p><a href="https://github.com/JacobBumgarner/Daily_Mini_Analysis">GitHub - JacobBumgarner/Daily_Mini_Analysis: A short python script written to scrape iOS text messages for crossword solve times.</a></p><p>See some of the other results below!</p><p>Saturdays are clearly our slowest days, as the puzzles are much harder then.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_nK1GYmPcpLy7tckKxzHqw.jpeg" /><figcaption>Solves across the month and averaged by weekday.</figcaption></figure><p>It doesn’t seem like there’s a clear effect of time-of-day on the solves, but maybe when I add more data, some trend may appear.</p><figure><img 
alt="" src="https://cdn-images-1.medium.com/max/1024/1*vv8RloSsYbmQXpZDvIqb4A.jpeg" /><figcaption>Solve speed/time of solve for all players</figcaption></figure><p>Disclaimer: I am in no way affiliated with the NYT. Thanks for reading!</p>]]></content:encoded>
        </item>
    </channel>
</rss>