<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Krish Soni on Medium]]></title>
        <description><![CDATA[Stories by Krish Soni on Medium]]></description>
        <link>https://medium.com/@isonikrish?source=rss-80877cfaa6dc------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*I4gbMOpmGQVBwFSs5qO76w.jpeg</url>
            <title>Stories by Krish Soni on Medium</title>
            <link>https://medium.com/@isonikrish?source=rss-80877cfaa6dc------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 17 May 2026 06:38:09 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@isonikrish/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Feature Engineering in Machine Learning — The Secret Sauce Behind Smart Models]]></title>
            <link>https://medium.com/@isonikrish/feature-engineering-in-machine-learning-the-secret-sauce-behind-smart-models-ee598fbafed4?source=rss-80877cfaa6dc------2</link>
            <guid isPermaLink="false">https://medium.com/p/ee598fbafed4</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[feature-engineering]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[predictions]]></category>
            <dc:creator><![CDATA[Krish Soni]]></dc:creator>
            <pubDate>Wed, 03 Sep 2025 06:46:13 GMT</pubDate>
            <atom:updated>2025-09-03T06:46:13.817Z</atom:updated>
            <content:encoded><![CDATA[<h3>Feature Engineering in Machine Learning — The Secret Sauce Behind Smart Models</h3><p>When we think of machine learning, most people picture fancy algorithms like neural networks or random forests. But here’s the truth: <strong>a simple model with well-engineered features can often beat a complex model with messy data.</strong></p><p>This process of shaping raw data into meaningful inputs for a model is called <strong>Feature Engineering</strong>, and it’s one of the most powerful (yet underrated) parts of the ML pipeline.</p><p>In this blog, we’ll explore the main areas of feature engineering with simple explanations and examples:</p><ul><li>Feature Transformation</li><li>Feature Selection</li><li>Other useful tricks (handling missing values, outliers, and feature creation)</li></ul><h3><strong>1. What is Feature Engineering?</strong></h3><p>Feature engineering is the process of <strong>transforming raw data into features that make machine learning algorithms work better.</strong></p><p>Think of it like cooking: raw vegetables aren’t very tasty, but when you cut, season, and cook them, they become a delicious dish. Similarly, raw data often needs to be cleaned, scaled, and encoded before models can “digest” it.</p><h3><strong>2. Feature Transformation</strong></h3><p>Feature transformation means changing the way a feature is represented without changing the underlying information. This makes data easier for models to process.</p><p>The two most common transformations are <strong>Scaling</strong> and <strong>Encoding</strong>.</p><h4><strong>A) Feature Scaling</strong></h4><p>Not all features are measured on the same scale. 
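</p><p>One common remedy is <strong>min-max scaling</strong>, which maps each column to the [0, 1] range. Here’s a minimal pure-Python sketch of the idea (scikit-learn’s MinMaxScaler does the same thing per numeric column):</p>

```python
def min_max_scale(values):
    """Rescale a list of numbers to the [0, 1] range using the observed min/max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

km_driven = [500, 300_000, 45_000]
print([round(v, 2) for v in min_max_scale(km_driven)])  # → [0.0, 1.0, 0.15]
```

<p>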
Imagine predicting car prices with these features:</p><ul><li>km_driven ranges from <strong>500 to 300,000</strong></li><li>owner ranges from <strong>0 to 3</strong></li></ul><p>If we feed this directly into a distance-based model like KNN (or a linear model trained with gradient descent), the model will treat km_driven as far more important just because it has bigger numbers.</p><p>That’s where <strong>scaling</strong> comes in.</p><p>Before:</p><pre>km_driven: [500, 300000, 45000]<br>owner:     [0, 1, 2]</pre><p>After:</p><pre>km_driven: [0.0, 1.0, 0.15]<br>owner:     [0.0, 0.33, 0.67]</pre><p>✅ Now both features are on a comparable scale, and the model can judge importance fairly.</p><h4>B) Feature Encoding</h4><p>Models cannot understand text directly.</p><p>👉 Suppose we want to include fuel type when predicting prices:</p><ul><li>Possible values: <strong>Petrol, Diesel, CNG</strong></li></ul><p>If we just assign numbers (Petrol = 1, Diesel = 2, CNG = 3), the model will think Diesel &gt; Petrol or CNG &gt; Diesel, which is wrong.</p><p>That’s why we use <strong>encoding</strong>.</p><p><strong>Before:</strong></p><pre>fuel: [Petrol, Diesel, CNG, Petrol]</pre><p><strong>After (One-Hot Encoding):</strong></p><pre>fuel_Petrol  fuel_Diesel  fuel_CNG<br>     1            0           0<br>     0            1           0<br>     0            0           1<br>     1            0           0</pre><p>✅ Now each fuel type is represented fairly, without a fake ordering.</p><h3>3. Feature Selection</h3><p>Not all features help the model. Some add noise or unnecessary complexity.</p><p>👉 Example: predicting car prices:</p><ul><li>Useful features: brand, fuel, km_driven, owner</li><li>Less useful: seller_name, car_color</li></ul><p><strong>Before:</strong></p><pre>[brand, fuel, km_driven, owner, car_color, seller_name]</pre><p><strong>After (Selected):</strong></p><pre>[brand, fuel, km_driven, owner]</pre><p>✅ Removing irrelevant features reduces overfitting and speeds up training.</p><h3>4. Other Handy Tricks in Feature Engineering</h3><ul><li><strong>Handling Missing Values:</strong> Fill missing km_driven with the median instead of dropping rows.</li><li><strong>Outlier Treatment:</strong> Remove extreme values like “1 crore km driven.”</li><li><strong>Feature Creation:</strong> From date_of_birth, create a new feature, age.</li></ul><h3>Final Takeaway</h3><p>Feature Engineering is the backbone of Machine Learning. It prepares raw data so algorithms can do their job effectively.</p><ul><li><strong>Scaling</strong> → puts numbers on the same level</li><li><strong>Encoding</strong> → converts text to numbers</li><li><strong>Selection</strong> → keeps only what matters</li><li><strong>Transformation</strong> → reshapes features to reveal patterns</li></ul><blockquote><em>Better features → Better models → Better predictions.</em></blockquote><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ee598fbafed4" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Predict Instagram Likes with Just Hashtags using Python + Linear Regression]]></title>
            <link>https://medium.com/@isonikrish/predict-instagram-likes-with-just-hashtags-using-python-linear-regression-6e684b0b4faa?source=rss-80877cfaa6dc------2</link>
            <guid isPermaLink="false">https://medium.com/p/6e684b0b4faa</guid>
            <category><![CDATA[linear-regression]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[predictions]]></category>
            <dc:creator><![CDATA[Krish Soni]]></dc:creator>
            <pubDate>Fri, 11 Jul 2025 14:44:05 GMT</pubDate>
            <atom:updated>2025-07-11T14:44:05.615Z</atom:updated>
            <content:encoded><![CDATA[<p>Ever wondered how many likes your Instagram post might get <strong>based on the hashtags</strong> you use? In this article, we’ll build a fun, hands-on machine learning project that predicts the number of <strong>likes</strong> using past hashtag performance data.</p><h3>📦 What We’ll Build</h3><p>A small Python script that:</p><ul><li>Takes your post’s hashtags as input</li><li>Analyzes their past performance</li><li>Predicts how many likes your post might get based on that</li></ul><p>All using just <strong>Pandas</strong>, <strong>Scikit-Learn</strong>, and a CSV file.</p><h3>🧠 Prerequisites</h3><ul><li>Basic understanding of Python and machine learning</li><li>pandas, scikit-learn, and a CSV of your Instagram data (with columns like: Likes, Hashtags, etc.)</li></ul><h3>🧾 Step 1: Import Libraries and Load Data</h3><pre>import pandas as pd<br>from sklearn.linear_model import LinearRegression<br>from sklearn.model_selection import train_test_split<br><br># Load the dataset<br>df = pd.read_csv(&quot;instagram_data.csv&quot;, encoding=&#39;latin1&#39;)</pre><h3>🧹 Step 2: Clean Hashtags Column</h3><pre># Fill missing values and lowercase everything for consistency<br>df[&#39;Hashtags&#39;] = df[&#39;Hashtags&#39;].fillna(&#39;&#39;).str.lower()</pre><h3>🔍 Step 3: Extract and Encode Hashtags</h3><pre># Explode all hashtags into individual tags<br>all_hashtags = df[&#39;Hashtags&#39;].str.split().explode()<br><br># Get the 20 most common hashtags<br>top_hashtags = all_hashtags.value_counts().head(20).index.tolist()<br><br># Create binary columns for each top hashtag (1 if present, 0 if not)<br># Match whole tags, not substrings, so &#39;#ai&#39; doesn&#39;t also match &#39;#aiart&#39;<br>for tag in top_hashtags:<br>    df[tag] = df[&#39;Hashtags&#39;].apply(lambda x: 1 if tag in x.split() else 0)</pre><h3>📊 Step 4: Prepare Features and Target Variable</h3><pre>X = df[top_hashtags]  # Independent variables<br>y = df[&#39;Likes&#39;]       # Target variable</pre><h3>🏋️ Step 5: Train the Linear Regression Model</h3><pre>X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)<br><br>model = LinearRegression()<br>model.fit(X_train, y_train)</pre><h3>🧪 Step 6: Predict Likes for New Hashtags</h3><pre># Ask the user for input (lowercased and split to match the training data)<br>input_tags = input(&quot;Enter hashtags for your post: &quot;).lower().split()<br><br># Check if at least one known hashtag is present<br>known = any(tag in input_tags for tag in top_hashtags)<br><br>if not known:<br>    print(&quot;⚠️ No known high-performing hashtags found.&quot;)<br>else:<br>    # Create a feature vector from the input (as a DataFrame so column names match training)<br>    input_vector = pd.DataFrame([[1 if tag in input_tags else 0 for tag in top_hashtags]], columns=top_hashtags)<br>    <br>    # Predict likes<br>    predicted_likes = model.predict(input_vector)<br>    print(f&quot;🎯 Predicted Likes: {int(predicted_likes[0])}&quot;)</pre><h3>💡 Example Input &amp; Output</h3><p>Input:</p><pre>Enter hashtags for your post: #python #ai</pre><p>Output:</p><pre>🎯 Predicted Likes: 66</pre><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=6e684b0b4faa" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Understanding Tokenization in LLMs]]></title>
            <link>https://medium.com/@isonikrish/understanding-tokenization-in-llms-e3b4c43e5c6c?source=rss-80877cfaa6dc------2</link>
            <guid isPermaLink="false">https://medium.com/p/e3b4c43e5c6c</guid>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[ai]]></category>
            <dc:creator><![CDATA[Krish Soni]]></dc:creator>
            <pubDate>Sat, 28 Jun 2025 09:40:26 GMT</pubDate>
            <atom:updated>2025-06-28T09:40:26.956Z</atom:updated>
            <content:encoded><![CDATA[<p>When you interact with a Large Language Model (LLM) like GPT, your input text doesn’t go into the model as-is. It first goes through a crucial process called <strong>tokenization</strong>. This step is what allows the model to break down human language into something it can understand and work with.</p><h3>🤔 <strong>What is tokenization?</strong></h3><p><strong>Tokenization</strong> is the process of splitting text into smaller units called <strong>tokens</strong>. These tokens can be words, parts of words (subwords), or even characters, depending on the tokenizer used.</p><h3>📌 Example:</h3><p>“Tokenization is powerful.”</p><p>This might get tokenized as:</p><p>[“Token”, “ization”, “ is”, “ powerful”, “.”]</p><h3>🔍 Breaking Down the Example</h3><p>Notice how the word &quot;Tokenization&quot; is split into two tokens: &quot;Token&quot; and &quot;ization&quot;. This is done because the tokenizer tries to reuse common word parts. If a word is uncommon, it&#39;s more efficient to break it into familiar segments.</p><p>Also, &quot; is&quot; and &quot; powerful&quot; include the space in the token; that&#39;s how the tokenizer remembers where words start or end.</p><h3>🤔 Why Tokenization?</h3><ol><li><strong>Efficient Vocabulary:</strong> Instead of remembering every possible word (which is infinite), the model learns a limited set of subwords that can represent any text.</li><li><strong>Handles New or Rare Words:</strong> If the model sees a rare word like neurogenomics, it might not know the full word, but it can understand &quot;neuro&quot;, &quot;genom&quot;, and &quot;ics&quot; separately.</li><li><strong>Smaller Context Units:</strong> Smaller tokens give the model more precision. 
For example, breaking &quot;unbelievable&quot; into &quot;un&quot;, &quot;believ&quot;, and &quot;able&quot; helps the model understand the meaning better.</li><li><strong>Optimizes Performance and Cost:</strong> Most LLMs work with a <strong>token limit</strong> (e.g., 4,000 or 8,000 tokens). Also, API usage is priced <em>per token</em>. Understanding tokenization helps you stay within limits and optimize usage.</li></ol><h3>🧪 Types of Tokenizers</h3><ol><li><strong>GPT-3/4</strong>: Byte Pair Encoding (BPE) or Byte-level BPE</li><li><strong>BERT</strong>: WordPiece</li><li><strong>T5</strong>: SentencePiece</li></ol><h3>🔚 Conclusion</h3><p>Tokenization might seem like a technical detail, but it’s one of the most important parts of how LLMs work. It bridges the gap between natural language and machine understanding. The better you understand tokenization, the better you’ll be at using and building AI systems.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e3b4c43e5c6c" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Understanding Synchronous vs. Asynchronous in JavaScript]]></title>
            <link>https://medium.com/@isonikrish/understanding-synchronous-vs-asynchronous-in-javascript-c72b6966423d?source=rss-80877cfaa6dc------2</link>
            <guid isPermaLink="false">https://medium.com/p/c72b6966423d</guid>
            <category><![CDATA[js]]></category>
            <category><![CDATA[asynchronous-programming]]></category>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[synchronous]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Krish Soni]]></dc:creator>
            <pubDate>Sun, 24 Nov 2024 04:38:09 GMT</pubDate>
            <atom:updated>2024-11-24T04:38:09.250Z</atom:updated>
            <content:encoded><![CDATA[<p>JavaScript is <strong>everywhere</strong>. From interactive websites to server-side applications, it’s the go-to language for developers. One of the fundamental concepts in JavaScript is understanding how it handles tasks, specifically the difference between synchronous and asynchronous behavior. Let’s break it down simply.</p><h3><strong>What is Synchronous JavaScript?</strong></h3><p>Synchronous execution is JavaScript’s default behavior. Code runs line by line, in the order it appears, and each line waits for the previous one to finish. This is sometimes called blocking behavior, because the program waits for the current task to complete before moving on.</p><p>Example:</p><pre>console.log(&quot;Step 1&quot;);  <br>console.log(&quot;Step 2&quot;);  <br>console.log(&quot;Step 3&quot;);  </pre><p>Output:</p><pre>Step 1  <br>Step 2  <br>Step 3</pre><p>As shown in the example, each console.log runs only after the previous one finishes.</p><h3>What is Asynchronous JavaScript?</h3><p>Asynchronous code allows JavaScript to handle time-consuming operations, such as API requests, file reading, or database access, without blocking the execution of other tasks. While the asynchronous operation runs, the main program continues to execute synchronous code. Once the asynchronous task is complete, its result is processed.</p><p>Example:</p><pre>console.log(&quot;Step 1&quot;);<br><br>async function step2(){<br>    // await defers the rest of this function to the microtask queue,<br>    // even when the awaited value is not a Promise<br>    console.log(await &quot;Step 2&quot;)<br>}<br>step2();<br>console.log(&quot;Step 3&quot;);</pre><p>Output:</p><pre>Step 1<br>Step 3<br>Step 2</pre><p>As shown in the example, Step 1 and Step 3 run immediately because they are synchronous. Step 2, however, sits behind an await, so it doesn’t run right away: await schedules the rest of the function to run as a microtask. Once all the synchronous code (Steps 1 and 3) has finished executing, Step 2 is then processed.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=c72b6966423d" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>