Stories by Ahsan Umar on Medium

batch.md

Ahsan Umar — Tue, 24 Dec 2024 09:32:47 GMT

Gradient descent is the workhorse of modern machine learning, powering everything from simple linear regression to complex neural networks. In this comprehensive guide, we’ll dive deep into batch gradient descent, understanding how it works, its advantages and limitations, and when to use it.

Batch gradient descent is an optimization algorithm used to find the parameters (weights and biases) that minimize the cost function of a machine learning model. The term “batch” refers to the fact that it uses the entire training dataset to compute the gradient in each iteration.

At its core, batch gradient descent follows a simple yet powerful principle: iteratively adjust the parameters in the direction that reduces the cost function the most. This direction is given by the negative gradient of the cost function.

The update rule for batch gradient descent is:

$$ \theta_{j} = \theta_{j} - \alpha \frac{\partial}{\partial \theta_{j}} J(\theta) $$

Where:

$\theta_{j}$ is the j-th parameter
$\alpha$ is the learning rate
$J(\theta)$ is the cost function
$\frac{\partial}{\partial \theta_{j}} J(\theta)$ is the partial derivative of the cost function with respect to $\theta_{j}$

For linear regression, the cost function is typically the Mean Squared Error (MSE):

$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 $$

Where:

$m$ is the number of training examples
$h_\theta(x)$ is the hypothesis function
$x^{(i)}$ and $y^{(i)}$ are the i-th input feature and target value respectively
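To connect the cost function to the update rule, it helps to write out the gradient once. The step below assumes a linear hypothesis $h_\theta(x) = \theta^\top x$, which the text implies for linear regression but does not state explicitly. Differentiating the MSE cost with the chain rule gives:

$$ \frac{\partial}{\partial \theta_{j}} J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} 2 \, (h_\theta(x^{(i)}) - y^{(i)}) \, x_j^{(i)} = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \, x_j^{(i)} $$

The factor of 2 from the square cancels the $\frac{1}{2}$ in the cost, which is why the cost is usually written with $\frac{1}{2m}$ rather than $\frac{1}{m}$.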

Let’s break down how batch gradient descent works step by step:

Initialization: Start with random parameter values
Forward Pass: Compute predictions for all training examples
Cost Calculation: Calculate the cost using all training examples
Gradient Computation: Calculate the gradient of the cost function
Parameter Update: Update all parameters using the computed gradient
Repeat: Steps 2–5 until convergence
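The steps above can be sketched in a few lines of NumPy. This is a minimal illustration for linear regression, not a reference implementation: the function name, the default learning rate, and the choice to start from zeros instead of random values are all assumptions made for the example.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.1, n_iterations=1000):
    """Fit linear-regression parameters with full-batch gradient descent."""
    m, n = X.shape
    Xb = np.c_[np.ones(m), X]      # prepend a column of ones for the bias term
    theta = np.zeros(n + 1)        # step 1: initialization (zeros here, not random)
    costs = []
    for _ in range(n_iterations):
        predictions = Xb @ theta                     # step 2: forward pass on all examples
        errors = predictions - y
        costs.append((errors @ errors) / (2 * m))    # step 3: cost over the full dataset
        gradient = (Xb.T @ errors) / m               # step 4: gradient of the cost
        theta -= alpha * gradient                    # step 5: update every parameter
    return theta, costs

# Noise-free data generated from y = 1 + 2x
X = np.linspace(-1, 1, 50).reshape(-1, 1)
y = 1 + 2 * X[:, 0]
theta, costs = batch_gradient_descent(X, y)
print(theta)
```

Every iteration touches all 50 rows, which is exactly what makes the method "batch".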

Stable Convergence: By using the entire dataset, batch gradient descent provides stable updates and a smooth convergence path.
Guaranteed Convergence: For convex problems, it converges to the global minimum, provided the learning rate is small enough.
Vectorization: Efficient implementation using matrix operations, especially on modern hardware.
Memory Requirements: Needs to store the entire dataset in memory.
Computational Cost: Each iteration requires processing the entire dataset.
Redundancy: May perform redundant computations when data points are similar.
Local Minima: Can get stuck in local minima for non-convex problems.

The learning rate $\alpha$ is crucial for successful optimization. Here are some guidelines:

Start with a small value (e.g., 0.01)
If convergence is too slow, increase by a factor of 10
If diverging, decrease by a factor of 10
Consider learning rate schedules for better convergence
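A toy experiment makes these guidelines concrete. The sketch below minimizes $f(x) = x^2$, a function chosen only for illustration, and compares how far from the minimum each learning rate ends up after the same number of steps.

```python
def run_gradient_descent(alpha, steps=50, start=1.0):
    """Minimize f(x) = x**2 (gradient 2x) and return the final distance from 0."""
    x = start
    for _ in range(steps):
        x -= alpha * 2 * x
    return abs(x)

for alpha in (0.001, 0.01, 0.1, 1.1):
    print(f"alpha={alpha}: |x| after 50 steps = {run_gradient_descent(alpha):.3g}")
```

The two smallest rates barely move, 0.1 converges quickly, and 1.1 overshoots on every step so the iterate grows instead of shrinking. The exact thresholds depend on the curvature of the cost, so treat the numbers as specific to this function.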

Always normalize your features before applying gradient descent, so that all features are on a comparable scale and no single one dominates the gradient.
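One common way to do this is standardization (zero mean, unit variance). The sketch below uses plain NumPy and a hypothetical helper named `standardize`; it computes the statistics on the training set only and reuses them for the test set, so no information leaks from test data.

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale both sets using the mean and std of the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0   # leave constant features unchanged instead of dividing by zero
    return (X_train - mean) / std, (X_test - mean) / std

X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
X_test = np.array([[2.0, 500.0]])
X_train_scaled, X_test_scaled = standardize(X_train, X_test)
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```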

Monitor the change in cost function and stop when:

The change is below a threshold
A maximum number of iterations is reached
The gradient magnitude is sufficiently small
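The three stopping criteria can be combined in one loop. The following is a sketch under assumed tolerances (`tol`, `grad_tol`, and `max_iter` are illustrative defaults, not recommendations), run on a toy convex cost.

```python
import numpy as np

def minimize(grad_fn, cost_fn, theta, alpha=0.1, tol=1e-8, grad_tol=1e-6, max_iter=10_000):
    """Gradient descent that reports which stopping criterion fired."""
    prev_cost = cost_fn(theta)
    for iteration in range(1, max_iter + 1):
        grad = grad_fn(theta)
        if np.linalg.norm(grad) < grad_tol:       # gradient magnitude is small
            return theta, iteration, "small gradient"
        theta = theta - alpha * grad
        cost = cost_fn(theta)
        if abs(prev_cost - cost) < tol:           # cost barely changed
            return theta, iteration, "small cost change"
        prev_cost = cost
    return theta, max_iter, "max iterations"      # iteration budget exhausted

# Toy convex cost J(theta) = ||theta - 3||^2 / 2, whose gradient is theta - 3
theta, n_iter, reason = minimize(
    lambda t: t - 3.0,
    lambda t: 0.5 * np.sum((t - 3.0) ** 2),
    np.zeros(2),
)
print(n_iter, reason)
```

Returning the reason is a small design choice that makes it easy to tell a converged run from one that simply ran out of iterations.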

Batch gradient descent is most suitable when:

Dataset fits in memory
Computing power is not a constraint
Need for stable convergence is paramount
Problem is convex or nearly convex

For larger datasets or non-convex problems, consider alternatives like:

Stochastic Gradient Descent (SGD)
Mini-batch Gradient Descent
Advanced optimizers (Adam, RMSprop, etc.)

Batch gradient descent remains a fundamental algorithm in machine learning, providing a solid foundation for understanding more advanced optimization techniques. While it may not always be the most practical choice for modern large-scale problems, its principles form the basis for more sophisticated approaches.

Remember these key points:

Always normalize your features
Choose learning rate carefully
Monitor convergence
Consider the trade-offs with other optimization methods

By mastering batch gradient descent, you’ll better understand the optimization landscape of machine learning and make informed decisions about which algorithm to use for your specific problems.

This article is part of our Machine Learning Fundamentals series. For more in-depth tutorials and guides, follow us on Medium.


Originally published at http://github.com.

Crawl4AI: The Open-Source LLM-Friendly Web Crawler & Scraper You’ve Been Waiting For

Ahsan Umar — Sun, 15 Dec 2024 10:59:13 GMT

Crawl4AI is the ultimate tool for asynchronous web crawling and data extraction, purpose-built for LLM (Large Language Model) and AI-driven applications. Whether you’re handling dynamic websites, cleaning up content, or performing advanced link and media analysis, Crawl4AI is here to simplify it all.

Let’s dive into how you can use Crawl4AI and explore its features through this comprehensive guide.

🚀 Getting Started with Crawl4AI

1. Installation

To begin, install the necessary dependencies:

pip install -U crawl4ai
pip install nest_asyncio
playwright install

You’re now ready to crawl the web!

2. Basic Setup: Crawl Your First Website

Start simple by crawling a web page and retrieving its content:

import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode
import nest_asyncio

nest_asyncio.apply()

async def simple_crawl():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://www.kidocode.com/degrees/technology",
            cache_mode=CacheMode.ENABLED  # Cache is enabled by default
        )
        print(result.markdown_v2.raw_markdown[:500].replace("\n", " -- "))  # Print the first 500 characters

asyncio.run(simple_crawl())

In just a few lines of code, Crawl4AI fetches the page content in Markdown format!

3. Handling Dynamic Websites with JavaScript

Dynamic websites often load content using JavaScript. Crawl4AI handles such scenarios with ease:

async def crawl_dynamic_content():
    async with AsyncWebCrawler(verbose=True) as crawler:
        js_code = [
            "const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textContent.includes('Load More')); loadMoreButton && loadMoreButton.click();"
        ]
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
            js_code=js_code,
            cache_mode=CacheMode.ENABLED
        )
        print(result.markdown_v2.raw_markdown[:500].replace("\n", " -- "))
        
asyncio.run(crawl_dynamic_content())

With support for custom JavaScript execution, you can effortlessly interact with buttons, scrollbars, and more.

4. Content Cleaning for Structured Results

Crawl4AI offers advanced content filtering, letting you extract only what’s needed:

from crawl4ai.content_filter_strategy import PruningContentFilter
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

async def clean_content():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://en.wikipedia.org/wiki/Apple",
            excluded_tags=['nav', 'footer', 'aside'],
            remove_overlay_elements=True,
            markdown_generator=DefaultMarkdownGenerator(
                content_filter=PruningContentFilter(threshold=0.48),
                options={"ignore_links": True}
            )
        )
        print(f"Markdown Length: {len(result.markdown_v2.raw_markdown)}")

asyncio.run(clean_content())

This makes Crawl4AI perfect for AI applications requiring clean, contextually relevant data.

5. Advanced Crawling Features

Link Analysis

Want to analyze and filter links? Crawl4AI’s link analysis lets you manage both internal and external links:

async def link_analysis():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
            exclude_external_links=True,
            exclude_social_media_links=True
        )
        print(f"Found {len(result.links['internal'])} internal links")
        for link in result.links['internal'][:5]:
            print(f"Link: {link['href']}, Text: {link['text']}")
            
asyncio.run(link_analysis())

Media Handling

Extract images and media data with ease:

async def media_handling():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
            exclude_external_images=False
        )
        for img in result.media['images'][:5]:
            print(f"Image URL: {img['src']}, Alt: {img['alt']}")
            
asyncio.run(media_handling())

6. Hooks for Custom Workflows

Crawl4AI lets you customize workflows using hooks. For example, you can register a `before_goto` hook that runs just before the crawler navigates to the page; the hook below only logs a message:

async def custom_hook_workflow():
    async with AsyncWebCrawler() as crawler:
        crawler.crawler_strategy.set_hook("before_goto", lambda page, _: print("[Hook] Preparing to navigate..."))
        result = await crawler.arun(url="https://crawl4ai.com", cache_mode=CacheMode.ENABLED)
        print(result.markdown_v2.raw_markdown[:500].replace("\n", " -- "))

asyncio.run(custom_hook_workflow())

7. Multi-Page Crawling with Sessions

Session-based crawling is ideal for navigating multi-page content while maintaining browser state:

async def multi_page_crawling():
    async with AsyncWebCrawler() as crawler:
        for page in range(3):
            result = await crawler.arun(
                url="https://github.com/microsoft/TypeScript/commits/main",
                session_id="typescript_session",
                js_code="document.querySelector('a[data-testid=\"pagination-next-button\"]').click();",
                js_only=page > 0,
                cache_mode=CacheMode.BYPASS
            )
            print(f"Page {page + 1}: {result.success}")
            
asyncio.run(multi_page_crawling())

8. Structured Data Extraction

Extract structured data using JSON schemas or even LLM-based strategies:

from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

async def extract_data():
    schema = {
        "name": "Courses",
        "baseSelector": ".course-section",
        "fields": [
            {"name": "title", "selector": "h2", "type": "text"},
            {"name": "description", "selector": "p", "type": "text"}
        ]
    }
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://www.example.com",
            extraction_strategy=JsonCssExtractionStrategy(schema, verbose=True)
        )
        print(result.extracted_content)
        
asyncio.run(extract_data())

For even more advanced cases, integrate LLMs like OpenAI for schema-based data extraction.

9. Semantic Content Extraction

Need contextually relevant sections? Crawl4AI leverages semantic clustering for better results:

from crawl4ai.extraction_strategy import CosineStrategy

async def semantic_extraction():
    strategy = CosineStrategy(
        semantic_filter="AI trends, future technology",
        sim_threshold=0.3,
        verbose=True
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://example.com/ai-trends",
            extraction_strategy=strategy
        )
        print(result.extracted_content)
        
asyncio.run(semantic_extraction())

10. The Future of Web Crawling

Crawl4AI empowers developers with the tools to tackle modern web scraping challenges. Its robust features — from handling dynamic content to advanced extraction strategies — make it an essential companion for building LLM applications.

GitHub: Crawl4AI Repository
Website: https://crawl4ai.com

Happy crawling! 🕷️

Implementing and Comparing Polynomial Regression from Scratch in Python.

Ahsan Umar — Fri, 13 Dec 2024 16:04:38 GMT

Polynomial regression extends linear regression by modeling nonlinear relationships using polynomial terms. In this comprehensive guide, we’ll implement polynomial regression from scratch, compare it with scikit-learn’s implementation, and explore optimization techniques.

Basic linear regression is expressed as:

$$y = \beta_0 + \beta_1 x + \epsilon$$

where:

y is the dependent variable
x is the independent variable
β₀ is the intercept
β₁ is the slope
ε is the error term

Polynomial regression extends this to:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n + \epsilon$$

The Mean Squared Error (MSE) cost function:

$$J(\beta) = \frac{1}{2m} \sum_{i=1}^m (h_\beta(x^{(i)}) - y^{(i)})^2$$

First, let’s create synthetic nonlinear data:
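A minimal sketch of such data, assuming a quadratic relationship with Gaussian noise. The coefficients, sample size, and noise level here are illustrative choices and not necessarily those used in the article's notebook.

```python
import numpy as np

rng = np.random.default_rng(42)                 # fixed seed for reproducibility
X = rng.uniform(-3, 3, size=(200, 1))           # one input feature
noise = rng.normal(0, 0.5, size=200)
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + noise    # illustrative quadratic relationship
print(X.shape, y.shape)
```

A straight line cannot capture the curvature in this data, which is what motivates adding polynomial terms.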

Our enhanced CustomPolynomialEstimator class:

Key Features:

Feature Scaling: Normalizes features using StandardScaler
Polynomial Transformation: Creates polynomial terms up to specified degree
Interaction Terms: Optional interaction terms between features
Feature Selection: Variance-based feature selection
Memory Optimization: Efficient matrix operations
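To make the first few features concrete, here is a rough NumPy-only sketch of scaling, polynomial expansion, and variance-based selection. It is not the CustomPolynomialEstimator class itself: the function name and threshold are hypothetical, plain NumPy stands in for StandardScaler, and interaction terms are left out.

```python
import numpy as np

def polynomial_features(X, degree=2, variance_threshold=1e-8):
    """Scale, expand to powers 1..degree (no interaction terms), drop low-variance columns."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                              # avoid dividing by zero for constant columns
    X_scaled = (X - X.mean(axis=0)) / std            # feature scaling
    powers = [X_scaled ** d for d in range(1, degree + 1)]
    features = np.hstack(powers)                     # polynomial transformation
    keep = features.var(axis=0) > variance_threshold # variance-based feature selection
    return features[:, keep]

X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]])  # second column is constant
print(polynomial_features(X, degree=3).shape)
```

The constant column carries no information, so all of its powers are removed and only the three powers of the first column remain.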

The gradient descent update rule:

$$\beta = \beta - \alpha\frac{\partial}{\partial \beta}J(\beta)$$

where α is the learning rate.

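Variance used for feature selection:

$$Variance_{threshold} = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2$$

L1 (Lasso) regularized cost:

$$J(\beta) = MSE + \lambda\sum|\beta_i|$$

L2 (Ridge) regularized cost:

$$J(\beta) = MSE + \lambda\sum \beta_i^2$$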
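The two regularized costs, MSE plus λ times the sum of |β_i| (L1) or of β_i² (L2), can be sketched in NumPy as follows. Two assumptions are made for the example: the first coefficient is treated as the intercept and left unpenalized, and MSE is taken as the plain mean of squared residuals.

```python
import numpy as np

def regularized_cost(X, y, beta, lam, penalty="l2"):
    """MSE plus an L1 or L2 penalty; beta[0] is the intercept and is not penalized."""
    residuals = X @ beta - y
    mse = np.mean(residuals ** 2)
    coefs = beta[1:]
    if penalty == "l1":
        return mse + lam * np.sum(np.abs(coefs))
    if penalty == "l2":
        return mse + lam * np.sum(coefs ** 2)
    raise ValueError("penalty must be 'l1' or 'l2'")

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column of ones for the intercept
y = np.array([2.0, 4.0, 6.0])
beta = np.array([0.0, 2.0])                          # fits the data exactly, so MSE is 0
print(regularized_cost(X, y, beta, lam=0.1, penalty="l1"))
print(regularized_cost(X, y, beta, lam=0.1, penalty="l2"))
```

Because the fit is exact, the printed values are the penalties alone, which shows that L2 punishes a large coefficient more heavily than L1 does.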

Complete implementation: GitHub Repository Link

Cross-validation implementation
Advanced regularization techniques
Sparse matrix support
GPU acceleration

Follow me on social media:
- [LinkedIn](https://linkedin.com/in/codewithdark)
- [NoteBook](https://www.kaggle.com/code/codewithdark/polynomial-regression-from-scratch)
- [GitHub](https://github.com/codewithdark-git)

Originally published at http://github.com.

Implementing Multiple Linear Regression from Scratch

Ahsan Umar — Wed, 11 Dec 2024 16:49:06 GMT

Overview

Multiple Linear Regression (MLR) is a statistical method that models the relationship between a dependent variable and multiple independent variables. It extends simple linear regression, which uses a single feature to predict the outcome, to handle multiple features. The general form of the MLR equation is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon$$

Approach

We’ll implement multiple linear regression from scratch using two different methods:

Gradient Descent: An iterative approach to optimize the model parameters.
Normal Equation: A direct mathematical solution to compute the optimal parameters.

1. Data Generation and Preprocessing

To demonstrate the MLR model, we first generate synthetic data. For this example, we will create a dataset with three features, where the true relationship is:

$$y = 2 + 3x_1 + 1.5x_2 - 2x_3 + \epsilon$$

The noise is added to simulate real-world data variability, making it more challenging for the model to fit the data perfectly.

2. Building the Multiple Linear Regression Class

We implement the model using two methods: Gradient Descent and Normal Equation.

Gradient Descent Method

Gradient descent is an optimization algorithm used to minimize the loss function. In the case of linear regression, the loss function is the Mean Squared Error (MSE). Gradient descent iteratively adjusts the model’s parameters to find the values that minimize the loss.

import numpy as np

class MultipleLinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None
        self.history = {'loss': [], 'weights': [], 'bias': []}

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        for i in range(self.n_iterations):
            y_predicted = self.predict(X)

            # Compute gradients
            dw = (1/n_samples) * np.dot(X.T, (y_predicted - y))
            db = (1/n_samples) * np.sum(y_predicted - y)

            # Update parameters
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

            # Track loss and parameters
            loss = np.mean((y_predicted - y) ** 2)
            self.history['loss'].append(loss)
            self.history['weights'].append(self.weights.copy())
            self.history['bias'].append(self.bias)

Normal Equation Method

The normal equation provides a closed-form solution to calculate the optimal weights directly. It avoids the need for iterative optimization but can be computationally expensive for large datasets.

class MultipleLinearRegressionNormal:
    def __init__(self):
        self.weights = None
        self.bias = None
    
    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

    def fit(self, X, y):
        # Add bias term (column of ones) to the feature matrix
        X_b = np.c_[np.ones((X.shape[0], 1)), X]

        # Compute the optimal weights using the normal equation
        betas = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

        # Separate bias and weights
        self.bias = betas[0]
        self.weights = betas[1:]
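Explicitly inverting the matrix can be numerically fragile when features are strongly correlated. As a hedged alternative sketch (not part of the class above), `np.linalg.lstsq` solves the same least-squares problem without forming the inverse; the helper name `fit_least_squares` is hypothetical.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return (bias, weights) minimizing squared error, without forming an explicit inverse."""
    X_b = np.c_[np.ones((X.shape[0], 1)), X]          # prepend the bias column
    betas, *_ = np.linalg.lstsq(X_b, y, rcond=None)   # also handles rank-deficient X_b
    return betas[0], betas[1:]

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 3.0]])
y = 2 + X @ np.array([3.0, -2.0])                     # exact linear relationship, no noise
bias, weights = fit_least_squares(X, y)
print(bias, weights)
```

On well-conditioned data both approaches give the same parameters, so this matters mainly for stability.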

3. Training the Model

After defining the models, we proceed to train them using the generated synthetic data. The dataset is split into training and testing sets, and the models are fitted to the training data.

# Generate synthetic data
X, y = generate_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features (important for gradient descent)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train Gradient Descent Model
gd_model = MultipleLinearRegression(learning_rate=0.06, n_iterations=700)
gd_model.fit(X_train_scaled, y_train)
# Train Normal Equation Model
normal_model = MultipleLinearRegressionNormal()
normal_model.fit(X_train_scaled, y_train)

4. Evaluating the Model

After training, we evaluate both models using the R² score, which measures the proportion of variance explained by the model. A higher R² score indicates a better model fit.

The R² (R-squared), or coefficient of determination, is a statistical measure that indicates how well the independent variables explain the variance in the dependent variable. The formula for R² is:

$$R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}$$
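R² is one minus the ratio of the residual sum of squares to the total sum of squares, and that definition is short enough to compute by hand. The helper below is a hypothetical from-scratch version for illustration.

```python
import numpy as np

def r2_from_scratch(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
print(r2_from_scratch(y_true, y_true))        # perfect predictions -> 1.0
print(r2_from_scratch(y_true, [6.0] * 4))     # always predicting the mean -> 0.0
```

Note that this sketch divides by zero when all targets are identical, a case a library implementation has to handle separately.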

from sklearn.metrics import r2_score

# Evaluate the models
gd_predictions = gd_model.predict(X_test_scaled)
normal_predictions = normal_model.predict(X_test_scaled)

print("Gradient Descent Model R² Score:", r2_score(y_test, gd_predictions))
print("Normal Equation Model R² Score:", r2_score(y_test, normal_predictions))

5. Visualization

Visualizing the training process can help us understand the model’s convergence. We plot the loss progression over iterations and track how the weights change during training.

import matplotlib.pyplot as plt

# Plot loss progression
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(gd_model.history['loss'])
plt.title('Loss Progression')
plt.xlabel('Iterations')
plt.ylabel('Mean Squared Error')
# Plot weight changes
plt.subplot(1, 2, 2)
weights_history = np.array(gd_model.history['weights'])
for i in range(weights_history.shape[1]):
    plt.plot(weights_history[:, i], label=f'Weight {i+1}')
plt.title('Weight Progression')
plt.xlabel('Iterations')
plt.ylabel('Weight Value')
plt.legend()
plt.tight_layout()
plt.show()

Complete Example Code

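Here’s the complete code combining all steps. It assumes the MultipleLinearRegression and MultipleLinearRegressionNormal classes from section 2 are already defined: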

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler

# Generate synthetic data

# Set random seed for reproducibility
np.random.seed(42)

def generate_data(n_samples=100, n_features=3):
    X = np.random.randn(n_samples, n_features)
    true_coefficients = np.array([3, 1.5, -2])
    noise = np.random.normal(0, 0.5, n_samples)
    y = 2 + np.dot(X, true_coefficients) + noise
    return X, y

# Generate and prepare data

X, y = generate_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Gradient Descent Model

gd_model = MultipleLinearRegression(learning_rate=0.06, n_iterations=700)
gd_model.fit(X_train_scaled, y_train)

# Train Normal Equation Model


normal_model = MultipleLinearRegressionNormal()
normal_model.fit(X_train_scaled, y_train)

# Evaluate models


gd_predictions = gd_model.predict(X_test_scaled)
normal_predictions = normal_model.predict(X_test_scaled)
print("Gradient Descent Model R² Score:", r2_score(y_test, gd_predictions))
print("Normal Equation Model R² Score:", r2_score(y_test, normal_predictions))

# Visualization

plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(gd_model.history['loss'])
plt.title('Loss Progression')
plt.xlabel('Iterations')
plt.ylabel('Mean Squared Error')
plt.subplot(1, 2, 2)
weights_history = np.array(gd_model.history['weights'])
for i in range(weights_history.shape[1]):
    plt.plot(weights_history[:, i], label=f'Weight {i+1}')
plt.title('Weight Progression')
plt.xlabel('Iterations')
plt.ylabel('Weight Value')
plt.legend()
plt.tight_layout()
plt.show()

Conclusion

By implementing multiple linear regression from scratch, we gain a deeper understanding of the underlying mechanics of regression models. The two methods — Gradient Descent and Normal Equation — offer different trade-offs in terms of complexity and computational efficiency. Gradient Descent is more flexible and scales well with large datasets, while the Normal Equation provides a direct solution, making it faster for smaller datasets.

Key Takeaways:

Multiple Linear Regression models the relationship between multiple features and a target variable.
Gradient Descent and Normal Equation are two common methods for solving MLR problems.
Feature scaling is important for Gradient Descent to converge efficiently.
Evaluating the model using metrics like R² helps assess its performance.

Potential Improvements:

Implement regularization (e.g., Lasso, Ridge) to prevent overfitting.
Use cross-validation to get a better estimate of model performance.
Explore more advanced optimization algorithms (e.g., Stochastic Gradient Descent).


Feel free to reach out with any questions or thoughts. Happy learning and coding!

Building an Intelligent Search Agent with LangChain

Ahsan Umar — Thu, 05 Dec 2024 03:00:57 GMT


Introduction

In this blog post, we’ll explore how to create a search agent using LangChain. The agent performs web searches and answers our queries with what it finds. We’ll combine the DuckDuckGo search tool with a Llama model served by Groq, so the agent can interpret questions and return relevant answers.

Prerequisites

Before we begin, ensure you have the following:

Python 3.8+
LangChain installed (pip install langchain)
GROQ API key (or another compatible language model)
Basic understanding of Python and AI concepts

1. Setting Up the Environment

First, let’s set up our project and install the necessary dependencies:

%pip install python-dotenv langchain langchain-community langchain_groq

2. Importing Required Libraries

import os
from dotenv import load_dotenv
from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_groq import ChatGroq
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory

# Load environment variables
load_dotenv()

2. Creating the Search Tool

We’ll create a search tool using DuckDuckGo’s search API. This will allow our agent to search the internet for information.

# Initialize the search tool
search = DuckDuckGoSearchRun()

tools = [
    Tool(
        name="Search",
        func=search.run,
        description="Useful for searching information on the internet. Use this when you need to find current or factual information."
    )
]

3. Setting Up the Language Model

A Large Language Model acts as the brain of our agent: it interprets user queries and search results. Models from OpenAI, Anthropic (Claude), or Google (Gemini) would also work; here we use a Llama model served by Groq.

# Initialize the language model
llm = ChatGroq(
    temperature=0.7,
    model="llama3-8b-8192",
    api_key=os.getenv("GROQ_API_KEY"),
)

4. Creating the Agent

Now we’ll create our agent by combining the search tool with the language model.

# Define the prompt template for the agent
template = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought: {agent_scratchpad}"""

# Create the agent
agent = create_react_agent(
    llm=llm,
    tools=tools,
    prompt=PromptTemplate.from_template(template)
)

# Create the agent executor
agent_executor = AgentExecutor.from_agent_and_tools(
    agent=agent,
    tools=tools,
    verbose=True,
    memory=ConversationBufferMemory(memory_key="chat_history")
)

5. Testing the Search Agent

Let’s test our agent with some example queries to see how it performs.

def ask_question(query):
    """
    Function to ask a question to the agent and get a response
    """
    try:
        response = agent_executor.invoke({"input": query})
        return response.get("output", "Sorry, I couldn't generate a response.")
    except Exception as e:
        return f"An error occurred: {str(e)}"

# Test the agent with a query

query = "What are the latest developments in quantum computing?"
print(ask_question(query))

Understanding How It Works

Our search agent works through several key components:

1. Tools: The DuckDuckGo search tool allows the agent to search the internet for information.

2. Language Model: The ChatGroq model processes queries and search results intelligently.

3. Agent: The ReAct agent follows a thought process of:

Understanding the question
Deciding what information is needed
Using tools to gather information
Formulating a response

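4. Memory: The ConversationBufferMemory stores earlier exchanges so the agent can maintain context across multiple interactions. For the model to actually see that history, the prompt template needs a {chat_history} placeholder, which the template above does not include.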
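The agent loop in item 3 depends on turning the model's free-text output into either a tool call or a final answer. The toy parser below illustrates that idea for the Thought/Action format used in the prompt template. It is a simplified stand-in written for this explanation, not LangChain's own output parser, and the names `ACTION_PATTERN` and `parse_step` are hypothetical.

```python
import re

# Matches an "Action: <tool>" line directly followed by an "Action Input: <text>" line
ACTION_PATTERN = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)")

def parse_step(llm_output):
    """Return ('final', answer) or ('action', tool_name, tool_input) from one model turn."""
    if "Final Answer:" in llm_output:
        return ("final", llm_output.split("Final Answer:", 1)[1].strip())
    match = ACTION_PATTERN.search(llm_output)
    if match is None:
        raise ValueError("output does not follow the Thought/Action format")
    return ("action", match.group(1).strip(), match.group(2).strip())

print(parse_step("Thought: I should look this up\nAction: Search\nAction Input: quantum computing news"))
print(parse_step("Thought: I now know the final answer\nFinal Answer: 42"))
```

When the result is an action, the executor runs the named tool, appends the result as an Observation, and asks the model again; a final answer ends the loop.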

Best Practices and Tips

1. API Key Security: Always use environment variables for API keys

2. Temperature Setting: Adjust the temperature parameter to control response creativity

3. Error Handling: Implement proper error handling for API calls

4. Rate Limiting: Be mindful of API rate limits
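Tips 1, 3, and 4 can be sketched with the standard library alone. Both helpers below are hypothetical and written for illustration; in real code you would catch the provider's specific rate-limit exception in place of the broad `Exception` used here.

```python
import os
import time

def require_env(name):
    """Read a secret from the environment and fail early if it is missing."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value

def call_with_retry(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff when it raises."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise                                  # give up after the last attempt
            sleep(base_delay * 2 ** (attempt - 1))     # wait 1s, 2s, 4s, ...
```

A call such as `call_with_retry(lambda: agent_executor.invoke({"input": query}))` would then wait one second, then two, before giving up on the third failure. The `sleep` parameter exists so the delay can be replaced in tests.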

Conclusion

We’ve successfully built a powerful search agent using LangChain that can:

Perform web searches
Process and understand search results
Provide informative responses
Maintain conversation context

GitHub: https://shorturl.at/ofO3O

LinkedIn: https://shorturl.at/ZmxDv

Notebook: https://shorturl.at/1ATxC

Mastering Linear Regression: A Visual Journey from Theory to Implementation

Ahsan Umar — Tue, 03 Dec 2024 12:34:04 GMT


Linear Regression is a fundamental algorithm in machine learning that serves as a stepping stone to understanding more complex models. In this comprehensive guide, we’ll explore the algorithm through interactive visualizations and practical implementation.

1. Understanding Linear Regression

The Core Concept

Linear Regression models the relationship between variables by fitting a linear equation to observed data. In its simplest form:

y = wx + b

Where:

y is the predicted value (dependent variable)
x is the input feature (independent variable)
w is the weight (slope)
b is the bias (y-intercept)

The Learning Process

The model learns by minimizing the Mean Squared Error (MSE):

MSE = (1/n) * Σ(y_true - y_predicted)²

2. Implementation from Scratch

Let’s implement Linear Regression using NumPy:

import numpy as np

class LinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None
        self.history = {'loss': [], 'weights': [], 'bias': []}

    def _compute_loss(self, y, y_predicted):
        # Not shown in the original excerpt; assumed to be the MSE defined above
        return np.mean((y - y_predicted) ** 2)

    def fit(self, X, y):
        # Initialize parameters
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        # Gradient descent
        for _ in range(self.n_iterations):
            # Forward pass
            y_predicted = np.dot(X, self.weights) + self.bias

            # Compute gradients
            dw = (1/n_samples) * np.dot(X.T, (y_predicted - y))
            db = (1/n_samples) * np.sum(y_predicted - y)

            # Update parameters
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

            # Track history (loss is measured before this iteration's update)
            loss = self._compute_loss(y, y_predicted)
            self.history['loss'].append(loss)
            self.history['weights'].append(self.weights.copy())
            self.history['bias'].append(self.bias)

Visualizing the Learning Process

Learning Process

The animation above shows how our model iteratively learns to fit the data. Watch as the red line (our model’s predictions) adjusts to better fit the blue data points.

Training Progress

Loss History

The loss history plot demonstrates how our model’s error decreases over time, indicating successful learning.

Parameter Evolution

This visualization shows how our model’s parameters (weight and bias) converge to their optimal values. The dashed red lines indicate the true parameters used to generate our synthetic data.

Final Model Performance

The final model achieves excellent fit to the data, with predictions (red line) closely following the underlying pattern in the data points (blue dots).

3. Key Implementation Details

Gradient Descent

The model uses gradient descent to minimize the loss function:

1. Calculate predictions: `y_pred = X * w + b`

2. Compute gradients:

dw = (1/n) * X.T * (y_pred - y)
db = (1/n) * sum(y_pred - y)

3. Update parameters:

w = w - learning_rate * dw
b = b - learning_rate * db
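Working through one update by hand shows how the three steps fit together. The two data points and the learning rate below are made up for the example; the comments carry the arithmetic.

```python
import numpy as np

X = np.array([[1.0], [2.0]])   # two samples, one feature
y = np.array([2.0, 4.0])       # generated by y = 2x
w, b, learning_rate = np.zeros(1), 0.0, 0.1

y_pred = X @ w + b                        # step 1: predictions are [0, 0]
dw = (1 / len(y)) * X.T @ (y_pred - y)    # step 2: (1/2) * (1*(-2) + 2*(-4)) = -5
db = (1 / len(y)) * np.sum(y_pred - y)    #         (1/2) * (-2 + -4) = -3
w = w - learning_rate * dw                # step 3: 0 - 0.1 * (-5) = 0.5
b = b - learning_rate * db                #         0 - 0.1 * (-3) = 0.3
print(w, b)
```

Both gradients are negative because the initial predictions are too low, so the update increases the weight and the bias.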

Hyperparameters

Learning Rate: Controls step size during optimization
Number of Iterations: Determines training duration
Initial Parameters: Starting points for weights and bias

4. Best Practices and Tips

1. Data Preprocessing.

Scale features to similar ranges
Remove outliers
Handle missing values

2. Model Training.

Start with small learning rates
Monitor loss for convergence
Use early stopping if needed

3. Evaluation.

Split data into train/test sets
Use multiple metrics (MSE, R², MAE)
Validate assumptions

5. Advanced Topics

1. Regularization.

L1 (Lasso)
L2 (Ridge)
Elastic Net

2. Extensions.

Polynomial Regression
Multiple Linear Regression
Weighted Linear Regression

Conclusion

Linear Regression, despite its simplicity, provides:

Strong foundation for machine learning concepts
Practical utility in many real-world applications
Insights into model training and optimization

The visual approach we’ve taken helps us understand:

How the model learns over time
The role of different parameters
The importance of proper training

Resources

https://github.com/codewithdark-git/ML-Algorithms-From-Scratch

https://www.kaggle.com/code/codewithdark/linear-regression-implementation-from-scratch

DocsGPT: A Technical Deep Dive into Modern Documentation Search

Ahsan Umar — Mon, 02 Dec 2024 15:35:47 GMT

System Architecture Overview


Technical Architecture

DocsGPT is built on a sophisticated stack that combines vector search, large language models, and modern web technologies. Let’s break down each component and understand how they work together.

Core Components:

Vector Search Engine.

# Key technologies:
- FAISS (Facebook AI Similarity Search)
- HuggingFace Embeddings
- Groq LLM Integration

Content Processing Pipeline.

# Features:
- Recursive text splitting
- Metadata enrichment
- Chunk optimization
- Vector store management

Search Orchestration.

# Capabilities:
- Multi-source retrieval
- Context-aware querying
- Response enhancement
- Source attribution

Technical Implementation Details


1. Vector Store Implementation

The system uses FAISS for efficient similarity search:

class ContentIngestionPipeline:
    def __init__(self):
        self.vector_store = None
        self.embeddings = HuggingFaceEmbeddings()
        self._load_vector_store()

Key features:

Persistent vector storage
Optimized similarity search
Configurable search parameters
Automatic store management

2. Content Processing

The content processing pipeline splits text recursively into overlapping chunks:

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    separators=["\n\n", "\n", ". ", " ", ""]
)

Processing steps:

URL content extraction
Text chunking with overlap
Metadata enrichment
Quality filtering
Vector embedding generation
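The effect of `chunk_size` and `chunk_overlap` is easiest to see on a short string. The function below is a deliberately simplified sketch that cuts fixed-size character windows; RecursiveCharacterTextSplitter additionally tries to break on the listed separators, which this toy does not attempt.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows; each chunk repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):   # the last window reached the end of the text
            break
    return chunks

chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each chunk begins with the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.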


3. Search Orchestration

The search flow includes:

async def search(self, query: str, k: int = 3):
    # 1. Vector DB check
    # 2. Google search fallback
    # 3. Content processing
    # 4. AI enhancement
    ...  # each step's implementation is omitted in this outline

Features:

Asynchronous processing
Multi-stage search pipeline
Error handling and logging
Result refinement
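The staged flow can be sketched with stub coroutines. Everything below is hypothetical scaffolding: the three helper functions stand in for the real vector store, web search, and LLM calls, and only the control flow (lookup, fallback, processing, enhancement, error logging) is meant to mirror the outline.

```python
import asyncio
import logging

logger = logging.getLogger("search")

async def search_vector_store(query, k):
    """Hypothetical stand-in for the vector DB lookup; returns no hits here."""
    return []

async def search_web(query, k):
    """Hypothetical stand-in for the web-search fallback."""
    return [f"web result {i + 1} for {query!r}" for i in range(k)]

async def enhance(query, documents):
    """Hypothetical stand-in for the LLM step that writes the final answer."""
    return {"answer": f"Summary of {len(documents)} documents", "sources": documents}

async def search(query, k=3):
    try:
        documents = await search_vector_store(query, k)   # 1. vector DB check
        if not documents:
            documents = await search_web(query, k)        # 2. fallback when nothing is stored
        documents = [d for d in documents if d.strip()]   # 3. content processing (trivial here)
        return await enhance(query, documents)            # 4. AI enhancement
    except Exception:
        logger.exception("search failed for %r", query)
        return {"answer": None, "sources": []}

print(asyncio.run(search("how do I install the package?")))
```

Returning the sources alongside the answer is what makes source attribution possible further down the pipeline.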

4. AI Enhancement

The system uses Groq for response generation:

self.llm = ChatGroq(
    api_key=os.getenv('GROQ_API_KEY'),
    model_name=os.getenv('MODEL_NAME'),
    max_tokens=2000,
    temperature=0.3
)

Capabilities:

Context-aware responses
Temperature-controlled output
Token optimization
Source attribution

Performance Optimizations.

1. Vector Search.

Similarity score thresholding
Configurable k-nearest neighbors
Fetch optimization

search_kwargs={
    "k": 3,
    "score_threshold": 0.5,
    "fetch_k": 10
}
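The roles these three parameters suggest can be illustrated with a toy retriever over hand-written vectors. This is only a sketch of the idea: how FAISS and the retriever actually apply `k`, `score_threshold`, and `fetch_k` depends on the configured search type, and the document names and vectors are invented.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, docs, k=3, score_threshold=0.5, fetch_k=10):
    """Score every document, keep the fetch_k best, drop low scores, return the top k."""
    scored = sorted(((cosine(query_vec, vec), name) for name, vec in docs.items()), reverse=True)
    candidates = scored[:fetch_k]
    return [(name, round(score, 3)) for score, name in candidates if score >= score_threshold][:k]

docs = {
    "install": [1.0, 0.0],
    "upgrade": [0.8, 0.6],
    "license": [0.0, 1.0],
}
print(retrieve([1.0, 0.0], docs))
```

The "license" document is orthogonal to the query, so it falls below the threshold and is dropped even though `k` would have allowed a third result.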

2. Content Processing.

Chunk size optimization
Overlap management
Small chunk filtering
Metadata enrichment

3. Response Generation.

Async processing
Error handling
Logging integration
Result caching


Future Technical Enhancements:

1. Scalability Improvements.

Distributed vector storage
Load balancing
Caching optimization

2. Search Enhancement.

Cross-language support
Semantic search improvements
Custom embedding models

3. Content Processing.

Advanced text extraction
Multi-format support
Real-time updates

Conclusion

DocsGPT represents a sophisticated implementation of modern search technologies, combining vector search, large language models, and efficient content processing. Its modular architecture and thoughtful optimizations make it a powerful tool for documentation search and retrieval.

The system’s technical design prioritizes:

Search accuracy
Processing efficiency
Response quality
Scalability
Maintainability

This technical implementation ensures that DocsGPT can handle complex documentation queries while providing accurate, context-aware responses in a timely manner.

https://github.com/codewithdark-git/DocsGPT.git