Stories by Halil İbrahim Hatun on Medium

How Does AI Understand Us?

Halil İbrahim Hatun — Mon, 10 Mar 2025 14:10:53 GMT

A technical explanation, kept simple

Hello, I’m Halil İbrahim. I hope everything is going well. Today I’m sharing a blog post for anyone thinking about a career in artificial intelligence, and for anyone who simply wants to learn more about AI.

I’ve thought about this topic for a long time, and one question seems to be on many people’s minds: how does AI understand what we write, and how does it produce an answer? Many people assume AI works like Google, searching for an answer and returning it. In reality, a large number of people don’t fully understand how AI actually works.

So, for now, I’ll focus on how AI understands a piece of text. There are many different methods in this area, but I’ll try to walk you through them, up to the most current approach, as simply and clearly as I can.

I hope this post serves as a useful guide for anyone interested in the world of AI.

1. Extracting meaning from frequency (TF-IDF)

TF-IDF (Term Frequency-Inverse Document Frequency) is a method that measures how often each word occurs in a text and how informative that word is. You might think that frequent words are important, but not always! Words like “ve” (and), “bir” (a), and “gibi” (like) appear in almost every text, so they are given low importance (weight). Words that occur often in one particular text but rarely in others are treated as more meaningful.

TF-IDF is a technique commonly used with classical machine learning models.
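To make the weighting concrete, here is a minimal TF-IDF sketch. It assumes one common variant of the formula, tf(t, d) = count of t in d / length of d and idf(t) = log(N / df(t)); libraries such as scikit-learn use smoothed variants, so their numbers will differ. The three documents are made-up examples.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where df(t) = number of documents containing t
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell today",
]
w = tf_idf(docs)
print(w[0]["the"])               # 0.0, because "the" appears in every document
print(round(w[2]["market"], 3))  # 0.22, a rare word that is specific to document 3
```

Even though “the” is the most frequent word in the first document, its weight is zero, which is exactly the effect described above.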

2. Distributional Semantics and Word2Vec

This approach rests on the idea that a word’s meaning can be learned from its relationships with the words around it. You could call it “knowing a word by the company it keeps.”

Word2Vec was one of the first popular models to put this idea into practice. It was developed at Google in 2013 and comes in two main variants:

CBOW (Continuous Bag of Words): predicts the middle word from the surrounding words.
Skip-gram: the reverse; given a word, it tries to predict the surrounding words.
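The two variants differ only in what is the input and what is predicted. The sketch below is a simplified illustration, not the actual Word2Vec code: it builds the (input, target) training pairs each variant would learn from, assuming a context window of one word on each side.

```python
def skipgram_pairs(tokens, window=1):
    """Skip-gram: (center word -> one context word) pairs."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=1):
    """CBOW: (list of context words -> center word) pairs."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, center))
    return pairs

sentence = "the king rules the land".split()
print(cbow_pairs(sentence)[1])       # (['the', 'rules'], 'king')
print(skipgram_pairs(sentence)[:3])  # [('the', 'king'), ('king', 'the'), ('king', 'rules')]
```

A real Word2Vec model then trains a small neural network on millions of such pairs, and the learned weights become the word vectors.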

As a result of this training, every word becomes a numeric vector, and words with similar meanings end up close to each other in that vector space. For example, “king” and “queen” end up close together, while “table” sits farther away.

The advantage of Word2Vec is that it represents words based on the contexts they appear in, not just on how often they occur. This lets it capture semantic relationships between words.
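“Close” is usually measured with cosine similarity, the cosine of the angle between two vectors. The 3-dimensional vectors below are hand-picked toy numbers, not real Word2Vec output (real embeddings typically have a few hundred dimensions); they only show how the comparison works.

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| |v|); 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hand-picked toy vectors (hypothetical values, for illustration only)
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "table": [0.1, 0.0, 0.95],
}

print(round(cosine_similarity(vectors["king"], vectors["queen"]), 3))  # close to 1
print(round(cosine_similarity(vectors["king"], vectors["table"]), 3))  # much smaller
```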

3. BERT Embeddings (Contextual Embeddings)

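Word2Vec is a good start, but it gives each word a single vector, even though a word’s meaning can change depending on its place and context in a sentence. For example, in Turkish:

“Bank’ta oturuyorum.” (“I’m sitting on the bank”, where *bank* means a park bench)
“Bank kredisi çektim.” (“I took out a bank loan”, where *bank* means a financial institution)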

This is where BERT (Bidirectional Encoder Representations from Transformers) comes in. Developed by Google in 2018, BERT reads words in a bidirectional context: it works out a word’s meaning by looking at both the words before it and the words after it. It does this by building on the Transformer architecture, specifically its encoder part.

The Transformer architecture was first introduced in 2017, also by Google researchers, in a paper titled “Attention Is All You Need.”

That title highlights the importance of the attention mechanism in AI architectures. The original Transformer is made up of encoder and decoder blocks: the encoder turns the input into a mathematical representation, and the decoder turns that representation back into language people can understand. Both blocks contain the component that shows why the Transformer mattered and why it became the foundation of later breakthroughs: Multi-Head Attention, a mechanism that fits the paper’s title perfectly. I’ll explain it in more detail, with the math, in future posts; for now, here is a brief overview.

Multi-Head Attention is a structure that improves the model’s ability to comprehend content. Imagine asking a historian and an engineer for their views on AI. The historian will naturally focus on how AI developed over time, while the engineer will focus on its technical foundations. If we think of each area of expertise as a “head,” Multi-Head Attention is the combination of all of them. In practice, each head is a separate attention computation with its own learned parameters, so different heads can focus on different relationships between words, and their outputs are then combined. This gives the model a richer, more comprehensive understanding of the input.
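For a first look at the computation, here is a minimal pure-Python sketch of multi-head scaled dot-product attention, softmax(QKᵀ / √d_k) · V, run separately per head. It is deliberately simplified: the input vectors are used directly as queries, keys, and values and are split into heads by slicing, whereas a real Transformer applies learned projection matrices (W_Q, W_K, W_V per head and W_O after concatenation) that are omitted here. The toy input numbers are made up.

```python
import math

def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention for one head: softmax(q k^T / sqrt(d_k)) v."""
    d_k = len(q[0])
    outputs, weights = [], []
    for qi in q:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        w = softmax(scores)
        weights.append(w)
        # Output = attention-weighted sum of the value vectors
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v)))
                        for c in range(len(v[0]))])
    return outputs, weights

def multi_head_attention(x, num_heads):
    """Run attention independently on num_heads slices of x, then concatenate.

    Simplified: x is used directly as Q, K and V (no learned projections).
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    head_outputs, head_weights = [], []
    for h in range(num_heads):
        part = [vec[h * d_head:(h + 1) * d_head] for vec in x]  # this head's slice
        out, w = attention(part, part, part)
        head_outputs.append(out)
        head_weights.append(w)
    # Concatenate the heads' outputs for each token back into d_model dimensions
    combined = [[val for h in range(num_heads) for val in head_outputs[h][i]]
                for i in range(len(x))]
    return combined, head_weights

# Three toy token vectors with d_model = 4, split into 2 heads of size 2
x = [[1.0, 0.0, 0.5, 0.5],
     [0.0, 1.0, 0.5, -0.5],
     [1.0, 1.0, 0.0, 1.0]]
out, weights = multi_head_attention(x, num_heads=2)
print(len(out), len(out[0]))  # 3 4
```

Because each head only sees its own slice, the two heads can produce different attention patterns over the same tokens, which is the computational counterpart of the historian-and-engineer analogy.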

Now that we have Multi-Head Attention in our pocket, it’s time to dig into BERT embeddings and wrap up this post.

When BERT processes a sentence, it considers not only each word’s own meaning but also its position in the sentence and which segment (sentence) it belongs to. Accordingly, a BERT embedding is made up of three main components:

1. Token Embeddings

First, a quick word on tokenization. BERT was pre-trained on a large dataset using a fixed vocabulary of words and subwords, in which every entry was given a number, starting from 0, until the whole vocabulary was numbered (the English BERT-base vocabulary has 30,522 entries). Tokenization is the process of splitting new text into entries from this vocabulary and representing each one by its number, called its token ID.
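BERT’s tokenizer is WordPiece, which splits a word into the longest pieces it can find in the vocabulary and marks word-internal pieces with “##”. The sketch below implements that greedy longest-match-first idea on a tiny hypothetical vocabulary (its entries and IDs are made up, and it works on lowercase text as uncased BERT models do), so it illustrates the mechanism rather than BERT’s actual output.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword split, in the style of BERT's WordPiece."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until it is in the vocabulary
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # marks a piece that continues a word
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces

# Hypothetical toy vocabulary: token -> ID (IDs are illustrative only)
vocab = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "mer": 3, "##haba": 4, "dünya": 5}

text = "merhaba dünya"
tokens = [piece for word in text.split() for piece in wordpiece_tokenize(word, vocab)]
ids = [vocab[t] for t in tokens]
print(tokens)  # ['mer', '##haba', 'dünya']
print(ids)     # [3, 4, 5]
```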

For token embeddings, take the BERT-base model as an example. BERT-base has 12 layers with a hidden size of 768, which means every token is represented by a 768-dimensional vector. Token embedding is where this begins: a single token ID (just one number) is mapped to a 768-dimensional vector that was learned during pre-training.

Example

Token 1: “Merhaba” (“hello”; or, split into subwords, “Mer” and “##haba”)
Token 2: “dünya” (“world”)

These tokens might be assigned IDs such as:
“Merhaba” → ID 1050
“dünya” → ID 2034

These IDs are fed into BERT, and the token embedding layer returns two 768-dimensional vectors, one per token.
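Under the hood, the token embedding layer is a lookup table: a matrix with one row per vocabulary entry, where the token ID selects the row. The sketch below fills such a table with random numbers (in BERT the values are learned during pre-training) and uses the illustrative IDs 1050 and 2034 from the example. The vocabulary size of 2,100 is a hypothetical value chosen so the sketch runs quickly; BERT-base’s real table is 30,522 × 768.

```python
import random

VOCAB_SIZE = 2_100   # hypothetical small vocabulary (BERT-base uses 30,522)
HIDDEN_SIZE = 768    # BERT-base hidden size

rng = random.Random(0)
# One row per token ID; random here, learned during pre-training in a real model
token_embedding_table = [[rng.random() - 0.5 for _ in range(HIDDEN_SIZE)]
                         for _ in range(VOCAB_SIZE)]

def embed_tokens(token_ids):
    """Look up the embedding row for each token ID."""
    return [token_embedding_table[i] for i in token_ids]

ids = [1050, 2034]  # illustrative IDs for "Merhaba" and "dünya"
vectors = embed_tokens(ids)
print(len(vectors), len(vectors[0]))  # 2 768
```

Note that the lookup by itself ignores context: the same ID always returns the same row. The context-dependent meaning comes later, from the attention layers.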

2. Segment Embeddings

Suppose BERT receives two separate sentences or text segments. Each token gets an additional 768-dimensional vector that identifies which segment (sentence) it belongs to. The purpose of segment embeddings is to tell the two sentences apart.

For example, take these two sentences:

Sentence 1: “I don’t want to meet with them.”
Sentence 2: “I wish everyone a good day.”

Here, the tokens of Sentence 1 are assigned the first segment ID and the tokens of Sentence 2 the second (in BERT’s implementation these IDs are 0 and 1). Each segment ID is mapped to its own 768-dimensional vector, which lets the model tell the two sentences apart and know which sentence each token belongs to.
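When BERT receives a sentence pair, the input is laid out as [CLS] sentence 1 [SEP] sentence 2 [SEP], and each position gets a segment ID: 0 for [CLS], the first sentence, and the first [SEP], and 1 for the second sentence and the final [SEP]. Here is a minimal sketch of building those IDs, assuming the sentences are already tokenized (the whitespace split below is a stand-in for a real WordPiece tokenizer).

```python
def build_pair_input(tokens_a, tokens_b):
    """Lay out a sentence pair the way BERT expects and assign segment IDs."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0: [CLS] + sentence A + first [SEP]; segment 1: sentence B + final [SEP]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids

# Whitespace split as a stand-in for a real tokenizer
a = "I don't want to meet with them".split()
b = "I wish everyone a good day".split()

tokens, segment_ids = build_pair_input(a, b)
for tok, seg in zip(tokens, segment_ids):
    print(seg, tok)
```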

3. Positional Embeddings

While segment embeddings tell the model which sentence a token belongs to, positional embeddings represent a token’s order within the sentence or text segment.

For example, take this sentence:
“Today is Sunday.”

Think of its tokens as “Today”, “is”, and “Sunday”. Their positional embeddings are determined by their order in the sentence:

“Today” → 1
“is” → 2
“Sunday” → 3

(BERT actually counts positions from 0, but the special [CLS] token at the start of every input takes position 0, so these words do sit at positions 1, 2, and 3.) Each position number is mapped to a learned 768-dimensional vector in BERT-base, which supports up to 512 positions. This lets the model know where each token sits in the sentence and use that information in its computations.

So, we’ve covered three kinds of embeddings. The overall BERT embedding is the sum of these three, each of which is a 768-dimensional vector:

BERT Embedding = Token Embedding + Segment Embedding + Positional Embedding

The sum of these three embeddings is the final vector representation (embedding) that BERT builds for each token.
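Putting the three pieces together, here is a minimal sketch of the input-embedding sum for a single sentence: position IDs count from 0 (with [CLS] at position 0), all segment IDs are 0 because there is only one sentence, and each token’s final vector is the element-wise sum of three table lookups. The tables are random stand-ins with a hypothetical hidden size of 8 instead of 768, and the token IDs are made up. The real BERT implementation also applies layer normalization and dropout after the sum; that step is omitted here.

```python
import random

HIDDEN = 8          # hypothetical; BERT-base uses 768
VOCAB_SIZE = 10     # hypothetical toy vocabulary
MAX_POSITIONS = 16  # hypothetical; BERT-base supports 512 positions
NUM_SEGMENTS = 2    # segment IDs 0 and 1

rng = random.Random(42)

def make_table(rows):
    """Random stand-in for a learned embedding table (rows x HIDDEN)."""
    return [[rng.random() - 0.5 for _ in range(HIDDEN)] for _ in range(rows)]

token_table = make_table(VOCAB_SIZE)
segment_table = make_table(NUM_SEGMENTS)
position_table = make_table(MAX_POSITIONS)

def bert_input_embeddings(token_ids, segment_ids):
    """Token + segment + position embedding, summed element-wise for each token."""
    result = []
    for position, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        t, s, p = token_table[tok], segment_table[seg], position_table[position]
        result.append([a + b + c for a, b, c in zip(t, s, p)])
    return result

# "[CLS] Today is Sunday [SEP]" with hypothetical token IDs
token_ids = [1, 5, 6, 7, 2]
segment_ids = [0] * len(token_ids)  # a single sentence: every token is in segment 0
embeddings = bert_input_embeddings(token_ids, segment_ids)
print(len(embeddings), len(embeddings[0]))  # 5 8
```

Thanks to the positional term, the same token ID placed at two different positions ends up with two different input vectors.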

Closing

That brings us to the end of the post. I hope you enjoyed it. I welcome all feedback, positive or negative, and I’d also love your suggestions for future topics. Take care, and see you in the next post!

Evaluate and Debug Your GenAI Model | Part 1 — Introduction to W&B

Halil İbrahim Hatun — Fri, 03 Jan 2025 06:53:15 GMT


This tutorial was built using the Weights & Biases (W&B) library.

Hello, I hope everything is going well 😊. Today is very exciting for me 🎉: I’m starting a new tutorial on how to monitor an LLM or GenAI model during fine-tuning, serving, and evaluation 💡. This matters a lot nowadays. AI makes many tasks easier to handle every day, but evaluating and monitoring models is still essential for understanding things like your target market and average cost. Many tools can help with this; I prefer Weights & Biases (W&B), which I find more effective than the others.

Let’s start the tutorial by looking at how we can log our model during training.

What is W&B?

Weights & Biases (W&B) is your new best friend in the world of machine learning. It’s like having a personal assistant who keeps track of all your experiments, visualizes your model’s performance, and even helps you collaborate with your team. Imagine having a superhero sidekick who makes sure your AI adventures are smooth and successful.

Key Features of W&B:

Experiment Tracking: Never lose track of your experiments again. W&B logs and tracks everything, so you can compare runs and figure out what’s working (and what’s not).
Visualization: See your model’s performance metrics, loss curves, and more in real time. It’s like having a crystal ball for your AI projects.
Collaboration: Share your experiments with your team, get feedback, and work together more effectively. Teamwork makes the dream work, right?
Version Control: Keep tabs on different versions of your models and datasets. No more “Wait, which version was this again?” moments.
Integration: W&B plays nicely with popular machine learning frameworks like PyTorch and TensorFlow. It’s like the social butterfly of AI tools.

Why Use W&B?

Using W&B can supercharge your workflow. Here’s how:

Efficiency: Automate the logging of your experiments and say goodbye to manual errors. It’s like having a robot butler for your AI projects.
Insight: Get deep insights into your model’s performance with detailed visualizations and analytics. Knowledge is power!
Collaboration: Work better with your team by sharing and discussing experiments. Two heads are better than one, after all.
Reproducibility: Ensure your experiments are reproducible, making debugging and improvements a breeze. No more “It worked on my machine” excuses.

Okay, enough theory. Let’s dive in and have some fun!

Let’s Create a Basic MLP Model and Log it with W&B for Sprite Classification

Step 1: Define Our Libraries

First things first, let’s import the necessary libraries:

import math
from pathlib import Path
from types import SimpleNamespace
from tqdm.auto import tqdm
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from utilities import get_dataloaders  # local helper module (utilities.py) that builds the sprite DataLoaders

import wandb

Step 2: Define Constants and Create the get_model Method

Let’s set up our constants and build a simple model:

INPUT_SIZE = 3 * 16 * 16
OUTPUT_SIZE = 5
HIDDEN_SIZE = 256
NUM_WORKERS = 2
CLASSES = ["hero", "non-hero", "food", "spell", "side-facing"]
DATA_DIR = Path('./data/')
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def get_model(dropout):
    "Simple MLP with Dropout"
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(INPUT_SIZE, HIDDEN_SIZE),
        nn.BatchNorm1d(HIDDEN_SIZE),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(HIDDEN_SIZE, OUTPUT_SIZE)
    ).to(DEVICE)

Step 3: Define Hyperparameters

Let’s store our hyperparameters in a config object:

config = SimpleNamespace(
    epochs = 2,
    batch_size = 128,
    lr = 1e-5,
    dropout = 0.5,
    slice_size = 10_000,
    valid_pct = 0.2,
)

Step 4: Define Train and Evaluate Methods

Now, let’s define our training and evaluation methods:

def train_model(config):
    "Train a model with a given config"
    
    wandb.init(
        project="dlai_intro",
        config=config,
    )

    # Get the data
    train_dl, valid_dl = get_dataloaders(DATA_DIR, 
                                         config.batch_size, 
                                         config.slice_size, 
                                         config.valid_pct)
    n_steps_per_epoch = math.ceil(len(train_dl.dataset) / config.batch_size)

    # A simple MLP model
    model = get_model(config.dropout)

    # Make the loss and optimizer
    loss_func = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=config.lr)

    example_ct = 0

    for epoch in tqdm(range(config.epochs), total=config.epochs):
        model.train()

        for step, (images, labels) in enumerate(train_dl):
            images, labels = images.to(DEVICE), labels.to(DEVICE)

            outputs = model(images)
            train_loss = loss_func(outputs, labels)
            optimizer.zero_grad()
            train_loss.backward()
            optimizer.step()

            example_ct += len(images)
            metrics = {
                "train/train_loss": train_loss.item(),  # .item() logs a plain Python float
                "train/epoch": epoch + 1,
                "train/example_ct": example_ct
            }
            # Log training metrics to the W&B dashboard
            wandb.log(metrics)

        # Compute validation metrics at the end of each epoch
        val_loss, accuracy = validate_model(model, valid_dl, loss_func)
        val_metrics = {
            "val/val_loss": val_loss,
            "val/val_accuracy": accuracy
        }
        # Log validation metrics to the W&B dashboard
        wandb.log(val_metrics)
     
    # Ending process
    wandb.finish()

def validate_model(model, valid_dl, loss_func):
    "Compute the performance of the model on the validation dataset"
    model.eval()
    val_loss = 0.0
    correct = 0

    with torch.inference_mode():
        for images, labels in valid_dl:
            images, labels = images.to(DEVICE), labels.to(DEVICE)

            # Forward pass
            outputs = model(images)
            # The loss is a per-batch mean, so weight it by batch size before summing
            val_loss += loss_func(outputs, labels).item() * labels.size(0)

            # Compute accuracy and accumulate
            predicted = outputs.argmax(dim=1)
            correct += (predicted == labels).sum().item()

    return val_loss / len(valid_dl.dataset), correct / len(valid_dl.dataset)
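Why does validate_model multiply each batch’s loss by labels.size(0)? By default nn.CrossEntropyLoss returns the mean loss over a batch, and the last batch is often smaller than the others, so simply averaging the batch means would over-weight it. The plain-Python sketch below uses made-up per-sample losses to show that weighting each batch mean by its size and dividing by the dataset size recovers the true per-sample average.

```python
# Hypothetical per-sample losses, split into batches of 4, 4 and 2 samples
batches = [[0.2, 0.4, 0.6, 0.8], [1.0, 1.2, 1.4, 1.6], [3.0, 5.0]]

batch_means = [sum(b) / len(b) for b in batches]  # what a mean-reduced loss returns per batch
n_samples = sum(len(b) for b in batches)

naive = sum(batch_means) / len(batch_means)  # average of batch means (biased)
weighted = sum(m * len(b) for m, b in zip(batch_means, batches)) / n_samples  # as in validate_model
true_mean = sum(x for b in batches for x in b) / n_samples

print(round(naive, 3), round(weighted, 3), round(true_mean, 3))  # 1.933 1.52 1.52
```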

Step 5: Configure W&B and Bake Our Model

Before we shine, let’s configure W&B. You can see your logs without an account, or use your W&B API key to save them.

When W&B prompts you to log in, enter 1 to continue without signing in; otherwise, enter 2 and paste your W&B API key, which you can find in your W&B account settings. I’ll continue with my API key.

And… Time to bake our model!

train_model(config)

Our baked model is ready! You can see your logs in the Jupyter notebook output. But that’s not all: you can also open your project page through the link shown after “View project.”

We trained our model and saw our logs in our account. However, we rarely get what we want from a single training run. Let’s run a few more trainings with different hyperparameters, starting with the learning rate.

config.lr = 1e-4
train_model(config)

config.dropout = 0.1
config.epochs = 1
train_model(config)

config.lr = 1e-3
train_model(config)

After running these commands, we have trained four models with different hyperparameters. Open our project on W&B again, and you can compare all the runs side by side.

That’s it! We have trained many models and compared them. Therefore, we can easily decide which hyperparameter configuration is more suitable.
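Because the runs above modify the same config object in place, each change carries over to later runs: the final lr = 1e-3 run, for example, also uses dropout = 0.1 and a single epoch. If you want each run to differ from a base configuration only in the fields you name, one option (a sketch, not part of the original tutorial; the variant helper is hypothetical) is to build every variant as a fresh SimpleNamespace:

```python
from types import SimpleNamespace

base_config = SimpleNamespace(
    epochs=2, batch_size=128, lr=1e-5, dropout=0.5, slice_size=10_000, valid_pct=0.2,
)

def variant(base, **overrides):
    """Return a new config equal to `base` except for the given overrides."""
    return SimpleNamespace(**{**vars(base), **overrides})

experiments = [
    variant(base_config),
    variant(base_config, lr=1e-4),
    variant(base_config, lr=1e-4, dropout=0.1, epochs=1),
    variant(base_config, lr=1e-3),
]

for cfg in experiments:
    print(cfg.lr, cfg.dropout, cfg.epochs)
    # train_model(cfg)  # uncomment in the tutorial notebook, where train_model is defined
```

The base config is never modified, so every run in the W&B comparison differs from it only where you intended.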

Conclusion

In this blog, we introduced W&B by training simple MLP models with different hyperparameters and logging their metrics to W&B.

This is just the beginning. Next, I will cover training diffusion models, evaluating them, tracing LLMs, and fine-tuning them, using W&B throughout.

Your feedback is very valuable to me as I continue this tutorial. Stay tuned, and see you soon. Bye-bye!

Exploring DataGemma: An Overview

Halil İbrahim Hatun — Fri, 13 Sep 2024 13:55:12 GMT

Figure 1. DataGemma Logo

Despite advances in large language models (LLMs), AI hallucinations remain a significant challenge. On September 12, 2024, Google made a major stride by releasing DataGemma as open source. DataGemma grounds LLM responses in real-world statistical data from Data Commons to tackle these hallucinations, reflecting Google’s commitment to addressing this issue. In this blog, I will provide an overview of DataGemma and explore the two distinct approaches it uses to improve LLM accuracy and reasoning.

What Is Data Commons?

Google’s Data Commons project serves as a vast repository of public data, designed to streamline the access and use of important global statistics. It consolidates information from a wide range of trusted sources, including the United Nations, government agencies, environmental organizations, and universities. With over 250 billion data points and more than 2.5 trillion triples, it represents a significant open-source initiative aimed at making global data more accessible and useful.

Data Commons features two notable innovations. First, its team has spent years curating diverse public datasets, understanding their underlying assumptions, and organizing them with Schema.org, a shared vocabulary for structured data. The result is a comprehensive Knowledge Graph that integrates data from many sources.

Second, Data Commons includes a natural language interface powered by LLMs. Users can ask questions in everyday language, and the LLM translates them into Data Commons queries. This interface lets users explore charts and graphs without the LLM altering or fabricating the underlying data.

Interfacing LLMs with Data Commons

Two different approaches have been described for interfacing LLMs with Data Commons.

Figure 2. Comparison of Baseline, RIG, and RAG approaches for generating responses with statistical data

The first approach, Retrieval Interleaved Generation (RIG), fine-tunes the LLM to generate natural language queries for Data Commons alongside the statistics it produces, so that those statistics can be checked. A multi-step pipeline then converts these queries into structured data queries. The authors compare this approach against the base models, Gemma 7B IT and 27B IT.

The second approach, Retrieval Augmented Generation (RAG), follows a more classic retrieval method. It extracts variables from the query, retrieves the relevant data, and adds it as context to the original question. An LLM (Gemini 1.5 Pro) then produces the answer, which the authors compare against the baseline results.

Retrieval Interleaved Generation (RIG)

Retrieval Interleaved Generation (RIG) is a three-step process designed to enhance the accuracy and reliability of language model responses. First, a fine-tuned model generates natural language queries for Data Commons. Next, a post-processor converts these queries into structured data formats. Finally, the system retrieves statistical answers from Data Commons and presents them alongside the original LLM-generated results.

https://medium.com/media/87680ebcc96ea4b1fe1bed4e23136c61/href

In this process, when the LLM provides a numerical answer, it is matched with the most relevant value from the Data Commons database, known as the Data Commons Statistical Value (DC-SV). The original output from the LLM is referred to as the LLM Statistical Value (LLM-SV). Instead of generating formal queries like SQL, the LLM is fine-tuned to produce natural language queries. This method is more efficient given the vast array of variables in Data Commons and helps maintain the natural and fluent quality of the model’s responses.

Query Conversion

In their pipeline, natural language queries generated by the LLM are transformed into structured queries for the Data Commons database. Despite the extensive range of variables and properties in Data Commons, most queries can be categorized into a few types, which streamlines the extraction process. Each query is broken down into key components: statistical variables or topics, places, and attributes. Specific NLP techniques are applied to these components: semantic search for variables, named entity recognition for places, and regex-based heuristics for attributes.

The identified components are then mapped to predefined query templates, as illustrated in the table below:

Figure 4. Predefined query templates for RIG approach

Structured queries are generated based on these templates and submitted to the Data Commons API. The response, typically a numeric value, is presented alongside the original LLM-generated statistic, facilitating verification of the LLM’s output. Future developments will explore various presentation methods for these results, including side-by-side comparisons and highlighted differences.
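To illustrate the idea (this is not DataGemma’s actual code), here is a toy sketch of the conversion step. Everything in it is hypothetical: a keyword table stands in for semantic search over statistical variables, a small place list stands in for named entity recognition, a regex pulls out a year as the attribute, and the template names and identifiers are invented rather than taken from the Data Commons API or the paper’s template table.

```python
import re

# Hypothetical stand-ins for the real components (all identifiers are made up)
VARIABLES = {"population": "var:population", "life expectancy": "var:life_expectancy"}  # ~ semantic search
PLACES = {"california": "place:california", "turkey": "place:turkey"}                   # ~ named entity recognition
YEAR_PATTERN = re.compile(r"\b(19|20)\d{2}\b")                                            # ~ regex attribute heuristic

def convert_query(nl_query):
    """Break a natural-language query into variable, place and attribute, then fill a template."""
    text = nl_query.lower()
    variable = next((v for k, v in VARIABLES.items() if k in text), None)
    place = next((p for k, p in PLACES.items() if k in text), None)
    year_match = YEAR_PATTERN.search(text)
    if variable is None or place is None:
        return None  # no template fits, so the LLM's statistic stays unverified
    if year_match:
        return {"template": "value_in_year", "variable": variable, "place": place,
                "year": int(year_match.group(0))}
    return {"template": "latest_value", "variable": variable, "place": place}

print(convert_query("What was the population of California in 2020?"))
```

In the real pipeline, the filled template would become a Data Commons API call, and the returned value (DC-SV) would be shown next to the LLM’s own number (LLM-SV).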

Retrieval Augmented Generation (RAG)

In the RAG pipeline, the process begins with a fine-tuned LLM managing the user’s query. This model generates relevant queries for Data Commons, which are used to retrieve pertinent tables from the Data Commons interface. Finally, a long-context LLM, such as Gemini 1.5 Pro, generates a response based on both the original query and the retrieved tables.

https://medium.com/media/f1091fab536e11ccbdac04fa898ffd91/href

Extracting Data Commons Queries

An LLM is fine-tuned to transform user queries into Data Commons queries, with Gemini 1.5 Pro used to generate training queries in specific formats. Although effectiveness can be limited by data availability, this approach generally gives better results than alternatives that use a full variable list.

Retrieving Tables

Queries are processed using the RIG framework to identify variables and map them to Data Commons APIs. These APIs return relevant tables, such as life expectancy by country, which are used for generating responses.

Prompting

Once the tables are retrieved, a prompt is created combining the original query and serialized table data. This prompt is then processed by long-context LLMs like Gemini 1.5 Pro to generate and return a comprehensive response.
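The prompting step can be sketched as plain string assembly. The serialization format, the instruction wording, and the sample table below are all hypothetical, since the exact prompt DataGemma uses is not described here; the places and numbers are placeholders, not real statistics.

```python
def serialize_table(title, header, rows):
    """Render a retrieved table as pipe-separated text that an LLM can read."""
    lines = [title, " | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)

def build_rag_prompt(user_query, tables):
    """Combine the user's question with the serialized tables into one prompt."""
    context = "\n\n".join(serialize_table(*t) for t in tables)
    return (
        "Answer the question using only the data tables below.\n\n"
        f"{context}\n\n"
        f"Question: {user_query}\nAnswer:"
    )

# Placeholder table: invented places and numbers, for illustration only
tables = [(
    "Life expectancy at birth (years)",
    ["Place", "Year", "Value"],
    [["Country A", 2020, 78.1], ["Country B", 2020, 71.4]],
)]
prompt = build_rag_prompt("Which country has the higher life expectancy?", tables)
print(prompt)
# The prompt would then be sent to a long-context model such as Gemini 1.5 Pro.
```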

Conclusion

I hope this information proves helpful. I plan to continue exploring DataGemma, focusing on evaluating RIG and RAG approaches and their implementation in code.

Stay tuned for the next updates on this topic. Take care!

Cultivating Hospitality Excellence: Transformative Applications for Efficient Staff Management in…

Halil İbrahim Hatun — Fri, 12 Apr 2024 10:24:30 GMT

Cultivating Hospitality Excellence: Transformative Applications for Efficient Staff Management in the Hotel Industry

Image 1 — Driving Hotel Technology

Today, I’d like to discuss optimizing staff management to enhance hospitality in hotels. It’s important to note that the information I’ll be sharing reflects my opinion or is sourced from various references.

General-Purpose Apps

In the vast realm of mobile apps, numerous solutions address this question. I’ll list some of them and offer insights for a better understanding.

1. İci4Staff

Image 2 — İci4Stuff

Let’s delve into İcibot’s StaffApp, named “İci4Staff” [1]. Aligned with the common goal of enhancing hospitality, its primary aim is to offer swift support via mobile devices. Should a customer encounter any inquiries or issues, they can effortlessly relay them through the mobile app. These matters are then promptly assigned to staff members, who can conveniently access them on their mobile devices.

2. Smart Parking

Image 3 — Smart Parking

Let’s further explore the concept of smart parking [2]. As you’re aware, even within hotels, securing parking spots can be a significant bottleneck. To address this, computer vision technology can guide staff to the optimal parking areas, ensuring efficient space utilization.

3. Webee

Image 4 — Webee

Webee is one of the most widely used hotel management apps in Türkiye [3], serving a purpose similar to İcibot.

CRM and Hospitality

The apps I mentioned earlier primarily focus on the broader aspects of hotel management. Now, let’s delve into other applications specifically dedicated to CRM, guest optimization, and hospitality.

1. BeonX

Image 5 — Dashboard page of BeonX

One notable example is BeonX [4], which prioritizes sustainable profitability. They employ AI-powered strategies such as Revenue Optimization, Hyper Segmentation, Real-Time Data and Automation, Demand Forecasting, and more to optimize hotel costs.

2. Bookboost

Image 6 — Dashboard page of Bookboost

Bookboost is a multi-channel CRM system [5]. Its goals include increasing revenue, refining marketing operations, and boosting customer loyalty by building individual customer profiles and improving conversation campaigns.

Personal Opinion

In my view, CRM stands out as the most potent method for optimizing revenues through personalized service and prompt feedback. Consequently, there’s a need to enhance data collection methods to gain deeper insights into customers’ behaviors.

Conclusion

In summary, hospitality is paramount in the hotel industry, and there are numerous approaches to address it. The key lies in our intent and execution. I aimed to convey essential information concisely. I hope this brief blog proved helpful to you.

Take care, and stay tuned for more in upcoming blogs.

References

Taking a Broad View of the Israel-Hamas Conflict

Halil İbrahim Hatun — Fri, 20 Oct 2023 09:33:25 GMT

As many are aware, the conflict between Israel and Hamas has reignited once again. I’ve delved into research and analysis on this critical issue. Today, my aim is to discuss these matters. We’ll navigate through the complexities, aiming for a clear and balanced perspective.

Alright, let’s delve into the intricate history of the Palestine-Israel relationship. It’s a subject that demands our attention and consideration.

History of Israel-Palestine Relations: A Swift Overview

The relationship between Israel and Palestine is a complex historical narrative. It begins with ancient Hebrew settlements around 1300 BCE, followed by various empires’ rule, including the Ottomans from the 16th century to World War I. After WWI, British control was established, leading to complications.

Zionism, a movement for a Jewish homeland, gained traction in the late 19th century, backed by the Balfour Declaration in 1917. The UN proposed a partition plan in 1947, resulting in Israel’s declaration of independence in 1948 and a subsequent war.

The conflict led to the Nakba, displacing many Palestinians. The aftermath of this war continues to influence the region’s politics. The Six-Day War in 1967 further complicated matters, leading to Israeli occupation of the West Bank, Gaza Strip, and East Jerusalem.

Though there have been attempts at peace, such as the Oslo Accords in the 1990s, finding a lasting solution remains challenging due to deep-rooted historical, religious, and political differences. Understanding this complex history is crucial for meaningful discussions about the region’s present and future.

A Deeper Dive into the Israel-Palestine Conflict

1. Israeli Side

Support for Israel is prevalent, particularly in the United States. The majority of the American government has historically aligned itself with Israel. However, it’s worth noting that many young Americans, especially teenagers, might not possess in-depth knowledge about the intricacies of the conflict. Often, they rely on what they’ve been taught or what they hear in passing.

Interestingly, within this pro-Israel camp, there are cases of Palestinians who also lend their support to Israel and advocate for its cause. One notable example is Ali Wahab, an Arab Muslim who has publicly expressed this viewpoint. Let’s look at what Ali Wahab said:

As we embark on this journey of comprehension, let’s start with a concise overview. In the forthcoming blogs, we’ll plunge into the intricacies. One term that sparks curiosity in Ali Wahab’s discourse is “apartheid.” It raises an important question: can Israel be labeled as an apartheid state? To tackle this, we’ll dissect diverse viewpoints and meticulously assess each stance.

Envisioned opinion

Those who have experienced Israel firsthand, whether through residency or visits, often contest the label of “apartheid state” attributed to it. They argue that Israel, as a nation, upholds a policy of equal rights for all its citizens, irrespective of their race, religion, ethnicity, gender, or sexual orientation. This perspective insists that any notion of Israel as an apartheid state is a gross mischaracterization.

Critics, however, interpret these claims differently. They assert that anti-Israel sentiments are rooted in a desire for the dismantling of Israel itself. To them, branding Israel as an apartheid state signifies the separation between Israelis and Palestinians, and is seen as a strategic move toward eradicating Israel in favor of a unified Palestine. This would entail the return of descendants of refugees and a shift in the demographic majority towards Muslims.

Within Israel, there exists a significant Arab population who hold citizenship, affording them equal rights. Notably, these Arab citizens are not subject to compulsory military service. It is important to note that, due to the absence of a civil marriage law, individuals of different religions cannot legally marry within Israel. Instead, marriage laws are rooted in the Ottoman era. However, marriages conducted abroad under civil law are recognized within the country.

Waking up from an envisioned opinion (fable)

That really depends on how you want to define apartheid.

The frustrating thing about the apartheid debate is that the two sides argue completely different points.

Those who want to paint Israel as the oppressor point to the West Bank. Those who want to point to Israel as “the only democracy in the Middle East” point to Palestinian citizens of Israel.

It’s true that Palestinian Israelis can, and do, serve as lawyers, doctors, teachers, policemen, politicians and judges. This is not to say that there is no discrimination in Israel, because of course there is, but to call this apartheid is just silly. Discrimination is not apartheid.

So, the more relevant issue is that of the West Bank.

The West Bank is split into 3 (non-contiguous) areas, A, B and C, set out in the Oslo Accords.

Area A is under complete Palestinian administration, and comprises the major Palestinian urban centres. It makes up about 18% of the West Bank in area.

Area B is under Palestinian civil authority and Israeli military authority, and makes up about 22% of the West Bank.

Neither Area A nor Area B has Jewish residents, and therefore we cannot really talk about apartheid there.

Area C is where the issue really lies. Some 60% of the West Bank, populated both by Israeli Jews and by Palestinians. This area is under full Israeli control.

There is no arguing the fact that Israeli and Palestinian residents of Area C are not equal.

Israeli Jews are full and equal Israeli citizens, beholden to, and served by, the legal systems of Israel.

Palestinians are not Israeli citizens, and are beholden to Israeli military courts. They do not generally have recourse to the Israeli civil courts (and indeed, just this week a law was passed that further restricts Palestinian access to the High Court in land disputes). They are policed by the IDF, rather than by police forces, which in general means that they do not benefit from police protection, but are only restricted by it.

This is the basis of claims of apartheid — you have Jews and Arabs living side by side, under different legal jurisdictions.

Is this apartheid? As I said, it depends on how you want to define apartheid. You can definitely define it broadly, in a way which seems to include this situation — there is indeed a legal distinction between two populations, which plays out along racial/ethnic lines.

The counter-argument to this is, of course, that it is not a racial or ethnic distinction but a perfectly normal distinction between citizens and non-citizens. In every country, non-citizens are not equal to citizens. Palestinian citizens of Israel who go to Area C are still full citizens.

The counter-counter-argument could claim that this is something of a facetious claim, in that citizenship was never granted to the Palestinian residents of the West Bank, whereas the Jewish residents have citizenship based on their ethnicity, and indeed any Jew in the entire world could come tomorrow and receive citizenship. If citizenship is ethnically biased, then a distinction based on citizenship is ethnically biased too.

And there are counter-counter-counter arguments, and counter-counter-counter-counter arguments. It goes on forever.

(If you’re keen on delving deeper into this topic, I highly recommend watching this video. It provides valuable insights and a nuanced perspective on the matter.)

2. Hamas Side

What is Hamas? Alright, so, Hamas is a Palestinian political and military organization. They’ve been a significant player in the Israeli-Palestinian conflict for quite some time now. They emerged in the late 1980s and have since gained considerable support, especially in the Gaza Strip.

Now, when it comes to their actions against Israel, things get pretty intense. Hamas has been involved in a number of conflicts and acts of violence. They’ve launched rockets and carried out suicide bombings, which have resulted in both civilian and military casualties on the Israeli side.

These actions have led to a lot of tension and responses from Israel, including military operations in the Gaza Strip. It’s safe to say that Hamas and Israel have a long history of conflict, and their interactions continue to shape the dynamics of the region.

What is Hamas’ authority in Gaza? Hamas isn’t a political party, it’s not running a state and it’s not a religious cult either. Hamas is an organized crime ring, anchored in Islamic fundamentalism, that is also performing the duties of a state in Gaza. In other words, it is an unholy alliance between mafia and religious wackos. As such, it has complete control over the distribution of goods in Gaza. Those who do not work for Hamas live in abject squalor; being a member is basically a prerequisite for holding a job, with very few exceptions. Those who oppose Hamas die a violent death. The result is that most people either work for or with Hamas, and no one is willing to oppose them: it’s too dangerous, and spies are everywhere.

Conclusion

The human toll in this conflict is immense, with lives lost on both sides. What’s even more disheartening is the passive stance taken by many powerful nations, who either watch this tragedy unfold or make hollow and insensitive statements. It’s a painful truth that nothing in this world is as precious as the smile of a child, and yet, the innocence of many is being ruthlessly extinguished. This is neither a matter for negotiation nor a case for defense.

Our hearts ache for the countless innocent lives lost in this conflict, and we yearn for a sense of peace and reprieve for all those affected, regardless of their background or affiliation.

In my forthcoming blogs, I’ll delve deeper into the political complexities of this conflict, examining the various perspectives and avenues for resolution. I remain open to all responses and negotiations, and I fervently wish for a swift end to this ongoing struggle. Until then, take care.

Now, here’s where it gets interesting. Within the Palestinian community, there were some who, surprisingly, supported Israel and advocated for its cause. Take, for example, Ali Wahap, an Arab Muslim who made his stance clear. This just goes to show that even in the midst of a heated conflict, there were individuals who saw things from a different perspective.

Graph Neural Networks (GNNs)

Halil İbrahim Hatun — Wed, 30 Aug 2023 10:51:33 GMT

GCN and GAT Python Implementation

In the world of artificial intelligence, where data is the lifeblood that fuels innovation, one particular type of data often challenges traditional machine learning techniques: data with inherent relationships and connections. This is where Graph Neural Networks (GNNs) step in, changing the way we process and understand data structured as graphs.

As a unique non-Euclidean data structure for machine learning, graph analysis focuses on tasks such as node classification, link prediction, and clustering. In a world that can be visualized as networks of entities and their interactions, be it social networks, molecular structures, recommendation systems, or citation networks, traditional machine learning algorithms often fall short. Conventional models are designed for independent and identically distributed data, struggling to capture the nuances of interconnectedness and dependencies that define these complex relationships.

Fig. 1. Left: image in Euclidean space. Right: graph in non-Euclidean space

Graph Neural Networks have emerged as a powerful approach and have attracted significant attention in recent years for their ability to extract insights from interconnected data. Traditional methods treat each data point in isolation. GNNs, by contrast, explicitly model the relationships between data points. This lets us analyze not only the individual points but also the hidden structures and patterns that lie beneath the surface.

In this blog, our focus will be on dissecting the fundamental operational architecture of Graph Neural Networks (GNNs), coupled with a practical exploration of their implementation using the Python programming language. Through this journey, we aim to demystify the inner workings of GNNs, providing you with a clear understanding of how these networks navigate and make sense of interconnected data. So, let’s embark on a guided tour of the core concepts behind GNNs, while also rolling up our sleeves for some hands-on Python coding to bring these concepts to life.

Getting Started

Let’s examine the Planetoid Cora dataset and apply Graph Neural Networks (GNNs) to it using PyTorch Geometric. This practical exploration will give us hands-on experience with real-world graph data.

Planetoid is PyTorch Geometric’s loader for three citation networks: Cora, CiteSeer, and PubMed. Here we use Cora. In Cora, each node is a document described by a 1433-dimensional bag-of-words vector, and each edge is a citation between two documents. The documents belong to 7 classes, and the task is to train a model that predicts the missing labels by exploiting this web of connections.

from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures

dataset = Planetoid(root='data/Planetoid', name='Cora', transform=NormalizeFeatures())

print(f'Dataset: {dataset}:')
print('======================')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')

data = dataset[0]
print(data)

Output:

Dataset: Cora():
======================
Number of graphs: 1
Number of features: 1433
Number of classes: 7
Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])

Before training, let’s analyze the data distribution by projecting it into two and three dimensions.

https://medium.com/media/a0d49fc5a32ad377c9f2b1c7d83e5cc7/href https://medium.com/media/e897df33cb20a4bd50588c47b8a49f5a/href

Just one more step to go. Now, it’s time to write the classes and methods that will be employed in the upcoming sections:

import torch
import plotly.graph_objects as go

# Note: the methods below read the global `data` object (the Cora graph loaded above).
class BuildModel():
    def __init__(self, model, lr=0.01):
        self.model = model
        # Adam with L2 weight decay
        self.optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=5e-4)
        self.criterion = torch.nn.CrossEntropyLoss()

    def single_train(self):
        # One full-batch gradient step, with the loss computed on the training nodes only
        self.model.train()
        self.optimizer.zero_grad()
        out = self.model(data.x, data.edge_index)
        loss = self.criterion(out[data.train_mask], data.y[data.train_mask])
        loss.backward()
        self.optimizer.step()
        return loss.item()

    @torch.no_grad()
    def test(self):
        # Accuracy on the test nodes (no gradients needed)
        self.model.eval()
        out = self.model(data.x, data.edge_index)
        pred = out.argmax(dim=1)
        test_correct = pred[data.test_mask] == data.y[data.test_mask]
        test_acc = int(test_correct.sum()) / int(data.test_mask.sum())
        return test_acc

    def train_with_early_stopping(self, epochs=150, patience=10, plot=False, plot_name=None):
        # Stops once test accuracy has not improved for `patience` consecutive epochs.
        # (Monitoring data.val_mask instead would keep the test set unseen during training.)
        history = {'epoch': [], 'loss': [], 'test_acc': []}

        best_test_acc = 0.0
        epochs_without_improvement = 0

        for epoch in range(1, epochs + 1):
            loss = self.single_train()
            test_acc = self.test()

            print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Test Acc: {test_acc:.4f}')

            history['epoch'].append(epoch)
            history['loss'].append(loss)
            history['test_acc'].append(test_acc)

            if test_acc > best_test_acc:
                best_test_acc = test_acc
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1

            if epochs_without_improvement >= patience:
                print(f'Early stopping triggered at epoch {epoch}.')
                break

        if plot:
            self.history_plot(history, plot_name)

        return history

    def train(self, epochs=100):
        # Train for a fixed number of epochs without early stopping
        for epoch in range(1, epochs + 1):
            loss = self.single_train()
            test_acc = self.test()

            print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Test Acc: {test_acc:.4f}')

    def history_plot(self, history, plot_name):
        # Training loss and test accuracy per epoch; plot_name is currently unused
        fig = go.Figure()
        fig.add_trace(go.Scatter(x=history['epoch'], y=history['loss'], mode='lines', name='Training Loss'))
        fig.add_trace(go.Scatter(x=history['epoch'], y=history['test_acc'], mode='lines', name='Test Accuracy'))
        fig.update_layout(
            title='Training History',
            xaxis_title='Epoch',
            yaxis_title='Value',
            legend=dict(x=0, y=1),
            template='plotly_dark'
        )
        fig.show()

Following that, we proceed to define our visualization methods:

from sklearn.manifold import TSNE
import plotly.express as px

def visualize_2d(h, color, name='2D_dist_plot'):
    # Project the node features/embeddings h to 2D with t-SNE (`name` is currently unused)
    z = TSNE(n_components=2).fit_transform(h.detach().cpu().numpy())

    fig = px.scatter(x=z[:, 0], y=z[:, 1], color=color, color_continuous_scale="magma")
    fig.update_layout(
        xaxis_title="Dimension 1",
        yaxis_title="Dimension 2",
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        coloraxis_showscale=False,
        width=800,
        height=800
    )
    fig.show()


def visualize_3d(h, color, name='3D_dist_plot'):
    # Same idea as visualize_2d, but projecting to 3D
    z = TSNE(n_components=3).fit_transform(h.detach().cpu().numpy())

    fig = px.scatter_3d(x=z[:, 0], y=z[:, 1], z=z[:, 2], color=color, color_continuous_scale="magma")
    fig.update_layout(
        scene=dict(
            xaxis_title="Dimension 1",
            yaxis_title="Dimension 2",
            zaxis_title="Dimension 3"
        ),
        coloraxis_showscale=False,
        width=800,
        height=800
    )
    fig.show()

Now let’s delve into the exciting world of Graph Neural Networks (GNNs) and explore how we can use them to train our dataset.

1. Graph Convolutional Network (GCN)

A Graph Convolutional Network (GCN) is a Graph Neural Network (GNN) variant tailored for processing graph-structured data. Unlike Convolutional Neural Networks (CNNs), which excel at grid-like data (such as images), GCNs specialize in datasets where entities are connected through edges, forming networks.

While CNNs leverage local patterns in grid data, GCNs harness the interconnectedness of graph data. They propagate and aggregate information across neighboring nodes, updating each node’s representation based on its neighbors’ features. This contextual understanding enables GCNs to capture relationships and patterns.

Fig. 2. Comparison between GCN and CNN

The notable shift in GCNs lies in adapting the convolutional operation for graphs. This operation computes weighted averages of neighboring node features, generating central node representations. As these layers stack, GCNs learn abstract features while considering the overall graph context.

In traditional neural networks, linear layers play a central role by applying a linear transformation to the input data. This transformation maps the input features, denoted x, to hidden vectors, denoted h, through a weight matrix 𝐖. Ignoring biases for now, we can write this process as follows:

Fig. 3. Linear relationship formula
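The formula itself only survives as a figure. One standard way to write it for a single node i is shown here, with x_i as the node's input features and the bias omitted as in the text:

```latex
h_i = \mathbf{W} x_i
```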

One way to enhance our node representations is by combining their features with those of their neighboring nodes. This process, known as convolution or neighborhood aggregation, involves incorporating information from the immediate neighborhood of a node, including the node itself (denoted as Ñ).

Fig. 4. Aggregated linear relationship formula
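Written the same way, the aggregated version sums the transformed features of every node j in Ñ_i, which is the neighborhood of node i including i itself. This is a standard formulation, and the figure's exact notation may differ:

```latex
h_i = \sum_{j \in \tilde{\mathcal{N}}_i} \mathbf{W} x_j
```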

In a GNN layer, a single weight matrix 𝐖 is shared across all nodes. However, pixels in a CNN sit in a fixed grid with a fixed number of neighbors, whereas nodes in a graph can have any number of neighbors. Handling this variable neighborhood size is a key aspect of how GNNs operate on graph-structured data.

How should we handle a situation in which one node is connected to only one neighbor while another has 700 connections? If we simply summed the feature vectors, the embedding h of the node with 700 neighbors would have far larger values than that of the node with a single neighbor. To keep value ranges comparable across all nodes, we can normalize the output by each node’s degree (the number of connections it has).

Fig. 5. Normalized and aggregated linear relationship formula
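One common form of this degree normalization divides the sum by deg(i). Here deg(i) is taken as the number of nodes in Ñ_i, so the self-loop counts. This turns the sum into a mean over the neighborhood:

```latex
h_i = \frac{1}{\deg(i)} \sum_{j \in \tilde{\mathcal{N}}_i} \mathbf{W} x_j
```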

The authors of the GCN paper (Kipf and Welling) observed that features from nodes with many neighbors propagate much more easily than features from relatively isolated nodes. To counterbalance this, they proposed giving larger weights to features from nodes with fewer neighbors, which balances the influence of nodes across the whole network. This process can be expressed as follows:

Fig. 6. Harmonized formula
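The harmonized rule corresponds to GCN's symmetric normalization. For node i, each neighbor j is weighted by 1/(sqrt(deg(i)) · sqrt(deg(j))). In matrix form this is H = D̃^(-1/2) Ã D̃^(-1/2) X W, where Ã = A + I. The minimal NumPy sketch below uses a small star graph in which the edges, features and weights are all made up. It checks that the matrix form and the per-edge formula give the same result, and it omits the bias and activation.

```python
import numpy as np

# Hypothetical undirected star graph: node 0 is a hub linked to nodes 1, 2 and 3.
edges = [(0, 1), (0, 2), (0, 3)]
num_nodes = 4

# Adjacency matrix with self-loops (A_tilde = A + I), so each node is in its own neighborhood.
A = np.zeros((num_nodes, num_nodes))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
A_tilde = A + np.eye(num_nodes)

# Degrees counted on A_tilde, i.e. including the self-loop.
deg = A_tilde.sum(axis=1)

# Symmetric normalization: entry (i, j) becomes 1 / (sqrt(deg(i)) * sqrt(deg(j))).
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

# Made-up input features (one row per node) and a shared weight matrix.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.5, 0.5]])
W = np.array([[1.0, -1.0],
              [0.5,  2.0]])

# One propagation step in matrix form (no bias, no activation).
H = A_hat @ X @ W

# The same step node by node, using the per-edge weights directly.
H_loop = np.zeros_like(H)
for i in range(num_nodes):
    for j in range(num_nodes):
        if A_tilde[i, j]:
            H_loop[i] += (X[j] @ W) / (np.sqrt(deg[i]) * np.sqrt(deg[j]))
```

The hub (node 0, with degree 4 including its self-loop) gives its own features a weight of 1/4, while each leaf gives its own features a weight of 1/2. In this way the normalization damps the influence of highly connected nodes.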

Let’s implement the concepts we’re discussing in Python using PyTorch for a deeper understanding.

First, let’s build the GCN model using PyTorch:

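import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, hidden_channels):
        super().__init__()
        torch.manual_seed(1234567)  # fixed seed for reproducible weight initialization
        # Two graph convolutions: input features -> hidden units -> one score per class
        self.conv1 = GCNConv(dataset.num_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)  # dropout is active only during training
        x = self.conv2(x, edge_index)
        return x  # raw logits; CrossEntropyLoss applies log-softmax internally

model = GCN(hidden_channels=16)
print(model)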

GCN(
  (conv1): GCNConv(1433, 16)
  (conv2): GCNConv(16, 7)
)

Once we have built our model, we can move on to training and visualizing it:

model = GCN(hidden_channels=16)

built_model = BuildModel(model)
history = built_model.train_with_early_stopping(plot=True, plot_name='gcn_history_plot')

Epoch: 001, Loss: 1.9463, Test Acc: 0.2700
Epoch: 002, Loss: 1.9409, Test Acc: 0.2910
Epoch: 003, Loss: 1.9343, Test Acc: 0.2910
Epoch: 004, Loss: 1.9275, Test Acc: 0.3210
Epoch: 005, Loss: 1.9181, Test Acc: 0.3630
Epoch: 006, Loss: 1.9086, Test Acc: 0.4120
Epoch: 007, Loss: 1.9015, Test Acc: 0.4010
Epoch: 008, Loss: 1.8933, Test Acc: 0.4020
Epoch: 009, Loss: 1.8808, Test Acc: 0.4180
Epoch: 010, Loss: 1.8685, Test Acc: 0.4470
Epoch: 011, Loss: 1.8598, Test Acc: 0.4680
Epoch: 012, Loss: 1.8482, Test Acc: 0.5180
Epoch: 013, Loss: 1.8290, Test Acc: 0.5440
Epoch: 014, Loss: 1.8233, Test Acc: 0.5720
Epoch: 015, Loss: 1.8057, Test Acc: 0.5910
Epoch: 016, Loss: 1.7966, Test Acc: 0.6080
Epoch: 017, Loss: 1.7825, Test Acc: 0.6300
Epoch: 018, Loss: 1.7617, Test Acc: 0.6450
Epoch: 019, Loss: 1.7491, Test Acc: 0.6520
Epoch: 020, Loss: 1.7310, Test Acc: 0.6560
Epoch: 021, Loss: 1.7147, Test Acc: 0.6570
Epoch: 022, Loss: 1.7056, Test Acc: 0.6640
Epoch: 023, Loss: 1.6954, Test Acc: 0.6770
Epoch: 024, Loss: 1.6697, Test Acc: 0.6950
Epoch: 025, Loss: 1.6538, Test Acc: 0.7140
Epoch: 026, Loss: 1.6312, Test Acc: 0.7150
Epoch: 027, Loss: 1.6161, Test Acc: 0.7170
Epoch: 028, Loss: 1.5899, Test Acc: 0.7230
Epoch: 029, Loss: 1.5711, Test Acc: 0.7220
Epoch: 030, Loss: 1.5576, Test Acc: 0.7210
Epoch: 031, Loss: 1.5393, Test Acc: 0.7280
Epoch: 032, Loss: 1.5137, Test Acc: 0.7370
Epoch: 033, Loss: 1.4948, Test Acc: 0.7380
Epoch: 034, Loss: 1.4913, Test Acc: 0.7430
Epoch: 035, Loss: 1.4698, Test Acc: 0.7510
Epoch: 036, Loss: 1.3998, Test Acc: 0.7570
Epoch: 037, Loss: 1.4041, Test Acc: 0.7600
Epoch: 038, Loss: 1.3761, Test Acc: 0.7640
Epoch: 039, Loss: 1.3631, Test Acc: 0.7700
Epoch: 040, Loss: 1.3258, Test Acc: 0.7800
Epoch: 041, Loss: 1.3030, Test Acc: 0.7810
Epoch: 042, Loss: 1.3119, Test Acc: 0.7760
Epoch: 043, Loss: 1.2519, Test Acc: 0.7760
Epoch: 044, Loss: 1.2530, Test Acc: 0.7790
Epoch: 045, Loss: 1.2492, Test Acc: 0.7800
Epoch: 046, Loss: 1.2205, Test Acc: 0.7790
Epoch: 047, Loss: 1.2037, Test Acc: 0.7850
Epoch: 048, Loss: 1.1571, Test Acc: 0.7900
Epoch: 049, Loss: 1.1700, Test Acc: 0.7920
Epoch: 050, Loss: 1.1296, Test Acc: 0.7940
Epoch: 051, Loss: 1.0860, Test Acc: 0.7930
Epoch: 052, Loss: 1.1080, Test Acc: 0.7910
Epoch: 053, Loss: 1.0564, Test Acc: 0.7930
Epoch: 054, Loss: 1.0157, Test Acc: 0.7930
Epoch: 055, Loss: 1.0362, Test Acc: 0.7920
Epoch: 056, Loss: 1.0328, Test Acc: 0.7980
Epoch: 057, Loss: 1.0058, Test Acc: 0.8000
Epoch: 058, Loss: 0.9865, Test Acc: 0.7970
Epoch: 059, Loss: 0.9667, Test Acc: 0.8010
Epoch: 060, Loss: 0.9741, Test Acc: 0.8000
Epoch: 061, Loss: 0.9769, Test Acc: 0.8030
Epoch: 062, Loss: 0.9122, Test Acc: 0.8040
Epoch: 063, Loss: 0.8993, Test Acc: 0.8050
Epoch: 064, Loss: 0.8769, Test Acc: 0.8050
Epoch: 065, Loss: 0.8575, Test Acc: 0.8060
Epoch: 066, Loss: 0.8897, Test Acc: 0.8030
Epoch: 067, Loss: 0.8312, Test Acc: 0.8060
Epoch: 068, Loss: 0.8262, Test Acc: 0.8030
Epoch: 069, Loss: 0.8511, Test Acc: 0.8070
Epoch: 070, Loss: 0.7711, Test Acc: 0.8070
Epoch: 071, Loss: 0.8012, Test Acc: 0.8080
Epoch: 072, Loss: 0.7529, Test Acc: 0.8080
Epoch: 073, Loss: 0.7525, Test Acc: 0.8070
Epoch: 074, Loss: 0.7689, Test Acc: 0.8110
Epoch: 075, Loss: 0.7553, Test Acc: 0.8140
Epoch: 076, Loss: 0.7032, Test Acc: 0.8120
Epoch: 077, Loss: 0.7326, Test Acc: 0.8110
Epoch: 078, Loss: 0.7122, Test Acc: 0.8120
Epoch: 079, Loss: 0.7090, Test Acc: 0.8110
Epoch: 080, Loss: 0.6755, Test Acc: 0.8130
Epoch: 081, Loss: 0.6666, Test Acc: 0.8070
Epoch: 082, Loss: 0.6679, Test Acc: 0.8080
Epoch: 083, Loss: 0.7037, Test Acc: 0.8100
Epoch: 084, Loss: 0.6752, Test Acc: 0.8070
Epoch: 085, Loss: 0.6266, Test Acc: 0.8100
Early stopping triggered at epoch 85.

https://medium.com/media/ae0275e81de953082e24d5fe530cf132/href https://medium.com/media/7da5abb38246270e1ea4ef87fbd79af2/href https://medium.com/media/8529447c49b09194d8463b55b6b1e110/href

2. Graph Attention Networks (GAT)

GAT stands for Graph Attention Network. It is a type of Graph Neural Network (GNN) that has gained significant attention because it models relationships within graph-structured data effectively. GAT was introduced by Veličković et al. in their 2018 paper “Graph Attention Networks.”

GAT addresses a key challenge in GNNs: how to aggregate information from a node’s neighbors while giving different neighbors different levels of importance. Graph Convolutional Networks (GCNs) use fixed aggregation weights that depend only on the graph structure (the node degrees), not on the neighbors’ features. GAT instead introduces an attention mechanism into the aggregation process, which lets each node learn how much weight to give each neighbor’s information.

GAT Layer

In this step, a shared linear transformation is applied to every node. It is represented by the weight matrix W with dimensions (F’, F), and it is called “shared” because every node is transformed by the same W.

Its job is to map the node features from their original dimension F to a new dimension F’. The transformation is applied to all nodes in node i’s neighborhood, including node i itself.

Fig. 7. Element-wise scalar multiplication formula
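One common way to write this shared transformation uses z_i for the transformed features of node i:

```latex
z_i = \mathbf{W} h_i, \qquad \mathbf{W} \in \mathbb{R}^{F' \times F},\; h_i \in \mathbb{R}^{F},\; z_i \in \mathbb{R}^{F'}
```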

During this process, the transformed embedding of the target node i is paired with the embedding of each of its immediate neighbors j. Each pair is concatenated into a vector of length 2F’, and the attention mechanism multiplies it by a shared, learnable weight vector a (also of length 2F’). This produces a single scalar score for the pair.

The aim is for the model to learn how much attention each pair of nodes deserves, based on their features alone. The scoring function itself does not depend on the graph structure. The graph only determines which pairs are scored, namely node i and each of its neighbors.

Fig. 8. Interaction formula
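In the GAT paper, the unnormalized attention score between node i and a neighbor j uses a shared weight vector a of length 2F’. In the formula, ‖ denotes concatenation:

```latex
e_{ij} = \mathrm{LeakyReLU}\left( \mathbf{a}^{\top} \left[ \mathbf{W} h_i \,\Vert\, \mathbf{W} h_j \right] \right), \qquad \mathbf{a} \in \mathbb{R}^{2F'}
```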

Each of these intermediate attention scores is then passed through a non-linear activation, denoted σ. In the GAT paper, the authors use LeakyReLU (with a negative slope of 0.2) as this activation.

Finally, the scores pass through a softmax over the neighbors of node i. This turns them into attention coefficients that are non-negative and sum to 1, like a probability distribution.

In essence, this phase centers on the normalization of attention coefficients, aligning them for further processing.

Fig. 9. GAT normalized formula 1

Fig. 10. GAT normalized formula 2
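The three steps can be made concrete with a minimal NumPy sketch of one attention head on a dense adjacency matrix. The sketch does the following:

1. It computes z_i = W h_i.
2. It computes the scores e_ij = LeakyReLU(aᵀ[z_i ‖ z_j]).
3. It masks out non-neighbors.
4. It normalizes with a softmax over each node's neighborhood, α_ij = exp(e_ij) / Σ_k exp(e_ik).

The graph, features and parameters are random placeholders, and `gat_head` is a hypothetical helper. The final nonlinearity that the paper applies to the output is left out. Library implementations such as PyTorch Geometric's `GATConv` work on sparse edge lists rather than dense matrices.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # LeakyReLU with the negative slope of 0.2 used in the GAT paper
    return np.where(x > 0, x, slope * x)

def gat_head(H, A, W, a):
    """One attention head.

    H: (N, F) node features, A: (N, N) 0/1 adjacency (self-loops are added here),
    W: (F, F_out) shared weights, a: (2 * F_out,) attention weight vector.
    """
    N = H.shape[0]
    Z = H @ W                               # z_i = W h_i for every node
    F_out = Z.shape[1]
    # a^T [z_i || z_j] splits into (first half of a) . z_i + (second half of a) . z_j
    src = Z @ a[:F_out]
    dst = Z @ a[F_out:]
    E = leaky_relu(src[:, None] + dst[None, :])
    # Masked attention: only j in the neighborhood of i (including i itself) is scored
    mask = (A + np.eye(N)) > 0
    E = np.where(mask, E, -np.inf)
    # Row-wise softmax, i.e. over the neighbors j of each node i
    E = E - E.max(axis=1, keepdims=True)
    alpha = np.exp(E)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    # New representation: attention-weighted sum of the transformed neighbor features
    return alpha @ Z, alpha

# Hypothetical 4-node graph with random features and parameters
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
a = rng.normal(size=4)

H_new, alpha = gat_head(H, A, W, a)
```

Each row of `alpha` sums to 1, and node 1 assigns zero attention to nodes 2 and 3 because they are not its neighbors.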

The attention mechanism utilized by our model is defined by a parametric weight vector. This weight vector is associated with a LeakyReLU activation function, contributing to the overall functionality of the mechanism.

Consider an illustrative example of multihead attention with the scenario where K equals 3 heads. In this case, we focus on node 1 within its local neighborhood. Distinct styles and colors of arrows symbolize separate computations of attention, each operating independently. The outcomes from each head’s attention calculation are then combined by means of concatenation or averaging. This fusion of features results in the final representation denoted as h1'.
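How the K head outputs are merged determines the output size of the layer. The hypothetical helper `combine_heads` below contrasts the two options for K = 3 heads. Concatenation multiplies the feature size by K, while averaging keeps it unchanged. In PyTorch Geometric this choice corresponds to the `concat` argument of `GATConv`.

```python
import numpy as np

def combine_heads(head_outputs, mode="concat"):
    """Merge the outputs of K attention heads, each of shape (N, F_out).

    'concat' -> (N, K * F_out), typical for hidden layers
    'mean'   -> (N, F_out), which the GAT paper uses when the final layer is multi-head
    """
    if mode == "concat":
        return np.concatenate(head_outputs, axis=1)
    if mode == "mean":
        return np.mean(np.stack(head_outputs, axis=0), axis=0)
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical outputs of K = 3 heads for N = 5 nodes with F_out = 4 features each
rng = np.random.default_rng(0)
heads = [rng.normal(size=(5, 4)) for _ in range(3)]

print(combine_heads(heads, "concat").shape)  # (5, 12)
print(combine_heads(heads, "mean").shape)    # (5, 4)
```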

Fig. 11. GAT Architecture 1

Put differently, the overall architecture of a Graph Attention Network (GAT) can be summarized as follows:

Fig. 12. GAT Architecture 2

Let’s put these concepts we discussed into practice using PyTorch in Python to develop a deeper understanding.

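import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(self, hidden_channels, heads):
        super().__init__()
        torch.manual_seed(1234567)
        # Hidden layer: `heads` attention heads whose outputs are concatenated
        # (heads * hidden_channels features per node)
        self.conv1 = GATConv(dataset.num_features, hidden_channels, heads=heads)
        # Output layer: heads are averaged (concat=False) so each node gets exactly
        # num_classes scores, as the GAT paper recommends for a multi-head final layer
        self.conv2 = GATConv(heads * hidden_channels, dataset.num_classes, heads=heads, concat=False)

    def forward(self, x, edge_index):
        x = F.dropout(x, p=0.6, training=self.training)
        x = self.conv1(x, edge_index)
        x = F.elu(x)
        x = F.dropout(x, p=0.6, training=self.training)
        x = self.conv2(x, edge_index)
        return x

model = GAT(hidden_channels=8, heads=8)
print(model)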

GAT(
  (conv1): GATConv(1433, 8, heads=8)
  (conv2): GATConv(64, 7, heads=8)
)

Now, we should proceed with training and also visualize our GAT model:

build_gat_model = BuildModel(model, lr=0.05)
history = build_gat_model.train_with_early_stopping(plot=True, plot_name='gat_history_plot')

Epoch: 001, Loss: 0.6564, Test Acc: 0.7780
Epoch: 002, Loss: 0.6479, Test Acc: 0.7780
Epoch: 003, Loss: 0.6208, Test Acc: 0.8100
Epoch: 004, Loss: 0.5841, Test Acc: 0.8180
Epoch: 005, Loss: 0.5878, Test Acc: 0.8170
Epoch: 006, Loss: 0.5477, Test Acc: 0.8060
Epoch: 007, Loss: 0.4723, Test Acc: 0.7950
Epoch: 008, Loss: 0.4452, Test Acc: 0.8000
Epoch: 009, Loss: 0.4338, Test Acc: 0.8070
Epoch: 010, Loss: 0.4332, Test Acc: 0.8100
Epoch: 011, Loss: 0.4218, Test Acc: 0.8150
Epoch: 012, Loss: 0.3900, Test Acc: 0.8150
Epoch: 013, Loss: 0.4190, Test Acc: 0.8160
Epoch: 014, Loss: 0.4238, Test Acc: 0.8080
Early stopping triggered at epoch 14.

https://medium.com/media/7a1d020343ba34dcb2e4563ec0fc39b8/href https://medium.com/media/98b5a6ba507b07c49b3e2622628d8fb9/href https://medium.com/media/5441327b43edc6a15c31e690129f578a/href

Feel free to click here to access the entire code.

That concludes my blog. I trust that you found it impactful. I’m open to both positive and negative feedback, so please don’t hesitate to share your thoughts. Your feedback is eagerly anticipated. Stay committed and dedicated.

References

https://www.datacamp.com/tutorial/comprehensive-introduction-graph-neural-networks-gnns-tutorial
https://www.youtube.com/watch?v=SnRfBfXwLuY
https://nabila-abraham.medium.com/ohmygraphs-graph-attention-networks-b7562289ae4b
https://towardsdatascience.com/graph-convolutional-networks-introduction-to-gnns-24b3f60d6c95
Veličković, P. et al. (2018). Graph Attention Networks. https://arxiv.org/pdf/1710.10903.pdf
Zhou, J. et al. (2020). Graph neural networks: A review of methods and applications. https://arxiv.org/ftp/arxiv/papers/1812/1812.08434.pdf

K-Means

Halil İbrahim Hatun — Thu, 22 Jun 2023 07:47:38 GMT

Understanding the basic K-means algorithm and applying it

Clustering

Clustering is a technique used in unsupervised machine learning to group similar data points together based on their inherent characteristics or similarities. It aims to find patterns or structures in the data without prior knowledge of the desired output or labels.

The goal of clustering is to divide a dataset into clusters or groups, where data points within the same cluster are more similar to each other than to those in other clusters. The similarity or dissimilarity between data points is typically measured using a distance or similarity metric, such as Euclidean distance or cosine similarity.

K-Means

K-means is a popular clustering algorithm that partitions a given dataset into k clusters. It is an iterative algorithm that alternates between two steps: assigning each data point to the nearest cluster centroid, and updating each centroid based on the data points assigned to it.

Here’s a high-level overview of the k-means algorithm:

1. Initialization:

Choose the value of k, the number of clusters.
Randomly initialize k centroids in the feature space.

Randomly initialized k centroids plot

2. Assignment Step:

For each data point, calculate the distance to each centroid. For example, use the Euclidean Distance metric.

Euclidean Distance Formula
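For a data point x with n features and a centroid μ_k, the Euclidean distance is:

```latex
d(\mathbf{x}, \boldsymbol{\mu}_k) = \sqrt{\sum_{j=1}^{n} \left( x_j - \mu_{k,j} \right)^2}
```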

Calculate distance plot

Assign the data point to the cluster with the nearest centroid.

Assign the nearest centroid plot

3. Move Cluster Centroids Step:

Recalculate the centroids of each cluster by taking the mean of all data points assigned to that cluster.

Mean formula
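The updated centroid of cluster C_k is the mean of the points currently assigned to it:

```latex
\boldsymbol{\mu}_k = \frac{1}{|C_k|} \sum_{\mathbf{x}_i \in C_k} \mathbf{x}_i
```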

K-means move cluster centroids plot

4. Iteration:

Repeat the assignment and move cluster centroids steps until convergence. Convergence occurs when either the centroids do not change significantly or a maximum number of iterations is reached.

K-means process

5. Output:

The final output of the k-means algorithm is a set of k clusters, where each data point is assigned to one of the clusters.

Final output plot

Here’s simplified pseudocode for the k-means algorithm:

https://medium.com/media/7d794aa62df730b42cb60415ee93575e/href

Here’s the Python implementation code for the k-means algorithm:

https://medium.com/media/301163daaaabc40b92de87a4b196ee3b/href
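For a self-contained version, here is a minimal NumPy sketch of the same loop (Lloyd’s algorithm). It is not the author’s code. As an assumption, it initializes the centroids by picking k random data points, which is one common variant of the random initialization described above. The function name and parameters are illustrative.

```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, seed=0):
    """Basic k-means on an (n_samples, n_features) array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # 1. Initialization: use k distinct random data points as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        # 2. Assignment: Euclidean distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Move centroids: mean of the points assigned to each cluster
        #    (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        # 4. Convergence: stop once the centroids barely move
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Final assignment against the final centroids
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

# Two well-separated groups of made-up points
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids, labels = kmeans(X, k=2)
```

On this toy data the sketch puts the first three points in one cluster and the last three in the other.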

How can we decide what the “k” (number of clusters) value is?

Some factors can challenge the efficacy of the final output of the K-means clustering algorithm, and one of them is finalizing the number of clusters (K). Selecting a lower number of clusters will result in underfitting while specifying a higher number of clusters can result in overfitting. Unfortunately, there is no definitive way to find the optimal number.

1. Elbow Method

It involves plotting the within-cluster sum of squares (WCSS) against the number of clusters and identifying the “elbow” point, which indicates the number of clusters where the rate of improvement in clustering quality starts to diminish significantly.

Elbow Method Visualization
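The sketch below shows one way to compute the elbow curve. scikit-learn’s `KMeans` exposes the WCSS of a fitted model as `inertia_`. For illustration only, the example runs on synthetic data from `make_blobs` with three groups, so the curve should bend at k = 3. The helper name `wcss_curve` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def wcss_curve(X, k_values, seed=0):
    """Within-cluster sum of squares (WCSS) of k-means for each k in k_values."""
    wcss = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        wcss.append(model.inertia_)  # inertia_ is scikit-learn's name for the WCSS
    return wcss

# Synthetic data with three well-separated groups
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)
k_values = list(range(1, 8))
wcss = wcss_curve(X, k_values)
```

Plotting `wcss` against `k_values` gives the elbow plot. The large drop from k = 2 to k = 3 is followed by much smaller drops.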

2. Silhouette method

It provides a measure of how well each data point fits into its assigned cluster. The method calculates a silhouette coefficient for each data point, which is a value between -1 and 1.

The silhouette coefficient measures the cohesion and separation of a data point within its cluster. A coefficient close to +1 indicates that the data point is well-matched to its own cluster and poorly matched to neighboring clusters. A coefficient close to 0 suggests that the data point is on or very close to the decision boundary between neighboring clusters. A coefficient close to -1 indicates that the data point may have been assigned to the wrong cluster.

Silhouette method notation

Silhouette Score Formula
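The silhouette coefficient is built from two quantities:

- a(i) is the mean distance from point i to the other points in its own cluster.
- b(i) is the smallest mean distance from point i to the points of any other cluster.

The coefficient is s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the silhouette score is the average of s(i) over all points. The NumPy sketch below computes this directly on a made-up toy dataset and compares the result with scikit-learn’s `silhouette_score`. Singleton clusters get s(i) = 0, which matches scikit-learn’s convention. The helper name is hypothetical.

```python
import numpy as np
from sklearn.metrics import silhouette_score

def silhouette_manual(X, labels):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise Euclidean distances
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                      # exclude the point itself
        if not same.any():                   # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = D[i, same].mean()                # cohesion: mean distance within own cluster
        b = min(D[i, labels == c].mean()     # separation: mean distance to nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Two tight, well-separated made-up clusters
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

manual = silhouette_manual(X, labels)
reference = silhouette_score(X, labels)
```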

3. Gap Statistic Method

It compares the observed within-cluster dispersion to the expected dispersion under a reference null distribution. The larger the gap between the observed and expected dispersions, the more distinct the clusters are considered to be.

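Gap(k) = E*[log(W_k)] - log(W_k), where W_k is the observed within-cluster dispersion for k clusters and E*[log(W_k)] is its expected value under the reference distribution (expected minus observed).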

The gap statistic method provides a quantitative measure for selecting the number of clusters by comparing the clustering results to a reference null distribution. It helps avoid overfitting or underfitting the data by providing a statistically guided approach to determining the appropriate number of clusters for a given dataset.

Gap Statistic Method Plot

Here’s the implementation code for the gap statistic method in Python:

https://medium.com/media/95282443b3f7bb53d60c49c78b0eac20/href
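Here is a self-contained sketch of the computation. It is not the author’s embedded code, and it rests on three assumptions:

- The observed dispersion W_k is taken as the k-means WCSS (`inertia_`).
- The reference data are drawn uniformly over the bounding box of X, which is one of the reference distributions proposed by Tibshirani, Walther and Hastie.
- k is chosen with their rule: the smallest k such that Gap(k) ≥ Gap(k+1) - s_(k+1).

The helper names and the synthetic `make_blobs` data are for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def gap_statistic(X, k_values, n_refs=10, seed=0):
    """Gap(k) = mean_b log(W*_kb) - log(W_k), with uniform reference data
    drawn over the bounding box of X. Returns (gaps, standard errors)."""
    rng = np.random.default_rng(seed)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    gaps, errs = [], []
    for k in k_values:
        # Observed dispersion: WCSS of k-means on the real data
        log_wk = np.log(KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_)
        # Expected dispersion: the same clustering on uniform reference samples
        ref_logs = []
        for _ in range(n_refs):
            ref = rng.uniform(mins, maxs, size=X.shape)
            ref_logs.append(np.log(KMeans(n_clusters=k, n_init=10, random_state=seed).fit(ref).inertia_))
        ref_logs = np.array(ref_logs)
        gaps.append(ref_logs.mean() - log_wk)
        # s_k = sd_k * sqrt(1 + 1/B), used by the selection rule below
        errs.append(ref_logs.std() * np.sqrt(1 + 1 / n_refs))
    return np.array(gaps), np.array(errs)

def choose_k(k_values, gaps, errs):
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1); falls back to the largest k tried."""
    for idx in range(len(k_values) - 1):
        if gaps[idx] >= gaps[idx + 1] - errs[idx + 1]:
            return k_values[idx]
    return k_values[-1]

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)
k_values = list(range(1, 7))
gaps, errs = gap_statistic(X, k_values, n_refs=5)
best_k = choose_k(k_values, gaps, errs)
```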

Customer Segmentation Example

Now, let’s perform a customer segmentation operation on the Mall customer dataset in Python.

Let’s take a look at our data.

df.head()

df.dtypes

As you can see, our data has three integer features (Age, Annual Income, and Spending Score) that we can use for clustering.

Let’s look at the distribution of these integer features.

https://medium.com/media/97ef497eda89af31dab302fcc8216c8b/href

Now, we need to choose the value of k (the number of clusters).

1. Elbow Method

If we use the elbow method, the plot of SSE values against the number of clusters looks like this:

https://medium.com/media/abcd54769d8a2687599abf1525d8c3ad/href

Looking at this graph, the elbow point is hard to pin down; it appears to be around 6. So, on its own, the elbow method does not clearly determine the number of clusters.

2. Silhouette Method

Let’s try the silhouette method.

https://medium.com/media/abcd54769d8a2687599abf1525d8c3ad/href

As you can see, the Silhouette method chose six clusters as well.

3. Gap Statistic Method

Finally, let’s try the gap statistic method.

https://medium.com/media/abcd54769d8a2687599abf1525d8c3ad/href

The gap statistic method also pointed to 6 clusters, with 10 as another candidate.

Therefore, let’s apply our k-means algorithm to six clusters and visualize the results.

https://medium.com/media/e41eeb313b131a1780039c88b1a246cd/href

Conclusion

In conclusion, k-means clustering is a powerful unsupervised machine learning algorithm that enables the grouping of data points into distinct clusters based on their similarity. By iteratively optimizing cluster centroids and assigning data points to the nearest centroid, k-means effectively partitions the data space. Its simplicity and efficiency make it a popular choice for various applications, such as customer segmentation, image compression, and anomaly detection. However, k-means has some limitations, such as sensitivity to initial centroid placement and dependence on the number of clusters (k) specified. Despite these challenges, understanding the principles and techniques behind k-means can greatly enhance our ability to extract meaningful insights from complex datasets and pave the way for more advanced clustering algorithms in the field of machine learning.

That’s all I’m going to say. Thank you for reading. If you want to look at the Customer Segmentation Notebook that the plots in this blog come from, you can find it here.

I am open to any comments, positive or negative. Don’t hesitate to leave yours.

Stay well.


Convolutional Neural Networks (CNNs)

Halil İbrahim Hatun — Fri, 16 Jun 2023 15:58:14 GMT

Understanding The Basic CNN Structure

What is an image?

An image is a grid of pixels, each of which stores a color value. In a color image, each pixel is a combination of three primary colors (red, green, and blue, stored in RGB or BGR order depending on the library) with different intensities.

Figure 1. RGB images

The image you see above has size [10, 5, 3] (x, y, number of color channels). If this image were grayscale, the number of channels would be 1, because each pixel would hold a single intensity value instead of three color values.

What is the Convolutional Neural Network (CNN)?

A CNN (Convolutional Neural Network) is a type of artificial neural network that is specifically designed for processing grid-like data, such as images or sequences. CNNs are widely used in computer vision tasks, including image classification, object detection, image segmentation, and more.

A CNN image classifier takes an input image, processes it, and classifies it into one of several categories (e.g., dog, cat, tiger, lion). A computer sees an input image as an array of pixels whose size depends on the image resolution.

CNNs are structured with layers of interconnected artificial neurons, including convolutional layers, pooling layers, and fully connected layers.

Figure 2. CNN Sample

The purpose of performing convolution is to extract features or patterns from input data. Convolution is a fundamental operation in various domains, including image processing, signal processing, and deep learning.

In image processing, convolution is used to apply filters or kernels to an image. These filters can enhance certain features of an image, such as edges or textures, or perform tasks like blurring or sharpening. By convolving an image with different filters, we can highlight specific characteristics and extract relevant information.

Let’s dive into the building blocks of a CNN.

Convolutional Layer

Convolution is performed by sliding the kernel over the input image. At each position, the kernel values are multiplied element-wise with the pixel values they overlap, and the products are summed into a single number. Repeating this for every position produces a new output image.

Here’s the formula for convolution:

Figure 3 Convolution demonstration

The example below shows convolution with a 3x3 kernel.

Figure 4. Convolution GIF

The output obtained after the convolution is called a “feature map”.

Figure 5. Convolution Neural Network GIF
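The sliding-window computation described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a single-channel image stored as nested lists and no padding. Like most deep learning libraries, it does not flip the kernel (strictly speaking, this is cross-correlation). The `stride` parameter is the step size discussed under Strides below; the image and kernel values are hypothetical.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D convolution as used in CNNs: slide, multiply, sum."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of the current window
            total = 0
            for m in range(kh):
                for n in range(kw):
                    total += image[r + m][c + n] * kernel[m][n]  # element-wise product, summed
            row.append(total)
        feature_map.append(row)
    return feature_map

# Hypothetical 5x5 image: dark on the left, bright on the right (a vertical edge)
image = [[0, 0, 10, 10, 10] for _ in range(5)]
# A simple vertical-edge kernel: responds where intensity increases from left to right
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

for row in conv2d(image, kernel):
    print(row)
```

The resulting 3x3 feature map has large values where the window covers the edge and zeros in the uniform region, which is exactly the kind of pattern a learned filter picks up.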

Convolving an image with different filters can perform operations such as edge detection, blurring, and sharpening. The example below shows the results of applying different types of filters (kernels).

Let’s examine the filtering processes on the picture of Tarkan Gözübüyük, the beloved bass guitarist of the Pentagram (Mezarkabul) band.

Figure 6. Filtering samples

Strides

Stride is the number of pixels by which the filter shifts over the input matrix at each step. When the stride is 1, we move the filter 1 pixel at a time. When the stride is 2, we move the filter 2 pixels at a time, and so on. The figure below shows how convolution works with a stride of 2.

Figure 7. Strides

Padding

Padding is a technique used to preserve the spatial dimensions of the input image after convolution operations on a feature map. Padding involves adding extra pixels around the border of the input feature map before convolution.

This can be done in two ways:

Valid Padding: With valid padding, no padding is added to the input feature map, so the output feature map is smaller than the input feature map. This is useful when we want to reduce the spatial dimensions of the feature maps.
Same Padding: With same padding, padding is added to the input feature map so that (with a stride of 1) the output feature map has the same size as the input feature map. This is useful when we want to preserve the spatial dimensions of the feature maps.

The number of pixels to add can be calculated from the kernel size and the desired output size of the feature map. The most common choice is zero padding, which adds zeros around the borders of the input feature map.

Padding helps reduce the loss of information at the borders of the input feature map and can improve the performance of the model. However, it also increases the computational cost of the convolution operation, because the padded input is larger.

Figure 8. Padding
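The relationship between input size, kernel size, padding, and stride can be written as output = floor((n + 2p − k) / s) + 1 along each axis, where n is the input size, k the kernel size, p the padding, and s the stride. The sketch below, with hypothetical helper names, applies zero padding and checks this formula for valid and same padding. The `same_padding` helper assumes a stride of 1 and an odd kernel size.

```python
def zero_pad(image, p):
    """Surround a 2D list with p rows and columns of zeros."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]  # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left and right borders
    padded.extend([0] * width for _ in range(p))  # bottom border
    return padded

def output_size(n, k, p=0, s=1):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def same_padding(k):
    """Padding that keeps the size unchanged (assumes stride 1 and odd k)."""
    return (k - 1) // 2

print(output_size(5, 3))                     # valid padding: 5x5 input -> 3x3 output
print(output_size(5, 3, p=same_padding(3)))  # same padding: 5x5 input -> 5x5 output
print(output_size(5, 3, p=1, s=2))           # padding 1 with stride 2
for row in zero_pad([[1, 2], [3, 4]], 1):
    print(row)
```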

Pooling

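Pooling in convolutional neural networks downsamples a feature map by summarizing each small window, for example with its maximum (max pooling) or its average (average pooling). This generalizes the features extracted by the convolutional filters and helps the network recognize features regardless of small shifts in their location in the image.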

Figure 9. Pooling
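As an illustration, here is a minimal max pooling sketch in plain Python, assuming a 2x2 window with a stride of 2 (a common default) and a hypothetical 4x4 feature map. Each non-overlapping 2x2 block is replaced by its largest value, halving the width and height.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Max pooling: keep the largest value in each size x size window."""
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + m][j + n] for m in range(size) for n in range(size)]
            row.append(max(window))
        out.append(row)
    return out

feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 0, 8]]
print(max_pool2d(feature_map))
```

Because only the maximum of each window survives, moving a strong activation by one pixel inside its window leaves the pooled output unchanged. This is where the robustness to small shifts comes from.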

Activation Functions

Activation functions are mathematical functions applied to the output of a neuron or a neural network layer to introduce non-linearity into the network. These functions determine the output of a neuron or a layer based on its weighted inputs and provide the capability for neural networks to model complex relationships between inputs and outputs.

Sigmoid: Maps any input to the range (0, 1). It is commonly used in the output layer of a CNN model for binary classification.

Figure 10. Sigmoid

tanh: The tanh function is very similar to the sigmoid function (it is a scaled and shifted sigmoid). The main difference is that it is symmetric around the origin, so its output ranges from -1 to 1.

Figure 11. tanh

Softmax: It is used in multinomial logistic regression and is often used as the last activation function of a neural network to normalize the output of a network to a probability distribution over predicted output classes.

Figure 12. Softmax

ReLU: Defined as max(0, x). Its main advantage over other activation functions is that it does not activate all the neurons at the same time: neurons whose input is negative output zero. It is also very cheap to compute.

Figure 13. ReLU
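The four activation functions above fit in a few lines of plain Python. The sketch below uses the standard definitions. The sigmoid is written in a numerically stable form, and softmax subtracts the maximum logit before exponentiating. Both are common implementation choices, not something the figures prescribe. The input values are hypothetical.

```python
import math

def sigmoid(x):
    """1 / (1 + e^-x), written to avoid overflow for large negative x."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)

def tanh(x):
    """Output in (-1, 1); equals 2 * sigmoid(2x) - 1."""
    return math.tanh(x)

def relu(x):
    """max(0, x): negative inputs are switched off."""
    return max(0.0, x)

def softmax(logits):
    """Turn a list of scores into a probability distribution."""
    m = max(logits)  # subtracting the max does not change the result but avoids overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0), tanh(0.5), relu(-3.0), relu(2.5))
print(softmax([2.0, 1.0, 0.1]))  # e.g. scores for three classes
```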

Flatten Layer

The flatten layer takes the multi-dimensional output of the previous layer (for example, a stack of feature maps) and reshapes it into a one-dimensional vector, collapsing all the dimensions except the batch dimension. The output of the flatten layer has the shape (batch_size, flattened_size), where flattened_size is the product of the remaining dimensions.

Figure 14. Flattening
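A minimal plain-Python sketch of flattening, assuming each sample in the batch is a nested list (for example, two 2x2 feature maps). Every sample is collapsed into one flat list while the batch dimension is kept, so a batch of shape (2, 2, 2, 2) becomes (2, 8). Deep learning frameworks do this with a reshape; the helper below is only illustrative.

```python
def flatten(batch):
    """Flatten each sample of a batch into a 1D list, keeping the batch dimension."""
    def flat(x):
        if isinstance(x, list):
            out = []
            for item in x:
                out.extend(flat(item))  # recurse into nested dimensions in order
            return out
        return [x]
    return [flat(sample) for sample in batch]

# Hypothetical batch of 2 samples, each with 2 feature maps of size 2x2
batch = [
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
    [[[0, 0], [0, 0]], [[0, 0], [0, 1]]],
]
flat_batch = flatten(batch)
print(len(flat_batch), len(flat_batch[0]))  # batch_size, flattened_size
```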

Fully Connected Layer

A fully connected layer, also known as a dense layer or a fully connected neural layer, is a type of layer in a neural network where each neuron or node is connected to every neuron in the previous layer. In a fully connected layer, all the outputs from the previous layer serve as inputs to each neuron in the current layer.

Figure 15. Fully Connected Layer
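In a fully connected layer, each output neuron computes a weighted sum of all inputs plus a bias and then applies an activation. The sketch below assumes weights stored as one row per output neuron; the weights, biases, and inputs are hypothetical numbers chosen for illustration, not trained values.

```python
def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: out[j] = activation(sum_i inputs[i] * weights[j][i] + biases[j])."""
    outputs = []
    for w_row, b in zip(weights, biases):
        if len(w_row) != len(inputs):
            raise ValueError("each weight row must have one weight per input")
        z = sum(x * w for x, w in zip(inputs, w_row)) + b  # every input feeds every neuron
        outputs.append(activation(z) if activation else z)
    return outputs

inputs = [1.0, 2.0, 3.0]           # e.g. the output of a flatten layer
weights = [[0.1, 0.2, 0.3],        # neuron 0
           [-0.5, 0.0, 0.5]]       # neuron 1
biases = [0.0, 1.0]
print(dense(inputs, weights, biases, activation=lambda z: max(0.0, z)))  # ReLU
```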

Conclusion

CNNs are largely translation invariant, meaning they can recognize patterns and objects at different locations in an image. This comes from the shared weights in the convolutional layers, which let the same filter detect a pattern at any position, together with pooling, which makes CNNs robust to small shifts and variations in the input data.
CNNs can capture spatial relationships and exploit local patterns effectively, enhancing their ability to learn intricate structures in images.

Thank you for reading. I hope this blog has been useful to you. I look forward to your feedback, positive or negative.

Stay well.


Operating System: Threads

Halil İbrahim Hatun — Thu, 18 May 2023 14:21:11 GMT

Hello, this is Halil Ibrahim. Today's topic is threads, one of the best parts of an operating system. Before I start, I want to share a quote for this post.

“The future belongs to those who believe in the beauty of their dreams.” Eleanor Roosevelt

And... let’s start.

In operating systems, a thread is the smallest unit of execution within a process. A process is an instance of a program that is being executed by the operating system, and it can have one or more threads. Each thread has its own program counter, stack, and register set, which allow it to execute code independently of other threads in the same process.

Threads are used to achieve concurrency within a process, allowing multiple tasks to make progress at the same time. By using threads, a program can perform multiple operations concurrently, such as listening for user input while processing data in the background. This improves the responsiveness of the program even on a single processor core, and on a multicore system the threads can also run in parallel to improve performance.

2. Single-threaded and multithreaded processes.

Let’s look at a concrete example. Almost everybody uses web servers, but have you ever thought about how a web server handles its clients?

A web server accepts client requests for web pages, images, sound, and so forth. A busy web server may have several (perhaps thousands of) clients concurrently accessing it. If the web server ran as a traditional single-threaded process, it would be able to service only one client at a time, and a client might have to wait a very long time for its request to be serviced.

One solution is to have the server run as a single process that accepts requests. When the server receives a request, it creates a separate process to service that request. In fact, this process-creation method was in common use before threads became popular. Process creation is time-consuming and resource intensive, however. If the new process will perform the same tasks as the existing process, why incur all that overhead? It is generally more efficient to use one process that contains multiple threads. If the web-server process is multithreaded, the server will create a separate thread that listens for client requests. When a request is made, rather than creating another process, the server creates a new thread to service the request and resumes listening for additional requests.

3. Multithreaded server architecture.
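The thread-per-request idea can be simulated without any networking. In the sketch below, `handle_request` is a hypothetical handler that pretends to do I/O with a short sleep. The "server" starts a new thread for every request and immediately moves on to the next one. Because CPython's global interpreter lock lets only one thread run Python code at a time, this pattern helps mainly with I/O-bound work such as serving network clients, which is the web-server case described above.

```python
import threading
import time

results = {}
results_lock = threading.Lock()  # threads share the process's memory, so guard shared data

def handle_request(request_id):
    """Hypothetical request handler: simulates I/O (e.g. reading a file) with a sleep."""
    time.sleep(0.1)
    with results_lock:
        results[request_id] = f"response to request {request_id}"

def serve(requests):
    workers = []
    for request_id in requests:
        t = threading.Thread(target=handle_request, args=(request_id,))
        t.start()  # hand the request to a new thread and go back to "listening"
        workers.append(t)
    for t in workers:
        t.join()  # only for this demo: wait until every request has been handled

start = time.perf_counter()
serve(range(10))
elapsed = time.perf_counter() - start
print(f"handled {len(results)} requests in {elapsed:.2f}s")
```

Handled one after another, the ten requests would take about one second; with a thread per request, their waiting overlaps and the total is close to the time of a single request.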

Benefits of Multithread Programming

Responsiveness: Multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performing a lengthy operation, thereby increasing responsiveness to the user.
Resource sharing: Processes can share resources only through techniques such as shared memory and message passing. Such techniques must be explicitly arranged by the programmer. However, threads share the memory and the resources of the process to which they belong by default. The benefit of sharing code and data is that it allows an application to have several different threads of activity within the same address space.
Economy: Allocating memory and resources for process creation is costly. Because threads share the resources of the process to which they belong, it is more economical to create and context-switch threads. In general, thread creation consumes less time and memory than process creation.
Scalability: The benefits of multithreading can be even greater in a multiprocessor architecture, where threads may be running in parallel on different processing cores. A single-threaded process can run on only one processor, regardless of how many are available. We explore this issue further in the following section.

Multicore Programming

Multicore programming refers to the process of developing software that can take advantage of the processing power provided by multiple processor cores within a single computer or device. In recent years, the number of processor cores in modern computers and devices has been steadily increasing, with some systems now having dozens or even hundreds of cores.

Multicore programming allows software to be written in a way that can distribute processing tasks across multiple cores, which can improve performance and reduce the time required to complete complex tasks. However, writing software that can effectively utilize multiple cores can be challenging, as it requires a different approach to programming than traditional single-threaded programs.

Parallelism implies a system can perform more than one task simultaneously.
Concurrency supports more than one task making progress, even on a single core, where the scheduler interleaves their execution.

4. Concurrent execution on a single-core system.

5. Parallel execution on a multicore system.

Amdahl’s Law

Amdahl’s Law is a formula that identifies potential performance gains from adding additional computing cores to an application that has both serial (nonparallel) and parallel components. If S is the portion of the application that must be performed serially on a system with N processing cores, the formula appears as follows:

6. Amdahl’s Law Formula: speedup ≤ 1 / (S + (1 − S) / N)

As an example, assume we have an application that is 75 percent parallel and 25 percent serial. If we run this application on a system with two processing cores, we can get a speedup of 1.6 times. If we add two additional cores (for a total of four), the speedup is about 2.29 times. Below is a graph illustrating Amdahl’s Law in several different scenarios.

7. Amdahl’s Law graphic

The serial portion does not shrink as cores are added, so it limits the overall speedup: as N grows very large, the speedup approaches 1/S.

8. Idea behind Amdahl’s Law
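Amdahl's Law, speedup ≤ 1 / (S + (1 − S) / N), is easy to evaluate directly. The small sketch below (the function name is illustrative) reproduces the 75 percent parallel example and shows how the speedup levels off toward 1/S = 4 as the number of cores grows.

```python
def amdahl_speedup(serial_fraction, cores):
    """Upper bound on speedup with N cores: 1 / (S + (1 - S) / N)."""
    return 1 / (serial_fraction + (1 - serial_fraction) / cores)

S = 0.25  # 25 percent of the application must run serially
for n in (1, 2, 4, 8, 16, 1024):
    print(f"{n:>5} cores: speedup {amdahl_speedup(S, n):.2f}")
```

Going from 2 to 4 cores gains much more than going from 16 to 1024: the parallel part keeps shrinking, but the serial 25 percent is always there.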

Types Of Parallelism

Data Parallelism: Data parallelism focuses on distributing subsets of the same data across multiple computing cores and performing the same operation on each core.
Task Parallelism: Task parallelism involves distributing not data but tasks (threads) across multiple computing cores. Each thread is performing a unique operation. Different threads may be operating on the same data, or they may be operating on different data.

9. Data and task parallelism
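The difference between the two kinds of parallelism shows up in how work is handed to the workers. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor` on hypothetical data. With data parallelism, the same operation (sum) runs on different chunks. With task parallelism, different operations (min, max, mean) run on the same data. In CPython, threads do not speed up CPU-bound work like this because of the global interpreter lock; `ProcessPoolExecutor` would be used for real parallel speedup. The example is meant to show the structure, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 101))

# Data parallelism: the same operation on different subsets of the data
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum, chunks))  # one chunk per worker, results in order
total = sum(partial_sums)

# Task parallelism: different operations, here all on the same data
with ThreadPoolExecutor(max_workers=3) as pool:
    f_min = pool.submit(min, data)
    f_max = pool.submit(max, data)
    f_mean = pool.submit(lambda xs: sum(xs) / len(xs), data)
stats = (f_min.result(), f_max.result(), f_mean.result())

print(partial_sums, total)
print(stats)
```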

User-level threads are managed entirely by a user-level thread library, without support from the operating system kernel. The thread library is responsible for thread creation, scheduling, and synchronization. User-level threads are typically lightweight and efficient, because switching between them does not require kernel intervention. However, because the kernel is not aware of them, a blocking system call made by one user-level thread can block the entire process, and the threads cannot run in parallel on multiple cores.

Kernel-level threads, on the other hand, are managed directly by the operating system kernel, which is responsible for their creation, scheduling, and synchronization. Because the kernel schedules each of them independently, kernel-level threads can run in parallel on multiple cores, and one thread can block without stopping the others. However, they are less efficient than user-level threads, as creating and context-switching them requires kernel involvement.

Multithreading Models

1. One-to-One Model: The one-to-one model maps each user thread to a kernel thread. This allows multiple threads to run in parallel on multiprocessors, and other threads can run when one thread makes a blocking system call.

2. Many-to-One Model: The many-to-one model maps many user threads to a single kernel thread. This model is efficient because thread management is done by the thread library in user space.

A disadvantage of the many-to-one model is that a thread-blocking system call blocks the entire process. Also, multiple threads cannot run in parallel, as only one thread can access the kernel at a time.

3. Many-to-Many Model: The many-to-many model multiplexes many user threads onto a smaller or equal number of kernel threads. The number of kernel threads may depend on the application or the machine.

The many-to-many model does not have the disadvantages of the one-to-one model or the many-to-one model. There can be as many user threads as required, and their corresponding kernel threads can run in parallel on a multiprocessor.

Pthreads in Linux

https://medium.com/media/179fd780ccb6f0f3ae2814d2ac193ee9/href

The C program demonstrates the basic Pthreads API for constructing a multithreaded program that calculates the summation of a non-negative integer in a separate thread. In a Pthreads program, separate threads begin execution in a specified function; in the code above, this is the runner() function. When this program begins, a single thread of control begins in main(). After some initialization, main() creates a second thread that begins control in the runner() function. Both threads share the global data sum.

Let’s look more closely at this program. All Pthreads programs must include the pthread.h header file. The statement pthread_t tid declares the identifier for the thread we will create. Each thread has a set of attributes, including stack size and scheduling information. The pthread_attr_t attr declaration represents the attributes for the thread, and we initialize them with the function call pthread_attr_init(&attr). Because we did not explicitly set any attributes, the thread uses the default attributes provided.

A separate thread is created with the pthread_create() function call. In addition to passing the thread identifier and the attributes for the thread, we also pass the name of the function where the new thread will begin execution, in this case the runner() function. Last, we pass the integer parameter that was provided on the command line, argv[1]. At this point, the program has two threads: the initial (or parent) thread in main() and the summation (or child) thread performing the summation operation in the runner() function.

This program follows the thread create/join strategy: after creating the summation thread, the parent thread waits for it to terminate by calling the pthread_join() function. The summation thread terminates when it calls the function pthread_exit(). Once the summation thread has returned, the parent thread outputs the value of the shared data sum.
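For comparison, the same create/join pattern can be sketched with Python's `threading` module. This is a loose Python analogue, not the C program itself: `threading.Thread(...).start()` plays the role of pthread_create(), `join()` plays the role of pthread_join(), and default thread attributes are used implicitly. The shared variable is named `total` to avoid shadowing Python's built-in `sum`, and command-line parsing is replaced by a plain function argument.

```python
import threading

total = 0  # shared data, playing the role of the global `sum` in the C program

def runner(upper):
    """Runs in the child thread: sums the integers 1..upper into the shared variable."""
    global total
    total = 0
    for i in range(1, upper + 1):
        total += i

def main(upper):
    if upper < 0:
        raise ValueError("upper must be a non-negative integer")
    worker = threading.Thread(target=runner, args=(upper,))  # roughly pthread_create()
    worker.start()
    worker.join()  # roughly pthread_join(): the parent waits for the child to finish
    return total   # safe to read now, because the child thread has terminated

print(f"sum = {main(10)}")
```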

Conclusion

Threads are a fundamental concept in computer science that allows for concurrent execution of multiple tasks within a single process.
They are lightweight, independent units of execution that share the same memory space.
Threads enable parallelism and can improve the performance of applications by taking advantage of multiple CPU cores.
They can communicate and synchronize with each other through various mechanisms like shared variables, locks, and semaphores. However, managing threads can be complex and prone to issues like race conditions and deadlocks, requiring careful design and synchronization techniques to ensure correct and efficient execution.

References

https://www.tutorialspoint.com/multi-threading-models
Silberschatz, Galvin, and Gagne, “Operating System Concepts”, 10th edition.

R-Squared

Halil İbrahim Hatun — Mon, 19 Sep 2022 07:47:15 GMT

In data science, we build regression models to estimate a variable (the dependent variable) from one or more other variables (the independent variables). So how can we tell how well the regression model we have created performs?
One of the metrics used to measure regression performance is R-squared.

What is R-Squared?

Let’s go through an example to explain what R-squared is. Suppose we have some data, apply the necessary preprocessing, and fit a regression model, which produces a regression line. Is this regression line appropriate? How good is it, and could it be better? R-squared is a metric that helps answer such questions: it expresses, as a single number, how well the regression line fits the data, namely the proportion of the variance in the dependent variable that the model explains.

How to calculate R-Squared?

R-squared is obtained by dividing the sum of the squared distances of each point from the regression line (the residual sum of squares, SS_res) by the sum of the squared distances of each point from the mean (the total sum of squares, SS_tot), and subtracting the result from 1: R² = 1 − SS_res / SS_tot.
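Written out in plain Python, the definition R² = 1 − SS_res / SS_tot looks like this. The observed and predicted values are hypothetical. scikit-learn's `r2_score` computes the same quantity and can be used to cross-check.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))  # distances to the regression line
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)              # distances to the mean
    return 1 - ss_res / ss_tot

y_true = [3, 5, 7, 9]          # observed values
y_pred = [2.8, 5.3, 6.9, 9.1]  # hypothetical predictions from a regression line
print(r_squared(y_true, y_pred))
```

A model that always predicts the mean gets R² = 0, and a perfect fit gets R² = 1.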

There are some caveats when interpreting the R-squared metric. Intuitively, a low R-squared suggests a poor fit and a high R-squared suggests a good fit, but this is not always the case: a low R-squared can be acceptable when the data are inherently noisy, and a high R-squared can come from a model that overfits. Therefore, it is not correct to evaluate the performance of a model with the R-squared metric alone.

Adjusted R-Squared

As the number of independent variables increases, the R-squared on the training data never decreases, even when the new variables are irrelevant. For example, let’s say we’re calculating the R-squared of a house price model. Next, we add an attribute called the average height of previous homeowners to the data. This attribute has nothing to do with house prices, but R-squared will still go up (or at least stay the same). Taken at face value, this would suggest that the previous owners’ average height helps explain house prices, which is wrong. We use the Adjusted R-squared metric to mitigate this problem: it takes the number of independent variables into account and penalizes adding variables that do not improve the model.
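The usual formula is Adjusted R² = 1 − (1 − R²)(n − 1) / (n − p − 1), where n is the number of samples and p the number of independent variables. The sketch below applies it to hypothetical numbers: adding an irrelevant feature nudges R² from 0.800 to 0.801, but the adjusted value goes down.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Hypothetical: 50 houses, 5 useful features, then one irrelevant feature added
before = adjusted_r_squared(0.800, 50, 5)
after = adjusted_r_squared(0.801, 50, 6)
print(round(before, 4), round(after, 4))
```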

Let’s examine the R-squared and Adjusted R-squared metrics with a practical example.

First, we import the libraries and methods we use.

https://medium.com/media/718356623352af8f8a10ca7c8311e6d5/href https://medium.com/media/657c11a227802fc049aad160165810eb/href

We delete the “Posted On” feature because it is not needed for the regression model.

https://medium.com/media/992efd0db9eb969c939c5219cc2a6002/href

Preprocessing Part

https://medium.com/media/b44087d63bb9d5f3abf6b2a8916342f3/href

Model Split Part

https://medium.com/media/94ce10808d1798d34e50490b32a46668/href

shape control image

Regression Part

https://medium.com/media/9f92941b45aec644398d6e24cd291f52/href https://medium.com/media/4003ca76225c704b0e83a7f058630e56/href

results of the main regression part

https://medium.com/media/9c70275b07cd6db183c5114958996d95/href

scoring df

Visualization of R-Squared And Adjusted R-Squared Values

https://medium.com/media/65826da7b632d267cdf08dc0015d58f4/href