<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Halil İbrahim Hatun on Medium]]></title>
        <description><![CDATA[Stories by Halil İbrahim Hatun on Medium]]></description>
        <link>https://medium.com/@halil7hatun?source=rss-a71bf613874c------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*rQTiALA6eFGVyF754c4iDQ.jpeg</url>
            <title>Stories by Halil İbrahim Hatun on Medium</title>
            <link>https://medium.com/@halil7hatun?source=rss-a71bf613874c------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Tue, 02 Jun 2026 22:37:14 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@halil7hatun/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[How Does AI Understand Us?]]></title>
            <link>https://halil7hatun.medium.com/yapay-zeka-bizi-nas%C4%B1l-anl%C4%B1yor-78c0ab069916?source=rss-a71bf613874c------2</link>
            <guid isPermaLink="false">https://medium.com/p/78c0ab069916</guid>
            <category><![CDATA[transformers]]></category>
            <category><![CDATA[genai]]></category>
            <category><![CDATA[bert]]></category>
            <category><![CDATA[ai]]></category>
            <dc:creator><![CDATA[Halil İbrahim Hatun]]></dc:creator>
            <pubDate>Mon, 10 Mar 2025 14:10:53 GMT</pubDate>
            <atom:updated>2025-03-10T14:12:38.167Z</atom:updated>
            <content:encoded><![CDATA[<h4>A technical explanation in simple terms</h4><p>Hello, I’m Halil İbrahim. I hope all is well. Today I’m sharing a blog post for anyone considering a career in artificial intelligence and for anyone who wants to learn about AI in general.</p><p>I thought about this topic for a long time, and most people seem to have the same question in mind: how does AI understand what we write and generate an answer to it? Many people assume that AI runs a Google-like search and then replies. In reality, a great many people have not fully grasped how AI works.</p><p>So, for now, I will focus on how AI understands a piece of text. There are many different methods in this field, but I will try to walk you through to the most recent architecture as simply and clearly as I can.</p><p>I hope this post serves as a useful guide for those interested in the world of AI.</p><h4><strong>1. Frequency-based meaning extraction (TF-IDF)</strong></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*xdwdSMd7NDtySHVD.png" /></figure><p>TF-IDF, short for “Term Frequency-Inverse Document Frequency”, is a method that scores how often a word appears in a text and how distinctive that word is. You might think that frequent words are the important ones, but that is not always true! Words such as “and”, “a”, and “like” appear in every text, so they receive a low weight. Words that appear less often across documents and are specific to a given text are considered more meaningful.</p><p>TF-IDF is used especially often with classical machine learning models.</p><h4><strong>2. Distributional Semantics and Word2Vec</strong></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/678/0*9e7R-l3MGxMMAfBu.jpeg" /></figure><p>This approach rests on the idea that a word’s meaning can be learned from its relationship with the words around it. We could call it “knowing a word by its friends”.</p><p><strong>Word2Vec</strong> is one of the first popular models to apply this idea. It was developed at Google in 2013 and has two main variants:</p><ul><li><strong>CBOW (Continuous Bag of Words):</strong> Predicts the middle word from the surrounding words.</li><li><strong>Skip-gram:</strong> The reverse: given a word, it tries to predict the surrounding words.</li></ul><p>At the end of training, each word becomes a numeric vector, and words with similar meanings end up close to each other in the vector space. For example, “king” and “queen” lie close together, while “table” sits farther away.</p><p>Word2Vec’s advantage is that it represents words not just by their frequency but by <strong>the context they appear in</strong>. This makes it possible to capture semantic relationships between words.</p><h4><strong>3. BERT Embedding (Contextual Embedding)</strong></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*7pLzVX-0YMhY2wby.png" /></figure><p>Word2Vec is a good start, but it assigns each word a single fixed vector, whereas a word’s meaning <strong>can change with its place in the sentence and its context</strong>. For example:</p><ul><li>“I sat on the bank of the river.” (the land beside a river)</li><li>“I took out a loan from the bank.” (a financial institution)</li></ul><p>This is where <strong>BERT (Bidirectional Encoder Representations from Transformers)</strong> comes in. Developed by Google in 2018, BERT understands words <strong>in bidirectional context</strong>. That is, it works out a word’s meaning by looking at both the words before it and the words after it. It does this by building on the <strong>Transformer</strong> architecture.</p><p>The Transformer architecture was first introduced in 2017, again by Google. The paper that introduced it was titled <strong>“Attention Is All You Need”</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/996/0*sIqhnSDtWQFWJNV-.png" /></figure><p>The title pointed to the importance of the attention mechanism in AI architectures. The original Transformer consisted of encoder and decoder blocks. The encoder turns the given input into a mathematical representation; the decoder turns that representation back into language people can understand. Both blocks contain the component that made the Transformer so important and laid the foundation for its later successes: <strong>Multi-Head Attention</strong>. This mechanism fits the title perfectly. I will explain it in more detail, with the math, in other posts; for now I just want to describe briefly what it is.</p><p>Multi-Head Attention is a structure that improves the model’s ability to comprehend content. Think of it this way: a historian and an engineer are asked to share their views on AI. Naturally, the historian focuses on the historical development of AI, while the engineer focuses on its technical underpinnings. If we think of each area of expertise as a head, the multi-head structure is the combination of all of them. This lets the model give more consistent and comprehensive answers on any topic.</p><p>Now that Multi-Head Attention is in our pocket, it is time to go deeper into BERT embeddings and wrap up the post.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/850/0*Rr3ZTme1JSIs_4lb.png" /></figure><p>When BERT processes a sentence, it considers not only each word’s own meaning but also its position in the sentence and which segment (sub-sentence) it belongs to. Accordingly, BERT’s input embedding has three main components:</p><ol><li><strong>Token Embeddings</strong></li></ol><p>First, I want to briefly explain tokenization. BERT was pre-trained on a large dataset. For tokenization, a fixed vocabulary of words and subwords was built, and each entry was assigned a number starting from 0 until the whole vocabulary was covered. When new text arrives, splitting it into these words or subwords and representing each one by its number is called <strong>tokenization</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*dBiNQa70gl2tmKWm.png" /></figure><p>For <strong>token embedding</strong>, take BERT-base as the example: this model has a hidden size of 768 (spread across 12 Transformer layers, not 768 layers). In other words, each token ID, a single integer, is mapped to a 768-dimensional vector. This is exactly what the <strong>Token Embedding</strong> does: it is a lookup table, learned during pre-training, that maps the token IDs (numbered from 0) to 768-dimensional vectors.</p><p><strong>Example</strong></p><p>1.</p><ul><li><strong>Token 1:</strong> “Merhaba” (“Hello”; or, split into subwords, “Mer”, “##haba”)</li><li><strong>Token 2:</strong> “dünya” (“world”)</li></ul><p>2.</p><ul><li>“Merhaba” could be assigned <strong>ID: 1050</strong></li><li>“dünya” could be assigned <strong>ID: 2034</strong></li></ul><p>3.</p><ul><li>These IDs are given to BERT as input, and the embedding layer produces two 768-dimensional vectors.</li></ul><p><strong>2. Segment Embeddings</strong></p><p>Suppose BERT receives two separate sentences or pieces of text. To mark which segment (sentence) each token belongs to, an additional 768-dimensional vector is added to each token. The purpose of this <strong>segment embedding</strong> is to distinguish the two sentences.</p><p>For example, take these two sentences:</p><ul><li><strong>Sentence 1:</strong> “I don’t want to meet with him.”</li><li><strong>Sentence 2:</strong> “I wish everyone a good day.”</li></ul><p>Here, the tokens in Sentence 1 receive segment ID <strong>0</strong> and the tokens in Sentence 2 receive segment ID <strong>1</strong> (BERT numbers its two segments 0 and 1). Each segment ID is mapped to its own 768-dimensional vector. This way the model can tell the two sentences apart and knows which sentence each token belongs to.</p><p><strong>3. Positional Embeddings</strong></p><p>While the <strong>Segment Embedding</strong> indicates which sentence a token belongs to, the <strong>Positional Embedding</strong> represents a token’s order within the input sequence.</p><p>For example, take this sentence:<br><em>“Today is Sunday.”</em></p><p>Let’s treat <em>“Today”</em>, <em>“is”</em>, and <em>“Sunday”</em> as its tokens. Their <strong>positional embedding</strong> values are determined by their order in the sentence:</p><ul><li><em>“Today”</em> → 1</li><li><em>“is”</em> → 2</li><li><em>“Sunday”</em> → 3</li></ul><p>Each position number is converted into a 768-dimensional vector, matching BERT-base’s hidden size. This way the model knows each token’s position in the sentence and uses that information in its computations.</p><p>So, we have covered three different embeddings. BERT’s input embedding is formed by summing these embeddings, each of which is a 768-dimensional vector. The formula is:</p><blockquote><strong>BERT Embedding = Token Embedding + Segment Embedding + Positional Embedding</strong></blockquote><p>The sum of these three embeddings gives the input representation of each token, which BERT’s Transformer layers then refine into a context-aware vector.</p><h3>Closing</h3><p>That brings us to the end of the post. I hope you enjoyed it. I welcome all feedback, positive or negative. I would also love to hear your suggestions on what kind of content to produce in the future. Take care, and see you in the next post!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Evaluate and Debug Your GenAI Model | Part 1 — Introduction to W&B]]></title>
            <link>https://halil7hatun.medium.com/evaluate-and-debug-your-genai-model-part-1-introduction-to-w-b-8ac2d91a1e00?source=rss-a71bf613874c------2</link>
            <guid isPermaLink="false">https://medium.com/p/8ac2d91a1e00</guid>
            <category><![CDATA[wandb]]></category>
            <category><![CDATA[llmops]]></category>
            <category><![CDATA[genai]]></category>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[evaluate]]></category>
            <dc:creator><![CDATA[Halil İbrahim Hatun]]></dc:creator>
            <pubDate>Fri, 03 Jan 2025 06:53:15 GMT</pubDate>
            <atom:updated>2025-01-03T06:53:15.872Z</atom:updated>
            <content:encoded><![CDATA[<h3>Evaluate and Debug Your GenAI Model | Part 1 — Introduction to W&amp;B</h3><p>This tutorial uses the Weights &amp; Biases (W&amp;B) library.</p><p>Hello, I hope everything is going well 😊. Today is exciting for me 🎉. I am starting a new tutorial series on how to monitor an LLM or GenAI model during fine-tuning, serving, and evaluation 💡. This matters more and more: AI makes many tasks easier every day, but evaluating and monitoring models is still essential for decisions such as identifying the target customer market and estimating average cost. Many tools can help here. I will use <strong>Weights &amp; Biases (W&amp;B)</strong>, which I find more effective than the alternatives.</p><p>Let’s start by looking at how we can log metrics while a model trains.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Q9k7uPvaueOUp_x42kwf1g.png" /></figure><h3>What is W&amp;B?</h3><p>Weights &amp; Biases (W&amp;B) is your new best friend in the world of machine learning. It’s like having a personal assistant who keeps track of all your experiments, visualizes your model’s performance, and even helps you collaborate with your team. Imagine having a superhero sidekick who makes sure your AI adventures are smooth and successful.</p><h4>Key Features of W&amp;B:</h4><ol><li><strong>Experiment Tracking</strong>: Never lose track of your experiments again. W&amp;B logs and tracks everything, so you can compare runs and figure out what’s working (and what’s not).</li><li><strong>Visualization</strong>: See your model’s performance metrics, loss curves, and more in real time. It’s like having a crystal ball for your AI projects.</li><li><strong>Collaboration</strong>: Share your experiments with your team, get feedback, and work together more effectively. 
Teamwork makes the dream work, right?</li><li><strong>Version Control</strong>: Keep tabs on different versions of your models and datasets. No more “Wait, which version was this again?” moments.</li><li><strong>Integration</strong>: W&amp;B plays nicely with popular machine learning frameworks like PyTorch and TensorFlow. It’s like the social butterfly of AI tools.</li></ol><h3>Why Use W&amp;B?</h3><p>Using W&amp;B can supercharge your workflow. Here’s how:</p><ul><li><strong>Efficiency</strong>: Automate the logging of your experiments and say goodbye to manual errors. It’s like having a robot butler for your AI projects.</li><li><strong>Insight</strong>: Get deep insights into your model’s performance with detailed visualizations and analytics. Knowledge is power!</li><li><strong>Collaboration</strong>: Work better with your team by sharing and discussing experiments. Two heads are better than one, after all.</li><li><strong>Reproducibility</strong>: Ensure your experiments are reproducible, making debugging and improvements a breeze. No more “It worked on my machine” excuses.</li></ul><p>Okay, enough theory. 
Let’s dive in and have some fun!</p><h3>Let’s Create a Basic MLP Model and Log it with W&amp;B for Sprite Classification</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/745/1*DNQiFmhtEnfz4abtsCAKRA.png" /></figure><h4>Step 1: Define Our Libraries</h4><p>First things first, let’s import the necessary libraries:</p><pre>import math<br>from pathlib import Path<br>from types import SimpleNamespace<br>from tqdm.auto import tqdm<br>import torch<br>import torch.nn as nn<br>import torch.nn.functional as F<br>from torch.optim import Adam<br>from utilities import get_dataloaders<br><br>import wandb</pre><h4>Step 2: Define Constants and Create the get_model Method</h4><p>Let’s set up our constants and build a simple model:</p><pre>INPUT_SIZE = 3 * 16 * 16<br>OUTPUT_SIZE = 5<br>HIDDEN_SIZE = 256<br>NUM_WORKERS = 2<br>CLASSES = [&quot;hero&quot;, &quot;non-hero&quot;, &quot;food&quot;, &quot;spell&quot;, &quot;side-facing&quot;]<br>DATA_DIR = Path(&#39;./data/&#39;)<br>DEVICE = torch.device(&quot;cuda&quot; if torch.cuda.is_available()  else &quot;cpu&quot;)<br><br>def get_model(dropout):<br>    &quot;Simple MLP with Dropout&quot;<br>    return nn.Sequential(<br>        nn.Flatten(),<br>        nn.Linear(INPUT_SIZE, HIDDEN_SIZE),<br>        nn.BatchNorm1d(HIDDEN_SIZE),<br>        nn.ReLU(),<br>        nn.Dropout(dropout),<br>        nn.Linear(HIDDEN_SIZE, OUTPUT_SIZE)<br>    ).to(DEVICE)</pre><h4>Step 3: Define Hyperparameters</h4><p>Let’s store our hyperparameters in a config object:</p><pre>config = SimpleNamespace(<br>    epochs = 2,<br>    batch_size = 128,<br>    lr = 1e-5,<br>    dropout = 0.5,<br>    slice_size = 10_000,<br>    valid_pct = 0.2,<br>)</pre><h4>Step 4: Define Train and Evaluate Methods</h4><p>Now, let’s define our training and evaluation methods:</p><pre>def train_model(config):<br>    &quot;Train a model with a given config&quot;<br>    <br>    wandb.init(<br>        project=&quot;dlai_intro&quot;,<br>        config=config,<br>    
)<br><br>    # Get the data<br>    train_dl, valid_dl = get_dataloaders(DATA_DIR, <br>                                         config.batch_size, <br>                                         config.slice_size, <br>                                         config.valid_pct)<br>    n_steps_per_epoch = math.ceil(len(train_dl.dataset) / config.batch_size)<br><br>    # A simple MLP model<br>    model = get_model(config.dropout)<br><br>    # Make the loss and optimizer<br>    loss_func = nn.CrossEntropyLoss()<br>    optimizer = Adam(model.parameters(), lr=config.lr)<br><br>    # Running count of training examples seen so far<br>    example_ct = 0<br><br>    for epoch in tqdm(range(config.epochs), total=config.epochs):<br>        model.train()<br><br>        for step, (images, labels) in enumerate(train_dl):<br>            images, labels = images.to(DEVICE), labels.to(DEVICE)<br><br>            outputs = model(images)<br>            train_loss = loss_func(outputs, labels)<br>            optimizer.zero_grad()<br>            train_loss.backward()<br>            optimizer.step()<br><br>            example_ct += len(images)<br>            metrics = {<br>                # .item() logs a plain Python float instead of a tensor<br>                &quot;train/train_loss&quot;: train_loss.item(),<br>                &quot;train/epoch&quot;: epoch + 1,<br>                &quot;train/example_ct&quot;: example_ct<br>            }<br>            # Log training metrics to the W&amp;B dashboard<br>            wandb.log(metrics)<br>            <br>        # Compute validation metrics at the end of each epoch<br>        val_loss, accuracy = validate_model(model, valid_dl, loss_func)<br>        val_metrics = {<br>            &quot;val/val_loss&quot;: val_loss,<br>            &quot;val/val_accuracy&quot;: accuracy<br>        }<br>        # Log validation metrics to the W&amp;B dashboard<br>        wandb.log(val_metrics)<br>     <br>    # Mark the run as finished<br>    wandb.finish()</pre><pre>def validate_model(model, valid_dl, loss_func):<br>    &quot;Compute the performance of the model 
on the validation dataset&quot;<br>    model.eval()<br>    val_loss = 0.0<br>    correct = 0<br><br>    with torch.inference_mode():<br>        for images, labels in valid_dl:<br>            images, labels = images.to(DEVICE), labels.to(DEVICE)<br><br>            # Forward pass<br>            outputs = model(images)<br>            # Weight the batch loss by batch size; .item() keeps val_loss a float<br>            val_loss += loss_func(outputs, labels).item() * labels.size(0)<br><br>            # Compute accuracy and accumulate<br>            predicted = outputs.argmax(dim=1)<br>            correct += (predicted == labels).sum().item()<br>            <br>    return val_loss / len(valid_dl.dataset), correct / len(valid_dl.dataset)</pre><h4>Step 5: Configure W&amp;B and Bake Our Model</h4><p>Before we shine, let’s configure W&amp;B. You can view your logs without an account, or use your W&amp;B API key to save them.</p><p>If you want to continue without signing in, enter 1. If not, enter 2 and paste your W&amp;B API key. You can find it <a href="https://wandb.ai/settings#api"><strong><em>here</em></strong></a>. I&#39;ll continue with my API key.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vlqlADYbgQrqdHvNVj4_yw.png" /></figure><p>And… Time to bake our model!</p><pre>train_model(config)</pre><p>Our baked model is ready! You can see your logs in the Jupyter Notebook. But that’s not all. You can also open the project link shown after “View Project.”</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*tNzZUKvmw_Va82fvtI1kqg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lhTaxxndILWrMiO-EPjOCQ.png" /></figure><p>We trained our model and saw the logs in our account. However, we often don’t get what we want from a single training run. 
Let’s launch a few more training runs with different learning rates, dropout values, and epoch counts.</p><pre># Run 2: raise the learning rate<br>config.lr = 1e-4<br>train_model(config)<br><br># Run 3: same settings as run 2<br>config.lr = 1e-4<br>train_model(config)<br><br># Run 4: less dropout, a single epoch<br>config.dropout = 0.1<br>config.epochs = 1<br>train_model(config)<br><br># Run 5: raise the learning rate again<br>config.lr = 1e-3<br>train_model(config)</pre><p>After running these commands, we have four more runs. The first two share the same settings, so they show how much results can vary between identical runs; the others change the dropout, the number of epochs, and the learning rate. When you open the project on W&amp;B again, you can compare them like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JfPKTOvlo_iDO03n2fbrYw.png" /></figure><p>That’s it! We have trained several models and compared them, so we can easily decide which hyperparameter configuration is the most suitable.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vXfuDzLpSPvP0WC275KOAA.png" /></figure><h3>Conclusion</h3><p>In this blog, we introduced W&amp;B by training simple MLP models with different hyperparameters and logging their metrics to W&amp;B.</p><p>This is just the beginning. In the next parts I will train diffusion models, evaluate them, trace LLMs, and fine-tune them, using W&amp;B throughout.</p><p>Your feedback is valuable and helps me continue this series. Stay tuned. See you soon, bye-bye!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Exploring DataGemma: An Overview]]></title>
            <link>https://halil7hatun.medium.com/exploring-datagemma-an-overview-89d0e369e562?source=rss-a71bf613874c------2</link>
            <guid isPermaLink="false">https://medium.com/p/89d0e369e562</guid>
            <category><![CDATA[gemini]]></category>
            <category><![CDATA[retrieval-augmented-gen]]></category>
            <category><![CDATA[gemma]]></category>
            <category><![CDATA[datagemma]]></category>
            <category><![CDATA[google]]></category>
            <dc:creator><![CDATA[Halil İbrahim Hatun]]></dc:creator>
            <pubDate>Fri, 13 Sep 2024 13:55:12 GMT</pubDate>
            <atom:updated>2024-09-13T13:57:16.270Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*tl1jtg-ytt7uG_EY6onurg.jpeg" /><figcaption>Figure 1. DataGemma Logo</figcaption></figure><p>Despite the advancements in large language models (LLMs), AI hallucinations remain a significant challenge. On September 12, 2024, Google took a step toward addressing them by releasing DataGemma as open source. DataGemma grounds model output in real-world statistical data from Data Commons to reduce these hallucinations. In this blog, I will provide an overview of DataGemma and explore the two distinct approaches it uses to improve LLM accuracy and reasoning.</p><h3>What is Data Commons?</h3><p>Google’s Data Commons project serves as a vast repository of public data, designed to streamline the access and use of important global statistics. It consolidates information from a wide range of trusted sources, including the United Nations, government agencies, environmental organizations, and universities. With over 250 billion data points and more than 2.5 trillion triples, it represents a significant open-source initiative aimed at making global data more accessible and useful.</p><p>Data Commons features two notable innovations. First, the project has spent years curating diverse public datasets, understanding their underlying assumptions, and organizing them using Schema.org, a universal language for structured data. This effort results in a comprehensive Knowledge Graph that integrates data from various sources.</p><p>Second, Data Commons incorporates a natural language interface powered by large language models (LLMs). This allows users to pose questions in everyday language, with the LLM translating these queries into the Data Commons format. 
This interface facilitates the exploration of charts and graphs without altering or fabricating the underlying data.</p><h3>Interfacing LLMs with Data Commons</h3><p>Two different approaches have been described for interfacing LLMs with Data Commons.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/903/1*bRY5hAosE-AA3xWXxxitJw.png" /><figcaption>Figure 2. Comparison of Baseline, RIG, and RAG approaches for generating responses with statistical data</figcaption></figure><p>The first approach, Retrieval Interleaved Generation (RIG), fine-tunes the LLM to emit natural language Data Commons queries alongside the statistics it generates. A multi-step pipeline converts these queries into structured data queries and retrieves the matching values from Data Commons. The results are compared against the base models, Gemma 7B IT and 27B IT.</p><p>The second approach, Retrieval Augmented Generation (RAG), follows a more classic retrieval pattern. It extracts the variables mentioned in the user’s query, retrieves the relevant data from Data Commons, and adds it as context to the original question. An LLM (Gemini 1.5 Pro) then produces the answer, which is compared against the baseline results.</p><h4>Retrieval Interleaved Generation (RIG)</h4><p>Retrieval Interleaved Generation (RIG) is a three-step process designed to enhance the accuracy and reliability of language model responses. First, a fine-tuned model generates natural language queries for Data Commons. Next, a post-processor converts these queries into structured data formats. 
Finally, the system retrieves statistical answers from Data Commons and presents them alongside the original LLM-generated results.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplayer.vimeo.com%2Fvideo%2F1009179191%3Fapp_id%3D122963&amp;dntp=1&amp;display_name=Vimeo&amp;url=https%3A%2F%2Fvimeo.com%2F1009179191%3Fshare%3Dcopy&amp;image=https%3A%2F%2Fi.vimeocdn.com%2Fvideo%2F1926331344-44306034b89ec510d9aebe77559abb8c0da0753ecaa20e3c781af88daadef958-d_1280&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=vimeo" width="1440" height="810" frameborder="0" scrolling="no"><a href="https://medium.com/media/87680ebcc96ea4b1fe1bed4e23136c61/href">https://medium.com/media/87680ebcc96ea4b1fe1bed4e23136c61/href</a></iframe><p>In this process, when the LLM provides a numerical answer, it is matched with the most relevant value from the Data Commons database, known as the Data Commons Statistical Value (DC-SV). The original output from the LLM is referred to as the LLM Statistical Value (LLM-SV). Instead of generating formal queries such as SQL, the LLM is fine-tuned to produce natural language queries. This is more practical given the vast array of variables in Data Commons, and it helps maintain the natural and fluent quality of the model’s responses.</p><p><strong>Query Conversion</strong></p><p>In the RIG pipeline, natural language queries generated by the LLM are transformed into structured queries for the Data Commons database. Despite the extensive range of variables and properties in Data Commons, most queries can be categorized into a few types, which streamlines the extraction process. Each query is broken down into key components: statistical variables or topics, places, and attributes. 
Specific NLP techniques are applied to these components: semantic search for variables, named entity recognition for places, and regex-based heuristics for attributes.</p><p>The identified components are then mapped to predefined query templates, as illustrated in the table below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*f3rF9-99rGxZquNUE_I4dQ.png" /><figcaption>Figure 4. Predefined query templates for RIG approach</figcaption></figure><p>Structured queries are generated based on these templates and submitted to the Data Commons API. The response, typically a numeric value, is presented alongside the original LLM-generated statistic, facilitating verification of the LLM’s output. Future developments will explore various presentation methods for these results, including side-by-side comparisons and highlighted differences.</p><h4><strong>Retrieval Augmented Generation (RAG)</strong></h4><p>In the RAG pipeline, the process begins with a fine-tuned LLM managing the user’s query. This model generates relevant queries for Data Commons, which are used to retrieve pertinent tables from the Data Commons interface. 
Finally, a long-context LLM, such as Gemini 1.5 Pro, generates a response based on both the original query and the retrieved tables.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplayer.vimeo.com%2Fvideo%2F1009183444%3Fapp_id%3D122963&amp;dntp=1&amp;display_name=Vimeo&amp;url=https%3A%2F%2Fvimeo.com%2F1009183444%3Fshare%3Dcopy&amp;image=https%3A%2F%2Fi.vimeocdn.com%2Fvideo%2F1926340180-c70249652eec9a7be236f572ad03fadb2aceaa7c2d801906e0aa289336d4a8c5-d_1280&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=vimeo" width="1440" height="810" frameborder="0" scrolling="no"><a href="https://medium.com/media/f1091fab536e11ccbdac04fa898ffd91/href">https://medium.com/media/f1091fab536e11ccbdac04fa898ffd91/href</a></iframe><p><strong>Extracting Data Commons Queries</strong></p><p>An LLM is fine-tuned to transform user queries into Data Commons queries. Training utilizes Gemini 1.5 Pro to generate queries in specific formats. Although the effectiveness can be limited by data availability, the initial method generally provides better results compared to alternatives that use a full variable list.</p><p><strong>Retrieving Tables</strong></p><p>Queries are processed using the RIG framework to identify variables and map them to Data Commons APIs. These APIs return relevant tables, such as life expectancy by country, which are used for generating responses.</p><p><strong>Prompting</strong></p><p>Once the tables are retrieved, a prompt is created combining the original query and serialized table data. This prompt is then processed by long-context LLMs like Gemini 1.5 Pro to generate and return a comprehensive response.</p><h3>Conclusion</h3><p>I hope this information proves helpful. I plan to continue exploring DataGemma, focusing on evaluating RIG and RAG approaches and their implementation in code.</p><p>Stay tuned for the next updates on this topic. 
Take care!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Cultivating Hospitality Excellence: Transformative Applications for Efficient Staff Management in…]]></title>
            <link>https://halil7hatun.medium.com/cultivating-hospitality-excellence-transformative-applications-for-efficient-staff-management-in-dd71d4fbe63c?source=rss-a71bf613874c------2</link>
            <guid isPermaLink="false">https://medium.com/p/dd71d4fbe63c</guid>
            <category><![CDATA[hotel]]></category>
            <category><![CDATA[hotel-industry]]></category>
            <category><![CDATA[hotel-technology]]></category>
            <category><![CDATA[hotel-booking]]></category>
            <category><![CDATA[hotel-management]]></category>
            <dc:creator><![CDATA[Halil İbrahim Hatun]]></dc:creator>
            <pubDate>Fri, 12 Apr 2024 10:24:30 GMT</pubDate>
            <atom:updated>2024-04-12T10:28:41.642Z</atom:updated>
            <content:encoded><![CDATA[<h3>Cultivating Hospitality Excellence: Transformative Applications for Efficient Staff Management in the Hotel Industry</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Nsx8gHxVyyVqjyIB37vd2w.jpeg" /><figcaption>Image 1 — Driving Hotel Technology</figcaption></figure><p>Today, I’d like to discuss optimizing staff management to enhance hospitality in hotels. It’s important to note that the information I’ll be sharing reflects my opinion or is sourced from various references.</p><h4><strong>General-Purpose Apps</strong></h4><p>In the vast realm of mobile apps, numerous solutions address this need. I’ll list some of them and offer a few insights on each.</p><ol><li><strong>İci4Staff</strong></li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*S-GZc_P_04WHOXwiYjeqRA.jpeg" /><figcaption>Image 2 — İci4Staff</figcaption></figure><p>Let’s delve into İcibot’s staff app, named “İci4Staff”<em> [1]</em>. Aligned with the common goal of enhancing hospitality, its primary aim is to offer swift support via mobile devices. Should a customer have a question or run into an issue, they can report it through the mobile app. These requests are then promptly assigned to staff members, who can access them on their own mobile devices.</p><p><strong>2. Smart Parking</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*UUd9ah3IOjX_9PXa_rQ7-A.jpeg" /><figcaption>Image 3 — Smart Parking</figcaption></figure><p>Let’s further explore the concept of smart parking<em> [2]</em>. Even within hotels, securing parking spots can be a significant bottleneck. To address this, computer vision technology can guide staff to the most suitable parking areas, ensuring efficient space utilization.</p><p><strong>3. 
Webee</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/580/1*pYFUbEGYyj2jNGgJDiDOBQ.jpeg" /><figcaption>Image 4 — Webee</figcaption></figure><p>Webee is one of the most widely used hotel management apps in Türkiye <em>[3]</em>, serving a purpose similar to İcibot’s İci4Staff.</p><h4>CRM and Hospitality</h4><p>The apps I mentioned earlier primarily focus on the broader aspects of hotel management. Now, let’s look at other applications specifically dedicated to CRM, guest optimization, and hospitality.</p><ol><li><strong>Beonx</strong></li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/655/1*Y_dulAEkQFzwWzzJcbkEOw.jpeg" /><figcaption>Image 5 — Dashboard page of BeonX</figcaption></figure><p>One notable example is Beonx<em> [4]</em>, which prioritizes sustainable profitability. It employs AI-powered strategies such as revenue optimization, hyper-segmentation, real-time data and automation, demand forecasting, and more to optimize hotel costs.</p><p><strong>2. Bookboost</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XaoQnSoipxwe3HMXsbfuyw.png" /><figcaption>Image 6 — Dashboard page of Bookboost</figcaption></figure><p>Bookboost is a multi-channel CRM system <em>[5]</em>. Its objectives include increasing revenues, refining marketing operations, and boosting customer loyalty through the creation of individual customer profiles and the enhancement of conversation campaigns.</p><h4>Personal Opinion</h4><p>In my view, CRM stands out as the most potent method for optimizing revenues through personalized service and prompt feedback. Consequently, there’s a need to enhance data collection methods to gain deeper insights into customers’ behaviors.</p><h4>Conclusion</h4><p>In summary, hospitality is paramount in the hotel industry, and there are numerous approaches to address it. The key lies in our intent and execution. I aimed to convey essential information concisely. 
I hope this brief blog proved helpful to you.</p><p>Take care, and stay tuned for more in upcoming blogs.</p><h3>References</h3><ol><li><a href="https://icibot.com/en/platforms/ici4staff/">https://icibot.com/en/platforms/ici4staff/</a></li><li><a href="https://medium.com/debutinfotech/mobile-apps-hospitality-a-high-tech-makeover-on-the-move-ceee9d71b771">https://medium.com/debutinfotech/mobile-apps-hospitality-a-high-tech-makeover-on-the-move-ceee9d71b771</a></li><li><a href="https://www.getwebee.com/tr/webee-cozumler">https://www.getwebee.com/tr/webee-cozumler</a></li><li><a href="https://www.beonx.com/">https://www.beonx.com/</a></li><li><a href="https://www.bookboost.io/multi-channel-hospitality-crm">https://www.bookboost.io/multi-channel-hospitality-crm</a></li></ol><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=dd71d4fbe63c" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Taking a Broad View of the Israel-Hamas Conflict]]></title>
            <link>https://halil7hatun.medium.com/taking-a-broad-view-of-the-israel-hamas-conflict-f76363269720?source=rss-a71bf613874c------2</link>
            <guid isPermaLink="false">https://medium.com/p/f76363269720</guid>
            <category><![CDATA[west-bank]]></category>
            <category><![CDATA[hamas]]></category>
            <category><![CDATA[gaze]]></category>
            <category><![CDATA[jerusalem]]></category>
            <category><![CDATA[israel]]></category>
            <dc:creator><![CDATA[Halil İbrahim Hatun]]></dc:creator>
            <pubDate>Fri, 20 Oct 2023 09:33:25 GMT</pubDate>
            <atom:updated>2023-10-20T12:57:14.969Z</atom:updated>
            <content:encoded><![CDATA[<p>As many are aware, the conflict between Israel and Hamas has reignited once again. I’ve delved into research and analysis on this critical issue. Today, my aim is to discuss these matters. We’ll navigate through the complexities, aiming for a clear and balanced perspective.</p><p>Alright, let’s delve into the intricate history of the Palestine-Israel relationship. It’s a subject that demands our attention and consideration.</p><h4>History of Israel-Palestine Relations: A Swift Overview</h4><p>The relationship between Israel and Palestine is a complex historical narrative. It begins with ancient Hebrew settlements around 1300 BCE, followed by various empires’ rule, including the Ottomans from the 16th century to World War I. After WWI, British control was established, leading to complications.</p><p>Zionism, a movement for a Jewish homeland, gained traction in the late 19th century, backed by the Balfour Declaration in 1917. The UN proposed a partition plan in 1947, resulting in Israel’s declaration of independence in 1948 and a subsequent war.</p><p>The conflict led to the Nakba, displacing many Palestinians. The aftermath of this war continues to influence the region’s politics. The Six-Day War in 1967 further complicated matters, leading to Israeli occupation of the West Bank, Gaza Strip, and East Jerusalem.</p><p>Though there have been attempts at peace, such as the Oslo Accords in the 1990s, finding a lasting solution remains challenging due to deep-rooted historical, religious, and political differences. Understanding this complex history is crucial for meaningful discussions about the region’s present and future.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/933/1*iAsX48lZrTfpgjmZHvRxaQ.jpeg" /></figure><h3>A Deeper Dive into the Israel-Palestine Conflict</h3><h4>1. Israeli Side</h4><p>Support for Israel is prevalent, particularly in the United States. 
The majority of the American government has historically aligned itself with Israel. However, it’s worth noting that many young Americans, especially teenagers, might not possess in-depth knowledge about the intricacies of the conflict. Often, they rely on what they’ve been taught or what they hear in passing.</p><p>Interestingly, within this pro-Israel camp, there are cases of Palestinians who also lend their support to Israel and advocate for its cause. One notable example is Ali Wahab, an Arab Muslim who has publicly expressed this viewpoint. Let’s look at what Ali Wahab wrote:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/598/1*zF-hUe9fagHXnm40C079rA.png" /></figure><p>As we embark on this journey of comprehension, let’s start with a concise overview. In the forthcoming blogs, we’ll plunge into the intricacies. One term that sparks curiosity in Ali Wahab’s discourse is “apartheid.” It raises an important question: can Israel be labeled as an apartheid state? To tackle this, we’ll dissect diverse viewpoints and meticulously assess each stance.</p><p><strong>Envisioned opinion</strong></p><p>Those who have experienced Israel firsthand, whether through residency or visits, often contest the label of “apartheid state” attributed to it. They argue that Israel, as a nation, upholds a policy of equal rights for all its citizens, irrespective of their race, religion, ethnicity, gender, or sexual orientation. This perspective insists that any notion of Israel as an apartheid state is a gross mischaracterization.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/469/1*HWEA04pvSZx2o1VkTWma1g.png" /></figure><p>Critics, however, interpret these claims differently. They assert that anti-Israel sentiments are rooted in a desire for the dismantling of Israel itself. 
To them, branding Israel as an apartheid state signifies the separation between Israelis and Palestinians, and is seen as a strategic move toward eradicating Israel in favor of a unified Palestine. This would entail the return of descendants of refugees and a shift in the demographic majority towards Muslims.</p><p>Within Israel, there exists a significant Arab population who hold citizenship, affording them equal rights. Notably, these Arab citizens are not subject to compulsory military service. It is important to note that, due to the absence of a civil marriage law, individuals of different religions cannot legally marry within Israel. Instead, marriage laws are rooted in the Ottoman era. However, marriages conducted abroad under civil law are recognized within the country.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*_UvpWsuWP0iOzQZbAHURRA.png" /></figure><p><strong>Waking up from an envisioned opinion (fable)</strong></p><p>That really depends on how you want to define apartheid.</p><p>The frustrating thing about the apartheid debate is that the two sides argue completely different points.</p><p>Those who want to paint Israel as the oppressor point to the West Bank. Those who want to point to Israel as “the only democracy in the Middle East” point to Palestinian citizens of Israel.</p><p>It’s true that Palestinian Israelis can, and do, serve as lawyers, doctors, teachers, policemen, politicians and judges. This is not to say that there is no discrimination in Israel, because of course there is, but to call <em>this </em>apartheid is just silly. Discrimination is not apartheid.</p><p>So, the more relevant issue is that of the West Bank.</p><p>The West Bank is split into 3 (non-contiguous) areas, A, B and C, set out in the Oslo Accords.</p><p>Area A is under complete Palestinian administration, and comprises the major Palestinian urban centres. 
It makes up about 18% of the West Bank in area.</p><p>Area B is under Palestinian civil authority and Israeli military authority, and makes up about 22% of the West Bank.</p><p>Neither area A nor area B have Jewish residents, and therefore we cannot really talk about apartheid there.</p><p>Area C is where the issue really lies. Some 60% of the West Bank, populated both by Israeli Jews and by Palestinians. This area is under full Israeli control.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Ne2Nac9X_y1Ct8ENcxJ7qg.jpeg" /></figure><p>There is no arguing the fact that Israeli and Palestinian residents of Area C are not equal.</p><p>Israeli Jews are full and equal Israeli citizens, beholden to, and served by, the legal systems of Israel.</p><p>Palestinians are not Israeli citizens, and are beholden to Israeli <em>military </em>courts. They do not generally have recourse to the Israeli civil courts (and indeed, just this week a law was passed that further restricts Palestinian access to the High Court in land disputes). They are policed by the IDF, rather than by police forces, which in general means that they do not benefit from police protection, but are only restricted by it.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/798/1*9GeCM2Uu5VHGlFC8SbzDmQ.png" /></figure><p>This is the basis of claims of apartheid — you have Jews and Arabs living side by side, under different legal jurisdictions.</p><p>Is this apartheid? As I said, it depends on how you want to define apartheid. You can definitely define it broadly, in a way which seems to include this situation — there is indeed a legal distinction between two populations, which plays out along racial/ethnic lines.</p><p>The counter argument to this is, of course, that it is not a racial or ethnic distinction, it is a perfectly normal distinction between citizens and non citizens. All non-citizens are not equal to citizens, in every country. 
Palestinian citizens who go to area C are still full citizens.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/617/1*eCms0sI_NuZRoBjVZ_Ajyg.png" /></figure><p>The counter-counter-argument could claim that this is something of a facetious claim, in that citizenship was never granted to the Palestinian residents of the West Bank, whereas the Jewish residents have citizenship based on their ethnicity, and indeed any Jew in the entire world could come tomorrow and receive citizenship. If citizenship is ethnically biased, then a distinction based on citizenship is ethnically biased too.</p><p>And there are counter-counter-counter arguments, and counter-counter-counter-counter arguments. It goes on for ever.</p><p>(If you’re keen on delving deeper into this topic, I highly recommend watching<a href="https://www.youtube.com/watch?v=aEdGcej-6D0&amp;t=1037s"><strong> this video</strong></a><strong>.</strong> It provides valuable insights and a nuanced perspective on the matter.)</p><h4>2. Hamas Side</h4><p><em>What is Hamas?</em> Alright, so, Hamas is a Palestinian political and military organization. They’ve been a significant player in the Israeli-Palestinian conflict for quite some time now. They emerged in the late 1980s and have since gained considerable support, especially in the Gaza Strip.</p><p>Now, when it comes to their actions against Israel, things get pretty intense. Hamas has been involved in a number of conflicts and acts of violence. They’ve launched rockets and carried out suicide bombings, which have resulted in both civilian and military casualties on the Israeli side.</p><p>These actions have led to a lot of tension and responses from Israel, including military operations in the Gaza Strip. 
It’s safe to say that Hamas and Israel have a long history of conflict, and their interactions continue to shape the dynamics of the region.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*H0Kck_SmQEAnV-zKElZcig.jpeg" /></figure><p><em>What is Hamas’ authority in Gaza? </em>Hamas isn’t a political party, it’s not running a state and it’s not a religious cult either. Hamas is an organized crime ring, anchored in Islamic fundamentalism, that is also performing the duties of a state in Gaza. In other words, it is an unholy alliance between mafia and religious wackos. As such it has complete control over the distribution of goods in Gaza. Those who do not work for Hamas live in abject squalor; being a member is basically a prerequisite for holding a job, with very few exceptions. Those who oppose Hamas die a violent death. The result is that most people either work for or with Hamas and no one is willing to oppose them: it’s too dangerous, and spies are everywhere.</p><h3>Conclusion</h3><p>The human toll in this conflict is immense, with lives lost on both sides. What’s even more disheartening is the passive stance taken by many powerful nations, who either watch this tragedy unfold or make hollow and insensitive statements. It’s a painful truth that nothing in this world is as precious as the smile of a child, and yet, the innocence of many is being ruthlessly extinguished. This is neither a matter for negotiation nor a case for defense.</p><p>Our hearts ache for the countless innocent lives lost in this conflict, and we yearn for a sense of peace and reprieve for all those affected, regardless of their background or affiliation.</p><p>In my forthcoming blogs, I’ll delve deeper into the political complexities of this conflict, examining the various perspectives and avenues for resolution. I remain open to all responses and negotiations, and I fervently wish for a swift end to this ongoing struggle. 
Until then, take care.</p><p>Now, here’s where it gets interesting. Within the Palestinian community, there were some who, surprisingly, supported Israel and advocated for its cause. Take, for example, Ali Wahap, an Arab Muslim who made his stance clear. This just goes to show that even in the midst of a heated conflict, there were individuals who saw things from a different perspective.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f76363269720" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Graph Neural Networks (GNNs)]]></title>
            <link>https://halil7hatun.medium.com/graph-neural-networks-gnns-1f463df4bb77?source=rss-a71bf613874c------2</link>
            <guid isPermaLink="false">https://medium.com/p/1f463df4bb77</guid>
            <category><![CDATA[gnn]]></category>
            <category><![CDATA[neural-networks]]></category>
            <category><![CDATA[graph]]></category>
            <dc:creator><![CDATA[Halil İbrahim Hatun]]></dc:creator>
            <pubDate>Wed, 30 Aug 2023 10:51:33 GMT</pubDate>
            <atom:updated>2023-08-30T11:14:05.269Z</atom:updated>
            <content:encoded><![CDATA[<h4>GCN and GAT Python Implementation</h4><p>In the world of artificial intelligence, where data is the lifeblood that fuels innovation, there’s a particular type of data that often challenges traditional machine learning techniques, data with inherent relationships and connections. This is where Graph Neural Networks (GNNs) step in, revolutionizing the way we process and understand data structured as graphs.</p><p>As a unique non-Euclidean data structure for machine learning, graph analysis focuses on tasks such as node classification, link prediction, and clustering. In a world that can be visualized as networks of entities and their interactions, be it social networks, molecular structures, recommendation systems, or citation networks, traditional machine learning algorithms often fall short. Conventional models are designed for independent and identically distributed data, struggling to capture the nuances of interconnectedness and dependencies that define these complex relationships.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/905/1*R3j9VIb6WYKlAdqEJcB9xw.png" /><figcaption>Fig. 1. Left: image in Euclidean space. Right: graph in non-Euclidean space</figcaption></figure><p>Graph Neural Networks have emerged as a revolutionary approach, garnering significant attention in recent years due to their remarkable capacity to extract invaluable insights from interconnected data. Unlike traditional methods that focus solely on individual data points in isolation, GNNs operate by delving deep into the intricate tapestry of relationships. This unique approach empowers us to not only analyze data points but to unravel the concealed structures and intricate patterns that lay beneath the surface.</p><p>In this blog, our focus will be on dissecting the fundamental operational architecture of Graph Neural Networks (GNNs), coupled with a practical exploration of their implementation using the Python programming language. 
Through this journey, we aim to demystify the inner workings of GNNs, providing you with a clear understanding of how these networks navigate and make sense of interconnected data. So, let’s embark on a guided tour of the core concepts behind GNNs, while also rolling up our sleeves for some hands-on Python coding to bring these concepts to life.</p><h4>Getting Started</h4><p>Let’s examine the Planetoid Cora dataset and apply Graph Neural Networks (GNNs) using PyTorch. This practical exploration will provide us with hands-on experience working with real-world graph data.</p><p>The <strong>Planetoid dataset</strong> combines citation networks from Cora, CiteSeer, and PubMed. Nodes, representing documents, feature 1433-dimensional bag-of-words vectors, interconnected by citations. With 7 classes, the challenge involves training a model to predict missing labels using the web of connections.</p><pre>from torch_geometric.datasets import Planetoid<br>from torch_geometric.transforms import NormalizeFeatures<br><br>dataset = Planetoid(root=&#39;data/Planetoid&#39;, name=&#39;Cora&#39;, transform=NormalizeFeatures())<br><br>print(f&#39;Dataset: {dataset}:&#39;)<br>print(&#39;======================&#39;)<br>print(f&#39;Number of graphs: {len(dataset)}&#39;)<br>print(f&#39;Number of features: {dataset.num_features}&#39;)<br>print(f&#39;Number of classes: {dataset.num_classes}&#39;)<br><br>data = dataset[0]<br>print(data)</pre><pre>Output =<br><br>Dataset: Cora():<br>======================<br>Number of graphs: 1<br>Number of features: 1433<br>Number of classes: 7<br>Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])</pre><p>Prior to initiating the training process, let’s analyze the data distribution by visually representing it within the second and third dimensions.</p><iframe 
src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplotly.com%2F%7Ehalilibr.09%2F14.embed%3Fautosize%3Dtrue&amp;display_name=Plotly&amp;url=http%3A%2F%2Fchart-studio.plotly.com%2F%7Ehalilibr.09%2F14%2F&amp;image=http%3A%2F%2Fchart-studio.plotly.com%2Fstatic%2Fwebapp%2Fimages%2Fplotly-logo.8d56a320dbb8.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=plotly" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/a0d49fc5a32ad377c9f2b1c7d83e5cc7/href">https://medium.com/media/a0d49fc5a32ad377c9f2b1c7d83e5cc7/href</a></iframe><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplotly.com%2F%7Ehalilibr.09%2F21.embed%3Fautosize%3Dtrue&amp;display_name=Plotly&amp;url=https%3A%2F%2Fchart-studio.plotly.com%2F%7Ehalilibr.09%2F21%2F&amp;image=https%3A%2F%2Fchart-studio.plotly.com%2Fstatic%2Fwebapp%2Fimages%2Fplotly-logo.8d56a320dbb8.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=plotly" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/e897df33cb20a4bd50588c47b8a49f5a/href">https://medium.com/media/e897df33cb20a4bd50588c47b8a49f5a/href</a></iframe><p>Just one more step to go. 
Now, it’s time to write the classes and methods that will be employed in the upcoming sections:</p><pre>import torch<br>import plotly.express as px<br>import plotly.graph_objects as go<br>from sklearn.manifold import TSNE<br><br>class BuildModel():<br>    # Relies on the global `data` object loaded from the Cora dataset above.<br>    def __init__(self, model, lr = 0.01):<br>        self.model = model<br>        self.optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=5e-4)<br>        self.criterion = torch.nn.CrossEntropyLoss()<br><br>    def single_train(self):<br>        # One full-batch optimization step; the loss uses training nodes only.<br>        self.model.train()<br>        self.optimizer.zero_grad()<br>        out = self.model(data.x, data.edge_index)<br>        loss = self.criterion(out[data.train_mask], data.y[data.train_mask])<br>        loss.backward()<br>        self.optimizer.step()<br>        return loss<br><br>    def test(self):<br>        # Accuracy on the held-out test nodes.<br>        self.model.eval()<br>        out = self.model(data.x, data.edge_index)<br>        pred = out.argmax(dim=1)<br>        test_correct = pred[data.test_mask] == data.y[data.test_mask]<br>        test_acc = int(test_correct.sum()) / int(data.test_mask.sum())<br>        return test_acc<br>    <br>    def train_with_early_stopping(self, epochs=150, patience=10, plot = False, plot_name = None):<br>        # Note: early stopping here monitors test accuracy;<br>        # monitoring data.val_mask instead would avoid leaking test information.<br>        history = {<br>            &#39;epoch&#39;: [],<br>            &#39;loss&#39;: [],<br>            &#39;test_acc&#39;: []<br>        }<br>        <br>        best_test_acc = 0.0<br>        epochs_without_improvement = 0<br>    <br>        for epoch in range(1, epochs + 1):<br>            loss = self.single_train()<br>            test_acc = self.test()<br>    <br>            print(f&#39;Epoch: {epoch:03d}, Loss: {loss:.4f}, Test Acc: {test_acc:.4f}&#39;)<br>    <br>            history[&#39;epoch&#39;].append(epoch)<br>            history[&#39;loss&#39;].append(loss.item())<br>            history[&#39;test_acc&#39;].append(test_acc)<br>    <br>            if test_acc &gt; best_test_acc:<br>                best_test_acc = test_acc<br>                epochs_without_improvement = 0<br>            else:<br>                epochs_without_improvement += 
1<br>    <br>            if epochs_without_improvement &gt;= patience:<br>                print(f&#39;Early stopping triggered at epoch {epoch}.&#39;)<br>                break<br>    <br>        if plot:<br>            self.history_plot(history, plot_name)<br>    <br>        return history<br>    <br>    def train(self, epochs = 100):<br>        # Plain training loop without early stopping.<br>        for epoch in range(1, epochs + 1):<br>            loss = self.single_train()<br>            test_acc = self.test()<br>    <br>            print(f&#39;Epoch: {epoch:03d}, Loss: {loss:.4f}, Test Acc: {test_acc:.4f}&#39;)<br>    <br>    <br>    def history_plot(self, history, plot_name):<br>        fig = go.Figure()<br>        fig.add_trace(go.Scatter(x=history[&#39;epoch&#39;], y=history[&#39;loss&#39;], mode=&#39;lines&#39;, name=&#39;Training Loss&#39;))<br>        fig.add_trace(go.Scatter(x=history[&#39;epoch&#39;], y=history[&#39;test_acc&#39;], mode=&#39;lines&#39;, name=&#39;Test Accuracy&#39;))<br>        fig.update_layout(<br>            title=&#39;Training History&#39;,<br>            xaxis_title=&#39;Epoch&#39;,<br>            yaxis_title=&#39;Value&#39;,<br>            legend=dict(x=0, y=1),<br>            template=&#39;plotly_dark&#39;<br>        )<br>        fig.show()</pre><p>Following that, we proceed to define our visualization methods:</p><pre>def visualize_2d(h, color, name = &#39;2D_dist_plot&#39;):<br>    z = TSNE(n_components=2).fit_transform(h.detach().cpu().numpy())<br><br>    fig = px.scatter(x=z[:, 0], y=z[:, 1], color=color, color_continuous_scale=&quot;magma&quot;)<br>    fig.update_layout(<br>        xaxis_title=&quot;Dimension 1&quot;,<br>        yaxis_title=&quot;Dimension 2&quot;,<br>        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),<br>        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),<br>        coloraxis_showscale=False,<br>        width=800,<br>        height=800<br>    )<br>    fig.show()<br> <br><br><br>def visualize_3d(h, color, name = 
&#39;3D_dist_plot&#39;):<br>    z = TSNE(n_components=3).fit_transform(h.detach().cpu().numpy())<br><br>    fig = px.scatter_3d(x=z[:, 0], y=z[:, 1], z=z[:, 2], color=color, color_continuous_scale=&quot;magma&quot;)<br>    fig.update_layout(<br>        scene=dict(<br>            xaxis_title=&quot;Dimension 1&quot;,<br>            yaxis_title=&quot;Dimension 2&quot;,<br>            zaxis_title=&quot;Dimension 3&quot;<br>        ),<br>        coloraxis_showscale=False,<br>        width=800,<br>        height=800<br>    )<br>    fig.show()</pre><p>Now let’s delve into the exciting world of Graph Neural Networks (GNNs) and explore how we can use them to train our dataset.</p><h4>1. Graph Convolutional Network (GCN)</h4><p>A Graph Convolutional Network (GCN) is a Graph Neural Network (GNN) variant tailored for processing graph-structured data. Unlike Convolutional Neural Networks (CNNs), which excel at grid-like data (such as images), GCNs specialize in datasets where entities are connected through edges, forming networks.</p><p>While CNNs leverage local patterns in grid data, GCNs harness the interconnectedness of graph data. They propagate and aggregate information across neighboring nodes, updating each node’s representation based on its neighbors’ features. This contextual understanding enables GCNs to capture relationships and patterns.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/760/1*rnCtQd6MJvAMPIGm1prF1Q.png" /><figcaption>Fig. 2. Comparison between GCN and CNN</figcaption></figure><p>The notable shift in GCNs lies in adapting the convolutional operation for graphs. This operation computes weighted averages of neighboring node features, generating central node representations. As these layers stack, GCNs learn abstract features while considering the overall graph context.</p><p>In the realm of traditional neural networks, linear layers play a pivotal role by applying a fundamental linear transformation to the input data. 
This transformation maps the input features x to hidden vectors h by means of a weight matrix 𝐖. Ignoring biases for now, we can express this as follows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/223/1*4Bg9OCuqnOL2LimePJPHsg.png" /><figcaption>Fig. 3. Linear relationship formula</figcaption></figure><p>One way to enhance our <strong>node representations</strong> is by combining their features with those of their neighboring nodes. This process, known as <strong>convolution or neighborhood aggregation</strong>, incorporates information from the immediate neighborhood of a node, including the node itself (denoted as Ñ).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/290/1*MijIhOsGdOUHMMleruzH5g.png" /><figcaption>Fig. 4. Aggregated linear relationship formula</figcaption></figure><p>Unlike CNN filters, the weight matrix 𝐖 in a GNN layer is a single matrix shared across all nodes. However, a challenge arises because nodes can have a variable number of neighbors, unlike pixels in the fixed grid structure of a CNN. Handling this variability is a key aspect of GNNs that enables them to <strong>effectively operate on graph-structured data</strong>.</p><p>How should we handle situations in which one node is connected to only one neighbor, while another node has 700 connections? If we were to merely sum the feature vectors, the resulting embedding ‘h’ would be far larger for the node with 700 neighbors than for the node with one. 
To ensure uniform value ranges across all nodes and enable meaningful comparisons between them, we can <strong>normalize </strong>the output according to the nodes’ <strong>degrees </strong>(the count of connections each node possesses).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/397/1*OfGtI9vEiMn97FVjdkUxbQ.png" /><figcaption>Fig. 5. Normalized and aggregated linear relationship formula</figcaption></figure><p>Kipf and Welling, who introduced the GCN, noted that features from nodes with many neighbors spread more easily than those from relatively isolated nodes. To counterbalance this, they proposed assigning greater weights to features from nodes with few neighbors, which balances the influence across the whole graph. This can be expressed as follows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/547/1*K2JFO4FUn7ht7FPqWd-CoA.png" /><figcaption>Fig. 6. Harmonized formula</figcaption></figure><p>Let’s implement the concepts we’re discussing in Python using PyTorch for a deeper understanding.</p><p>First, let’s build the GCN model using PyTorch:</p><pre>import torch<br>import torch.nn.functional as F<br>from torch_geometric.nn import GCNConv<br><br>class GCN(torch.nn.Module):<br>    def __init__(self, hidden_channels):<br>        super().__init__()<br>        torch.manual_seed(1234567)  # fixed seed for reproducible weight initialization<br>        # Two graph convolutions: input features -&gt; hidden -&gt; class scores<br>        self.conv1 = GCNConv(dataset.num_features, hidden_channels)<br>        self.conv2 = GCNConv(hidden_channels, dataset.num_classes)<br><br>    def forward(self, x, edge_index):<br>        x = self.conv1(x, edge_index)<br>        x = x.relu()<br>        # Dropout is active only in training mode<br>        x = F.dropout(x, p=0.5, training=self.training)<br>        x = self.conv2(x, edge_index)<br>        return x<br><br>model = GCN(hidden_channels=16)<br>print(model)</pre><pre>&gt;&gt;&gt;GCN(<br>&gt;&gt;&gt;  (conv1): GCNConv(1433, 16)<br>&gt;&gt;&gt;  (conv2): GCNConv(16, 7)<br>&gt;&gt;&gt;)</pre><p>Once we have built our model, we can move on to training and 
visualizing it:</p><pre>model = GCN(hidden_channels=16)<br><br>built_model = BuildModel(model)<br></pre><pre>Epoch: 001, Loss: 1.9463, Test Acc: 0.2700<br>Epoch: 002, Loss: 1.9409, Test Acc: 0.2910<br>Epoch: 003, Loss: 1.9343, Test Acc: 0.2910<br>Epoch: 004, Loss: 1.9275, Test Acc: 0.3210<br>Epoch: 005, Loss: 1.9181, Test Acc: 0.3630<br>Epoch: 006, Loss: 1.9086, Test Acc: 0.4120<br>Epoch: 007, Loss: 1.9015, Test Acc: 0.4010<br>Epoch: 008, Loss: 1.8933, Test Acc: 0.4020<br>Epoch: 009, Loss: 1.8808, Test Acc: 0.4180<br>Epoch: 010, Loss: 1.8685, Test Acc: 0.4470<br>Epoch: 011, Loss: 1.8598, Test Acc: 0.4680<br>Epoch: 012, Loss: 1.8482, Test Acc: 0.5180<br>Epoch: 013, Loss: 1.8290, Test Acc: 0.5440<br>Epoch: 014, Loss: 1.8233, Test Acc: 0.5720<br>Epoch: 015, Loss: 1.8057, Test Acc: 0.5910<br>Epoch: 016, Loss: 1.7966, Test Acc: 0.6080<br>Epoch: 017, Loss: 1.7825, Test Acc: 0.6300<br>Epoch: 018, Loss: 1.7617, Test Acc: 0.6450<br>Epoch: 019, Loss: 1.7491, Test Acc: 0.6520<br>Epoch: 020, Loss: 1.7310, Test Acc: 0.6560<br>Epoch: 021, Loss: 1.7147, Test Acc: 0.6570<br>Epoch: 022, Loss: 1.7056, Test Acc: 0.6640<br>Epoch: 023, Loss: 1.6954, Test Acc: 0.6770<br>Epoch: 024, Loss: 1.6697, Test Acc: 0.6950<br>Epoch: 025, Loss: 1.6538, Test Acc: 0.7140<br>Epoch: 026, Loss: 1.6312, Test Acc: 0.7150<br>Epoch: 027, Loss: 1.6161, Test Acc: 0.7170<br>Epoch: 028, Loss: 1.5899, Test Acc: 0.7230<br>Epoch: 029, Loss: 1.5711, Test Acc: 0.7220<br>Epoch: 030, Loss: 1.5576, Test Acc: 0.7210<br>Epoch: 031, Loss: 1.5393, Test Acc: 0.7280<br>Epoch: 032, Loss: 1.5137, Test Acc: 0.7370<br>Epoch: 033, Loss: 1.4948, Test Acc: 0.7380<br>Epoch: 034, Loss: 1.4913, Test Acc: 0.7430<br>Epoch: 035, Loss: 1.4698, Test Acc: 0.7510<br>Epoch: 036, Loss: 1.3998, Test Acc: 0.7570<br>Epoch: 037, Loss: 1.4041, Test Acc: 0.7600<br>Epoch: 038, Loss: 1.3761, Test Acc: 0.7640<br>Epoch: 039, Loss: 1.3631, Test Acc: 0.7700<br>Epoch: 040, Loss: 1.3258, Test Acc: 0.7800<br>Epoch: 041, Loss: 1.3030, Test Acc: 
0.7810<br>Epoch: 042, Loss: 1.3119, Test Acc: 0.7760<br>Epoch: 043, Loss: 1.2519, Test Acc: 0.7760<br>Epoch: 044, Loss: 1.2530, Test Acc: 0.7790<br>Epoch: 045, Loss: 1.2492, Test Acc: 0.7800<br>Epoch: 046, Loss: 1.2205, Test Acc: 0.7790<br>Epoch: 047, Loss: 1.2037, Test Acc: 0.7850<br>Epoch: 048, Loss: 1.1571, Test Acc: 0.7900<br>Epoch: 049, Loss: 1.1700, Test Acc: 0.7920<br>Epoch: 050, Loss: 1.1296, Test Acc: 0.7940<br>Epoch: 051, Loss: 1.0860, Test Acc: 0.7930<br>Epoch: 052, Loss: 1.1080, Test Acc: 0.7910<br>Epoch: 053, Loss: 1.0564, Test Acc: 0.7930<br>Epoch: 054, Loss: 1.0157, Test Acc: 0.7930<br>Epoch: 055, Loss: 1.0362, Test Acc: 0.7920<br>Epoch: 056, Loss: 1.0328, Test Acc: 0.7980<br>Epoch: 057, Loss: 1.0058, Test Acc: 0.8000<br>Epoch: 058, Loss: 0.9865, Test Acc: 0.7970<br>Epoch: 059, Loss: 0.9667, Test Acc: 0.8010<br>Epoch: 060, Loss: 0.9741, Test Acc: 0.8000<br>Epoch: 061, Loss: 0.9769, Test Acc: 0.8030<br>Epoch: 062, Loss: 0.9122, Test Acc: 0.8040<br>Epoch: 063, Loss: 0.8993, Test Acc: 0.8050<br>Epoch: 064, Loss: 0.8769, Test Acc: 0.8050<br>Epoch: 065, Loss: 0.8575, Test Acc: 0.8060<br>Epoch: 066, Loss: 0.8897, Test Acc: 0.8030<br>Epoch: 067, Loss: 0.8312, Test Acc: 0.8060<br>Epoch: 068, Loss: 0.8262, Test Acc: 0.8030<br>Epoch: 069, Loss: 0.8511, Test Acc: 0.8070<br>Epoch: 070, Loss: 0.7711, Test Acc: 0.8070<br>Epoch: 071, Loss: 0.8012, Test Acc: 0.8080<br>Epoch: 072, Loss: 0.7529, Test Acc: 0.8080<br>Epoch: 073, Loss: 0.7525, Test Acc: 0.8070<br>Epoch: 074, Loss: 0.7689, Test Acc: 0.8110<br>Epoch: 075, Loss: 0.7553, Test Acc: 0.8140<br>Epoch: 076, Loss: 0.7032, Test Acc: 0.8120<br>Epoch: 077, Loss: 0.7326, Test Acc: 0.8110<br>Epoch: 078, Loss: 0.7122, Test Acc: 0.8120<br>Epoch: 079, Loss: 0.7090, Test Acc: 0.8110<br>Epoch: 080, Loss: 0.6755, Test Acc: 0.8130<br>Epoch: 081, Loss: 0.6666, Test Acc: 0.8070<br>Epoch: 082, Loss: 0.6679, Test Acc: 0.8080<br>Epoch: 083, Loss: 0.7037, Test Acc: 0.8100<br>Epoch: 084, Loss: 0.6752, Test Acc: 0.8070<br>Epoch: 085, 
Loss: 0.6266, Test Acc: 0.8100<br>Early stopping triggered at epoch 85.</pre><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplotly.com%2F%7Ehalilibr.09%2F30.embed%3Fautosize%3Dtrue&amp;display_name=Plotly&amp;url=https%3A%2F%2Fchart-studio.plotly.com%2F%7Ehalilibr.09%2F30%2F&amp;image=https%3A%2F%2Fchart-studio.plotly.com%2Fstatic%2Fwebapp%2Fimages%2Fplotly-logo.8d56a320dbb8.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=plotly" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/ae0275e81de953082e24d5fe530cf132/href">https://medium.com/media/ae0275e81de953082e24d5fe530cf132/href</a></iframe><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplotly.com%2F%7Ehalilibr.09%2F25.embed%3Fautosize%3Dtrue&amp;display_name=Plotly&amp;url=https%3A%2F%2Fchart-studio.plotly.com%2F%7Ehalilibr.09%2F25%2F&amp;image=https%3A%2F%2Fchart-studio.plotly.com%2Fstatic%2Fwebapp%2Fimages%2Fplotly-logo.8d56a320dbb8.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=plotly" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/7da5abb38246270e1ea4ef87fbd79af2/href">https://medium.com/media/7da5abb38246270e1ea4ef87fbd79af2/href</a></iframe><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplotly.com%2F%7Ehalilibr.09%2F23.embed%3Fautosize%3Dtrue&amp;display_name=Plotly&amp;url=https%3A%2F%2Fchart-studio.plotly.com%2F%7Ehalilibr.09%2F23%2F&amp;image=https%3A%2F%2Fchart-studio.plotly.com%2Fstatic%2Fwebapp%2Fimages%2Fplotly-logo.8d56a320dbb8.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=plotly" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/8529447c49b09194d8463b55b6b1e110/href">https://medium.com/media/8529447c49b09194d8463b55b6b1e110/href</a></iframe><h4><strong>2. 
Graph Attention Networks (GAT)</strong></h4><p>GAT stands for Graph Attention Network, a type of Graph Neural Network (GNN) that has gained significant attention for its effectiveness in modeling relationships within graph-structured data. GAT was introduced by Veličković et al. in their 2018 paper “<a href="https://arxiv.org/pdf/1710.10903.pdf">Graph Attention Networks.</a>”</p><p>GAT addresses one of the key challenges in GNNs: how to aggregate information from neighboring nodes while assigning different levels of importance to different neighbors. Traditional GNNs, such as Graph Convolutional Networks (GCNs), use fixed aggregation weights that depend only on the graph structure (for example, node degrees) and not on the node features. GAT, on the other hand, introduces attention mechanisms into the aggregation process, allowing <strong>each node to dynamically weigh the importance</strong> of its neighbors’ information.</p><h4>GAT Layer</h4><p>The first step is a shared linear transformation, given by the matrix W with dimensions (F’, F). It is called “shared” because the same matrix W is applied to every node.</p><p>The goal is to map the node features to a common dimensionality. Originally of dimension F, the features are transformed to dimension F’. This transformation is applied to all nodes in node i’s neighborhood, including node i itself.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/159/1*w4soQpW5YgjACznJlQEyAQ.png" /><figcaption>Fig. 7. Element-wise scalar multiplication formula</figcaption></figure><p>Next, the embedding h_i of the target node i is paired with the embedding of each of its immediate neighbors. 
Each pair is then combined and transformed using the matrix W^a, which has dimensions (2F’, F’). Here, F’ may stay the same as in the previous stage or differ, depending on a hyperparameter.</p><p>The aim is to learn attention scores between these pairs of nodes with one shared set of parameters, independent of the specifics of the graph structure.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/216/1*jDgLIGoqZkRzseNYj17vLg.png" /><figcaption>Fig. 8. Interaction formula</figcaption></figure><p>Here, each intermediate attention scalar is passed through a non-linear activation, denoted as σ. The GAT authors use <strong>LeakyReLU </strong>as the non-linear activation function.</p><p>Finally, the intermediate attention scalars are passed through a <strong>softmax </strong>over each node’s neighborhood, so that the resulting attention coefficients form a probability distribution.</p><p>In essence, this step normalizes the attention coefficients so that they are comparable across neighbors.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/389/1*xZVNgjocUOKUHoqflZsuwA.png" /><figcaption>Fig. 9. GAT normalized formula 1</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/496/1*jp5RcC7ZOHDePj7DXLQ1iA.png" /><figcaption>Fig. 10. GAT normalized formula 2</figcaption></figure><p>The attention mechanism used by the model is parameterized by a learnable weight vector, whose output is passed through a LeakyReLU activation.</p><p>Consider an example of multi-head attention with K = 3 heads, focusing on node 1 and its local neighborhood. Different arrow styles and colors represent separate attention computations, each operating independently. 
The outputs of the individual attention heads are then combined by concatenation or averaging, which gives the final representation h1&#39;.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/1*uWHgcy40sxNx6VZjrnD0rw.png" /><figcaption>Fig. 11. GAT Architecture 1</figcaption></figure><p>The next figure gives another view of how a Graph Attention Network (GAT) layer operates.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/904/1*f2LiYOkb6SaNNRjCkYvF9w.png" /><figcaption>Fig. 12. GAT Architecture 2</figcaption></figure><p>Let’s put these concepts into practice with PyTorch and PyTorch Geometric to develop a deeper understanding.</p><pre>import torch<br>import torch.nn.functional as F<br>from torch_geometric.nn import GATConv<br><br># `dataset` is assumed to be the graph dataset loaded earlier<br>class GAT(torch.nn.Module):<br>    def __init__(self, hidden_channels, heads):<br>        super().__init__()<br>        torch.manual_seed(1234567)<br>        self.conv1 = GATConv(dataset.num_features, hidden_channels, heads=heads)<br>        # conv1 concatenates its heads, so conv2 receives heads * hidden_channels features<br>        # note: conv2 also concatenates by default; pass concat=False to average its heads<br>        self.conv2 = GATConv(heads * hidden_channels, dataset.num_classes, heads=heads)<br><br>    def forward(self, x, edge_index):<br>        x = F.dropout(x, p=0.6, training=self.training)<br>        x = self.conv1(x, edge_index)<br>        x = F.elu(x)<br>        x = F.dropout(x, p=0.6, training=self.training)<br>        x = self.conv2(x, edge_index)<br>        return x<br><br>model = GAT(hidden_channels=8, heads=8)<br>print(model)</pre><pre>GAT(<br>  (conv1): GATConv(1433, 8, heads=8)<br>  (conv2): GATConv(64, 7, heads=8)<br>)</pre><p>Now, let’s train our GAT model and visualize the results:</p><pre>buildGATModel = BuildModel(model, lr = 0.05)<br>history = buildGATModel.train_with_early_stopping(plot = True, plot_name = &#39;gat_history_plot&#39;)</pre><pre>Epoch: 001, Loss: 0.6564, Test Acc: 0.7780<br>Epoch: 002, Loss: 0.6479, Test Acc: 0.7780<br>Epoch: 003, Loss: 0.6208, Test Acc: 0.8100<br>Epoch: 004, Loss: 0.5841, Test Acc: 
0.8180<br>Epoch: 005, Loss: 0.5878, Test Acc: 0.8170<br>Epoch: 006, Loss: 0.5477, Test Acc: 0.8060<br>Epoch: 007, Loss: 0.4723, Test Acc: 0.7950<br>Epoch: 008, Loss: 0.4452, Test Acc: 0.8000<br>Epoch: 009, Loss: 0.4338, Test Acc: 0.8070<br>Epoch: 010, Loss: 0.4332, Test Acc: 0.8100<br>Epoch: 011, Loss: 0.4218, Test Acc: 0.8150<br>Epoch: 012, Loss: 0.3900, Test Acc: 0.8150<br>Epoch: 013, Loss: 0.4190, Test Acc: 0.8160<br>Epoch: 014, Loss: 0.4238, Test Acc: 0.8080<br>Early stopping triggered at epoch 14.</pre><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplotly.com%2F%7Ehalilibr.09%2F32.embed%3Fautosize%3Dtrue&amp;display_name=Plotly&amp;url=https%3A%2F%2Fchart-studio.plotly.com%2F%7Ehalilibr.09%2F32%2F&amp;image=https%3A%2F%2Fchart-studio.plotly.com%2Fstatic%2Fwebapp%2Fimages%2Fplotly-logo.8d56a320dbb8.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=plotly" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/7a1d020343ba34dcb2e4563ec0fc39b8/href">https://medium.com/media/7a1d020343ba34dcb2e4563ec0fc39b8/href</a></iframe><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplotly.com%2F%7Ehalilibr.09%2F34.embed%3Fautosize%3Dtrue&amp;display_name=Plotly&amp;url=https%3A%2F%2Fchart-studio.plotly.com%2F%7Ehalilibr.09%2F34%2F&amp;image=https%3A%2F%2Fchart-studio.plotly.com%2Fstatic%2Fwebapp%2Fimages%2Fplotly-logo.8d56a320dbb8.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=plotly" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/98b5a6ba507b07c49b3e2622628d8fb9/href">https://medium.com/media/98b5a6ba507b07c49b3e2622628d8fb9/href</a></iframe><iframe 
src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplotly.com%2F%7Ehalilibr.09%2F36.embed%3Fautosize%3Dtrue&amp;display_name=Plotly&amp;url=https%3A%2F%2Fchart-studio.plotly.com%2F%7Ehalilibr.09%2F36%2F&amp;image=https%3A%2F%2Fchart-studio.plotly.com%2Fstatic%2Fwebapp%2Fimages%2Fplotly-logo.8d56a320dbb8.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=plotly" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/5441327b43edc6a15c31e690129f578a/href">https://medium.com/media/5441327b43edc6a15c31e690129f578a/href</a></iframe><p>Feel free to click <a href="https://github.com/Halil3509/Simple-Python-Works/blob/main/GNN/GNN_Medium_Blog.ipynb"><strong><em>here </em></strong></a>to access the entire code.</p><p>That concludes my blog. I trust that you found it impactful. I’m open to both positive and negative feedback, so please don’t hesitate to share your thoughts. Your feedback is eagerly anticipated. Stay committed and dedicated.</p><h3>References</h3><ul><li><a href="https://www.datacamp.com/tutorial/comprehensive-introduction-graph-neural-networks-gnns-tutorial">https://www.datacamp.com/tutorial/comprehensive-introduction-graph-neural-networks-gnns-tutorial</a></li><li><a href="https://www.youtube.com/watch?v=SnRfBfXwLuY">https://www.youtube.com/watch?v=SnRfBfXwLuY</a></li><li><a href="https://nabila-abraham.medium.com/ohmygraphs-graph-attention-networks-b7562289ae4b">https://nabila-abraham.medium.com/ohmygraphs-graph-attention-networks-b7562289ae4b</a></li><li><a href="https://towardsdatascience.com/graph-convolutional-networks-introduction-to-gnns-24b3f60d6c95">https://towardsdatascience.com/graph-convolutional-networks-introduction-to-gnns-24b3f60d6c95</a></li><li>Petar Veličković (2018). Graph Attention Networks. <a href="https://arxiv.org/pdf/1710.10903.pdf">https://arxiv.org/pdf/1710.10903.pdf</a></li><li>Jie Zhou (2020). Graph neural networks: A review of methods and applications. 
<a href="https://arxiv.org/ftp/arxiv/papers/1812/1812.08434.pdf">https://arxiv.org/ftp/arxiv/papers/1812/1812.08434.pdf</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[K-Means]]></title>
            <link>https://halil7hatun.medium.com/k-means-7d48f4110bc?source=rss-a71bf613874c------2</link>
            <guid isPermaLink="false">https://medium.com/p/7d48f4110bc</guid>
            <category><![CDATA[k-means]]></category>
            <category><![CDATA[k-means-clustering]]></category>
            <category><![CDATA[elbow-method]]></category>
            <category><![CDATA[clustering]]></category>
            <category><![CDATA[unsupervised-learning]]></category>
            <dc:creator><![CDATA[Halil İbrahim Hatun]]></dc:creator>
            <pubDate>Thu, 22 Jun 2023 07:47:38 GMT</pubDate>
            <atom:updated>2023-07-03T14:39:11.805Z</atom:updated>
            <content:encoded><![CDATA[<h4>Understanding and applying the basic K-means algorithm</h4><h4>Clustering</h4><p>Clustering is a technique used in unsupervised machine learning to group similar data points together based on their inherent characteristics. It aims to find patterns or structures in the data without prior knowledge of the desired output or labels.</p><p>The goal of clustering is to divide a dataset into clusters or groups, where data points within the same cluster are more similar to each other than to those in other clusters. The similarity or dissimilarity between data points is typically measured using a distance or similarity metric, such as Euclidean distance or cosine similarity.</p><h4>K-Means</h4><p>K-means is a popular clustering algorithm that partitions a given dataset into k clusters. It is an iterative algorithm that alternates between two steps: assigning each data point to the nearest cluster centroid and updating each cluster centroid based on the data points assigned to it.</p><p>Here’s a high-level overview of the k-means algorithm:</p><ol><li><strong>Initialization:</strong></li></ol><ul><li>Choose the value of k, the number of clusters.</li><li>Randomly initialize k centroids in the feature space.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/420/1*vv3B30VAddFFG5d6JUN63w.png" /><figcaption>Randomly initialized k centroids plot</figcaption></figure><p>2. <strong>Assignment Step:</strong></p><ul><li>For each data point, calculate the distance to each centroid. 
The Euclidean distance is a common choice of metric.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/290/1*JVvy7x95i8dBAWOGk5-aAQ.png" /><figcaption>Euclidean Distance Formula</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/437/1*1fs8onWBcBgfQiE18n47cA.png" /><figcaption>Calculate distance plot</figcaption></figure><ul><li>Assign the data point to the cluster with the nearest centroid.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/422/1*dCmzZ3JbnqlEJnIpKbLL8w.png" /><figcaption>Assign the nearest centroid plot</figcaption></figure><p>3. <strong>Move Cluster Centroids Step:</strong></p><ul><li>Recalculate the centroid of each cluster by taking the mean of all data points assigned to that cluster.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/221/1*4UQiabJikQHMIQh1AocdIA.png" /><figcaption>Mean formula</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/413/1*26j4T2aYzLpwzeBW1pbD8g.png" /><figcaption>K-means move cluster centroids plot</figcaption></figure><p>4. <strong>Iteration:</strong></p><ul><li><strong>Repeat the assignment and move cluster centroids steps</strong> until the algorithm stops. It stops either when it has converged, meaning the centroids no longer change significantly, or when a maximum number of iterations is reached.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*psbhLB_qOFm7UdeYZoLgeA.gif" /><figcaption>K-means process</figcaption></figure><p>5. 
<strong>Output</strong>:</p><ul><li>The final output of the k-means algorithm is a set of k clusters, where each data point is assigned to one of the clusters.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/407/1*VmaIQn0q4hLTwGISU4bcDQ.png" /><figcaption>Final output plot</figcaption></figure><p>Here’s a simplified pseudocode for the k-means algorithm:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/7d794aa62df730b42cb60415ee93575e/href">https://medium.com/media/7d794aa62df730b42cb60415ee93575e/href</a></iframe><p>Here’s the Python implementation code for the k-means algorithm:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/301163daaaabc40b92de87a4b196ee3b/href">https://medium.com/media/301163daaaabc40b92de87a4b196ee3b/href</a></iframe><h4>How can we decide what the “k” (number of clusters) value is?</h4><p>Several factors affect the quality of the final output of the K-means clustering algorithm, and one of them is the choice of the number of clusters (K). Choosing too few clusters results in underfitting, while choosing too many can result in overfitting. Unfortunately, there is no definitive way to find the optimal number, but the following methods can guide the choice.</p><ol><li><strong>Elbow Method</strong></li></ol><p>It involves plotting the within-cluster sum of squares (WCSS) against the number of clusters and identifying the “elbow” point, the number of clusters beyond which adding more clusters improves clustering quality only marginally.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/715/1*eM1a6Y9n531GGMvRXIaaFQ.png" /><figcaption>Elbow Method Visualization</figcaption></figure><p>2. <strong>Silhouette method</strong></p><p>It provides a measure of how well each data point fits into its assigned cluster. 
The method calculates a silhouette coefficient for each data point, which is a value between -1 and 1.</p><p>The silhouette coefficient measures the cohesion and separation of a data point within its cluster. A coefficient close to +1 indicates that the data point is well-matched to its own cluster and poorly matched to neighboring clusters. A coefficient close to 0 suggests that the data point is on or very close to the decision boundary between neighboring clusters. A coefficient close to -1 indicates that the data point may have been assigned to the wrong cluster.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/821/1*-BMlkn5qIgw3pWuOTV8eYw.png" /><figcaption>Silhouette method notation</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/592/1*3eoSg_l6Kj2_A1uoJ862eQ.png" /><figcaption>Silhouette Score Formula</figcaption></figure><p>3. <strong>Gap Statistic Method</strong></p><p>It compares the observed within-cluster dispersion to the expected dispersion under a reference null distribution. The larger the gap between the expected and observed dispersions, the more distinct the clusters are considered to be.</p><p>Gap statistic = E[log(expected dispersion)] - log(observed dispersion), where the expectation is taken over datasets sampled from the reference distribution.</p><p>The gap statistic method provides a quantitative measure for selecting the number of clusters by comparing the clustering results to a reference null distribution. 
It helps avoid overfitting or underfitting the data by providing a statistically guided approach to determining the appropriate number of clusters for a given dataset.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/939/1*isV15N6zrgcHjg48V2qo6g.png" /><figcaption>Gap Statistic Method Plot</figcaption></figure><p>Here’s the implementation code for the gap statistic method in Python:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/95282443b3f7bb53d60c49c78b0eac20/href">https://medium.com/media/95282443b3f7bb53d60c49c78b0eac20/href</a></iframe><h4>Customer Segmentation Example</h4><p>Now, let’s perform customer segmentation on the Mall customer dataset in Python.</p><p>Let’s take a look at our data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/506/1*hX6Rzz7dw8n3DMgUBSLeew.png" /><figcaption>df.head()</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/295/1*H610VE3UGLlsCjN0tAjEig.png" /><figcaption>df.dtypes</figcaption></figure><p>As you can see, our data has three integer features (Age, Annual Income, and Spending Score) that we can use for clustering.</p><p>Let’s look at the distribution of these integer features.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplotly.com%2F%7Ehalilibr.09%2F4.embed%3Fautosize%3Dtrue&amp;display_name=Plotly&amp;url=https%3A%2F%2Fchart-studio.plotly.com%2F%7Ehalilibr.09%2F4%2F&amp;image=https%3A%2F%2Fchart-studio.plotly.com%2Fstatic%2Fwebapp%2Fimages%2Fplotly-logo.8d56a320dbb8.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=plotly" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/97ef497eda89af31dab302fcc8216c8b/href">https://medium.com/media/97ef497eda89af31dab302fcc8216c8b/href</a></iframe><p>Now, we choose the value of ‘k’ (the number of clusters).</p><ol><li><strong>Elbow Method</strong></li></ol><p>If 
we use the elbow method, the plot of SSE values against the number of clusters looks like this:</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplotly.com%2F%7Ehalilibr.09%2F4.embed%3Fautosize%3Dtrue&amp;display_name=Plotly&amp;url=https%3A%2F%2Fchart-studio.plotly.com%2F%7Ehalilibr.09%2F4%2F&amp;image=https%3A%2F%2Fchart-studio.plotly.com%2Fstatic%2Fwebapp%2Fimages%2Fplotly-logo.8d56a320dbb8.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=plotly" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/abcd54769d8a2687599abf1525d8c3ad/href">https://medium.com/media/abcd54769d8a2687599abf1525d8c3ad/href</a></iframe><p>From the graph, the elbow appears to be at around 6, although it is hard to pinpoint. The elbow method alone therefore does not give us a clear answer here.</p><p>2. <strong>Silhouette Method</strong></p><p>Let’s try the silhouette method.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplotly.com%2F%7Ehalilibr.09%2F4.embed%3Fautosize%3Dtrue&amp;display_name=Plotly&amp;url=https%3A%2F%2Fchart-studio.plotly.com%2F%7Ehalilibr.09%2F4%2F&amp;image=https%3A%2F%2Fchart-studio.plotly.com%2Fstatic%2Fwebapp%2Fimages%2Fplotly-logo.8d56a320dbb8.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=plotly" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/abcd54769d8a2687599abf1525d8c3ad/href">https://medium.com/media/abcd54769d8a2687599abf1525d8c3ad/href</a></iframe><p>As you can see, the silhouette method also points to six clusters.</p><p>3. 
<strong>Gap Statistic Method</strong></p><p>Finally, we try the gap statistic method.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplotly.com%2F%7Ehalilibr.09%2F4.embed%3Fautosize%3Dtrue&amp;display_name=Plotly&amp;url=https%3A%2F%2Fchart-studio.plotly.com%2F%7Ehalilibr.09%2F4%2F&amp;image=https%3A%2F%2Fchart-studio.plotly.com%2Fstatic%2Fwebapp%2Fimages%2Fplotly-logo.8d56a320dbb8.png&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=plotly" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/abcd54769d8a2687599abf1525d8c3ad/href">https://medium.com/media/abcd54769d8a2687599abf1525d8c3ad/href</a></iframe><p>The gap statistic method points to 6 and 10 as candidate values.</p><p>Therefore, let’s apply our k-means algorithm with six clusters and visualize the results.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fplotly.com%2F%7Ehalilibr.09%2F2.embed%3Fautosize%3Dtrue&amp;display_name=Plotly&amp;url=https%3A%2F%2Fchart-studio.plotly.com%2F%7Ehalilibr.09%2F2%2F&amp;image=https%3A%2F%2Fchart-studio.plotly.com%2Fstatic%2Fwebapp%2Fimages%2Fplotly-logo.8d56a320dbb8.png&amp;key=d04bfffea46d4aeda930ec88cc64b87c&amp;type=text%2Fhtml&amp;schema=plotly" width="600" height="400" frameborder="0" scrolling="no"><a href="https://medium.com/media/e41eeb313b131a1780039c88b1a246cd/href">https://medium.com/media/e41eeb313b131a1780039c88b1a246cd/href</a></iframe><h4>Conclusion</h4><p>In conclusion, k-means clustering is a powerful unsupervised machine learning algorithm that groups data points into distinct clusters based on their similarity. By iteratively assigning data points to the nearest centroid and updating the cluster centroids, k-means effectively partitions the data space. Its simplicity and efficiency make it a popular choice for various applications, such as customer segmentation, image compression, and anomaly detection. 
However, k-means has some limitations, such as sensitivity to initial centroid placement and dependence on the number of clusters (k) specified. Despite these challenges, understanding the principles and techniques behind k-means can greatly enhance our ability to extract meaningful insights from complex datasets and pave the way for more advanced clustering algorithms in the field of machine learning.</p><p>That’s all for now. Thank you for reading. If you want to look at the Customer Segmentation notebook that some of the plots in this post come from, you can find it <a href="https://www.kaggle.com/code/halilbrahimhatun/customer-segmentation-k-means"><em>here</em></a>.</p><p>I am open to any comments, positive or negative, so please don’t hesitate to share yours.</p><p>Stay well.</p><h4>References</h4><ul><li><a href="https://neptune.ai/blog/k-means-clustering">https://neptune.ai/blog/k-means-clustering</a></li><li><a href="https://towardsdatascience.com/k-means-clustering-algorithm-applications-evaluation-methods-and-drawbacks-aa03e644b48a">https://towardsdatascience.com/k-means-clustering-algorithm-applications-evaluation-methods-and-drawbacks-aa03e644b48a</a></li><li><a href="https://www.kaggle.com/code/kushal1996/customer-segmentation-k-means-analysis">https://www.kaggle.com/code/kushal1996/customer-segmentation-k-means-analysis</a></li><li><a href="https://medium.com/@ozturkfemre/unsupervised-learning-determination-of-cluster-number-be8842cdb11">https://medium.com/@ozturkfemre/unsupervised-learning-determination-of-cluster-number-be8842cdb11</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Convolutional Neural Networks (CNNs)]]></title>
            <link>https://halil7hatun.medium.com/convolutional-neural-networks-cnns-95321b1f63ff?source=rss-a71bf613874c------2</link>
            <guid isPermaLink="false">https://medium.com/p/95321b1f63ff</guid>
            <category><![CDATA[computer-vision]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[cnn]]></category>
            <category><![CDATA[convolutional-network]]></category>
            <dc:creator><![CDATA[Halil İbrahim Hatun]]></dc:creator>
            <pubDate>Fri, 16 Jun 2023 15:58:14 GMT</pubDate>
            <atom:updated>2023-06-17T05:20:30.532Z</atom:updated>
            <content:encoded><![CDATA[<h4>Understanding The Basic CNN Structure</h4><h4>What is an image?</h4><p>As almost everybody knows, an image is a grid of pixels, each with its own color. Each pixel’s color is a combination of <strong>three primary colors (RGB or BGR)</strong> mixed according to certain weights.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/370/1*Z7DBqrqsDdoioDkv5on0EA.jpeg" /><figcaption>Figure 1. RGB images</figcaption></figure><p>The image you see above has size [10, 5, 3] (x, y, color channels). If this image were grayscale, the number of color channels would be 1, because each pixel would hold a single intensity value.</p><h4>What is a Convolutional Neural Network (CNN)?</h4><p>A CNN (Convolutional Neural Network) is a type of artificial neural network that is specifically designed for processing grid-like data, such as images or sequences. CNNs are widely used in computer vision tasks, including image classification, object detection, image segmentation, and more.</p><p>A CNN image classifier takes an input image, processes it, and classifies it into one of several categories (e.g., Dog, Cat, Tiger, Lion). A computer sees an input image as an array of pixels whose size depends on the image resolution.</p><p>CNNs are structured with <strong>layers of interconnected artificial neurons, including convolutional layers, pooling layers, and fully connected layers</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cZKKqiASdIfCy_qvkZ0pSA.jpeg" /><figcaption>Figure 2. CNN Sample</figcaption></figure><p>The purpose of performing convolution is to extract features or patterns from input data. Convolution is a fundamental operation in various domains, including image processing, signal processing, and deep learning.</p><p>In image processing, convolution is used to apply filters or kernels to an image. 
These filters can <strong>enhance certain features</strong> of an image, such as edges or textures, or perform tasks like blurring or sharpening. By convolving an image with different filters, we can <strong>highlight specific characteristics and extract relevant information.</strong></p><p>Let’s dive into the building blocks of a CNN.</p><h4>Convolutional Layer</h4><p>Convolution is performed by sliding the kernel over the input image. At each position, the kernel values are multiplied element-wise with the pixels they cover, and the products are summed into a single output value. Repeating this for every position in the image produces a new output image.</p><p>Here’s the formula for convolution:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/552/1*kiaudYdRkTriNGYhG4jvew.png" /><figcaption>Figure 3. Convolution demonstration</figcaption></figure><p>The example below performs convolution with a 3x3 kernel.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*D6iRfzDkz-sEzyjYoVZ73w.gif" /><figcaption>Figure 4. Convolution GIF</figcaption></figure><p>The result obtained after the kernel has visited every position is called a <strong>“Feature Map”.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/580/1*xFbdZQZwAqsJWE9XeYPlCA.gif" /><figcaption>Figure 5. Convolution Neural Network GIF</figcaption></figure><p>Convolving an image with different filters can perform operations such as edge detection, blurring, and sharpening. The example below shows the images produced by applying different types of filters (kernels).</p><p>Let’s examine the filtering results on a picture of Tarkan Gözübüyük, the beloved bass guitarist of the band Pentagram (Mezarkabul).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*G0W5ms1rzA1czH2TpkLPog.png" /><figcaption>Figure 6. Filtering samples</figcaption></figure><h4>Strides</h4><p>Stride is the number of pixels by which the filter shifts over the input matrix. 
When the stride is 1, we move the filter 1 pixel at a time. When the stride is 2, we move the filter 2 pixels at a time, and so on. The figure below shows how convolution works with a stride of 2.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HNsLVmAg5Q9dVu2iEFbjug.jpeg" /><figcaption>Figure 7. Strides</figcaption></figure><h4>Padding</h4><p>Padding is a technique used to control the spatial dimensions of a feature map after a convolution operation. It involves adding extra pixels around the border of the input feature map before convolution.</p><p><strong>There are two common padding modes:</strong></p><ul><li><strong>Valid Padding</strong>: With valid padding, no padding is added to the input feature map, so the output feature map is smaller than the input feature map. This is useful when we want to reduce the spatial dimensions of the feature maps.</li><li><strong>Same Padding</strong>: With same padding, padding is added to the input feature map so that the output feature map has the same size as the input feature map. This is useful when we want to preserve the spatial dimensions of the feature maps.</li></ul><p>The number of pixels to add can be calculated from the size of the kernel and the desired size of the output feature map. <strong>The most common choice is zero-padding</strong>, which adds zeros around the borders of the input feature map.</p><p>Padding reduces the loss of information at the borders of the input feature map and can improve the performance of the model. However, it also increases the computational cost of the convolution operation.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/805/1*R21aCTxF0nOo0D9_ax_bbw.png" /><figcaption>Figure 8. 
Padding</figcaption></figure><h4>Pooling</h4><p>Pooling in convolutional neural networks is a technique for generalizing the features extracted by convolutional filters and helping the network recognize features independent of their location in the image. It does this by downsampling each feature map, for example by keeping only the maximum (max pooling) or the average (average pooling) of each small window.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/548/1*wY-3mbuhbCLz8mp4OGm6ig.png" /><figcaption>Figure 9. Pooling</figcaption></figure><h4>Activation Functions</h4><p>Activation functions are mathematical functions applied to the output of a neuron or a neural network layer to introduce non-linearity into the network. These functions determine the output of a neuron or a layer based on its weighted inputs and allow neural networks to model complex relationships between inputs and outputs.</p><p><strong>Sigmoid: </strong>Maps any input to a value between 0 and 1, so it is commonly used in the output layer of a CNN for binary classification.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SECTTtwbtaGCFuL32LcmYQ.png" /><figcaption>Figure 10. Sigmoid</figcaption></figure><p><strong>tanh: </strong>The tanh function is very similar to the sigmoid function. The main difference is that it is symmetric around the origin, with outputs ranging from -1 to 1.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/356/1*k2PfuMtvwf1Ih3vQ9ITXtg.png" /><figcaption>Figure 11. tanh</figcaption></figure><p><strong>Softmax: </strong>It is used in multinomial logistic regression and is often used as the last activation function of a neural network to normalize the output of the network to a probability distribution over the predicted output classes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/737/1*R3AnH6R1r4cMw3cz_fgUXg.png" /><figcaption>Figure 12. 
Softmax</figcaption></figure><p><strong>ReLU: </strong>ReLU outputs zero for negative inputs and passes positive inputs through unchanged. Its main advantage over other activation functions is that it does not activate all the neurons at the same time.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/830/1*eQvvnl0lRj994V0wem6DTA.png" /><figcaption>Figure 13. ReLU</figcaption></figure><h4>Flatten Layer</h4><p>The flatten layer reshapes its input tensor into a one-dimensional vector per sample, collapsing all the dimensions except the batch dimension. The output of the flatten layer has the shape (batch_size, flattened_size), where flattened_size is the product of the collapsed dimensions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/638/1*dv2z2i1UfC76mcNp8qWReA.png" /><figcaption>Figure 14. Flattening</figcaption></figure><h4><strong>Fully Connected Layer</strong></h4><p>A fully connected layer, also known as a dense layer, is a type of layer in a neural network where each neuron is connected to every neuron in the previous layer. In other words, all the outputs from the previous layer serve as inputs to each neuron in the current layer.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*63sGPbvLLpvlD16hG1bvmA.gif" /><figcaption>Figure 15. Fully Connected Layer</figcaption></figure><h3>Conclusion</h3><ul><li>CNNs possess spatial invariance properties, meaning they can recognize patterns and objects regardless of their location in an image. This is achieved through the use of shared weights in the convolutional layers, which allow the network to detect similar patterns at different positions and make CNNs robust to translation and small variations in the input data.</li><li>CNNs can capture spatial relationships and exploit local patterns effectively, enhancing their ability to learn intricate structures in images.</li></ul><p>Thank you for reading. I hope the blog has been useful to you. 
I welcome any feedback, positive or negative.</p><p>Stay well.</p><h3><strong>References</strong></h3><ul><li><a href="https://medium.com/@draj0718/convolutional-neural-networks-cnn-architectures-explained-716fb197b243">https://medium.com/@draj0718/convolutional-neural-networks-cnn-architectures-explained-716fb197b243</a></li><li><a href="https://medium.com/@tuncerergin/convolutional-neural-network-convnet-yada-cnn-nedir-nasil-calisir-97a0f5d34cad">https://medium.com/@tuncerergin/convolutional-neural-network-convnet-yada-cnn-nedir-nasil-calisir-97a0f5d34cad</a></li><li><a href="https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148">https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148</a></li><li>geeksforgeeks.org/cnn-introduction-to-padding/</li><li><a href="https://towardsdatascience.com/softmax-activation-function-explained-a7e1bc3ad60">https://towardsdatascience.com/softmax-activation-function-explained-a7e1bc3ad60</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Operating System: Threads]]></title>
            <link>https://halil7hatun.medium.com/operating-system-threads-cd8ceca1a8d1?source=rss-a71bf613874c------2</link>
            <guid isPermaLink="false">https://medium.com/p/cd8ceca1a8d1</guid>
            <category><![CDATA[amdahls-law]]></category>
            <category><![CDATA[threads]]></category>
            <category><![CDATA[operating-systems]]></category>
            <category><![CDATA[processes-and-threads]]></category>
            <category><![CDATA[linux]]></category>
            <dc:creator><![CDATA[Halil İbrahim Hatun]]></dc:creator>
            <pubDate>Thu, 18 May 2023 14:21:11 GMT</pubDate>
            <atom:updated>2023-05-18T14:21:11.304Z</atom:updated>
            <content:encoded><![CDATA[<p>Hello, this is Halil Ibrahim. Today&#39;s topic is threads. Threads are one of the most interesting parts of an operating system. Before I start, I want to share a quote for this post.</p><p><strong>“The future belongs to those who believe in the beauty of their dreams.” <em>Eleanor Roosevelt</em></strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/649/1*YFWurQ_LKeAsAHMOiaLWFw.png" /></figure><p>And... let’s start.</p><p>In operating systems, a thread is the smallest unit of execution within a process. A process is an instance of a program that is being executed by the operating system, and it can have one or more threads. Each thread has its own program counter, stack, and register set, which allow it to execute code independently of other threads in the same process.</p><p>Threads are used to achieve concurrency within a process, allowing multiple tasks to make progress at the same time. By using threads, a program can perform multiple operations concurrently, such as listening for user input while processing data in the background. This can improve the performance and responsiveness of the program, even on a single processor core, because one thread can keep working while another is waiting.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/796/1*-kPCnnUbxtqZJk0-OUvNxA.png" /><figcaption>2. Single-threaded and multithreaded processes.</figcaption></figure><p>Let’s look at a concrete example that uses threads. Almost everybody uses web servers. Have you ever thought about how web servers handle their workload?</p><p>A web server accepts client requests for web pages, images, sound, and so forth. A busy web server may have several (perhaps thousands of) clients concurrently accessing it. 
If the web server ran as a traditional single-threaded process, it would be able to service only one client at a time, and a client might have to wait a very long time for its request to be serviced.</p><p>One solution is to have the server run as a single process that accepts requests. When the server receives a request, it creates a separate process to service that request. In fact, this process-creation method was in common use before threads became popular. Process creation is time-consuming and resource-intensive, however. If the new process will perform the same tasks as the existing process, why incur all that overhead? It is generally more efficient to use one process that contains multiple threads. If the web-server process is multithreaded, the server will create a separate thread that listens for client requests. When a request is made, rather than creating another process, the server creates a new thread to service the request and resumes listening for additional requests.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/806/1*8-SN_o94s6RAfo8-vmU1pg.png" /><figcaption>3. Multithreaded server architecture.</figcaption></figure><h4>Benefits of Multithreaded Programming</h4><ol><li><strong>Responsiveness</strong>: Multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performing a lengthy operation, thereby increasing responsiveness to the user.</li><li><strong>Resource sharing:</strong> Processes can share resources only through techniques such as shared memory and message passing. Such techniques must be explicitly arranged by the programmer. However, threads share the memory and the resources of the process to which they belong by default. 
The benefit of sharing code and data is that it allows an application to have several different threads of activity within the same address space.</li><li><strong>Economy:</strong> Allocating memory and resources for process creation is costly. Because threads share the resources of the process to which they belong, it is more economical to create and context-switch threads. In general, thread creation consumes less time and memory than process creation.</li><li><strong>Scalability:</strong> The benefits of multithreading can be even greater in a multiprocessor architecture, where threads may be running in parallel on different processing cores. A single-threaded process can run on only one processor, regardless of how many are available. We explore this issue further in the following section.</li></ol><h4>Multicore Programming</h4><p>Multicore programming refers to the process of developing software that can take advantage of the processing power provided by multiple processor cores within a single computer or device. In recent years, the number of processor cores in modern computers and devices has been steadily increasing, with some systems now having dozens or even hundreds of cores.</p><p>Multicore programming allows software to distribute processing tasks across multiple cores, which can improve performance and reduce the time required to complete complex tasks. However, writing software that effectively utilizes multiple cores can be challenging, as it requires a different approach to programming than traditional single-threaded programs.</p><ul><li><strong>Parallelism</strong> implies that a system can perform more than one task simultaneously.</li><li><strong>Concurrency </strong>means that more than one task can make progress, even if the tasks only take turns on a single core.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/703/1*HJiH36CmFJNCX3f-3rrijA.png" /><figcaption>4. 
Concurrent execution on a single-core system.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/504/1*8IQ6nxVm8Lm_qyNo7KX7Bg.png" /><figcaption>5. Parallel execution on a multicore system.</figcaption></figure><h4>Amdahl’s Law</h4><p>Amdahl’s Law is a formula that identifies potential performance gains from adding additional computing cores to an application that has both serial (nonparallel) and parallel components. If S is the portion of the application that must be performed serially on a system with N processing cores, the formula appears as follows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/199/1*Fhsn-0Po1VEqxm5PK_y23A.png" /><figcaption>6. Amdahl’s Law Formula</figcaption></figure><p>As an example, assume we have an application that is 75 percent parallel and 25 percent serial, so S = 0.25. If we run this application on a system with two processing cores, we can get a speedup of 1 / (0.25 + 0.75 / 2) = 1.6 times. If we add two additional cores (for a total of four), the speedup is 1 / (0.25 + 0.75 / 4), or about 2.28 times. Below is a graph illustrating Amdahl’s Law in several different scenarios.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/455/1*80EeOhFCi9GnVzWc6e5tFw.png" /><figcaption>7. Amdahl’s Law graphic</figcaption></figure><p>The serial portion takes the same amount of time no matter how many cores are added, so it sets an upper bound on the achievable speedup: as N grows, the speedup approaches 1 / S.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*DG5O_cNP19qrpKxBo71HHw.png" /><figcaption>8. 
Idea behind Amdahl’s Law</figcaption></figure><h4>Types of Parallelism</h4><ol><li><strong>Data Parallelism:</strong> Data parallelism focuses on distributing subsets of the same data across multiple computing cores and performing the same operation on each core.</li><li><strong>Task Parallelism: </strong>Task parallelism involves distributing not data but tasks (threads) across multiple computing cores. Each thread is performing a unique operation. 
Different threads may be operating on the same data, or they may be operating on different data.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/604/1*JLeQGDE4CBq14oubTxLnQw.png" /><figcaption>9. Data and task parallelism</figcaption></figure><p><strong>User-level threads </strong>are managed entirely by a user-level thread library, without any support from the operating system kernel. This means that the thread library is responsible for thread creation, scheduling, and synchronization. User-level threads are typically lightweight and efficient, as they do not require kernel-level intervention for context switching. However, they can also be limited in their capabilities: because the kernel is unaware of them, a blocking system call made by one thread can block every thread in the process.</p><p><strong>Kernel-level threads</strong>, on the other hand, are managed directly by the operating system kernel. This means that the kernel is responsible for thread creation, scheduling, and synchronization. Kernel-level threads are typically more powerful and flexible, as the kernel can schedule each of them independently. However, they can also be less efficient than user-level threads, as they require more overhead for context switching.</p><h4>Multithreading Models</h4><ol><li><strong>One-to-one Model: </strong>The one-to-one model maps each user thread to a kernel thread. This means that many threads can run in parallel on multiprocessors, and other threads can keep running when one thread makes a blocking system call.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/432/1*uCIKHi7PjYY39XToiwA28w.png" /></figure><p>2. <strong>Many-to-One Model:</strong> The many-to-one model maps many user threads to a single kernel thread. This model is quite efficient because thread management is done in user space.</p><p>A disadvantage of the many-to-one model is that a blocking system call made by one thread blocks the entire process. 
Also, multiple threads cannot run in parallel, as only one thread can access the kernel at a time.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/448/1*pkRRwLYHh6CjpyR_bvi0Hw.png" /></figure><p>3. <strong>Many-to-Many Model: </strong>The many-to-many model maps many user threads to an equal or smaller number of kernel threads. The number of kernel threads depends on the application or machine.</p><p>The many-to-many model does not have the disadvantages of the one-to-one model or the many-to-one model. There can be as many user threads as required, and their corresponding kernel threads can run in parallel on a multiprocessor.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/475/1*XRfVPtOWwvc7scMBoWSGwA.png" /></figure><h4>Pthreads in Linux</h4><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/179fd780ccb6f0f3ae2814d2ac193ee9/href">https://medium.com/media/179fd780ccb6f0f3ae2814d2ac193ee9/href</a></iframe><p>The C program demonstrates the basic Pthreads API for constructing a multithreaded program that calculates the summation of a non-negative integer in a separate thread. In a Pthreads program, separate threads begin execution in a specified function. In the code above, this is the runner() function. When this program begins, a single thread of control begins in main(). After some initialization, main() creates a second thread that begins control in the runner() function. Both threads share the global data sum. Let’s look more closely at this program. All Pthreads programs must include the pthread.h header file. The statement pthread_t tid declares the identifier for the thread we will create. Each thread has a set of attributes, including stack size and scheduling information. The pthread_attr_t attr declaration represents the attributes for the thread. We set the attributes in the function call pthread_attr_init(&amp;attr). 
Because we did not explicitly set any attributes, we use the default attributes provided. A separate thread is created with the pthread_create() function call. In addition to passing the thread identifier and the attributes for the thread, we also pass the name of the function where the new thread will begin execution, in this case the runner() function. Last, we pass the integer parameter that was provided on the command line, argv[1]. At this point, the program has two threads: the initial (or parent) thread in main() and the summation (or child) thread performing the summation operation in the runner() function. This program follows the thread create/join strategy, whereby after creating the summation thread, <strong>the parent thread will wait for it to terminate by calling the pthread_join() function.</strong> The summation thread will terminate when it calls the function pthread_exit(). Once the summation thread has returned, the parent thread will output the value of the shared data sum.</p><h3>Conclusion</h3><ul><li>Threads are a fundamental concept in computer science that allows for concurrent execution of multiple tasks within a single process.</li><li>They are lightweight, independent units of execution that share the same memory space.</li><li>Threads enable parallelism and can improve the performance of applications by taking advantage of multiple CPU cores.</li><li>They can communicate and synchronize with each other through various mechanisms like shared variables, locks, and semaphores. 
However, managing threads can be complex and prone to issues like race conditions and deadlocks, requiring careful design and synchronization techniques to ensure correct and efficient execution.</li></ul><h3>References</h3><ul><li><a href="https://www.tutorialspoint.com/multi-threading-models">https://www.tutorialspoint.com/multi-threading-models</a></li><li>“Operating System Concepts” by Silberschatz, Galvin, and Gagne, 10th edition.</li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[R-Squared]]></title>
            <link>https://halil7hatun.medium.com/r-squared-3e9e4603ecd7?source=rss-a71bf613874c------2</link>
            <guid isPermaLink="false">https://medium.com/p/3e9e4603ecd7</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[r-squared]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[Halil İbrahim Hatun]]></dc:creator>
            <pubDate>Mon, 19 Sep 2022 07:47:15 GMT</pubDate>
            <atom:updated>2022-09-19T07:47:15.699Z</atom:updated>
            <content:encoded><![CDATA[<p>In data science, we build and use regression models, which estimate one variable (the dependent variable) from one or more other variables (the independent variables). So, how do we measure the performance of the regression model we have created?<br>One of the metrics used to measure regression performance is <strong>R-squared</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uIam_3HO1c_FXkGJYgXmTg.png" /></figure><h3>What is R-Squared?</h3><p>Let’s go through an example to explain what R-squared is. We have some data. After the necessary preprocessing, we give this data to a regression model, which fits a regression line. So is this regression line appropriate? How good is it, and could it be better? R-squared is a metric that allows us to answer such questions. It expresses how well the regression line fits the data as a single numerical value.</p><h3>How to calculate R-Squared?</h3><p>R-squared is obtained by dividing the sum of the squared distances of each point from the regression line by the sum of the squared distances of each point from the mean, and subtracting the result from 1.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*tCegJAuKhbu5haxmX58PqA.png" /></figure><p>There are some exceptions when interpreting the R-squared metric. For example, logically, if R-squared is low, we think that the fit of the model is bad, and if it is high, we think that the fit of the model is good. But this is not always the case, and it varies from dataset to dataset. Therefore, it is not correct to evaluate the performance of the model with the R-squared metric alone.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HU5viIlds14a6oz8RPlg9Q.png" /></figure><h3>Adjusted R-Squared</h3><p>As the number of independent variables increases, the R-squared metric never decreases, even when the added variables are irrelevant. 
For example, let’s say we’re calculating the R-squared of a house price estimate. Next, we add an attribute called the average height of previous homeowners to this house price data. This attribute has nothing to do with house prices, but R-squared will not go down and will usually be slightly higher. In other words, the model would suggest that houses whose previous owners were taller on average are more expensive. This conclusion is wrong. We use the <strong>Adjusted R-squared metric to correct for this as much as possible. This metric takes the number of independent variables into account.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xwJTCLO-n8Qozr4tN0dmCg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JAZDRP2AIakvXKSehV_xJQ.png" /></figure><h4>Let’s examine the R-Squared and Adjusted R-Squared metrics by applying them</h4><p>First, we import the libraries and methods that we will use.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/718356623352af8f8a10ca7c8311e6d5/href">https://medium.com/media/718356623352af8f8a10ca7c8311e6d5/href</a></iframe><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/657c11a227802fc049aad160165810eb/href">https://medium.com/media/657c11a227802fc049aad160165810eb/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mkrTPr6LkLdiHAkdH1CSfA.png" /></figure><p>We drop the “Posted On” feature because it is not needed for the regression model.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/992efd0db9eb969c939c5219cc2a6002/href">https://medium.com/media/992efd0db9eb969c939c5219cc2a6002/href</a></iframe><h3>Preprocessing Part</h3><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a 
href="https://medium.com/media/b44087d63bb9d5f3abf6b2a8916342f3/href">https://medium.com/media/b44087d63bb9d5f3abf6b2a8916342f3/href</a></iframe><h3>Model Split Part</h3><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/94ce10808d1798d34e50490b32a46668/href">https://medium.com/media/94ce10808d1798d34e50490b32a46668/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/528/1*flLl0gKC71JPywepgzF1RA.png" /><figcaption>shape control image</figcaption></figure><h3>Regression Part</h3><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/9f92941b45aec644398d6e24cd291f52/href">https://medium.com/media/9f92941b45aec644398d6e24cd291f52/href</a></iframe><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/4003ca76225c704b0e83a7f058630e56/href">https://medium.com/media/4003ca76225c704b0e83a7f058630e56/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YkwQ0oxUTtbR7JgOBXyfzA.png" /><figcaption>results of the main regression part</figcaption></figure><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/9c70275b07cd6db183c5114958996d95/href">https://medium.com/media/9c70275b07cd6db183c5114958996d95/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/680/1*aGufmVvS7i-7olDHE7__VQ.png" /><figcaption>scoring df</figcaption></figure><h3>Visualization of R-Squared And Adjusted R-Squared Values</h3><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/65826da7b632d267cdf08dc0015d58f4/href">https://medium.com/media/65826da7b632d267cdf08dc0015d58f4/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/967/1*RsFuJ6Ke189EAEltulge4Q.png" /></figure><h3><strong>References</strong></h3><ul><li><a 
href="https://www.investopedia.com/terms/r/r-squared.asp">https://www.investopedia.com/terms/r/r-squared.asp</a></li><li><a href="https://www.kaggle.com/code/jyotiprasadpal/assessing-the-accuracy-with-r2-and-adjusted-r2/notebook">https://www.kaggle.com/code/jyotiprasadpal/assessing-the-accuracy-with-r2-and-adjusted-r2/notebook</a></li><li><a href="https://corporatefinanceinstitute.com/resources/knowledge/other/r-squared/">https://corporatefinanceinstitute.com/resources/knowledge/other/r-squared/</a></li></ul>]]></content:encoded>
        </item>
    </channel>
</rss>