<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Ray Flanagan on Medium]]></title>
        <description><![CDATA[Stories by Ray Flanagan on Medium]]></description>
        <link>https://medium.com/@rfdev?source=rss-7651da13d57f------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*_r7VG1h1wum1Li0FYPNWsA.png</url>
            <title>Stories by Ray Flanagan on Medium</title>
            <link>https://medium.com/@rfdev?source=rss-7651da13d57f------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Fri, 05 Jun 2026 18:55:50 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@rfdev/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Word Vectors & Word2Vec]]></title>
            <link>https://medium.com/@rfdev/word-vectors-c6f53ae573e7?source=rss-7651da13d57f------2</link>
            <guid isPermaLink="false">https://medium.com/p/c6f53ae573e7</guid>
            <category><![CDATA[nlp]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Ray Flanagan]]></dc:creator>
            <pubDate>Thu, 21 Dec 2023 16:13:07 GMT</pubDate>
            <atom:updated>2023-12-26T16:27:04.501Z</atom:updated>
            <content:encoded><![CDATA[<p>This article focuses on word vectors and the Word2Vec algorithm.</p><p><strong>Word vectors are numerical representations (vectors) of words.</strong></p><p>What’s the point? Can’t we just use raw words?</p><p>Before word vectors, working with raw words was the usual approach to NLP problems. However, computers had a hard time understanding the relationships between words and phrases. Because word vectors are expressed in a computer’s native language (numbers), language models can use them to capture meaning and relationships between words.</p><p>Let’s look at an example:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7woXYINh3HVDz7exqNrvhw.png" /></figure><p>Here, each word in the graph is represented by a vector (like the one shown on the left).</p><p>Now, the amazing thing about word vectors (also called word embeddings) is that they have relationships with one another. Similar words are close together in vector space. For example, come and go are related words and have similar vectors.</p><p>Since word embeddings are vectors, we can also perform mathematical operations on them, which can yield some pretty interesting results. One classic example is the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/507/1*mt5-a4ABgbrrtalfabdbZw.png" /></figure><p>Here, we have three vectors: one for king, one for man, and one for woman. We perform the operation king - man + woman, which yields a vector similar to the one for queen.</p><p>This shows that word vectors can retain semantic meaning and the relationships between words.</p><p><strong>How can we create word vectors? Word2Vec</strong></p><p>Let’s outline how Word2Vec creates word vectors.</p><ol><li>Each word in the corpus is given a random vector.</li><li>For each word in the corpus, calculate the probability of the current word (o) given the context words (c). 
This is done by calculating the similarity between the word vector representing o and the word vectors representing c (typically a dot product, passed through a softmax to turn it into a probability).</li><li>The loss is then calculated based on these probabilities. The objective is to <strong>minimize the loss</strong> by maximizing the probability in Step 2.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/789/1*ZGcN1DP37IKio6VykpxEiw.png" /><figcaption>Loss function for Word2Vec</figcaption></figure><p>4. Finally, use gradient descent to update the word vectors so that the loss decreases.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Transformer Architecture in NLP]]></title>
            <link>https://medium.com/@rfdev/transformer-architecture-in-nlp-ac10bd850f97?source=rss-7651da13d57f------2</link>
            <guid isPermaLink="false">https://medium.com/p/ac10bd850f97</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[transformers]]></category>
            <category><![CDATA[nlp]]></category>
            <dc:creator><![CDATA[Ray Flanagan]]></dc:creator>
            <pubDate>Sun, 11 Dec 2022 18:34:45 GMT</pubDate>
            <atom:updated>2022-12-11T18:34:45.201Z</atom:updated>
            <content:encoded><![CDATA[<p>The Transformer, based solely on attention, is an alternative to an RNN model architecture. Many recent popular NLP models use the Transformer architecture (GPT and BERT, for example).</p><p>This article focuses on the Transformer architecture as outlined in the paper Attention Is All You Need (Vaswani et al., 2017).</p><p><strong>Model Architecture</strong></p><figure><img alt="Transformer Architecture" src="https://cdn-images-1.medium.com/max/1024/0*xD79MR28JW6BID7R.png" /><figcaption>Transformer Architecture (Vaswani et al., 2017)</figcaption></figure><p>The Transformer in the diagram is an encoder-decoder model.</p><p><strong>Encoder</strong></p><p>The base architecture for the Encoder is what is boxed to the left. It can be thought of as “encoding” or understanding information from the input.</p><p>The two main parts of the stack are:</p><ul><li>Multi-head attention layer</li><li>Feed-forward network</li></ul><p><strong>Decoder</strong></p><p>The base architecture for the Decoder is what is boxed to the right and is used to generate output (in an encoder-decoder architecture, the decoder generates its output from the encoder’s representation of the input).</p><p>The stack is very similar to the encoder. The main difference is that there is an extra layer at the beginning, Masked Multi-head Attention, and the multi-head attention layer that follows it attends over the encoder’s output.</p><p><strong>Attention</strong></p><p>With earlier RNN architectures, it was hard to pay “attention” to words that come much earlier or later in the input sentence. Transformers use self-attention to fix this problem. 
For each word, self-attention calculates a weight for every other word in the sequence, both before and after the current word.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*8XkAC9wPORuG88Gf.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/576/0*WCP3WIhrVrAn9I-H.png" /><figcaption>Multi-head Attention (Vaswani et al., 2017)</figcaption></figure><p>Each of the Multi-head attention layers is made up of several attention heads that run in parallel, each computing Scaled Dot-Product Attention. Let’s look into these layers and see how they work.</p><p><strong>Scaled Dot-Product Attention</strong></p><p>Although there are many ways to calculate attention, the method that is proposed in the paper is called Scaled Dot-Product Attention.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/265/1*_zPrl5ekhrbwZlgr3vmpVQ.png" /></figure><p>Here, the similarity measure is the dot product between Q and K. The dot product is then divided by the square root of the dimension of the key vectors (d_k). The scaling keeps the dot products from growing large as d_k increases, which would otherwise push the softmax into regions where its gradients are very small.</p><p>The softmax of the scaled dot product returns a probability distribution: these are the attention weights. The weights are then multiplied by V to get the output of the attention layer.</p><p><strong>Masked Attention</strong></p><p>As mentioned previously, the decoder adds an extra masked-attention layer. Masked attention applies a mask (a binary tensor marking which positions are allowed) to the attention scores before the softmax, so that each position in the decoder can only “pay attention” to previous words.</p><p>Now that we’ve seen what a Transformer is and what it is made up of, let&#39;s see some examples of how it&#39;s used.</p><p><strong>Encoder and Decoder models</strong></p><p>The best example of an Encoder and Decoder model is a translation model. 
The encoder understands the input text, and the decoder takes the encoded input and translates it into the output language.</p><p><strong>Encoder models</strong></p><p>Because encoders attempt to understand the meaning of text, they are used in text analysis and classification. BERT is a well-known example.</p><p><strong>Decoder models</strong></p><p>Probably the best example of a decoder model is the GPT family. These models are primarily used for text generation.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Python Walrus Operator]]></title>
            <link>https://medium.com/@rfdev/python-walrus-operator-ea27d1335b31?source=rss-7651da13d57f------2</link>
            <guid isPermaLink="false">https://medium.com/p/ea27d1335b31</guid>
            <category><![CDATA[python3]]></category>
            <category><![CDATA[walrus-operator]]></category>
            <category><![CDATA[python-programming]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[Ray Flanagan]]></dc:creator>
            <pubDate>Thu, 24 Nov 2022 15:50:46 GMT</pubDate>
            <atom:updated>2022-11-24T15:50:46.201Z</atom:updated>
            <content:encoded><![CDATA[<p>The walrus operator was introduced in Python 3.8 by PEP 572. Because it resembles a walrus (: being the eyes and = being the tusks), it is known as the walrus operator.</p><p>But what does := do, why is it useful, and when should you use it?</p><p><strong>What does it do?</strong></p><p>From the Python docs:</p><blockquote>:= assigns values to variables as part of a larger expression</blockquote><p>But what does this mean? Simply put, it lets you assign a variable inside an expression, such as the condition of an if statement.</p><p>Let’s see an example from the docs with and without using the walrus operator.</p><pre># Without the walrus operator: len(array) is evaluated twice<br>if len(array) &gt; 10:<br>  print(f&#39;List is too long ({len(array)} elements, expected &lt;= 10)&#39;)<br><br># With the walrus operator: len(array) is evaluated once and bound to n<br>if (n := len(array)) &gt; 10:<br>  print(f&#39;List is too long ({n} elements, expected &lt;= 10)&#39;)</pre><p>Rather than calling len twice, the second version reuses the variable n, which is defined with the walrus operator.</p><p>Although the change here seems small, the benefit of using the walrus operator is more significant when more complex expressions are used.</p><p>For example, with Django:</p><pre># example 1: the filter expression is written twice<br><br>if Users.objects.filter(age__gte=10).exists():<br>  users = Users.objects.filter(age__gte=10)<br>  print(users.count())<br>  ...<br><br># versus example 2: the filter expression is written once<br><br>if (users := Users.objects.filter(age__gte=10)).exists():<br>  print(users.count())</pre><p>Here we can see the real benefit of the walrus operator. Rather than writing the filter twice, we only have to write it once. This is extremely beneficial if you want to change the filter; it reduces the number of places where you need to edit the statement.</p><p>You can also avoid repeating statements without the walrus operator by defining a variable outside of the if statement. 
For example:</p><pre># example 3<br><br>users = Users.objects.filter(age__gte=10)<br><br>if users.exists():<br>  print(users.count())</pre><p>I find two problems with this solution.</p><ol><li>It takes an extra line compared to the walrus operator solution.</li><li>If you are only planning to use the users variable within the if statement, that intention is not immediately clear, because the variable is defined outside of the if statement.</li></ol><p><strong>Why is this useful?</strong></p><ol><li>The walrus operator reduces the number of times an expression has to be repeated in and around an if statement.</li><li>Compared to example 3, the walrus operator solution makes it clear that the variable will be used inside of the if statement.</li></ol><p><strong>When should you use it?</strong></p><p>When you find yourself repeating the same expression in the condition and in the body of an if statement, you should consider using the walrus operator.</p><p>However, I would recommend resorting to example 3 if you plan to use the variable outside of the if statement as well. For example:</p><pre>users = Users.objects.filter(age__gte=10)<br><br>if users.exists():<br>  print(users.count())<br><br>...<br><br>for user in users:<br>  ...</pre><p>As compared to:</p><pre>if (users := Users.objects.filter(age__gte=10)).exists():<br>  print(users.count())<br><br>...<br><br>for user in users:<br>  ...</pre><p>Although the walrus operator could be used here, I find that it makes your intentions for the users variable less clear.</p>]]></content:encoded>
        </item>
    </channel>
</rss>