<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Omar Aflak on Medium]]></title>
        <description><![CDATA[Stories by Omar Aflak on Medium]]></description>
        <link>https://medium.com/@omaraflak?source=rss-c215fdc67eb------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*qRE-csBHtvwOsE2M_YV7UQ.png</url>
            <title>Stories by Omar Aflak on Medium</title>
            <link>https://medium.com/@omaraflak?source=rss-c215fdc67eb------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Tue, 09 Jun 2026 22:51:14 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@omaraflak/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Information & Entropy]]></title>
            <link>https://medium.com/data-science-collective/information-entropy-093036aa8eec?source=rss-c215fdc67eb------2</link>
            <guid isPermaLink="false">https://medium.com/p/093036aa8eec</guid>
            <category><![CDATA[information-theory]]></category>
            <category><![CDATA[entropy]]></category>
            <category><![CDATA[mathematics]]></category>
            <category><![CDATA[cross-entropy]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Omar Aflak]]></dc:creator>
            <pubDate>Fri, 21 Feb 2025 18:22:32 GMT</pubDate>
            <atom:updated>2025-09-05T22:30:25.100Z</atom:updated>
            <content:encoded><![CDATA[<h4>What is information and how is it measured? What is entropy? Cross entropy? Relative entropy (aka KL Divergence)?</h4><h4>UPDATE: Check out the updated version of this article at: <a href="https://omaraflak.com/articles/entropy">https://omaraflak.com/articles/entropy</a></h4><h3>Information</h3><p>Information is tied to the field of <strong>probabilities</strong>, and it can be seen as a measure of <strong>uncertainty. </strong>To avoid extrapolation and misuse of this concept, you need to remember that it only makes sense to talk about information (in the mathematical sense) when you are studying a <strong>probabilistic event</strong>.</p><blockquote>Information relates to probabilities in that the realisation of an event with low probability brings a lot of information, and the realisation of an event with high probability brings little information.</blockquote><p>For example: the event <em>“It rains in London”</em> is very <strong><em>likely</em></strong> therefore it brings <strong><em>little</em></strong> information. In contrast, the event <em>“It rains in the Sahara Desert”</em> is so <strong><em>unlikely</em></strong>, it brings a lot <strong><em>more</em></strong> information (e.g. it could more realistically help you pin-point the day of the event).</p><p>So information relates to probability, but how <strong><em>exactly</em></strong>?</p><p>Let’s explore the properties we would like such a mapping to have:</p><ol><li>Low probability ⇒ high information (already established)</li><li>High probability ⇒ low information (already established)</li><li>Probability = 1 ⇒ Information = 0 (derived from 2 — that’s because if an event is certain to be realised, then knowing about it doesn’t bring about any information)</li><li>Probability → 0 ⇒ Information → +inf (the opposite of 3 must be true)</li><li>Information should be <strong>additive for independent events</strong>, i.e. learning about two independent events should give you the amount of information equal to the sum of the information gained from each event separately:</li></ol><p>information(event1 and event2) = information(event1) + information(event2)</p><p>Given that the probability of two independent events to be realised together is p(event1) * p(event2) (product of the probabilities of each event to happen individually), then we must have:</p><p>information(p(event1) * p(event2)) = information(p(event1)) + information(p(event2))</p><p>I’m abusing the previous notation, because here we passed a <em>probability</em> into this information function instead of an <em>event</em>, but that’s just to illustrate the idea…</p><blockquote>The realisation of two independent events brings as much information as the sum of the information gained from each event separately.</blockquote><p>Lastly, if we need this mapping function to be continuous, then there’s only one family of functions that respects those properties: <strong>the logarithms</strong>.</p><p>More precisely, the negative logarithms:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/710/1*2Y7WzWZ13F7zcOh7bD6foQ.png" /><figcaption>-log(x)</figcaption></figure><p>We define information mathematically as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/244/0*-6tIMQmqduAn-D8R.png" /><figcaption>Relates the information content of an event `x` to its probability of realisation `p(x)`</figcaption></figure><p>We said <em>“the family of functions”</em> earlier — indeed, logarithms of all bases respect the properties listed above. You can use any of them, the difference will be in the <strong>unit</strong> of the information:</p><ul><li>log2(x) will give <em>“bits”</em></li><li>log10(x) will give “<em>dits</em>”</li><li>ln(x) will give <em>“nats”</em></li></ul><p>All of those are valid ways of expressing information. In practice, we often use the base 2 logarithm.</p><blockquote><strong>Bit:</strong> represents the amount of information content gained with a binary choice.</blockquote><h4><strong>Example 1</strong></h4><p>I flip a fair coin p(heads) = p(tails) = 1/2 and tell you the result. I have just given you: -log2(1/2) = log2(2) = 1 bit of information! In other words, I have given you the <strong>information content gained with 1 binary choice</strong>.</p><p>Recall that log(1/x) = -log(x).</p><blockquote>This is what the logarithm in base 2 does: it answers the question “how many times do I have to divide x by 2 to get 1 (or less) ?”, or in other words, how many binary choices do I have to make on my input space to be left with 1 element (or less). Each of these binary choices (division) represents one <strong>bit</strong> of information.</blockquote><h4>Example 2</h4><p>I have to pick one fruit between 8 different fruits (assume each is equally likely to be picked). I pick one and tell you which: I have just given you -log2(1/8) = log2(8) = 3 bits of information. In other words, I have given you the information content gained with 3 binary choices (divide 8 by two 3 times).</p><blockquote>I could have measured information in “dits” instead. It is equally correct to say that I would have given you -log10(1/8) = log10(8) = 0.9 dits of information.</blockquote><blockquote><em>3 bits = 0.9 dits</em></blockquote><h3>Entropy</h3><p>In the previous examples, you’ll notice that I used <strong>uniform probability distributions</strong>. This means the probability of each outcome was equally likely (p=1/2 for the coin toss, and p=1/8 for the fruit pick). Then I asked:</p><blockquote>What is the information gained for observing one of those events?</blockquote><p>Since the probability was the same for all events, then the answer to that question would be the <strong>same regardless of the outcome</strong> of the random experiment.</p><blockquote>What if each outcome had a different probability of realisation?</blockquote><p>What if I had to pick between 3 fruits, each with a different probability according to my preferences:</p><ul><li><strong>M</strong>ango (p=0.7)</li><li><strong>A</strong>pples (p=0.2)</li><li><strong>O</strong>range (p=0.1)</li></ul><p>A natural question is: <strong><em>on average</em></strong>, what is the information gained for observing an event from that random experiment? We are asking the <em>same question</em> as before, but of course since each outcome has a different probability and since the information depends on the probability, the result will change for different outcomes. Therefore we ask about the <strong><em>average</em></strong> outcome.</p><p>One way is to <strong><em>sum</em></strong> the information gained by each event <strong><em>weighted</em></strong> by the probability of realisation.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/940/0*bzRNPxkbSjsMxQtX.png" /><figcaption>Expected information gained for observing an event from the random experiment</figcaption></figure><blockquote>That is exactly what <strong>entropy</strong> is!</blockquote><p>We call <em>entropy</em> the <strong>expected amount of information</strong> gained for observing an event from a random variable. In other words, this answers the question: “If I sample an event from a variable X ; On average, what is the information gained for observing one of x1, x2, ..., or xn given the probability distribution of those events?”.</p><p>We usually denote the entropy of a random variable X as H(X):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/411/0*KlUoJKa7aMfjGr46.png" /><figcaption>Entropy</figcaption></figure><blockquote>An interesting follow up question is: <strong>when is the entropy minimal/maximal ?</strong></blockquote><p>Without even doing any calculation, we can intuitively try to answer. <em>Give it a thought!</em></p><h4>Minimal Entropy</h4><p>Since entropy is the expected information to be gained from observing a random variable, and since information is minimal when events are certain to be realised, the absolute minimum would be reached if a random variable could be <strong><em>predictable every time</em></strong>, i.e. if it had an event with probability p=1 and the rest p=0 (in which case H(X)=0). Any other probability distribution would yield <strong><em>some</em></strong> amount of information.</p><h4>Maximal Entropy</h4><p>Entropy is maximised if the average information is maximal. We know information is highest for most improbable events (p-&gt;0). So if we have multiple events, each with a certain probability, and we want those probabilities to be as low as possible, then the lowest we can go <strong><em>on average</em></strong> is when we spread the probability space over all events equally, that is p=1/n with n the number of events, or in other words: a <strong><em>uniform probability distribution</em></strong>.</p><blockquote>You can see the uniform distribution as the most <strong>“unpredictable” — </strong>the one for which each event brings the maximum amount of information content.</blockquote><p>I highly advise checking out 3B1B video on how to solve the game <em>Wordle</em> using the concept of entropy.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2Fv68zYyaEmEA%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dv68zYyaEmEA&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2Fv68zYyaEmEA%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/ed79509a999d67edf4330eabc3b850c9/href">https://medium.com/media/ed79509a999d67edf4330eabc3b850c9/href</a></iframe><p>There’s also another way to interpret <strong>entropy</strong>, and it’s going to be useful for the rest of the article, so before going further with <em>Cross Entropy</em> and <em>Relative Entropy</em>, we’re making a little stop at <strong><em>encoders</em></strong>.</p><h3>Encoders</h3><p>An encoder is a machine/routine/code that assigns to each event of a probability distribution a code (let’s say in bits, but we could use another base).</p><p>An encoder is <strong>optimal</strong>, if <em>on average</em>, it uses the theoretical <strong><em>minimum number of bits</em></strong> possible to represent an event drawn from the distribution.</p><h4>Example 1</h4><p>Say we have three events A,B,C, with p(A)=p(B)=p(C)=1/3.</p><p>We could create a coding (<em>a mapping</em>) that uses 2 bits to encode each outcome:</p><p>A-&gt;00, B-&gt;01, C-&gt;10</p><p>If I then give you a list of bits, e.g. 011000, you are able to decode it (by splitting every 2 bits and using the mapping above): 011000 → BCA. This works out fine, but we are waisting the 11 state of our 2 bits, which accounts for 25% of all possible states! This is not very optimal.</p><blockquote>What if we assigned less bits to some events ?</blockquote><h4>Example 2</h4><p>Consider the following encoder:</p><p>A-&gt;0, B-&gt;10, C-&gt;11</p><p>Here, we use a total of <strong>5 bits</strong> to encode <strong>3 states</strong> (instead of 6 bits in the previous coding), that is 5/3 = 1.7 bits on average, which is less than 2 bits like previously.</p><p>With this new encoder, suppose we read the first 2 bits of a message b1,b2:</p><ul><li>if b1 == 0 then A</li><li>if b1 == 1 and b2 == 0then B</li><li>if b1 == 1 and b2 == 1then C</li></ul><p>And we can keep reading and decoding a long string of bits that way.</p><blockquote>You might ask, why not use even less bits? Well, let’s see.</blockquote><h4>Example 3 ❌</h4><p>Consider this final encoder:</p><p>A-&gt;0, B-&gt;1, C-&gt;00</p><p>This uses less bits than the previous too, but it is also <strong>ambiguous</strong>!</p><p>The bits string 00 could be either AA or C , and there’s no way to go around this.</p><blockquote>An important feature of an encoder is that <strong>it has to be unambiguous</strong>, i.e. decodable in a single way.</blockquote><h4>Encoders &amp; Entropy</h4><p>How does that relate to entropy?</p><p>Think about the <strong>optimal encoder</strong>: that will be the encoder that assigns, <strong><em>on average</em></strong>, the <strong><em>least amount of bits possible</em></strong> to an event of your distribution.</p><p>In the <strong>example 2</strong> above, we considered A,B,C to be equally likely; but what if C was more probable than A and B? Wouldn’t it be better then to assign the <strong><em>single bit</em></strong> to C and <strong><em>two bits</em></strong> to A and B?</p><blockquote>In general, to achieve optimality, we need to assign less bits to more probable outcomes and more bits to less probable outcomes.</blockquote><p>A natural question is then: <strong><em>what is the minimum number of bits we can use to encode events drawn from a given probability distribution?</em></strong></p><p>The answer is… the <strong>entropy</strong>!</p><p><a href="https://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem">Shannon&#39;s source coding theorem - Wikipedia</a></p><p>To clarify: the entropy is the <strong><em>theoretical minimum</em></strong>, but in practice you may not come up with an encoder that uses `entropy` number of bits on average.</p><p>Now that we’re equipped with this new insight, let’s tackle the next concepts!</p><h3>Cross Entropy</h3><p>Let’s say I have a machine that produces random letters (a-z) according to a certain unknown probability distribution P = [p(a), p(b), …, p(z)].</p><p>Your task is to create an <strong><em>encoder</em></strong> that is <strong><em>as efficient as possible</em></strong> for data coming from this distribution, i.e. <em>an encoder that uses, on average, the least amount of bits possible to encode events from this distribution.</em></p><p>We know from earlier that the <strong><em>optimal encoder</em></strong> uses, on average, a number of bits equal to the <strong><em>entropy</em></strong> of the distribution, H(P). But for this you need to know the exact distribution, and here you don’t!</p><p>Therefore, you will have to <strong><em>guess</em></strong> what the true distribution is and produce an encoder based on your guess. Let’s call your guessed distribution Q = [q(a), q(b), ..., q(z)]. The average number of bits used by your encoder for Q <strong>will</strong> be higher or equal to H(P); and the actual amount is called the <strong>cross entropy</strong> between P and Q.</p><blockquote>The <strong>cross entropy</strong> between P and Q is the average number of bits needed to encode events from P using an optimal encoder for Q. We denote that number H(P, Q).</blockquote><p>Said differently, it means that you were <strong>expecting</strong> data from a probability distribution Q, but in reality the data belonged to a probability distribution P. And the average amount of bits used to encode those events from P (while expecting they were drawn from Q) is what we call the <strong><em>cross entropy</em></strong>.</p><p><em>Can you guess the formula?</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/790/0*2c-20lQy3BeIVl5n.png" /><figcaption>Cross Entropy between P and Q</figcaption></figure><p>This looks very much like H(Q), but the information is weighted by the probabilities coming from P. This makes sense:</p><p>You will be using Q to encode events coming from the machine, therefore the information content will be calculated using q(x). However, the <strong><em>actual weighting</em></strong> of the information from each event comes from P since that is the <strong>true frequency</strong> of the events.</p><blockquote>Notice that H(P, Q) ≠ H(Q, P). The result changes depending on which is the <strong>true </strong>distribution and which is the <strong>guessed </strong>distribution.</blockquote><p>Also, notice that if you had guessed P perfectly well (Q=P), then the result should be the theoretical minimum number of bits possible to encode events from P, that is the <strong>entropy</strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/454/0*l1owxTJn5KMsZre3.png" /><figcaption>Cross Entropy with a distribution and itself is the Entropy</figcaption></figure><h3>Relative Entropy</h3><p>Lastly, the <strong><em>relative entropy</em></strong>, also known as the <strong><em>KL divergence</em></strong>.</p><p>If you’ve understood the <em>cross entropy</em>, then this should be a piece of cake!</p><p>Recall that the <strong><em>cross entropy</em></strong> is average number of bits used if you encode events drawn from a distribution P while expecting the events to come from a distribution Q. We said this number <strong>must be higher or equal</strong> to H(P) since that would be the number of bits used by a <strong>perfect encoder</strong> for P.</p><p>The number of <strong>extra bits</strong> used relative to H(P) is what we call the <strong>relative entropy</strong> and we denote it KL(P||Q)! That is, not the entire entropy but just the extra you used due to the error in guessing P.</p><blockquote>Essentially, that is the difference in bits used by a <strong>suboptimal encoder</strong> and an <strong>optimal encoder</strong>. So this is simply H(P, Q) - H(P).</blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/734/0*cfVpSaXuHOHLgECf.png" /><figcaption>Relative Entropy, aka KL Divergence</figcaption></figure><p>Like the cross entropy, the <em>relative entropy</em> is not commutative: KL(P||Q) ≠ KL(Q||P). You can understand it as a measure of <strong><em>relative difference</em></strong> between two probability distributions, the minimum being 0 when Q=P.</p><h4>Last Note</h4><p>In machine learning we try to minimise the cross entropy:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/407/0*W93vsT1kbBlagMl6.png" /><figcaption>Cross Entropy</figcaption></figure><p>Where P is the distribution of the <strong>data</strong>, and Q is the distribution of the <strong>model</strong>. Since the data doesn’t change during the training —H(P) is a <strong>constant</strong>— we are essentially <strong>minimising the relative entropy</strong>, i.e.<strong> </strong>the difference between P and Q.</p><p>Interestingly, in the context of LLMs (Large Language Models), when we minimise the cross entropy and therefore minimise the relative entropy, the loss we end up with after training is an approximation (as KL goes to 0) of the entropy of the data distribution, that is, the entropy of language.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=093036aa8eec" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-science-collective/information-entropy-093036aa8eec">Information &amp; Entropy</a> was originally published in <a href="https://medium.com/data-science-collective">Data Science Collective</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Explaining the blockchain to Paloma]]></title>
            <link>https://omaraflak.medium.com/explaining-the-blockchain-to-paloma-f6534f7cb642?source=rss-c215fdc67eb------2</link>
            <guid isPermaLink="false">https://medium.com/p/f6534f7cb642</guid>
            <category><![CDATA[blockchain]]></category>
            <dc:creator><![CDATA[Omar Aflak]]></dc:creator>
            <pubDate>Mon, 19 Sep 2022 17:34:51 GMT</pubDate>
            <atom:updated>2022-09-19T20:10:08.428Z</atom:updated>
            <content:encoded><![CDATA[<p>Dear Paloma, <em>this</em> is the story of the blockchain… in a paragraph of 4 pages.</p><h3>Bitcoin</h3><p>The story starts in 2009, after the subprime crisis. A guy, girl, or a group of person that goes by the pseudonym <strong>Satoshi Nakamoto</strong> started thinking about a way to have <strong>digital</strong> money that does <strong>not depend on a central authority</strong> to be trusted (referring here to the banks and governments, as the subprime crisis was a succession of bad human decisions). Satoshi came up with a protocol and wrote a paper titled:</p><blockquote>Bitcoin: a peer to peer electronic cash system</blockquote><p><a href="https://www.ussc.gov/sites/default/files/pdf/training/annual-national-training-seminar/2018/Emerging_Tech_Bitcoin_Crypto.pdf">https://www.ussc.gov/sites/default/files/pdf/training/annual-national-training-seminar/2018/Emerging_Tech_Bitcoin_Crypto.pdf</a></p><p>To understand the beauty of the protocol, you need to understand why it’s a challenge in the first place.</p><p>Digital money means giving value to something on a computer. But, you have used a computer, Paloma, and you know that any information can be easily duplicated or modified. If a file on your laptop contained the amount of dollars you possess, then nothing could prevent you from opening that file and changing that number: but digital money is… <em>digital</em>, so it has to be stored somewhere on a computer, and therefore we should have a way to prevent people from tampering with that information.</p><h4>The mysterious protocol</h4><p><em>In a nutshell (and simplified)</em>, this is what Satoshi described in the paper.</p><p>The idea is to connect hundreds of thousands, even millions, of computers together in a giant network, and have them <strong>all</strong> keep a <strong>full history</strong> of all the transactions ever made. If person A wants to send x Bitcoins to person B, then that information goes over the whole network and every computer is made aware that A sent x Bitcoins to B.</p><p><em>A and B identities are kept anonymous: all users in the network are represented by an identifier, but nothing can link an identifier to a person in the real world.</em></p><p>Since all computers know that A sent x Bitcoins to B, and since they hold all previous transactions ever made, they can look at all the transactions involving A for instance, to know exactly how much this account should possess now (sum of everything that identifier ever received minus sum of everything that identifier ever sent).</p><h4>Why does that solve the problem?</h4><p>Imagine a person that has a computer that is part of the network decides to cheat, and they change the information in their own history to include a transaction stating they received Bitcoins from someone. All other computers in the network (millions) will have a different transaction history. If 1 vs 1,000,000 have a different history, you can safely assume that the 1 has tempered with their data.</p><p>This is basically how it works. <strong>The information that is present in majority in the network is the one considered as the true information.</strong></p><p>In other words, to change some information in the network, you would need 51% of the network to agree to temper with their history in the same way (this is a real thing, actually called the “<em>51% attack”</em>), but this becomes very unlikely as the number of computers in the network grows.</p><h3>Blockchain</h3><p>You know that network we’ve been talking about? This is the blockchain, Paloma!</p><p>The reason it’s called that way, is because the transactions are not stored one by one in the history, they’re grouped into chunks (then some cryptographic stuff is applied, let’s not go there) and added to the history: essentially forming a chain of blocks, hence the name. The term “blockchain” was not even mentioned in the original paper that Satoshi wrote, but it was later attributed to the whole protocol he described.</p><p><em>The blockchain is born from Bitcoin.</em></p><p>Amazing. Let’s keep going.</p><h3>Decentralised Apps</h3><p>Now, if you think about it, Bitcoin is just one type of information that lives on the blockchain (it’s just a number representing an amount of currency). But really, you could write anything you want in that history: for instance how much you like me :)</p><p><em>The blockchain is essentially a public book anyone can write information to, but once written it can never be deleted, it’s there forever.</em></p><p>You can write new information though, so if you decide that you don’t like me anymore (highly unlikely), you can write it to the blockchain, but it won’t overwrite what you first wrote: history will show that you <em>once</em> liked me…</p><p>Why would you write anything other than bitcoin to the blockchain? Well, you can imagine all sorts of applications, for example to hold a patent on one of your creations. Instead of going to a patent organisation and paying <strong>a lot</strong> for them to keep a record and act as a figure of authority (a source of truth), you can instead write this information to the blockchain. Because nobody can temper with the information over there, you can prove by looking at the date you sent your data that it was you indeed who thought of your creation first. In that sense, the blockchain acts as a source of truth, and it doesn’t require any central authority to take care of it: it regulates itself.</p><p>These sorts of applications are called “decentralised applications” or “de-apps” (decentralised refers to the fact that the information is not stored in one computer, but many).</p><h3>Smart contracts</h3><p>After Bitcoin, many other cryptocurrencies created their own blockchains. One very popular is the Ethereum blockchain. Ethereum paved the way for something amazing which is now the true power of the blockchain. They allowed for people to send pieces of <strong>code</strong> to the blockchain: these are called <strong><em>smart contracts</em></strong>, and they are able to (if you program them to) transfer ether (the Ethereum currency) from one account to another, many other things you can think of, and all the ones you cannot :)</p><p>Once the code is on the blockchain, you can interact with it and it will always get executed in the same way. So you could imagine a smart contract for a <em>universal basic income</em> for example. If the government programmed such a contract to withdraw funds from their own ether account and give that ether to any account who asks the contract for it (under the constraint that they do it once a month) and publish it to the Ethereum blockchain, then that’s it, you have it.</p><p>Why would you publish this on the blockchain as a smart contract rather than having it in any other way? Great question Paloma.</p><p>Imagine someone in the government decides to spookily stop this universal income or withdraw more than once a month from the state’s wallet: it’s impossible (given the contract was programmed correctly). Once the code is on the blockchain, it’s there forever and it’s set to be executed as it was planned to from day 1, and nothing can change that. Of course if you want some flexibility for future changes, you can code it in the contract; but the code is public so anyone can see what the contract is set to do in advance.</p><h3>DAO</h3><p>DAO, since you asked, stands for “decentralised autonomous organisation” (a bit of a superfluous term in my opinion) and it’s supposed to be an organisation that has some parts of its business logic decentralised (via smart contracts for instance), so it’s not entirely controlled by the people of the organisation.</p><h3>Conclusion</h3><p>Paloma, I hope this helped you in some way or another. Hit me up in the comments if you have any questions.</p><p>I hope to see you in October 🌹</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f6534f7cb642" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Solving Non-Linear Differential Equations numerically using the Finite Difference Method]]></title>
            <link>https://medium.com/data-science/solving-non-linear-differential-equations-numerically-using-the-finite-difference-method-1532d0863755?source=rss-c215fdc67eb------2</link>
            <guid isPermaLink="false">https://medium.com/p/1532d0863755</guid>
            <category><![CDATA[physics]]></category>
            <category><![CDATA[mathematics]]></category>
            <category><![CDATA[simulation]]></category>
            <category><![CDATA[differential-equations]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[Omar Aflak]]></dc:creator>
            <pubDate>Mon, 01 Feb 2021 19:21:00 GMT</pubDate>
            <atom:updated>2025-09-05T22:32:03.878Z</atom:updated>
            <content:encoded><![CDATA[<h3>Solve Non-Linear Differential Equations numerically using the Finite Difference Method</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qSVjBQJ8gWgti7BkOiBZOw.png" /></figure><h4>UPDATE: Check out the updated version of this article at: <a href="https://omaraflak.com/articles/finite-difference-method">https://omaraflak.com/articles/finite-difference-method</a></h4><p>In this article we will see how to use the finite difference method to solve non-linear differential equations numerically. We will practice on the pendulum equation, taking air resistance into account, and solve it in Python.</p><p>We will find the differential equation of the pendulum starting from scratch, and then solve it. Before we start, we need a little background on Polar coordinates.</p><h3>Polar Coordinates</h3><p>You already know the famous Cartesian coordinates, which are probably the most used in everyday life. However, in some cases, describing the position of an object in Cartesian coordinates isn’t practical. For instance, when an object is in a <strong>circular movement</strong>, sine and cosine functions are going to pop all over the place, so it’s generally a much better idea to describe that object’s position in what we call <strong>Polar coordinates</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*x7tdAKJPPnMNjMJaiXvlIw.png" /></figure><p>Polar coordinates are described by two variables, the radius <strong><em>ρ</em> </strong>and the angle <strong><em>θ</em></strong>. We attach unit vectors to each variable:</p><ul><li><strong><em>eρ </em></strong>is a unit vector always pointing in the same direction as vector OM.</li><li><strong><em>eθ </em></strong>is a unit vector perpendicular to <strong><em>eρ</em></strong>.</li></ul><p>Our goal now is to express the <strong><em>position</em></strong>, <strong><em>velocity</em></strong>, and <strong><em>acceleration</em></strong> of an object in Polar coordinates. For this we need to express the relationship between the Polar unit vectors and the Cartesian unit vectors.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/822/1*BO-tw_Vu1NbTb960m5Qlfg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/411/0*V7RauFrGzYy1VCOV.png" /><figcaption>Cartesian to Polar</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/386/0*lMupJb958eSYGXAT.png" /><figcaption>Polar to Cartesian</figcaption></figure><p>All right! Let’s express <strong><em>position</em></strong>, <strong><em>velocity</em></strong>, and <strong><em>acceleration </em></strong>in Polar coordinates.</p><h4>Position</h4><p>This one is quite easy. It’s the whole point of using Polar coordinates!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/169/0*YN30x5n_hbqV5a3_.png" /><figcaption>Position in Polar coordinates</figcaption></figure><h4><strong><em>Velocity</em></strong></h4><p>We simply differentiate the position with respect to time. We will assume <strong><em>ρ </em></strong>is a constant, and only <strong><em>θ </em></strong>varies over time.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/248/0*e97A4bNS4BQi3h_Q.png" /><figcaption>Velocity in Polar coordinates</figcaption></figure><h4>Acceleration</h4><p>We differentiate the velocity with respect to time.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/435/0*CdFCgPK0k1q5km_2.png" /><figcaption>Acceleration in Polar coordinates</figcaption></figure><p>All right! We can now work on our problem: the pendulum.</p><h3>Pendulum Equation</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/579/1*AtFL7yn6jcCbaSWG0e3RHw.png" /></figure><p>In order to find the equation that the angle <strong><em>θ </em></strong>satisfies, we will use Newton’s second law of motion, or as we call it in French, <em>the fundamental principle of dynamic</em>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/266/0*LW8p7Slky3AvkQe3.png" /><figcaption>Newton’s second law of motion</figcaption></figure><p>The sum of all the forces applied to a system is equal to its mass times its acceleration. Let’s enumerate all the forces applied to the pendulum and <strong><em>express them in Polar coordinates</em></strong>.</p><h4>Weight</h4><p>The weight of the object due to gravity is one the forces applied to the object. Its formula is well known, and will be expressed in our coordinate system as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/519/0*WPlAIA8Yq6uVbEQo.png" /><figcaption>Weight</figcaption></figure><p>Where <strong><em>m (kg) </em></strong>is the mass of the object, and <strong><em>g (m/s²)</em></strong> is value of the acceleration of gravity — which is about 9.81 on earth.</p><h4>Rope Tension</h4><p>The rope exerts a tension pulling the pendulum in the direction of the rope.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/175/0*FDyIfDBCyKlHokSQ.png" /><figcaption>Rope tension</figcaption></figure><p>Where <strong><em>R (N) </em></strong>is the rope tension in Newtons.</p><h4>Air Resistance</h4><p>Of course, the air exerts a friction on the pendulum, which will make it stop oscillating at some point. Small air resistance is usually modeled as a force opposite to the velocity vector and proportional to the norm of the velocity vector.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/205/0*NTsQNEymX54IDQdP.png" /><figcaption>Air resistance</figcaption></figure><p>Where <strong><em>k (kg/s) </em></strong>is the friction coefficient that is specific to the object in movement, and <strong><em>L (m)</em></strong> is the length of the pendulum rope.</p><h4>Newton’s second law of motion</h4><p>We can now apply Newton’s second law of motion:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*CzH3bk9YgGwn6AHr.png" /><figcaption>Newton’s second law of motion applied to the pendulum</figcaption></figure><p>Then project the result on both axes:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/409/0*6IbS3SVpD6qyZ0RD.png" /><figcaption>Projection</figcaption></figure><p>Reordering the terms of the second equation, we get:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/308/0*uULbvD-r7pwBouub.png" /><figcaption>Pendulum equation</figcaption></figure><p>Solving this second order non-linear differential equation is <strong><em>very complicated</em></strong>. This is where the Finite Difference Method comes very handy. It will boil down to two lines of Python! Let’s see how.</p><h3>Finite Difference Method</h3><p>The method consists of approximating derivatives numerically using a rate of change with a very small step size.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/382/0*20bGYBQUco3pz-eT.png" /><figcaption>Derivative — Rate of change</figcaption></figure><p>That is the very definition of what a derivative is. Numerically, if we knew <strong><em>f</em></strong>, we could take a small number <strong><em>h</em></strong> — e.g. 0.0001 — and compute the above formula for a given <strong><em>x</em></strong>, which would give us an approximation of <strong><em>f’(x)</em></strong>.</p><p>The finite difference method simply uses that fact to transform differential equations into ordinary equations.</p><p>In our case, we start by expressing <strong><em>θ’’</em></strong> with respect to <strong><em>θ’</em></strong> using the rate of change.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/303/0*7TLzqixPh-LRH711.png" /></figure><p>I removed the limit, and wrote <strong><em>dt </em></strong>for us to know that this is supposed to be an infinitesimal value — in practice, just a very small number. We will now plug this equation into the pendulum equation.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/692/0*4vnJ4ksOuUb6ZPIz.png" /></figure><p>Okay! We managed to express the angular velocity at time <strong><em>t+dt</em></strong> with respect to the angle and angular velocity at time <strong><em>t</em></strong>. In other words, if for instance <strong><em>dt=0.001</em></strong> and if you know <strong><em>θ(0) </em></strong>and<strong><em> θ’(0) </em></strong>(which are the initial conditions of the system), then you can compute <strong><em>θ’(0.001)</em></strong>! If we could also compute <strong><em>θ(0.001) </em></strong>then the recursion is complete and we can compute <strong><em>[θ(t), θ’(t)] </em></strong>for any<strong><em> t </em></strong>starting with known initial conditions.</p><p>Fortunately, there is a way to compute <strong><em>θ(t+dt)</em></strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/390/0*0aqSKSqOBeG4nAtg.png" /></figure><p>All right! With that equation in hand we can also compute the angle at time <strong><em>t+dt</em></strong> given the angle and the angular velocity at time <strong><em>t</em></strong>.</p><p>Using these two equations we can now compute the angle theta at any time step!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/594/0*1Cb-MZa7HhlWKQFV.png" /></figure><p>Indeed, given <strong><em>[θ(0), θ’(0)] </em></strong>you can compute <strong><em>[θ(dt), θ’(dt)]. </em></strong>Given <strong><em>[θ(dt), θ’(dt)] </em></strong>you can compute<strong><em> [θ(2dt), θ’(2dt)]</em></strong>, and so on.</p><p>Let’s put all this to use in a Python program!</p><h3>Python Code</h3><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ad29c9483ede60c16266729600b9e16e/href">https://medium.com/media/ad29c9483ede60c16266729600b9e16e/href</a></iframe><p>We iteratively compute <strong><em>θ(t) </em></strong>and <strong><em>θ’(t)</em></strong> using the formulas we found, and put the results in two separate lists. Hopefully the code is understandable, but feel free to drop a comment if you have any question.</p><p>Running the code will produce the following plot.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*76zJ8fPas-iNfks2T6x3GQ.png" /></figure><p>Two happy observations:</p><ul><li>The angular velocity seems to reach extremums when the angle is zero, which makes sens since this is where the pendulum has accumulated all its inertia and is about to slow down because it’s going up.</li><li>The angular velocity seems to reach zero when the angle reaches an extremum, which makes sens since this is when the pendulum is slowing down and is about to go in the other direction.</li></ul><p>Playing with the code a little, you might want to set the initial velocity to <strong><em>2π </em></strong>for instance.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*jRbCG6xzbdoJ953HhZFOjg.png" /></figure><p>Notice how the angle keep increasing before going down. What happened is that the initial velocity was high enough to make the pendulum make a full spin before entering the usual oscillation!</p><p>One last thing… You can try to increase <strong><em>dt</em></strong> and see how this affects the simulation. Hopefully you’ve understood that a smaller <strong><em>dt</em></strong> means more accurate results. Let’s see what happens for <strong><em>N=3,2,1</em></strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*PyGQzJMovfBjO_wFK-A8lw.png" /></figure><p>I was actually surprised to see that for only 3 points per second (and even 2), we still manage to get the general shape of the solution. N=1 is another story…</p><h3>Conclusion</h3><p>In this article we have seen how to use the finite difference method to solve differential equations (even non-linear) and we applied it to a practical example: the pendulum. This technique also works for partial differential equations, a well known case is the heat equation.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=1532d0863755" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-science/solving-non-linear-differential-equations-numerically-using-the-finite-difference-method-1532d0863755">Solving Non-Linear Differential Equations numerically using the Finite Difference Method</a> was originally published in <a href="https://medium.com/data-science">TDS Archive</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Ray Tracing From Scratch in Python]]></title>
            <link>https://omaraflak.medium.com/ray-tracing-from-scratch-in-python-41670e6a96f9?source=rss-c215fdc67eb------2</link>
            <guid isPermaLink="false">https://medium.com/p/41670e6a96f9</guid>
            <category><![CDATA[game-design]]></category>
            <category><![CDATA[game-development]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[mathematics]]></category>
            <category><![CDATA[computer-graphics]]></category>
            <dc:creator><![CDATA[Omar Aflak]]></dc:creator>
            <pubDate>Sun, 26 Jul 2020 15:50:17 GMT</pubDate>
            <atom:updated>2025-09-18T16:35:30.054Z</atom:updated>
            <content:encoded><![CDATA[<h4>Create a computer-generated image using the Ray Tracing algorithm coded from scratch in Python.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*GFLF03PB7Uo5NLo5G5oylQ.png" /><figcaption>fig. 1 — computer-generated image</figcaption></figure><h4><strong>UPDATE: Check out the updated version of this article at: </strong><a href="https://omaraflak.com/articles/ray-tracing"><strong>https://omaraflak.com/articles/ray-tracing</strong></a></h4><p>In this post I will give you a glimpse of what computer graphics algorithms may look like. I will explain the <em>ray tracing</em> algorithm and show a simple implementation in Python.</p><p>By the end of this article you’ll be able to make a program that will generate the above image, without making use of any fancy graphic library! Only <em>NumPy</em>. Isn’t it crazy?! Let’s dive in!</p><p><em>P.S. This article is by no mean a complete guide / explanation of ray tracing, since this is such a vast subject, but rather an introduction for curious people :)</em></p><h3>Prerequisites</h3><p>We only need very basic vector geometry.</p><ul><li>If you have 2 points A and B — whatever the dimensionality: 1, 2, 3, …, n — then a vector that goes from A to B can be found by computing B — A (element-wise);</li><li>The length of a vector — whatever the dimensionality — can be found by computing the square root of the sum of the squared components. The length of a vector v is denoted ||v||;</li><li>A <em>unit-vector</em> is a vector of length 1: ||v|| = 1;</li><li>Given a vector, another vector that points to the same direction but with a length of 1 can be found by dividing each component of the first vector by its length — this is called normalization: u = v / ||v||;</li><li>Dot product for vectors. Specifically: &lt;v, v&gt; = ||v||²;</li><li>Solving a quadratic equation;</li><li>A bit of patience and imagination;</li></ul><h3>Ray Tracing Algorithm</h3><p>In effect, <em>ray tracing</em> is a <strong>rendering</strong> technique that simulates the <strong>path of light </strong>and <strong>intersections with objects</strong> and is able to produce images with a high degree of realism. More optimized variations of this algorithm are actually used in video games!</p><p>To explain the algorithm we need to setup a <strong>scene</strong>:</p><ul><li>We need a <strong>3D space</strong> (that simply means we’re going to use 3 coordinates to position objects in space);</li><li>We need <strong>objects</strong> in that space (since we’re going to reproduce fig. 1, imagine spheres);</li><li>We need a source of <strong>light</strong> (this is going to be a single point emitting light in all directions, so in essence a single position);</li><li>We need an “eye” or a <strong>camera</strong> to observe the scene (again, simply a position);</li><li>Since the camera could be looking anywhere really, we need a <strong>screen </strong>through which the camera will be observing the objects (4 positions for the four corners of a rectangular screen);</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*u4eQPZVcbCG6zYglUqu9AQ.jpeg" /><figcaption>fig. 2</figcaption></figure><p><em>A word about the </em><strong><em>screen</em></strong><em>: </em>the screen is going to occupy a certain amount of space that you will define (could be a 3x2 rectangle for instance). But 3 and 2 don’t really mean anything alone. They do mean something when you compare them to the sizes of other objects, they are relative. What’s important here, is how you will split that rectangle into smaller squares (pixels), akin the figure above. This is going to determine the size of the final image. In other words, you can create a 3x2 rectangle and split it into 300x200 pixels, that will work just fine.</p><p>Given the <strong>scene</strong>, this is the ray tracing algorithm:</p><pre>for each pixel <strong><em>p</em></strong>(x,y,z) of the screen:<br>    associate a black color to <strong><em>p</em></strong><br>    if the ray (line) that starts at <strong><em>camera</em></strong> and goes towards <strong><em>p</em></strong> intersects any object of the scene then:<br>        calculate the <strong><em>intersection point</em></strong> to the nearest object<br>        if there is no object of the scene in-between the <strong><em>intersection point</em></strong> and the <strong><em>light</em></strong> then:<br>            calculate the color of the <strong><em>intersection point</em></strong><br>            associate the color of the <strong><em>intersection point</em></strong> to <strong><em>p</em></strong></pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*unARAf_8oLsXAQVp3obY0Q.jpeg" /><figcaption>fig. 3</figcaption></figure><p><em>Note that this process is actually the reverse process of real-life illumination. In reality, light comes out of the source in all directions, bounce on objects and hits your eye. However, since not all rays coming out the light source will end up in your eye, ray tracing does the reverse process to save computation time (trace rays from the eye back to the light source).</em></p><p>This is all purely geometrical, the only thing I didn’t explain is how to calculate the color of the intersection point. This is isn’t necessary right now, so I will explain it later. Just know there exist physical models that describe how objects are illuminated when light strikes on them with a certain angle, intensity, etc.</p><p>At the end of the algorithm we will have filled the <strong><em>screen</em></strong> with the correct colors, and we can just save it as an image.</p><h3>Setup the scene</h3><p>Before starting to code, we need to setup a scene. For now we will decide where the <strong><em>camera</em></strong> and the <strong><em>screen</em></strong> are located. For our purpose, we will make things simple by aligning them with the unit axes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3S4l3TIqjr6dGlOzTc95_w.jpeg" /><figcaption>fig.4 — scene</figcaption></figure><p>Hence, the <strong><em>camera</em></strong> is located at the point <strong>(x=0, y=0, z=1) </strong>and the <strong><em>screen</em></strong> is part of the plane formed by the <strong><em>x</em></strong> and <strong><em>y</em></strong> axes. With that being set up, we can already write the skeleton of our code.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/7d6480628ba0b76d024ce95a68ba47e1/href">https://medium.com/media/7d6480628ba0b76d024ce95a68ba47e1/href</a></iframe><ul><li>The <strong><em>camera </em></strong>is just a position, 3 coordinates;</li><li>The <strong><em>screen </em></strong>on the other hand is defined by four numbers (or two points): <em>left, top, right, bottom</em>. It ranges from -1 to 1 in the <strong><em>x </em></strong>direction (this is arbitrary), and ranges from -1 / ratio to 1 / ratio in the <strong><em>y</em></strong> direction, where <em>ratio</em> is image width / image height. The reason for this is simple: we want the <strong><em>screen</em></strong> to have the same aspect ratio than the actual image we want to produce. Setting up the <strong><em>screen</em></strong> this way will produce an aspect ratio of (<em>width</em> over <em>height</em>): 2 / (2 / ratio) = ratio which is the ratio of the desired image of 300x200;</li><li>Finally, the loop consists of splitting the screen into width and height points in the <strong><em>x</em></strong> and <strong><em>y</em></strong> directions respectively, then computing the color of the current pixel;</li></ul><p>You can actually run that code and it will produce — as expected for now — a black image. If you look back at the pseudo-code then this is what we accomplished.</p><pre>✅ for each pixel <strong><em>p</em></strong>(x,y,z) of the screen:<br>✅    associate a black color to <strong><em>p</em></strong><br>    if the ray (line) that starts at <strong><em>camera</em></strong> and goes towards <strong><em>p</em></strong> intersects any object of the scene then:<br>        calculate the <strong><em>intersection point</em></strong> to the nearest object<br>        if there is no object of the scene in-between the <strong><em>intersection point</em></strong> and the <strong><em>light</em></strong> then:<br>            calculate the color of the <strong><em>intersection point</em></strong><br>            associate the color of the <strong><em>intersection point</em></strong> to <strong><em>p</em></strong></pre><h3>Ray intersection</h3><p>The next step of the algorithm is: if the ray (line) that starts at <strong><em>camera</em></strong> and goes towards <strong><em>p</em></strong> intersects any object of the scene then.</p><p>Let’s break it down into two parts. First, what is the ray (line) that starts at <strong><em>camera</em></strong> and goes towards <strong><em>p </em></strong>?</p><h4>Ray definition</h4><p>We say “<em>ray</em>” but that’s really just another word for “<em>line</em>”. In general, whenever you code something that is geometrical, you should prefer vectors over actual line equations, they really are easier to work with and are much less prone to errors such as division by zero.</p><p>So, since the <em>ray </em>starts at the <strong><em>camera</em></strong> and goes in the direction of the currently targeted <strong><em>pixel</em></strong>, we can define a unit-vector that points to a similar direction. Therefore, we define a “<em>ray that starts at </em><strong><em>camera</em></strong><em> and goes towards </em><strong><em>pixel</em></strong><em>”</em> as the following equation:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/524/0*0oQdtXIiWTbIcCgP.png" /><figcaption>eq. 1 — ray</figcaption></figure><p>Remember, <strong><em>camera</em></strong> and <strong><em>pixel</em></strong> are 3D-points. For t=0 you end up at the <strong><em>camera</em></strong> position, and the more you increase t the further away you get from the <strong><em>camera</em></strong> in the direction of the <strong>pixel</strong>. This is a parametric equation, that yields a <strong>point</strong> along the line for a given t.</p><p>Of course, there is nothing special about <strong><em>camera</em></strong> or <strong><em>pixel</em></strong>, we can similarly define a ray that starts at origin (O) and goes towards destination (D) as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/327/0*_ZYaab5Sq9mzdW59.png" /><figcaption>eq. 2 — ray</figcaption></figure><p>For convenience, we define <strong><em>d</em></strong> as the <strong><em>direction</em></strong> vector.</p><p>We can now complete the code and add the computation of the ray.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2820feca428fd54d50476c20a6377b58/href">https://medium.com/media/2820feca428fd54d50476c20a6377b58/href</a></iframe><ul><li>We’ve added the normalize(vector) function that returns a… normalized vector;</li><li>We’ve added the computation of origin and direction which both together define a ray. Notice that <strong><em>pixel</em></strong> has z=0 since it lies on the screen which is contained in the plane formed by the <strong><em>x</em></strong> and <strong><em>y</em></strong> axes;</li></ul><p>Now we get to the second part which is intersects any object of the scene then. That is basically the “hard” part. The computation is going to be different for each type of objects we will dealing with (spheres, planes, triangles, etc.). For the sake of simplicity, we will only render spheres. So for the next part we will see:</p><ul><li>How we define a sphere;</li><li>How we compute the <em>intersection point</em> between a ray and a sphere, if it exists;</li></ul><h4>Sphere definition</h4><p>A sphere is actually a pretty simple mathematical object to define. <em>A sphere is defined as the set of points that are all at the same distance r (radius) from a given point (center)</em>.</p><p>Therefore, given the center <strong><em>C</em></strong> of a sphere, and its radius <strong><em>r</em></strong>, an arbitrary point <strong><em>X</em></strong> lies on the sphere if and only if:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/178/0*-wqpIEGQa7Fr4-s2.png" /><figcaption>eq. 3— sphere equation</figcaption></figure><p>For convenience, we square both sides to get rid of the square root caused by the magnitude of X — C.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/203/0*ZIe6srUIFVxcCtWf.png" /><figcaption>eq. 4— sphere intersection</figcaption></figure><p>We can already define some spheres just after the <em>screen</em> declaration.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/0f95485eef55d3fe15cf04d60ac8b82c/href">https://medium.com/media/0f95485eef55d3fe15cf04d60ac8b82c/href</a></iframe><p>Now let’s compute the intersection between a ray and a sphere.</p><h4>Sphere Intersection</h4><p>We know the ray equation, and we know what condition a point must satisfy so that it lays on a sphere. All we have to do is plug eq. 2 into eq. 4 and solve for t. Which means, answering the question: <em>for which </em><em>t, </em><em>ray(t) will be on the sphere ?</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*eEzDJT0Ccvg-LA0M.png" /><figcaption>eq. 5 — sphere intersection</figcaption></figure><p>This is an ordinary quadratic equation that we can solve for t. We will call the coefficients associated with t², t¹, t⁰ <strong><em>a</em></strong>, <strong><em>b</em></strong>, and <strong><em>c</em></strong> respectively. Let’s calculate the discriminant of that equation:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/251/0*ZNi2SL0eWot_HZmK.png" /><figcaption>eq. 6 — discriminant</figcaption></figure><p>Since <strong><em>d </em></strong>(direction) is a unit-vector, we have a=1. Once we calculate the discriminant of that equation, there are 3 possibilities:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*r49zBhQBtG45W2n4EPI1Nw.jpeg" /><figcaption>fig. 5— sphere discriminant</figcaption></figure><p>We will only use the <strong>third</strong> case to detect intersections. Here’s a function that can detect intersections between a ray and a sphere. It will return t the distance from the <strong><em>origin of the ray</em></strong> to the <strong><em>nearest intersection point</em></strong> if the ray actually intersects the sphere, and it will return None otherwise.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/5043983dc3b29c548d71d7ab6ce52bc0/href">https://medium.com/media/5043983dc3b29c548d71d7ab6ce52bc0/href</a></iframe><p>Notice that we only return the <strong>nearest</strong> intersection (because there are 2) only when both t1 and t2 are positive. This is because a t that solves the equation could be negative, but it would mean that the ray that intersects the sphere doesn’t have <strong><em>d </em></strong>as a <em>direction</em> vector, but <strong><em>-d </em></strong>(for instance if the sphere is behind the <em>camera</em> and the <em>screen</em>).</p><h4>Nearest intersected object</h4><p>All right, so far so good, but we still haven’t completed the instruction from the pseudo-code which was: if the ray (line) that starts at <strong><em>camera</em></strong> and goes towards <strong><em>p</em></strong> intersects any object of the scene then[...]. Good news is, we can do this and the next instruction in one strike! The next instruction is: calculate the <strong><em>intersection point</em></strong> to the nearest object.</p><p>We can easily create a function that uses sphere_intersect() to find the nearest object that a ray intersects, if it exists. We simply loop over all the spheres, search for intersections, and keep the nearest sphere.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/3a406f6386e91f15c89b9f7273e7b9e3/href">https://medium.com/media/3a406f6386e91f15c89b9f7273e7b9e3/href</a></iframe><p>When calling the function, if nearest_object is None then there is no object intersected by the ray, otherwise its value is the <strong>nearest</strong> <strong>intersected</strong> <strong>object</strong> and we get min_distance, the distance from the <em>ray origin</em> to the <em>intersection point</em>.</p><h4>Intersection point</h4><p>In order to compute the <strong><em>intersection point</em></strong>, we use the previous function:</p><pre>nearest_object, distance = nearest_intersected_object(objects, o, d)<br>if nearest_object:<br>    intersection_point = o + d * distance</pre><p>Hooray! We’ve completed the second and the third instructions. This is the code we have until now:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/0ac3125dac39d11d81aafea4a764d34c/href">https://medium.com/media/0ac3125dac39d11d81aafea4a764d34c/href</a></iframe><pre>✅ for each pixel <strong><em>p</em></strong>(x,y,z) of the screen:<br>✅    associate a black color to <strong><em>p</em></strong><br>✅    if the ray (line) that starts at <strong><em>camera</em></strong> and goes towards <strong><em>p</em></strong> intersects any object of the scene then:<br>✅        calculate the <strong><em>intersection point</em></strong> to the nearest object<br>        if there is no object of the scene in-between the <strong><em>intersection point</em></strong> and the <strong><em>light</em></strong> then:<br>            calculate the color of the <strong><em>intersection point</em></strong><br>            associate the color of the <strong><em>intersection point</em></strong> to <strong><em>p</em></strong></pre><h4>Light intersection</h4><p>So far, we know if there is a straight line that goes from the camera/eye to an object, and we know which object this is, as well as exactly what part of the object we’re looking at. What we don’t know yet is if that specific point is illuminated at all! Maybe the light isn’t striking on that particular point, and so there is no need to go further because we cannot see it. Therefore, the next step is to check if there is no object of the scene in-between the <strong><em>intersection point</em></strong> and the <strong><em>light</em></strong><em>.</em></p><p>Fortunately, we already have a function to help us: nearest_intersected_object(). Indeed, we want to know if the ray that starts at the <strong><em>intersection point</em></strong> and goes towards the <strong><em>light</em></strong> is intersecting an object of the scene <em>before</em> crossing the light. This is practically the same task as previously, we just need to change the ray origin and direction. First, we need to define a light. You can add this near the objects declaration:</p><pre>light = { &#39;position&#39;: np.array([5, 5, 5]) }</pre><p>To check if an object is shadowing the intersection point, we have to pass the ray that starts at the <strong><em>intersection point </em></strong>and goes towards the <strong><em>light</em></strong>, and see if the nearest object returned is actually closer than the light to the intersection point (in other words, in between).</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/bad83c1a583fa1313d2596ee07303f90/href">https://medium.com/media/bad83c1a583fa1313d2596ee07303f90/href</a></iframe><p>Looks neat, doesn’t it ? Well this is <strong>not</strong> going to work… We need to make a slight adjustment. If we use the <strong><em>intersection point </em></strong>as the origin of the new ray we might end up detecting the sphere where we currently stand as an object in between the intersection point and the light. A quick and widely used fix for that problem is to take a little step that gets us away from the surface of the sphere. We generally use a normal vector to the surface and take a little step in that direction.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/780/1*eHQusfhNcmrMATZ4nUinKw.jpeg" /><figcaption>fig. 6 — sphere normal step</figcaption></figure><p><em>This trick isn’t used only for spheres, but for any kind object.</em></p><p>Therefore, the correct code is:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/818455d956ebf9bd323c1809d8fc1040/href">https://medium.com/media/818455d956ebf9bd323c1809d8fc1040/href</a></iframe><pre>✅ for each pixel <strong><em>p</em></strong>(x,y,z) of the screen:<br>✅    associate a black color to <strong><em>p</em></strong><br>✅    if the ray (line) that starts at <strong><em>camera</em></strong> and goes towards <strong><em>p</em></strong> intersects any object of the scene then:<br>✅        calculate the <strong><em>intersection point</em></strong> to the nearest object<br>✅        if there is no object of the scene in-between the <strong><em>intersection point</em></strong> and the <strong><em>light</em></strong> then:<br>            calculate the color of the <strong><em>intersection point</em></strong><br>            associate the color of the <strong><em>intersection point</em></strong> to <strong><em>p</em></strong></pre><h3>Blinn-Phong reflection model</h3><p>This is it, the last part. We know a light beam has stroke the object, and the reflection of the beam got straight into the <strong><em>camera</em></strong>. The question is: What does the camera see ? This is what the Blinn-Phong model attempts to answer.</p><p><strong><em>FYI</em></strong>: The <a href="https://en.wikipedia.org/wiki/Blinn%E2%80%93Phong_reflection_model"><strong>Blinn-Phong</strong></a> model is an approximation to the <a href="https://en.wikipedia.org/wiki/Phong_reflection_model"><strong>Phong</strong></a> model that is less computationally intensive.</p><p>According to this model, any material has 4 properties:</p><ul><li><strong>Ambient</strong> <strong>color</strong>: color that an object is suppose to have in absence of light. It’s hard to imagine, since we only see objects when light strikes on them, but generally this is a dim color tinted with the actual color you imagine;</li><li><strong>Diffuse color</strong>:<strong> </strong>color that is the closest to what we think of when we say “color”;</li><li><strong>Specular color</strong>:<strong> </strong>color of the shiny part of an object when light has stroke on it. Most of the time this is white;</li><li><strong>Shininess</strong>: a coefficient representing how shiny an object is;</li></ul><p><strong><em>Note:</em></strong><em> All colors are </em><strong><em>RGB</em></strong><em> representations in the range </em><strong><em>0–1</em></strong><em>.</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/963/0*rL6ZqKcczS9GyemA.png" /><figcaption>Phong reflection model — Wikipedia</figcaption></figure><p>So every object of the scene must have these 4 properties. Let’s add them to the spheres.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/5fc7b3d93d236003ad43a2e5d99e3736/href">https://medium.com/media/5fc7b3d93d236003ad43a2e5d99e3736/href</a></iframe><p>In this example, the spheres are red, magenta, and green respectively.</p><p>The Blinn-Phong model states that light also has the three color properties: ambient, diffuse and specular. Let’s add them too.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/86034daf3ca6dc89ae8b5c7f78b52247/href">https://medium.com/media/86034daf3ca6dc89ae8b5c7f78b52247/href</a></iframe><p>Given these properties, the Blinn-Phong model calculates the illumination of a point as follows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*i-KIXwZcSkzk9qjn.png" /><figcaption>eq. 7 — Blinn-Phong model</figcaption></figure><p>where,</p><ul><li>ka, kd, ks are the <em>ambient, diffuse, specular</em> properties of the <strong>object</strong>;</li><li>ia, id, is are the <em>ambient, diffuse, specular</em> properties of the <strong>light</strong>;</li><li>L is a direction <strong>unit vector</strong> from the <strong><em>intersection point</em></strong> towards the <strong><em>light</em></strong>;</li><li>N is the <strong>unit</strong> <strong>normal</strong> <strong>vector</strong> to the surface of the object at the <em>intersection point</em>;</li><li>V is a direction <strong>unit vector </strong>from the <strong><em>intersection point</em></strong> towards the <strong><em>camera</em></strong>;</li><li>α is the <strong>shininess</strong> of the object;</li></ul><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d18874b912a481783dc373e9c2963b6d/href">https://medium.com/media/d18874b912a481783dc373e9c2963b6d/href</a></iframe><p>Notice that at the end, we bound the color between <strong><em>0</em></strong> and <strong><em>1 </em></strong>to make sure it’s in the correct range.</p><pre>✅ for each pixel <strong><em>p</em></strong>(x,y,z) of the screen:<br>✅    associate a black color to <strong><em>p</em></strong><br>✅    if the ray (line) that starts at <strong><em>camera</em></strong> and goes towards <strong><em>p</em></strong> intersects any object of the scene then:<br>✅        calculate the <strong><em>intersection point</em></strong> to the nearest object<br>✅        if there is no object of the scene in-between the <strong><em>intersection point</em></strong> and the <strong><em>light</em></strong> then:<br>✅            calculate the color of the <strong><em>intersection point</em></strong><br>✅            associate the color of the <strong><em>intersection point</em></strong> to <strong><em>p</em></strong></pre><h3><strong>Run the code!</strong></h3><p>Increase <strong><em>width</em></strong> and <strong><em>height</em></strong> for a higher resolution (at the cost of your time).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*e16y4L0CuYhqDDxLtcDyHQ.png" /><figcaption>fig. 5 — first result</figcaption></figure><p>Wow that’s cool! However, you may notice 2 things that differ from the first image I’ve shown at the beginning. Go ahead, take a look back.</p><ul><li>The grey floor is missing;</li><li>There are no reflections (mirror effect) in this picture;</li></ul><p>Let’s address these two points.</p><h3>Fake plane</h3><p>Ideally we would create another type of object, a plane, but because we’re lazy we can simply use another sphere. <em>How ?</em> Well, if you’re standing on a sphere that has an infinitely large radius (compared to your size), then you’ll feel like you’re standing on a flat surface. Just like earth :)</p><p>Add this sphere to your list of objects, and render again!</p><pre>{ &#39;center&#39;: np.array([0, -9000, 0]), &#39;radius&#39;: 9000 - 0.7, &#39;ambient&#39;: np.array([0.1, 0.1, 0.1]), &#39;diffuse&#39;: np.array([0.6, 0.6, 0.6]), &#39;specular&#39;: np.array([1, 1, 1]), &#39;shininess&#39;: 100 }</pre><h3>Reflection</h3><p>Right now, we render rays that: come out the light source, hit the surface of an object, then directly bounce towards the camera. What if the ray hits multiple objects before hitting the camera ? This is reflection. The ray will accumulate different colors and when it strikes the camera you will see reflections. Let’s do it.</p><p>Each object has a <strong>reflection coefficient</strong> in the range <strong><em>0–1</em></strong><em>. </em>“0” means the object is matte, “1” means the object is like a mirror. Let’s add a reflection property to all the spheres:</p><pre>{ &#39;center&#39;: np.array([-0.2, 0, -1]), ..., &#39;reflection&#39;: 0.5 }<br>{ &#39;center&#39;: np.array([0.1, -0.3, 0]), ..., &#39;reflection&#39;: 0.5 }<br>{ &#39;center&#39;: np.array([-0.3, 0, 0]), ..., &#39;reflection&#39;: 0.5 }<br>{ &#39;center&#39;: np.array([0, -9000, 0]), ..., &#39;reflection&#39;: 0.5 }</pre><h4>Algorithm</h4><p>Currently, we compute a ray that starts at the <strong><em>camera</em></strong> and goes towards a <strong><em>pixel</em></strong><em>,</em><strong><em> </em></strong>then we trace that ray into the scene, check for the nearest intersection and compute the intersection point color.</p><p>In order to include reflections, we need to trace the <strong>reflected ray</strong> after an intersection happen and include the color contribution of each intersection point. We repeat that process some number of time (to define).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xV8qXv64kbf10drdvlmpog.jpeg" /><figcaption>fig. 6 — reflection</figcaption></figure><h4>Color computation</h4><p>In order to get the color of a pixel, we need to sum the contribution of each intersected point by the ray.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/507/0*TvIgxb607u1zYeyv.png" /><figcaption>eq. 8 — color computation</figcaption></figure><p>where,</p><ul><li>c is the (final) color of a pixel;</li><li>i is the illumination computed by the Blinn-Phong model of the <em>#index</em> intersection point;</li><li>r is the reflection of the <em>#index</em> intersected object;</li></ul><p>Then it’s up to you to decide when to stop computing that sum (i.e. when to stop tracing reflected rays).</p><h4>Reflected ray</h4><p>Before we’re able to code this, we need to find the reflected ray direction. We can compute a reflected ray the following way:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/273/0*Yl7qDqa8CH_qSt7r.png" /><figcaption>eq. 9 — reflection</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/781/1*EFzc6YK1ri1QROOon8Ipgw.jpeg" /><figcaption>fig. 7— reflection diagram</figcaption></figure><p>where,</p><ul><li>R is the normalized reflected ray;</li><li>V is a direction <strong>unit vector</strong> of the ray to be reflected;</li><li>N is the direction <strong>unit vector </strong><em>normal</em> to the surface the ray stroke;</li></ul><p>Add this method at the top of the file along with the normalize() function:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/cd229bb5a3b6fa4dc53785aff5198f25/href">https://medium.com/media/cd229bb5a3b6fa4dc53785aff5198f25/href</a></iframe><h4>Code</h4><p>Time to code this. It’s actually a small change at the end. Simply make the following changes:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/48dd52e2200e60e43624027b70cb280f/href">https://medium.com/media/48dd52e2200e60e43624027b70cb280f/href</a></iframe><p><strong><em>Important: </em></strong><em>Now that we have put the intersection code in another loop for reflection, we should use </em><em>break statements where we previously used </em><em>continue statements, in order to avoid useless computations.</em></p><p>That’s it! Run the code and observe the beautiful result!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*GFLF03PB7Uo5NLo5G5oylQ.png" /></figure><h3>Final Code</h3><p>The final code is surprisingly small, about a hundred lines of code!</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/0235108288c76013713fcc5cc7b897e2/href">https://medium.com/media/0235108288c76013713fcc5cc7b897e2/href</a></iframe><h3>What’s next ?</h3><p>This was a very simplistic program that was meant to educate on the subject. There are so many ways to improve this and implement other fascinating functionalities. Here are some of them:</p><ul><li>OOP! Right now we’ve put all the objects in a dict, but you could make classes, figure out what’s specific to spheres and what’s not, make a base class, and implement other objects such as planes or triangles;</li><li>Same thing goes for light. Add some POO here and make it so you can add multiple lights in the scene;</li><li>Separate the material properties from the geometrical properties, to be able to apply one material (e.g. blue matte) to any type of objects;</li><li>Figure out a way to position the screen correctly given any camera position and a direction to look at;</li><li>Model the light differently. Currently it’s a single point, which is why the shadows of objects are “hard” or well defined. In order to get “soft” shadows (with a gradient basically), you need to model a light like a 2d or 3d object: disk or sphere?</li></ul><h3>Bonus</h3><p>Here’s an animation I made with ray tracing. I simply rendered the scene several times with the camera at different positions.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FklVCeTXNX2M%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DklVCeTXNX2M&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FklVCeTXNX2M%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/07003a129238435e06b22134c7f5d2fd/href">https://medium.com/media/07003a129238435e06b22134c7f5d2fd/href</a></iframe><p>The code is in Kotlin (you’ll notice then how much python is slow…) and available on GitHub if you’re interested.</p><p><a href="https://github.com/OmarAflak/RayTracer-Kotlin">OmarAflak/RayTracer-Kotlin</a></p><h3>Conclusion</h3><p>Congratulation if you made so far! I hope you enjoyed this fascinating subject and don’t hesitate to comment for any question. For further readings on that matter I would highly advise the following website:</p><p><a href="https://www.scratchapixel.com/">Scratchapixel</a></p><p>Cheers !</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=41670e6a96f9" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Bézier Interpolation]]></title>
            <link>https://omaraflak.medium.com/b%C3%A9zier-interpolation-8033e9a262c2?source=rss-c215fdc67eb------2</link>
            <guid isPermaLink="false">https://medium.com/p/8033e9a262c2</guid>
            <category><![CDATA[python]]></category>
            <category><![CDATA[graphic-design]]></category>
            <category><![CDATA[mathematics]]></category>
            <category><![CDATA[bezier-curves]]></category>
            <category><![CDATA[interpolation]]></category>
            <dc:creator><![CDATA[Omar Aflak]]></dc:creator>
            <pubDate>Sat, 09 May 2020 08:41:17 GMT</pubDate>
            <atom:updated>2025-09-05T22:32:59.345Z</atom:updated>
            <content:encoded><![CDATA[<h4>Create smooth shapes using Bézier curves.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/662/1*ql81LfrJHLl6xmGQARXD1Q.png" /></figure><h4>UPDATE: Check out the updated version of this article at: <a href="https://omaraflak.com/articles/bezier-interpolation">https://omaraflak.com/articles/bezier-interpolation</a></h4><p>In this article, we will see how we can use cubic Bézier curves to create a smooth line that goes through a predefined set of points. If you don’t know what Bézier curves are, you might want to check out this post I have written which could do as an introduction, or simply browse Wikipedia!</p><p><a href="https://towardsdatascience.com/b%C3%A9zier-curve-bfffdadea212">Bézier Curve</a></p><h3>Cubic Bézier Curves</h3><p>The goal is to fit <strong><em>n+1</em></strong> given points <strong><em>(P0, …, Pn)</em></strong>. In order to fit these points, we are going to use one cubic Bézier curve (4 control points) between each consecutive pair of points.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/659/1*b8xAbZeY1-RFsr0PQ-fL7Q.png" /><figcaption>fig. 1</figcaption></figure><p>So in this figure, <strong><em>G0</em></strong>, <strong><em>G1,</em></strong> and <strong><em>G2 </em></strong>are three different cubic Bézier curves that start and end at <strong><em>(P0, P1)</em></strong>, <strong><em>(P1, P2)</em></strong>, and <strong><em>(P2, P3) </em></strong>respectively. Since any Bézier curve always starts and ends at the first and last control points, we are left with 2 control points for each curve that we will have to find so that the resulting line looks smooth.</p><p><em>You might want to refer back to </em><em>fig. 1 during the article to understand the indices we will be using.</em></p><p>The general equation of the cubic Bézier curve is the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/848/0*RvTYvC7w0SRcUdJP.png" /></figure><p>Where <strong><em>K </em></strong>are the 4 control points. In our case, <strong><em>K0</em></strong> and <strong><em>K3 </em></strong>will be two consecutive points that we want to fit (e.g. <strong><em>P0-P1</em></strong>, or <strong><em>P1-P2</em></strong>, etc.), and <strong><em>K1</em></strong> and <strong><em>K2 </em></strong>are the remaining 2 control points we have to find.</p><h3><strong>Problem Setup</strong></h3><p>Given that we have <strong><em>n+1 </em></strong>points to fit, we will use a cubic Bézier curve to fit each consecutive par of points. We denote <strong>Γi<em> </em></strong>the Bézier curve that fits <strong><em>Pi</em></strong> to <strong><em>Pi+1</em></strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*K9lBKIWRujivZGn2.png" /><figcaption>eq. 2</figcaption></figure><p>Where <strong><em>ai</em></strong> and <strong><em>bi </em></strong>are left to find. Notice that there are <strong><em>n</em></strong> curves.</p><p>If we want the final curve to be smooth, we need to ensure that the transition between <strong>Γi<em> </em></strong>and<strong><em> </em>Γi+1<em> </em></strong>is smooth<strong><em> </em></strong>around <strong><em>Pi+1</em></strong>. In other words, that the curvature of <strong>Γi<em> </em></strong>matches the curvature of<strong><em> </em>Γi+1<em> </em></strong>around<strong><em> Pi+1. </em></strong>Mathematically, this means respecting the following conditions:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/580/0*FBns1Cnc-acCmhkQ.png" /><figcaption>eq. 3, 4</figcaption></figure><p>We need to find all the <strong><em>ai</em></strong> and <strong><em>bi</em></strong>. Since we have one pair of them in each Bézier curve, and since we have <strong><em>n </em></strong>curves, we need to find <strong><em>2n </em></strong>variables. However, here we have <strong><em>2(n-1)</em></strong> equations. We are missing 2 equations to solve the system. Therefore, we impose the following (arbitrary) boundary conditions:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/212/0*DPeBUXfR3_arIawZ.png" /><figcaption>eq. 5, 6</figcaption></figure><h3>Write the system</h3><p>Before solving the system, we need to calculate the first and second derivatives of <strong>Γi </strong>and write the system down.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*ZW5q6XJjaIBZiCSZ.png" /><figcaption>eq. 7</figcaption></figure><p>and,</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*AqirkTOX_m9M_C4F.png" /><figcaption>eq. 8</figcaption></figure><h4>Inject the equations</h4><p>Injecting eq. 7 into eq. 3:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/643/0*-SNOSelL-J5wyVtH.png" /><figcaption>eq. 9</figcaption></figure><p>Injecting eq. 8 into eq. 4:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/648/0*Q51WmWs8B2oRM3fw.png" /><figcaption>eq. 10</figcaption></figure><p>Injecting eq. 8 into eq. 5:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/304/0*6FXnI8BGP1u-3ezp.png" /><figcaption>eq. 11</figcaption></figure><p>Finally, injecting eq.8 into eq. 6:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/369/0*lXIZR-rD60NNF9Ls.png" /><figcaption>eq. 12</figcaption></figure><h3>Solve the system</h3><p>To sum up, we have the following <strong><em>2n</em></strong> equations:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/597/0*BqwofngudZbqbY_L.png" /><figcaption>eq. 9, 10, 11, 12</figcaption></figure><p>In order to solve the system are going to eliminate all the <strong><em>bi </em></strong>by injecting eq. 9 into eq. 10, 11, 12. Using eq. 9:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/561/0*aT1AwZ4jRwxx7nSa.png" /><figcaption>eq. 13</figcaption></figure><p>Injecting eq. 13 into eq. 10:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/893/0*DWXJdpbCmLDNO0H3.png" /><figcaption>eq. 13, 14</figcaption></figure><p>Injecting eq. 13 into eq. 11:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/394/0*5C4xj3iTR-BlYb8q.png" /><figcaption>eq. 15</figcaption></figure><p>Almost there! We now need to inject eq. 13 into the fourth equation, eq. 12. However, eq. 12 has <strong><em>bn-1 </em></strong>but eq. 13 holds until <strong><em>bn-2</em></strong>. Good news, we can use eq. 14 to get rid of <strong><em>bn-1 </em></strong>then inject <strong><em>bn-2</em></strong><em>.</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/767/0*8bjuRDe2Rcvcbv8t.png" /><figcaption>eq. 16</figcaption></figure><p>All right! To summarise we have in this order eq. 15, 13, 16:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/754/0*OQnKruaygCBOekle.png" /><figcaption>eq. 15, 13, 16</figcaption></figure><p>We can write this system as a matrix multiplication and solve it. Hold on, we’re there!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/905/0*iIdqqaYXE9aNr1Z7.png" /><figcaption>eq. 17</figcaption></figure><p>As you can see, the first matrix has only 3 diagonal filled with values, others are zeros. This kind of matrix is called, reasonably enough, a <em>tridiagonal</em> matrix. Algorithms exist to solve this type of systems efficiently, such as <a href="https://en.wikipedia.org/wiki/Tridiagonal_matrix_algorithm">Thomas Algorithm</a> which runs linearly in time. For the sake of simplicity, we won’t bother optimising further and we will simply use the built-in functions of Numpy in Python to solve the system.</p><p>However, we are still missing the <strong><em>bi </em></strong>points. To find these we simply make use of eq. 13 which works out all the <strong><em>bi </em></strong>up until <strong><em>bn-2 </em></strong>and then eq. 12 which gives the last term, <strong><em>bn-1</em></strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/553/0*M7PDLzG91Zo-o3Cm.png" /><figcaption>eq. 13, 12</figcaption></figure><p>We are done at last! Let’s see how we can program this using Python.</p><h3>Python Code</h3><p>I did my best to make the code as clear as possible, and I added comments. If you can’t get your head around something don’t hesitate to comment!</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/157329425eb5f36f2d7b4fafcb8ef274/href">https://medium.com/media/157329425eb5f36f2d7b4fafcb8ef274/href</a></iframe><p>Finally, this is how we would use this code:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/db83a7ff396f98ed5524baf837b12ce3/href">https://medium.com/media/db83a7ff396f98ed5524baf837b12ce3/href</a></iframe><p>In my case, I got this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/657/1*vVuxQfN3bGq4my3E3AFAhQ.png" /><figcaption>fig. 2</figcaption></figure><p>This is it for this article! I hope you enjoyed it and don’t hesitate to comment any question you might have!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8033e9a262c2" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Bézier Curve]]></title>
            <link>https://omaraflak.medium.com/b%C3%A9zier-curve-bfffdadea212?source=rss-c215fdc67eb------2</link>
            <guid isPermaLink="false">https://medium.com/p/bfffdadea212</guid>
            <category><![CDATA[graphic-design]]></category>
            <category><![CDATA[math]]></category>
            <category><![CDATA[bezier-curves]]></category>
            <category><![CDATA[mathematics]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[Omar Aflak]]></dc:creator>
            <pubDate>Sat, 02 May 2020 17:18:54 GMT</pubDate>
            <atom:updated>2025-09-05T22:34:00.000Z</atom:updated>
            <content:encoded><![CDATA[<h4>Understand the mathematics of Bézier curves</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/648/1*TeAp4zWT8cqsXNaly7xgaA.png" /></figure><h4>UPDATE: Check out the updated version of this article at: <a href="https://omaraflak.com/articles/bezier">https://omaraflak.com/articles/bezier</a></h4><p>Bézier curves are used a lot in computer graphics, often to produce smooth curves, and yet they are a very simple tool. If you have ever used Photoshop you might have stumbled upon that tool called “Anchor” where you can put anchor points and draw some curves with them… Yep, these are Bézier curves. Or if you have used vector-based graphic, SVG, these too use Bézier curves. Let’s see how it works.</p><h3>Definition</h3><p>Given <strong><em>n+1</em></strong><em> </em>points <strong><em>(P0, …, Pn) </em></strong>called the <strong><em>control points</em></strong><em>, </em>the <a href="https://en.wikipedia.org/wiki/B%C3%A9zier_curve">Bézier curve</a> defined by these points is defined as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/444/0*pt-2BIkUZ7eJmMQx.png" /><figcaption>eq. 1</figcaption></figure><p>Where <strong>B(t) </strong>is the <a href="https://en.wikipedia.org/wiki/Bernstein_polynomial">Bernstein polynomial</a>, and:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/619/0*tYh_YHX8047-H2K2.png" /><figcaption>eq. 2</figcaption></figure><p>You will notice that this Bernstein polynomial looks a lot like the <strong><em>k(th) </em></strong>term in Newton’s binomial formula, which is:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/0*YuSxa61p8p-JGBrb.png" /><figcaption>eq. 3</figcaption></figure><p>In fact, the Bernstein polynomial is nothing but the <strong><em>k(th)</em></strong> term in the expansion of <strong><em>(t + (1 - t))^n = 1. </em></strong>Which is why if you sum all the <strong><em>Bi</em></strong> up to <strong><em>n</em></strong>, you will get <strong>1</strong>. Any ways.</p><h3>Quadratic Bézier Curve</h3><p>The quadratic Bézier curve is how we call the Bézier curve with 3 control points, since the degree of P(t) will be 2. Let’s calculate the Bézier curve given <strong>3</strong> control points and explore some properties we might find! Remember, eq. 1 holds for <strong><em>n+1 </em></strong>points, so in our case <strong><em>n=2.</em></strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/996/0*nmPLYCoMl6l97qdS.png" /><figcaption>eq. 4</figcaption></figure><p>Mind that <strong>P(t) </strong>does not return a number, but a point on the curve. Now we just have to choose three control points and evaluate the curve on the range [0, 1]. We can do this in Python quite easily.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ef672388a1b9d2092bc13159dca826af/href">https://medium.com/media/ef672388a1b9d2092bc13159dca826af/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/640/1*nLDNadmGL-k6FXV0Tdo1MA.png" /></figure><p>You can notice that the curve starts and ends at the first and last control points. This result will be true for any number of points. Since <strong><em>t</em></strong> ranges from <strong>0</strong> to <strong>1</strong>, we can prove this by evaluating <strong>P(t)</strong> at t=0 and t=1. Using eq. 1:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/485/0*0ytQ2pqBhBIplig8.png" /><figcaption>eq. 4</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/485/0*7RL80F1U-f1huH8Y.png" /><figcaption>eq. 5</figcaption></figure><p>Because the curve goes from <strong><em>P0</em></strong> to <strong><em>P2</em></strong>, in this case, <strong><em>P1 </em></strong>entirely<em> </em>determines the shape of the curve<em>. </em>Moving <strong><em>P1 </em></strong>around you might notice something:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sVP9ywadzev59X2T8HydfA.png" /></figure><p>The Bézier curve is always contained in the polygon formed by the control points. This polygon is hence called the <strong><em>control polygon</em></strong><em>, </em>or Bézier polygon. This property also holds for any number of control points, which makes their manipulation quite intuitive when using a software.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hDjhAfgcHBqWIpbXqA-m0Q.png" /></figure><h3>Matrix representation</h3><p>We can actually represent the Bézier formula using matrix multiplication, which might be useful in other contexts, for instance for splitting the Bézier curve. If we go back to our example we can rewrite P(t) as follows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/688/0*hxse7pmov35e_G1Z.png" /><figcaption>eq. 6</figcaption></figure><p>And so all the information about the quadratic Bézier curve is compacted into one matrix, <strong><em>M</em></strong>. Now, we might want to find the coefficients of that matrix without having to do all these steps, and in a way that is easily programmable. Since the coefficients of the matrix are simply the coefficients of the polynomial in front of each <strong><em>Pi</em></strong>, what we are looking for is the expanded form of the Bernstein polynomial eq. 2.</p><p>One more thing: if we expand <strong>Bi(t) </strong>we will get the polynomial in front of <strong>Pi</strong>, which corresponds to the <strong>i(th) column</strong> in the matrix. However, that is not really convenient and it would be easier to program if we could get rows instead. That said, you might notice that the <strong>i(th) row</strong> of the matrix is exactly the same as the <strong>reversed (n-i)(th)</strong> <strong>column, </strong>and the coefficients of the reversed (n-i)(th) column are nothing but the coefficients of <strong>B(n-i)(t) </strong>taken in decreasing powers of <strong><em>t</em></strong><em>.</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/531/0*BJNNNVM_w55I1IAx.png" /><figcaption>eq. 7</figcaption></figure><p>You might want to refer to eq. 2 and eq. 3 if you’re having some troubles.</p><p>Therefore, the coefficients of the matrix are nothing but the coefficients in front of <strong><em>t</em></strong>, meaning:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/0*Spbm_pjWWvayoE1W.png" /><figcaption>eq. 8</figcaption></figure><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/264321b5495ed7df13bf33cb76961886/href">https://medium.com/media/264321b5495ed7df13bf33cb76961886/href</a></iframe><h3>Interpolation</h3><p>One interesting application of Bézier curves is to draw a smooth curve going through a predefined set of points. The reason it is interesting is because the formula of <strong>P(t) </strong>produces<strong> points</strong> and is not of the form <strong>y=f(x)</strong>, so one <strong><em>x</em></strong> can have multiple <strong><em>y</em></strong>’s (basically a function that can “go backward”). For instance we could draw something like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/640/1*L_7MuirjkGk4KEtpYpDBXg.png" /></figure><p>However, the mathematics to produce this result are not trivial so I’ve wrote a dedicated post for this:</p><p><a href="https://towardsdatascience.com/b%C3%A9zier-interpolation-8033e9a262c2">Bézier Interpolation</a></p><p>In the meantime, here is how you can program the general version of the Bézier curve for any number of control points using eq. 1.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d452104de73512feaa49a618ffec6f99/href">https://medium.com/media/d452104de73512feaa49a618ffec6f99/href</a></iframe><p>Run the program and you will get the graph displayed in the header.</p><p>That’s it for this introduction to Bézier curves. I hope you learned something and don’t hesitate to comment any question that you might have!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=bfffdadea212" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A one night journey]]></title>
            <link>https://omaraflak.medium.com/a-one-night-journey-10127ffaf275?source=rss-c215fdc67eb------2</link>
            <guid isPermaLink="false">https://medium.com/p/10127ffaf275</guid>
            <category><![CDATA[humor]]></category>
            <category><![CDATA[education]]></category>
            <category><![CDATA[entrepreneurship]]></category>
            <category><![CDATA[hackathons]]></category>
            <category><![CDATA[creativity]]></category>
            <dc:creator><![CDATA[Omar Aflak]]></dc:creator>
            <pubDate>Wed, 08 Apr 2020 07:58:44 GMT</pubDate>
            <atom:updated>2020-04-08T17:28:25.787Z</atom:updated>
            <content:encoded><![CDATA[<h4>What does it take ?</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*5Xv0PgW6gAKz0Gd-" /><figcaption>Photo by <a href="https://unsplash.com/@kidcircus?utm_source=medium&amp;utm_medium=referral">Kid Circus</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>I took an “entrepreneurial” course this year at my University.</p><p>I’ve heard about the <em>top 10 techniques that every entrepreneur must know. </em>I’ve heard about how I should talk to an audience to convince them. I’ve heard about the relentless mindset I should have, and never fear failure. I’ve heard about how I should have diversified skills among my team. I’ve heard about how important <em>networking</em> is. I’ve heard about the fall and the rise of dozens of people I don’t even know. I’ve heard about all the mistakes I should avoid, and everything I should do in order to succeed. Days, weeks, months have passed and I kept hearing those things.</p><h3>Quarantined</h3><p>Few weeks back from now, an evening, I was at my friend’s house, we were 3 having some pizzas after a skateboard session. We opened up the TV, we were awaiting for the French president to announce new measures related to COVID-19. That’s when the quarantine was announced. <em>We were so happy.</em></p><h3>Hysteria</h3><p>While eating our pizzas, we started to imagine apps that we could create for the weird period ahead of us. Ideas were coming from all sides, we kind of went hysterical at that moment, but eventually we settled down.</p><h3>The agreement</h3><p>We thought that, during this period, it would probably be hard for some students to keep up with the work, especially if exams were coming. We were mostly thinking of teenagers, but really this can be true for anyone. That’s when we agreed on making a website that will <strong><em>connect</em></strong> students to anyone who’s willing to spend a bit of his time to teach a specific subject.</p><h3>Most important part of all</h3><p>We wanted to call this website <strong><em>covaide</em></strong>. You get the play on words (<em>aide</em> means <em>aid,</em> obviously) right? So, first thing we did? <strong>Buy a domain name! </strong>Of course. <a href="https://covaide.fr"><strong>covaide.fr</strong></a> was born. Well… there was no website behind it yet, but still, it was born, for 5€. Finally ready to start coding.</p><h3>The right tools</h3><p>We used <a href="https://firebase.google.com/">Firebase</a> to <a href="https://medium.com/swlh/how-to-deploy-a-react-app-with-firebase-hosting-98063c5bf425">host and deploy</a> our not-ready-yet ReactJS website, which took about 10 minutes, no jokes. Firebase is a free(mium) tool made by Google, which gives you access to all the basic stuff you need to get started: <em>Hosting</em>, <em>Database, Authentication, Storage, etc.</em></p><h3><em>The End</em></h3><p>We spent the night coding. One of us didn’t code, but we were all so excited that he stayed up all night watching us typing… Great guy. The day after, we had a prototype. It was super ugly, really. The whole site would break if you went on mobile. But it was working. We made a post online, it was shared about 200 times on Facebook and LinkedIn. We had a lot of feedbacks from friends but also complete strangers. People started to sign up on the website… Wow!</p><p>Then the government canceled all important exams for teenagers. lol.</p><h3>Conclusion</h3><p>I took an “entrepreneurial” course this year at my University, but truth is, that one little experience was worth a thousand words. So, the next time you’re feeling wings growing, remember that the worst case scenario is a sleepless pizza-night hackathon with your friends, 5€ for a stupid domain name, and a good story to share. Cheers!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=10127ffaf275" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Optimization — Descent Algorithms]]></title>
            <link>https://omaraflak.medium.com/optimization-descent-algorithms-bf595f069788?source=rss-c215fdc67eb------2</link>
            <guid isPermaLink="false">https://medium.com/p/bf595f069788</guid>
            <category><![CDATA[optimization]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[mathematics]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[algorithms]]></category>
            <dc:creator><![CDATA[Omar Aflak]]></dc:creator>
            <pubDate>Wed, 01 Apr 2020 01:25:40 GMT</pubDate>
            <atom:updated>2025-09-05T22:34:49.299Z</atom:updated>
            <content:encoded><![CDATA[<h3>Optimization — Descent Algorithms</h3><h4>In this post, we will see several basic optimization algorithms that you can use in various data science problems.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/957/1*qNjI4S1T7jNBDi3oaJEGbg.png" /></figure><h4>UPDATE: Check out the updated version of this article at: <a href="https://omaraflak.com/articles/optimization">https://omaraflak.com/articles/optimization</a></h4><p>Many algorithms used in Machine Learning are based on basic mathematical optimization methods. Discovering these algorithms directly in the context of Machine Learning might be confusing because of all the prerequisites. Thus, I think it might be a good idea to see these algorithms free of any context in order to get a better understanding of these techniques.</p><h3>Descent Algorithms</h3><p>Descent algorithms are meant to <strong>minimise</strong> a given function, that’s it. Really. These algorithms proceed <strong>iteratively</strong>, it means that they successively improve their current solution. You might think:</p><blockquote>What if I want to find the maximum of a function ?</blockquote><p>Simply, add a <em>minus</em> sign in front of your function, and it becomes a “min” problem!</p><p>Let’s dive in. This is our problem definition:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/433/0*X61V0WDN5Cx6vxhz.png" /><figcaption>Our problem consists of finding a vector x* that will minimise a function f</figcaption></figure><p>One prerequisite you must know is that if a point is a minimum, maximum, or a saddle point (meaning both at the same time), then the <strong>gradient</strong> of the function is zero at that point.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/655/1*1og7x2jLFgEWn8mjAqQiqA.png" /><figcaption>1D case</figcaption></figure><p>Descent algorithms consist of building a sequence <strong><em>{x}</em></strong> that will converge towards <strong><em>x*</em> </strong>(<em>arg min f(x)</em>). The sequence is built the following way:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/271/0*9OIjEOTWgZxIsa4n.png" /><figcaption>Sequence we try to build in order to get to x*</figcaption></figure><p>Where <em>k</em> is the iteration, and <strong><em>d</em></strong> is a vector, same size as <em>x</em>, called the <strong>descent vector</strong>. Then, this is what the algorithm looks like:</p><pre>x = x_init<br>while ||gradient(f(x))|| &gt; epsilon:<br>    x = x + d</pre><p>That’s it! We keep doing the update until the norm of the gradient is small enough (as it should reach a zero value at some extremum).</p><p>We will see 3 different descent/direction vectors: Newton’s direction, Gradient’s direction, and Gradient + Optimal Step Size direction. First, we need to define a function that we will try to minimise during our experiments.</p><h3>Rosenbrock Function</h3><p>I chose the Rosenbrock function, but you may find many others, <a href="https://en.wikipedia.org/wiki/Test_functions_for_optimization">here for instance</a>. Another good one would be Himmelblau’s function.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/300/0*SQpYo3KNVPXP-bAr.png" /><figcaption>Rosenbrock function — Wikipedia</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/567/0*tquPJO6RiFd82jxQ.png" /></figure><p>It has a global minimum at <em>(x, y)=(a, a²) </em>where <em>f(a, a²) = 0</em>. I will use <em>a=1, b=100 </em>which are commonly used values.</p><p>We will also need, two other pieces of information, the <a href="https://en.wikipedia.org/wiki/Gradient"><strong>gradient</strong></a> of that function, as well as the <a href="https://en.wikipedia.org/wiki/Hessian_matrix"><strong>hessian</strong></a> matrix.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/796/0*awI4Z4pf1SHyMsb2.png" /><figcaption>Gradient of the Rosenbrock function</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/937/0*td5QTu3mj6wee1Ev.png" /><figcaption>Hessian of the Rosenbrock function</figcaption></figure><p>Let’s open up a file and start a Python script. I will do this in a Google Colab, and all the code used in this post will be available here:</p><p><a href="https://colab.research.google.com/drive/1jOK_C_pSDV6J98ggV1-uEaWUrPuVm5tu">Google Colaboratory</a></p><p>Here is our first piece of code.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/edbb55a0d360b99317e39fa0ad242157/href">https://medium.com/media/edbb55a0d360b99317e39fa0ad242157/href</a></iframe><p>From now on, I will refer to the function input vector as <strong><em>x</em></strong>, akin to the problem definition earlier. Now that we are ready, let’s see the first descent vector!</p><h3>Newton’s direction</h3><p>Newton’s direction is the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/421/0*Ym2KTv-3Ne3qQLti.png" /><figcaption>Newton’s direction</figcaption></figure><p>So the update is:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/587/0*SRjUsiT-8MZFy1ku.png" /></figure><h4>Quick note n°1</h4><p>You can find this updated formula by doing the 2nd order <a href="https://en.wikipedia.org/wiki/Taylor_series">Taylor expansion</a> of <em>f(</em><strong><em>x + d</em></strong><em>), </em>since the update we are performing is <em>x_new = </em><strong><em>x + d</em></strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/511/0*hHi_rPu2OTG6qwqs.png" /></figure><p>We want to find <em>d </em>such that <em>f(x + d)</em> is as low as possible. Supposing <em>f’’(x)</em> is positive, this equation is a parabola that has a minimum. That minimum is reached when the derivative of <em>f(x + d)</em> is zero.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/317/0*lIaTOSfJZ8H5w_-D.png" /></figure><p>In n-dimensions, <em>f’’(x) </em>becomes the hessian matrix, and <em>1/f’’(x) </em>shows up as the inverse hessian matrix. Finally, <em>f’(x) </em>will be the gradient.</p><h4>Quick note n°2</h4><p>We need to compute the inverse of the hessian matrix. For big matrices, this is a very computationally intensive task. Therefore, in practice, we solve this a bit differently, but in a totally equivalent manner.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/404/0*sFe5mu9XfCiJCQni.png" /><figcaption>linear equation</figcaption></figure><p>Instead of computing the inverse of the hessian matrix, we solve this equation for <em>g </em>and make the update rule the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/267/0*c1lIIlI1ooMXD0Sl.png" /></figure><p>Let’s code the algorithm now:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c383b586bf7aa336642114cd85d0719d/href">https://medium.com/media/c383b586bf7aa336642114cd85d0719d/href</a></iframe><p>You will notice a small difference with the algorithm I presented at the beginning. I added a <em>max_iteration </em>parameter, so that the algorithm doesn’t run indefinitely if it doesn’t converge. Let’s try it.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/978e1da5b2c877030ffd3caa79e7e21e/href">https://medium.com/media/978e1da5b2c877030ffd3caa79e7e21e/href</a></iframe><p>We get this result:</p><pre>x* = [1. 1.]<br>Rosenbrock(x*) = 0.0<br>Grad Rosenbrock(x*) = [0. 0.]<br>Iterations = 2</pre><p>The algorithm converged in only <strong>2</strong> iterations! That’s really fast. You might think:</p><blockquote>Hey, the initial <em>x </em>is very close to the target <em>x*, that makes the task easy!</em></blockquote><p>You’re right. Try with some other values, for instance <strong><em>x_init = [50, -30]</em></strong>, the algorithm terminates in <strong>5</strong> iterations.</p><p>This algorithm is called the <a href="https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization">Newton’s Method</a> and all descent algorithms are modifications of this method! It’s kind of the mother formula. The reason why it’s really fast is that it uses <strong>second order information</strong> (the hessian matrix).</p><p>Using the hessian matrix, however, comes at a cost: efficiency. Computing an inverse matrix is a computationally intensive task, so mathematicians came up with solutions to overcome this problem. Mainly: <em>Quasi-Newton</em> methods, and <em>Gradient</em> methods. Quasi-Newton methods try to approximate the inverse of the hessian matrix with various techniques, whereas Gradient methods simply stick to first order information.</p><h3>Gradient’s direction</h3><p>If you did some Machine Learning, you’ve probably seen this already. The gradient direction:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/279/0*GQuSlYLQehPMwVcV.png" /><figcaption>Gradient’s direction</figcaption></figure><p>Where <em>α</em> is called the step size (or learning rate in ML), and is a real number<em>.</em></p><p>If you have been doing some Machine Learning, now you know this formula is actually part of a bigger one: Newton’s direction, except we replaced the inverse hessian with a constant! The update rule now is:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/429/0*NtDW3s6liFdSMtiW.png" /></figure><p>The algorithm becomes:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/143d976399c461e59d96d9830d658e1e/href">https://medium.com/media/143d976399c461e59d96d9830d658e1e/href</a></iframe><p>Let’s try it out:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ff4d2849606ee0560747d3ae65b5dd70/href">https://medium.com/media/ff4d2849606ee0560747d3ae65b5dd70/href</a></iframe><p>You can tweak the values of <em>alpha</em>, <em>epsilon</em>, and <em>max_iterations</em>. In order to get a result similar to the Newton’s method I came up with those. This is the result:</p><pre>x* = [0.99440769 0.98882419]<br>Rosenbrock(x*) = 3.132439308613923e-05<br>Grad Rosenbrock(x*) = [-0.00225342 -0.00449072]<br>Iterations = 5000</pre><p>Wow! Gradient descent took 5000 iterations where the Newton’s method took only 2! Moreover, the algorithm didn’t completely reach the minimum point <em>(1, 1)</em>.</p><p>The main reason for which this algorithm converged so slowly compared to Newton, is that not only we no longer have the information given by the second derivative of <em>f, </em>but we used a <strong>constant</strong> to replace the inverse hessian.</p><p>Think about it. The derivative of a function is the rate of change of that function. So the hessian gives information about the rate of change of the gradient. Since finding the minimum implies necessarily a zero gradient, the hessian becomes super useful as it tells you when the gradient goes up or down.</p><p>Many papers in ML are just about finding a better approach for this specific step. Momentum, Adagrad, or Adadelta are some examples.</p><h3>Gradient’s direction + Optimal step size</h3><p>One improvement to the classical gradient descent is to use a variable step size at each iteration, not a constant. Not only it’s going to be a variable step size, but it’s also the <strong>best possible step size.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/300/0*xSXYsQ5i-1mBF15w.png" /><figcaption>αk is the step size at iteration k</figcaption></figure><p>The update is:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/446/0*GbfRrDMpqgLMuZsH.png" /></figure><p>How do we find <strong>α</strong>? Since we want this update to be as efficient as possible, i.e. to minimise <em>f </em>as much as possible, we are looking for <strong>α </strong>such that:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/587/0*nDn9PmkevXiEzciP.png" /></figure><p>Notice that at this step, <strong><em>x </em></strong>and <strong><em>grad(x)</em></strong> are constants. Therefore, we can define a new function <strong><em>q</em></strong><em>:</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/479/0*Aon4JwDNW75ZG3X6.png" /></figure><p>Where <strong><em>q </em></strong>is actually a function of one variable. And we want to find the <strong>α </strong>that minimises this function. Umm… Gradient descent? We could, but while we’re at it, let’s learn a new method: <a href="https://en.wikipedia.org/wiki/Golden-section_search"><strong>Golden Section Search</strong></a>.</p><p><em>Golden Section Search</em> aims at finding the extremum (minimum or maximum) of a function inside a specified interval. Since we use <strong>α in the range [0, 1]</strong>, this is the perfect opportunity to use this algorithm.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/651/1*gDw31rWNTTm9X5Dnuk11xg.gif" /><figcaption>Golden Section Search — first 5 iterations</figcaption></figure><p>As this post is starting to be pretty long I’m not going to go into the details. Hopefully, with the help of that magnificent GIF I took ages to make, and the code below, you’ll be able to understand what’s happening here.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/eb7157eff57809fc419a0c7a4decfa90/href">https://medium.com/media/eb7157eff57809fc419a0c7a4decfa90/href</a></iframe><p>Now that we are able to find the best <strong>α</strong>, let’s code gradient descent with optimal step size!</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/cba14462984ff35daf362f43be2a34b9/href">https://medium.com/media/cba14462984ff35daf362f43be2a34b9/href</a></iframe><p>Then, we can run this code:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/dbb893eeb41252d2aade76ae9dae1689/href">https://medium.com/media/dbb893eeb41252d2aade76ae9dae1689/href</a></iframe><p>We get the following result:</p><pre>x* = [0.99438271 0.98879563]<br>Rosenbrock(x*) = 3.155407544747055e-05<br>Grad Rosenbrock(x*) = [-0.01069628 -0.00027067]<br>Iterations = 3000</pre><p>Even though in this case the results are not significantly better than pure gradient descent, generally the optimal step size performs better. For instance, I tried the same comparison with Himmelblau’s function, and gradient descent with optimal step size was more than twice as fast as pure gradient descent.</p><h3>Conclusion</h3><p>This is the end of this post. I hope you learned some new things that triggered your curiosity for mathematical optimization! There are tons of other interesting methods. Go find them! Don’t forget to check out the <a href="https://colab.research.google.com/drive/1jOK_C_pSDV6J98ggV1-uEaWUrPuVm5tu">Google Colab</a> file, you will find all the code used and the same tests we did here with Himmelblau’s function. Don’t hesitate to leave a comment, and until next time, peace! :)</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=bf595f069788" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Mathématiques des réseaux de neurones — code Python]]></title>
            <link>https://medium.com/france-school-of-ai/math%C3%A9matiques-des-r%C3%A9seaux-de-neurones-code-python-613d8e83541?source=rss-c215fdc67eb------2</link>
            <guid isPermaLink="false">https://medium.com/p/613d8e83541</guid>
            <category><![CDATA[mathematics]]></category>
            <category><![CDATA[keras]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[neural-networks]]></category>
            <dc:creator><![CDATA[Omar Aflak]]></dc:creator>
            <pubDate>Thu, 21 Feb 2019 17:23:54 GMT</pubDate>
            <atom:updated>2025-09-05T22:36:24.442Z</atom:updated>
            <content:encoded><![CDATA[<h3>Réseaux de neurones en partant de zéro en Python</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*YtYVlwMAM7s7cwgI" /><figcaption>Photo by <a href="https://unsplash.com/@cadop?utm_source=medium&amp;utm_medium=referral">Mathew Schwartz</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p><em>Cet article est une traduction du poste originalement publié </em><a href="https://medium.com/datadriveninvestor/math-neural-network-from-scratch-in-python-d6da9f29ce65"><em>ici</em></a><em>.</em></p><h4>UPDATE: Check out the updated version of this article at: <a href="https://omaraflak.com/articles/neural-network">https://omaraflak.com/articles/neural-network</a>, <a href="https://omaraflak.com/articles/neural-network-2">https://omaraflak.com/articles/neural-network-2</a></h4><p>Le but de cet article est de comprendre comment est implémenté un <em>framework</em> tel que <em>Keras</em>, mais également de comprendre les fondements mathématiques qui se cachent derrière le <em>machine learning</em>. Nous allons donc créer en partant de zéro, une mini bibliothèque qui nous permettra de construire des réseaux de neurones très facilement, comme ci dessous:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/654/1*Ld6m0dsfSkJMnbXaoIAOVw.png" /><figcaption>3-layer neural network</figcaption></figure><p>Je supposerai à partir d’ici que vous avez déjà quelques connaissances fondamentales telles que le modèle du neurone artificiel et l’algorithme des gradients descendants. Encore une fois, le but ici n’est pas d’expliquer les différentes applications possibles du machine learning, mais plutôt comment implémenter ces algorithmes.</p><h3>Couche par Couche</h3><p>Gardons à l’esprit la démarche globale du machine learning :</p><ol><li>Donner une <strong>entrée</strong> au modèle.</li><li>Propager cette entrée à travers le réseau de neurones jusqu’à récupérer la <strong>sortie</strong>.</li><li>Une fois la sortie obtenue, nous pouvons la comparer à la sortie voulue et donc calculer une <strong>erreur</strong>.</li><li>On ajuste les paramètres du modèle pour diminuer l’erreur précédemment calculée. Pour cela on soustrait à chaque paramètre la dérivée de l’erreur par rapport à lui-même (<a href="https://fr.wikipedia.org/wiki/Algorithme_du_gradient">gradient descendant</a>).</li><li>On recommence à l’étape 1.</li></ol><p>L’étape la plus importante est la <strong>4ième</strong>. Nous voulons être capable de créer autant de couches que l’on veut, de n’importe quel type, et d’utiliser n’importe quelle fonction d’activation. Seulement, en changeant l’architecture du réseau de neurones, on change également la formule littérale du calcul de la dérivée de l’erreur par rapport aux paramètres.</p><p>Le but est donc de faire une implémentation qui fait abstraction de l’architecture du modèle (comme dans Keras). Pour cela, nous devons implémenter chaque couche <strong>séparément</strong>.</p><h3>Ce que chaque couche doit faire</h3><p>Quelle que soit la couche que nous codons (<em>fully connected</em>, <em>convolutional</em>, <em>maxpooling</em>, <em>dropout</em>, etc.), il y aura toujours au moins deux éléments fondamentaux : une <strong>entrée</strong> et une <strong>sortie</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/626/0*3cUc4jsQlHsoovn9.png" /></figure><h4>Passe avant — forward propagation</h4><p>Nous pouvons dès lors préciser une propriété importante : <strong>la sortie d’une couche est l’entrée de la couche suivante.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/676/1*ddDKxWSAYci2dHaG3O_DOg.png" /></figure><p>Cette partie est ce qu’on appelle la <strong>passe avant</strong> (forward propagation) : on propage l’entrée <strong>X </strong>(image, son, texte, etc.) dans le réseau de neurones jusqu’à obtenir la sortie <strong>Y</strong>. Puis, on observe une erreur <strong>E </strong>qu’il faut maintenant diminuer.</p><h4>Descente de Gradient</h4><p>Ceci est un <strong>rappel </strong>rapide, si vous avez besoin d’en savoir plus sur la descente de gradient il y a des tonnes de ressources sur Internet.</p><p>Fondamentalement, nous voulons changer un paramètre dans le réseau (appelez-le <strong>w</strong>) afin que l’erreur totale <strong>E</strong> <strong>diminue</strong>. Il existe un moyen intelligent de le faire (sans changer le paramètre au hasard) qui est le suivant:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/310/0*MZYPbIwBn5SOEpG7.png" /></figure><p>Où <strong>α</strong> est un paramètre dans l’intervalle [0,1] que nous fixons et qui est appelé le <strong>taux d’apprentissage</strong>. Quoi qu’il en soit, l’important ici est <strong>∂E/∂w</strong> (la dérivée de <strong>E </strong>par rapport à <strong>w</strong>). Nous devons être en mesure de trouver la valeur de cette expression pour n’importe quel paramètre du réseau, <strong>quelle que soit son architecture.</strong></p><h4>Passe arrière — backward propagation</h4><p>Supposons que l’on donne à une couche la <strong>dérivée de l’erreur</strong> par rapport à <strong>sa sortie </strong>(<strong>∂E/∂Y</strong>), alors elle doit être capable de donner la <strong>dérivée de l’erreur</strong> par rapport à <strong>son entrée </strong>(<strong>∂E/∂X</strong>).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/748/0*bL0jVGoQiH_TsrvT.png" /></figure><p>L’erreur E est un <strong>scalaire </strong>(un nombre), et X et Y sont des <strong>matrices</strong>. La notation ci-dessus (abusive) signifie ceci:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/526/0*KYFYEbu_o9j1_a5w.png" /></figure><p>Laissons de côté ∂E/∂X pour l’instant et concentrons-nous sur ∂E/∂Y. Si une couche a accès à ∂E/∂Y où Y est <strong>sa propre sortie, </strong>alors nous pouvons très facilement calculer la dérivée de l’erreur par rapport à ses paramètres ∂E/∂W (pour l’étape de l’ajustement), et cela, <strong>indépendamment de l’architecture globale du réseau de neurones!</strong> Il suffit d’utiliser la règle de dérivation des fonctions composées :</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/325/0*3xceKA-7b2oNbE4T.png" /></figure><p>Etant donnée ∂E/∂Y nous pouvons donc calculer ∂E/∂W, et donc ajuster les paramètres de la couche!</p><h4>Pourquoi avons-nous besoin de ∂E/∂X ?</h4><p>N’oubliez pas, la sortie d’une couche est l’entrée de la couche suivante. Donc <strong>∂E/∂X pour une couche sera ∂E/∂Y pour la couche précédente!</strong> Une fois munit de son propre ∂E/∂Y, la couche précédente pourra à son tour ajuster ses paramètres. Pour calculer ∂E/∂X on utilise encore une fois la règle de dérivation des fonctions composées :</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/329/0*Of8qDiWo31MK-MU0.png" /></figure><p>Cette astuce est la clé de compréhension de la backward propagation! Apres cela, nous pourrons programmer un réseau de neurones convolutif en un rien de temps.</p><h4>Un super diagramme</h4><p>C’est ce que j’ai décrit plus tôt. La couche 3 va mettre à jour ses paramètres en utilisant ∂E/∂Y, puis va passer ∂E/∂H2 à la couche précédente, qui est son propre “∂E/∂Y”. La couche 2 va alors faire de même, et ainsi de suite.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/746/1*0QPRST83oBicKPE_R4biJA.png" /></figure><p>Ça peut paraître abstrait maintenant mais deviendra très clair part la suite. Nous pouvons dès lors créer notre première classe en Python qui sera une classe abstraite représentant une couche.</p><h4>Classe abstraite : Layer</h4><p>La classe abstraite Layer, dont les autres couches hériteront, contient les caractéristiques communes à toutes les couches : une <strong>entrée</strong>, une <strong>sortie</strong>, une fonction qui fait la <strong>passe avant</strong>, et une pour la <strong>passe arrière</strong>.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a1d792797eafb36d7e3c488d2389c266/href">https://medium.com/media/a1d792797eafb36d7e3c488d2389c266/href</a></iframe><p>Comme vous pouvez le constater, il existe un paramètre supplémentaire dans backward_propagation que je n’ai pas mentionné, c’est le learning_rate. Ce paramètre devrait être une politique de mise à jour, ou un optimiseur, comme ils l’appellent dans Keras, mais par souci de simplicité, nous allons simplement passer le <em>learning rate</em> et mettre à jour nos paramètres en utilisant la descente de gradient.</p><h3>Fully Connected Layer</h3><p>Définissons et implémentons maintenant le premier type de couche: <em>Fully Connected Layer</em> ou FC Layer. Les couches FC sont les couches les plus élémentaires car tous les neurones d’entrée sont connectés à tous les neurones de sortie.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/438/1*jOObSF4HAB5VL5wO6petqA.png" /></figure><h4>Forward Propagation</h4><p>La valeur de chaque neurone de sortie peut être calculée comme suit :</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/337/0*Ec9fiXkNOoA5Z2v5.png" /></figure><p>La notation <strong>matricielle</strong> permet de simplifier ce calcul :</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*qTBPwmzIjBOGKpXH.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/253/0*FsJQ82GmKlV2X22z.png" /></figure><p>Nous en avons fini avec la passe avant. Traitons maintenant la passe arrière de la couche FC.</p><p><em>Notez que je n’utilise aucune fonction d’activation, c’est parce que nous allons l’implémenter dans une couche à part !</em></p><h4>Backward Propagation</h4><p>Comme nous l’avons dit, supposons que nous ayons une matrice contenant la dérivée de l’erreur par rapport à la sortie de <strong>cette couche</strong> (∂E/∂Y). Nous avons besoin de :</p><ol><li>La dérivée de l’erreur par rapport aux paramètres (∂E/∂W, ∂E/∂B)</li><li>La dérivée de l’erreur par rapport à l’entrée (∂E/∂X)</li></ol><p>Commençons par ∂E/∂W. Cette matrice doit avoir la même taille que W : ixj où i est le nombre de neurones d’entrée et j le nombre de neurones de sortie. Nous avons besoin d’une <strong>dérivée pour chaque paramètre</strong> :</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/401/0*PiVZ-czmfvBaAFSe.png" /></figure><p>En utilisant la règle de dérivation des fonctions composées, on écrit :</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/621/0*cKYlxf87ZwkKtnrt.png" /></figure><p>D’où,</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/521/0*sEeQVqIapym6O9VH.png" /></figure><p>Nous avons notre première formule qui permet d’ajuster les poids ! Calculons à présent ∂E/∂B.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/516/0*sJVXt05jf2cgd_Ys.png" /></figure><p>Encore une fois, ∂E/∂B doit-être de la même dimension que B: une dérivée par biais.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/557/0*ud43sxuKpgPo9Rlo.png" /></figure><p>D’où,</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/523/0*sovSd27ja1_7R2yU.png" /></figure><p>Maintenant que nous avons ∂E/∂W et ∂E/∂B nous pouvons ajuster tout les paramètres de cette couche de sorte à diminuer l’erreur! Il nous reste simplement à calculer ∂E/∂X pour que la couche précédente puisse faire les mêmes calculs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/526/0*nPzn6Jgv-P0wxUA7.png" /></figure><p>Encore une fois, dérivation de fonctions composées :</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/559/0*x6uE01GkG3NKLNQp.png" /></figure><p>Nous pouvons écrire la matrice entière :</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*JWCzIdtJVTeQ_PG8.png" /></figure><p>Et voilà! Nous avons obtenu ces trois formules fondamentale pour la couche FC!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/246/0*HlI8qj8qZqGIWBrk.png" /></figure><h4>Coder la couche FC</h4><p>Créez un nouveau fichier qui contiendra le code suivant :</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/e0e90a001853dbd5dca31a5dd1af4995/href">https://medium.com/media/e0e90a001853dbd5dca31a5dd1af4995/href</a></iframe><h3>Couche d’activation</h3><p>Tous les calculs que nous avons faits jusqu’à présent étaient complètement linéaires. La machine n’apprendra rien avec ce genre de modèle. Nous devons ajouter une <strong>non-linéarité</strong> au modèle en appliquant des fonctions non linéaires à la sortie de certaines couches.</p><p>Nous devons maintenant refaire tout le processus pour ce nouveau type de couche !</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fgiphy.com%2Fembed%2FKWbmr5E1UdPUI%2Ftwitter%2Fiframe&amp;display_name=Giphy&amp;url=https%3A%2F%2Fgiphy.com%2Fgifs%2Fway-matt01ss-KWbmr5E1UdPUI&amp;image=https%3A%2F%2Fmedia1.giphy.com%2Fmedia%2FKWbmr5E1UdPUI%2Fgiphy.gif&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=giphy" width="435" height="233" frameborder="0" scrolling="no"><a href="https://medium.com/media/a32341bb891e1a18b9b74288b4c9bb60/href">https://medium.com/media/a32341bb891e1a18b9b74288b4c9bb60/href</a></iframe><p>Pas de soucis, ça va être bien plus rapide car il n’y a pas de paramètres entraînable. Il suffit de calculer <strong>∂E/∂X</strong>.</p><p>Nous appellerons f et f&#39; la fonction d’activation et sa dérivée, respectivement.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/385/1*xl7UacJfCUAk_KqX3WI49Q.png" /></figure><h4>Forward Propagation</h4><p>Comme vous le verrez, c’est assez simple. Pour une entrée X donnée, la sortie est simplement la fonction d’activation appliquée à chaque élément de X. Ce qui signifie que l’<strong>entrée</strong> et la <strong>sortie</strong> ont la même dimension.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/453/0*Aw9jCvpliMjaO00W.png" /></figure><h4>Backward Propagation</h4><p>Étant donné <strong>∂E/∂Y</strong>, nous voulons calculer <strong>∂E/∂X</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/841/0*9aXZPVNHXXpz1vd9.png" /></figure><p>Attention, nous utilisons ici une multiplication <strong>scalaire</strong> (élément par élément) entre les deux matrices.</p><h4>Coder la couche d’activation</h4><p>Le code pour la couche d’activation est aussi très simple.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b85efa76afbd9c312afa1e44ec204d97/href">https://medium.com/media/b85efa76afbd9c312afa1e44ec204d97/href</a></iframe><p>Vous pouvez également écrire certaines fonctions d’activation et leurs dérivées dans un fichier séparé. Elles seront utilisées plus tard pour créer une couche d’activation.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f15a23c30b9c0dd1f5da7e168dfa4097/href">https://medium.com/media/f15a23c30b9c0dd1f5da7e168dfa4097/href</a></iframe><h3>Fonction de perte</h3><p>Jusqu’à présent, pour une couche donnée, nous supposions que <strong>∂E/∂Y</strong> était donné (par la couche suivante). Mais qu’advient-il de la dernière couche? Comment obtient-elle <strong>∂E/∂Y</strong>? Nous le donnons simplement manuellement, et cela dépend de la façon dont nous définissons l’erreur.</p><p>L’erreur du réseau, qui mesure le degré de performance du modèle pour une entrée donnée, est définie par nous-même. Il existe de nombreuses façons de définir l’erreur, et l’une des plus connues est appelée <strong>MSE — Mean Squared Error.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/363/0*CuLKvZnTvjT1d6KJ.png" /><figcaption>Mean Squared Error</figcaption></figure><p>Où y* et y désignent respectivement la <strong>sortie souhaitée</strong> et la <strong>sortie obtenue</strong>. Vous pouvez penser à la perte comme une dernière couche qui regroupe tous les neurones de sortie et les écrases en un seul neurone. Ce dont nous avons besoin maintenant, comme pour toutes les autres couches, c’est de définir <strong>∂E/∂Y</strong>. Excepté que maintenant, nous avons enfin “atteint” <strong>E</strong>!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/588/0*2ggVYLPNH-vIJir6.png" /></figure><p>Et voilà! Il suffira de donner cette valeur à la <strong>dernière</strong> couche lors de la passe arrière, ce qui lui permettra d’ajuster ses paramètres, puis elle calculera <strong>∂E/∂X </strong>qu’elle passera à la couche d’avant, qui fera le même procédé à son tour, etc.</p><p>Nous pouvons implémenter la fonction de perte dans un fichier séparé. Elle sera utilisée lors de la création du réseau.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/8faa28e1cf1d7b073b7011bf20a4038a/href">https://medium.com/media/8faa28e1cf1d7b073b7011bf20a4038a/href</a></iframe><h3>Classe Network</h3><p>Bientôt fini, tenez bon ! Maintenant que nous avons tous nos bloque de code prêt à l’emploi, nous allons faire une classe appelée Network qui permettra de construire ces fameux réseaux de neurones !</p><p>J’ai commenté presque chaque partie du code, il ne devrait pas être trop compliqué à comprendre si vous avez saisi les étapes précédentes. Néanmoins, laissez un commentaire si vous avez des questions, je répondrai avec plaisir !</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/0edee416f27e38da88987d223f6ad69d/href">https://medium.com/media/0edee416f27e38da88987d223f6ad69d/href</a></iframe><h3>Construire un réseau de neurone</h3><p>Enfin ! Nous pouvons utiliser notre classe pour créer un réseau de neurones avec autant de couches que nous voulons ! Nous allons construire deux réseaux de neurones: un simple <strong>XOR</strong> et un solveur <strong>MNIST</strong>.</p><h4>Résoudre XOR</h4><p>Commencer par un XOR est toujours important car c’est un moyen simple de savoir si le réseau apprend quelque chose.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/e2cd57305cf937f6c415d310afb2228c/href">https://medium.com/media/e2cd57305cf937f6c415d310afb2228c/href</a></iframe><p>Je ne pense pas avoir besoin d’insister sur beaucoup de choses. Faites attention avec les données d’entrainement, vous devriez toujours avoir la dimension de l’échantillon en premier. Par exemple, avec le problème xor, la dimension des données d’entrées devrait être (4,1,2).</p><h4>Résultat</h4><pre><strong>$ python xor.py</strong> <br>epoch 1/1000 error=0.322980<br>epoch 2/1000 error=0.311174<br>epoch 3/1000 error=0.307195<br>...<br>epoch 998/1000 error=0.000243<br>epoch 999/1000 error=0.000242<br>epoch 1000/1000 error=0.000242</pre><pre>[<br>    array([[ 0.00077435]]),<br>    array([[ 0.97760742]]),<br>    array([[ 0.97847793]]),<br>    array([[-0.00131305]])<br>]</pre><p>Clairement ça fonctionne ! Nous pouvons maintenant résoudre quelque chose de plus intéressant : le dataset MNIST.</p><h4>Résoudre MNIST</h4><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/5c680bc3b49d6ae9926284d8e49709c6/href">https://medium.com/media/5c680bc3b49d6ae9926284d8e49709c6/href</a></iframe><h4>Résultat</h4><pre><strong>$</strong> <strong>python example_mnist_fc.py</strong><br>epoch 1/30   error=0.238658<br>epoch 2/30   error=0.093187<br>epoch 3/30   error=0.073039<br>...<br>epoch 28/30   error=0.011636<br>epoch 29/30   error=0.011306<br>epoch 30/30   error=0.010901</pre><pre><strong>predicted values : </strong><br>[<br>    array([[ 0.119,  0.084 , -0.081,  0.084, -0.068, 0.011,  0.057,  <strong>0.976</strong>, -0.042, -0.0462]]),<br>    array([[ 0.071,  0.211,  <strong>0.501</strong> ,  0.058, -0.020, 0.175,  0.057 ,  0.037,  0.020,  0.107]]),<br>    array([[ 1.197e-01,  <strong>8.794e-01</strong>, -4.410e-04, 4.407e-02, -4.213e-02,  5.300e-02, 5.581e-02,  8.255e-02, -1.182e-01, 9.888e-02]])<br>]<br><strong>true values : </strong><br>[[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]<br> [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]<br> [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]</pre><p>Ça fonctionne parfaitement! Incroyable :)</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fgiphy.com%2Fembed%2Fl3q2XhfQ8oCkm1Ts4%2Ftwitter%2Fiframe&amp;display_name=Giphy&amp;url=https%3A%2F%2Fmedia.giphy.com%2Fmedia%2Fl3q2XhfQ8oCkm1Ts4%2Fgiphy.gif&amp;image=https%3A%2F%2Fi.giphy.com%2Fmedia%2Fl3q2XhfQ8oCkm1Ts4%2Fgiphy.gif&amp;key=d04bfffea46d4aeda930ec88cc64b87c&amp;type=text%2Fhtml&amp;schema=giphy" width="435" height="310" frameborder="0" scrolling="no"><a href="https://medium.com/media/71cc1dd5f467e8e68938f7dffa85b7f3/href">https://medium.com/media/71cc1dd5f467e8e68938f7dffa85b7f3/href</a></iframe><h3>GitHub</h3><p>Vous pouvez trouver le code complet utilisé pour cette publication dans le repo GitHub suivant. Il contient également le code pour d’autres couches comme Convolutives ou bien <em>Flatten</em>.</p><p><a href="https://github.com/OmarAflak/Medium-Python-Neural-Network">GitHub - omaraflak/Medium-Python-Neural-Network: This code is part of my post on Medium.</a></p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FpauPCy_s0Ok%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DpauPCy_s0Ok&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FpauPCy_s0Ok%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/7ee24b0358ff07ee6f03e97e6133fe5e/href">https://medium.com/media/7ee24b0358ff07ee6f03e97e6133fe5e/href</a></iframe><h3>Si vous avez aimé cet article — Quelques claps 👏 m’aideraient beaucoup. Peace! 😎</h3><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=613d8e83541" width="1" height="1" alt=""><hr><p><a href="https://medium.com/france-school-of-ai/math%C3%A9matiques-des-r%C3%A9seaux-de-neurones-code-python-613d8e83541">Mathématiques des réseaux de neurones — code Python</a> was originally published in <a href="https://medium.com/france-school-of-ai">France School of AI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A road towards happiness]]></title>
            <link>https://omaraflak.medium.com/https-medium-com-omaraflak-a-road-towards-happiness-a18c21f0a4d5?source=rss-c215fdc67eb------2</link>
            <guid isPermaLink="false">https://medium.com/p/a18c21f0a4d5</guid>
            <category><![CDATA[education]]></category>
            <category><![CDATA[happiness]]></category>
            <category><![CDATA[passion]]></category>
            <category><![CDATA[schools]]></category>
            <category><![CDATA[work]]></category>
            <dc:creator><![CDATA[Omar Aflak]]></dc:creator>
            <pubDate>Tue, 01 Jan 2019 12:26:20 GMT</pubDate>
            <atom:updated>2019-01-01T12:26:20.887Z</atom:updated>
            <content:encoded><![CDATA[<p>If you’re reading this at the time it is published then happy new year to you ! 🎉🥂 I thought I would publish something a bit different from the usual stuff, so let’s talk about happiness :)</p><p>Last year, I stumbled upon an old Instagram story from Will Smith where he talks about happiness. He said that <strong><em>your happiness is your responsibility</em>.</strong> I found this idea interesting and I started thinking about it.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*PeymqeOwt-RBy9yW" /><figcaption>Photo by <a href="https://unsplash.com/@robschreckhise?utm_source=medium&amp;utm_medium=referral">Rob Schreckhise</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><h3>Your Happiness</h3><p>Regardless of the time or place you live in, regardless of your gender, regardless of your age, your religion, and beliefs, no matter who you are, the end goal of every action you ever take is <strong>your happiness</strong>.</p><p>As humans living in the 21st century, we get happy or fulfilled through two major aspects of our lives. Snapchat and Instagram. Just kidding… <strong>Personal life </strong>(family, friends, partners…)<strong><em> </em></strong>and <strong>professional</strong> <strong>life</strong> (work).</p><p>Unfortunately, rare are the persons who are fulfilled by these two aspects simultaneously, and the truth is you really need<strong> both of them</strong> to be happy<em>. </em>Since<em> </em><strong>we spend most of our time at work</strong>, it seems reasonable to assume that we need to be fulfilled by our professional life; having a good personal life is not enough. In this post I’d like to focus on the <em>work </em>aspect.</p><h3><strong>Why some people get fulfilled by their work ?</strong></h3><blockquote>Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. — Steve Jobs</blockquote><p><strong>Passion</strong>, gives us a sense of purpose, a mission, a reason to never settle and to always improve ourselves. Passion unlocks our potential, it gives us strength and makes us proud of who we are becoming, <em>satisfied</em>, because we are actually moving forward.</p><h4>Passion comes within the right environment</h4><p>While some people can have predispositions to be better at something and hence like it, passion can only be brought to life within the <strong>right environment </strong>: what surrounds us in our everyday life. As we grow up, as children or students, our environment is essentially composed from <strong>family and friends</strong>,<strong> </strong>and <strong>schools</strong>. For now let’s focus on family.</p><h4>Why family is so important in the process of finding your passion ?</h4><p>I like to think of it, like Simon Sinek thinks of employees in a company.</p><blockquote>If we trust each other we will turn our backs, we will take risks, we will innovate, we will do things that will change the course of our world. If I don’t trust you, I can’t do that. — Simon Sinek</blockquote><p>In <a href="https://youtu.be/ReRcHdeUG9Y?t=1276">this talk</a>, he is saying that leaders should expand the “<em>circle of trust</em>” of their companies from the high hierarchy employees to the most junior ones, so that everyone feel like they belong to the company and use their time and energy to actually innovate and be productive, not protecting themselves.</p><p><strong>It is essentially the same thing in families.</strong></p><p>I had the chance to grow in a family where <em>for example</em>, of course it doesn’t simply boil down to that, <em>“my money is your money, and your money is my money”. </em>I realized later looking at friends around me that it isn’t always the case (something more like <em>“my money is my money, and your money is your money”</em>).</p><p>While I had the <strong>time </strong>to keep practicing and learning the things I love (programming), some of my friends had to figure out a way to <strong><em>“repay”</em></strong> their parents for money they’ve spent. They had to find <strong>irrelevant</strong> jobs to get money while I was <strong>investing time in my own future, practicing</strong>.</p><p>Today, because of all the time I’ve spent learning and improving myself in the areas I like, I can easily find internships when I need to, and I have a lot of experience in programming. But all this wouldn’t have been possible, if the <strong>environment</strong> I grew up in wasn’t <strong>suited</strong> for that. Of course it is not simply about money, my parents taught me to always be eager to discover new things, and supported me no matter what.</p><h4>A side note about money…</h4><p>Money isn’t important, you can always manage to get some. You spend it today, you’ll make more tomorrow. Whereas if you spend your time, you’ll never get it back; and this time is even more valuable when you’re young, <em>don’t waste it</em>.</p><p>So if you’re a parent and you want the best thing for your kids, teach them to <strong>enjoy learning</strong> and allow them to <strong>use</strong> their time. The more time they have, the more they can discover new fields, the more likely they are to find something<strong> they love</strong>.</p><p><em>Passion comes within the right environment.</em></p><h4>What about schools ?</h4><p>As I said earlier, as we grow up, the two major components of our environment are <strong>family</strong> and <strong>schools</strong>.</p><p>Since, as children (or students) we spend most of our time in schools (just like work when we grow up), we could easily argue that they are <strong>at least</strong> as important as families in the process of <strong>education</strong>. That’s why we refer to schools as education ! Because schools are not simply supposed to give knowledge, <strong>they’re supposed to educate !</strong></p><p>Therefore, and this is actually <strong>why </strong>I’ve been writing this whole post, I would like you to think about that :</p><h3><strong><em>Are schools educating children to be happy ? i.e. to find their passion ? </em></strong>🤔</h3><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a18c21f0a4d5" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>