<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Rishab on Medium]]></title>
        <description><![CDATA[Stories by Rishab on Medium]]></description>
        <link>https://medium.com/@rishab137?source=rss-bd7904d12ec6------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*LI3UxFH1Sqg4m6eZ.jpg</url>
            <title>Stories by Rishab on Medium</title>
            <link>https://medium.com/@rishab137?source=rss-bd7904d12ec6------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Thu, 28 May 2026 11:31:32 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@rishab137/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Docker run vs exec]]></title>
            <link>https://medium.com/@rishab137/docker-run-vs-exec-8f88da152fb5?source=rss-bd7904d12ec6------2</link>
            <guid isPermaLink="false">https://medium.com/p/8f88da152fb5</guid>
            <category><![CDATA[docker]]></category>
            <category><![CDATA[containers]]></category>
            <dc:creator><![CDATA[Rishab]]></dc:creator>
            <pubDate>Mon, 11 Aug 2025 02:42:46 GMT</pubDate>
            <atom:updated>2025-08-11T02:42:46.306Z</atom:updated>
            <content:encoded><![CDATA[<h3>Docker run</h3><p>docker run is a compound command that translates into a sequence of more basic Docker commands:<br>image pull -&gt; container create -&gt; container attach -&gt; network connect -&gt; container start</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/771/1*-NDidsHm4WArm_LeQK-Hhg.png" /></figure><p>The container runtime shim component above acts as a server. It exposes an RPC endpoint (essentially a UNIX socket) that a client can connect to. Once a client is connected, the shim streams the container’s stdout and stderr back to the client’s end of the socket. It can also read from this socket and forward the data to the container’s stdin.</p><h3>Docker exec</h3><p>docker exec executes a command, or starts an interactive shell, inside an already running container.<br>Note: it may resemble attach a bit because an existing running container is involved. However, attach merely relays the stdio streams of the running container to your terminal, whereas exec starts a new process inside the existing container, which you can think of as a temporary container nested in it.<br>That new process shares the properties of the existing container: the same net, pid, mount, etc. namespaces and the same cgroups hierarchy.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/764/1*-juop1wwJVyJabvWfXBohQ.png" /></figure><h3>References</h3><p>Ivan Velichko, Iximiuz Labs</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Java Virtual Threads]]></title>
            <link>https://medium.com/@rishab137/java-virtual-threads-dcdaf312dbe5?source=rss-bd7904d12ec6------2</link>
            <guid isPermaLink="false">https://medium.com/p/dcdaf312dbe5</guid>
            <category><![CDATA[virtual-threads]]></category>
            <category><![CDATA[java]]></category>
            <dc:creator><![CDATA[Rishab]]></dc:creator>
            <pubDate>Sun, 06 Apr 2025 17:17:33 GMT</pubDate>
            <atom:updated>2025-04-06T17:17:33.808Z</atom:updated>
            <content:encoded><![CDATA[<h3>Contents</h3><p><a href="https://docs.google.com/document/d/1bmAyAowFvzxha34yHQ93-nUctrTL7G0Il9HSXdhHNRo/edit?tab=t.0#heading=h.hbevzn8gwk8n">References:</a></p><p><a href="https://docs.google.com/document/d/1bmAyAowFvzxha34yHQ93-nUctrTL7G0Il9HSXdhHNRo/edit?tab=t.0#heading=h.ukza0i9wkugq">All features:</a></p><p><a href="https://docs.google.com/document/d/1bmAyAowFvzxha34yHQ93-nUctrTL7G0Il9HSXdhHNRo/edit?tab=t.0#heading=h.r3jfr7flethk">Virtual threads notes:</a></p><h3>References:</h3><ol><li><a href="https://dicedb.io/">https://dicedb.io/</a></li><li><a href="https://redis.io/">https://redis.io/</a></li></ol><h3>All features:</h3><ol><li>In-memory key-value store with TTL, running as a standalone app</li><li>Interaction with DB callers via HTTP</li><li>Requests are handled by virtual threads instead of one platform thread per request</li><li>Fault tolerance</li><li>Ability to take snapshots (history) and recover from them</li><li>Key-level RBAC, designed to be extensible</li><li>Separate DBs within the store for different users</li></ol><h3>Virtual threads notes:</h3><p>What are virtual threads, and why use them?</p><p><strong>A platform thread is managed and scheduled by the operating system, while a virtual thread is managed and scheduled by the JVM</strong>.</p><p>They are an alternate implementation of the <strong>java.lang.Thread</strong> type, which <strong>stores the stack frames in the heap (garbage-collected memory) instead of the stack</strong>.</p><p>It’s worth mentioning that virtual threads are scheduled cooperatively: a virtual thread releases its carrier thread only when it reaches a blocking operation. As a result, virtual threads will not improve the performance of CPU-intensive applications. 
The JVM already gives us a tool for those tasks: Java parallel streams.</p><p>However, <strong>there are some cases where a blocking operation doesn’t unmount the virtual thread from the carrier thread</strong>, blocking the underlying carrier thread. In such cases, we say the virtual thread is <em>pinned</em> to the carrier thread. It’s not an error but a behavior that limits the application’s scalability. Note that if a carrier thread is pinned, the JVM can add a new platform thread to the carrier pool if the configuration of the carrier pool allows it.<br> Fortunately, there are only two cases in which a virtual thread is pinned to the carrier thread:</p><ul><li>When it executes code inside a <strong>synchronized</strong> block or method (on JDK 21 through 23; JDK 24 removes this case with JEP 491);</li><li>When it calls a native method or a foreign function (i.e., a call to a native library using JNI).</li></ul><p>A virtual thread is composed of two things:</p><p>1. A continuation: an execution unit that can be started, parked (yielded), rescheduled, and then resumed from where it left off, all managed by the JVM instead of the operating system</p><p>2. A scheduler: a ForkJoinPool by default</p><p>Citations:</p><p><a href="https://www.baeldung.com/java-virtual-thread-vs-thread">Difference Between Thread and Virtual Thread in Java | Baeldung</a></p><p><a href="https://rockthejvm.com/articles/the-ultimate-guide-to-java-virtual-threads">The Ultimate Guide to Java Virtual Threads | Rock the JVM</a></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Setting up a data pipeline in Python from a UNIX based OS to an ODBC based DBMS]]></title>
            <link>https://medium.com/@rishab137/setting-up-a-data-pipeline-in-python-from-a-unix-based-os-to-an-odbc-based-dbms-fc5be035678f?source=rss-bd7904d12ec6------2</link>
            <guid isPermaLink="false">https://medium.com/p/fc5be035678f</guid>
            <category><![CDATA[python]]></category>
            <category><![CDATA[odbc]]></category>
            <category><![CDATA[etl]]></category>
            <category><![CDATA[python3]]></category>
            <category><![CDATA[sql]]></category>
            <dc:creator><![CDATA[Rishab]]></dc:creator>
            <pubDate>Sun, 12 Jan 2020 10:30:12 GMT</pubDate>
            <atom:updated>2025-04-06T17:15:34.425Z</atom:updated>
            <content:encoded><![CDATA[<h3>How I set up an ETL pipeline in Python from a UNIX based OS to an ODBC based DBMS</h3><p>Hello, this article describes how I set up an ETL pipeline entirely in <strong>Python 3.6.3</strong> for my data housed on an <strong>RHEL CentOS 6.0</strong> machine, to be sent ultimately to my DB hosted on a <strong>Microsoft SQL Server 11</strong>.</p><p>The two main libraries that we’ll use are:</p><ul><li><a href="https://turbodbc.readthedocs.io/en/latest/"><strong>Turbodbc</strong></a>: Performance-wise, it’s the best way to push data via Python to ODBC based databases (IMO, of course). But you should take my word for it. Also, an important piece of information before we get started with this library: <em>At the end of the day, Turbodbc is just another Python library that you can install via </em><strong><em>pip </em></strong><em>or from its source. Be careful with the requirements of the package, especially if you’re building it on a vanilla development OS (which was my case), as there could be some standard header packages that your OS might be missing.</em><br>In such a scenario, the following commands may prove helpful (yum on RHEL/CentOS; use apt on Debian/Ubuntu):</li></ul><pre>$ sudo yum install gcc<br>$ sudo yum install &lt;any-header-package-devel&gt;</pre><figure><img alt="The requirements for setting up Turbodbc package on your system" src="https://cdn-images-1.medium.com/max/515/1*K9MF4dDY4dYcmm1uzbocZQ.png" /><figcaption>Source: Turbodbc documentation</figcaption></figure><ul><li><a href="https://pandas.pydata.org/"><strong>Pandas</strong></a>: Everybody familiar with data manipulation in Python is already aware of the capabilities of this library. We used it primarily to fit our source data into DataFrames for easy and efficient manipulation. TBH, the fact that Turbodbc can insert a DataFrame’s columns directly as NumPy arrays didn’t really leave us with much doubt. 
You may install it via <strong>pip</strong> or choose to build it from its source.</li></ul><p>Let’s continue with setting the necessary things up on the scripts server (my RHEL CentOS box). Once we are done installing the two aforementioned libraries and their dependencies (mainly those of Turbodbc), we must verify that the installed versions meet the requirements of Turbodbc. To do so, we can execute the following commands:</p><ul><li><strong>gcc</strong>:<br>$ gcc --version</li><li><strong>boost-devel</strong> (libboost-all-dev on Debian/Ubuntu):<br>$ rpm -q boost-devel</li><li><strong>python-devel</strong>: check that the Python headers are present in <em>/usr/include/</em></li><li><strong>unixODBC-devel</strong>:<br>$ odbcinst -j</li></ul><p>Once we are certain that the versions are right and compatible, we proceed to getting the actual ODBC drivers required for the connection to our DB. One can simply refer to <a href="https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?redirectedfrom=MSDN&amp;view=sql-server-ver15">this</a> post by Microsoft to set things up, or read on for my step-by-step guide.</p><p>We start by identifying the drivers that we need. We do so based on the version of our OS, as can be checked on the same page mentioned earlier. In our case, we were building the pipeline on RHEL CentOS 6.0, hence we chose to go with Microsoft ODBC Driver 13.1 for SQL Server.<br>The next step is getting the .rpm files of the drivers to the CentOS machine. For that, we need the URL of the driver, which can be fetched from this <a href="https://docs.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver15">page</a>. Identify the OS you’re setting up the driver for, and you’re taken to the list of drivers, with all their versions, for that specific OS. In our case, it was <a href="https://packages.microsoft.com/rhel/6/prod/">this</a> page. 
From the list of .rpm files, we pick the <em>msodbcsql</em> package with the necessary version. For us, it turned out to be <a href="https://packages.microsoft.com/rhel/6/prod/msodbcsql-13.1.9.2-1.x86_64.rpm">msodbcsql-13.1.9.2-1.x86_64.rpm</a>, and the complete URL becomes:<br><a href="https://packages.microsoft.com/rhel/6/prod/msodbcsql-13.1.9.2-1.x86_64.rpm">https://packages.microsoft.com/rhel/6/prod/msodbcsql-13.1.9.2-1.x86_64.rpm</a></p><p>Now, to get this driver onto the target machine (our CentOS box), we execute the following <a href="https://www.tecmint.com/10-wget-command-examples-in-linux/"><em>wget</em></a> command:</p><pre>$ wget <a href="https://packages.microsoft.com/rhel/6/prod/msodbcsql-13.1.9.2-1.x86_64.rpm">https://packages.microsoft.com/rhel/6/prod/msodbcsql-13.1.9.2-1.x86_64.rpm</a></pre><p>The .rpm file can be downloaded to any directory of your choice. You just need read access to that directory, since the driver is installed from there. Now, in the same directory, we run the following to install the driver:</p><pre>$ sudo yum localinstall msodbcsql-13.1.9.2-1.x86_64.rpm</pre><p>We’ve installed the MSODBC driver! Before moving on, let’s verify the installation:</p><pre>$ ls -l /opt/microsoft/msodbcsql/lib64/<br>total 16364<br>-rwxr-xr-x. 1 root root 16753837 Jan  4  2018 libmsodbcsql-13.1.so.9.2</pre><p>We’ll also check the odbcinst.ini file, as it needs to list the driver we just installed:</p><pre>$ cat /etc/odbcinst.ini<br>[ODBC Driver 13 for SQL Server]<br>Description=Microsoft ODBC Driver 13 for SQL Server<br>Driver=/opt/microsoft/msodbcsql/lib64/libmsodbcsql-13.1.so.9.2</pre><p>Upon getting a similar output, we can confirm that the driver has been installed successfully.</p><p>The next step in the process is to connect to the SQL Server database from our CentOS machine, via Python. 
For that, we wrote a <em>config.json</em> file that has the following parameters:</p><pre>{<br>    &quot;driver&quot;: &quot;ODBC Driver 13 for SQL Server&quot;,<br>    &quot;server&quot;: &quot;123.456.78.90,14001&quot;,<br>    &quot;schema&quot;: &quot;MY_SCHEMA&quot;,<br>    &quot;database&quot;: &quot;MY_DATABASE&quot;,<br>    &quot;data_table&quot;: &quot;MY_TABLE&quot;,<br>    &quot;username&quot;: &quot;MY_USER&quot;,<br>    &quot;password&quot;: &quot;MY_PWD&quot;<br>}</pre><p>A few points about the above file that the reader must be careful with:</p><ul><li><em>driver</em>: the name of the driver that we set up in the previous section, exactly as it appears in odbcinst.ini.</li><li><em>server</em>: the IP of the SQL Server housing the database, followed by a comma and the port (14001 in our case; SQL Server’s default port is 1433)</li><li><em>schema</em>: the schema of the database object</li><li><em>database</em>: the name of the database</li><li><em>data_table</em>: the name of the table</li><li><em>username</em>: as we used SQL Server authentication instead of Windows authentication, we specify the SQL Server login in the configuration file</li><li><em>password</em>: the password of the corresponding SQL Server username (of course, you shouldn’t store your password in plain sight; more on that in a future article)<br><strong>Note: the username and password mentioned in the <em>config.json</em> file must have the necessary permissions on the SQL Server database.</strong></li></ul><p>With the above parameters, we define our connection object as:</p><pre>import json<br>from turbodbc import connect</pre><pre># load the connection parameters<br>with open(&#39;config.json&#39;) as json_config:<br>    config = json.load(json_config)</pre><pre># keyword arguments are forwarded to the ODBC connection string,<br># so the login is passed as uid/pwd<br>connection = connect(driver=config[&#39;driver&#39;],<br>                     server=config[&#39;server&#39;],<br>                     database=config[&#39;database&#39;],<br>                     uid=config[&#39;username&#39;],<br>                     pwd=config[&#39;password&#39;])</pre><p>Now, moving further along in our pipeline, let’s prepare our 
DataFrame. Pandas makes it quite easy for us to bring our data from a bunch of formats into a Pandas DataFrame. I plan to cover that in a separate article in the future. For now, let’s assume that our data resides in a Pandas DataFrame <em>df</em>.</p><p>The next and final step is to write this <em>df</em> to the database. We write a <em>turbo_write</em> function which looks like this:</p><pre>def turbo_write(connection, df, config):<br>    # fully qualified target table: database.schema.table<br>    table = f&quot;{config[&#39;database&#39;]}.{config[&#39;schema&#39;]}.{config[&#39;data_table&#39;]}&quot;<br>    # column list, e.g. (col_a, col_b)<br>    column_string = &#39;(&#39; + &#39;, &#39;.join(df.columns) + &#39;)&#39;<br>    # one &#39;?&#39; placeholder per column, e.g. (?, ?)<br>    values_holder = [&#39;?&#39; for col in df.columns]<br>    value_string = &#39;(&#39; + &#39;, &#39;.join(values_holder) + &#39;)&#39;<br>    sql_query = f&quot;&quot;&quot;<br>    INSERT INTO {table} {column_string}<br>    VALUES {value_string}<br>    &quot;&quot;&quot;<br>    # writing array of values for turbodbc (one NumPy array per column)<br>    df_values = [df[col].values for col in df.columns]<br>    # optional: clears the previous insert<br>    # leave commented out if data is to be appended to an existing table<br>    # with connection.cursor() as cursor:<br>    #     cursor.execute(f&quot;delete from {table}&quot;)<br>    #     connection.commit()<br>    # inserts the data<br>    with connection.cursor() as cursor:<br>        try:<br>            print(sql_query)  # for better understanding<br>            cursor.executemanycolumns(sql_query, df_values)<br>            connection.commit()<br>        except Exception as e:<br>            connection.rollback()<br>            print(&#39;Insert errored out: &#39; + str(e))</pre><pre># Credits to <a href="https://medium.com/@erickfis">https://medium.com/@erickfis</a> for the inspiration</pre><p>Passing the connection object created earlier, 
the DataFrame to be written, and the loaded configuration to <em>turbo_write</em> writes the data to the target table in the database.</p>]]></content:encoded>
        </item>
    </channel>
</rss>