Docker run vs exec

Rishab — Mon, 11 Aug 2025 02:42:46 GMT

Docker run

docker run is a compound command that translates into a sequence of more basic docker commands:
image pull -> container create -> container attach -> network connect -> container start
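As a rough illustration, the sequence above can be spelled out with the explicit docker subcommands. The sketch below only builds the command lines and does not run them. The image, container, and network names are hypothetical, and the list describes the order of operations rather than a script to execute as-is (for instance, the CLI may refuse to attach to a container that has not been started yet).

```python
def docker_run_steps(image, name="demo", network="demo-net"):
    """Return explicit docker commands that roughly mirror `docker run`.

    `name` and `network` are hypothetical placeholders for illustration.
    """
    return [
        ["docker", "image", "pull", image],
        ["docker", "container", "create", "--name", name, image],
        ["docker", "container", "attach", name],
        ["docker", "network", "connect", network, name],
        ["docker", "container", "start", name],
    ]


if __name__ == "__main__":
    # Print one command per line, in the order listed above.
    for step in docker_run_steps("nginx:alpine"):
        print(" ".join(step))
```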

The container runtime shim acts as a server. It exposes an RPC endpoint (basically a UNIX socket) that a client can connect to. Once connected, the shim streams the container’s stdout and stderr back to your end of the socket. It can also read from this socket and forward the data to the container’s stdin.
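To make the relaying idea concrete, here is a toy sketch in Python. It is not how a real shim is implemented: an ordinary child process stands in for the container, socket.socketpair() stands in for the UNIX socket, and the names run_shim and relay_demo are made up for this example.

```python
import socket
import subprocess
import sys
import threading


def run_shim(conn, argv):
    """Toy shim: start argv and relay bytes between conn and the child's stdio."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )

    def forward_stdin():
        # Socket -> child stdin, until the client closes its writing side.
        while True:
            data = conn.recv(4096)
            if not data:
                break
            proc.stdin.write(data)
            proc.stdin.flush()
        proc.stdin.close()

    threading.Thread(target=forward_stdin, daemon=True).start()

    # Child stdout/stderr -> socket, until the child closes its output.
    while True:
        chunk = proc.stdout.read1(4096)
        if not chunk:
            break
        conn.sendall(chunk)

    proc.wait()
    conn.close()


def relay_demo():
    """Send one line through the toy shim and collect what comes back."""
    client, server = socket.socketpair()
    # Stand-in "container": upper-cases whatever arrives on stdin.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    shim = threading.Thread(target=run_shim, args=(server, child))
    shim.start()

    client.sendall(b"hello from the client\n")
    client.shutdown(socket.SHUT_WR)  # signal end of input

    received = b""
    while True:
        chunk = client.recv(4096)
        if not chunk:
            break
        received += chunk
    shim.join()
    client.close()
    return received


if __name__ == "__main__":
    print(relay_demo())
```

The client never touches the child process directly: everything it sends or receives goes through the socket, which is the property the shim description relies on.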

Docker exec

It is used to execute a command or start an interactive shell inside an already running container.
Note: it may resemble attach a bit because an existing running container is involved. However, attach merely relays the stdio streams of the running container to your terminal, while what exec really does is start a new process that behaves like another temporary container inside the existing one.
In doing that, it retains all the properties of the existing container: the same net, pid, mount, etc. namespaces, the same cgroups hierarchy, and so on.

References

Ivan Velichko — Iximiuz Labs

Java Virtual Threads

Rishab — Sun, 06 Apr 2025 17:17:33 GMT

All features:

In-memory kv store with TTL as a standalone app
Interaction with DB callers via HTTP
Requests will be handled by virtual threads instead of one platform thread per request
Fault tolerance
Ability to take snapshots (history) and recover from it
Key level RBAC but should be extensible
Separate DBs within for different users

Virtual threads notes:

Why and what are virtual threads?

A platform thread is managed and scheduled by the operating system, while a virtual thread is managed and scheduled by the JVM.

They are an alternate implementation of the java.lang.Thread type, which stores the stack frames in the heap (garbage-collected memory) instead of the stack.

It’s worth mentioning that cooperative scheduling is helpful when working in a highly collaborative environment. Since a virtual thread releases its carrier thread only when reaching a blocking operation, cooperative scheduling and virtual threads will not improve the performance of CPU-intensive applications. The JVM already gives us a tool for those tasks: Java parallel streams.

However, there are some cases where a blocking operation doesn’t unmount the virtual thread from the carrier thread, which blocks the underlying carrier thread. In such cases, we say the virtual thread is pinned to the carrier thread. It’s not an error but a behavior that limits the application’s scalability. Note that if a carrier thread is pinned, the JVM can add a new platform thread to the carrier pool, provided the configuration of the carrier pool allows it.
Fortunately, there are only two cases in which a virtual thread is pinned to the carrier thread (as of JDK 21; JDK 24 removes the first case through JEP 491):

When it executes code inside a synchronized block or method;
When it calls a native method or a foreign function (i.e., a call to a native library using JNI).

A virtual thread is composed of two things:

1. A continuation: an execution unit that can be started, parked (yielded), rescheduled, and then resumed from exactly where it left off, all while being managed by the JVM instead of relying on the operating system

2. A scheduler: a ForkJoinPool by default
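The split between a continuation and a scheduler can be sketched with a loose analogy in Python, where a generator plays the continuation and a small loop plays the scheduler. This is only an analogy under that assumption, not how the JVM implements virtual threads, and the names task and run_all are invented for the example.

```python
from collections import deque


def task(name, steps, log):
    """A generator standing in for a continuation."""
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield  # "park": hand control back to the scheduler


def run_all(tasks):
    """A tiny round-robin scheduler that resumes parked tasks in turn."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume from where the task left off
            ready.append(current)  # still has work: queue it again
        except StopIteration:
            pass                   # task finished: drop it


if __name__ == "__main__":
    log = []
    run_all([task("a", 2, log), task("b", 1, log)])
    print(log)
```

Each task keeps its own local state (the loop counter) between resumptions, which is the key property of a continuation; the scheduler only decides who runs next.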

Citations:

Difference Between Thread and Virtual Thread in Java | Baeldung

The Ultimate Guide to Java Virtual Threads | Rock the JVM

Setting up a data pipeline in Python from a UNIX based OS to an ODBC based DBMS

Rishab — Sun, 12 Jan 2020 10:30:12 GMT

How I set up an ETL pipeline in Python from a UNIX based OS to an ODBC based DBMS

Hello, this article describes how I set up an ETL pipeline entirely in Python 3.6.3 for data housed on a RHEL CentOS 6.0 machine, to be sent ultimately to a database hosted on Microsoft SQL Server 11.

The two main libraries that we’ll use are:

Turbodbc: Performance-wise, it’s the best way to push data via Python to ODBC based databases (IMO, of course). But you should take my word for it. Also, an important piece of information before we get started with this library: at the end of the day, Turbodbc is just another Python library that you can install via pip or build from source. Be careful with the requirements of the package, especially if you’re building it on a vanilla development OS (which was my case), as there could be some standard header packages that your OS might be missing.
In such a scenario, the following commands may prove helpful:

$ sudo apt install gcc
$ sudo yum install

Source: Turbodbc documentation

Pandas: Everybody familiar with data manipulation in Python is already aware of the capabilities of this library. We used it primarily to fit our source data into DataFrames for easy and efficient manipulation. TBH, the fact that Turbodbc allows writing a DataFrame directly, didn’t really leave us with much doubt. You may install it via pip or choose to build it from its source.

Let’s continue with setting the necessary things up on the scripts server (my RHEL CentOS box). Once we are done installing the 2 aforementioned libraries with their necessary components (mainly for Turbodbc), we must check their installation as well so that they are in line with the requirements of Turbodbc. To do so, we can execute the following commands:

gcc:
$ gcc --version
libboost-all-dev:
$ dpkg -s libboost-dev | grep 'Version'
python-devel: you can check in /usr/include/
unixODBC-dev:
$ odbcinst -j
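The command-line checks above can also be scripted. The following is a minimal sketch that uses only the standard library; tool_version is a hypothetical helper name, and it simply reports the first line a tool prints or None when the tool is not on the PATH. It does not verify that a version is compatible with Turbodbc.

```python
import shutil
import subprocess


def tool_version(command):
    """Return the first line printed by `command`, or None if the tool is missing."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Some tools print their version on stderr, so fall back to it.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    for command in (["gcc", "--version"], ["odbcinst", "-j"]):
        print(command[0], "->", tool_version(command))
```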

Once we are certain that the versions are right and compatible, we proceed to getting the actual ODBC drivers required for the connection to our DB. One can simply refer to this post by Microsoft to set things up, or read further for my step-by-step guide for the same.

We start by identifying the drivers that we need. We do so based on the version of our OS, as can be checked on the same page mentioned earlier. In our case, we were building the pipeline on a RHEL CentOS 6.0 machine, hence we chose to go with Microsoft ODBC Driver 13.1 for SQL Server.
The next step is getting the .rpm files of the drivers to the CentOS machine. For that, we need the URL of the driver, which can be fetched from this page. Identify the OS you’re setting up the driver for and you’re taken to the list of the drivers with all their versions for that specific OS. In our case, it was this page. From the list of .rpm files, we pick the msodbcsql with the necessary version. For us, it turned out to be msodbcsql-13.1.9.2-1.x86_64.rpm, and the complete URL becomes:
https://packages.microsoft.com/rhel/6/prod/msodbcsql-13.1.9.2-1.x86_64.rpm

Now, to get this driver onto the target machine, on our CentOS box, we execute the following wget command:

$ wget https://packages.microsoft.com/rhel/6/prod/msodbcsql-13.1.9.2-1.x86_64.rpm

The .rpm file can be downloaded to any directory of your choice. You just need to have the necessary permissions to view that directory as you’ll need to be able to install the driver. Now, in the same directory, we run the following to install the driver:

$ sudo yum localinstall msodbcsql-13.1.9.2-1.x86_64.rpm

We’ve installed the MSODBC driver! But first, let’s check the installation:

$ ls -l /opt/microsoft/msodbcsql/lib64/
total 16364
-rwxr-xr-x. 1 root root 16753837 Jan  4  2018 libmsodbcsql-13.1.so.9.2

We’ll also check the odbcinst.ini file, as it needs to list the driver we just installed:

$ cat /etc/odbcinst.ini
[ODBC Driver 13 for SQL Server]
Description=Microsoft ODBC Driver 13 for SQL Server
Driver=/opt/microsoft/msodbcsql/lib64/libmsodbcsql-13.1.so.9.2

Upon getting a similar output, we can confirm that the driver has been installed successfully.
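The same check can be done from Python. This is a small sketch that assumes the file follows the simple INI layout shown above; registered_drivers is a hypothetical helper, and a real odbcinst.ini with unusual content may need more lenient parsing.

```python
import configparser


def registered_drivers(ini_text):
    """Map each driver name in odbcinst.ini-style text to its library path."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original key case ("Driver")
    parser.read_string(ini_text)
    return {section: parser[section].get("Driver") for section in parser.sections()}


SAMPLE = """\
[ODBC Driver 13 for SQL Server]
Description=Microsoft ODBC Driver 13 for SQL Server
Driver=/opt/microsoft/msodbcsql/lib64/libmsodbcsql-13.1.so.9.2
"""

if __name__ == "__main__":
    # On the real machine, read /etc/odbcinst.ini instead of SAMPLE.
    print(registered_drivers(SAMPLE))
```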

The next step in the process is to connect to the SQL Server database from our CentOS machine, via Python. For that, we wrote a config.json file that has the following parameters:

{
    "driver": "ODBC Driver 13 for SQL Server",
    "server": "123.456.78.90,14001",
    "schema": "MY_SCHEMA",
    "database": "MY_DATABASE",
    "data_table": "MY_TABLE",
    "username": "MY_USER",
    "password": "MY_PWD"
}

A few points about the above file that the reader must be careful with:

driver: the name of the driver that we set up in the previous section.
server: the IP of the SQL Server housing the database, followed by the port after a comma (14001 in our case; SQL Server’s default port is 1433)
schema: the schema of the database object
database: the name of the database
data_table: the name of the table
username: the SQL Server login; we used SQL Server authentication instead of Windows authentication, so the credentials go in the configuration file
password: the password of the corresponding SQL Server username (of course, you shouldn’t store your password in plain sight; more on that in a future article)
Note: the username and password mentioned on the config.json file must have the necessary permissions to the SQL Server database.
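Since every later step depends on these keys, it can help to fail early when one is missing. Below is a minimal sketch of such a check; load_config and REQUIRED_KEYS are names made up for this example and are not part of the original pipeline.

```python
import json

# Keys described in the list above.
REQUIRED_KEYS = (
    "driver", "server", "schema", "database",
    "data_table", "username", "password",
)


def load_config(text):
    """Parse config.json content and make sure every expected key is present."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config is missing: " + ", ".join(missing))
    return config


if __name__ == "__main__":
    with open("config.json") as json_config:
        print(sorted(load_config(json_config.read())))
```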

With the above parameters, we define our connection object as:

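import json
from turbodbc import connect

# load the connection parameters from config.json
with open('config.json') as json_config:
    config = json.load(json_config)

# keyword arguments are passed to the ODBC driver as connection string
# attributes, so the credentials use the driver's UID and PWD keywords
connection = connect(driver=config['driver'],
                     server=config['server'],
                     database=config['database'],
                     uid=config['username'],
                     pwd=config['password'])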
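For readers who prefer to see exactly what reaches the driver, the same parameters can be assembled into an explicit ODBC connection string. This is a sketch, with build_connection_string as a hypothetical helper; UID and PWD are the credential keywords the SQL Server ODBC driver expects, and the values below are the placeholders from config.json.

```python
def build_connection_string(config):
    """Assemble an ODBC connection string from the config.json parameters."""
    parts = [
        ("DRIVER", "{" + config["driver"] + "}"),  # braces protect the spaces
        ("SERVER", config["server"]),
        ("DATABASE", config["database"]),
        ("UID", config["username"]),
        ("PWD", config["password"]),
    ]
    return ";".join(f"{key}={value}" for key, value in parts)


if __name__ == "__main__":
    example = {
        "driver": "ODBC Driver 13 for SQL Server",
        "server": "123.456.78.90,14001",
        "database": "MY_DATABASE",
        "username": "MY_USER",
        "password": "MY_PWD",
    }
    print(build_connection_string(example))
```

The resulting string can be handed to Turbodbc through the connection_string keyword of connect instead of separate keyword arguments.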

Now, moving further along in our pipeline, let’s prepare our DataFrame. Pandas makes it quite easy to bring data from a range of formats into a Pandas DataFrame. I’ll dedicate a whole other article to that and link it here (in future). For now, let’s assume that our data resides in a Pandas DataFrame df.

The next and final step is to write this df to the database. We write a turbo_write function which looks like this:

def turbo_write(connection, df, config):
    # column list for the INSERT, e.g. (col_a, col_b)
    column_string = '('
    column_string += ', '.join(df.columns)
    column_string += ')'
    # one '?' placeholder per column
    values_holder = ['?' for col in df.columns]
    value_string = '('
    value_string += ', '.join(values_holder)
    value_string += ')'
    sql_query = f"""
    INSERT INTO {config['database']}.{config['schema']}.{config['data_table']} {column_string}
    VALUES {value_string}
    """
    # writing array of values for turbodbc
    df_values = [df[col].values for col in df.columns]
    # cleans the previous head insert
    # can be ignored if data is to be appended to an existing table
    # with connection.cursor() as cursor:
    #     cursor.execute(f"delete from {config['database']}.{config['schema']}.{config['data_table']}")
    #     connection.commit()
    # inserts the data
    with connection.cursor() as cursor:
        try:
            print(sql_query)  # for better understanding
            cursor.executemanycolumns(sql_query, df_values)
            connection.commit()
        except Exception as e:
            connection.rollback()
            print('Insert errored out: ' + str(e))

# Credits to https://medium.com/@erickfis for the inspiration

Passing the connection object created earlier, the DataFrame to be written, and the loaded configuration to turbo_write writes the data to the target table.
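To see what statement turbo_write ends up sending, the query-building part can be pulled out and run on its own, without a database. The sketch below assumes the config keys from config.json; build_insert_query and the column names id and name are hypothetical.

```python
def build_insert_query(columns, config):
    """Build the parameterized INSERT statement for the given column names."""
    column_string = "(" + ", ".join(columns) + ")"
    value_string = "(" + ", ".join("?" for _ in columns) + ")"
    target = f"{config['database']}.{config['schema']}.{config['data_table']}"
    return f"INSERT INTO {target} {column_string} VALUES {value_string}"


if __name__ == "__main__":
    example = {
        "database": "MY_DATABASE",
        "schema": "MY_SCHEMA",
        "data_table": "MY_TABLE",
    }
    print(build_insert_query(["id", "name"], example))
```

Keeping the values as '?' placeholders means the data itself never gets pasted into the SQL text; only the column and table names are interpolated, so those should come from trusted configuration.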
