<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by ServerMO on Medium]]></title>
        <description><![CDATA[Stories by ServerMO on Medium]]></description>
        <link>https://medium.com/@ServerMO?source=rss-4d43a52b335e------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*9aB5YyU2HvhhNciVHu9BAg.png</url>
            <title>Stories by ServerMO on Medium</title>
            <link>https://medium.com/@ServerMO?source=rss-4d43a52b335e------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 23 May 2026 09:11:07 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@ServerMO/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[The Enterprise Guide to Self-Hosting DeepSeek V4 on Bare Metal]]></title>
            <link>https://medium.com/@ServerMO/the-enterprise-guide-to-self-hosting-deepseek-v4-on-bare-metal-6dd3a3a10751?source=rss-4d43a52b335e------2</link>
            <guid isPermaLink="false">https://medium.com/p/6dd3a3a10751</guid>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[devops]]></category>
            <category><![CDATA[cybersecurity]]></category>
            <category><![CDATA[software-engineering]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[ServerMO]]></dc:creator>
            <pubDate>Thu, 21 May 2026 05:21:37 GMT</pubDate>
            <atom:updated>2026-05-21T05:21:37.021Z</atom:updated>
            <content:encoded><![CDATA[<p><strong>Escaping the API tax with precise VRAM math, WekaFS parallel storage, and Kong API security on ServerMO.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5OAfzvLSTLN3w8C_udGjvA.jpeg" /></figure><p>The introduction of the one-million-token context window fundamentally altered artificial intelligence operations. Engineering teams can now inject entire application repositories, database schemas, and massive log clusters directly into a single prompt.</p><p>However, feeding millions of tokens through commercial endpoints generates very large monthly invoices, a cost often called the <strong>API Tax</strong>.</p><p>Processing 50 million tokens daily through commercial APIs generates thousands of dollars in unpredictable monthly costs. By shifting that exact workload to a <strong>ServerMO Bare Metal GPU Server</strong>, your operational costs can be up to five times lower at scale. You pay a flat infrastructure rate rather than a per-token charge that grows with every request, and your data stays on hardware you control.</p><p>Here is the SRE playbook to deploy DeepSeek V4 securely and efficiently.</p><h3>Phase 1: Hardware Sizing and Exact VRAM Math</h3><p>Many outdated deployment guides suggest legacy A100 hardware. This is a sizing mistake: the A100 has no FP8 Tensor Core support, which arrived with the Hopper and Ada Lovelace generations, so it cannot accelerate FP8 weights natively. 
DeepSeek V4 uses a large Mixture-of-Experts (MoE) architecture, so the VRAM budget must cover both the model weights and the KV cache.</p><p>Here is the memory arithmetic for the <strong>DeepSeek V4 Flash</strong> variant:</p><ul><li><strong>FP8 Weights:</strong> 158 GB</li><li><strong>KV Cache (1M tokens, Batch Size 1):</strong> 10 GB</li><li><strong>Total Required VRAM:</strong> 158 GB + 10 GB = 168 GB</li></ul><p>A ServerMO cluster of four NVIDIA L40S GPUs (48 GB each) provides 192 GB, leaving 24 GB of headroom for low-concurrency operations.</p><h3>The Concurrency Trap (OOM Warning)</h3><p>The 10 GB KV Cache calculation is strictly for a batch size of one. If ten concurrent users request a one-million-token context simultaneously, your KV Cache requirement grows to 10 × 10 GB = 100 GB, and the 258 GB total no longer fits in 192 GB. For high-concurrency enterprise workloads, you must scale horizontally across multiple ServerMO bare metal clusters.</p><h3>Phase 2: Parallel Storage Architecture</h3><p>A common and costly mistake is downloading massive AI models onto the local disk of every single GPU node. A standard network file system (NFS) is no better: it becomes a storage bottleneck, and loading 158 GB of weights over it stalls every deployment and restart.</p><p>You must implement a high-performance Parallel File System like <strong>WekaFS</strong> or Lustre. 
These systems use RDMA to keep the CPU largely out of the data path, so every node in your bare metal cluster can load the weights far faster than over NFS.</p><pre># Mount the Weka Parallel File System on every GPU node<br>sudo mkdir -p /mnt/shared_ai_storage<br>sudo mount -t wekafs backend01.internal/ai_models /mnt/shared_ai_storage<br>sudo chown -R $USER:$USER /mnt/shared_ai_storage<br><br># Download the model exactly once to the high-speed volume<br>huggingface-cli download deepseek-ai/DeepSeek-V4-Flash \<br>  --local-dir /mnt/shared_ai_storage/deepseek_v4_flash \<br>  --resume-download</pre><h3>Phase 3: vLLM and Disaggregation Architecture</h3><p>The <strong>vLLM</strong> framework is a widely adopted engine for serving large language models in production. Because DeepSeek relies on a sparse MoE architecture, we must activate both <strong>Tensor Parallelism</strong> (to split individual layers across GPUs) and <strong>Expert Parallelism</strong> (to distribute expert sub-networks efficiently).</p><pre># Launch the inference server reading directly from shared storage<br># FP8 is selected with --quantization; --dtype only accepts auto, half/float16, bfloat16, and float/float32<br># --served-model-name sets the model id that API clients must send<br>python3 -m vllm.entrypoints.openai.api_server \<br>  --model /mnt/shared_ai_storage/deepseek_v4_flash \<br>  --served-model-name deepseek_v4_flash \<br>  --tensor-parallel-size 4 \<br>  --enable-expert-parallel \<br>  --quantization fp8 \<br>  --max-model-len 32768 \<br>  --gpu-memory-utilization 0.90 \<br>  --port 8080</pre><p>Note that this launch caps the context at 32,768 tokens. Raise --max-model-len toward the one-million-token budget from Phase 1 only after confirming that the KV cache fits in VRAM.</p><p>When scaling the larger V4 Pro model, tensor parallelism alone is not enough. Production deployments typically add vLLM prefill-decode disaggregation, which separates prompt processing from token generation. ServerMO supports this with 400G InfiniBand and RoCEv2 RDMA networking, which keeps the KV cache transfer between prefill and decode nodes fast.</p><h3>Phase 4: Zero-Trust Security with Kong API Gateway</h3><p>Exposing the raw vLLM process directly to the public internet is a serious security risk. 
You must deploy <strong>Kong API Gateway</strong> to enforce strict Transport Layer Security (TLS) and JWT bearer token validation. The routes and the JWT plugin themselves are declared in the kong.yml file that the container loads.</p><pre># Deploy the Kong API Gateway enforcing strict TLS certificates<br># certbot keeps symlinks in live/ that point into ../../archive,<br># so mount the whole /etc/letsencrypt tree (read-only) rather than live/ alone<br>sudo docker run -d --name kong_gateway \<br>  --network host \<br>  -e &quot;KONG_DATABASE=off&quot; \<br>  -e &quot;KONG_DECLARATIVE_CONFIG=/kong/kong.yml&quot; \<br>  -e &quot;KONG_PROXY_LISTEN=0.0.0.0:443 ssl&quot; \<br>  -e &quot;KONG_SSL_CERT=/etc/letsencrypt/live/api.yourdomain.com/fullchain.pem&quot; \<br>  -e &quot;KONG_SSL_CERT_KEY=/etc/letsencrypt/live/api.yourdomain.com/privkey.pem&quot; \<br>  -v /etc/kong/kong.yml:/kong/kong.yml:ro \<br>  -v /etc/letsencrypt:/etc/letsencrypt:ro \<br>  kong:latest</pre><h3>The Secure Drop-In Replacement</h3><p>Because the vLLM engine exposes an OpenAI-compatible API, migrating your applications usually requires no code changes beyond configuration. You simply swap the base URL in your client configuration.</p><pre>from openai import OpenAI<br><br># Point the client directly to your secure HTTPS ServerMO gateway<br>client = OpenAI(<br>    base_url=&quot;https://api.yourdomain.com/v1&quot;,<br>    api_key=&quot;YOUR_SECURE_ENTERPRISE_TOKEN&quot;<br>)<br>response = client.chat.completions.create(<br>    # Must match the model name the vLLM server exposes<br>    model=&quot;deepseek_v4_flash&quot;,<br>    messages=[{&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;Analyze our secure architecture.&quot;}]<br>)<br>print(response.choices[0].message.content)</pre><h3>Phase 5: The Bare Metal Advantage</h3><p>Engineering teams frequently attempt to host intensive artificial intelligence workloads on spot instances provided by major cloud vendors. Spot instances can be reclaimed at short notice, terminating your inference pipelines abruptly and breaking your operational SLAs.</p><p>Furthermore, heavily virtualized cloud instances put a hypervisor layer between your workload and the hardware. By deploying directly on <strong>ServerMO</strong>, you secure dedicated, unshared access to the underlying silicon. 
On bare metal, your PCIe Gen 5 lanes, InfiniBand networks, and NVLink bridges run without a virtualization layer in the path.</p><p>Stop funding the commercial AI API economy. Reclaim your data sovereignty and launch your highly secure private intelligence cluster on dedicated hardware.</p><blockquote><strong><em>Read the full technical documentation and deployment architecture here:</em></strong><em> </em><a href="https://www.servermo.com/howto/self-host-deepseek-v4-bare-metal/"><em>Self-Host DeepSeek V4 on Bare Metal GPUs</em></a></blockquote>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The 10 Best UK Dedicated Server Providers in 2026: A Deep Technical Review]]></title>
            <link>https://medium.com/@ServerMO/the-10-best-uk-dedicated-server-providers-in-2026-a-deep-technical-review-a511ccf72bea?source=rss-4d43a52b335e------2</link>
            <guid isPermaLink="false">https://medium.com/p/a511ccf72bea</guid>
            <category><![CDATA[gpu]]></category>
            <category><![CDATA[cybersecurity]]></category>
            <category><![CDATA[best]]></category>
            <category><![CDATA[servermo]]></category>
            <category><![CDATA[dedicated-server]]></category>
            <dc:creator><![CDATA[ServerMO]]></dc:creator>
            <pubDate>Fri, 08 May 2026 08:26:00 GMT</pubDate>
            <atom:updated>2026-05-08T08:26:00.421Z</atom:updated>
            <content:encoded><![CDATA[<p><strong>By ServerMO Team | Updated: May 2026</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qNu0CSQJRgAJBhbKZqC-dg.jpeg" /></figure><p>In 2026, deploying infrastructure in the United Kingdom requires rigorous technical scrutiny. It is no longer enough to choose a familiar brand name and hope for the best. With <strong>UK GDPR</strong> placing strict conditions on where personal data is stored and transferred, and competitive markets demanding <strong>sub-15ms latency</strong>, your choice of a bare metal provider is a critical business decision.</p><p>Whether you are targeting the London financial hubs or building the next AI-driven enterprise, here is the definitive deep dive into the top 10 UK dedicated server providers for 2026.</p><h3>🏆 1. ServerMO (The Undisputed UK Champion)</h3><p><strong>Best For:</strong> Enterprise databases, high-frequency gaming, and intensive AI rendering.</p><p>ServerMO has fundamentally changed the bare metal landscape in the UK. While legacy providers crowd into single London facilities, ServerMO operates across <strong>10+ distinct regional edge hubs</strong>, including Manchester, Edinburgh, Glasgow, Birmingham, and Slough.</p><p><strong>Technical Highlights:</strong></p><ul><li><strong>The GPU Fleet:</strong> Direct access to NVIDIA L4 24GB Tensor Core GPUs, A100s, and RTX A4000s.</li><li><strong>Networking:</strong> 10Gbps to 100Gbps unmetered lines with premium BGP routing via carriers like NTT, Orange, and BT.</li><li><strong>Reliability:</strong> An aggressive 1-to-4 hour hardware replacement SLA and 99.99% uptime.</li></ul><p><strong>Verdict:</strong> For those needing localized sub-15ms latency across the entire UK, ServerMO is the engineering gold standard.</p><h3>🥈 2. OVHcloud</h3><p><strong>Best For:</strong> Experienced SysAdmins requiring massive scale and deep DDoS scrubbing.</p><p>OVHcloud remains a powerhouse due to its proprietary VAC technology. 
If your infrastructure is a constant target for volumetric attacks, their London-based hardware inventory offers a robust baseline defense.</p><blockquote><strong><em>The Drawback:</em></strong><em> It is strictly “unmanaged.” If your node faces a kernel panic or complex network route issue, you are on your own unless you pay for their top-tier support contracts.</em></blockquote><h3>🥉 3. Hetzner</h3><p><strong>Best For:</strong> Budget-conscious developers and sandbox environments.</p><p>Hetzner offers incredible “compute-per-dollar” using consumer-grade AMD Ryzen processors.</p><blockquote><strong><em>The Drawback:</em></strong><em> </em><strong><em>The Geographical Flaw.</em></strong><em> Hetzner does not operate primary bare metal data centers in the UK. Hosting here means your data lives in Germany or Finland, which complicates UK GDPR compliance and adds unnecessary cross-border latency.</em></blockquote><h3>4. AWS (London Region)</h3><p><strong>Best For:</strong> Corporations fully integrated into the Amazon ecosystem needing serverless and managed tools.</p><blockquote><strong><em>The Drawback:</em></strong><em> </em><strong><em>The “Egress Tax.”</em></strong><em> For bandwidth-intensive apps, AWS is a financial catastrophe. They charge heavily for every gigabyte leaving their network. If you are running high-traffic e-commerce or video streaming, bare metal will save you up to 80% on monthly costs.</em></blockquote><h3>5. Liquid Web</h3><p><strong>Best For:</strong> Mission-critical organizations requiring “white-glove” managed support.</p><blockquote><strong><em>The Drawback:</em></strong><em> Elite service comes with elite pricing. You are often paying a massive premium for support staff rather than cutting-edge hardware.</em></blockquote><h3>6. IONOS</h3><p><strong>Best For:</strong> Small local businesses looking for entry-level UK nodes.</p><blockquote><strong><em>The Drawback:</em></strong><em> Highly restrictive. 
You won’t find custom BGP sessions or high-capacity NVMe arrays here. It’s a “closed appliance” approach to bare metal.</em></blockquote><h3>7. Fasthosts</h3><p><strong>Best For:</strong> Users who prefer a legacy British brand for traditional web hosting.</p><blockquote><strong><em>The Drawback:</em></strong><em> Hardware generations often lag behind. Finding the latest PCIe Gen 5 NVMe or DDR5 RAM architectures can be difficult compared to more aggressive competitors.</em></blockquote><h3>8. Cherry Servers</h3><p><strong>Best For:</strong> DevOps teams treating bare metal like scalable cloud instances via REST APIs.</p><blockquote><strong><em>The Drawback:</em></strong><em> Limited regional UK edge locations. You are mostly restricted to major international hubs, losing that “local edge” advantage in regions like Scotland or Wales.</em></blockquote><h3>9. Leaseweb</h3><p><strong>Best For:</strong> Multi-national corporations looking for long-term hardware leases and premium global transit.</p><blockquote><strong><em>The Drawback:</em></strong><em> Bureaucratic. Their model targets massive enterprise contracts, making it less accessible for agile startups needing to spin up or down quickly.</em></blockquote><h3>10. Redstation (Cogent)</h3><p><strong>Best For:</strong> Wholesale unmetered fiber lines and massive transit capacity.</p><blockquote><strong><em>The Drawback:</em></strong><em> Post-acquisition, they have shifted toward wholesale transit rather than agile server deployments. Their interface and support structure feel outdated for modern DevOps workflows.</em></blockquote><h3>Summary: The Technical “Sweet Spot”</h3><p>If you have an infinite budget and need cloud-native tools, <strong>AWS</strong> is your home. If you are on a shoestring budget and don’t care about data location, <strong>Hetzner</strong> wins.</p><p>However, for the <strong>British Market</strong>, <strong>ServerMO</strong> is the undisputed winner. 
They provide the perfect intersection of regional edge hubs, 100Gbps unmetered bandwidth, and mission-critical AI hardware.</p><p><strong>Originally Published on ServerMO:</strong> 🔗 <a href="https://www.servermo.com/blogs/best-uk-dedicated-server-providers-2026/">https://www.servermo.com/blogs/best-uk-dedicated-server-providers-2026/</a></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Build a Production-Grade Live Streaming Origin Server]]></title>
            <link>https://medium.com/@ServerMO/build-a-production-grade-live-streaming-origin-server-c68888e08fff?source=rss-4d43a52b335e------2</link>
            <guid isPermaLink="false">https://medium.com/p/c68888e08fff</guid>
            <category><![CDATA[streaming-video]]></category>
            <category><![CDATA[nginx]]></category>
            <category><![CDATA[devops]]></category>
            <category><![CDATA[system-architecture]]></category>
            <category><![CDATA[software-engineering]]></category>
            <dc:creator><![CDATA[ServerMO]]></dc:creator>
            <pubDate>Fri, 01 May 2026 04:49:15 GMT</pubDate>
            <atom:updated>2026-05-01T04:49:15.259Z</atom:updated>
            <content:encoded><![CDATA[<p>Escape the myths. Deploy a brutally honest self-hosted streaming engine using strict security and optimized GPU transcoding.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZIX6q6VFzqmlFmQiCdZiPA.jpeg" /></figure><h3>Phase 1: The Cloud Tax and Scaling Reality</h3><p>Many generic tutorials claim you can build your own global Twitch clone on a single server. This is a massive engineering exaggeration. A single server, no matter how powerful, will bottleneck on network interface limits long before reaching ten thousand concurrent viewers.</p><p>What you are actually building is a <strong>High-Performance Origin Server</strong>. By deploying on ServerMO Dedicated Bare Metal Servers, you secure unmetered uplink ports, avoiding public cloud egress fees entirely. Your bare metal node will handle the heavy ingest and encoding, while you offload the final viewer delivery to an edge caching layer like Cloudflare.</p><h3>Server Build Blueprint</h3><ul><li><strong>Phase 1:</strong> The Cloud Tax and Scaling Reality</li><li><strong>Phase 2:</strong> Compiling Nginx from Source</li><li><strong>Phase 3:</strong> The Truth About GPU Limits</li><li><strong>Phase 4:</strong> Optimized Filter Complex Transcoding</li><li><strong>Phase 5:</strong> Smart Security and Strict CORS</li><li><strong>Phase 6:</strong> The Low Latency HLS Reality</li></ul><h3>Phase 2: Compiling Nginx from Source</h3><p>Do not trust default packages. While Ubuntu provides Nginx natively, it does not include the RTMP core by default. Even if you install the separate module, it is frequently outdated. 
For true production stability, you must compile Nginx manually from source.</p><pre>sudo apt update<br>sudo apt install -y build-essential libpcre3-dev libssl-dev zlib1g-dev git ffmpeg<br># Download the required source files<br>wget https://nginx.org/download/nginx-1.25.3.tar.gz<br>git clone https://github.com/arut/nginx-rtmp-module.git<br>tar -xzf nginx-1.25.3.tar.gz<br>cd nginx-1.25.3<br># Compile with required secure modules<br># (without --prefix, everything installs under /usr/local/nginx)<br>./configure \<br>  --with-http_ssl_module \<br>  --with-http_v2_module \<br>  --add-module=../nginx-rtmp-module<br>make -j$(nproc)<br>sudo make install<br># Configure essential firewall ports<br>sudo ufw allow 1935/tcp<br>sudo ufw allow 80/tcp<br>sudo ufw allow 443/tcp</pre><h3>Phase 3: The Truth About GPU Limits</h3><p>There is a critical reality regarding hardware encoders. Consumer series cards like the RTX 4090 have a driver-enforced limit allowing only around eight concurrent NVENC sessions. If you ignore this, the driver refuses new encode sessions once the limit is reached, and additional streams fail to start under heavy load.</p><p><strong>The Open Source Patch vs. Enterprise Hardware:</strong> Many developers use the community-built nvidia-patch script to bypass this lock on consumer cards. While highly effective for budget setups, running uncertified driver hacks is extremely risky for compliance. For stable, highly dense transcoding workloads, you must provision a datacenter GPU with NVENC hardware such as the NVIDIA L4, which officially supports many concurrent sessions. Compute-focused parts like the A100 are not an option here because they have no NVENC encoder at all.</p><h3>Phase 4: Optimized Filter Complex Transcoding</h3><p>Common tutorials chain multiple video filters inefficiently, causing massive processor overhead. The correct professional approach uses the ffmpeg filter_complex option. 
This splits the stream directly within the GPU memory, preventing expensive data copying between the central processor and the graphics card.</p><pre>rtmp {<br>    server {<br>        listen 1935;<br>        chunk_size 4096;<br><br>        application live {<br>            live on;<br>            record off;<br><br>            # Enforce authentication script<br>            on_publish http://127.0.0.1:8080/auth;<br><br>            # The strictly optimized NVENC pipeline: decode once, split on the GPU, encode three renditions.<br>            # An nginx directive runs until its semicolon, so no trailing backslashes are used.<br>            # nginx-rtmp expects one video track per stream, so each rendition is pushed under its own name.<br>            exec_push ffmpeg -hwaccel cuda -hwaccel_output_format cuda<br>                -i rtmp://localhost/live/$name<br>                -filter_complex &quot;[0:v]split=3[v1][v2][v3];[v1]scale_cuda=1920:1080[v1out];[v2]scale_cuda=1280:720[v2out];[v3]scale_cuda=854:480[v3out]&quot;<br>                -map &quot;[v1out]&quot; -map 0:a? -c:v h264_nvenc -b:v 5M -preset p5 -c:a aac -f flv rtmp://localhost/hls/$name_1080p<br>                -map &quot;[v2out]&quot; -map 0:a? -c:v h264_nvenc -b:v 3M -preset p5 -c:a aac -f flv rtmp://localhost/hls/$name_720p<br>                -map &quot;[v3out]&quot; -map 0:a? -c:v h264_nvenc -b:v 1M -preset p5 -c:a aac -f flv rtmp://localhost/hls/$name_480p;<br><br>            # Forward the ingest to other platforms simultaneously<br>            push rtmp://live.twitch.tv/app/YOUR_TWITCH_KEY;<br>        }<br><br>        # Assumed packaging application: receives the renditions and writes HLS segments<br>        application hls {<br>            live on;<br>            allow publish 127.0.0.1;<br>            deny publish all;<br>            hls on;<br>            hls_path /var/www/html/hls;<br>            hls_fragment 1s;<br>        }<br>    }<br>}</pre><h3>Phase 5: Smart Security and Strict CORS</h3><p>Many enterprise guides demand complex Redis databases for authentication. This is pure over-engineering for an origin server. The on_publish directive triggers only once when a stream begins. Unless you have thousands of broadcasters connecting at the exact same millisecond, a simple Python script is sufficient and lightweight.</p><p><strong>Security Alert: The Wildcard CORS Flaw</strong> Never use an asterisk (*) for your Access-Control-Allow-Origin header. Doing so allows any website to embed your player and steal your expensive bandwidth. 
Always specify your exact approved domains.</p><pre># Add this server block inside the http { } section of nginx.conf<br># (a source build installs it at /usr/local/nginx/conf/nginx.conf by default)<br>server {<br>    listen 80;<br>    server_name origin.yourdomain.com;<br><br>    location /hls {<br>        types {<br>            application/vnd.apple.mpegurl m3u8;<br>            video/mp2t ts;<br>        }<br>        root /var/www/html;<br>        <br>        add_header Cache-Control no-cache;<br>        <br>        # CORRECT SECURITY: Block stream hijackers<br>        add_header Access-Control-Allow-Origin &quot;https://www.yourdomain.com&quot;;<br>    }<br>}</pre><h3>Phase 6: The Low Latency HLS Reality</h3><p>Standard HTTP Live Streaming introduces massive delays. By tuning our fragments to one second, we get low-latency-tuned HLS, bringing the delay down to around four to eight seconds. This is not Apple’s LL-HLS extension, which Nginx-RTMP does not implement, and it is still not true real-time delivery. If your platform demands sub-second interaction, you must eventually graduate from Nginx-RTMP and implement WebRTC solutions.</p><blockquote><strong><em>Storage Warning: The RAM Disk Reality</em></strong><em> Using tmpfs RAM storage prevents SSD wear and offers incredible read speeds for live segments. However, RAM is volatile. If the server crashes or reboots, every buffered segment is lost. For transient live video, this is a brilliant trade-off, but never use it for permanent Video on Demand (VOD) storage.</em></blockquote><pre># Mount the RAM disk to handle active transient segments<br>sudo mkdir -p /var/www/html/hls<br>sudo mount -t tmpfs -o size=2G tmpfs /var/www/html/hls</pre><p>Start the server with sudo /usr/local/nginx/sbin/nginx, and apply later configuration changes with sudo /usr/local/nginx/sbin/nginx -s reload (a source build does not install a systemd unit). Your robust origin node is now fully operational and ready to serve your edge networks securely.</p><h3>Streaming Engineering FAQ</h3><p><strong>Can one streaming server handle ten thousand viewers?</strong></p><p>No. A single node cannot handle ten thousand viewers reliably due to bandwidth limits and network stack bottlenecks. You must split your architecture. 
Use the bare metal server as your ingest origin and a CDN like Cloudflare for viewer delivery.</p><p><strong>Why is a wildcard CORS header dangerous for video streaming?</strong></p><p>Using an asterisk for CORS allows any website on the internet to embed and steal your live stream bandwidth. For production security, you must explicitly define only your approved website domains.</p><p><strong>Are there limits to NVIDIA hardware transcoding?</strong></p><p>Consumer GeForce RTX cards have a strict software limit enforced by the driver, allowing only a few concurrent sessions. While open-source patches exist to bypass this, enterprise platforms should deploy datacenter GPUs like the NVIDIA L4 for official support and reliability.</p><p><strong>Does Nginx-RTMP provide true real-time streaming?</strong></p><p>No. Standard HLS has massive latency. Even when tuned for low latency, you will still experience a delay of four to eight seconds. True real-time streaming requires modern protocols like WebRTC.</p><p><strong><em>Read the original engineering blueprint on our official blog:</em></strong> <br>🔗 <a href="https://www.servermo.com/howto/build-live-streaming-server-nginx-rtmp/"><strong>Build a Production Grade Live Streaming Origin Server</strong></a></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to Migrate MySQL to ClickHouse with Zero Downtime]]></title>
            <link>https://medium.com/@ServerMO/how-to-migrate-mysql-to-clickhouse-with-zero-downtime-c949c67d4eda?source=rss-4d43a52b335e------2</link>
            <guid isPermaLink="false">https://medium.com/p/c949c67d4eda</guid>
            <category><![CDATA[data-engineering]]></category>
            <category><![CDATA[clickhouse]]></category>
            <category><![CDATA[mysql]]></category>
            <category><![CDATA[software-engineering]]></category>
            <category><![CDATA[data-architecture]]></category>
            <dc:creator><![CDATA[ServerMO]]></dc:creator>
            <pubDate>Thu, 30 Apr 2026 11:13:54 GMT</pubDate>
            <atom:updated>2026-04-30T11:13:54.007Z</atom:updated>
            <content:encoded><![CDATA[<p><strong>MaterializedMySQL is dead. Master the 2026 industry standard CDC pipeline using Debezium and Redpanda on Bare Metal.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*p7bnlN8GJQbynmA0h79l8Q.jpeg" /></figure><p>MySQL is an outstanding transactional database, but it severely struggles with heavy analytical queries. Moving these workloads to ClickHouse is the definitive solution. However, if you read older migration guides from popular database vendors, they will almost universally instruct you to use the MaterializedMySQL engine.</p><p><strong>Do not execute those commands.</strong> The ClickHouse team officially deprecated and removed the MaterializedMySQL engine in version 24.12. It was highly experimental and fundamentally flawed at scale. The true enterprise standard for achieving zero-downtime replication is <strong>Change Data Capture</strong>, commonly referred to as CDC.</p><h3>Migration Blueprint</h3><ul><li><strong>Phase 1:</strong> The MaterializedMySQL Trap</li><li><strong>Phase 2:</strong> Network Latency and SaaS Economics</li><li><strong>Phase 3:</strong> Advanced Schema Mapping and Snapshot</li><li><strong>Phase 4:</strong> The 2026 CDC Streaming Pipeline</li><li><strong>Phase 5:</strong> The Missing Ingestion Layer</li><li><strong>Phase 6:</strong> Tombstones, The FINAL Trap, and Storage Tax</li><li><strong>Phase 7:</strong> Fault Tolerance and Cutover</li></ul><h3>Phase 1: The MaterializedMySQL Trap (Deprecated)</h3><p>As mentioned, relying on the built-in MaterializedMySQL engine is a trap. It failed to handle complex schema migrations and crashed under heavy replication loads. Modern Data Engineering requires a decoupled, resilient pipeline that reads the MySQL Binary Logs (Binlogs) asynchronously. 
This is where CDC steps in.</p><h3>Phase 2: Network Latency and SaaS Economics</h3><p>Many modern tutorials suggest using fully managed SaaS platforms like Confluent Cloud or ClickPipes to handle your CDC streaming. While these tools are convenient, they introduce a massive financial trap.</p><p>When you sync terabytes of operational data across different cloud regions, public providers will charge you astronomical network egress fees. Furthermore, change data capture is highly sensitive to network latency.</p><blockquote><strong><em>The Bare Metal Advantage:</em></strong><em> If your primary MySQL database is located in North America, hosting your open-source Redpanda and ClickHouse architecture on dedicated bare metal servers in the USA ensures sub-millisecond communication. This localized approach eliminates replication lag during peak transactional hours while completely avoiding per-gigabyte cloud billing shocks.</em></blockquote><h3>Phase 3: Advanced Schema Mapping and Snapshot</h3><p>Before activating the live stream, we must copy the historical data. The biggest mistake engineers make here is assuming basic data types map perfectly. In production environments, you must handle null values, financial decimals, and timezones meticulously.</p><p>You must manually create the destination table first, mapping MySQL data types to ClickHouse’s advanced types. 
Once created, use the native mysql() table function to pull the data at maximum speed.</p><pre>-- Creating a production-ready ClickHouse schema<br>CREATE TABLE orders_analytics (<br>    order_id UInt64,<br>    customer_name Nullable(String),          -- Handling MySQL NULLs<br>    amount Decimal(10, 2),                   -- Financial precision<br>    status Enum8(&#39;PENDING&#39; = 1, &#39;PAID&#39; = 2), -- Strict enumerations<br>    created_at DateTime(&#39;UTC&#39;)               -- Timezone awareness<br>) ENGINE = MergeTree()<br>ORDER BY order_id;<br><br>-- Execute the high-speed initial data copy<br>INSERT INTO orders_analytics<br>SELECT * FROM mysql(&#39;10.0.0.5:3306&#39;, &#39;prod_db&#39;, &#39;orders&#39;, &#39;user&#39;, &#39;pass&#39;);</pre><h3>Phase 4: The 2026 CDC Streaming Pipeline</h3><p>To capture live transactions, we use <strong>Debezium</strong> to read the MySQL binary logs. Debezium will push these changes to an event streaming message broker.</p><p><strong>The Kafka vs. Redpanda Reality:</strong> Apache Kafka is the battle-tested enterprise standard with a massive ecosystem. You can absolutely use it. However, running JVMs can be resource-heavy. For bare metal NVMe servers, we often recommend <strong>Redpanda</strong> as a drop-in C++ alternative for simpler operations, zero ZooKeeper dependency, and lower latency. 
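Both work perfectly for this pipeline.</p><pre>// Example Debezium connector configuration (Debezium 2.5+ property names assumed)<br>// Credentials, server id, and the &quot;mysql&quot; topic prefix are placeholders<br>{<br>  &quot;name&quot;: &quot;mysql-clickhouse-connector&quot;,<br>  &quot;config&quot;: {<br>    &quot;connector.class&quot;: &quot;io.debezium.connector.mysql.MySqlConnector&quot;,<br>    &quot;database.hostname&quot;: &quot;10.0.0.5&quot;,<br>    &quot;database.user&quot;: &quot;user&quot;,<br>    &quot;database.password&quot;: &quot;pass&quot;,<br>    &quot;database.server.id&quot;: &quot;184054&quot;,<br>    &quot;topic.prefix&quot;: &quot;mysql&quot;,<br>    &quot;database.include.list&quot;: &quot;prod_db&quot;,<br>    &quot;table.include.list&quot;: &quot;prod_db.orders&quot;,<br>    &quot;decimal.handling.mode&quot;: &quot;string&quot;,<br>    &quot;schema.history.internal.kafka.bootstrap.servers&quot;: &quot;broker_host:9092&quot;,<br>    &quot;schema.history.internal.kafka.topic&quot;: &quot;schema-changes.orders&quot;,<br>    &quot;transforms&quot;: &quot;unwrap&quot;,<br>    &quot;transforms.unwrap.type&quot;: &quot;io.debezium.transforms.ExtractNewRecordState&quot;,<br>    &quot;transforms.unwrap.add.fields&quot;: &quot;op&quot;,<br>    &quot;transforms.unwrap.delete.tombstone.handling.mode&quot;: &quot;rewrite&quot;<br>  }<br>}</pre><h3>Phase 5: The Missing Ingestion Layer</h3><p>Many tutorials skip a critical step: <em>How does data actually flow from the Kafka topic into the ClickHouse storage table?</em> You need an ingestion layer. ClickHouse provides a native Kafka Engine that reads the message stream, and a Materialized View that routes those messages into your final analytical table. The Kafka table expects flat JSON rows, so the connector must flatten the Debezium change envelope with the ExtractNewRecordState transform, and the Kafka Connect JSON converter must run with schemas disabled.</p><pre>-- 1. Create the Kafka Engine Consumer<br>CREATE TABLE orders_kafka_queue (<br>    order_id UInt64,<br>    amount Decimal(10, 2),<br>    status String,<br>    __op String -- Debezium operation code added by the unwrap transform (c, u, d, r)<br>) ENGINE = Kafka()<br>SETTINGS kafka_broker_list = &#39;broker_host:9092&#39;,<br>         kafka_topic_list = &#39;mysql.prod_db.orders&#39;, -- Debezium topic: prefix.database.table<br>         kafka_group_name = &#39;clickhouse_consumer&#39;,<br>         kafka_format = &#39;JSONEachRow&#39;;<br><br>-- 2. Create the deduplicating target table (newest updated_at wins per order_id)<br>CREATE TABLE orders_analytics_final (<br>    order_id UInt64,<br>    amount Decimal(10, 2),<br>    status String,<br>    is_deleted UInt8,<br>    updated_at DateTime<br>) ENGINE = ReplacingMergeTree(updated_at)<br>ORDER BY order_id;<br><br>-- 3. Route data to the final analytical table<br>CREATE MATERIALIZED VIEW orders_mv TO orders_analytics_final AS<br>SELECT order_id,<br>       amount,<br>       status,<br>       if(__op = &#39;d&#39;, 1, 0) AS is_deleted,<br>       now() AS updated_at<br>FROM orders_kafka_queue;</pre><h3>Phase 6: Tombstones, The FINAL Trap, and Storage Tax</h3><p>ClickHouse is built for append-heavy workloads and does not update or delete rows in place the way MySQL does. 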
When Debezium detects a deleted row in MySQL, it emits a delete event followed by a <strong>tombstone record</strong>. To process this, we use the ReplacingMergeTree engine with a deleted flag. However, this introduces two major production challenges:</p><ul><li><strong>The Storage Tax:</strong> The ReplacingMergeTree does not remove superseded rows immediately. Deduplication happens only during background merges, which run at unpredictable times, so old row versions accumulate and amplify storage. To manage this, schedule an OPTIMIZE TABLE orders_analytics_final FINAL command during off-peak night hours to force a cleanup.</li><li><strong>The FINAL Trap:</strong> Many blogs tell you to use the FINAL keyword in your SELECT queries to get the latest row. <strong>Do not do this.</strong> On large tables it causes heavy CPU spikes because it forces ClickHouse to merge all row versions at query time. Instead, use the argMax function to fetch the latest state of each row without that query-time merge.</li></ul><pre>-- The Enterprise way to query updated records without the CPU-crushing FINAL keyword<br>SELECT <br>    order_id, <br>    argMax(amount, updated_at) AS latest_amount, <br>    argMax(status, updated_at) AS latest_status<br>FROM orders_analytics_final<br>GROUP BY order_id<br>HAVING argMax(is_deleted, updated_at) = 0;</pre><h3>Phase 7: Fault Tolerance and Cutover</h3><p>Before routing live traffic, ensure your pipeline is fault-tolerant. Configure a <strong>Dead Letter Queue (DLQ)</strong> inside your Kafka or Redpanda broker to catch schema mismatch errors. Ensure your ClickHouse ReplicatedReplacingMergeTree tables have a replication factor of at least two across different bare metal nodes.</p><p>Once verified, update your application code to route all heavy aggregations, dashboard requests, and report generation queries to ClickHouse. 
Your MySQL database is now relieved of analytical strain, allowing it to focus purely on rapid transactional writes.</p><h3>MySQL Migration FAQ</h3><p><strong>Why is the MaterializedMySQL engine throwing syntax errors?</strong> The MaterializedMySQL engine was highly experimental, and the ClickHouse development team officially deprecated and removed it in version 24.12. You must now use a Change Data Capture (CDC) pipeline like Debezium for replication.</p><p><strong>How does ClickHouse handle MySQL DELETE operations?</strong> ClickHouse is a columnar analytical database that does not delete rows instantly. When Debezium captures a delete operation, it sends a tombstone record. You must route this to a ReplacingMergeTree table and filter out the deleted flag in your queries.</p><p><strong>Should I use the FINAL keyword to query updated rows in ClickHouse?</strong> No. Using the FINAL keyword on large tables causes massive CPU overhead. It is much faster to use aggregate functions like argMax() or filter by a deleted column flag.</p><p><strong>Why is Redpanda recommended over Apache Kafka for bare metal?</strong> Redpanda is a modern C++ drop-in replacement for Apache Kafka. It completely eliminates the heavy Java Virtual Machine (JVM) dependencies and ZooKeeper requirements, making it significantly faster and easier to deploy on bare metal NVMe servers.</p><p><strong><em>Read the original engineering blueprint on our official blog:</em></strong> 🔗 <a href="https://www.servermo.com/howto/migrate-mysql-to-clickhouse/"><strong>https://www.servermo.com/howto/migrate-mysql-to-clickhouse/</strong></a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=c949c67d4eda" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Install and Tune PostgreSQL on Ubuntu 24.04 Bare Metal]]></title>
            <link>https://medium.com/@ServerMO/install-and-tune-postgresql-on-ubuntu-24-04-bare-metal-378771341403?source=rss-4d43a52b335e------2</link>
            <guid isPermaLink="false">https://medium.com/p/378771341403</guid>
            <category><![CDATA[linux]]></category>
            <category><![CDATA[baremetal]]></category>
            <category><![CDATA[postgresql]]></category>
            <category><![CDATA[devops]]></category>
            <category><![CDATA[database]]></category>
            <dc:creator><![CDATA[ServerMO]]></dc:creator>
            <pubDate>Fri, 24 Apr 2026 06:38:40 GMT</pubDate>
            <atom:updated>2026-04-24T06:38:40.611Z</atom:updated>
            <content:encoded><![CDATA[<p><strong>Escape the default 128MB memory trap. Learn the brutal truths about modern RAM tuning, NVMe WAL separation, and disaster recovery on Ubuntu.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0HUUvD45q813JHbJdofvJg.jpeg" /></figure><h3>Executive Summary: Honest Engineering</h3><p>Most online tutorials teach you how to install PostgreSQL, but they leave you with a configuration meant for a Raspberry Pi. If you simply run apt install postgresql on a massive 128GB RAM server, PostgreSQL will default to using a mere <strong>128MB of RAM</strong> for its cache.</p><p>This guide bridges the gap between a basic installation and a <strong>Database Administrator (DBA) reality</strong>, stripping away outdated myths (like blindly allocating 25% RAM or over-relying on RAID 10) to help you build a modern, high-throughput database architecture.</p><h3>Database Blueprint</h3><ul><li><strong>Phase 1:</strong> Enterprise Installation (Ubuntu 24.04)</li><li><strong>Phase 2:</strong> The “25% shared_buffers” Myth</li><li><strong>Phase 3:</strong> NVMe IOPS &amp; WAL Separation</li><li><strong>Phase 4:</strong> Linux OS Huge Pages (With Warnings)</li><li><strong>Phase 5:</strong> Hardening Network Security</li><li><strong>Phase 6:</strong> The Bare Metal Reality (Disaster Recovery)</li><li><strong>Phase 7:</strong> Cloud IOPS vs. Bare Metal Economics</li></ul><h3>Phase 1: Enterprise Installation</h3><p>Operating system repositories often carry outdated versions of PostgreSQL. 
For production workloads, always add the official PostgreSQL Global Development Group (PGDG) repository to install the latest stable version (e.g., PostgreSQL 16 or 17).</p><pre># Import the repository signing key<br>sudo install -d /usr/share/postgresql-common/pgdg<br>sudo curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc --fail https://www.postgresql.org/media/keys/ACCC4CF8.asc<br><br># Add the official repository<br>sudo sh -c &#39;echo &quot;deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main&quot; &gt; /etc/apt/sources.list.d/pgdg.list&#39;<br># Update and install PostgreSQL<br>sudo apt update<br>sudo apt -y install postgresql postgresql-contrib</pre><h3>Phase 2: The “25% shared_buffers” Myth</h3><p>You will often read that you should set shared_buffers to 25% of your total RAM. On a 16GB server, this is great advice. On a modern 256GB Bare Metal server, allocating 64GB to shared_buffers is often a mistake that causes inefficient &quot;double-buffering&quot;.</p><p>Modern DBAs rely heavily on the efficiency of the <strong>Linux Kernel Page Cache</strong>. Open sudo nano /etc/postgresql/16/main/postgresql.conf and tune honestly:</p><ul><li><strong>shared_buffers:</strong> For massive servers (128GB+ RAM), cap this between <strong>16GB to 32GB</strong>. Let the Linux Page Cache handle the rest.</li><li><strong>effective_cache_size:</strong> This does NOT allocate memory; it simply tells the query planner how much memory is available in total (OS Cache + shared_buffers). Set this to <strong>75% of your total RAM</strong>.</li><li><strong>work_mem:</strong> Memory used for complex sorting. Do not set this too high. If you set work_mem = 256MB and have 1,000 active connections, you will instantly consume 256GB of RAM and crash. 
A safe start is <strong>32MB to 64MB</strong>.</li></ul><blockquote><strong><em>Pro-Tip: Connection Pooling (PgBouncer)</em></strong><em> To prevent the </em><em>work_mem OOM (Out-of-Memory) crash mentioned above, never let your application connect directly to PostgreSQL. Always install a lightweight connection pooler like </em><strong><em>PgBouncer</em></strong><em> in front of your database to queue and multiplex connections.</em></blockquote><pre># Always restart the service after modifying postgresql.conf<br>sudo systemctl restart postgresql</pre><h3>Phase 3: NVMe IOPS &amp; WAL Separation</h3><p>PostgreSQL default settings assume you are running on slow, spinning Hard Disk Drives (HDD). When using Enterprise NVMe SSDs, applying old-school RAID 10 logic is often overkill for pure performance, as a single NVMe drive can easily saturate the PCIe bus.</p><p>The true architectural secret to database speed is <strong>physically separating your WAL (Write-Ahead Log)</strong>. Run your main database on one NVMe drive, and point your WAL directory to a completely separate, dedicated NVMe drive. This eliminates disk contention during heavy write operations.</p><pre># In postgresql.conf, apply these modern NVMe optimizations:<br><br># Default is 4.0. Lower to 1.1 to tell the planner random reads are nearly as fast as sequential.<br>random_page_cost = 1.1<br># Increase concurrent I/O requests for enterprise NVMe drives<br>effective_io_concurrency = 200<br># Optimize Write-Ahead Logging (WAL) for high throughput<br>wal_buffers = 16MB<br>checkpoint_timeout = 15min<br>max_wal_size = 4GB</pre><h3>Phase 4: Linux OS Huge Pages (With Warnings)</h3><p>When you configure a large shared_buffers (e.g., 16GB+), the Linux kernel struggles to manage memory in standard 4KB pages. 
By enabling <strong>Huge Pages</strong> (2MB per page), you measurably reduce CPU overhead during memory lookups.</p><p>However, this is not a magic bullet, and it comes with a severe risk:</p><blockquote><strong><em>🚨 CRITICAL STARTUP WARNING:</em></strong><em> In your </em><em>postgresql.conf, </em><em>huge_pages = try is the safe default. If you force it to </em><em>huge_pages = on, and you miscalculate the </em><em>vm.nr_hugepages value in your Linux </em><em>/etc/sysctl.conf, </em><strong><em>PostgreSQL will completely fail to start.</em></strong><em> Ensure you have enough contiguous free memory before enforcing this at the OS level.</em></blockquote><h3>Phase 5: Hardening Network Security</h3><p>Many basic tutorials instruct users to set listen_addresses = &#39;*&#39;. <strong>Do not do this on a public network.</strong> Exposing port 5432 to the entire internet guarantees brute-force attacks.</p><p><strong>Best Practices for Remote Access:</strong></p><ul><li>Bind the listener only to your private VPC IP or a VPN interface: listen_addresses = &#39;10.0.0.5&#39;.</li><li>If you must allow external connections, strictly whitelist the incoming IPs in /etc/postgresql/16/main/pg_hba.conf.</li><li>Always use modern cryptographic hashing for authentication. 
Ensure your pg_hba.conf utilizes scram-sha-256 instead of the outdated md5 or insecure trust methods.</li></ul><pre># Example pg_hba.conf hardened entry:<br># TYPE    DATABASE        USER            ADDRESS                 METHOD<br>host      production_db   app_user        192.168.1.50/32         scram-sha-256</pre><p>After configuring pg_hba.conf, explicitly allow the port through the Uncomplicated Firewall (UFW) only for trusted IP subnets:</p><pre># Allow PostgreSQL port (5432) ONLY from your application server&#39;s IP<br>sudo ufw allow from 192.168.1.50 to any port 5432 proto tcp<br>sudo ufw enable</pre><h3>Phase 6: The Bare Metal Reality (Disaster Recovery)</h3><p>The ultimate trade-off for unthrottled Bare Metal performance is responsibility. Unlike managed DBaaS platforms that offer automated one-click restores, a Bare Metal DBA is solely responsible for disaster recovery. A single accidental DROP TABLE can be fatal without a proper backup strategy.</p><ul><li><strong>Logical Backups:</strong> Use pg_dump for daily snapshots of smaller databases or specific tables.</li><li><strong>Point-in-Time Recovery (PITR):</strong> For enterprise workloads, you must use tools like <strong>pgBackRest</strong> or <strong>WAL-G</strong> to enable continuous WAL archiving. This allows you to restore the database to any exact second before a crash.</li></ul><blockquote><strong><em>🚨 CRITICAL DBA WARNING:</em></strong><em> Never store your database backups on the same NVMe drive as your active database. Always stream your WAL archives and base backups to off-site object storage or a physically distinct secondary server.</em></blockquote><h3>Phase 7: Cloud IOPS vs. Bare Metal Economics</h3><p>A common misconception is that public cloud environments (AWS, GCP, Azure) are inherently slow. That is false. 
Modern clouds <em>can</em> achieve massive IOPS and sustained high-throughput transactions using “Provisioned IOPS” (io2 block express) or Dedicated Hosts.</p><p><strong>The real issue is the astronomical cost.</strong></p><p>To get the equivalent I/O performance of a single local NVMe drive on the cloud, you will pay massive premiums for provisioned storage and face unpredictable network egress fees during global database replication.</p><p>If your application relies on high-speed data ingestion (TimescaleDB), complex JOINs, or heavy AI vector searches (pgvector), you need raw unthrottled infrastructure. When architecting for global user bases, many DBAs strategically deploy their primary write-nodes on <strong>enterprise dedicated servers</strong> to leverage premium Tier-1 network blending for optimal transatlantic routing. With 100% bare metal NVMe power, massive ECC RAM, and unmetered global ports, you receive the raw performance of the cloud’s highest tiers at a fraction of the economic cost.</p><p><strong><em>Read the original engineering blueprint on our official blog:</em></strong> 🔗 <a href="https://www.servermo.com/howto/install-tune-postgresql-server-ubuntu-24-04/">https://www.servermo.com/howto/install-tune-postgresql-server-ubuntu-24-04/</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=378771341403" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Future-Proof Your Infrastructure: Post-Quantum Nginx & Zero Trust]]></title>
            <link>https://medium.com/@ServerMO/future-proof-your-infrastructure-post-quantum-nginx-zero-trust-81846dde261d?source=rss-4d43a52b335e------2</link>
            <guid isPermaLink="false">https://medium.com/p/81846dde261d</guid>
            <category><![CDATA[devops]]></category>
            <category><![CDATA[nginx]]></category>
            <category><![CDATA[cybersecurity]]></category>
            <category><![CDATA[cryptography]]></category>
            <category><![CDATA[zero-trust]]></category>
            <dc:creator><![CDATA[ServerMO]]></dc:creator>
            <pubDate>Fri, 10 Apr 2026 03:52:19 GMT</pubDate>
            <atom:updated>2026-04-10T03:52:19.428Z</atom:updated>
            <content:encoded><![CDATA[<h4>Stop “Harvest Now, Decrypt Later” attacks. Master post-quantum algorithms, close your inbound ports, and secure your enterprise bare metal servers.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*e3geuPs0lhy28jjFO1gwZg.jpeg" /></figure><p>The cybersecurity landscape has fundamentally shifted. Threat actors — particularly at the nation-state level — are actively engaging in <strong>“Harvest Now, Decrypt Later” (HNDL)</strong> attacks. They are silently intercepting and storing your current TLS-encrypted traffic, waiting for the day a Cryptographically Relevant Quantum Computer (CRQC) becomes available to crack it open.</p><p>While this might not be an immediate day-zero threat for a personal blog, it is a critical vulnerability for enterprises handling government contracts, financial data, or long-term Intellectual Property (IP). If you are asking, <em>“how do we encrypt against quantum computing?”</em> the answer is implementing <strong>Post-Quantum Cryptography (PQC)</strong> alongside a true <strong>Zero Trust Network Architecture (ZTNA)</strong>.</p><p>Here is the complete security blueprint to lock down your bare metal infrastructure.</p><h3>Phase 1: Setup Cloudflare Zero Trust Tunnels</h3><p>The traditional method of securing a web server involves opening ports 80 and 443 and hoping your firewall holds up against zero-day exploits. The modern enterprise approach is Zero Trust.</p><p>By using Cloudflare Tunnels (cloudflared), your server establishes an outbound-only connection to the edge. Your server&#39;s public IP remains entirely hidden from the internet.</p><ol><li><strong>Download and install the cloudflared daemon on Ubuntu:</strong></li></ol><pre>curl -L --output cloudflared.deb https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb<br>sudo dpkg -i cloudflared.deb</pre><p><strong>2. 
Authenticate and create the secure tunnel:</strong></p><pre>cloudflared tunnel login<br>cloudflared tunnel create servermo-prod<br># Save the output UUID!</pre><p><strong>3. Create the configuration file:</strong> Instruct Cloudflare to route incoming internet traffic to your local Nginx instance.</p><p>sudo nano ~/.cloudflared/config.yml</p><pre>tunnel: &lt;YOUR-TUNNEL-UUID&gt;<br>credentials-file: /root/.cloudflared/&lt;YOUR-TUNNEL-UUID&gt;.json<br>ingress:<br>  - hostname: secure.yourdomain.com<br>    service: https://localhost:443 # Proxying to HTTPS to enforce Nginx PQC locally<br>    originRequest:<br>      noTLSVerify: true # Skips origin certificate verification; remove once cloudflared trusts the local certificate<br>  - service: http_status:404</pre><p><strong>4. Route the DNS and start the background service:</strong></p><pre>cloudflared tunnel route dns servermo-prod secure.yourdomain.com<br>sudo cloudflared service install<br>sudo systemctl start cloudflared</pre><blockquote><strong>Architect’s Reality Check (The Cloudflare SPOF):</strong> Routing all traffic through cloudflared introduces a Single Point of Failure (SPOF) and absolute Vendor Lock-in. If Cloudflare experiences a global outage, your hidden server becomes unreachable. 
Enterprise deployments must maintain an emergency &quot;Backdoor&quot; VPN (like WireGuard) tied directly to the Bare Metal public IP for Disaster Recovery.</blockquote><h3>Phase 2: Enable Post-Quantum SSL on Nginx</h3><p>We will configure Nginx to use X25519MLKEM768—a hybrid algorithm combining classical Elliptic Curve Diffie-Hellman (X25519) with NIST’s finalized ML-KEM standard.</p><p><em>Note: To enable post-quantum key agreement, your Nginx server must be linked against a PQC-aware cryptographic library (like a modern OpenSSL 3.x release that supports FIPS 203 natively, or via the Open Quantum Safe (OQS) provider).</em></p><p><strong>Edit your Nginx server block:</strong> sudo nano /etc/nginx/conf.d/secure.conf</p><pre>server {<br>    listen 443 ssl http2;<br>    server_name secure.yourdomain.com;<br><br>    ssl_certificate /etc/ssl/certs/yourdomain.crt;<br>    ssl_certificate_key /etc/ssl/private/yourdomain.key;<br><br>    # Strict TLS 1.3 only<br>    ssl_protocols TLSv1.3;<br>    <br>    # Enable Post-Quantum Hybrid Key Exchange (Confidentiality)<br>    ssl_ecdh_curve X25519MLKEM768:X25519:prime256v1;<br><br>    ssl_prefer_server_ciphers on;<br><br>    # Basic Security Headers<br>    add_header Strict-Transport-Security &quot;max-age=31536000; includeSubDomains; preload&quot; always;<br>    add_header X-Content-Type-Options &quot;nosniff&quot; always;<br><br>    location / {<br>        root /var/www/html;<br>        index index.html;<br>    }<br>}</pre><h3>The “Edge” Conflict: Two-Legged TLS</h3><p><strong>Major Reality Check: The Proxy Architectural Flaw</strong></p><p>Many guides fail to mention a critical architectural flaw: When you use Cloudflare Tunnels (or any reverse proxy CDN), your encryption is <strong>two-legged</strong>.</p><ol><li>Client ➔ Cloudflare Edge</li><li>Cloudflare Edge ➔ Your Nginx Origin</li></ol><p>Setting X25519MLKEM768 on your Nginx server <strong>only secures the second leg</strong> (Edge to Origin). 
If you do not explicitly enable Post-Quantum Cryptography in your Cloudflare Dashboard (Edge Certificates settings), the connection between your customer and Cloudflare remains vulnerable to HNDL attacks.</p><h3>Phase 3: Secure SSH &amp; App-Level Zero Trust</h3><p>Network-level Zero Trust (blocking Linux ports) is incomplete. If an attacker breaches the tunnel, they have free rein. To achieve <em>True</em> Zero Trust, you must implement App-Level and Identity-Level verification.</p><ul><li><strong>Private IP Routing:</strong> In your Cloudflare Dashboard ➔ Settings ➔ Network, ensure your Bare Metal’s private IP CIDR (e.g., 10.0.0.0/8) is Included in the Split Tunnels routing profile. Connect via the WARP client to access SSH locally without exposing port 22.</li><li><strong>Software-Level Authentication (App-Level ZT):</strong> Inside your server, do not assume internal traffic is safe. Implement strict JWT (JSON Web Token) validation on your APIs, and consider using a Service Mesh (like Istio or Linkerd) to enforce mTLS between internal microservices.</li></ul><h3>Phase 4: Quantum-Safe Storage (Data-at-Rest)</h3><p>Protecting your data in transit with TLS is useless if an attacker manages to steal a physical NVMe drive, compromise a datacenter, or leak a database snapshot. The “Harvest Now, Decrypt Later” threat applies directly to Data-at-Rest as well.</p><p><strong>AES-256 is the Standard:</strong> You do not need experimental lattice-based cryptography to protect Data-at-Rest. Quantum computers using Grover’s algorithm effectively halve the security strength of symmetric keys. 
Therefore, an AES-128 key offers only 64 bits of quantum security (vulnerable), while an <strong>AES-256</strong> key provides 128 bits of post-quantum security.</p><p><strong>The Fix:</strong> Ensure your infrastructure is provisioned with LUKS (Linux Unified Key Setup) utilizing the aes-xts-plain64 cipher and a strictly enforced 512-bit key size for all block storage partitions and database backups. XTS mode splits the key into two halves, so a 512-bit key is what yields AES-256; a 256-bit XTS key provides only AES-128.</p><h3>The Bare Metal Cryptography Advantage</h3><p>Hybrid key exchanges (combining classical ECC with ML-KEM) introduce significantly larger packet sizes and heavier cryptographic processing overhead.</p><p>While a basic shared VPS can easily handle PQC for a low-traffic blog, enterprise applications processing thousands of concurrent TLS handshakes on a shared cloud hypervisor will experience severe CPU spiking and network latency. The compute tax of quantum-resistant cryptography is very real.</p><p>To execute True Zero Trust protocols, AES-256 block encryption, and post-quantum TLS algorithms at scale, you need the raw, unshared power of a <strong>Dedicated Bare Metal Server</strong>. Backed by high-core count processors and unmetered network pipelines, dedicated infrastructure delivers the exact performance profile required to absorb cryptographic overhead without throttling your users.</p><p>Stop sharing compute. Secure your enterprise.</p><p>🔗 <strong>Deploy High-Compute Bare Metal for your Enterprise:</strong> <a href="https://www.servermo.com/dedicated-server/"><strong>ServerMO Dedicated Servers</strong></a></p><p><em>This article was originally published on the ServerMO Blog. Read the full tutorial and FAQ here:</em><strong> </strong><a href="https://www.servermo.com/howto/post-quantum-zero-trust-nginx-setup/"><strong>https://www.servermo.com/howto/post-quantum-zero-trust-nginx-setup/</strong></a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=81846dde261d" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Bare Metal Kubernetes Blueprint: Deploying Talos Linux & Cilium eBPF]]></title>
            <link>https://medium.com/@ServerMO/the-bare-metal-kubernetes-blueprint-deploying-talos-linux-cilium-ebpf-e3d70e8f20c3?source=rss-4d43a52b335e------2</link>
            <guid isPermaLink="false">https://medium.com/p/e3d70e8f20c3</guid>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[devops]]></category>
            <category><![CDATA[baremetal]]></category>
            <category><![CDATA[linux]]></category>
            <category><![CDATA[platform-engineering]]></category>
            <dc:creator><![CDATA[ServerMO]]></dc:creator>
            <pubDate>Thu, 09 Apr 2026 07:54:32 GMT</pubDate>
            <atom:updated>2026-04-09T07:54:32.940Z</atom:updated>
            <content:encoded><![CDATA[<p><strong>Master production-grade High Availability (HA), etcd quorum failover, and native Layer 2 routing on dedicated hardware.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*pH22N4NR3M0jngQ1TVTevQ.jpeg" /></figure><p>Running Kubernetes in the cloud provides flexibility, but for I/O and network-heavy workloads, hypervisor overhead can severely impact performance. Transitioning to <strong>Bare Metal Kubernetes</strong> offers direct access to PCIe lanes, raw compute, and complete data sovereignty.</p><p>However, there is a catch: installing Kubernetes on general-purpose Linux distributions (like Ubuntu or Debian) requires strict CIS compliance hardening to reduce the attack surface. You spend hundreds of DevOps hours managing SSH keys, applying OS-level patches, and fighting configuration drift.</p><p>Enter <strong>Talos Linux</strong> — the modern datacenter standard for immutable Kubernetes.</p><h3>What is Talos Linux? The Immutable Paradigm</h3><p>A common question among platform engineers is, <em>“What is Talos Linux based on?”</em> While it utilizes the Linux kernel, it is an <strong>immutable, API-driven operating system</strong> designed explicitly for Kubernetes from the ground up. It drastically reduces the OS-level attack surface by eliminating SSH, the shell, and package managers entirely. Every interaction happens via a mutually authenticated gRPC API (talosctl).</p><p><strong>The API Security Reality:</strong> While Talos secures the underlying node, it does <em>not</em> make your cluster invincible. The Kubernetes API remains a massive attack vector. True security still mandates strict RBAC, Pod Security Standards, and intra-cluster mTLS.</p><h3>High Availability Architecture &amp; The etcd Quorum</h3><p>Running a single Control Plane is a lab experiment, not a production setup. The Kubernetes database (etcd) relies on a strict quorum (majority) to function. 
A production-grade cluster requires a minimum of <strong>3 Control Plane nodes</strong>.</p><ul><li><strong>The Quorum Risk:</strong> In a 3-node cluster, the quorum is 2. If one node fails, the cluster survives. If <em>two</em> nodes fail, the cluster is dead. You cannot read or write to the API.</li></ul><h4>Infrastructure &amp; The Layer 2 VIP</h4><p>To expose the API securely, Talos uses a Virtual IP (VIP) backed by gratuitous ARP. <strong>The limitation:</strong> This requires all Control Plane nodes to reside in the exact same Layer 2 subnet.</p><p>Deploying this architecture on dedicated bare-metal servers provides the necessary physical Layer 2 networking capabilities without cloud routing restrictions.</p><ul><li><strong>3x Control Plane Nodes:</strong> (e.g., 10.10.10.11, .12, .13)</li><li><strong>1x Private L2 VIP for API Server:</strong> (e.g., 10.10.10.100)</li></ul><h3>Step 1: OS Installation via IPMI</h3><p>In a true datacenter environment, writing ISOs to physical USB drives is impractical. Bare metal provisioning relies on remote Out-of-Band (OOB) management.</p><ol><li>Download the Talos Linux Metal ISO from the official GitHub releases.</li><li>Log into your server’s <strong>IPMI / iKVM Console</strong>.</li><li>Navigate to Virtual Media, mount the ISO, and power cycle the server.</li><li>The system will boot into Talos Maintenance Mode and await configuration over the network.</li></ol><h3>Step 2: Generating the HA Configuration</h3><p>Generate the foundational machine configuration. Notice that we bind the cluster endpoint to our <strong>Private VIP</strong> (10.10.10.100).</p><pre>talosctl gen config my-ha-cluster https://10.10.10.100:6443<br># Generated files: controlplane.yaml, worker.yaml, talosconfig</pre><h3>Step 3: Layer 2 VIP &amp; VLAN Patching</h3><p>We must configure Talos to announce the Layer 2 VIP across the Control Planes. 
This ensures that if Control Plane 1 dies, the ARP table updates and the VIP seamlessly fails over to Control Plane 2.</p><p>Create patch-cp.yaml. <em>(Note: We also disable the default kube-proxy because we will use Cilium as a full eBPF replacement).</em></p><pre>machine:<br>  network:<br>    interfaces:<br>      - interface: eth1<br>        vip:<br>          ip: 10.10.10.100 # The L2 Shared API Endpoint<br>cluster:<br>  network:<br>    cni:<br>      name: none # We will install Cilium manually<br>  proxy:<br>    disabled: true # Cilium will replace kube-proxy</pre><p>Merge this patch with the base configuration:</p><pre>talosctl machineconfig patch controlplane.yaml --patch @patch-cp.yaml -o cp-patched.yaml</pre><h3>Step 4: Bootstrapping &amp; Backups</h3><p>Apply the patched configuration to all three Control Plane nodes.</p><pre>talosctl apply-config --insecure --nodes 10.10.10.11 --file cp-patched.yaml<br>talosctl apply-config --insecure --nodes 10.10.10.12 --file cp-patched.yaml<br>talosctl apply-config --insecure --nodes 10.10.10.13 --file cp-patched.yaml</pre><p>Once the nodes boot, bootstrap the cluster on <strong>only the first node</strong> to initiate the etcd quorum.</p><pre>talosctl config endpoint 10.10.10.100<br>talosctl config node 10.10.10.11<br>talosctl bootstrap --talosconfig ./talosconfig<br>talosctl kubeconfig ./kubeconfig --talosconfig ./talosconfig<br>export KUBECONFIG=$(pwd)/kubeconfig</pre><blockquote><strong><em>Day-2 Operations (etcd Disaster Recovery):</em></strong><em> Do not wait for a failure. Immediately establish a cron job to backup your cluster state using </em><em>talosctl etcd snapshot db.snapshot and store it externally (e.g., S3 storage).</em></blockquote><h3>Step 5: Cilium CNI (Native L2 Announcements)</h3><p>A common legacy practice was deploying MetalLB alongside your CNI. 
Modern eBPF-based CNIs like <strong>Cilium</strong> now natively support L2 announcements and BGP, making a standalone load-balancer component such as MetalLB largely redundant.</p><p><strong>1. Install Cilium (Replacing Kube-Proxy)</strong></p><pre>helm install cilium cilium/cilium \<br>  --namespace kube-system \<br>  --set ipam.mode=kubernetes \<br>  --set kubeProxyReplacement=true \<br>  --set k8sServiceHost=10.10.10.100 \<br>  --set k8sServicePort=6443 \<br>  --set l2announcements.enabled=true \<br>  --set securityContext.capabilities.ciliumAgent=&quot;{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}&quot; \<br>  --set securityContext.capabilities.cleanCiliumState=&quot;{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}&quot; \<br>  --set cgroup.autoMount.enabled=false \<br>  --set cgroup.hostRoot=/sys/fs/cgroup</pre><p><strong>2. Define the IP Pool</strong> Apply the CiliumLoadBalancerIPPool and CiliumL2AnnouncementPolicy to expose your LoadBalancer type services. <em>(Warning: Replace the RFC-5737 IPs below with your actual assigned Public IP block).</em></p><pre>apiVersion: &quot;cilium.io/v2alpha1&quot;<br>kind: CiliumLoadBalancerIPPool<br>metadata:<br>  name: public-ip-pool<br>spec:<br>  blocks:<br>  - cidr: &quot;198.51.100.8/29&quot; # REPLACE WITH YOUR REAL IPs<br>---<br>apiVersion: &quot;cilium.io/v2alpha1&quot;<br>kind: CiliumL2AnnouncementPolicy<br>metadata:<br>  name: default-l2-policy<br>spec:<br>  interfaces:<br>  - eth0<br>  externalIPs: true<br>  loadBalancerIPs: true</pre><h3>Step 6: The Production Readiness Stack</h3><p>Your bare metal cluster is now online, highly available, and networking natively via eBPF. 
However, a true production environment requires a Day-2 operations stack:</p><ul><li><strong>Ingress Routing:</strong> Deploy the Kubernetes Gateway API (Envoy) or NGINX Ingress Controller for proper HTTP/S traffic routing.</li><li><strong>Certificate Management:</strong> Install cert-manager integrated with Let&#39;s Encrypt for automated TLS renewals.</li><li><strong>Observability:</strong> You are flying blind without metrics. Deploy the Prometheus Operator, Grafana, and Cilium Hubble to monitor cluster health and network flows.</li></ul><h3>Talos Kubernetes &amp; Bare Metal FAQ</h3><p><strong>What is the difference between talosctl and kubectl?</strong> talosctl is the CLI tool used to manage the underlying Talos operating system (e.g., configuring networks, upgrading the OS, fetching syslog). kubectl is the standard Kubernetes CLI used to manage containerized applications and cluster resources (e.g., deploying pods, managing services).</p><p><strong>Why use Talos Linux instead of Ubuntu for Kubernetes?</strong> General-purpose distributions like Ubuntu require extensive CIS hardening, frequent OS-level patching, and SSH key management. Talos Linux eliminates configuration drift and OS-level vulnerabilities by being immutable and strictly API-managed, saving hundreds of hours in DevOps maintenance.</p><p><strong>Do I need a USB drive to install Talos on Bare Metal?</strong> No. In an enterprise datacenter environment, you can mount the Talos ISO remotely using your dedicated server’s IPMI / iKVM console or utilize PXE booting for automated, remote deployments without requiring physical access to the hardware.</p><p><strong>How do bare metal Kubernetes nodes communicate securely?</strong> Kubernetes nodes should never route internal traffic over the public internet. 
Secure bare metal clusters route Control Plane and Worker node traffic exclusively over an isolated Private VLAN (Layer 2), effectively mitigating external network sniffing and DDoS attacks on internal components.</p><p>🔗 <strong>Deploy your next K8s Cluster on High-Performance Infrastructure:<br></strong><a href="https://www.servermo.com/dedicated-servers-usa/"><strong>https://www.servermo.com/dedicated-servers-usa/</strong></a></p><p><em>This article was originally published on the ServerMO Blog. Read the full tutorial and technical blueprint here:</em> <a href="https://www.servermo.com/howto/deploy-talos-linux-kubernetes-bare-metal/"><strong>The Bare Metal Kubernetes Blueprint: Deploy Talos Linux</strong></a></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Thinking Engine: Deploying NVIDIA NIM for Dynamic Quest Generation on Bare Metal]]></title>
            <link>https://medium.com/@ServerMO/the-thinking-engine-deploying-nvidia-nim-for-dynamic-quest-generation-on-bare-metal-107e903456e2?source=rss-4d43a52b335e------2</link>
            <guid isPermaLink="false">https://medium.com/p/107e903456e2</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[dedicated-server]]></category>
            <category><![CDATA[nvidia]]></category>
            <category><![CDATA[gaming]]></category>
            <category><![CDATA[game-development]]></category>
            <dc:creator><![CDATA[ServerMO]]></dc:creator>
            <pubDate>Sat, 28 Mar 2026 06:47:59 GMT</pubDate>
            <atom:updated>2026-03-28T06:47:59.060Z</atom:updated>
            <content:encoded><![CDATA[<p><strong>The Enterprise Blueprint for Real-Time LLM Dialogue and Evolving NPC Logic.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*g3etKZxOlkK3QBH9_mRcEw.jpeg" /></figure><p>While our previous guides focused on the “senses” of an NPC — using NVIDIA ACE for voice and facial animation — NVIDIA NIM (Inference Microservices) provides the actual “brain.”</p><p>We are fully into the era where gamers expect more than three static dialogue choices. They expect a living world that reacts to their moral choices, inventory changes, and previous interactions in real time. But powering this intelligence at scale introduces a massive engineering hurdle: latency.</p><h3>The Problem: The Cloud API Bottleneck</h3><p>The standard approach to Large Language Models (LLMs) in gaming relies on public cloud APIs. You take the player’s inventory, the world state, and the quest history, package it into a massive prompt, and send it over the internet.</p><p>The result? Unpredictable routing delays. In a fast-paced immersive environment, waiting two seconds for an NPC to process a prompt and generate a response completely breaks the illusion. Furthermore, renting tokens from a public API becomes financially disastrous when a game scales to millions of active players.</p><h3>The Solution: Resident VRAM on Dedicated Bare Metal 🏢⚡</h3><p>To achieve the low Time-To-First-Token (TTFT) response times that AAA gaming demands, the model must stay resident in local VRAM. By self-hosting an optimized NIM on ServerMO Bare Metal, you eliminate the queue delays and virtualization tax of shared cloud providers.</p><p>Here is the architectural blueprint for deploying a production-grade logic stack without the lag.</p><h3>1. Hardware Validation for the Blackwell Era</h3><p>NIM containers utilize TensorRT-LLM for deep hardware-level acceleration. 
To unlock the latest optimizations for recent GPU architectures (such as the Blackwell-based RTX 5090 or the Ada-generation L40S), your bare metal server should run a current NVIDIA driver branch (570+) alongside a matching CUDA toolkit. Direct hardware access removes hypervisor overhead.</p><h3>2. Model Selection and the KV Cache Warning</h3><p>For a single-GPU deployment, efficiency is everything. We strongly recommend deploying the highly optimized Llama-3.1-8B-Instruct model using FP8 Quantization.</p><p>Why not a 70B model? Attempting to load a 70B parameter model on a single 24GB or 48GB GPU is a guaranteed recipe for a fatal Out of Memory (OOM) crash. Even at FP8, which stores roughly one byte per parameter, the weights of a 70B model need about 70GB, more than either card holds, so there is no room left for the <strong>KV Cache</strong>. In gaming, the KV Cache is critical: it holds the attention keys and values for every token in the context window, and it grows with the length of an ongoing quest history. An 8B model at FP8 needs only about 8GB for its weights, leaving plenty of VRAM free to remember what the player did ten minutes ago.</p><h3>3. The Logic Stack and Shared Memory</h3><p>When setting up your production container environment for NVIDIA Triton, there is a hidden pitfall that crashes many deployments. The inference server needs a large shared-memory segment (/dev/shm, a RAM-backed tmpfs), and container runtimes such as Docker default to only 64MB. Unlike the KV Cache, which resides in GPU VRAM, this buffer lives in host RAM and is used for Inter-Process Communication (IPC) between the server’s processes. Allocating at least 16GB here helps keep your engine from crashing under heavy concurrent player load.</p><h3>4. 
Token Streaming &amp; Prompt Guardrails</h3><p>When querying your bare metal API from the game engine, two configurations are non-negotiable:</p><ul><li><strong>Token Streaming:</strong> This lets the UI start displaying text as soon as the first tokens arrive, much like a human typing or speaking, rather than waiting for the entire paragraph to generate.</li><li><strong>Prompt Guardrails:</strong> Players will inevitably attempt prompt injection (e.g., trying to convince the NPC to hand over a god-tier weapon for free). You must enforce strict rules and lore boundaries via the system-role instructions before the player’s prompt is ever processed.</li></ul><h3>5. Engine Integration and Scaling</h3><p>Modern engines like Unreal Engine 5 can use native HTTP modules to construct a JSON payload containing the player’s context window. This is sent directly to your Bare Metal NIM endpoint.</p><p>For multiplayer games and MMOs, a single instance will eventually bottleneck as concurrent requests fill up the GPU’s KV Cache. For true enterprise scaling, studios deploy multiple inference replicas across ServerMO Bare Metal clusters, routing traffic through a high-bandwidth internal load balancer.</p><h3>Stop Renting Tokens. Own the Factory.</h3><p>Processing complex AI logic and massive context windows requires unthrottled GPU power. By moving your inference to dedicated infrastructure, you secure your data, eliminate API rate limits, and bring sub-100ms time-to-first-token within reach for your players.</p><p><strong>Read the full step-by-step technical guide:</strong> 🔗 <a href="https://www.servermo.com/howto/deploy-nvidia-nim-dynamic-narrative-bare-metal/"><strong>NVIDIA NIM on Bare Metal: Setup AI Quest Generation</strong></a></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Linux OOM-Killer Protocol: Stop the “Killed” Message in AI Training]]></title>
            <link>https://medium.com/@ServerMO/the-linux-oom-killer-protocol-stop-the-killed-message-in-ai-training-9c8ed7942811?source=rss-4d43a52b335e------2</link>
            <guid isPermaLink="false">https://medium.com/p/9c8ed7942811</guid>
            <category><![CDATA[pytorch]]></category>
            <category><![CDATA[devops]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[linux]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[ServerMO]]></dc:creator>
            <pubDate>Thu, 26 Mar 2026 08:22:12 GMT</pubDate>
            <atom:updated>2026-03-26T08:22:12.685Z</atom:updated>
            <content:encoded><![CDATA[<h3>Master the 2-minute enterprise fix to protect your PyTorch models from silent kernel terminations.</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*chndUhL0l80e-VR-i2hmAg.jpeg" /></figure><p>You’ve spent 12 hours fine-tuning a Large Language Model. You check the terminal in the morning, expecting a finished checkpoint. Instead, you see one devastating word: <strong>Killed</strong>.</p><p>No Python traceback. No error logs. Your process simply vanished.</p><p>Welcome to the <strong>Linux OOM-Killer</strong> (Out-Of-Memory Killer). When free system RAM drops too low, the kernel acts as a sniper, targeting the heaviest process — usually your AI model — to prevent a total system freeze.</p><h3>The Diagnostic Blueprint</h3><p>Before you change your code, confirm the assassination. Interrogate the kernel ring buffer:</p><pre>dmesg -T | grep -i &#39;killed process&#39;</pre><p>If you see an entry like Out of memory: Killed process 12345 (python3), you’ve been hit.</p><h3>Step 1: The Strict Overcommit Shield</h3><p>By default, Linux uses “Heuristic Overcommit” — it lies to applications, promising RAM that doesn’t exist. When PyTorch tries to claim that fake memory, the OOM-Killer strikes.</p><p>To stop this, switch to <strong>Strict Mode</strong>. In Strict Mode, an oversized request fails at allocation time with a visible error instead of the process silently disappearing later:</p><ul><li><strong>Set Strict Overcommit:</strong> vm.overcommit_memory = 2</li><li><strong>Increase the Ratio:</strong> vm.overcommit_ratio = 100 (Crucial! The default is 50, which caps committed memory at swap plus 50% of physical RAM, so allocations fail even while half your RAM is still free).</li></ul><p><strong>The Permanent Fix:</strong></p><pre>echo &quot;vm.overcommit_memory=2&quot; | sudo tee -a /etc/sysctl.conf<br>echo &quot;vm.overcommit_ratio=100&quot; | sudo tee -a /etc/sysctl.conf<br>sudo sysctl -p</pre><h3>Step 2: The Docker OOM Bypass</h3><p>Running in containers? 
You can manually disable the killer for specific AI workloads:</p><pre>docker run --gpus all --memory 64g --oom-kill-disable -d my-ai-model</pre><p><em>Note: This makes your process “immortal.” Only disable the killer together with a memory limit (the 64g above is a placeholder; size it for your host), and be aware that Docker discards the flag on hosts running cgroup v2. Use with caution to avoid locking yourself out of the server if a leak occurs.</em></p><h3>The Bare Metal Edge vs. Cloud VM “Ballooning”</h3><p>Why does this happen more on Cloud VMs? <strong>Memory Ballooning.</strong> Cloud hypervisors dynamically “steal” idle RAM from your VM to give to other tenants. When your PyTorch DataLoader suddenly spikes, the hypervisor can’t return that RAM fast enough, triggering a fatal OOM kill.</p><p><strong>ServerMO Bare Metal</strong> eliminates this. You get 100% dedicated, unshared DDR5 RAM. No ballooning, no oversubscription — just uninterrupted tensor processing.</p><p>📖 <strong>Read the Full Engineering Guide:</strong> 🔗 <a href="https://www.servermo.com/howto/stop-linux-oom-killer-ai-crash/">Stop AI Crashes: The Linux OOM-Killer Protocol</a></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to Install, Set Up, and Configure an FTP Server Using vsftpd on a Linux Server]]></title>
            <link>https://medium.com/@ServerMO/how-to-install-set-up-and-configure-an-ftp-server-using-vsftpd-on-a-linux-server-580089e26716?source=rss-4d43a52b335e------2</link>
            <guid isPermaLink="false">https://medium.com/p/580089e26716</guid>
            <category><![CDATA[security]]></category>
            <category><![CDATA[ftp]]></category>
            <category><![CDATA[how-to]]></category>
            <category><![CDATA[servers]]></category>
            <category><![CDATA[linux]]></category>
            <dc:creator><![CDATA[ServerMO]]></dc:creator>
            <pubDate>Fri, 13 Mar 2026 03:39:33 GMT</pubDate>
            <atom:updated>2026-03-13T03:39:33.416Z</atom:updated>
            <content:encoded><![CDATA[<p>Secure file transfers made easy — a complete step-by-step setup guide. <strong><em>By ServerMO</em></strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*PHIopVmYFID8X32NiizVrg.avif" /><figcaption>a thumbnail of How to Install, Set Up, and Configure an FTP Server Using vsftpd on a Linux Server</figcaption></figure><p>FTP (File Transfer Protocol) remains a powerful, reliable method for moving files across networks. Whether you’re transferring website files, backups, or large media packages, FTP gets the job done — fast and efficiently.</p><p>In this guide, <strong>ServerMO</strong> walks you through how to install, configure, and secure <strong>vsftpd</strong>, one of the most trusted FTP server tools on Linux. Perfect for system administrators, developers, and power users working on <strong>CentOS, RHEL, or Ubuntu</strong>.</p><h3>📦 What Is an FTP Server?</h3><p>An <strong>FTP server</strong> acts like your digital warehouse — a remote system where you can <strong>upload, download, and manage files</strong> with ease. Unlike email or web-based file transfers, FTP is built for <strong>speed and scale</strong>, making it ideal for businesses and sysadmins handling bulk data or site backups.</p><p>With the right configuration, it becomes a <strong>secure, high-performance environment</strong> for team collaboration and data distribution.</p><h3>🔧 Installing vsftpd on Linux</h3><p>vsftpd (Very Secure FTP Daemon) is known for its <strong>simplicity, stability, and speed</strong>. Here’s how to install it:</p><h3>On CentOS / RHEL</h3><p>Open your terminal and run:</p><pre>sudo dnf install vsftpd<br># For older CentOS/RHEL versions:<br>sudo yum install vsftpd</pre><h3>On Ubuntu/Debian</h3><pre>sudo apt update<br>sudo apt install vsftpd</pre><p>Done! 
Now let’s move on to the configuration phase.</p><h3>⚙️ Configuring vsftpd on Linux</h3><p>The configuration file for vsftpd is typically found at the path below on CentOS/RHEL (on Ubuntu/Debian it is /etc/vsftpd.conf):</p><pre>/etc/vsftpd/vsftpd.conf</pre><p>Open it with a text editor (we’ll use nano here):</p><pre>sudo nano /etc/vsftpd/vsftpd.conf</pre><h3>Key Setting: Enable File Uploads</h3><p>To allow users to upload files to the server, make sure this line is set:</p><pre>write_enable=YES</pre><p>This lets users authenticated via /etc/passwd (Linux system users) write to their assigned directories, provided local_enable=YES is also set.</p><h3>▶️ Starting and Enabling the vsftpd Service</h3><p>Once you’ve configured your FTP server, start the service and make it persistent across reboots.</p><pre># Start the FTP service<br>sudo systemctl start vsftpd<br><br># Enable it to run on boot<br>sudo systemctl enable vsftpd</pre><p>You now have a basic FTP server up and running.</p><h3>🔒 Securing Your FTP Server</h3><p>FTP by default is not encrypted, so it’s crucial to lock things down to avoid unauthorized access and data leaks.</p><h3>✅ 1. Configure the Firewall</h3><p>Make sure your firewall allows FTP traffic. On CentOS / RHEL (firewalld):</p><pre>sudo firewall-cmd --permanent --add-port=21/tcp<br>sudo firewall-cmd --permanent --add-port=20/tcp<br>sudo firewall-cmd --reload</pre><p>On Ubuntu:</p><pre>sudo ufw allow 20/tcp<br>sudo ufw allow 21/tcp</pre><p>Passive-mode clients also need the data port range you define with pasv_min_port and pasv_max_port opened.</p><h3>✅ 2. User Authentication</h3><ul><li>Create individual Linux users for FTP access</li><li>Assign them strong passwords</li><li>Use chroot_local_user=YES in the config file to lock them to their home directories (recent vsftpd versions refuse logins into a writable chroot, so either make the chroot directory non-writable or set allow_writeable_chroot=YES)</li></ul><h3>✅ 3. Monitor Logs and Usage</h3><p>vsftpd logs to /var/log/vsftpd.log. 
Regularly check for suspicious login attempts or unauthorized actions.</p><p>Optional (but recommended): Enable FTPS (FTP over SSL/TLS) for encrypted sessions by setting ssl_enable=YES and pointing rsa_cert_file at your certificate.</p><h3>📈 Why FTP Still Matters</h3><p>Setting up an FTP server might seem old school, but it’s still an essential tool for <strong>fast, structured file transfers</strong> across internal or external networks.</p><p>With <strong>vsftpd</strong>, you get:</p><ul><li>Lightweight and secure file transfer</li><li>Easy configuration and maintenance</li><li>Compatibility with most FTP clients</li><li>Support for anonymous and authenticated users</li></ul><p>When secured properly, FTP offers a robust solution for teams and enterprises that need <strong>control, speed, and automation</strong>.</p><h3>💪 Power Your FTP Infrastructure with ServerMO</h3><p>A reliable FTP setup needs a powerful server behind it — and that’s where <strong>ServerMO</strong> delivers.</p><p>We offer <strong>high-performance bare-metal servers</strong> designed for intensive tasks like file transfer, app hosting, and enterprise operations.</p><h3>Why Choose ServerMO?</h3><ul><li>💻 Intel &amp; AMD Enterprise CPUs</li><li>🚀 1Gbps to 100Gbps Dedicated Uplinks</li><li>🛡️ Full DDoS Protection Included</li><li>🌐 Global Data Center Deployment</li><li>🧩 Custom OS Support (Any Linux or Windows Distro)</li><li>🤝 24/7 Expert Support</li></ul><h3>📎 Ready to Set Up Your Own FTP Server?</h3><p>Visit <a href="https://www.servermo.com/"><strong>ServerMO</strong></a> to browse our lineup of dedicated servers — optimized for developers, sysadmins, and businesses that take file security and performance seriously.</p><p>References: <a href="https://www.servermo.com/howto/install-vsftpd-ftp-server-linux-guide/">How to Install, Set Up, and Configure an FTP Server Using vsftpd on a Linux Server</a></p>]]></content:encoded>
        </item>
    </channel>
</rss>