Blog

  • How to monitor cloud costs with kosty: an AWS FinOps utility

Some months ago I found this interesting project on GitHub: https://github.com/kosty-cloud/kosty

Kosty is a cost and security audit tool written in Python that uses the AWS CLI to check the current status of your services.

    I followed the quickstart guide:

python3 -m venv costy

source costy/bin/activate

pip3 install kosty

export AWS_DEFAULT_PROFILE=root

kosty audit --output all

Unfortunately this just outputs a JSON file: there is no GUI to visually check the cost and security audit.

    To get the GUI:

    git clone https://github.com/kosty-cloud/kosty

    cd kosty

    open dashboard/index.html

The GUI starts with an upload button:

    Image

After the upload, summary charts are shown:

    Image

In detail:

    Image

The details of the IAM security issues look something like this:

    Image

Drilling down further, there is this screen:

    Image

That is really effective for operating on and fixing the specific problem.

    Conclusion

Kosty is a very effective tool for periodically checking cost and security issues: not just a FinOps tool but also a full audit tool that should be integrated into other monitoring reports.

A possible use is to run it periodically on a server. By exposing the .json files through a web server, together with the dashboard, it is possible to check the status and act on it.

There is no alert system, everything must be checked directly, but the .json output can be scanned by any tool, so an alert can be set up to notify you daily about what is going on in your AWS services.
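A minimal sketch of such a setup, assuming the report ends up in /opt/kosty/audit-report.json and that findings carry a severity field (both are assumptions, check the actual kosty output):

    # crontab entry: run the audit every night next to the dashboard
    0 2 * * * cd /opt/kosty && kosty audit --output all

    # serve the dashboard and the JSON reports for browsing
    cd /opt/kosty && python3 -m http.server 8080

    # naive daily alert: count high severity findings and mail them
    HIGH=$(jq '[.findings[] | select(.severity == "HIGH")] | length' /opt/kosty/audit-report.json)
    [ "$HIGH" -gt 0 ] && echo "kosty found $HIGH high severity findings" | mail -s "kosty alert" ops@example.com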

If interested, check my services and contact me (WhatsApp preferred).

  • Curiosity killed the cat? Why does the market sadly cry?

    Image

    Let’s Find People Behind The Bushes!

    I would never have written this, I would never have thought this, and probably I am not going to publish it. I hate marketing.

Back in 1895, Gustave Le Bon published “The Crowd: A Study of the Popular Mind“, one of the most influential texts on politics and marketing for the decades to come. A friend of mine suggested reading it a few years ago; it changed my mind about how to read real-world scenarios and how (social) marketing really works.

You can only change yourself; incidentally, this change can influence His World. Or not.

Can one change his marketing? And can that marketing influence his audience? Or would he just be left behind for not following the current culture?

    My World, Your World, The Whole World

In the era of AI, LLMs, and transformers (and ADHD), the concepts of context and attention have become more and more important. Switching context becomes a kind of exercise:

• What if I think the same thing, in a different context?
• What if I behave “the same”, in a different context?
• What would I pay attention to, in a different context?
• What if I pay attention to something more useful, or less useful, in my context and in a different one?
• How am I perceived in my context? In your context? In a global context?
• Who am I? “One, No One and One Hundred Thousand” (as Pirandello put it)

Surviving until death is the first individual goal. Almost every being can achieve that. Happiness is the (second) most important goal.

How to achieve happiness? Look at what others do and imitate them, or just focus on my own will and pursue it by all means?

    Social: -network, -politics, -influence, -science, -whatever. And marketing

A number of social phenomena happen all around me. I can only guess that they happen by some logic until I experiment and see the outcome.

By accident, I published a piece of content on LinkedIn, and it received a lot of reactions. I can analyse what happened, then test again whether this is the right or the wrong stuff. It is an A/B test, a well known technique used in SEO/SEM for testing the performance of landing pages, copy, and other marketing stuff.

I am a technician, not a marketing dude. While designing systems, I keep my attention on specifications and, when needed, on performance. I adopt the very same A/B test approach:

• Does the solution conform to specifications/requirements? Then take it.
• Does it not satisfy specifications/requirements? Try something else.

Why performance matters (in software)

Satisfying requirements is a broad requirement: when a solution does not perform well enough, it does not satisfy the requirements. Examples:

• A webpage or SPA renders all the elements expected by the user, but it takes more than 1 second to render: users lose attention and perceive it as “not really working”.
• Recurring jobs adhere to the specification, but they take 30 minutes to import data from a remote source, so the data are out of sync for longer than expected: it does not really work.
• The solution is formally correct, but hardware/energy resource costs (CPU and I/O) exceed the income from the customers paying for the service: it is not worth offering a service that costs more than it earns.

Most of the time I have to fight against people arguing by the KISS acronym that “it just works” is enough; really it is not enough, and “stupid” is the person who does not get the real meaning of “it works” in their context.

    The scale factor

From the examples I gave, three important factors emerge:

• perception: users decide what to pay for, and they decide it emotionally
• timely consistency: data and information must be current, because the service is serving data and doing stuff over time
• income/cost: if this ratio is less than 1, the time spent is not worth it

An important missing number: the multiplier.

The income/cost ratio must be above 1, and it gets multiplied by the number of uses: the scale factor.

A key factor of a successful business in IT is how many users are paying for the very same service.

This metric applies to software and computer-based services, but also to the car industry and all serially produced products.

    Value added and Karl Marx in the software industry

In IT, people are not involved in scaling; this is done by machines.

I have always loved the Value Added concept; there is a tax based on it, and it is somewhat counterintuitive:

If you add value, you have to pay. Why on Earth would one want to do that?

Adding value is the only means by which a service or a product is crafted. You do not sell goods. You sell your work, and your work is to add value.

It is enough to read two chapters of Karl Marx’s “Capital” to get this concept: resources are free, work makes the value of a good or a service.

Time is limited; the goal of every business is to maximise the income for the time spent.

• Designing/fixing/maintaining a service is time consuming
• Promoting the service is time consuming
• Selling the service is time consuming
• Team leading is time consuming
• But… a running system is a scalable income

A business succeeds if the four time-consuming elements are balanced by the scalable income: the greater the audience, the bigger the income.

    In this picture, the value is added by the machine running on behalf of the customer using it.

So, where did the work == value get lost here?

Customers change, customer needs change, technology changes

These changes require work, and the new work must be:

    • designed/fixed/maintained
    • promoted
    • sold
    • by a team sharing common values and goals

The universal common value is to pursue wealth and not scarcity: if a business fails at this, every metric becomes broken or useless.

My social A/B test (and why the results will destroy the World)

But every piece of content should also give value. My accidental A/B test happened on LinkedIn:

    https://www.linkedin.com/feed/update/urn:li:activity:7400842369140514816

LinkedIn provides insights and metrics; I checked the numbers:

    • 15 thousand impressions in less than one week
    • 56 comments
    • 28 reactions

In absolute terms, these numbers are very low, but for me it is a valuable A/B test: normally I stop at around one hundred impressions and at most 5 comments. This caught my attention, so I tried to analyse what happened:

• The image is absolutely random, but it shows some CPU metrics (graphs attract technicians).
• The copy is loser-style (“sad story?…”) but human: someone gave me support with a comment.
• The message is not self-promoting; at first look it is self-deprecating: people like to feel better than others (or just to promote themselves).
• The copy does not teach anything, it just reports a fact, with a wrong analysis (I did it by accident: it was Sunday, I was bored). Wrong stuff causes reactions: everybody wants to fix something broken, even on a Sunday (but maybe only technicians).
• I got something back. Really, I am not a hardware man: I know something about data centers for problems related to load balancing handled by the Kubernetes orchestrator and VMs, but I am almost ignorant about consumer devices like mini PCs. It sparked some interest in me, so it started a discussion and something to interact with.

In a social network there are different factors:

    • Me and my point of view
    • The topic
    • The sentiment
    • My network
    • Social network algorithm
    • Speed (to trigger reaction/comment)

    But the most important factor is what makes something “human”: The Error.

What has changed in marketing is the use of The Error as a means to sell stuff. This happens mostly in politics, and marketing is doing the same, following the trend. Or the other way around: political communication is following marketing. So this circle is going to increase idiocracy to a level never reached before, and…

    Image

    This will Destroy the Earth

Then comes the machine

…and like in the Terminator saga, it will destroy the Earth. LLMs are funny because they introduce errors here and there, so you feel more clever. But are you measuring the outcome?

    To balance effort vs income, it is important to measure:

    • Software production
    • Marketing performance
    • Selling performance
    • Team building effectiveness
    • Earning ratio

Then you can allocate more resources and focus on one sector or another, in order to increase the scale.

LLMs do not change this.

  • SSL Termination vs SSL Passthrough: balance between performance and easy management

Depending on management cost and user experience requirements, it might be more sensible to configure internal services as HTTPS or as HTTP.

I give an example of two backends exposed as NodePort on Kubernetes, just to keep the proxy concern clearly separated.

    Image

    SSL Termination at the Proxy (HTTP Mode)

    Concept

    • The proxy terminates TLS, handling encryption and certificate validation.
    • Backends receive plain HTTP (or optionally HTTP/2) traffic.
    • The proxy can inspect and modify headers (X-Forwarded-*) and perform routing, load balancing, caching, etc.
    Browser <--HTTPS--> Proxy (SSL Termination) <--HTTP--> Backend

    HAProxy Example (HTTP/2 Termination)

    frontend https_frontend
        bind *:443 ssl crt /etc/haproxy/ssl/wildcard_beerme.pem alpn h2,http/1.1
        mode http
        option httplog
    
        # Match hostnames
        acl host_redmine hdr(host) -i redmine.beerme
        acl host_odoo    hdr(host) -i odoo.beerme
    
        # Route to backend
        use_backend redmine_backend if host_redmine
        use_backend odoo_backend    if host_odoo
    
    backend redmine_backend
        mode http
        balance roundrobin
        option forwardfor
        http-request set-header X-Forwarded-Proto https
        http-request set-header X-Forwarded-Port  443
        server r1 10.4.1.11:30080 check
        server r2 10.4.1.12:30080 check
    
    backend odoo_backend
        mode http
        balance roundrobin
        option forwardfor
        http-request set-header X-Forwarded-Proto https
        http-request set-header X-Forwarded-Port  443
        server o1 10.4.1.11:32036 check
        server o2 10.4.1.12:32036 check

    Notes:

    • alpn h2,http/1.1 enables HTTP/2 between browser and proxy.
    • Backend sees plain HTTP. Browser multiplexing happens between browser and proxy only.
    • The backend receives headers indicating original HTTPS.

    NGINX Example (HTTP/2 Termination)

    server {
        listen 443 ssl http2;
        server_name redmine.beerme;
    
        ssl_certificate     /etc/nginx/ssl/wildcard_beerme.crt;
        ssl_certificate_key /etc/nginx/ssl/wildcard_beerme.key;
    
        location / {
            proxy_pass http://redmine_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-Proto https;
            proxy_set_header X-Forwarded-Port 443;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }
    
    upstream redmine_backend {
        server 10.4.1.11:30080;
        server 10.4.1.12:30080;
    }

    Notes:

    • http2 in listen enables browser-side HTTP/2.
• Backends receive HTTP/1.1 requests: NGINX's proxy module does not speak HTTP/2 to upstreams, so upstream traffic stays HTTP/1.1 (with proxy_http_version 1.1 set as above).
• Headers inform the backend of the original HTTPS protocol.

    SSL Passthrough (TCP/Stream Mode)

    • Proxy does not terminate TLS; it simply forwards TCP based on SNI.
• The browser establishes the TLS session directly with the backend.
    • End-to-end HTTP/2 multiplexing is preserved.

    HAProxy TCP Passthrough Example

    frontend ssl_passthrough
        bind *:443
        mode tcp
        tcp-request inspect-delay 5s
        tcp-request content accept if { req.ssl_hello_type 1 }
    
        acl redmine_sni req.ssl_sni -i redmine.beerme
        acl odoo_sni    req.ssl_sni -i odoo.beerme
    
        use_backend redmine_tcp_backend if redmine_sni
        use_backend odoo_tcp_backend    if odoo_sni
    
    backend redmine_tcp_backend
        mode tcp
        server r1 10.4.1.11:30080 check
        server r2 10.4.1.12:30080 check
    
    backend odoo_tcp_backend
        mode tcp
        server o1 10.4.1.11:32036 check
        server o2 10.4.1.12:32036 check

    NGINX Stream Passthrough Example

    stream {
        map $ssl_preread_server_name $backend_name {
            redmine.beerme redmine_backend;
            odoo.beerme    odoo_backend;
            default        blackhole;
        }
    
        upstream redmine_backend {
            server 10.4.1.11:30080;
            server 10.4.1.12:30080;
        }
    
        upstream odoo_backend {
            server 10.4.1.11:32036;
            server 10.4.1.12:32036;
        }
    
        server {
            listen 443;
            proxy_pass $backend_name;
            ssl_preread on;
        }
    }

    Notes:

    • Backend certificates must match domain names (redmine.beerme, odoo.beerme) or use a wildcard certificate.
    • Browser-side HTTP/2 multiplexing is fully preserved.
    • Proxy cannot inspect HTTP headers.

    Browser Performance

• SSL termination (HTTP mode): HTTP/2 multiplexing only browser↔proxy; a certificate is needed only on the proxy; pros: header manipulation, routing, caching; cons: multiplexing is not end-to-end.
• SSL passthrough: HTTP/2 multiplexing browser↔backend (full); a valid certificate is needed on each backend; pros: full HTTP/2 performance; cons: the backend must handle TLS, and the proxy cannot inspect headers.

(A quick client-side check of the negotiated protocol is sketched after the key takeaway below.)

    Key takeaway:

    • For maximum page serving speed and real end-to-end HTTP/2, passthrough is superior.
    • SSL termination simplifies backend management and centralizes certs, but may limit multiplexing performance.
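A quick client-side check of what actually gets negotiated in each mode (assuming redmine.beerme resolves to the proxy; adjust to your setup):

    # negotiated HTTP version seen by the client (2 for h2, 1.1 otherwise)
    curl -sk -o /dev/null -w '%{http_version}\n' https://redmine.beerme/

    # ALPN protocol selected during the TLS handshake
    openssl s_client -connect redmine.beerme:443 -alpn h2,http/1.1 </dev/null 2>/dev/null | grep ALPN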

SSL Termination at the Proxy, talking HTTP/2 to the backend (Stateless REST)

To take advantage of the stateless nature of REST requests, a better option is to let the proxy reuse backend connections (HTTP/2 where the proxy supports it) for multiple clients. This also reduces the total number of connections kept by the backend server and the total number of TLS handshakes required:

• the proxy keeps a small pool of connections per backend
• the proxy negotiates one TLS session per backend
• the proxy reuses the same backend connection for requests arriving from different clients

NGINX's proxy module cannot speak HTTP/2 to an upstream (proxy_http_version only accepts 1.0 and 1.1), so with NGINX the reuse is obtained with HTTP/1.1 keepalive connections to the backend; HAProxy, by contrast, can negotiate real HTTP/2 to a backend by adding proto h2 (or ssl ... alpn h2 for TLS backends) to the server line. The NGINX configuration for backend connection reuse:

    server {
        listen 443 ssl http2;
        server_name redmine.beerme;

        ssl_certificate     /etc/nginx/ssl/self.crt;
        ssl_certificate_key /etc/nginx/ssl/self.key;

        location / {
            proxy_pass http://backend_redmine;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # connection reuse toward the backend requires HTTP/1.1
            # and an empty Connection header
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }

    upstream backend_redmine {
        server 10.4.1.11:30080;
        server 10.4.1.12:30080;
        # keep idle connections open so they are reused across clients
        keepalive 32;
    }

  • Building Reliable Air-Quality Monitoring with Open Hardware and Open Data

How Community Sensors Taught Me More Than Any Enterprise Project

    For years, I’ve worked on data-driven systems and IoT architectures. But nothing has clarified the importance of clean data pipelines, consistent units, and transparent hardware more than building and deploying my own air-quality monitoring device and contributing that data to sensor.community — one of the largest volunteer-driven air-quality networks in the world.

    This article documents the build, the engineering decisions, the challenges, and what I learned that directly applies to professional IoT systems, industrial telemetry, environmental monitoring, and smart-city architectures.


    Why I Built an Open Air-Quality Sensor

    Commercial air-quality dashboards are often black boxes:

    • unclear calibration
    • opaque algorithms
    • mixed unit systems
    • inconsistent timestamps
    • unpredictable firmware updates

    I wanted the opposite:

    a transparent, measurable, auditable device that exposes exactly how data is collected, processed, and transmitted.

    That’s why I followed the approach described in this build tutorial (same as the one in the linked video) using:

    • NodeMCU / ESP8266 microcontroller
    • SDS011 particulate matter sensor (PM2.5 / PM10)
    • BME280 temperature, humidity, pressure sensor
    • 3D-printed enclosure with active and passive airflow zones

    This combination is extremely popular in the sensor.community network because the hardware is reliable, open, and inexpensive — and because the data can be validated against nearby devices.


    Technical Breakdown of the Hardware

    1. The Microcontroller: ESP8266

    A compact Wi-Fi microcontroller that runs stable firmware and publishes data every 150 seconds.

    It’s efficient, predictable, and ideal for sensor telemetry.

    Key reasons for choosing it:

    • low power consumption
    • stable over weeks of uptime
    • full transparency of firmware
    • broad community support

    2. SDS011 Laser-Based PM Sensor

    A simple, robust unit with active airflow and laser scattering measurement.

    Outputs:

    • PM2.5 (µg/m³)
    • PM10 (µg/m³)

    It communicates via serial protocol with deterministic timing — perfect for calibration and consistent ingestion.

    3. BME280 Environmental Sensor

    This is where some engineering becomes critical.

    It outputs:

    • Temperature (°C)
    • Humidity (%)
    • Pressure (Pa)

    These are natively metric, and this experience strongly reinforced why IoT systems should never default to imperial units: the hardware itself is metric.


    Sensor Enclosure and Data Accuracy

    I used the same dual-chamber airflow design shown in the reference video:

    • one chamber dedicated to the SDS011 with forced airflow
    • one chamber for the BME280, isolated from dust turbulence
    • vertical intake paths to reduce precipitation ingress
    • shaded positioning to avoid radiant heat interference

    This alone significantly improves data quality compared to the “bare PCB + cable” approach seen in cheap commercial monitors.


    Joining the Sensor.Community Network

    Once deployed, the device:

    1. connects to Wi-Fi
    2. publishes data to sensor.community servers
    3. positions itself on the global map
    4. sends anonymized, openly accessible environmental data

    My node became part of a public ecosystem where anyone can compare sensor readings, overlay weather data, or integrate the feed into their own dashboards.
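For example, anyone can pull a node's latest readings from the open data API (the endpoint format and the sensor id below are assumptions; check your sensor's entry on the map):

    # last submitted measurements for one sensor, as raw JSON (12345 is a placeholder id)
    curl -s https://data.sensor.community/airrohr/v1/sensor/12345/ | jq '.[-1].sensordatavalues'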

    This was my first major lesson:

    open IoT is stronger than closed IoT, because cross-validation detects errors, drifts, and anomalies immediately.


    What I Learned (and Why It Matters for Professional Projects)

    1. You can’t build trustworthy analytics without trustworthy sensors.

    Garbage in → garbage out.

    Good engineering starts at the edge, not in the cloud.

    2. Units matter more than people think.

    Every sensor outputs SI units.

    Every analytics pipeline expects SI units.

    Every scientific model requires SI units.

    If any component converts to imperial:

    • rounding errors appear
    • ML training becomes inconsistent
    • dashboards stop aligning
    • thresholds become untrustworthy

    This was one of the motivations behind my broader advocacy for metric-by-default software systems.
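A tiny illustration of the rounding drift, assuming a dashboard that stores temperature in Fahrenheit rounded to one decimal (the numbers are made up):

    awk 'BEGIN {
      c = 22.3                           # original BME280 reading, deg C
      f = c * 9 / 5 + 32                 # 72.14 deg F
      f_stored = int(f * 10 + 0.5) / 10  # the dashboard keeps 72.1
      c_back = (f_stored - 32) * 5 / 9   # 22.2777... deg C on the way back
      printf "original=%.2f C  recovered=%.4f C  drift=%.4f C\n", c, c_back, c - c_back
    }'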

    3. Firmware transparency is not optional.

    With open firmware, I could inspect:

    • sampling intervals
    • debouncing logic
    • calibration curves
    • JSON payload structures

    This level of transparency makes debugging deterministic — essential for industrial IoT.

    4. Community networks beat proprietary cloud platforms in early detection.

    If my device drifted, nearby sensors revealed it instantly.

    If a firmware update broke a reading, the entire network noticed.

    Proprietary platforms rarely provide this level of collective validation.

It also has a nice presentation page:

    Image

    5. Stable telemetry pipelines require discipline.

    From device → local preprocessing → network → ingestion → storage → dashboard

    every step must be clean, predictable, and unit-consistent.

    This is directly applicable to:

    • smart city deployments
    • microclimate monitoring
    • industrial IoT systems
    • energy and HVAC monitoring
    • air quality compliance networks
  • How to deal with change management: plan and actions


One of the most important strategic assets in an IT company is the ability to deal with changes. In the software world everything changes, and it changes quickly: market, customer expectations, third-party software. Being a technician working in IT, I mostly focus on customer expectations (in terms of UI/UX) and software changes (in terms of service…

Pivoting: when a company changes its offer

In my experience, it is common that some service is no longer effective, so customers stop paying for it.

There are big software pivots that everyone can see in the marketing news for the general public, but internally every IT company deals with small or big changes every month and every week.

    Changing is not an optional task: change must be implemented and managed.

    Planning the change vs telling the aims

In my opinion there should be a cycle behind change management, as there is in every software development field:

    1. communicate requirements behind the change: what to change
    2. plan a strategy for the change

The most conflicting stuff involves time-to-market and technical debt.

Underestimating the technical debt impact

This is a big issue in the AI and LLM era, where everything looks easy to implement, as long as one stops at the demo stage of the project.

When you adopt MCP (in an IDE or on a repository like GitHub) to deploy the power of an LLM, but just stop at the prototype stage, you miss two important qualities:

• corner-case bugs
• the changeability of the code (and its readability)

I am astonished by the way an LLM does (not) deal with corner cases. Maybe better MCP & LLM code can be written, but most of the time it never does refactoring or a whole-project review, it never removes stuff from the code, and when asked to review the whole code the outcome is something random, far away from any kind of coherent specification.

    Breaking the architecture: is “Domain” in DDD optional?

When arranging the backend code into microservices, the key is to define a cohesive set of domains, one for each microservice.

It is important to be flexible on this:

• the set of domains can change
• some changes can break the domain boundary for a while (but that is technical debt that must be fixed)
• a change of the domain set must be managed as an architectural hop: plan to move everything onto the new set, then switch off the old services

Dealing with database structural change: the biggest challenge

If a company cares about its customers (and their money), it must care about their data.

The best practice when implementing a microservice architecture is: one service <-> one database. This might or might not be effective, depending on the current status of the system.

Sometimes SOA is introduced as a porting strategy for an existing system, and the database structure does not allow defining a specific DB for a specific service.

Also, a well designed microservice scales and deals with a small set of operations; this is where a local DB is a sensible choice for a group of mini and micro services that together implement a service.

As an example, let's consider a data import service. In a microservice architecture this can be composed of several mini and micro services:

• ImportControllerEngine: an import engine that automates the data import process
• DataRetrievementService: a service that retrieves data from external sources
• ImportPolicyConfigService: a service that exposes an API to insert an import process policy, including pickup time and the data mapping from the external to the internal structure
• DataImportService: the service that actually reads the imported data into the DB
    Image

With this architecture, DataImportService and DataRetrievementService must scale to deal with multiple data retrievals and imports. ImportPolicyConfigService has less demand for scalability; it can also provide a facade exposing an API to generate events for ImportControllerEngine.

This architecture can be changed, enhanced, split into multiple scalable pieces, or grouped, depending on the demand.

Here I want to highlight the database (and message broker) changes that might be required in response to business demand.

There is an ImportServiceDB and a targetDB; both structures may require changes during the lifetime of the service, and if you want to keep making money, you should keep the lifespan of the service as long as possible.

    ImportServiceDB changes

The data structures are subject to change whenever a new import policy or a new import feature arrives, like source filtering, source retrieval features, source-to-target mapping changes, etc.

These changes are not something one can plan upfront when designing the architecture. What can be done is to design the plan of the change, then implement it.

The general advice for a database migration (DBMS provider or table structure) is, with a sketch after the list:

• add the code that writes new data to the newer structure or DB
• run the code to migrate (or use a migration tool)
• replace the code that reads data, so it reads from the new structure
• remove the code that writes to the older structure
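A hypothetical sketch of that sequence on the ImportServiceDB (PostgreSQL; table and column names are invented for illustration):

    # 1. expand: the write path gets the new structure next to the old one, then backfill
    psql "$IMPORT_SERVICE_DB" <<'SQL'
    ALTER TABLE import_policy ADD COLUMN pickup_schedule jsonb;
    UPDATE import_policy
       SET pickup_schedule = jsonb_build_object('cron', legacy_pickup_time)
     WHERE pickup_schedule IS NULL;
    SQL
    # 2./3. deploy the services that write to and then read from pickup_schedule
    # 4. contract: once nothing writes to the old column any more, drop it
    psql "$IMPORT_SERVICE_DB" -c 'ALTER TABLE import_policy DROP COLUMN legacy_pickup_time;'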

Without a clear definition of the domain of each microservice, the only option is to inspect the code of each service and find out what changes have to be made.

Also, without good coding practice (read: clean code, class design), the change can be difficult, or almost impossible, without an upfront code review and refactoring stage.

    targetDB changes

The targetDB can be external to the ImportService, so changes can involve other services as well as the micro and mini services composing them. Again, the change is not something you can plan upfront: you just need to design and implement it, following the general migration advice, coordinating step by step each service involved in the change, and keeping the whole system working.

    Conclusions

The most important element to deal with when migrating a database structure is technical debt, be it in the domain definition, in the architecture, or in the code. When a change is requested, the required time depends strongly on the state of the system, and clean code is not optional.

So to estimate a piece of work, an inside glimpse at the status of the system is important at all levels, starting from the API, down to the architecture arrangement, and then to the code implementation.

In my experience, technical debt is the most underestimated category in issue tracking systems: most of the time it is misused, ignored, or confused.

So the really effective way to estimate a change is to inspect the code. That is time consuming in itself.

    My Consulting Services

  • Define API by Composable Gateway API Manager

A Gateway API Manager that is easy to define through a self-explanatory GUI in React:

    Features

• define the entrypoint Swagger by parsing a JSON payload into an OpenAPI/Swagger definition
• routing logic by expression matching on previous stages
• a utility Docker image (cogwbackend) to execute internal routing of the payload, the logic, and the JSON payload returned by the execution
• works on Kubernetes (by namespace) and Docker Swarm (by service label filtering)
• a utility Docker image to execute SQL (HSQL) based on a JSON payload for query and replacement
• streams both the internal request and the output of the execution (in the case of a single executable node: HSQL or REST service)

    Status of the project

The frontend produces a JSON payload; this is consumed by cogwbackend, which calls the internal service and returns the response.

The streaming feature is in development.

For more info, contact me.

  • My latest LLM code nightmare

A customer needs to automate static code analysis in the integration workflow using a SAST tool.

The detailed task specification came from ChatGPT, suggesting semgrep run from the Docker image semgrep/semgrep.

    I thought it wasn’t a bad idea. ChatGPT suggested to integrate the tool as a pre-commit git hook, that is fair for local development, but not for automated continuous integration. Ok, a pretty useless suggestion, but something to start from.

Asking the LLM agent

The tool adopted for CI/CD is Jenkins, so I started interacting with the LLM to get a Groovy pipeline integrating semgrep/semgrep for code analysis.

I forgot to tell the LLM engine (Claude) that Jenkins ran as a container, managed by a Docker Swarm instance, running in a VM separated from production (since Docker Swarm does not have namespaces, I fully agree with this arrangement).

The generated code relies on a separate script involving the execution of

    docker run -v $jenkinsworkspace:/code semgrep/semgrep ….

That's easy and cool. But wait: Jenkins runs in a Docker container, the agent used has the docker command, but it connects to the host's /var/run/docker.sock.

This means that there is only one Docker daemon, on the host. When that daemon receives a command from the Docker CLI, it takes the arguments as they are: mounting a volume from a path means mounting the host's path, because that is the only filesystem known to the Docker daemon.
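In other words (the image names and paths here are only an illustration of this kind of setup):

    # the Jenkins container reuses the host daemon through its socket
    docker run -d --name jenkins \
      -v /var/run/docker.sock:/var/run/docker.sock \
      jenkins/jenkins:lts-jdk17

    # from inside that container, the -v source path is resolved by the HOST daemon,
    # so this mounts the host's $WORKSPACE, not the workspace inside the Jenkins container
    docker run --rm -v "$WORKSPACE":/code semgrep/semgrep semgrep scan --config auto /code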

The script generated by the LLM tries to mount a container path to be read by the Docker daemon. I felt lazy, and I wanted to let the Claude agent fix the code for me.

System architecture and pipeline design

In detail, the system architecture and pipeline for backend code (with or without SAST) was designed to do these steps:

    1. build a docker image of the service
    2. run unit tests and integration tests on the newly created docker image (using docker-compose for side services)
    3. (deploy by) push the docker image into a private repository
4. update the service in the production or dev environment

    The first prompt I gave to the agent was to read the existing Jenkinsfile and integrate a SAST step.

    Please Claude, fix it

I stated the problem and suggested using a volume.

The agent suggested code that:

1. creates a volume
2. extracts the code from the image into the new volume
3. extracts the code from the volume into the container filesystem
4. removes the volume
5. creates a new volume for analysing the code
6. copies the code from the container filesystem into the new volume
7. analyses the code
8. removes the new volume

Isn’t there some repeated stuff here? The agent said no.

This story went on for half an hour…

The point was that the code was arranged in two functions (bash functions), and the agent treated those as silos.

I didn’t want to waste my time explaining things to the agent, so I refactored the code myself.

I started with the idea of doing stuff in lazy mode; I ended up fighting the rigidity of the AI agent's way of solving stuff by adding operations. The refactored flow:

1. define a unique volume name (using the Jenkins job number)
2. create the volume
3. extract the code from the image into the volume
4. analyse the code contained in the volume
5. remove the volume

But I had to arrange the code this way by hand.
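A minimal sketch of what such a script ends up doing (the /app path inside the image is an assumption, and the severity-to-exit-code mapping the pipeline below relies on is omitted):

    #!/usr/bin/env bash
    # Run semgrep against the code baked into a Docker image, going through a
    # named volume so the host daemon can actually see the files.
    set -euo pipefail

    IMAGE="$1"
    VOLUME="sast-${BUILD_NUMBER:-manual}"   # unique per Jenkins job

    docker volume create "$VOLUME"
    # extract the code from the image into the volume (/app is an assumed path)
    docker run --rm -v "$VOLUME":/code "$IMAGE" sh -c 'cp -a /app/. /code/'
    # analyse the code contained in the volume; the JSON report lands in the workspace
    docker run --rm -v "$VOLUME":/src semgrep/semgrep \
        semgrep scan --config auto --json /src > semgrep-report.json || true
    docker volume rm "$VOLUME"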

I must also say that code generated by agents is full of checks; some of them are clever and nice to have, some are paranoia-driven. So the job is to remove the extra code.

    The integrated SAST step in the pipeline

In the end, the Jenkins pipeline for the job builds new artifacts: SAST reports.

    Image

The newly generated artifacts can be inspected to fix the code and release more secure code.

The step is defined as:

    stage('SAST Security Scan') {
        steps {
            script {
                echo "🔒 Starting SAST scan for JavaScript/Node.js application"

                // Extract and scan from built Docker image (code only exists in image)
                echo "🐳 Extracting and scanning code from Docker image: ${env.LOCTAG}"
                def sastExitCode = sh(
                    script: "./scripts/sast-scan-image.sh '${env.LOCTAG}'",
                    returnStatus: true
                )

                // Read results summary - check both locations
                def summaryContent = ""
                if (fileExists('sast-summary.txt')) {
                    summaryContent = readFile('sast-summary.txt').trim()
                } else if (fileExists('/tmp/sast-summary.txt')) {
                    summaryContent = readFile('/tmp/sast-summary.txt').trim()
                }

                if (summaryContent) {
                    env.SAST_RESULTS = summaryContent
                    echo "SAST Results: ${env.SAST_RESULTS}"

                    // Parse results for detailed logging
                    def results = env.SAST_RESULTS.split(',')
                    def highIssues = results[0].split(':')[1] as Integer
                    def mediumIssues = results[1].split(':')[1] as Integer
                    def lowIssues = results[2].split(':')[1] as Integer

                    echo """
    🔍 SAST Scan Summary:
    🔴 High Severity: ${highIssues}
    🟡 Medium Severity: ${mediumIssues}
    🟢 Low Severity: ${lowIssues}
    """
                } else {
                    echo "⚠️ No SAST summary found - assuming no issues"
                    env.SAST_RESULTS = "HIGH:0,MEDIUM:0,LOW:0"
                }

                // Set build status based on SAST results
                if (sastExitCode == 2) {
                    currentBuild.result = 'FAILURE'
                    error("❌ SAST scan failed due to high severity security issues")
                } else if (sastExitCode == 1) {
                    currentBuild.result = 'UNSTABLE'
                    echo "⚠️ SAST scan marked build as unstable due to medium severity issues"
                } else {
                    echo "✅ SAST scan passed successfully"
                }
            }
        }
        post {
            always {
                // Copy SAST results from /tmp if they exist there
                sh '''
                # Copy results from /tmp to workspace for archiving
                cp /tmp/semgrep-*.json . 2>/dev/null || true
                cp /tmp/semgrep-*.txt . 2>/dev/null || true
                cp /tmp/sast-*.txt . 2>/dev/null || true
                '''

                // Archive all SAST results
                archiveArtifacts artifacts: 'semgrep-*.json, semgrep-*.txt, sast-*.txt',
                    fingerprint: true,
                    allowEmptyArchive: true

                // Display scan results in build description
                script {
                    if (env.SAST_RESULTS) {
                        def results = env.SAST_RESULTS.split(',')
                        def highIssues = results[0].split(':')[1]
                        def mediumIssues = results[1].split(':')[1]
                        def lowIssues = results[2].split(':')[1]

                        currentBuild.description = """
    SAST: H:${highIssues} M:${mediumIssues} L:${lowIssues}
    """.trim()
                    }
                }
            }
            failure {
                echo '❌ SAST scan failed - check security findings before proceeding'
            }
            unstable {
                echo '⚠️ SAST scan found medium severity issues - review before deployment'
            }
            success {
                echo '✅ SAST scan completed successfully'
            }
        }
    }

Here archiveArtifacts is what indexes the artifacts listed at the top of the Jenkins build page.

Is agent mode a good or a bad idea for coding?

I am still reluctant to adopt agent mode. Sometimes it suggests good ideas, but sometimes it uses those ideas in an awful way.

And worst of all, it keeps saying “you are perfectly right”, while it finds another idiotic way of producing unnecessary code.

I think that behind LLM usage there is an unspoken conflict of interest:

The more the agent interacts and creates crufty code, the more tokens are consumed and the more fees are charged.

This is not about getting things done; it is about giving away your money thinking you have found the cheapest developer on the market: the LLM agent.

But somehow it helps to learn stuff. My orientation is to use a mix of agent mode and inline prompts, fixing stuff by hand when it is a matter of refactoring or clearly crufty code. I prefer to think of an LLM as a useful stochastic parrot.

    My services

  • The tale of Jenkins update to Java Jdk21 and Matrix Auth plugin


    Image

I spent a long day dealing with a Jenkins runtime update (from JDK 17 to JDK 21) and the authorisation plugin (Matrix Auth) that stopped working. Here is the tale.

Involved systems:

    • Jenkins dockerized: https://github.com/jenkinsci/docker/blob/master/debian/trixie/hotspot/Dockerfile or maybe the slimmed version
    • https://docs.cloudbees.com/docs/cloudbees-ci-kb/latest/client-and-managed-controllers/after-updating-matrix-project-plugin-jenkins-fails-to-restart
• Updating from jenkins/jenkins:lts-jdk17 to jenkins/jenkins:lts-jdk21

The error log messages were related to hudson, to matrix, and to authorization:

Caused: jenkins.util.xstream.CriticalXStreamException:
---- Debugging information ----
cause-exception : com.thoughtworks.xstream.mapper.CannotResolveClassException
cause-message : hudson.security.GlobalMatrixAuthorizationStrategy
class : hudson.model.Hudson
required-type : hudson.model.Hudson
converter-type : hudson.util.RobustReflectionConverter
path : /hudson/authorizationStrategy
line number : 14
version : 2.516.3
-------------------------------
at hudson.util.RobustReflectionConverter.doUnmarshal(RobustReflectionConverter.java:384)
at hudson.util.RobustReflectionConverter.unmarshal(RobustReflectionConverter.java:291)

    What happened

The goal was to update Jenkins LTS to JDK 21. My strategy was:

    1. update all possible plugins
    2. restart
    3. Update the image used
4. Update again all possible plugins

But after step 3 I had trouble restarting the image controlled by the Docker Swarm service.
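Step 3 itself is a one-liner, assuming the swarm service is simply named jenkins (the service name is an assumption):

    docker service update --image jenkins/jenkins:lts-jdk21 jenkins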

I followed the suggestion at https://docs.cloudbees.com/docs/cloudbees-ci-kb/latest/client-and-managed-controllers/after-updating-matrix-project-plugin-jenkins-fails-to-restart and downloaded the latest version from https://plugins.jenkins.io/matrix-project/releases/

I even tried to launch the command: jenkins-plugin-cli --plugins matrix-project:858.vb_b_eb_9a_7ea_99e

(And yes, I was successful! I ran it in the 30 seconds before the service hung! …anyway, this did not work.)

    What I did (some time later)

I inspected config.xml, disabled security, and commented out the whole access matrix:

<useSecurity>false</useSecurity>
<!--authorizationStrategy class="hudson.security.GlobalMatrixAuthorizationStrategy">
....
</authorizationStrategy-->

I copied config.xml to config.xml-saved.

Everyone could access it, but the service finally started.

Then I updated everything upgradable, fixed the deprecated plugins, and restarted.

But in the end I chose to adopt another security model, still using the matrix, but using the user DB from Redmine.

All in all: what does the Matrix plugin do?

    https://plugins.jenkins.io/matrix-project

Matrix project gives full control over user permissions:

    Image

That is a key component for a multiuser setup.

    Update 29/09/2025

    Security, in details

    Security assessment: Attacks vs Risks

    Image
    Image
Gogs plugin 1.0.15
• Non-constant time webhook token comparison (no fix available)
• Unsafe default behavior and information disclosure in webhook (no fix available)
No fixes for these issues are available. It is recommended that you review the security advisory and apply mitigations if possible, or uninstall this plugin.

docker-build-step 2.12
• CSRF vulnerability and missing permission check (no fix available)
No fixes for these issues are available. It is recommended that you review the security advisory and apply mitigations if possible, or uninstall this plugin.
    https://www.jenkins.io/security/advisory/2023-10-25/#SECURITY-2896
    Non-constant time webhook token comparison in Gogs Plugin
    SECURITY-2896 / CVE-2023-46657
    Severity (CVSS): Low
    Affected plugin: gogs-webhook
    Description:
    Gogs Plugin 1.0.15 and earlier does not use a constant-time comparison when checking whether the provided and expected webhook token are equal.

    This could potentially allow attackers to use statistical methods to obtain a valid webhook token.

    As of publication of this advisory, there is no fix. Learn why we announce this.

Real attack/risk: discovering the token can at most launch an unwanted build, still from the same git repository and branch, so it is idempotent.
    
    https://www.jenkins.io/security/advisory/2023-08-16/#SECURITY-2894
    Unsafe default behavior and information disclosure in Gogs Plugin webhook
    SECURITY-2894 / CVE-2023-40348 (information disclosure), CVE-2023-40349 (insecure default)
    Severity (CVSS): Medium
    Affected plugin: gogs-webhook
    Description:
    Gogs Plugin provides a webhook endpoint at /gogs-webhook that can be used to trigger builds of jobs. In Gogs Plugin 1.0.15 and earlier, an option to specify a Gogs secret for this webhook is provided, but not enabled by default.

    This allows unauthenticated attackers to trigger builds of jobs corresponding to the attacker-specified job name.

    Additionally, the output of the webhook endpoint includes whether a job corresponding to the attacker-specified job name exists, even if the attacker has no permission to access it.

    As of publication of this advisory, there is no fix. Learn why we announce this.

    Same risk as above.
    
    https://www.jenkins.io/security/advisory/2024-03-06/#SECURITY-3200
    CSRF vulnerability and missing permission check in docker-build-step Plugin
    SECURITY-3200 / CVE-2024-2215 (CSRF), CVE-2024-2216 (permission check)
    Severity (CVSS): Medium
    Affected plugin: docker-build-step
    Description:
    docker-build-step Plugin 2.11 and earlier does not perform a permission check in an HTTP endpoint implementing a connection test.
    This allows attackers with Overall/Read permission to connect to an attacker-specified TCP or Unix socket URL. Additionally, the plugin reconfigures itself using the provided connection test parameters, affecting future build step executions.
    Additionally, this endpoint does not require POST requests, resulting in a cross-site request forgery (CSRF) vulnerability.
    As of publication of this advisory, there is no fix. Learn why we announce this.

Real risk: “attackers with Overall/Read permission” must already have access to the system, which is protected by Matrix Auth.

    Conclusion of security assessment

Facing each detailed attack vector is of course a good idea, but it has a cost, and that cost is not balanced by a real gain in security.

  • Easy Web Application Development with AWS Cognito and S3


    Image

The general direction for developing a complex web application used to be:

• set up a public website with all the login/password-recovery stuff
• set up a backend with entrypoints for authentication, JWT authorization, etc.
• write the service
• pay for the infrastructure full time

Thanks to AWS Cognito, S3 buckets, and Lambda, all that complexity is simplified. And you pay as you go: if the service has value for the users, then more Lambda fuel gets burned.

What is in this AWS solution:

• AWS Cognito with user groups
• a Lambda function triggered for registered and confirmed users
• a Lambda function providing the API, relying on the user pool's known public keys (JWKS) to check that REST calls are authorised through a JWT (see the sketch below)
• a generic database containing the application data
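A quick way to see those keys, assuming a user pool in eu-west-1 (region and pool id are placeholders):

    # fetch the Cognito user pool's JWKS and list the key ids used to sign the JWTs
    curl -s https://cognito-idp.eu-west-1.amazonaws.com/eu-west-1_EXAMPLE/.well-known/jwks.json | jq '.keys[].kid'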

Read more about AWS services in Amazon Web Services in Action.

  • Dealing with new .kube/config

    Video: https://youtu.be/oBF-dUXZwrA

Once you get a new config from a remote Kubernetes installation, you need to integrate it into your existing local .kube/config file.

Sometimes you, or others, are doing experiments with Kubernetes, so you repeat the .kube/config integration steps over and over again, dealing with the -data blobs.

    List of ingredients

    • existing .kube/config
• the config file to be added, with: the new cluster info, user info, and context (binding user to cluster)
• utility bash scripts:
  • extract_kconfig.sh to extract the info and add the cluster, context and user
  • kuco_import.sh to extract the blobs into .crt and .key files

    Getting the scripts

    Retrieve the scripts by:

curl -o kuco_import.sh https://raw.githubusercontent.com/danielecr/selfhosted/refs/heads/main/kubernetes/kuco_import.sh
curl -o extract_kconfig.sh https://raw.githubusercontent.com/danielecr/selfhosted/refs/heads/main/kubernetes/extract_kconfig.sh

    Make those executable:

    chmod +x extract_kconfig.sh
    chmod +x kuco_import.sh

    First import

First, copy the remote kubectl config locally:

scp remote.host:/home/user/.kube/config newkubeconfig

then run extract_kconfig.sh:

./extract_kconfig.sh newkubeconfig

It is an interactive script. The first output is something like:


contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
users:
- name: kubernetes-admin
  user:

    creating:
    - a cluster named kubernetes
    - a user named kubernetes-admin
    - a context named kubernetes-admin@kubernetes

    Please, check for conflicts in .kube/config before going on!

    Enter host URL:

Hit <return> without input. Check for conflicts in the current .kube/config file. If there are no conflicts, run it again and insert the right host URL, e.g. https://remote.host:6443

    At the end the suggested command is:

    Now run:
    kuco_import.sh newkubeconfig newkubeconfig

    Run it. Extracted files are stored in the folder $HOME/.kube/

Edit .kube/config and remove the path from the added cert and key filenames, e.g.:

        certificate-authority: /current/path/to/newkubeconfig-cluster-cert.crt
    #becomes
    certificate-authority: newkubeconfig-cluster-cert.crt

Now change the current context:

kubectl config use-context kubernetes-admin@kubernetes

    And check

    kubectl get nodes

Same server, new keys

This is the quick part: every time the server changes its certificate and user key, the commands to execute are:

scp remote.host:/home/user/.kube/config newkubeconfig
./kuco_import.sh newkubeconfig newkubeconfig

The .kube/config stays untouched.

    Bonus

kuco_import.sh is also good for extracting the blobs from an existing .kube/config file:

cp .kube/config kindserv
./kuco_import.sh kindserv

Then edit the relevant part of .kube/config, replacing the -data entries with filename references.