<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<channel>
		<title><![CDATA[Python Forum - All Forums]]></title>
		<link>https://python-forum.io/</link>
		<description><![CDATA[Python Forum - https://python-forum.io]]></description>
		<pubDate>Thu, 23 Apr 2026 19:23:39 +0000</pubDate>
		<generator>MyBB</generator>
		<item>
			<title><![CDATA["openpyxl" is not accessed Pylance]]></title>
			<link>https://python-forum.io/thread-46277.html</link>
			<pubDate>Tue, 21 Apr 2026 16:19:54 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=33212">dee</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46277.html</guid>
			<description><![CDATA[Hello,<br />
<br />
I installed openpyxl using pip install openpyxl, but I’m running into issues when trying to use it.<br />
<br />
In my script, import openpyxl is flagged by Pylance as "openpyxl is not accessed". When I run the script, I get the following error:<br />
ModuleNotFoundError: No module named 'openpyxl'<br />
<br />
Please advise.<br />
Thanks.<br />
Dee]]></description>
		</item>
		<item>
			<title><![CDATA[[SOLVED] [RadioBox] Set default?]]></title>
			<link>https://python-forum.io/thread-46276.html</link>
			<pubDate>Tue, 21 Apr 2026 09:02:51 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=11957">Winfried</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46276.html</guid>
			<description><![CDATA[Hello,<br />
<br />
<span style="font-style: italic;" class="mycode_i">Searching Google or the archives with "RadioBox default" returned no hits.</span><br />
<br />
Do you know of a better way to set the default item in a wx.RadioBox?<br />
<br />
Thank you.<br />
<br />
<pre class="brush: python" title="Python Code:">lblList = ['640','426','256'] 
self.rbox = wx.RadioBox(panel, label = 'Width', choices = lblList, majorDimension = 1, style = wx.RA_SPECIFY_ROWS) 
#right way to set default?
self.rbox.SetSelection(1)
"""
TODO How to avoid magic number?
for index, item in enumerate(fruits):
	print(index, item)
"""</pre>--<br />
Edit: Unless there's a better way:<br />
<pre class="brush: python" title="Python Code:">self.rbox.SetSelection(lblList.index("426"))</pre>]]></description>
		</item>
		<item>
			<title><![CDATA[Best way to learn how to code in python]]></title>
			<link>https://python-forum.io/thread-46275.html</link>
			<pubDate>Mon, 20 Apr 2026 16:50:54 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=43832">Q890</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46275.html</guid>
			<description><![CDATA[Hi,<br />
<br />
So lately I've been having issues with my code (for about a week). It's nothing big, just a "simple calculator" or simple syntax, and I try to practice and research this stuff daily. Personally, I love programming, and it's my dream to become a great programmer in the future, but when I open the text editor with a project in mind, I feel stuck, just close it, and feel bad about it. Any recommendations?  <img src="https://python-forum.io/images/smilies/eusa_think.gif" alt="Think" title="Think" class="smilie smilie_31" />]]></description>
		</item>
		<item>
			<title><![CDATA[[SOLVED] Django: encrypt user password via the Admin site]]></title>
			<link>https://python-forum.io/thread-46274.html</link>
			<pubDate>Sun, 19 Apr 2026 18:28:47 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=39488">aecordoba</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46274.html</guid>
			<description><![CDATA[I'm developing a web site with Django.<br />
I'm using a custom user model and a custom user manager. The User class inherits from AbstractBaseUser and PermissionsMixin, and the custom manager (UserManager) inherits from BaseUserManager. This works fine: I can create a superuser and other users normally.<br />
<pre class="brush: python" title="Python Code:">from django.db import models
from django.urls import reverse
from django.contrib.auth.models import AbstractBaseUser, BaseUserManager, PermissionsMixin
from partners.models import Partner

class UserManager(BaseUserManager):
    def create_user(self, name, password=None, **extra_fields):
        if not name:
            raise ValueError("Name is required")
        user = self.model(name=name, **extra_fields)
        user.set_password(password)
        user.save(using=self._db)
        return user

    def create_superuser(self, name, password=None, **extra_fields):
        extra_fields.setdefault('is_staff', True)
        extra_fields.setdefault('is_superuser', True)
        return self.create_user(name, password, **extra_fields)

class User(AbstractBaseUser, PermissionsMixin):
    name = models.CharField(max_length=30, unique=True, null=False, blank=False, help_text='Username')
    partner = models.OneToOneField(Partner, on_delete=models.RESTRICT, null=True, blank=True, related_name='partner')
    date_joined = models.DateTimeField(auto_now_add=True)
    is_active = models.BooleanField(default=True)
    is_staff = models.BooleanField(default=False)

    objects = UserManager()

    USERNAME_FIELD = 'name'

    def __str__(self):
        return self.name

    def get_absolute_url(self):
        return reverse('user-detail', args=[str(self.id)])

    class Meta:
        managed = True
        db_table = 'users'
        ordering = ['name']</pre>But when I create a user through the Admin site, the password is stored in the database in plain text instead of being hashed.<br />
Can somebody help me?<br />
Thank you in advance.]]></description>
		</item>
		<item>
			<title><![CDATA[Need a script to replace an existing Python installation in a version-independent dir]]></title>
			<link>https://python-forum.io/thread-46273.html</link>
			<pubDate>Sun, 19 Apr 2026 07:12:27 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=3010">pstein</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46273.html</guid>
			<description><![CDATA[Unfortunately, Python is always installed (on Windows, for all users) in a version-dependent directory such as:<br />
<br />
C:\Program Files\Python312\<br />
<br />
Now I want to replace (!) this version COMPLETELY with a new version such as 3.14.<br />
But if I run the installer for v3.14, it does not overwrite the existing installation; it installs into a new, additional directory, C:\Program Files\Python314\<br />
I could manually select the previous directory (or another one) as the installation directory, but that leaves confusing, orphaned uninstall entries from old Python installations in the Registry.<br />
<br />
I hate it.<br />
<br />
Why does Python not provide a version-independent, one-and-only installation? At least as a user option?<br />
WITH my own additional, previously installed modules preserved?<br />
<br />
This is really annoying and user-unfriendly.<br />
<br />
I need a tool or a PowerShell or batch script that<br />
<br />
1.) saves all currently installed modules<br />
2.) uninstalls old Python from C:\Program Files\Python\<br />
3.) Installs newest, non-beta Python Release in C:\Program Files\Python\<br />
4.) Re-installs all my previously installed modules (see 1.)<br />
<br />
This cannot be that difficult. This is the standard update procedure for 32525 other tools.<br />
Why not for Python, at least as an option?<br />
<br />
I don't want to fiddle around with lots of command-line parameters.<br />
<br />
This should work out of the box with just a double-click.<br />
<br />
Is there really no such update script?]]></description>
		</item>
		<item>
			<title><![CDATA[[SOLVED] Recommended way to upgrade module?]]></title>
			<link>https://python-forum.io/thread-46271.html</link>
			<pubDate>Sat, 18 Apr 2026 15:20:18 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=11957">Winfried</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46271.html</guid>
			<description><![CDATA[Hello,<br />
<br />
When running a script on a Debian 12 host that worked fine on Windows, an attribute is missing because the module version on Debian is older.<br />
<br />
What is the recommended fix?<br />
<br />
If possible, I prefer to use apt to install packages as an easy way to upgrade things.<br />
<br />
Thank you.<br />
<br />
<pre class="brush: python" title="Python Code:">Traceback (most recent call last):
  File "blah.py", line 34, in create_events_list
    for event in gcal.events:
                 ^^^^^^^^^^^
AttributeError: 'Calendar' object has no attribute 'events'

#Linux
~# pip freeze | grep icalendar
icalendar==4.0.3
~# python3 --version
Python 3.11.2

#Windows
pip freeze | findstr icalendar
icalendar==6.3.2
py --version
Python 3.13.2

apt install python3-icalendar --upgrade
python3-icalendar is already the newest version (4.0.3-5).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.</pre>]]></description>
		</item>
		<item>
			<title><![CDATA[Creating GUI with Python]]></title>
			<link>https://python-forum.io/thread-46270.html</link>
			<pubDate>Sat, 18 Apr 2026 11:29:50 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=43828">pythontest</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46270.html</guid>
			<description><![CDATA[This is my first time using Python. My requirement is to create a GUI with some parameters that a user can enter. Once they enter the data, it has to be sent to a serial-port device (USB to RS485). I have downloaded and installed Python. A few questions to get me started:<br />
1. Which Python editor is recommended for developing the software?<br />
2. What other software needs to be installed to finish the project?<br />
<br />
Thank you in advance,]]></description>
		</item>
		<item>
			<title><![CDATA[pandas: values from prior row]]></title>
			<link>https://python-forum.io/thread-46268.html</link>
			<pubDate>Thu, 16 Apr 2026 19:40:48 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=43823">kevind0718</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46268.html</guid>
			<description><![CDATA[Hello:<br />
<br />
I received a list from a user that needs to be reformatted. I need to pluck data out of prior rows.<br />
<br />
This code demonstrates the issue:<br />
<br />
<pre class="brush: python" title="Python Code:">import  numpy as np
import  os
import  pandas as pd


df  = pd.DataFrame({'Make/Model': ['FORD', 'BRONCO', 'MAVERICK', 'MUSTANG', 'HONDA', 'CIVIC', 'CR-V', 'HYUNDAI', 'TUCSON', 'ELANTRA', 'TOYOTA', 'CAMERY', 'AVALON'],
        'firstYear': [np.nan, 1965.0, 1970.0, 1964.0, np.nan, 1972.0, 1995.0, np.nan, 2004.0, 1990.0, np.nan, 1982.0, 2000.0],
        'style': [np.nan, 'suv', 'compact', 'sport', np.nan, 'compact', 'suv', np.nan, 'suv', 'compact', np.nan, 'sedan', 'sedan'],
        'number doors': [np.nan, '2, 4', '2,4 ', '2', np.nan, '2, 4', '4', np.nan, '4', '4', np.nan, '4', '4'], 
         'manufacturor': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})



df['manufacturor']   =  np.where(df['firstYear'].shift(1).isna()  &amp;  df['style'].shift(1).isna()    == True  ,
                                   df['Make/Model'].shift(1),  df['manufacturor'].shift(1)   )</pre>The Make/Model column contains two kinds of values (makes and models); I need to add the Make to each Model row.<br />
<br />
The above code only adds the Make to the first model row of each make.<br />
The manufacturor column needs to be populated for all model rows.<br />
<br />
I know I am close, but I don't know how to proceed.<br />
<br />
Thanks for your attention to this matter.<br />
<br />
<br />
KD]]></description>
		</item>
		<item>
			<title><![CDATA[YATL — API testing in a new way]]></title>
			<link>https://python-forum.io/thread-46265.html</link>
			<pubDate>Wed, 15 Apr 2026 08:38:39 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=43816">khabib73</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46265.html</guid>
			<description><![CDATA[Today I want to share a fresh look at API test automation. I think writing integration tests for modern APIs has become unnecessarily cumbersome. Even with simple tools, we are drowning in boilerplate code: session setup, exception handling, serialization/deserialization. My idea is to turn tests into pure data, making them declarative and understandable to any member of the team. If you’ve ever done API testing (especially in a microservice architecture), you’ve probably run into this familiar list of problems: <br />
<br />
1. You need to write code: even for the simplest GET request, you have to import libraries, configure the client, and catch timeouts.<br />
2. High barrier to entry: to write a test, you need to know Python (or Java, Go, JS) well enough to debug asynchronous calls and work with JSON schemas.<br />
3. Complex dependencies: when one request uses data from another’s response (for example, POST /login → token, then GET /users/me), the code turns into a tangle of callbacks or async/await.<br />
4. Hard to maintain: after a couple of months, the test scripts become “noodles” that even their authors are afraid to touch.<br />
<br />
And most importantly, not everyone on the team is a developer. QA engineers, analysts, technical staff, and sometimes product managers want to test the API, but they don’t want to (and shouldn’t have to) learn all the intricacies of imperative programming.<br />
<br />
<span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">YATL (Yet Another Testing Language)</span></span><br />
<br />
<span style="font-weight: bold;" class="mycode_b">YATL</span> is a DSL (<span style="font-style: italic;" class="mycode_i">domain-specific language</span>) for API testing, written in Python. It uses YAML as the test description language. It is not just another framework for developers, but a tool that democratizes API testing by making it accessible to the entire team.<br />
<br />
<blockquote class="mycode_quote"><cite>Quote:</cite> If you know HTTP and YAML, you know YATL.</blockquote>
<br />
Instead of writing imperative code, you describe the tests declaratively in YAML files. Let’s look at what the simplest test for a GET request looks like:<br />
<br />
<br />
<pre class="brush: python" title="Python Code:">name: ping
base_url: google.com

steps:
- name: access_test
  request:
    method: GET
  expect:
    status: 200

- name: failed_test
  request:
    method: GET
    url: /not_found
  expect:
    status: 404</pre>There is no hidden “magic”: only the request (method, URL) and the expected response (status, partial body check). This is much closer to a specification than to a script. Each file with the .test.yaml extension is a test script consisting of several steps.<br />
<br />
<span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">Key features</span></span><br />
Now let’s move on to the project’s key features.<br />
Support for common data formats. YATL understands most of the content types you’ll encounter in real life out of the box:<br />
<br />
1. JSON — automatically sets the Content-Type: application/json<br />
2. XML — for SOAP and other XML APIs<br />
3. Form-data — application/x-www-form-urlencoded<br />
4. Multipart files — uploading files<br />
5. Plain text — text/plain<br />
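<br />
The list above can be made concrete with a short Python sketch of how a runner might turn a declarative <code class="icode">body</code> section into a Content-Type header and a payload. This is not YATL's code: <code class="icode">encode_body</code> is a hypothetical helper, only the <code class="icode">json</code> key is confirmed by the examples in this post, and the <code class="icode">xml</code>, <code class="icode">form</code> and <code class="icode">text</code> keys (and the XML media type) are assumptions. Multipart uploads are left out for brevity.<br />
<br />
<pre class="brush: python" title="Python Code:">import json
from urllib.parse import urlencode

# Body kind -> Content-Type. Only "json" is confirmed by the post;
# the other keys and the XML media type are assumptions for illustration.
CONTENT_TYPES = {
    "json": "application/json",
    "xml": "application/xml",
    "form": "application/x-www-form-urlencoded",
    "text": "text/plain",
}


def encode_body(body):
    """Turn a one-key body section, e.g. {"json": {...}}, into (headers, bytes)."""
    if len(body) != 1:
        raise ValueError("expected exactly one body kind")
    kind, value = next(iter(body.items()))
    if kind not in CONTENT_TYPES:
        raise ValueError(f"unsupported body kind: {kind}")
    if kind == "json":
        payload = json.dumps(value)  # dict -> JSON text
    elif kind == "form":
        payload = urlencode(value)  # dict -> URL-encoded key=value pairs
    else:
        payload = str(value)  # xml and text are sent as-is
    return {"Content-Type": CONTENT_TYPES[kind]}, payload.encode("utf-8")


headers, payload = encode_body({"json": {"name": "John Doe", "email": "doe@example.com"}})
print(headers)  # {'Content-Type': 'application/json'}
print(payload)</pre>
<br />
The JSON branch mirrors the header behaviour described in item 1; the form branch relies on <code class="icode">urlencode</code>, which encodes spaces as <code class="icode">+</code>.<br />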
<br />
Example with JSON:<br />
<br />
<pre class="brush: python" title="Python Code:">request:
  method: POST
  url: /users
  body:
    json:
      name: "John Doe"
      email: "doe@example.com"</pre><span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">Data extraction and templating</span></span><br />
<br />
You can extract any fragment of a response and reuse it in subsequent requests through Jinja2 templates. Dot notation is used to access nested JSON fields (for example, user.info.name), which keeps the syntax clean and concise. Imagine a chain:<br />
<br />
<pre class="brush: python" title="Python Code:">steps:
  - name: user_creation
    request:
      method: POST
      url: /users
      body:
        json:
          name: "Alice"
    expect:
      status: 200
    extract:
      user_id: "response.id"  # Extracting the ID from the response

  - name: getting_user
    request:
      method: GET
      url: /users/{{ user_id }}   # Using the extracted ID
    expect:
      status: 200
      json:
        user.info.name: "Alice" # name verification, using dot notation</pre><span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">Parallel execution</span></span><br />
<br />
YATL runs tests in 10 threads by default (configurable via <code class="icode">--workers</code>). This can significantly speed up large test suites, such as regression runs.<br />
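<br />
As a rough illustration of the worker model (an assumption about the general approach, not YATL's implementation), the sketch below runs independent test files on a thread pool whose size plays the role of <code class="icode">--workers</code>. <code class="icode">run_test_file</code> is a hypothetical stand-in that sleeps instead of sending real HTTP requests.<br />
<br />
<pre class="brush: python" title="Python Code:">import time
from concurrent.futures import ThreadPoolExecutor

DEFAULT_WORKERS = 10  # the default thread count described above


def run_test_file(name):
    """Hypothetical stand-in for executing one .test.yaml file."""
    time.sleep(0.1)  # simulate waiting on HTTP responses
    return name, "passed"


def run_suite(test_files, workers=DEFAULT_WORKERS):
    # pool.map returns results in input order, even though files run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_test_file, test_files))


files = [f"suite_{i}.test.yaml" for i in range(20)]
start = time.perf_counter()
results = run_suite(files)
print(len(results), "files in", round(time.perf_counter() - start, 2), "s")</pre>
<br />
Threads suit this workload because API tests spend most of their time waiting on the network, and Python releases the GIL while blocking on I/O. With 20 simulated files and 10 workers, the wall time should be close to two rounds of the simulated delay rather than twenty.<br />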
<br />
<span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">Skipping tests and steps</span></span><br />
<br />
In real-world development, you often need to temporarily disable a test without deleting it. YATL supports the skip mechanism:<br />
<br />
<pre class="brush: python" title="Python Code:">name: test in development
skip: true  # test will be skipped</pre>You can also skip a separate step:<br />
<br />
<pre class="brush: python" title="Python Code:">steps:
  - name: active_step
    request: ...

  - name: skipped_step
    skip: true  # Only this step is skipped
    request: ...</pre><span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">Data validation</span></span><br />
<br />
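A rule such as <code class="icode">{ path: "user.age", gt: 18 }</code> in the example below combines two steps: resolving a dotted path in the JSON response and applying a comparison. The Python sketch that follows is a hypothetical illustration of those two steps, not YATL's implementation: <code class="icode">get_path</code> and <code class="icode">check</code> are invented names, and details such as <code class="icode">regex</code> having to match the whole value are assumptions.<br />
<br />
<pre class="brush: python" title="Python Code:">import re

# Sentinel that distinguishes "path not found" from a JSON null (None).
MISSING = object()

# Hypothetical type names; only "array" appears in the post.
TYPES = {"array": list, "object": dict, "string": str}


def get_path(data, path):
    """Resolve a dotted path like "user.info.name" in nested dicts and lists."""
    current = data
    for key in path.split("."):
        if isinstance(current, dict):
            current = current.get(key, MISSING)
        elif isinstance(current, list) and key.isdigit():
            index = int(key)
            current = current[index] if index in range(len(current)) else MISSING
        else:
            return MISSING
        if current is MISSING:
            return MISSING
    return current


def check(data, rule):
    """Apply one compare rule; return a list of failure messages (empty = pass)."""
    path = rule["path"]
    value = get_path(data, path)
    if value is MISSING:
        return [f"{path}: not found"]
    failures = []
    if "gt" in rule and not value > rule["gt"]:
        failures.append(f"{path}: {value!r} is not greater than {rule['gt']!r}")
    if "min_length" in rule and not len(value) >= rule["min_length"]:
        failures.append(f"{path}: shorter than {rule['min_length']}")
    if "regex" in rule and re.fullmatch(rule["regex"], str(value)) is None:
        failures.append(f"{path}: {value!r} does not match {rule['regex']!r}")
    if "type" in rule and not isinstance(value, TYPES[rule["type"]]):
        failures.append(f"{path}: expected {rule['type']}")
    if rule.get("not_empty") and len(value) == 0:
        failures.append(f"{path}: is empty")
    return failures


# A made-up response checked against the same rules as the YAML example.
response = {
    "user": {"age": 30, "name": "Alice", "email": "alice@example.com"},
    "items": [{"id": 1}],
}
rules = [
    {"path": "user.age", "gt": 18},
    {"path": "user.name", "min_length": 3},
    {"path": "user.email", "regex": r".+@.+\..+"},
    {"path": "items", "type": "array", "not_empty": True},
]
for rule in rules:
    print(rule["path"], check(response, rule) or "ok")</pre>
<br />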
<pre class="brush: python" title="Python Code:">name: Extended Validation Example
steps:
  - name: user_data_test
    request:
      method: GET
      url: /api/users/123
    expect:
      status: 200 
      validate:
        - compare: { path: "user.age", gt: 18 }         
        - compare: { path: "user.name", min_length: 3 }     
        - compare: { path: "user.email", regex: ".+@.+\\..+" } 
        - compare: { path: "items", type: "array", not_empty: true } </pre><span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">CI/CD integration</span></span><br />
<br />
<pre class="brush: python" title="Python Code:">name: API Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.14'
      - run: pip install yatl
          - run: yatl tests/ --workers 5</pre>You can’t use the library yet (it hasn’t been published on PyPI), but you can star the project on GitHub and bookmark it today: <a href="https://github.com/Khabib73/YATL" target="_blank" rel="noopener" class="mycode_url">https://github.com/Khabib73/YATL</a><br />
<br />
This is the best way to say “thank you” and to help us ship the release sooner. We plan to release the first stable version in 3–4 weeks.  <img src="https://python-forum.io/images/smilies/heart.png" alt="Heart" title="Heart" class="smilie smilie_16" /> <br />
<br />
Unlike hand-written test scripts, YATL offers a universal language for describing tests that everyone on the team can understand. An analyst can add a check for a new endpoint without pulling in a developer; a QA engineer can reuse variables and schemas; and a developer can quickly debug a failing test just by reading the YAML.]]></description>
			<content:encoded><![CDATA[Today I want to share a fresh look at API test automation. I think writing integration tests for modern APIs has become unnecessarily cumbersome. Even with simple tools, we are drowning in boilerplate code: session setup, exception handling, serialization/deserialization. My idea is to turn tests into pure data, making them declarative and understandable to any member of the team. If you’ve ever done API testing (especially in a microservice architecture), you’ve probably run into this familiar list of problems: <br />
<br />
1. You need to write code: even for the simplest GET request, you have to import libraries, configure the client, and catch timeouts.<br />
2. High barrier to entry: to write a test, you need to know Python (or Java, Go, JS) well enough to debug asynchronous calls and work with JSON schemas.<br />
3. Complex dependencies: when one request uses data from another’s response (for example, POST /login → token, then GET /users/me), the code turns into a tangle of callbacks or async/await.<br />
4. Hard to maintain: after a couple of months, the test scripts become “noodles” that even their authors are afraid to touch.<br />
<br />
And most importantly, not everyone on the team is a developer. QA engineers, analysts, technical staff, and sometimes product managers want to test the API, but they don’t want to (and shouldn’t have to) learn all the intricacies of imperative programming.<br />
<br />
<span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">YATL (Yet Another Testing Language)</span></span><br />
<br />
<span style="font-weight: bold;" class="mycode_b">YATL</span> is a DSL (<span style="font-style: italic;" class="mycode_i">domain-specific language</span>) for API testing, written in Python. It uses YAML as the test description language. It is not just another framework for developers, but a tool that democratizes API testing by making it accessible to the entire team.<br />
<br />
<blockquote class="mycode_quote"><cite>Quote:</cite> If you know HTTP and YAML, you know YATL.</blockquote>
<br />
Instead of writing imperative code, you describe the tests declaratively in YAML files. Let’s look at what the simplest test for a GET request looks like:<br />
<br />
<br />
<pre class="brush: python" title="Python Code:">name: ping
base_url: google.com

steps:
- name: access_test
  request:
    method: GET
  expect:
    status: 200

- name: failed_test
  request:
    method: GET
    url: /not_found
  expect:
    status: 404</pre>There is no hidden “magic”: only the request (method, URL) and the expected response (status, partial body check). This is much closer to a specification than to a script. Each file with the .test.yaml extension is a test script consisting of several steps.<br />
<br />
<span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">Key features</span></span><br />
Now let’s move on to the project’s key features.<br />
Support for common data formats. YATL understands most of the content types you’ll encounter in real life out of the box:<br />
<br />
1. JSON — automatically sets the Content-Type: application/json<br />
2. XML — for SOAP and other XML APIs<br />
3. Form-data — application/x-www-form-urlencoded<br />
4. Multipart files — uploading files<br />
5. Plain text — text/plain<br />
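<br />
To make the automatic Content-Type handling concrete, here is a rough Python sketch of how a runner could turn a request body into payload bytes plus a matching header, using only the standard library. It is not YATL's code: only the <code class="icode">json</code> key appears in the examples in this post, while the <code class="icode">form</code> and <code class="icode">text</code> keys and the function name are assumptions, and XML and multipart bodies are left out for brevity.<br />
<br />
```python
import json
from urllib.parse import urlencode


def encode_body(body):
    """Turn a body mapping into (payload bytes, Content-Type header value).

    Only the "json" key is taken from the YATL examples; "form" and "text"
    are hypothetical keys used for illustration.
    """
    if "json" in body:
        return json.dumps(body["json"]).encode("utf-8"), "application/json"
    if "form" in body:
        # urlencode percent-encodes values and joins pairs for a form post
        return urlencode(body["form"]).encode("ascii"), "application/x-www-form-urlencoded"
    if "text" in body:
        return body["text"].encode("utf-8"), "text/plain; charset=utf-8"
    raise ValueError("unsupported body type: " + ", ".join(body))


payload, content_type = encode_body({"json": {"name": "John Doe", "email": "doe@example.com"}})
print(content_type, payload)
```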
<br />
Example with JSON:<br />
<br />
<pre class="brush: python" title="Python Code:">request:
  method: POST
  url: /users
  body:
    json:
      name: "John Doe"
      email: "doe@example.com"</pre><span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">Data extraction and templating</span></span><br />
<br />
You can extract any fragment of a response and reuse it in subsequent requests through Jinja2 templates. Nested JSON fields are accessed with dot notation (for example, <code class="icode">user.info.name</code>), which keeps the syntax clean and concise. Consider this chain:<br />
<br />
<pre class="brush: python" title="Python Code:">steps:
  - name: user_creation
    request:
      method: POST
      url: /users
      body:
        json:
          name: "Alice"
    expect:
      status: 200
    extract:
      user_id: "response.id"  # Extracting the ID from the response

  - name: getting_user
    request:
      method: GET
      url: /users/{{ user_id }}   # Using the extracted ID
    expect:
      status: 200
      json:
        user.info.name: "Alice" # name verification, using dot notation</pre><span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">Parallel execution</span></span><br />
<br />
YATL runs tests in 10 threads by default (configurable via <code class="icode">--workers</code>). This significantly speeds up large test suites, such as regression runs.<br />
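<br />
A minimal sketch of this kind of parallelism with the standard library's thread pool follows; <code class="icode">run_suite</code> and the <code class="icode">run_test</code> callable are hypothetical stand-ins, not YATL internals. Threads suit this workload because HTTP tests spend most of their time waiting on the network, and CPython releases the GIL during blocking socket I/O.<br />
<br />
```python
from concurrent.futures import ThreadPoolExecutor


def run_suite(test_files, run_test, workers=10):
    """Run every test file on a pool of `workers` threads.

    `run_test` is a hypothetical callable that executes one .test.yaml file
    and returns its result; results come back in the order of `test_files`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_test, test_files))


# Example with a stand-in "runner" that just upper-cases the file name.
print(run_suite(["ping.test.yaml", "users.test.yaml"], str.upper, workers=5))
```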
<br />
<span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">Skipping tests and steps</span></span><br />
<br />
In real-world development, you often need to disable a test temporarily without deleting it. YATL supports this with a <code class="icode">skip</code> flag:<br />
<br />
<pre class="brush: python" title="Python Code:">name: test in development
skip: true  # the whole test will be skipped</pre>You can also skip an individual step:<br />
<br />
<pre class="brush: python" title="Python Code:">steps:
  - name: active_step
    request: ...

  - name: skipped_step
    skip: true  # Only this step is skipped
    request: ...</pre><span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">Data validation</span></span><br />
<br />
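Each <code class="icode">compare</code> rule in the following example names a dot-notation path into the response JSON plus one or more conditions. A Python sketch of how such rules could be evaluated is given first; the function names, the type table, and the full-match reading of <code class="icode">regex</code> are assumptions for illustration, not YATL's implementation.<br />
<br />
```python
import re

# Assumed mapping from YAML "type" names to Python types (only "array"
# appears in the example; the other two are guesses).
JSON_TYPES = {"array": list, "object": dict, "string": str}


def resolve_path(data, path):
    """Follow a dot-notation path such as "user.age" through nested JSON.

    Numeric segments index into lists, so "items.0" is the first item.
    """
    current = data
    for key in path.split("."):
        current = current[int(key)] if isinstance(current, list) else current[key]
    return current


def check_rule(data, rule):
    """Return True if every condition in one compare rule holds."""
    value = resolve_path(data, rule["path"])
    results = []
    if "gt" in rule:
        results.append(value > rule["gt"])
    if "min_length" in rule:
        results.append(len(value) >= rule["min_length"])
    if "regex" in rule:
        # Assumption: the pattern must match the whole value.
        results.append(re.fullmatch(rule["regex"], value) is not None)
    if "type" in rule:
        results.append(isinstance(value, JSON_TYPES[rule["type"]]))
    if rule.get("not_empty"):
        results.append(len(value) > 0)
    return all(results)
```

The YAML pattern uses a doubled backslash because YAML double-quoted strings treat the backslash as an escape character; in Python the same pattern is written <code class="icode">r".+@.+\..+"</code>.<br />
<br />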
<pre class="brush: python" title="Python Code:">name: Extended Validation Example
steps:
  - name: user_data_test
    request:
      method: GET
      url: /api/users/123
    expect:
      status: 200 
      validate:
        - compare: { path: "user.age", gt: 18 }         
        - compare: { path: "user.name", min_length: 3 }     
        - compare: { path: "user.email", regex: ".+@.+\\..+" } 
        - compare: { path: "items", type: "array", not_empty: true } </pre><span style="font-size: medium;" class="mycode_size"><span style="font-weight: bold;" class="mycode_b">CI/CD integration</span></span><br />
<br />
<pre class="brush: python" title="Python Code:">name: API Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.14'
      - run: pip install yatl
      - run: yatl tests/ --workers 5</pre>You can’t install the library yet (it hasn’t been published to PyPI, so the <code class="icode">pip install yatl</code> step above will only work after the release), but you can star the project on GitHub and bookmark it today: <a href="https://github.com/Khabib73/YATL" target="_blank" rel="noopener" class="mycode_url">https://github.com/Khabib73/YATL</a><br />
<br />
This is the best way to say “thank you” and to help us get the release out sooner. We plan to release the first stable version in 3–4 weeks.  <img src="https://python-forum.io/images/smilies/heart.png" alt="Heart" title="Heart" class="smilie smilie_16" /> <br />
<br />
YATL, on the other hand, offers a universal language for describing tests that everyone on the team can understand. An analyst can add a check for a new endpoint without involving a developer. A QA engineer can reuse variables and schemas. And a developer can quickly debug a failing test just by reading the YAML.]]></content:encoded>
		</item>
		<item>
			<title><![CDATA[Trying to use pyinstaller results in a syntax error]]></title>
			<link>https://python-forum.io/thread-46264.html</link>
			<pubDate>Wed, 15 Apr 2026 02:29:16 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=43817">mark1969</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46264.html</guid>
			<description><![CDATA[I managed to install PyInstaller, although the installer printed this warning:<br />
<br />
---<br />
<pre><code class="codeblock output"><div class="title">Output:</div>&gt; WARNING: The scripts pyi-archive_viewer.exe, pyi-bindepend.exe, pyi-grab_version.exe, pyi-makespec.exe, pyi-set_version.exe and pyinstaller.exe are installed in 'C:\\Users\\maben\\AppData\\Local\\Programs\\Python\\Python312\\Scripts' which is not on PATH.
&gt; 
&gt; Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.</code></pre>---<br />
<br />
I've tried playing around with PATH (including adding the path outlined in the warning above) but haven't had much success.<br />
<br />
In a .py file I'm trying to run:<br />
<br />
-----<br />
<pre class="brush: python" title="Python Code:">import pyinstaller

pyinstaller "C:\Python\w3\factorial_play.py"</pre>-----<br />
<br />
But I always get an 'invalid syntax' error, highlighted at the first double quote. I previously tried putting in just the file name, but then the error was highlighted at the 'f' of 'factorial_play.py'.]]></description>
			<content:encoded><![CDATA[I managed to install PyInstaller, although the installer printed this warning:<br />
<br />
---<br />
<pre><code class="codeblock output"><div class="title">Output:</div>&gt; WARNING: The scripts pyi-archive_viewer.exe, pyi-bindepend.exe, pyi-grab_version.exe, pyi-makespec.exe, pyi-set_version.exe and pyinstaller.exe are installed in 'C:\\Users\\maben\\AppData\\Local\\Programs\\Python\\Python312\\Scripts' which is not on PATH.
&gt; 
&gt; Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.</code></pre>---<br />
<br />
I've tried playing around with PATH (including adding the path outlined in the warning above) but haven't had much success.<br />
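<br />
Since the warning is only about the Scripts folder not being on PATH, one way to sidestep PATH entirely is to start PyInstaller as a module through the interpreter itself, the equivalent of typing <code class="icode">python -m PyInstaller factorial_play.py</code> in a terminal (PyInstaller is a command-line tool, so that line is a shell command, not Python syntax). The sketch below wraps this in Python; it assumes PyInstaller was installed into the same interpreter that runs the script, and the helper names are hypothetical.<br />
<br />
```python
import subprocess
import sys
from pathlib import Path


def pyinstaller_command(script):
    """Build the command that runs PyInstaller with the current interpreter.

    `python -m PyInstaller` locates the package through the interpreter,
    so the Scripts folder does not need to be on PATH.
    """
    return [sys.executable, "-m", "PyInstaller", str(Path(script))]


def build_executable(script):
    """Run PyInstaller in a child process; raises CalledProcessError on failure."""
    subprocess.run(pyinstaller_command(script), check=True)
```

For example, <code class="icode">build_executable(r"C:\Python\w3\factorial_play.py")</code>. The <code class="icode">r</code> prefix matters: in a normal string literal, the <code class="icode">\f</code> at the start of <code class="icode">\factorial_play.py</code> is read as a form-feed escape.<br />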
<br />
In a .py file I'm trying to run:<br />
<br />
-----<br />
<pre class="brush: python" title="Python Code:">import pyinstaller

pyinstaller "C:\Python\w3\factorial_play.py"</pre>-----<br />
<br />
But I always get an 'invalid syntax' error, highlighted at the first double quote. I previously tried putting in just the file name, but then the error was highlighted at the 'f' of 'factorial_play.py'.]]></content:encoded>
		</item>
		<item>
			<title><![CDATA[Passing Openssl to Python 2.7]]></title>
			<link>https://python-forum.io/thread-46263.html</link>
			<pubDate>Tue, 14 Apr 2026 14:42:53 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=43815">kromak</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46263.html</guid>
			<description><![CDATA[Hi. I have been trying to build a program (Pale Moon) that requires Python 2.7.<br />
However, the build fails right at the start. Here are some of the first error messages:<br />
<br />
<br />
<pre><code class="codeblock error"><div class="title">Error:</div>ERROR:root:code for hash sha224 was not found.
Traceback (most recent call last):
  File "/arquivos/Python-2.7.18/lib/python2.7/hashlib.py", line 147, in &lt;module&gt;
    globals()[__func_name] = __get_hash(__func_name)
  File "/arquivos/Python-2.7.18/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha224
ERROR:root:code for hash sha256 was not found.
Traceback (most recent call last):
  File "/arquivos/Python-2.7.18/lib/python2.7/hashlib.py", line 147, in &lt;module&gt;
    globals()[__func_name] = __get_hash(__func_name)
  File "/arquivos/Python-2.7.18/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha256
ERROR:root:code for hash sha384 was not found.</code></pre>This seems to be because Python was built without SSL support.<br />
So I would like to point the Python build at my OpenSSL installation, but I have not found a way to do so. The configure script has "--with-tcltk-includes" and "--with-tcltk-libs" options, but they do not seem to work: the check <blockquote class="mycode_quote"><cite>Quote:</cite>checking for t_open in -lnsl</blockquote>
 did not pass when I gave it the corresponding directories of my OpenSSL installation.]]></description>
			<content:encoded><![CDATA[Hi. I have been trying to build a program (Pale Moon) that requires Python 2.7.<br />
However, the build fails right at the start. Here are some of the first error messages:<br />
<br />
<br />
<pre><code class="codeblock error"><div class="title">Error:</div>ERROR:root:code for hash sha224 was not found.
Traceback (most recent call last):
  File "/arquivos/Python-2.7.18/lib/python2.7/hashlib.py", line 147, in &lt;module&gt;
    globals()[__func_name] = __get_hash(__func_name)
  File "/arquivos/Python-2.7.18/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha224
ERROR:root:code for hash sha256 was not found.
Traceback (most recent call last):
  File "/arquivos/Python-2.7.18/lib/python2.7/hashlib.py", line 147, in &lt;module&gt;
    globals()[__func_name] = __get_hash(__func_name)
  File "/arquivos/Python-2.7.18/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type sha256
ERROR:root:code for hash sha384 was not found.</code></pre>This seems to be because Python was built without SSL support.<br />
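<br />
To check that diagnosis, the freshly built interpreter can be asked directly which hash algorithms and which OpenSSL it has. The script below is a sketch that uses only the standard <code class="icode">hashlib</code> and <code class="icode">ssl</code> modules; it avoids f-strings so the same file runs under Python 2.7 and 3, and the helper names are hypothetical.<br />
<br />
```python
from __future__ import print_function

import hashlib


def missing_hashes(names=("sha224", "sha256", "sha384", "sha512")):
    """Return the hash names this interpreter cannot construct.

    hashlib.new raises ValueError ("unsupported hash type ...") for a name
    it has no implementation for, the same error as in the log above.
    """
    missing = []
    for name in names:
        try:
            hashlib.new(name)
        except ValueError:
            missing.append(name)
    return missing


def openssl_version():
    """Return the OpenSSL version string the ssl module reports, or None if ssl is absent."""
    try:
        import ssl
    except ImportError:
        return None
    return ssl.OPENSSL_VERSION


print("missing hashes:", missing_hashes())
print("ssl module:", openssl_version())
```

On a working build the first list is empty; on the build from the log above, the SHA-2 names would be expected to appear in it.<br />
<br />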
So I would like to point the Python build at my OpenSSL installation, but I have not found a way to do so. The configure script has "--with-tcltk-includes" and "--with-tcltk-libs" options, but they do not seem to work: the check <blockquote class="mycode_quote"><cite>Quote:</cite>checking for t_open in -lnsl</blockquote>
 did not pass when I gave it the corresponding directories of my OpenSSL installation.]]></content:encoded>
		</item>
		<item>
			<title><![CDATA[assign value of index to variable]]></title>
			<link>https://python-forum.io/thread-46261.html</link>
			<pubDate>Mon, 13 Apr 2026 11:52:01 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=43812">zoor29</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46261.html</guid>
			<description><![CDATA[Hi all,<br />
<br />
I'm new to Python, and I've stumbled upon a problem I find difficult to grasp.<br />
<br />
The code below searches for the index number, stored in "index_y1", of the last value in seB10 that is below 15000 (the value of G_SP).<br />
<br />
<pre class="brush: python" title="Python Code:">print(seB10)
print(type(seB10))</pre>gives the output:<br />
<pre><code class="codeblock output"><div class="title">Output:</div>[5.07730000e+01 1.19360515e+02 3.17627965e+02 9.71699674e+02
 3.47047540e+03 1.46524911e+04 7.31078184e+04 4.21952327e+05
 2.76727749e+06 1.41841611e+07]
&lt;class 'numpy.ndarray'&gt;</code></pre><pre class="brush: python" title="Python Code:">print(G_SP)
print(type(G_SP))</pre>gives the output:<br />
<br />
<pre><code class="codeblock output"><div class="title">Output:</div>15000
&lt;class 'int'&gt;</code></pre>I've searched online and tried the max() and min() functions without success. Then I stumbled upon this solution, which did the trick:<br />
<br />
<pre class="brush: python" title="Python Code:">index_y1 = np.where(seB10&lt;abs(G_SP))

index_y1 = sorted(index_y1) #Closest below 15
index_y1 = index_y1[-1]
print(index_y1[-1])
print(index_y1)
print(type(index_y1[-1]))
print(type(index_y1))</pre>This leads to the output:<br />
<br />
<pre><code class="codeblock output"><div class="title">Output:</div>5
[0 1 2 3 4 5]
&lt;class 'numpy.int64'&gt;
&lt;class 'numpy.ndarray'&gt;</code></pre>However, I can't seem to assign the resulting index value to a variable.<br />
<br />
Why doesn't this work:<br />
<br />
<blockquote class="mycode_quote"><cite>Quote:</cite>index_y1 = index_y1[-1]</blockquote>
 ?<br />
<br />
And why does this code<br />
<br />
<pre class="brush: python" title="Python Code:">index_y1=max(index_y1)
print(index_y1)
print(type(index_y1))</pre>give this output:<br />
<br />
<pre><code class="codeblock output"><div class="title">Output:</div>[0 1 2 3 4 5]
&lt;class 'numpy.ndarray'&gt;</code></pre>Why does it not just give me the answer "5"?<br />
]]></description>
			<content:encoded><![CDATA[Hi all,<br />
<br />
I'm new to Python, and I've stumbled upon a problem I find difficult to grasp.<br />
<br />
The code below searches for the index number, stored in "index_y1", of the last value in seB10 that is below 15000 (the value of G_SP).<br />
<br />
<pre class="brush: python" title="Python Code:">print(seB10)
print(type(seB10))</pre>gives the output:<br />
<pre><code class="codeblock output"><div class="title">Output:</div>[5.07730000e+01 1.19360515e+02 3.17627965e+02 9.71699674e+02
 3.47047540e+03 1.46524911e+04 7.31078184e+04 4.21952327e+05
 2.76727749e+06 1.41841611e+07]
&lt;class 'numpy.ndarray'&gt;</code></pre><pre class="brush: python" title="Python Code:">print(G_SP)
print(type(G_SP))</pre>gives the output:<br />
<br />
<pre><code class="codeblock output"><div class="title">Output:</div>15000
&lt;class 'int'&gt;</code></pre>I've searched online and tried the max() and min() functions without success. Then I stumbled upon this solution, which did the trick:<br />
<br />
<pre class="brush: python" title="Python Code:">index_y1 = np.where(seB10&lt;abs(G_SP))

index_y1 = sorted(index_y1) #Closest below 15
index_y1 = index_y1[-1]
print(index_y1[-1])
print(index_y1)
print(type(index_y1[-1]))
print(type(index_y1))</pre>This leads to the output:<br />
<br />
<pre><code class="codeblock output"><div class="title">Output:</div>5
[0 1 2 3 4 5]
&lt;class 'numpy.int64'&gt;
&lt;class 'numpy.ndarray'&gt;</code></pre>However, I can't seem to assign the resulting index value to a variable.<br />
<br />
Why doesn't this work:<br />
<br />
<blockquote class="mycode_quote"><cite>Quote:</cite>index_y1 = index_y1[-1]</blockquote>
 ?<br />
<br />
And why does this code<br />
<br />
<pre class="brush: python" title="Python Code:">index_y1=max(index_y1)
print(index_y1)
print(type(index_y1))</pre>give this output:<br />
<br />
<pre><code class="codeblock output"><div class="title">Output:</div>[0 1 2 3 4 5]
&lt;class 'numpy.ndarray'&gt;</code></pre>Why does it not just give me the answer "5"?<br />
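<br />
The behaviour above comes from what <code class="icode">np.where</code> returns when it is given only a condition: a tuple holding one index array per dimension, so for a 1-D array a one-element tuple. <code class="icode">sorted()</code> on that tuple gives a one-element list, and <code class="icode">max()</code> of a one-element tuple simply returns its single element, i.e. the whole index array. A minimal sketch using the values printed above (the names <code class="icode">below</code>, <code class="icode">indices</code> and <code class="icode">last_index</code> are illustrative, and <code class="icode">np.less</code> is the element-wise less-than comparison):<br />
<br />
```python
import numpy as np

# The values printed in the question
seB10 = np.array([5.07730000e+01, 1.19360515e+02, 3.17627965e+02, 9.71699674e+02,
                  3.47047540e+03, 1.46524911e+04, 7.31078184e+04, 4.21952327e+05,
                  2.76727749e+06, 1.41841611e+07])
G_SP = 15000

# Boolean mask: True where the value is below abs(G_SP).
below = np.less(seB10, abs(G_SP))

# With only a condition, np.where returns a tuple holding one index array per
# dimension; for a 1-D array that is a one-element tuple.
where_result = np.where(below)
print(type(where_result), len(where_result))

# max() of a one-element tuple just returns that element: the whole array.
print(max(where_result))

# Unpack the single index array first, then take its last entry.
(indices,) = where_result
last_index = int(indices[-1])
print(last_index)

# np.flatnonzero returns the same indices directly as a flat array.
print(int(np.flatnonzero(below)[-1]))
```

Note that <code class="icode">indices[-1]</code> raises an IndexError if no value is below the limit, so that case may need a check.<br />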
]]></content:encoded>
		</item>
		<item>
			<title><![CDATA[Help with local RAG pipeline – poor retrieval quality, wrong page numbers]]></title>
			<link>https://python-forum.io/thread-46260.html</link>
			<pubDate>Sun, 12 Apr 2026 17:29:34 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=38022">IchNar</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46260.html</guid>
			<description><![CDATA[Hi everyone,<br />
<br />
I'm building a fully local RAG application in Python (no cloud APIs) and running into several persistent issues. I've linked the full source below. I would really appreciate advice from anyone who has dealt with a similar setup.<br />
<br />
---<br />
<br />
### Stack overview<br />
<br />
- **LLM:** Qwen2.5:7b via Ollama<br />
<br />
- **Embeddings:** <code class="icode">intfloat/multilingual-e5-base</code> (HuggingFace, offline)<br />
<br />
- **Vector store:** FAISS (child chunks) + BM25 (via LangChain)<br />
<br />
- **Reranker:** <code class="icode">cross-encoder/mmarco-mMiniLMv2-L12-H384-v1</code><br />
<br />
- **Chunking:** Parent-child strategy – MarkdownHeaderTextSplitter for parents, RecursiveCharacterTextSplitter for children<br />
<br />
- **PDF extraction:** pymupdf4llm (fast) or MinerU (slow, for LaTeX-heavy docs)<br />
<br />
- **Pipeline:** LangGraph with nodes: pre-retrieval → hybrid retrieve → rerank → build context → evaluate evidence → generate<br />
<br />
- **UI:** Streamlit<br />
<br />
Documents are primarily English-language academic PDFs (e.g. Montgomery's Design and Analysis of Experiments, 720 pages). User queries are always in Slovak.<br />
<br />
---<br />
<br />
### Problem 1 – Cross-lingual retrieval failure (SK query → EN document)<br />
<br />
This is the most painful issue. When a user asks *"čo to je replikácia?"* ("what is replication?"), the FAISS similarity search returns completely irrelevant chunks (confidence ~0.045) even though the word "replication" appears many times in the document.<br />
<br />
My current workaround:<br />
<br />
1. Detect the document language via <code class="icode">langdetect</code>.<br />
<br />
2. If an English document is detected, translate the Slovak query to English with the LLM before retrieval.<br />
<br />
3. Use the translated query in both FAISS and BM25.<br />
<br />
This partially works but is inconsistent: sometimes the LLM translates the query to "What is replication?" and sometimes it doesn't, so results are non-deterministic even at temperature=0.<br />
<br />
I also added a rescue BM25 search in <code class="icode">evaluate_evidence</code> as a last resort, which helps but retrieves chunks from wrong pages (e.g. page 424 instead of page 13 where the definition actually is).<br />
<br />
**Questions:**<br />
<br />
- Is <code class="icode">multilingual-e5-base</code> simply too weak for SK↔EN cross-lingual retrieval? Should I switch to a different model (e.g. <code class="icode">intfloat/multilingual-e5-large</code>, <code class="icode">BAAI/bge-m3</code>, or a dedicated cross-lingual model)?<br />
<br />
- Is there a better approach than LLM-based query translation? I considered expanding the index with translated chunks but haven't implemented it yet.<br />
<br />
- Any experience with <code class="icode">mmarco-mMiniLMv2</code> reranker for non-English content? I suspect it's poorly calibrated for Slovak and the confidence scores are systematically too low (~0.04 instead of expected ~0.3+).<br />
<br />
---<br />
<br />
### Problem 2 – Wrong page numbers in cited sources<br />
<br />
My chunker injects <code class="icode">&lt;!--PAGE:N--&gt;</code> markers into the markdown before chunking, then detects which page each chunk belongs to by matching text probes against page texts. The logic works reasonably for single-page chunks but breaks in two cases:<br />
<br />
**Large parents spanning multiple pages** – when <code class="icode">_split_large</code> splits them, all resulting chunks inherit the original parent's page metadata instead of getting re-detected page numbers.<br />
<br />
**Dense mathematical/formula-heavy pages** – probes (min 15 chars) often don't match because MinerU reformats LaTeX and the text doesn't align with the original page content.<br />
<br />
The cited pages are sometimes off by 5–15 pages, which makes source verification impossible.<br />
<br />
**Questions:**<br />
<br />
- Is there a more reliable strategy for page attribution in RAG chunking?<br />
<br />
- Would embedding page number tokens directly into chunk text help BM25/FAISS associate chunks with correct pages?<br />
<br />
---<br />
<br />
### Problem 3 – Poor Slovak output quality<br />
<br />
The LLM (Qwen2.5:7b) receives English context and is instructed via system prompt to answer in Slovak. The output Slovak is grammatically broken – literal word-by-word translations, wrong declensions, invented compound words (e.g. "olejová hniloba" for "oil quench", "oholenie vzorku" for "quenching a specimen").<br />
<br />
Current system prompt instructs:<br />
<br />
- Always answer in Slovak<br />
<br />
- Don't translate literally, explain in your own words<br />
<br />
- Keep English technical terms in parentheses if unsure<br />
<br />
This helps somewhat but the quality is still poor for technical content.<br />
<br />
**Questions:**<br />
<br />
- Is Qwen2.5:7b simply not good enough for EN→SK technical translation in context? Would a larger model (Qwen2.5:14b, gemma3:12b) make a significant difference?<br />
<br />
- Has anyone tried a two-step approach: generate answer in English first, then translate to Slovak as a second LLM call?<br />
<br />
- Any prompt engineering tricks that worked for you for multilingual RAG output?<br />
<br />
---<br />
<br />
### Problem 4 – Reranker confidence threshold causes false abstentions<br />
<br />
The cross-encoder produces confidence scores around 0.04–0.07 for relevant Slovak/English pairs. My threshold is set to 0.15 (already lowered from original 0.32). At confidence below threshold, the system returns "not found in documents" even when the correct answer is there.<br />
<br />
I added a keyword override (check if query words appear in context docs) but it's unreliable for cross-lingual queries because Slovak words don't match English document text.<br />
<br />
### Code<br />
<br />
*(links below)*<br />
<br />
- <code class="icode">document_processor.py</code> – PDF extraction + parent-child chunking: <a href="https://pastebin.com/m8egQ7HY" target="_blank" rel="noopener" class="mycode_url">https://pastebin.com/m8egQ7HY</a><br />
<br />
- <code class="icode">vector_store.py</code> – FAISS + BM25 + E5Embeddings wrapper: <a href="https://pastebin.com/4kkhsg8M" target="_blank" rel="noopener" class="mycode_url">https://pastebin.com/4kkhsg8M</a><br />
<br />
- <code class="icode">rag_graph.py</code> – full LangGraph pipeline: <a href="https://pastebin.com/P31pGiie" target="_blank" rel="noopener" class="mycode_url">https://pastebin.com/P31pGiie</a><br />
<br />
- <code class="icode">parent_store.py</code> – <a href="https://pastebin.com/xwNeAMnE" target="_blank" rel="noopener" class="mycode_url">https://pastebin.com/xwNeAMnE</a>]]></description>
			<content:encoded><![CDATA[Hi everyone,<br />
<br />
I'm building a fully local RAG application in Python (no cloud APIs) and running into several persistent issues. I've linked the full source below. I would really appreciate advice from anyone who has dealt with a similar setup.<br />
<br />
---<br />
<br />
### Stack overview<br />
<br />
- **LLM:** Qwen2.5:7b via Ollama<br />
<br />
- **Embeddings:** <code class="icode">intfloat/multilingual-e5-base</code> (HuggingFace, offline)<br />
<br />
- **Vector store:** FAISS (child chunks) + BM25 (via LangChain)<br />
<br />
- **Reranker:** <code class="icode">cross-encoder/mmarco-mMiniLMv2-L12-H384-v1</code><br />
<br />
- **Chunking:** Parent-child strategy – MarkdownHeaderTextSplitter for parents, RecursiveCharacterTextSplitter for children<br />
<br />
- **PDF extraction:** pymupdf4llm (fast) or MinerU (slow, for LaTeX-heavy docs)<br />
<br />
- **Pipeline:** LangGraph with nodes: pre-retrieval → hybrid retrieve → rerank → build context → evaluate evidence → generate<br />
<br />
- **UI:** Streamlit<br />
<br />
Documents are primarily English-language academic PDFs (e.g. Montgomery's Design and Analysis of Experiments, 720 pages). User queries are always in Slovak.<br />
<br />
---<br />
<br />
### Problem 1 – Cross-lingual retrieval failure (SK query → EN document)<br />
<br />
This is the most painful issue. When a user asks *"čo to je replikácia?"* ("what is replication?"), the FAISS similarity search returns completely irrelevant chunks (confidence ~0.045) even though the word "replication" appears many times in the document.<br />
<br />
My current workaround:<br />
<br />
1. Detect the document language via <code class="icode">langdetect</code>.<br />
<br />
2. If an English document is detected, translate the Slovak query to English with the LLM before retrieval.<br />
<br />
3. Use the translated query in both FAISS and BM25.<br />
<br />
This partially works but is inconsistent: sometimes the LLM translates the query to "What is replication?" and sometimes it doesn't, so results are non-deterministic even at temperature=0.<br />
<br />
I also added a rescue BM25 search in <code class="icode">evaluate_evidence</code> as a last resort, which helps but retrieves chunks from wrong pages (e.g. page 424 instead of page 13 where the definition actually is).<br />
<br />
**Questions:**<br />
<br />
- Is <code class="icode">multilingual-e5-base</code> simply too weak for SK↔EN cross-lingual retrieval? Should I switch to a different model (e.g. <code class="icode">intfloat/multilingual-e5-large</code>, <code class="icode">BAAI/bge-m3</code>, or a dedicated cross-lingual model)?<br />
<br />
- Is there a better approach than LLM-based query translation? I considered expanding the index with translated chunks but haven't implemented it yet.<br />
<br />
- Any experience with <code class="icode">mmarco-mMiniLMv2</code> reranker for non-English content? I suspect it's poorly calibrated for Slovak and the confidence scores are systematically too low (~0.04 instead of expected ~0.3+).<br />
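<br />
One thing worth ruling out before switching models: the <code class="icode">intfloat/multilingual-e5</code> model cards ask for every input to start with <code class="icode">query: </code> or <code class="icode">passage: </code>, even for non-English text, and warn that omitting the prefixes degrades results. If the E5Embeddings wrapper does not already add them, a minimal sketch of the convention follows. The helper names are hypothetical, and the ranking helper only illustrates that cosine similarity is a dot product of L2-normalised vectors, which is what an inner-product FAISS index assumes.<br />
<br />
```python
import numpy as np


def e5_inputs(texts, kind):
    """Add the prefix E5 models expect: 'query' for questions, 'passage' for indexed chunks."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [kind + ": " + text.strip() for text in texts]


def rank_by_cosine(query_vec, doc_vecs, k=5):
    """Return (index, cosine similarity) pairs for the k most similar documents."""
    query = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_vecs, dtype=float)
    # After L2 normalisation, the dot product equals cosine similarity.
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    sims = docs @ query
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]


# The Slovak question gets the query prefix; English chunks get the passage prefix.
print(e5_inputs(["čo to je replikácia?"], "query"))
```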
<br />
---<br />
<br />
### Problem 2 – Wrong page numbers in cited sources<br />
<br />
My chunker injects <code class="icode">&lt;!--PAGE:N--&gt;</code> markers into the markdown before chunking, then detects which page each chunk belongs to by matching text probes against page texts. The logic works reasonably for single-page chunks but breaks in two cases:<br />
<br />
**Large parents spanning multiple pages** – when <code class="icode">_split_large</code> splits them, all resulting chunks inherit the original parent's page metadata instead of getting re-detected page numbers.<br />
<br />
**Dense mathematical/formula-heavy pages** – probes (min 15 chars) often don't match because MinerU reformats LaTeX and the text doesn't align with the original page content.<br />
<br />
The cited pages are sometimes off by 5–15 pages, which makes source verification impossible.<br />
<br />
**Questions:**<br />
<br />
- Is there a more reliable strategy for page attribution in RAG chunking?<br />
<br />
- Would embedding page number tokens directly into chunk text help BM25/FAISS associate chunks with correct pages?<br />
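<br />
The first case (splitting large parents) could be handled by attributing pages by character offset instead of text probes: record where each page marker sits in the markdown, know where each chunk starts and ends in that same string, and look the span up with a binary search. The sketch below assumes chunk offsets are available; LangChain text splitters can store a <code class="icode">start_index</code> in each chunk's metadata when created with <code class="icode">add_start_index=True</code>, but that offset is relative to the text that was split, so the parent's own offset would need to be added. Because it returns every page a span touches, a multi-page chunk can cite a page range instead of inheriting a single number. The function names are hypothetical.<br />
<br />
```python
import bisect
import re

# Page markers injected before chunking; \x3c and \x3e are the two
# angle-bracket characters, written as regex escapes.
PAGE_MARKER = re.compile(r"\x3c!--PAGE:(\d+)--\x3e")


def build_page_index(markdown):
    """Return two parallel lists: the offset of each page marker and its page number."""
    offsets, pages = [], []
    for match in PAGE_MARKER.finditer(markdown):
        offsets.append(match.start())
        pages.append(int(match.group(1)))
    return offsets, pages


def pages_for_span(offsets, pages, start, end):
    """Return the page numbers covered by the chunk markdown[start:end].

    Text before the first marker is attributed to the first page.
    """
    if not offsets:
        return []
    # bisect_right(...) - 1 is the last marker at or before the position.
    first = max(bisect.bisect_right(offsets, start) - 1, 0)
    last = max(bisect.bisect_right(offsets, max(end - 1, start)) - 1, 0)
    return pages[first:last + 1]
```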
<br />
---<br />
<br />
### Problem 3 – Poor Slovak output quality<br />
<br />
The LLM (Qwen2.5:7b) receives English context and is instructed via system prompt to answer in Slovak. The output Slovak is grammatically broken – literal word-by-word translations, wrong declensions, invented compound words (e.g. "olejová hniloba" for "oil quench", "oholenie vzorku" for "quenching a specimen").<br />
<br />
Current system prompt instructs:<br />
<br />
- Always answer in Slovak<br />
<br />
- Don't translate literally, explain in your own words<br />
<br />
- Keep English technical terms in parentheses if unsure<br />
<br />
This helps somewhat but the quality is still poor for technical content.<br />
<br />
**Questions:**<br />
<br />
- Is Qwen2.5:7b simply not good enough for EN→SK technical translation in context? Would a larger model (Qwen2.5:14b, gemma3:12b) make a significant difference?<br />
<br />
- Has anyone tried a two-step approach: generate answer in English first, then translate to Slovak as a second LLM call?<br />
<br />
- Any prompt engineering tricks that worked for you for multilingual RAG output?<br />
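<br />
On the two-step question, a minimal sketch of that pipeline follows, assuming <code class="icode">llm</code> is any callable that sends a prompt to the local model and returns its text (for example a thin wrapper around the Ollama call; the wrapper and the prompt wording are hypothetical). The first call answers in English from the English context, so the model works in the language of the sources; the second call only has to translate, which is a narrower task and a natural place to pin terminology.<br />
<br />
```python
def two_step_answer(question, context_en, llm):
    """Answer from English context in English, then translate the answer to Slovak.

    `llm` is a hypothetical callable: prompt string in, model text out.
    """
    answer_prompt = (
        "Answer the question using only the context below. "
        "Write the answer in English.\n\n"
        "Context:\n" + context_en + "\n\n"
        "Question: " + question
    )
    answer_en = llm(answer_prompt)

    translate_prompt = (
        "Translate the following answer into natural, grammatical Slovak. "
        "Do not translate word by word. After each technical term, keep the "
        "original English term in parentheses.\n\n" + answer_en
    )
    return llm(translate_prompt)
```

If the query has already been translated for retrieval (Problem 1), passing the English query as <code class="icode">question</code> keeps the first call entirely in English.<br />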
<br />
---<br />
<br />
### Problem 4 – Reranker confidence threshold causes false abstentions<br />
<br />
The cross-encoder produces confidence scores around 0.04–0.07 for relevant Slovak/English pairs. My threshold is set to 0.15 (already lowered from original 0.32). At confidence below threshold, the system returns "not found in documents" even when the correct answer is there.<br />
<br />
I added a keyword override (check if query words appear in context docs) but it's unreliable for cross-lingual queries because Slovak words don't match English document text.<br />
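<br />
Rather than lowering the threshold by hand (0.32, then 0.15), one option is to pick it from data: score a set of Slovak question / English chunk pairs you have labelled as relevant or irrelevant, then choose the cut-off that separates the two groups best. The sketch below maximises balanced accuracy (the mean of the hit rates on both groups); the function name and the criterion are illustrative choices, not something the reranker library provides.<br />
<br />
```python
def pick_threshold(relevant_scores, irrelevant_scores):
    """Return the score cut-off with the best balanced accuracy.

    A pair counts as "answerable" when its score is at or above the cut-off.
    Ties keep the lowest such cut-off.
    """
    best_threshold, best_quality = None, -1.0
    for threshold in sorted(set(relevant_scores) | set(irrelevant_scores)):
        hits = sum(score >= threshold for score in relevant_scores)
        false_alarms = sum(score >= threshold for score in irrelevant_scores)
        recall_relevant = hits / len(relevant_scores)
        recall_irrelevant = 1 - false_alarms / len(irrelevant_scores)
        quality = (recall_relevant + recall_irrelevant) / 2
        if quality > best_quality:
            best_threshold, best_quality = threshold, quality
    return best_threshold
```

When the two score distributions overlap, no cut-off is perfect; the criterion can be swapped for one that penalises false abstentions more heavily.<br />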
<br />
### Code<br />
<br />
*(links below)*<br />
<br />
- <code class="icode">document_processor.py</code> – PDF extraction + parent-child chunking: <a href="https://pastebin.com/m8egQ7HY" target="_blank" rel="noopener" class="mycode_url">https://pastebin.com/m8egQ7HY</a><br />
<br />
- <code class="icode">vector_store.py</code> – FAISS + BM25 + E5Embeddings wrapper: <a href="https://pastebin.com/4kkhsg8M" target="_blank" rel="noopener" class="mycode_url">https://pastebin.com/4kkhsg8M</a><br />
<br />
- <code class="icode">rag_graph.py</code> – full LangGraph pipeline: <a href="https://pastebin.com/P31pGiie" target="_blank" rel="noopener" class="mycode_url">https://pastebin.com/P31pGiie</a><br />
<br />
- <code class="icode">parent_store.py</code> – <a href="https://pastebin.com/xwNeAMnE" target="_blank" rel="noopener" class="mycode_url">https://pastebin.com/xwNeAMnE</a>]]></content:encoded>
		</item>
		<item>
			<title><![CDATA[What do you think about this code]]></title>
			<link>https://python-forum.io/thread-46258.html</link>
			<pubDate>Fri, 10 Apr 2026 03:51:43 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=32330">kucingkembar</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46258.html</guid>
			<description><![CDATA[What do you think about this code:<br />
<pre class="brush: python" title="Python Code:">TheInput = input("Please Input the input : ")
#importing tons of stuff that takes 30+ seconds to load
#the code</pre>]]></description>
			<content:encoded><![CDATA[What do you think about this code:<br />
<pre class="brush: python" title="Python Code:">TheInput = input("Please Input the input : ")
#importing tons of stuff that takes 30+ seconds to load
#the code</pre>]]></content:encoded>
		</item>
		<item>
			<title><![CDATA[Django: variable translation]]></title>
			<link>https://python-forum.io/thread-46257.html</link>
			<pubDate>Thu, 09 Apr 2026 15:44:19 +0000</pubDate>
			<dc:creator><![CDATA[<a href="https://python-forum.io/member.php?action=profile&uid=39488">aecordoba</a>]]></dc:creator>
			<guid isPermaLink="false">https://python-forum.io/thread-46257.html</guid>
			<description><![CDATA[I'm using Django to develop a project. I can internationalize the templates easily.<br />
<br />
But I also need to translate a field that holds a person's gender. The gender is stored in the database, so I display it in the template as {{person.gender}}.<br />
<br />
How can I translate the gender value in the template so that, if I get 'male' from the database, the template displays 'Hombre' for Spanish browsers?]]></description>
			<content:encoded><![CDATA[I'm using Django to develop a project. I can internationalize the templates easily.<br />
<br />
But I also need to translate a field that holds a person's gender. The gender is stored in the database, so I display it in the template as {{person.gender}}.<br />
<br />
How can I translate the gender value in the template so that, if I get 'male' from the database, the template displays 'Hombre' for Spanish browsers?]]></content:encoded>
		</item>
	</channel>
</rss>