4
\$\begingroup\$

I'm new to Python, so I've been watching YouTube videos of Python projects for practice. One of them was a password manager with encryption, but the encryption part couldn't take a password, because the module used (cryptography.fernet) would need additional modules to do that. So I tried writing a function for this myself, and ended up with the function below.

It basically works, but I'd like advice about modules I could use to get the same or similar output without needing other modules to work properly.

I'd also like to know which functions in my project I could change to make it faster, since I don't know many of them.

I'd like to know another way to store and encrypt the strings, since lists are bigger and every character of the message turns into at least three times as much data.

And, if you can, I need help getting the commented-out part of the code to work:

Everything except this first "characters scrambler" works fine, and I can't work out which function is wrong in that part: it doesn't raise an exception, and with the right password the output comes out almost as it should, but still scrambled. You can see this if you uncomment the commented-out code; I commented it out so the code still works when you try it.

import random
def coddecod (senha: str, texto: str | list, modo = 'd', debug = False) -> list|str:
    '''String <=> List\n|
    debug (True / False) is an option that show more information about the process if marked as True\n
    texto (String / List) is the text to be Encrypted or Decrypted.\n
    modo ( 'c' / 'd' ) is how the function will be used, it can be 'c' (Encrypt) or 'd' (Decrypt).\n
    senha (String) is the password to be used to Encrypt/Decrypt (Needs to be the same on both sides).\n
    note: Not every keyboard key was added to this function, such "\" or "/" or "§"
    This function creates an list (like string) with some caracters of the password, then a code to the password with the list made, which is allways the same for the same password.\n
    Then checks to the mode:\n
        If 'c', Encrypts:\n
            It, randomlly, makes an string with 85 caracters, making 1 in 120 options, making to it an number (data) to be identfied.\n
            Then data * password-code to be "hiden"\n
            Adds the result to the text and Encrypts the text.\n
        If 'd', Decrypts:\n
            It re-makes the caracters-string used to Encrypt using the identifier number.\n
            And Decrypt using the password code and the string.\n
            If the password is REALLY wrong:\n
                It makes an random text.\n'''
    if debug == True:
        print(f"Password used: {senha}\nText used: {texto}\nMode: {modo}\nDebug: True")
    lista = ["0bm1!d2Mafgh3TtijkcheH4lnEou5pqUsr67Svwx8yz9BCDFGHIJKLNOPQRVWXYZ! |@#$%&*()_+-=[]°ºª^~`", 0, "", [], "", [0, 0, 0, 0, 0]]
    step = 0
    for _ in range(5):
        lista[3].append(lista[0][0 + step: 17 + step])
        step += 17
    if debug == True:
        print(f"blocks of string to be used: {lista[3]}")
    #needs some fix, de-comment this to see.
    '''
    for idx, char in enumerate(senha): #this is the code i need help. look at the '#string scrambler' to see an better code than this here with same objective.
        for step in range(len(lista[3])):
            if lista[5][step] == 1:
                continue
            for idx_fromstep, char_fromstep in enumerate(lista[3][step]):
                if char == ch and idx_fromstep <= 4:
                    lista[2] += "".join(lista[3][step])
                    lista[5][step] = 1
        if idx == len(senha) - 1 and lista[5][0] + lista[5][1] + lista[5][2] + lista[5][3] + lista[5][4] != 5:
            for i, ch in enumerate(lista[5]):
                if ch == 1:
                    continue
                else:
                    lista[5][i] = 1
                    lista[2] += "".join(lista[3][i])
    lista[5] = [0, 0, 0, 0, 0]
    if debug == True:
        print(f"list of the string scrambled by the password: {lista[2]}")'''
    for idx, char in enumerate("0" + senha):
        for i2, c2 in enumerate(lista[0]): #change to lista[2] when code be fixed
            if char == c2:
                lista[1] += i2 * idx
    if lista[1] == 0:
        lista[1] += len(senha)
    if debug == True:
        print(f"code made from password: {lista[1]}")
    if modo == 'c':
        if debug == True:
            print("mode: Encrypting")
        lista[2] = ""
        senha = []
        lista[4] += str(random.randint(1, 9))
        while True: #string scrambler
            teste2 = random.randint(0, 4)
            if lista[5][teste2] == 1:
                continue
            lista[2] += "".join(lista[3][teste2])
            lista[4] += str(teste2)
            lista[5][teste2] = 1
            if lista[5][0] + lista[5][1] + lista[5][2] + lista[5][3] + lista[5][4] == 5:
                break
        data = int(lista[4]) * lista[1]
        if debug == True:
            print(f"Code of the scambled caracters: {lista[4]}\nList of scrambled caracters: {lista[2]}")
        senha.append(data)
        for _, char in enumerate(texto):
            for idx, c2 in enumerate(lista[2]):
                if char == c2:
                    senha.append(idx * lista[1])
        return senha
    if modo == 'd':
        if debug == True:
            print("mode: Decrypting")
        s = []
        data = str(texto[0] // lista[1])
        if debug == True:
            print(f"String code got from text: {data}")
        try:
            for idx, char in enumerate(data):
                if idx == 0:
                    continue
                lista[2] += "".join(lista[3][int(char)])
            if debug == True:
                print(f"list of scrambled caracters got from code: {lista[2]}")
            for i in range(len(texto) - 1):
                letra = texto[i + 1] // lista[1]
                s.append(lista[2][letra])
            senha = "".join(s)
            return senha
        except:
            if debug == True:
                print(f"Password was really wrong.")
            reallywrongpassword = ""
            for _ in range(random.randint(5, 80)):
                reallywrongpassword += random.choice(lista[0][:])
            return reallywrongpassword
test = coddecod("FeijaoTropeiro", "Elias", "c")
wrong = coddecod("54hvrvrwe4ij", test)
wrong2 = coddecod("Feijao", test, "d")
right = coddecod("FeijaoTropeiro", test, "d")
print(f"The text is: 'Elias'.\nThis is the message encrypted: {test}\nThese 2 are the outcomes with wrong passwords: '{wrong}' and '{wrong2}'.\nThis is the message with the right password: {right}.\nRun this code more times to see the encrypted message change.")
\$\endgroup\$

5 Answers

6
\$\begingroup\$

illegible source code

You may think that future maintainers of this code will be glad to read lines more than 180 characters long. Perhaps there are some humans who fit that description. But you are needlessly limiting the pool of future teammates. Write code so people can read it. If you've gone much past 100 characters per line, then it's time to rethink things.

triple quote

print(f"The text is: 'Elias'.\nThis is the message ...

Ok, that's just silly, when we go much more than two hundred characters beyond the left margin. Python has perfectly good facilities for representing such a sequence of codepoints while keeping them legible. Prefer

print(f"""The text is: 'Elias'.
This is the message ... """)

design of Public API

Naming a function "code de-code", and accepting a mode flag, is probably a mistake. At a minimum it prevents type checkers like mypy from verifying that the caller used the proper types for the intended mode.

Prefer to present a pair of functions in your Public API, without a modo flag.
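As a rough sketch of what such a pair could look like: the names encrypt, decrypt and _keystream are hypothetical, and the XOR body is only a toy placeholder to make the example runnable. It is not secure and is not the OP's algorithm. The point is the shape of the API: each function has one input type and one output type, so a type checker can verify each call.

```python
import hashlib
from itertools import cycle


def _keystream(password: str) -> bytes:
    # Hypothetical helper: turn the password into 32 repeatable bytes.
    return hashlib.sha256(password.encode("utf-8")).digest()


def encrypt(password: str, plaintext: str) -> list[int]:
    """Return the plaintext as a list of scrambled byte values (toy XOR, NOT secure)."""
    data = plaintext.encode("utf-8")
    return [b ^ k for b, k in zip(data, cycle(_keystream(password)))]


def decrypt(password: str, ciphertext: list[int]) -> str:
    """Invert encrypt(); a wrong password yields garbage rather than an error."""
    raw = bytes(c ^ k for c, k in zip(ciphertext, cycle(_keystream(password))))
    return raw.decode("utf-8", errors="replace")


token = encrypt("FeijaoTropeiro", "Elias")
print(token)
print(decrypt("FeijaoTropeiro", token))
```

With two functions there is no modo argument to get wrong, and passing a str where a list is expected shows up in static analysis instead of at run time.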

respect the signature order

You supplied a docstring. Thank you.

def coddecod (senha: str, texto: str | list, modo = 'd', debug = False) -> list|str:
    '''String <=> List\n|
    debug (True / False) is an option that show more information about the process if marked as True\n
    texto (String / List) is the text to be Encrypted or Decrypted.\n
    modo ( 'c' / 'd' ) is how the function will be used, it can be 'c' (Encrypt) or 'd' (Decrypt).\n
    senha (String) is the password to be used to Encrypt/Decrypt (Needs to be the same on both sides).\n

What are all those \n newlines doing in there? Ok, fine, we'll just ignore them.

Rather than offer explanations in the order {debug, texto, modo, senha}, please prefer the signature order of {senha, texto, modo, debug}. It's just how the human brain works. Maintainers anticipate seeing such things appear in the same order with parallel construction, so a consistent narrative will evolve. Scrambling the order gratuitously throws a monkey wrench into that for no gain.

typo

For "caracters", read "characters". Yes, I understand the Portuguese word "caracteres" lacks "h". It's worth getting used to, if only to make sense of common abbreviations like "char" and "ch".

Also, after "Then checks to the mode" I failed to glean much of use. The signature already told me that we're going to encode and decode. But as far as the details go, all the docstring really told me was that I'd have to read the source to tell exactly what happens.

keying material

Whether you call it senha or contrasena or mot_de_passe or wachtwoord, I don't much care.

What does concern me is that you've not written down any assumptions about how many bits of entropy it should contain. As written it appears the Concept of Operations is for a person to type an "easily remembered" password or passphrase, which likely has fairly low entropy.

The whole point of crypto is to make it "hard" for Eve to recover the plaintext, and here you're not giving any hints about what "hard" means. Must she do \$2^{256}\$ work? Must she do "trivial" work, as for rot13?

During a Code Review we answer these questions:

  1. Does it work?
  2. Is it maintainable?

We can't really address whether it "works" well if we can't tell what Security Parameter (256 bits?) you're shooting for, and whether there's reason to believe the implemented code matches that spec. You need to be explicit about your security objectives.

boolean variable is a boolean expression

    if debug == True:

No.

Please just write if debug:

magic constant

    lista = ["0bm1!d2Mafgh3TtijkcheH4lnEou5pqUsr67Svwx8yz9BCDFGHIJ...

That's crazy. Why start with "0bm1" and not "1m0b" or another permutation?

Recall Kerckhoffs's principle. Eve has already read your source code. The details of lista are already known to her.

Much better to lexically sort the valid characters in the source code, and then permute them:

    lista = [" !!#$%&()*+-0123456789=@BCDEFGHHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghhijklmnopqrstuvwxyz|~ª°º", ...

Uggh, wait! Why do we have a pair of ! bangs in there? And a pair of capital H's? Also a pair of lowercase h's?!?

... makes an string with 85 ...

No, after dups it looks more like 84 distinct codepoints, to me.

BTW, kudos on using _ in for _ in range(5): to say that we won't use and don't care about the index value -- no need to name it.

more magic numbers

        lista[3].append( ... )

This is not great. The 3 is cryptic. Prefer to use a dict or @dataclass for a mutable structured tuple such as this.

tl;dr: Name your fields!
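A minimal sketch of what named fields might look like. The class name CipherState and every field name are guesses at what each lista slot is for, based only on how the posted code uses it; the short alphabet in the usage line is made up to keep the output readable.

```python
from dataclasses import dataclass, field


@dataclass
class CipherState:
    """Named replacements for the positional slots of `lista` (names are guesses)."""
    alphabet: str                                       # was lista[0]
    password_code: int = 0                              # was lista[1]
    scrambled: str = ""                                 # was lista[2]
    blocks: list[str] = field(default_factory=list)     # was lista[3]
    block_order: str = ""                               # was lista[4]
    used: list[bool] = field(default_factory=list)      # was lista[5]

    def split_blocks(self, size: int = 17, count: int = 5) -> None:
        # Same slicing as the original loop: `count` consecutive chunks of `size`.
        self.blocks = [self.alphabet[i:i + size] for i in range(0, size * count, size)]
        self.used = [False] * count


state = CipherState(alphabet="abcdefghij")
state.split_blocks(size=2, count=5)
print(state.blocks)
```

Reading state.blocks tells a maintainer what is being appended to; lista[3] does not.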

        step += 17

I imagine we might have a Caesar cipher going on here? It's obscure. Cite your references.

Also, you wrote some commented code. It doesn't execute; I didn't read it. Present working code on the Code Review site, and buggy code over on Stack Overflow.

another magic constant

    for idx, char in enumerate("0" + senha):

I can't imagine why we're prepending "0" there. It warrants a # comment.

If you write mysterious code, don't be surprised if a future teammate / maintainer rips it out because it's unclear it's doing anything useful.

extract helper

There seems to be some "scrambling" behavior going on near the top of the function, reading and writing several lista fields. The behavior is common to both the encode() and decode() paths.

Extract that common behavior into a small helper function, which both paths call into. Consider turning the whole codebase into a class, so methods can refer to self.mumble and self.blah when accessing those carefully scrambled variables.

cryptographically strong random numbers

        senha = []
        lista[4] += str(random.randint(1, 9))

I don't know exactly what element 4 is all about. But I'm worried that on this and on other calls you wanted unpredictable numbers, and that's not what we see here. When you read the fine docs about this generator you will find that

it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.

It goes on to explain in bold writing

Warning The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the secrets module.
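A small sketch of how the random draws could move to the secrets module. The names BLOCK_COUNT, block_order and leading_digit are mine, chosen to mirror what the posted code appears to do (pick a digit from 1 to 9, then a random order for five blocks).

```python
import secrets

BLOCK_COUNT = 5  # the posted code splits its alphabet into five blocks

# SystemRandom draws from the operating system's CSPRNG and exposes the
# familiar random.Random interface, including shuffle().
rng = secrets.SystemRandom()

block_order = list(range(BLOCK_COUNT))
rng.shuffle(block_order)                    # replaces the `while True` retry loop

leading_digit = 1 + secrets.randbelow(9)    # uniform over 1..9, like randint(1, 9)

print(block_order, leading_digit)
```

Shuffling a list once also removes the retry loop that keeps drawing indices until an unused one turns up.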

versioning

You are defining your own cryptographic primitive here, something that all the textbooks advise against doing. You will make mistakes. (Ok, further mistakes in future.) You will find occasion to make Breaking Changes to the file format, as you revise the encryption algorithm.

Write a magic number at the beginning of each encrypted output. Follow it with a version number, so your code can recognize the appropriate algorithm to use with one blob of bits or another, and so you can offer backward compatibility for historic files written in previous years.
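One possible layout, sketched with struct. The magic bytes b"PWMG", the 16-bit version field and the function names wrap and unwrap are all arbitrary choices for illustration, not an established format.

```python
import struct

MAGIC = b"PWMG"                   # hypothetical 4-byte file signature
VERSION = 1                       # bump on every breaking format change
HEADER = struct.Struct(">4sH")    # big-endian: 4 raw bytes + unsigned 16-bit int


def wrap(payload: bytes) -> bytes:
    """Prefix the encrypted payload with the magic number and format version."""
    return HEADER.pack(MAGIC, VERSION) + payload


def unwrap(blob: bytes) -> tuple[int, bytes]:
    """Validate the header and return (version, payload)."""
    magic, version = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a recognised encrypted blob")
    if version > VERSION:
        raise ValueError(f"blob version {version} is newer than this code")
    return version, blob[HEADER.size:]


blob = wrap(b"ciphertext goes here")
print(unwrap(blob))
```

A decoder can then dispatch on the returned version to whichever algorithm wrote the blob.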

\$\endgroup\$
2
  • \$\begingroup\$ Thank you for the answer! But I did not understand the beginning of the "keying material" point: what does "how many bits of entropy" mean? Is entropy some size parameter, or a measure of password difficulty? The idea was to accept any password the user wants and use it as the basis for the whole process. One more thing: which output format do you think would be better in terms of space? Is there an encryption module you would recommend for use with a password? \$\endgroup\$ Commented 6 hours ago
  • 1
    \$\begingroup\$ en.wikipedia.org/wiki/… , cf the articles on Cryptanalysis, Argon2id, and AES \$\endgroup\$ Commented 5 hours ago
3
\$\begingroup\$

Documentation

This large block of commented-out code should be deleted to reduce clutter:

#needs some fix, de-comment this to see.
'''
for idx, char in enumerate(senha): #this is the code i need help. look at the '#string scrambler' to see an better code than this here with same objective.
    for step in range(len(lista[3])):
        if lista[5][step] == 1:
            continue

Alternate versions of code can be stored in your version control system during development.

There is no need for long lines inside a docstring:

This function creates an list (like string) with some caracters of the password, then a code to the password with the list made, which is allways the same for the same password.\n

That line should be shortened by splitting it up into multiple lines.

Many of the lines in the docstring end in literal \n characters which can be deleted:

Then checks to the mode:\n

Simpler

This line:

if debug == True:

is simpler as:

if debug:

This line:

print(f"Password was really wrong.")

is simpler without the f-string:

print("Password was really wrong.")

ruff identifies other similar issues.

Naming

The PEP 8 style guide recommends snake_case for function and variable names.

The function name coddecod is strange. Perhaps code_decode is more meaningful.

The variable reallywrongpassword would be really_wrong_password.

senha does not convey much meaning in English. Either choose a more descriptive name or add a comment to describe what it means. The same is true for other variables like lista, etc.

\$\endgroup\$
2
\$\begingroup\$

To restate what has already been said, it is strongly advised to use English for variables, function names, etc., and even comments and documentation. That makes maintenance easier for others, especially if you work in a team, or you create GitHub repos that other people find useful and fork. This can be challenging, but you have already made that effort in the comments.

I find the flow and the code logic cryptic and hard to follow. What does not help is that you have a single function that is quite long, with multiple levels of indentation.

Without even understanding the internals of your code, it is obvious to me that it should be split into two functions, one for encoding, one for decoding. As the old saying goes, one function should do just one thing and do it well. You will surely notice that smaller functions are easier to manage and to debug.

The most problematic part, in my view, is lista, because it is counter-intuitive. Basically, it is a collection of parameters, but the purpose of each is not obvious without analyzing the code in depth. Working with indices is tricky and error-prone. There are different data structures that are more convenient to use, for example a class, a dataclass or even a named tuple.

Regarding the debug mode, you should get acquainted with the logging module; then, instead of print, just use logging.debug. Your debug messages will then show up depending on the requested logging level (which can be set via a command-line parameter or config file). Think about it: you have plenty of if debug == True: checks in your code, and you could easily get rid of them.
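A minimal sketch of the idea, using a simplified stand-in for the loop that derives a number from the password. The function name password_code and its parameters are hypothetical; only the logging calls are the point.

```python
import logging

logger = logging.getLogger(__name__)


def password_code(password: str, alphabet: str) -> int:
    """Simplified stand-in for the posted loop that derives a number from the password."""
    code = 0
    for idx, char in enumerate("0" + password):
        for pos, candidate in enumerate(alphabet):
            if char == candidate:
                code += pos * idx
    # No `if debug:` guard needed; the logging level decides whether this prints.
    logger.debug("code made from password: %d", code)
    return code or len(password)


# Normally done once at program start; DEBUG shows the messages, WARNING hides them.
logging.basicConfig(level=logging.DEBUG)
print(password_code("ab", "abc"))
```

Switching the level to logging.WARNING silences every debug line without touching the function body.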

\$\endgroup\$
2
\$\begingroup\$

Apart from all of the other good points raised, you would also want to guard the calling of your test code at the end of the program.

if __name__ == '__main__':
    test = coddecod("FeijaoTropeiro", "Elias", "c")
    wrong = coddecod("54hvrvrwe4ij", test)
    wrong2 = coddecod("Feijao", test, "d")
    right = coddecod("FeijaoTropeiro", test, "d")
    
    print(f"""The text is: 'Elias'.
This is the message encrypted: {test}
These 2 are the outcomes with wrong passwords: '{wrong}' and '{wrong2}'.
This is the message with the right password: {right}.
Run this code more times to see the encrypted message change.""")

This allows you to import the module containing coddecod into another program without executing this code.

\$\endgroup\$
1
\$\begingroup\$

What is Entropy?

Who is Eve?

John Gordon explains that

over the years Alice and Bob have tried to defraud insurance companies, they've played poker for high stakes by mail, and they've exchanged secret messages over tapped telephones. ...

Now most people in Alice's position would give up. Not Alice. She has courage which can only be described as awesome. Against all odds, over a noisy telephone line, tapped by the tax authorities and the secret police, Alice will happily attempt, with someone she doesn't trust, whom she cannot hear clearly, and who is probably someone else, to fiddle her tax returns and to organize a coup d'etat, while at the same time minimizing the cost of the phone call.

And Eve is the eavesdropper who tries to listen in on Alice's conversation.

Or in the context of encrypted archival file storage, we might model communication across time as "week ago Alice" sending a message to "current Alice". In the intervening week or so, she forgot the plaintext but not the key.

standards

When I speak of the Advanced Encryption Standard in this article, I refer exclusively to the AES-256 variant.

entropy

By hypothesis Eve sees the ciphertext and the encryption algorithm, but not the secret key. Eve wants to read the cleartext.

Alice wants Eve's task to be as hard as possible, requiring a brute force search of the key space. Let's drill down on that.

Suppose Alice is using encryption software which allows her to choose from exactly two valid passwords: {yes, no}. Eve knows this. If Alice chooses a 3-character password, is that somehow "more secure" than a 2-character password? No.

Eve does trivial work to read the cleartext -- just try both passwords. If she sees gibberish then she knows to try the other one. We assume Alice didn't send random bits; there's usually some recognizable structure to the cleartext, such as being composed of words from a dictionary.

We could slightly improve matters by accepting {vermelho, verde, azul} as the valid passwords, but then we'd have only a little more than one bit of entropy. If we accept {vermelho, verde, azul, branco} then we have two bits of entropy. Accepting from the set of Snow White and her seven dwarves would give us three bits of entropy, and so on.
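Those figures are just the base-2 logarithm of the number of equally likely choices. A quick sketch to check them (the helper name entropy_bits is mine):

```python
import math


def entropy_bits(choices: int) -> float:
    """Bits of entropy when one option is picked uniformly at random."""
    return math.log2(choices)


for label, size in [("yes/no", 2), ("three colours", 3),
                    ("four colours", 4), ("Snow White + 7 dwarves", 8)]:
    print(f"{label:>24}: {entropy_bits(size):.2f} bits")
```

Note the assumption of a uniform random pick; if Alice favours one option, the real entropy is lower.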

password

For the moment I will interpret "password" literally -- a word that appears in Webster's dictionary or similar.

If chosen by a ten-year-old, with a vocabulary of likely fewer than eight thousand words, we might have \$13\$ bits of entropy, so Eve must do about \$2^{13}\$ work. (Or half that, but let's not quibble just yet.)

By the time that student graduates high school they might have doubled their vocabulary to ballpark sixteen thousand words, so Eve does \$2^{14}\$ work. Even after years of graduate work we might see just another doubling, so Eve does \$2^{15}\$ work.

Let's be generous and say that Alice randomly chooses each word from a hefty dictionary, having sixty-four thousand words, for two random bytes and \$2^{16}\$ work. In many crypto systems Eve could still brute force through all those possibilities within one minute or even one second. Some systems, such as Argon2id, are specifically designed to be slow, so Eve might take hours or years to try all possibilities, even with many machines at her disposal.

passphrase

Sometimes documentation or software identifiers mention a "password" when really a "passphrase" is intended. We will assume there is no practical length limit, so a sentence or even a paragraph could be used.

Under the charitable assumptions we've made so far, how much work must Eve do? Frankly, not much.

  • 1 word: 16 bits
  • 2 words: 32 bits
  • 3 words: 48 bits
  • 4 words: 64 bits

We'd need more than three words before life starts getting difficult for Eve, and even then we need unrelated words. So "call me Ishmael" would be a poor choice, offering far less than 48 bits of entropy.
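The list above follows from multiplying the per-word entropy by the number of words, which only holds when each word is drawn independently and uniformly. A short sketch, with passphrase_bits as a made-up helper name:

```python
import math

DICTIONARY_SIZE = 64 * 1024   # the 65,536-word dictionary assumed above


def passphrase_bits(words: int, dictionary_size: int = DICTIONARY_SIZE) -> float:
    """Entropy of `words` independent, uniformly random dictionary picks."""
    return words * math.log2(dictionary_size)


for words in range(1, 5):
    bits = passphrase_bits(words)
    print(f"{words} word(s): {bits:.0f} bits, {2 ** bits:.3g} guesses to exhaust")
```

A quotation such as "call me Ishmael" breaks the independence assumption, which is why it scores far below the 48 bits this formula would give for three words.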

hashing

In a short dictionary that begins with these words, we might assign 1, 2, 3 to {a, add, act}. And then "zebra" is up in the thousands somewhere.

So the 256-bit AES key might be filled with 0x0003 for "act", followed by 0x0002 for "add", followed by another number for the rank order of "zebra". But we'd still need sixteen dictionary words to fill up the AES key. Or we could leave a bunch of zero bits at the end.

And suppose our (weak!) passphrase started, "It was the best of times". Dickens is a bit prolix, so we might wind up with more than sixteen words. What to do?

Hash functions to the rescue! We could run the phrase through SHA1, same as git. But that only gives us 160 bits. More convenient to run it through SHA3-256, which gives us just what we need.

The remarkable thing about the Diffusion and Confusion avalanches of such a hash is that small amounts of entropy (like R,G,B color name) will be preserved, while inputs with more than 256 bits of entropy will produce an output having 256 bits of entropy. (Well, there are some details we needn't pursue here, like hash of hash may behave subtly differently from a naïve expectation.)

Also, flipping a single bit in the input will flip about half the output bits. Hash of uppercase "It was" starts b2dc0d88, while hash of lowercase "it was" starts fff4c217. The inputs look similar, while the outputs look very different.
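To try this yourself, here is a sketch using hashlib from the standard library. The helper names sha3_key and differing_bits are mine, and I have assumed the full passphrase "It was the best of times" as input, so the printed digests need not match the prefixes quoted above.

```python
import hashlib


def sha3_key(passphrase: str) -> bytes:
    """Hash a passphrase of any length down to exactly 32 bytes (256 bits)."""
    return hashlib.sha3_256(passphrase.encode("utf-8")).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


upper = sha3_key("It was the best of times")
lower = sha3_key("it was the best of times")

print(len(upper) * 8, "bit key")
print(upper.hex()[:8], lower.hex()[:8])
print(differing_bits(upper, lower), "of 256 bits differ")
```

The two inputs differ in a single bit (the case of the first letter), yet roughly half of the output bits should change.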

PBKDF

In summary, the trouble with "a passphrase I can memorize" is that it likely has far fewer than 256 bits of entropy.

There are at least two ways to cope with this. One is to use a memorized password to unlock a "vault" file containing the "real" 256-bit key. In the same vein we might store those key bits on a thumb drive, biometric ring, or similar storage that is secure from Eve and perhaps even from her evil friend Mallory.

Another way is to give up and say the memorized password is all we have, so let's make the most of it. If our weak passphrase is "red tomato", we could use a fast SHA3 hash to smear that handful of entropy across 256 bits. But better to use the slow Argon2id hash function. Why? Because it has been engineered to require that Eve purchase lots of CPU and lots of memory if she wants to have a farm spend time doing brute force guessing. It's not the only such password based key derivation function, but it is the winner of the Password Hashing Competition.
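Argon2id is not in the Python standard library, so the sketch below swaps in hashlib.scrypt, a different memory-hard key derivation function that ships with CPython when it is built against a recent OpenSSL. The function name derive_key and the cost parameters are illustrative assumptions, not tuned recommendations.

```python
import hashlib
import secrets


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Stretch a passphrase into a 256-bit key with scrypt (stand-in for Argon2id)."""
    return hashlib.scrypt(
        passphrase.encode("utf-8"),
        salt=salt,
        n=2**14, r=8, p=1,            # illustrative cost parameters only
        maxmem=64 * 1024 * 1024,      # allow the ~16 MiB this setting needs
        dklen=32,
    )


# The salt is random per file and is stored next to the ciphertext; it is not secret.
salt = secrets.token_bytes(16)
key = derive_key("red tomato", salt)
print(len(key) * 8, "bit key")
```

The same passphrase and salt always give the same key, so decryption only needs the stored salt plus what Alice remembers.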

OP entropy

The OP code does not in fact use senha, so describing the actual entropy used is moot.

range()

    for _ in range(5):
        lista[3].append(lista[0][0 + step: 17 + step])
        step += 17

To compute that constant list of str, you might prefer to supply a 3rd step argument.

    for i in range(0, 5*17, 17):
        lista[3].append(lista[0][i : i+17])
\$\endgroup\$
