15

I'm trying to take a string of ints and/or floats and create a list of floats. The string is going to have these brackets in them that need to be ignored. I'm using re.split, but if my string begins and ends with a bracket, I get extra empty strings. Why is that?

Code:

import re
x = "[1 2 3 4][2 3 4 5]"
y =  "1 2 3 4][2 3 4 5"
p = re.compile(r'[^\d\.]+')
print(p.split(x))
print(p.split(y))

Output:

['', '1', '2', '3', '4', '2', '3', '4', '5', '']
['1', '2', '3', '4', '2', '3', '4', '5']
2
  • 2
    None of the answers here actually answer the OP's question (i.e. "Why is that?"). Some answers can be found in this stackoverflow question. Commented Apr 29, 2019 at 13:56
  • @SpaceMonkey55 you should place this as answer! Commented Jan 14, 2020 at 23:25

4 Answers

11

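If you use re.split, a delimiter match at the very beginning or end of the string produces an empty string at the beginning or end of the returned list. That is because split returns the text on both sides of every match, and the text before a leading delimiter (or after a trailing one) is empty.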
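To make the reason concrete, here is a minimal sketch. The comma-separated string and the shortened bracket string are illustrative inputs, not taken from the question. It shows that a split with N delimiter matches always returns N + 1 pieces, so a match at either edge of the string leaves an empty piece there:

```python
import re

# A delimiter match at position 0 leaves an empty piece before it,
# and a match at the very end leaves an empty piece after it.
print(re.split(r",", ",a,b,"))   # ['', 'a', 'b', '']
print(",a,b,".split(","))        # str.split behaves the same way

# N delimiter matches always produce N + 1 pieces.
pieces = re.split(r"[^\d.]+", "[1 2][3 4]")  # illustrative input
print(pieces)                    # ['', '1', '2', '3', '4', '']
```

Here the pattern matches five times ("[", " ", "][", " ", "]"), which gives six pieces, two of them empty.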

If you don't want this, use re.findall with a regex that matches every sequence NOT containing delimiters.

Example:

import re

a = '[1 2 3 4]'
print(re.split(r'[^\d]+', a))
print(re.findall(r'[\d]+', a))

Output:

['', '1', '2', '3', '4', '']
['1', '2', '3', '4']

As others have pointed out in their answers, this may not be the ideal solution for this particular problem, but it is a general answer to the problem described in the question's title, which is what I needed to solve when I found this question through Google.
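Since the question asks for a list of floats, the findall approach could be extended along these lines. This is only a sketch: the input string with decimal values and the name NUMBER are made up for illustration, and the pattern assumes plain non-negative numbers such as 3 or 2.5 (no signs, no exponents, no leading-dot forms like .5).

```python
import re

x = "[1 2.5 3 4][2 3 4.75 5]"  # hypothetical input with decimals added

# Match the numbers themselves: digits, optionally followed by a decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

floats = [float(tok) for tok in NUMBER.findall(x)]
print(floats)  # [1.0, 2.5, 3.0, 4.0, 2.0, 3.0, 4.75, 5.0]
```

Because findall returns only the matches, there are no empty strings to filter out before calling float.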


1

As a more Pythonic approach, you can use a list comprehension with the str.isdigit() method to check whether each character is a digit:

>>> [i for i in y if i.isdigit()]
['1', '2', '3', '4', '2', '3', '4', '5']

As for your code: first, you need to split on spaces or brackets, which can be done with the character class [\[\] ]. To get rid of the empty strings caused by the leading and trailing brackets, you can strip the brackets from the string first:

>>> y =  "1 2 3 4][2 3 4 5"
>>> re.split(r'[\[\] ]+',y)
['1', '2', '3', '4', '2', '3', '4', '5']
>>> y =  "[1 2 3 4][2 3 4 5]"
>>> re.split(r'[\[\] ]+',y)
['', '1', '2', '3', '4', '2', '3', '4', '5', '']
>>> re.split(r'[\[\] ]+',y.strip('[]'))
['1', '2', '3', '4', '2', '3', '4', '5']

You can also wrap the result in filter with bool to drop the empty strings (in Python 3, filter returns an iterator, so pass it to list):

>>> list(filter(bool, re.split(r'[\[\] ]+', y)))
['1', '2', '3', '4', '2', '3', '4', '5']
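The character-by-character comprehension above only yields single digits. A regex-free alternative that keeps multi-digit and decimal values intact might look like the sketch below. It assumes brackets and whitespace are the only non-numeric characters, and the input string is a made-up variation of the one in the question.

```python
y = "[1 2 3 4][20 3.5 4 5]"  # hypothetical input with a multi-digit and a decimal value

# Turn each bracket into a space, then let str.split() with no arguments
# split on runs of whitespace and drop the empty pieces.
cleaned = y.replace("[", " ").replace("]", " ")
tokens = cleaned.split()
print(tokens)                      # ['1', '2', '3', '4', '20', '3.5', '4', '5']
print([float(t) for t in tokens])  # [1.0, 2.0, 3.0, 4.0, 20.0, 3.5, 4.0, 5.0]
```

Calling split() without a separator never returns empty strings, so no stripping or filtering step is needed.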

1 Comment

Your list comprehension only works if all the numbers are single digit. Certainly that's the case for the example in the question, but I would never assume it for the general case.
1

You can just use filter to drop the empty results:

import re

x = "[1 2 3 4][2 3 4 5]"

# filter(None, ...) drops falsy items such as ''; list() is needed in Python 3
print(list(filter(None, re.split(r'[^\d.]+', x))))
# => ['1', '2', '3', '4', '2', '3', '4', '5']


0

You can use regex to capture the content you want instead of splitting the string. You can use this regex:

(\d+)


Python code:

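import re

# The ur'' prefix is a syntax error in Python 3; a plain raw string is enough
p = re.compile(r'(\d+)')
test_str = "[1 2 3 4][2 3 4 5]"

print(p.findall(test_str))
# => ['1', '2', '3', '4', '2', '3', '4', '5']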
