6

I have 2 columns in pandas, with data that looks like this.

code fx         category
AXD  AXDG.R     cat1
AXF  AXDG_e.FE  cat1 
333  333.R      cat1
....

There are other categories but I am only interested in cat1.

I want to combine everything from the code column, and everything after the . in the fx column and replace the code column with the new combination without affecting the other rows.

code    fx         category
AXD.R   AXDG.R     cat1
AXF.FE  AXDG_e.FE  cat1
333.R   333.R      cat1
.....

Here is my code, I think I have to use regex but I'm not sure how to combine it in this way.

df.loc[df['category']== 'cat1', 'code'] = df[df['category'] == 'cat1']['code'].str.replace(r'[a-z](?=\.)', '', regex=True).str.replace(r'_?(?=\.)','', regex=True).str.replace(r'G(?=\.)', '', regex=True)

I'm not sure how to select the second column also. Any help would be greatly appreciated.

5 Answers 5

3

There are other categories but I am only interested in cat1

You can use str.split with series.where to add the extention for cat1:

df['code'] = (df['code'].astype(str).add("."+df['fx'].str.split(".").str[-1])
             .where(df['category'].eq("cat1"),df['code']))

print(df)

     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1
Sign up to request clarification or add additional context in comments.

4 Comments

I am getting this error, TypeError: unsupported operand type(s) for +: 'float' and 'str', I am guessing some rows might have digits? is that possible?
@anarchy you probably have mixed datatypes in your series. Convert to string first using df['code']=df['code'].astype(str) and then try.
yeah I actually figured it out haha, added the as type and it works.. thanks!
I updated your answer to include the string conversion
3

You can extract the part of fx and append it to code:

df['code'] += df['fx'].str.extract('(\..*$)')[0]

output:

     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1

to limit to cat1 only:

df.loc[df['category'].eq('cat1'), 'code'] += df['fx'].str.extract('(\..*$)')[0]

4 Comments

Can you explain the \..*$ part?
@anarchy match a literal dot \. followed by a series of characters .* and end of line $.
I thought there only needs to be one dot though, what’s the second one for
An unescaped dot means "any character" in a regex.
3

You can use Series.str.extract:

df['code'] = df['code'].astype(str) + np.where(df['category'].eq('cat1'), df['fx'].astype(str).str.extract('(\..+)')[0], '')

Output:

>>> df
     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1

Comments

2

We can get the expected result using split like so :

>>> df['code'] = df['code'] + '.' + df['fx'].str.split(pat=".", expand=True)[1]
>>> df
    code    fx          category    
0   AXD.R   AXDG.R      cat1        
1   AXF.FE  AXDG_e.FE   cat1        
2   333.R   333.R       cat1    

To filter only on cat1, as @anky did very well, we can add a where statement:

>>> df['code'] = (df['code'] + '.' + df['fx'].str.split(pat=".", expand=True)[1]).where(df['category'].eq("cat1"), df['code'])

2 Comments

what about filtering 'cat1' only ?
Indeed ! I updated my answer to filter on cat1 as well.
1

Replace alphanumerics before the dot. Append the result to column code.

df['code'] +=df['fx'].str.replace('(^[\w]+(?=\.))','',regex=True)




    code         fx  category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.