Combine 2 string columns in pandas with different conditions in both columns

Question

I have 2 columns in pandas, with data that looks like this.

code fx         category
AXD  AXDG.R     cat1
AXF  AXDG_e.FE  cat1 
333  333.R      cat1
....

There are other categories but I am only interested in cat1.

I want to combine everything from the code column with everything after the . in the fx column, and replace the code column with the new combination without affecting the other rows.

code    fx         category
AXD.R   AXDG.R     cat1
AXF.FE  AXDG_e.FE  cat1
333.R   333.R      cat1
.....

Here is my code. I think I have to use regex, but I'm not sure how to combine the columns in this way.

df.loc[df['category']== 'cat1', 'code'] = df[df['category'] == 'cat1']['code'].str.replace(r'[a-z](?=\.)', '', regex=True).str.replace(r'_?(?=\.)','', regex=True).str.replace(r'G(?=\.)', '', regex=True)

I'm not sure how to select the second column also. Any help would be greatly appreciated.

anarchy · Accepted Answer · 2021-12-21 13:18:43Z

3

There are other categories but I am only interested in cat1

You can use str.split with Series.where to add the extension for cat1 only:

df['code'] = (df['code'].astype(str).add("."+df['fx'].str.split(".").str[-1])
             .where(df['category'].eq("cat1"),df['code']))

print(df)

     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1
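A self-contained sketch of the approach above. It assumes the sample rows from the question plus one hypothetical cat2 row (not in the original data), added only to show that other categories keep their original code:

```python
import pandas as pd

# Sample rows from the question, plus one hypothetical "cat2" row
# added here to show that other categories are left alone.
df = pd.DataFrame({
    "code": ["AXD", "AXF", "333", "ZZZ"],
    "fx": ["AXDG.R", "AXDG_e.FE", "333.R", "ZZZ.Q"],
    "category": ["cat1", "cat1", "cat1", "cat2"],
})

# Text after the last "." in fx.
suffix = df["fx"].str.split(".").str[-1]

# Build the combined value for every row, then keep it only where
# category is cat1; elsewhere fall back to the original code.
combined = df["code"].astype(str).add("." + suffix)
df["code"] = combined.where(df["category"].eq("cat1"), df["code"])

print(df)
```

Series.where keeps values where the condition is True and takes the replacement (here the untouched code) everywhere else, so the cat2 row should come through unchanged.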

edited Dec 21, 2021 at 13:18

anarchy

answered Dec 19, 2021 at 17:59

anky

4 Comments

anarchy Over a year ago

I am getting this error: TypeError: unsupported operand type(s) for +: 'float' and 'str'. I am guessing some rows might have digits? Is that possible?

anky Over a year ago

@anarchy you probably have mixed datatypes in your series. Convert to string first using df['code']=df['code'].astype(str) and then try.

anarchy Over a year ago

Yeah, I actually figured it out haha, added the astype and it works. Thanks!

anarchy Over a year ago

I updated your answer to include the string conversion
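One way to check for the mixed types discussed in these comments is to count the Python types stored in the column. The sketch below uses hypothetical data in which one code was read as a float; the asker's real data is not shown, so this is only an assumption about the cause:

```python
import pandas as pd

# Hypothetical data: the second code was read as a float, so the column
# holds mixed types.
df = pd.DataFrame({
    "code": ["AXD", 333.0],
    "fx": ["AXDG.R", "333.R"],
})

# Count the Python types stored in the column.
type_counts = df["code"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Converting to str first makes the concatenation safe. Note that a float
# such as 333.0 becomes "333.0", not "333".
suffix = df["fx"].str.split(".").str[-1]
fixed = df["code"].astype(str) + "." + suffix
print(fixed.tolist())
```

If the codes are meant to be plain integers, it may be worth cleaning them up before the conversion, since astype(str) keeps the trailing ".0" of a float.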

mozway · Accepted Answer · 2021-12-19 18:00:56Z

3

You can extract the part of fx and append it to code:

df['code'] += df['fx'].str.extract(r'(\..*$)')[0]

output:

     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1

to limit to cat1 only:

df.loc[df['category'].eq('cat1'), 'code'] += df['fx'].str.extract(r'(\..*$)')[0]

answered Dec 19, 2021 at 18:00

mozway

4 Comments

anarchy Over a year ago

Can you explain the \..*$ part?

mozway Over a year ago

@anarchy match a literal dot \. followed by a series of characters .* and end of line $.

anarchy Over a year ago

I thought there only needs to be one dot though, what’s the second one for

mozway Over a year ago

An unescaped dot means "any character" in a regex.
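The difference between the escaped and unescaped dot can be checked with Python's re module. This is a small sketch on the sample fx values from the question, not part of the original answer:

```python
import re

# Literal dot, then any characters up to the end of the string.
pattern = r"\..*$"

values = ["AXDG.R", "AXDG_e.FE", "333.R"]
suffixes = [re.search(pattern, value).group(0) for value in values]
print(suffixes)

# Without the backslash the first dot also means "any character",
# so the match starts at the very first character instead.
unescaped = re.search(r"..*$", "AXDG.R").group(0)
print(unescaped)
```

Writing the pattern as a raw string (the r prefix) keeps Python from treating \. as a string escape before the regex engine sees it.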

user17242583 · Accepted Answer · 2021-12-19 18:07:41Z

3

You can use Series.str.extract:

import numpy as np
df['code'] = df['code'].astype(str) + np.where(df['category'].eq('cat1'), df['fx'].astype(str).str.extract(r'(\..+)')[0], '')

Output:

>>> df
     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1
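A runnable sketch of the same idea, assuming the question's sample rows plus one hypothetical cat2 row added for illustration. np.where picks the extracted suffix for cat1 rows and an empty string for everything else:

```python
import numpy as np
import pandas as pd

# Sample rows from the question, plus one hypothetical "cat2" row.
df = pd.DataFrame({
    "code": ["AXD", "AXF", "333", "ZZZ"],
    "fx": ["AXDG.R", "AXDG_e.FE", "333.R", "ZZZ.Q"],
    "category": ["cat1", "cat1", "cat1", "cat2"],
})

# First dot and everything after it, e.g. ".R" or ".FE".
suffix = df["fx"].astype(str).str.extract(r"(\..+)")[0]

# Append the suffix for cat1 rows and nothing for the others.
df["code"] = df["code"].astype(str) + np.where(df["category"].eq("cat1"), suffix, "")

print(df)
```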

edited Dec 19, 2021 at 18:07

answered Dec 19, 2021 at 18:01

user17242583

tlentali · Accepted Answer · 2021-12-19 18:10:08Z

2

We can get the expected result using split like so:

>>> df['code'] = df['code'] + '.' + df['fx'].str.split(pat=".", expand=True)[1]
>>> df
    code    fx          category    
0   AXD.R   AXDG.R      cat1        
1   AXF.FE  AXDG_e.FE   cat1        
2   333.R   333.R       cat1

To filter only on cat1, as @anky did very well, we can add a where statement:

>>> df['code'] = (df['code'] + '.' + df['fx'].str.split(pat=".", expand=True)[1]).where(df['category'].eq("cat1"), df['code'])
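One thing to keep in mind: taking column [1] of the expanded split returns the second piece, which equals the extension only when fx contains exactly one dot. A short sketch with a hypothetical two-dot value (not in the question's data) shows how it differs from taking the last piece:

```python
import pandas as pd

# The second value is hypothetical and contains two dots.
fx = pd.Series(["AXDG.R", "AXDG.v2.R"])

# Second piece after splitting on every dot.
second_piece = fx.str.split(pat=".", expand=True)[1]

# Last piece after splitting on every dot.
last_piece = fx.str.split(".").str[-1]

print(second_piece.tolist())
print(last_piece.tolist())
```

For the sample data in the question both forms give the same result, so this only matters if fx can contain more than one dot.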

edited Dec 19, 2021 at 18:10

answered Dec 19, 2021 at 17:57

tlentali

2 Comments

anarchy Over a year ago

What about filtering 'cat1' only?

tlentali Over a year ago

Indeed! I updated my answer to filter on cat1 as well.

wwnde · Accepted Answer · 2021-12-19 21:16:52Z

1

Remove the word characters (letters, digits, underscore) before the dot in fx, then append what is left to the code column.

df['code'] += df['fx'].str.replace(r'^\w+(?=\.)', '', regex=True)

    code         fx  category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1
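To see what the replacement leaves behind, here is a small sketch on the fx values alone. The third value is hypothetical and contains a hyphen, which \w does not match, so the pattern finds nothing to remove there:

```python
import pandas as pd

# The last value is hypothetical; the hyphen is not a word character.
fx = pd.Series(["AXDG.R", "AXDG_e.FE", "AX-DG.R"])

# Strip the run of word characters that sits directly before a dot
# at the start of the string.
remainder = fx.str.replace(r"^\w+(?=\.)", "", regex=True)

print(remainder.tolist())
```

This suggests the approach fits the sample data, where the part before the dot is made of letters, digits and underscores only. Note that the line in this answer appends to every row; it does not filter on cat1.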

answered Dec 19, 2021 at 21:16

wwnde
