5

I Have a list of strings like this:

stringlist = [JAN, jan, FEB, feb, mar]

And I have a dataframe that looks like this:

**date**            **value**
01MAR16                1
05FEB16                12
10jan17                5
10mar15                9
03jan05                7
04APR12                3

I only want to keep the dates which contain one string from stringlist in it, the result should look like this:

**date**            **value**
NA                     1
05FEB16                12
10jan17                5
10mar15                9
03jan05                7
NA                     3

Im new to using regular expression so having some trouble wrapping my head around it, would appreciate some help.

4 Answers 4

5
stringlist = ["JAN", "jan", "FEB", "feb", "mar"]

m = df["date"].str.contains("|".join(stringlist))
df.loc[~m, "date"] = np.nan
print(df)

Prints:

      date  value
0      NaN      1
1  05FEB16     12
2  10jan17      5
3  10mar15      9
4  03jan05      7
5      NaN      3
Sign up to request clarification or add additional context in comments.

Comments

1

You can use the Series.str.contains method as demonstrated here: Select by partial string from a pandas DataFrame:

import pandas as pd

df = pd.DataFrame({'date': ['NA', '05FEB16', '10jan17', '10mar15', '03jan05', 'NA'],
                   'value': [1, 12, 5, 9, 7, 3]})

stringlist = ['JAN', 'jan', 'FEB', 'feb', 'mar']

print(df[df['date'].str.contains('|'.join(stringlist))])

Output:

      date  value
1  05FEB16     12
2  10jan17      5
3  10mar15      9
4  03jan05      7

Comments

1

Another play on regular expressions is to extract the characters (assumption here is that the months will alway be sandwiched between day and year), then check if each extract can be found in stringlist:

(df.assign(months = df.date.str.extract(r'([a-zA-Z]+)'), 
           date = lambda df: df.where(df.months.isin(stringlist))
          )
   .iloc[:, :-1]
)

      date  value
0      NaN      1
1  05FEB16     12
2  10jan17      5
3  10mar15      9
4  03jan05      7
5      NaN      3

Comments

0

You can simply do this:


import pandas as pd

keys = ["JAN", "jan", "FEB", "feb", "mar"]
date = ["01MAR16", "05FEB16", "10jan17", "10mar15", "03jan05", "04APR12"]
value = [1, 12, 5, 9, 7, 3]

df = pd.DataFrame({"date": date,
                   "value": value})

for i in range(len(df)):
    for key in keys:
        if key in df["date"][i]:
            print(df["date"][i], df["value"][i])
            break

Output

05FEB16 12
10jan17 5
10mar15 9
03jan05 7

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.