
When you print a pandas DataFrame, which calls DataFrame.to_string, it normally inserts a minimum of 2 spaces between the columns. For example, this code

import pandas as pd

df = pd.DataFrame( {
    "c1" : ("a", "bb", "ccc", "dddd", "eeeeee"),
    "c2" : (11, 22, 33, 44, 55),
    "a3235235235": [1, 2, 3, 4, 5]
} )
print(df)

outputs

       c1  c2  a3235235235
0       a  11            1
1      bb  22            2
2     ccc  33            3
3    dddd  44            4
4  eeeeee  55            5

which has a minimum of 2 spaces between each column.

I am copying DataFrames printed on the console and pasting them into documents, and I have received feedback that they are hard to read: people would like more spaces between the columns.

Is there a standard way to do that?

I see no option in either DataFrame.to_string or pandas.set_option.

I have done a web search, and not found an answer. This question asks how to remove those 2 spaces, while this question asks why sometimes only 1 space is between columns instead of 2 (I also have seen this bug, hope someone answers that question).

My hack solution is to define a function that converts a DataFrame's columns to type str, and then prepends each element with a string of the specified number of spaces.

This code (added to the code above)

def prependSpacesToColumns(df: pd.DataFrame, n: int = 3):
    spaces = ' ' * n
    
    # ensure every column name has the leading spaces:
    if isinstance(df.columns, pd.MultiIndex):
        for i in range(df.columns.nlevels):
            levelNew = [spaces + str(s) for s in df.columns.levels[i]]
            # set_levels no longer accepts inplace=True (removed in pandas 2.0), so reassign
            df.columns = df.columns.set_levels(levelNew, level = i)
    else:
        df.columns = spaces + df.columns
    
    # ensure every element has the leading spaces:
    df = df.astype(str)
    df = spaces + df
    
    return df

dfSp = prependSpacesToColumns(df, 3)
print(dfSp)

outputs

          c1     c2    a3235235235
0          a     11              1
1         bb     22              2
2        ccc     33              3
3       dddd     44              4
4     eeeeee     55              5

which is the desired effect.

But I think that pandas surely must have some builtin simple standard way to do this. Did I miss how?

Also, the solution needs to handle a DataFrame whose columns are a MultiIndex. To continue the code example, consider this modification:

idx = (("Outer", "Inner1"), ("Outer", "Inner2"), ("Outer", "a3235235235"))
df.columns = pd.MultiIndex.from_tuples(idx)

2 Answers


You can accomplish this through formatters; it takes a bit of code to create the dictionary {'col_name': format_string}. Find the max character length in each column or the length of the column header, whichever is greater, add some padding, and then pass a formatting string.

Use partial from functools as the formatters expect a one parameter function, yet we need to specify a different width for each column.

Sample Data

import pandas as pd
df = pd.DataFrame({"c1": ("a", "bb", "ccc", "dddd", "eeeeee"),
                   "c2": (11, 22, 33, 44, 55),
                   "a3235235235": [1, 2, 3, 4, 5]})

Code

from functools import partial

# Formatting string 
def get_fmt_str(x, fill):
    return '{message: >{fill}}'.format(message=x, fill=fill)

# Max character length per column
s = df.astype(str).agg(lambda x: x.str.len()).max() 

pad = 6  # How many spaces between columns
fmts = {}
for idx, c_len in s.items():  # Series.iteritems() was removed in pandas 2.0
    # Deal with MultiIndex tuples or simple string labels.
    if isinstance(idx, tuple):
        lab_len = max([len(str(x)) for x in idx])
    else:
        lab_len = len(str(idx))

    fill = max(lab_len, c_len) + pad - 1
    fmts[idx] = partial(get_fmt_str, fill=fill)

print(df.to_string(formatters=fmts))

            c1      c2      a3235235235
0            a      11                1
1           bb      22                2
2          ccc      33                3
3         dddd      44                4
4       eeeeee      55                5

# MultiIndex Output
         Outer                             
        Inner1      Inner2      a3235235235
0            a          11                1
1           bb          22                2
2          ccc          33                3
3         dddd          44                4
4       eeeeee          55                5

6 Comments

Thanks! I am marking your response as the answer, because you show a deep low level way to achieve ANY desired formatting effect, including my need for column spacing. I still think that pandas needs to add some kind of option to DataFrame.to_string...
Compared to my original 3 line hack solution, your code addresses one defect: the column names also need to be considered. I have edited my question's hack code to address that, as well as to use your modified DataFrame (which includes a long column name).
I note, however, that your solution has this small issue: the spacing between columns is not constant! I count 8 spaces between the index column and c1, then 7 spaces between the remaining two columns. My modified hack code, in contrast, produces the same number of spaces between all columns.
@HaroldFinch It's difficult to tell, but that's likely due to the formatting of the index (with an extra space), so it gets lumped in with what looks like the first column. I.e. the index pads to the right and the columns pad to the left, so that first column looks odd. AFAIK you can't format the index, so you could make it a column, format that, and then call to_string(index=False). But it does get complicated.
I just discovered that both of our original codes do not handle a DataFrame whose columns are of type pandas.MultiIndex. My original code crashes, while yours puts in way too many spaces. I will edit my question to handle MultiIndex.
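
On the constant-spacing point raised in these comments: one alternative to formatters is to skip to_string and join the cells yourself, which makes the gap exact, index column included. This is only a rough sketch and not a pandas feature. to_string_with_gap is a hypothetical helper name, and the code assumes flat (non-MultiIndex) column labels and that str() of each value is an acceptable rendering.

```python
import pandas as pd


def to_string_with_gap(df: pd.DataFrame, gap: int = 4) -> str:
    """Render df as text with exactly `gap` spaces between columns.

    Hypothetical helper: assumes flat column labels and plain str() rendering.
    """
    sep = " " * gap

    # Treat the index as the first column, with a blank header cell.
    columns = [[""] + [str(label) for label in df.index]]

    # Each data column becomes a list of strings: header first, then the cells.
    for pos, name in enumerate(df.columns):
        columns.append([str(name)] + [str(v) for v in df.iloc[:, pos]])

    # Right-justify every cell to the widest entry in its own column.
    padded = []
    for cells in columns:
        width = max(len(c) for c in cells)
        padded.append([c.rjust(width) for c in cells])

    # zip(*padded) turns the list of columns into rows; join them with the gap.
    return "\n".join(sep.join(row) for row in zip(*padded))


df = pd.DataFrame({
    "c1": ("a", "bb", "ccc", "dddd", "eeeeee"),
    "c2": (11, 22, 33, 44, 55),
    "a3235235235": [1, 2, 3, 4, 5],
})
print(to_string_with_gap(df, gap=4))
```

Because the index is padded the same way as every other column, the gap after it matches the gap between the data columns. The trade-off is that pandas' own number formatting and display options are bypassed.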

For a pandas DataFrame, the .to_string method has a parameter you can use for column spacing. I do not know whether it is only in newer versions and was not around when you had this problem 4 years ago, but:

print(df.to_string(col_space=n)) will widen the columns for you, where df is your DataFrame and n is an integer. Note that the pandas documentation describes col_space as the minimum width of each column rather than a literal gap between columns. In my tests, with n of 1 or below the spacing between columns stays at 1 space, but as you increase n, more whitespace appears between the columns. One interesting side effect: it also adds n-1 spaces in front of the first column.
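
A minimal sketch of the col_space call, reusing the sample DataFrame from the question. The value 12 is an arbitrary choice for illustration. Since col_space is documented as a minimum column width, the gaps you see depend on how wide each column's contents already are.

```python
import pandas as pd

df = pd.DataFrame({
    "c1": ("a", "bb", "ccc", "dddd", "eeeeee"),
    "c2": (11, 22, 33, 44, 55),
    "a3235235235": [1, 2, 3, 4, 5],
})

# Default rendering, for comparison.
default_text = df.to_string()

# col_space is the minimum width of each column, so narrow columns
# such as c2 gain the most leading whitespace.
wide_text = df.to_string(col_space=12)

print(default_text)
print()
print(wide_text)
```

If you need the same number of spaces between every pair of columns regardless of content width, col_space alone will not guarantee that, because it pads columns to a minimum width rather than inserting a fixed separator.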
