Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.

Reply
Anonymous
Not applicable

Missing groupby columns while using pandas dataframe

Hi Experts,

 

I am trying the PoC of a project  via loading data from excel file(just taking excel as an example.) and transform in python dataframe, then load to Power bi. . It runs well in PyDev. But when running in Powerbi, the columns used for groupby are missing. Please could I have your help?

 

Thanks!

 

Environment:

Windows 10

Powerbi desktop Version: 2.108.997.0 64-bit (August 2022)
Python3.7
Matplotlib 3.21
Numpy 1.18.4
Pandas 1.0.1

 

Python codes:

 

import pandas as pd

import openpyxl as opxls

 

 

#===============Read Workgroup Mapping==========

 

df_orginal=pd.read_excel(open('C:\local\Porject\Configuration\groupby_output_test.xls', 'rb'), sheet_name='groupby_output_test')

print(df_orginal)

 

print('====================1================================')

 

 

df_groupby_date=df_orginal.groupby(by=['Date']).sum()

 

print(df_groupby_date)

 

print('======================2==============================')

df_groupby_date_sales=df_orginal.groupby(by=['Date', 'Sales']).sum()

 

print(df_groupby_date_sales)

 

 

print("---------------end------------")

 

PyDev (groupby) result:

corydon_0-1662778981335.png

 

Power BI result

Before groupby, looks fine.

corydon_3-1662779249907.png

 

 

After Groupby[Date]), Date column missing 

corydon_2-1662779181064.png

After Groupby[Date] and [Sales]), these 2 columns missing 

corydon_4-1662779474088.png

 

 

3 REPLIES 3
lbendlin
Super User
Super User

Look at your Python output.  It is pivoted which results in the column names for Date and Sales moving down a row.  When Power Query then tries to read the data it cannot find the column names.

 

Feed the raw data to Power Query, not the processed data.

 

By the way, none of this requires Python. It will be much faster with native functions.

Anonymous
Not applicable

Thanks lbendlin. Yes I noticed it. I think it is due to how Power BI interpretes the dataframe which is pivtol. I did export the groupbyed dataframe to csv. the format is correct.  For the above scenario, your are right, no need of python.  but the project might have more complex logics, for example , the order of different queries, and the results criterias matching.  I had a post before, but didn't seem to be possible just in PQ or PBI domians.

https://community.powerbi.com/t5/Power-Query/How-to-get-the-refresh-status-of-each-power-query-of-a-...

The current approach is to use Python to proscess the data and load to PBI for visualization. 

 

Any other better solution?

 

Thanks!

Power Query and VBA do not mix well.  They have very different philosophies.  I assume you have seen this advice

 

the BackgroundQuery=True must be set. otherwise, if it set to False the afterfresh event will not be triggered

Helpful resources

Announcements
July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.

Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.