Solved: Re: How do I change local source files within pyth...

COS019970 · ‎07-05-2024

Below is the code that I have got that is working but I have been trying to replace the local source excel files with Power Bi query tables which I have already created, called 'Contract Mating' for Contract and 'Contract Mating Invoiced Sales' for Credit.

import pandas as pd
import numpy as np
import os
from openpyxl import load_workbook
from datetime import datetime, date

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Change directory to the appropriate path
os.chdir(r'C:\Users\cosullivan\Documents\Contract Mating')
Contract_Mating_Listing = 'Return_list_27_June'
Test = 'Test'
Version = '_V4'

# Bring in the file that includes credit amounts
Credit = pd.read_excel(Test + '.xlsx', header=0, sheet_name='IT_Data')

# Bring in the file that includes contract_returns and mating recommendations
Contract = pd.read_excel(Contract_Mating_Listing + '.xlsx', header=0)

# Keep the herds that are in the credit file for testing purposes
Contracts = Contract[Contract['HERD_Dam'].isin(Credit['HERD_Dam'])].reset_index()

# Calculate the number of dams that received each sire recommendation per herd
Contract_dams_Herd_sire = Contracts.groupby(['HERD_Dam', 'Sire']).agg({'Dam': ['count']}).reset_index()
Contract_dams_Herd_sire.columns = ['HERD_Dam', 'Sire', 'No_of_Dams_Recommended_to_sire']
Contract_dams_Herd_sire['Max_number_of_straws_to_credit_per_sire'] = Contract_dams_Herd_sire['No_of_Dams_Recommended_to_sire'] * 2

# Calculate the number of dams recommended for matings in each herd
Contract_dams_herd = Contracts.groupby('HERD_Dam')['Dam'].nunique().reset_index()
Contract_dams_herd.columns = ['HERD_Dam', 'No_of_Dams_Contracted_to_herd']
Contract_dams_herd['Max_number_of_straws_to_credit_per_herd'] = Contract_dams_herd['No_of_Dams_Contracted_to_herd'] * 2

# Merge summary herd stats
Contract_summary = pd.merge(left=Contract_dams_Herd_sire, right=Contract_dams_herd, how='outer', left_on='HERD_Dam', right_on='HERD_Dam')

# Merge in herd stats with the sales figures from IT
Contract_Credit = pd.merge(left=Credit, right=Contract_summary, how='outer', left_on=['HERD_Dam', 'Sire'], right_on=['HERD_Dam', 'Sire'])

# Replace any sires who were contracted but have not been purchased to 0
Contract_Credit['Total DIY Quantity purchased'] = Contract_Credit['Total DIY Quantity purchased'].fillna(0)
# Replace any sires who were contracted but have no value to 0
Contract_Credit['Average cost per straw Amount'] = Contract_Credit['Average cost per straw Amount'].fillna(0)

# Calculate the maximum number of straws to credit based on purchases
def potential_straw_credit(x):
    if x['Max_number_of_straws_to_credit_per_sire'] >= x['Total DIY Quantity purchased']:
        return x['Total DIY Quantity purchased']
    else:
        return x['Max_number_of_straws_to_credit_per_sire']

Contract_Credit['Potential_Straws_to_credit'] = Contract_Credit.apply(potential_straw_credit, axis=1)

# Sort by the average cost of the straw per herd so that you end up crediting for the more expensive ones first
Contract_Credit = Contract_Credit.sort_values(by=['HERD_Dam', 'Average cost per straw Amount'], ascending=False).reset_index(drop=True)

# Cumulative sum the straws that were purchased off what needs to be credited
Contract_Credit['cumsum_herd'] = Contract_Credit.groupby(['HERD_Dam'])['Potential_Straws_to_credit'].cumsum()
Contract_Credit['previous'] = Contract_Credit.groupby(['HERD_Dam'])['cumsum_herd'].shift(1).fillna(0)

# Function to work out should the purchase be credited - any minus will be blanked as enough has been credited
def Cumsum(x):
    if x['cumsum_herd'] <= x['Max_number_of_straws_to_credit_per_herd']:
        return x['Potential_Straws_to_credit']
    elif x['cumsum_herd'] > x['Potential_Straws_to_credit']:
        return x['Potential_Straws_to_credit'] - x['previous']
    else:
        return 0

Contract_Credit['Straws_to_credit'] = Contract_Credit.apply(Cumsum, axis=1)
Contract_Credit['Straws_to_credit'] = np.where(Contract_Credit['Straws_to_credit'] < 0, 0, Contract_Credit['Straws_to_credit'])

# Calculate the amount to credit
Contract_Credit['Amount_to_credit'] = Contract_Credit['Straws_to_credit'] * Contract_Credit['Average cost per straw Amount']

# Summarize the total credit amount per herd
Contract_Credit_Per_Herd = Contract_Credit.groupby(['HERD_Dam']).agg({'Amount_to_credit': ['sum']}).reset_index()
Contract_Credit_Per_Herd.columns = ['HERD_Dam', 'Total_Credit_Amount (€)']

# Output final DataFrame to Power BI
dataset = Contract_Credit_Per_Herd

COS019970 · ‎07-09-2024

We ended up getting what we needed by recreating what the script was doing by just implementing the steps in Power BI but thanks anyway for the help!

View solution in original post

COS019970 · ‎07-08-2024

I am getting the following error: Formula.Firewall: Query 'Contract Mating' (step 'Run Python script') references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.

import pandas as pd
import numpy as np

data = dataset

# Split the combined dataset into 'Credit' and 'Contract' DataFrames based on the 'Source' column
Credit = data[data['Source'] == 'Contract Mating Invoiced Sales'].copy()
Contract = data[data['Source'] == 'Contract Mating'].copy()

# Bring back all the herds in the contract file
Contracts = Contract.reset_index(drop=True)

# Calculate the number of dams that received each sire recommendation per herd
Contract_dams_Herd_sire = Contracts.groupby(['HERD_Dam', 'Sire']).agg({'Dam': ['count']}).reset_index()
Contract_dams_Herd_sire.columns = ['HERD_Dam', 'Sire', 'No_of_Dams_Recommended_to_sire']
Contract_dams_Herd_sire['Max_number_of_straws_to_credit_per_sire'] = Contract_dams_Herd_sire['No_of_Dams_Recommended_to_sire'] * 2

# Calculate the number of dams recommended for matings in each herd
Contract_dams_herd = Contracts.groupby('HERD_Dam')['Dam'].nunique().reset_index()
Contract_dams_herd.columns = ['HERD_Dam', 'No_of_Dams_Contracted_to_herd']
Contract_dams_herd['Max_number_of_straws_to_credit_per_herd'] = Contract_dams_herd['No_of_Dams_Contracted_to_herd'] * 2

# Merge summary herd stats
Contract_summary = pd.merge(left=Contract_dams_Herd_sire, right=Contract_dams_herd, how='outer', left_on='HERD_Dam', right_on='HERD_Dam')

# Merge in herd stats with the sales figures from IT
Contract_Credit = pd.merge(left=Credit, right=Contract_summary, how='outer', left_on=['HERD_Dam', 'Sire'], right_on=['HERD_Dam', 'Sire'])

# Replace any sires who were contracted but have not been purchased to 0
Contract_Credit['Total DIY Quantity purchased'] = Contract_Credit['Total DIY Quantity purchased'].fillna(0)
# Replace any sires who were contracted but have no value to 0
Contract_Credit['Average cost per straw Amount'] = Contract_Credit['Average cost per straw Amount'].fillna(0)

# Calculate the maximum number of straws to credit based on purchases
def potential_straw_credit(x):
    if x['Max_number_of_straws_to_credit_per_sire'] >= x['Total DIY Quantity purchased']:
        return x['Total DIY Quantity purchased']
    else:
        return x['Max_number_of_straws_to_credit_per_sire']

Contract_Credit['Potential_Straws_to_credit'] = Contract_Credit.apply(potential_straw_credit, axis=1)

# Sort by the average cost of the straw per herd so that you end up crediting for the more expensive ones first
Contract_Credit = Contract_Credit.sort_values(by=['HERD_Dam', 'Average cost per straw Amount'], ascending=False).reset_index(drop=True)

# Cumulative sum the straws that were purchased off what needs to be credited
Contract_Credit['cumsum_herd'] = Contract_Credit.groupby(['HERD_Dam'])['Potential_Straws_to_credit'].cumsum()
Contract_Credit['previous'] = Contract_Credit.groupby(['HERD_Dam'])['cumsum_herd'].shift(1).fillna(0)

# Function to work out should the purchase be credited - any minus will be blanked as enough has been credited
def Cumsum(x):
    if x['cumsum_herd'] <= x['Max_number_of_straws_to_credit_per_herd']:
        return x['Potential_Straws_to_credit']
    elif x['cumsum_herd'] > x['Potential_Straws_to_credit']:
        return x['Potential_Straws_to_credit'] - x['previous']
    else:
        return 0

Contract_Credit['Straws_to_credit'] = Contract_Credit.apply(Cumsum, axis=1)
Contract_Credit['Straws_to_credit'] = np.where(Contract_Credit['Straws_to_credit'] < 0, 0, Contract_Credit['Straws_to_credit'])

# Calculate the amount to credit
Contract_Credit['Amount_to_credit'] = Contract_Credit['Straws_to_credit'] * Contract_Credit['Average cost per straw Amount']

# Summarize the total credit amount per herd
Contract_Credit_Per_Herd = Contract_Credit.groupby(['HERD_Dam']).agg({'Amount_to_credit': ['sum']}).reset_index()
Contract_Credit_Per_Herd.columns = ['HERD_Dam', 'Total_Credit_Amount (€)']

# Output final DataFrame to Power BI
result = Contract_Credit_Per_Herd

lbendlin · ‎07-08-2024

Please show the Power Query steps before and after the Python script step.

COS019970 · ‎07-09-2024

CombinedData TableCombinedData TableContract Mating Invoiced Sales TableContract Mating TableCombinedData Table

lbendlin · ‎07-09-2024

What's the source of the CombinedData table? You may have to run the script in each of the individual sources.

COS019970 · ‎07-09-2024

We ended up getting what we needed by recreating what the script was doing by just implementing the steps in Power BI but thanks anyway for the help!

Anonymous · ‎07-10-2024

Hi,@COS019970

We are very glad to know that the issue has been resolved. If you wish, consider accepting your solution as a solution that will also benefit other community members who have the same problem as you and find a solution faster.

Of course, if there is anything else we can do for you, please do not hesitate to contact us.

Looking forward to your reply.

Best Regards,

Leroy Lu

Shannui · ‎07-05-2024

Thanks for your sharing, but I have a question
How can I change your data source of my report with "DirectQuery connection" on service by local source (may be code), due to I want to reuse my report on local.
Thanks your response!

foodd · ‎07-06-2024

Hello @Shannui , please create a new post with your question, instead of replying to a post that has been answered by @lbendlin , and where your question is unrelated to the Original Post.

lbendlin · ‎07-05-2024

You need to reference a prior step in your Power Query that acts as a sort of dataframe. And you are required to return something at the end, ideally in a similar table format.

COS019970 · ‎07-08-2024

Thanks for your reply that is what chatgpt was telling me to do and was trying to follow it but was not working. Could you tell me how to do this please?

lbendlin · ‎07-08-2024

how to do what? There is a standard process where Power Query uses one of its connectors to acquire the raw data from an initial source. You seem to be attempting to circumvent that process and use your own data acquisition. What made you choose that approach?

How do I change local source files within python script to reference Power BI query tables

Helpful resources

Join our Fabric User Panel

Power BI Monthly Update - February 2026

FabCon Atlanta 2026

How to Get Your Question Answered Quickly

FabCon is coming to Atlanta