Anonymous
Not applicable

Inconsistent CSV Update/Overwrite with Python

Context: my goal is to have two types of Excel files/CSVs:

  • Current: Always have updated information (many files)
  • Ledger: A log of past information from the above files (using date modified as date) (one file)

I got this to happen once by doing the following:

  1. Checked out this site: https://www.thebiccountant.com/2018/12/29/export-data-from-power-bi-to-csv-using-python/
  2. Pulled in Current files and combined in PowerBI
  3. Pulled in Ledger
  4. In python, checked if Current files rows were in Ledger already
    1. If they weren't, added to Ledger in PowerBI (not yet on my computer)
  5. Wrote the Ledger to the same filename and file path as the original Ledger (overwrite, now on my computer), using Power Query and Python code similar to the link above (a sketch of this export step is below)
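
For reference, the write in step 5 follows the pattern from the linked article: a "Run Python script" step in Power Query that writes out the table Power BI hands to the script as a DataFrame named dataset. A rough sketch of that step only (the path here is a placeholder, not my real one):

# Power BI passes the previous query step into this script as a DataFrame named `dataset`
path = r"C:\Temp\Ledger.csv"  # placeholder path, not the real location

# Overwrite the existing Ledger file on disk with the current table
dataset.to_csv(path, mode='w', index=False, header=True)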

 

I know I got it to happen because my Ledger originally only had one Test row, and it now additionally has every Current row.

 

HOWEVER, it will not happen fully again. Here is what's up:

  1. I change one line of one file in Current files
  2. This DOES change that line from my direct PowerBI pull. Current Files combined is updated
  3. This DOES NOT change the output of my python code, the Ledger
  4. Therefore, the Ledger on my computer is overwritten by the same file

 

Here is my python code for reference, with some extra notes:

 

import pandas as pd
import numpy as np
import datetime

log = dataset

# Returns the Unique ID of a row
def un_id(series):
    return ((str(series["Therapist"]) + "_" + str(series["Agency"]) + "_" + str(series["Provider Office"])).replace(" - ", "-").replace(" ", "_"))

# Causes the log (Ledger) table and the prov (Current combined) table to both have IDs
log.loc[:, "ID"] = log.apply(un_id, 1)
prov.loc[:, "ID"] = prov.apply(un_id, 1)

# Dates were taken in as strings, changes them to timestamps
log.loc[:, "Date"] = log["Date"].map(pd.to_datetime)
prov.loc[:, "Date"] = log["Date"].map(pd.to_datetime)

# Checks if a row [in prov/Current combined] is in Ledger. Returns the series if it is not.
def not_in_log(series):
    # If series is not in log, returns series.
    # Else, returns nothing.
    # Finds rows in Ledger with the same Unique ID as in prov/Current combined
    matches = log[log["ID"] == series["ID"]]
    # If none of these rows has the same date as the prov/Current combined Date, returns that row
    if series["Date"] not in list(matches["Date"]):
        return series

# Goes through all prov/Current combined rows. Returns the ones that are not in Ledger
new_rows = prov.apply(not_in_log, 1)
new_rows = new_rows.dropna(how="all")
new_rows = pd.DataFrame(new_rows)

# Adds new rows to the Ledger
log = log.append(new_rows, ignore_index = True)

# Makes sure the Ledger has the right column names
log = log.loc[:, list(prov.columns)]

Thanks in advance for the help!

2 REPLIES
v-juanli-msft
Community Support

Hi @Anonymous 

It seems the first goal is achieved with your code.

  • Current: Always have updated information (many files)
  • Ledger: A log of past information from the above files (using date modified as date) (one file)

The second one means storing historic data in files; these historic files should not be overwritten.

If so, please refer to a brilliant article: Storing and using information from a dynamic data source using PBI desktop.
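
(Not code from that article, just a minimal illustration of the idea inside a Python step: write each refresh to a new date-stamped file instead of overwriting a single Ledger file. The folder path below is a placeholder.)

import datetime

# `dataset` is the table Power BI passes into the Python step
# Snapshot each refresh to its own date-stamped CSV so older files are never overwritten
folder = r"C:\Temp\LedgerSnapshots"  # placeholder folder
stamp = datetime.date.today().isoformat()  # e.g. "2020-03-05"
dataset.to_csv(folder + "\\Ledger_" + stamp + ".csv", index=False)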

 

Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accepting it as the solution to help other members find it more quickly.

 

Anonymous
Not applicable

Thanks for the pointer!

 

I could certainly run my Python in a standalone script that I schedule with Task Scheduler; in that scenario, I wouldn't even need to go through PowerBI for the updates.
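
Roughly something like this, run entirely outside PowerBI (the paths are placeholders, and the simple drop_duplicates here just stands in for the ID/Date check from my first post):

import glob
import pandas as pd

ledger_path = r"C:\Temp\Ledger.csv"                    # placeholder path
current_files = glob.glob(r"C:\Temp\Current\*.csv")    # placeholder folder

# Combine all Current files into one table
current = pd.concat((pd.read_csv(f) for f in current_files), ignore_index=True)

# Append to the existing Ledger, keeping only rows that aren't already in it
ledger = pd.read_csv(ledger_path)
combined = pd.concat([ledger, current], ignore_index=True).drop_duplicates()

# Overwrite the Ledger on disk
combined.to_csv(ledger_path, index=False)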

 

However, because of the data hosting/security context I'm in, it is preferable that everything stays in PowerBI (and OneDrive, where the files are stored). 

 

I'm confused because I was able to get the process to happen once (the initial Ledger "fill"), but it won't happen a second time to update.

 

I've also gotten this message, which I don't know how to fix; I think it has something to do with security (file path removed for privacy):

 

DataSource.Error: ADO.NET: Python script error.
Traceback (most recent call last):
  File "PythonScriptWrapper.PY", line 17, in <module>
    dataset.to_csv(path, mode = 'w', index = None, header = True, quoting = csv.QUOTE_ALL, chunksize = None, decimal= '.')
  File "C:\Users\kwilliams\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 3228, in to_csv
    formatter.save()
  File "C:\Users\kwilliams\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\formats\csvs.py", line 183, in save
    compression=self.compression,
  File "C:\Users\kwilliams\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\common.py", line 399, in _get_handle
    f = open(path_or_buf, mode, encoding=encoding, newline="")
PermissionError: [Errno 13] Permission denied: 'FILEPATH/Provider_Log.csv'

Details:
    DataSourceKind=Python
    DataSourcePath=Python
    Message=Python script error.
Traceback (most recent call last):
  File "PythonScriptWrapper.PY", line 17, in <module>
    dataset.to_csv(path, mode = 'w', index = None, header = True, quoting = csv.QUOTE_ALL, chunksize = None, decimal= '.')
  File "C:\Users\kwilliams\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 3228, in to_csv
    formatter.save()
  File "C:\Users\kwilliams\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\formats\csvs.py", line 183, in save
    compression=self.compression,
  File "C:\Users\kwilliams\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\common.py", line 399, in _get_handle
    f = open(path_or_buf, mode, encoding=encoding, newline="")
PermissionError: [Errno 13] Permission denied: 'FILEPATH/Provider_Log.csv'

    ErrorCode=-2147467259
    ExceptionType=Microsoft.PowerBI.Scripting.Python.Exceptions.PythonScriptRuntimeException
