Hi!
I'm having a problem setting up incremental refresh on one table. The Power Query steps are as follows:
let
    Źródło = SharePoint.Files("sharepoint location", [ApiVersion = 15]),
    #"Ścieżka Faktury ING Faktoring" = Table.SelectRows(Źródło, each Text.Contains([Folder Path], "Faktury ING Faktoring")),
    #"Rozszerzenie .pdf" = Table.SelectRows(#"Ścieżka Faktury ING Faktoring", each ([Extension] = ".pdf")),
    #"Nazwa zawiera ""-ZIN-""" = Table.SelectRows(#"Rozszerzenie .pdf", each Text.Contains([Name], "-ZIN-")),
    #"Rozwinięty element Attributes" = Table.ExpandRecordColumn(#"Nazwa zawiera ""-ZIN-""", "Attributes", {"Size"}, {"Attributes.Size"}),
    #"Posortowano wiersze" = Table.Sort(#"Rozwinięty element Attributes",{{"Attributes.Size", Order.Descending}}),
    #"Zmieniono typ" = Table.TransformColumnTypes(#"Posortowano wiersze",{{"Attributes.Size", Int64.Type}}),
    #"Przefiltrowano wiersze" = Table.SelectRows(#"Zmieniono typ", each [Attributes.Size] > 69000),
    #"Usunięto inne kolumny" = Table.SelectColumns(#"Przefiltrowano wiersze",{"Content", "Name", "Date modified"}),
    #"Filtrowane pliki ukryte1" = Table.SelectRows(#"Usunięto inne kolumny", each [Attributes]?[Hidden]? <> true),
    #"Wywołaj funkcję niestandardową1" = Table.AddColumn(#"Filtrowane pliki ukryte1", "Przekształć plik (5)", each #"Przekształć plik (5)"([Content])),
    #"Usunięto inne kolumny1" = Table.SelectColumns(#"Wywołaj funkcję niestandardową1",{"Name", "Date modified", "Przekształć plik (5)"}),
    #"Rozwinięto kolumnę tabeli1" = Table.ExpandTableColumn(#"Usunięto inne kolumny1", "Przekształć plik (5)", Table.ColumnNames(#"Przekształć plik (5)"(#"Przykładowy plik (5)"))),
    #"Przefiltrowano wiersze1" = Table.SelectRows(#"Rozwinięto kolumnę tabeli1", each ([Column3] = "Razem")),
    #"Usunięto inne kolumny2" = Table.SelectColumns(#"Przefiltrowano wiersze1",{"Name", "Date modified", "Column3", "Column6"}),
    #"Zmieniono typ z ustawieniami regionalnymi" = Table.TransformColumnTypes(#"Usunięto inne kolumny2", {{"Column6", type number}}, "de-DE"),
    #"Kolumna przestawna" = Table.Pivot(#"Zmieniono typ z ustawieniami regionalnymi", List.Distinct(#"Zmieniono typ z ustawieniami regionalnymi"[Column3]), "Column3", "Column6", List.Sum),
    #"Przefiltrowano wiersze2" = Table.SelectRows(#"Kolumna przestawna", each [Date modified] >= RangeStart and [Date modified] < RangeEnd)
in
    #"Przefiltrowano wiersze2"
What it does: it connects to a SharePoint folder with .pdf files, filters rows to determine whether a file has two pages or just one, then expands the Content of those files with data from the second pages only. The final table looks like this, very simple:
I realize that the RangeStart/RangeEnd filter should probably be near the beginning of the query, but this is just for the sake of testing.
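Incidentally, the RangeStart/RangeEnd pair defines a half-open window: rows with RangeStart <= date < RangeEnd are kept, so a partition boundary never counts a row twice. A minimal Python sketch of that filter logic (file names and dates are made up for illustration):

```python
from datetime import datetime

def in_refresh_window(modified, range_start, range_end):
    """Half-open interval test used by incremental refresh filters:
    RangeStart is inclusive, RangeEnd is exclusive, so a row landing
    exactly on a partition boundary belongs to exactly one partition."""
    return range_start <= modified < range_end

rows = [
    {"Name": "a.pdf", "Date modified": datetime(2024, 1, 15)},
    {"Name": "b.pdf", "Date modified": datetime(2024, 4, 1)},   # exactly on the boundary
    {"Name": "c.pdf", "Date modified": datetime(2024, 6, 30)},
]

start, end = datetime(2024, 1, 1), datetime(2024, 4, 1)
kept = [r["Name"] for r in rows if in_refresh_window(r["Date modified"], start, end)]
print(kept)  # ['a.pdf'] — b.pdf falls on RangeEnd and goes to the next partition
```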
It works fine with incremental refresh turned off, but the moment I turn it on (for example, 1 year archive, 3 months refresh) there is a problem.
I publish it and try to refresh the model, which after about 10 minutes of trying gives the following error:

Data source error: Expression.Error: There weren't enough elements in the enumeration to complete the operation. #table({"Content", "Name"}, {}). The exception was raised by the IDbCommand interface.

I've been looking for an answer online and testing different approaches in Power Query for a few days now. I can't figure this out.
One more thing I don't understand: I did very similar steps for a different table, and it works fine with incremental refresh. The only difference was that Attributes.Size was filtered to be smaller than 69000 (instead of larger), which let me pick Content data from the first pages (instead of the second). I know this might not be a perfect solution, but it works in this case.
I would appreciate any help, thank you!
Please explain the reasoning for using incremental refresh. As you know, SharePoint.Files does not fold, and even if it did, your subsequent transforms would break folding anyway, so you will not get any processing-time advantage.
"Date modified" is not a usable column for incremental refresh. You need an immutable column like "Date created"; otherwise you will get data duplication across partitions.
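To make that duplication risk concrete: archived partitions are never re-evaluated, so if a file's Date modified is later touched, an incremental refresh loads it again into a current partition while the stale copy stays in the archive. A toy Python sketch of the mechanism (invoice names, dates, and partition layout are invented for illustration):

```python
from datetime import datetime

# Partitions keyed by month; archived partitions are never reloaded.
archive = {"2023-01": [{"Name": "inv-001.pdf", "Date modified": datetime(2023, 1, 5)}]}

# The same file is later touched on SharePoint (metadata edit, retention
# policy, ...), so its Date modified moves into the current refresh window.
source = [{"Name": "inv-001.pdf", "Date modified": datetime(2024, 3, 2)}]

range_start, range_end = datetime(2024, 3, 1), datetime(2024, 4, 1)
refresh_partition = [r for r in source
                     if range_start <= r["Date modified"] < range_end]

# The model is the union of all partitions — the invoice now appears twice.
all_rows = archive["2023-01"] + refresh_partition
names = [r["Name"] for r in all_rows]
print(names)  # ['inv-001.pdf', 'inv-001.pdf'] — the same invoice duplicated
```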
For general guidance - incremental refresh partitions should have millions of rows to make sense.
We have folders with invoices in .pdf format on SharePoint going back to 2022. Invoices don't change; new ones are just added each month. That's why I think Date modified would be fine.
With each Power BI refresh, all the data gets refreshed, which gets slower and slower, when in fact it should only be checking the newest data, for example the last 3 months.
I've watched this YouTube guide https://youtu.be/IVMdg16yBKE, which says that incremental refresh can also be used with non-foldable sources.
I've also tried Dataflows, which someone suggested in another topic, but honestly I hated it. I couldn't make it work either; it kept adding automatic steps to my query that I couldn't remove.
With each Power BI refresh, all the data gets refreshed, which gets slower and slower, when in fact it should only be checking the newest data, for example the last 3 months.
Only if it folds
which says that Incremental refresh can also be used with non-foldable sources.
That is correct, but you will not get the expected time savings. All you get is partition management.
In this case, what would you suggest? Refreshing is starting to take around 12 minutes right now, and it will grow over time.
12 minutes is not something you want to waste your energy on by building out all the incremental refresh machinery. You should start getting concerned when the time is closer to 2.5 hours.
Run an independent process that extracts the necessary metadata from your PDFs, store that as CSV or Parquet on the same SharePoint, and ingest that.
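One possible shape for such an independent process, sketched in Python: read each PDF once, keep only the fields the report needs, and write a flat CSV. The use of pypdf here is an assumption; any extraction tool (PowerShell, Power Automate, etc.) could produce the same CSV, and the file names shown are placeholders.

```python
import csv
from pathlib import Path

def extract_rows(pdf_dir):
    """Pull (name, page count, text) out of each PDF in a folder.
    pypdf is an assumed dependency (pip install pypdf); any PDF
    library could fill this role."""
    from pypdf import PdfReader
    rows = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        reader = PdfReader(pdf)
        text = "".join(page.extract_text() or "" for page in reader.pages)
        rows.append({"Name": pdf.name, "Pages": len(reader.pages), "Text": text})
    return rows

def write_csv(rows, out_path):
    """Write the extracted rows to a flat CSV that Power BI can ingest quickly."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Name", "Pages", "Text"])
        writer.writeheader()
        writer.writerows(rows)

# Example with hand-made rows, so the CSV shape is visible without any PDFs:
write_csv([{"Name": "inv-001.pdf", "Pages": 2, "Text": "Razem 123,45"}], "invoices.csv")
print(open("invoices.csv", encoding="utf-8").read().splitlines()[0])  # Name,Pages,Text
```

Scheduled to run whenever new files arrive, this turns the slow PDF parsing into a one-time cost per file, and the Power BI side only ever reads a small CSV.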
Thank you. Could you please explain more about creating an independent process to store the .pdf data as CSV?
I would like to automate this as much as possible, so that when a user uploads a new .pdf file to SharePoint, Power BI automatically does the rest.
I'm not sure I understand correctly what you propose, but wouldn't that just mean there would be two reports to refresh: one for transferring the .pdfs to CSV and a second for pulling data from the CSV? Isn't that a little counterproductive? How would I save time when the second report would take ~12 minutes to refresh, and then I would additionally have to refresh the first one as well?
I'm sorry if these are dumb questions; I'm not that experienced.
This is not something you do with Power BI (it is not designed to write back to the data source). You would use tools like PowerShell or Power Automate.
Hi @Pojzon ,
I agree with lbendlin.
Has your problem been solved?
If so, please accept the helpful reply as the solution so that more people can benefit from it.
As lbendlin said, you can create a flow to save refresh time.
Log in to https://powerautomate.microsoft.com/en-us/ and create the flow.
Here's the blog for details:
Refresh your Power BI dataset using Microsoft Flow | Microsoft Power BI Blog | Microsoft Power BI
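Under the hood, a flow like the one in that blog calls the Power BI REST API's "Refresh Dataset" endpoint. A minimal Python sketch of the same call; the dataset ID, workspace ID, and bearer token below are placeholders, and acquiring the token (Azure AD / MSAL) is out of scope here.

```python
from urllib import request

API_BASE = "https://api.powerbi.com/v1.0/myorg"

def refresh_url(dataset_id, group_id=None):
    """Build the Power BI REST 'Refresh Dataset' endpoint URL.
    group_id (the workspace) is optional; without it the call
    targets My Workspace."""
    if group_id:
        return f"{API_BASE}/groups/{group_id}/datasets/{dataset_id}/refreshes"
    return f"{API_BASE}/datasets/{dataset_id}/refreshes"

def trigger_refresh(dataset_id, token, group_id=None):
    """POST an empty body to start a refresh; the service replies
    202 Accepted when the refresh is queued."""
    req = request.Request(
        refresh_url(dataset_id, group_id),
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
    return request.urlopen(req)

# Placeholder IDs, just to show the URL shape:
print(refresh_url("abcd-1234"))
print(refresh_url("abcd-1234", group_id="ws-5678"))
```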
Best Regards,
Stephen Tao
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.