Hi all,
Losing hair over this one!
I have a dataset of some 15 million rows (16GB in size), saved as a CSV that is stored on SharePoint. It's far from ideal, but it's the nature of this beast I'm afraid, and I don't have the power to change it.
The issue has always been publishing it, or in fact doing any kind of refresh, as it takes forever. However, I recently found out about the trick of adding a row-limiting parameter in Power Query that can be switched once uploaded, so that the Power BI Service handles the long refresh. So I set my Keep Rows to 10,000, publish, then change the parameter from "some" to "all" once it's on the Service.
The refresh works fine in Desktop, and fine on the Service while the parameter is still "some". But once I change it to "all" to begin the refresh of all 15 million rows, at some point I get the error "Column does not exist in dataset".
After a lot of banging my head on my screen, I stumbled across a single comment on a post that stated that if the Keep Rows step was placed after any steps that reference a column name, it would work fine in Desktop but fail when refreshing on the Service. I literally gasped in disbelief and immediately gave it a go.
It didn't work. Well, it kind of did. I was no longer getting the error about a column not existing. Now I was getting "DataFormat.Error: The number of items in the list is too large.", followed by a long list of very large numbers that PBI says it can support, which my 15 million rows apparently exceed.
Is that the case? Is my file too large? The numbers referenced by the error had at least 7 or 8 digits followed by e15 in some cases. And I can't be the only person to have uploaded 15GB of csv.
Any thoughts or suggestions would be massively appreciated. I'm at my wits' end.
Hi @EpicTriffid
You're dealing with two separate but related issues:
"Column does not exist in dataset" – This typically happens when the Keep Rows step is placed too early in the query, before Power Query finalizes the column structure (e.g., before Promoted Headers or Changed Type). Power BI Desktop might handle this leniently due to caching and more flexible evaluation, but the Power BI Service has stricter schema validation. When you change the parameter to "All" in the Service, it fails because the columns aren't guaranteed to exist at that point. To fix this, make sure your row filtering step comes after headers are promoted and types are applied.
"DataFormat.Error: The number of items in the list is too large" – This is a hard limit you're hitting. You're trying to load a 16GB CSV file via SharePoint – that's not a sustainable or supported approach. Power BI doesn't load this type of file in a partitioned or indexed way. Instead, it reads it row-by-row through HTTP, which easily breaks when the data volume is that high. Even in Premium capacity, the problem isn't with model size or refresh capacity – it's with the source and how it's being accessed.
What to do instead:
Split the large CSV into multiple smaller files (e.g., by month or year), place them in a SharePoint folder, and use "Combine Files" in Power Query (see the sketch after this list).
If possible, move the data into a proper structured data source – like SQL Server, Azure Data Lake, or even Dataflows.
Consider setting up Incremental Refresh so you're not reloading historical data on every refresh.
Avoid using dynamic row-limit logic (like "Keep Rows" with a parameter) before schema is fully defined.
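For the folder approach in the first bullet, here is a rough hand-written equivalent (the site URL, folder path, and column names are placeholders; the "Combine Files" button will generate helper queries rather than a single query like this):

```
// Hypothetical sketch -- adjust the site URL, folder path, and column names.
let
    Source = SharePoint.Files("https://contoso.sharepoint.com/sites/Reports", [ApiVersion = 15]),
    // Keep only the split CSVs in the target folder
    OnlyCsvs = Table.SelectRows(
        Source,
        each [Extension] = ".csv" and Text.Contains([Folder Path], "/SplitData/")
    ),
    // Parse each file into a table
    Parsed = Table.AddColumn(
        OnlyCsvs,
        "Data",
        each Table.PromoteHeaders(Csv.Document([Content], [Delimiter = ",", Encoding = 65001]))
    ),
    // Stack all of the per-file tables into one
    Combined = Table.Combine(Parsed[Data]),
    ChangedType = Table.TransformColumnTypes(Combined, {{"OrderDate", type date}, {"Amount", type number}})
in
    ChangedType
```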
Bottom line: Power BI Service is not built to handle 15M-row CSVs over SharePoint. It may work in Desktop, but that doesn’t mean it’s production-ready. The real fix is to shift to a better data architecture – not to tweak Power Query steps to squeeze one more refresh through.
If this post helps, then please consider Accepting it as the solution and giving Kudos to help the other members find it more quickly
Hi Rita,
Really sorry to bring this up again, but I have now implemented the PQ combining, and I haven't had a chance to see if it works because the "Column does not exist in dataset" error keeps appearing. I have done what you said and placed my "Keep Rows" step after "Changed Type", but it keeps giving me the same error on refresh in the PBI Service. These are my current PQ steps:
Any ideas?
Nvm! @Ritaf1983
I was still referencing Source in the ReturnAllRows argument, rather than the expanded table!
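For anyone else who trips over the same thing, the offending step looked roughly like the commented line below and needed to point at the combined/expanded table instead (the names are placeholders, not my exact query):

```
// Wrong -- the row limit is applied to the raw SharePoint file list ("Source"),
// so the columns from the expanded/combined table don't exist yet when the
// Service validates the schema:
// KeptRows = if ReturnAllRows = "all" then Source else Table.FirstN(Source, 10000)

// Right -- apply it to the expanded/combined table instead
// ("ExpandedTable" stands in for whatever that step is actually called):
KeptRows = if ReturnAllRows = "all" then ExpandedTable else Table.FirstN(ExpandedTable, 10000)
```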
Hi @Ritaf1983,
Thank you so much for the incredibly comprehensive reply. Your explanations were fantastic and codified what I was already thinking!
Happy to help and sorry that there is no simple solution 🙂
Hi @Ritaf1983,
Yup, it's an annoying quirk of being at an institution where they just won't allow the use of SQL for datasets like this! Splitting up the source file and recombining in PQ should work as a temporary fix until I can get them to sort out their data storage!
Hi @EpicTriffid,
I would recommend trying the following steps:
1. Split the data into multiple smaller files and then merge them all in the Power Query Editor (see the sketch below).
2. Set the parallel loading of tables so that Power BI doesn't process/load more than 2 files at once. To change this, go to Options and settings > Options > Global/Current File > Parallel loading of tables and adjust the settings there.
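If the split files end up as separate queries rather than one folder query, appending them in the Power Query Editor boils down to Table.Combine. A rough sketch with placeholder query names:

```
// Hypothetical sketch -- Sales2023, Sales2024 and Sales2025 are placeholder
// queries, each one loading a single split CSV file.
let
    Combined = Table.Combine({Sales2023, Sales2024, Sales2025})
in
    Combined
```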
Thanks,
If you found this solution helpful, please consider giving it a Like👍 and marking it as the Accepted Solution✔. This helps improve visibility for others who may be facing the same issue.