How many columns is "too many columns"?
I ask because I have a table with about 850 columns in SQL, most of which is relevant to the analysis happening in the tabular model. I remember reading somewhere that Power BI prefers tall, narrow tables over wide tables. How should one handle importing a table with so many columns?
Many of the columns are a userID assigned at a particular step or a date assigned at a particular step... not exclusive to one another.
I've used a dataset with a few hundred columns, but it gets slow.
I once had a very large data file: a 25,000-column pipe-delimited file whose header alone was over 2 MB.
It killed every program I tried, so I ended up using R to process it.
It still took 15 minutes to load the header in R, which wasn't helped by the fact that some header names include a pipe. That's just adding insult to injury.
However, I used R to remove columns with no data or only a single value.
Finally, after profiling the data, I made a simple mapping CSV that had nice column names and allowed me to build a dataset with just the columns I needed.
Ignoring the original header, R produced a nice clean dataset in a few seconds. Magic.
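For illustration, here's a minimal R sketch of that clean-up, under assumed names: a hypothetical pipe-delimited source file (wide_export.txt) and a hand-made mapping.csv with source_name and nice_name columns.

```r
# Sketch: drop empty or constant columns from a wide pipe-delimited file,
# then keep and rename only the columns listed in a hand-made mapping CSV.
# File names and the mapping columns (source_name, nice_name) are assumed.
library(data.table)

raw <- fread("wide_export.txt", sep = "|")

# Keep only columns with more than one distinct non-NA value.
keep_cols <- names(raw)[vapply(raw, function(col) uniqueN(col, na.rm = TRUE) > 1,
                               logical(1))]
slim <- raw[, ..keep_cols]

# mapping.csv is built by hand after profiling; this sketch assumes every
# mapped column survived the filter above.
map  <- fread("mapping.csv")
slim <- slim[, map$source_name, with = FALSE]
setnames(slim, old = map$source_name, new = map$nice_name)

fwrite(slim, "clean_dataset.csv")
```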
Pivoting is always good with lots of columns, as tools like Power BI work better with large numbers of rows than with large numbers of columns.
I have another Power BI file I use for dynamic reconciliation, where I've split the columns into sets by data type and then pivoted them in SQL. This gives me four queries with different data types and lots of rows, but it works well. It makes it easy to join two server datasets, and I can also use a slicer or other visualisations to see which columns have differences.
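The poster did that reshape in SQL; as a rough illustration only, the same idea sketched in R with tidyr, producing one long table per data type. The source file name and the record_id key are assumptions.

```r
# Sketch: reshape a wide table into long (record_id, attribute, value) rows,
# one long table per data type, so the BI tool sees many rows rather than
# many columns. The source file and record_id key are assumed.
library(dplyr)
library(tidyr)

wide <- read.delim("fact_wide.txt", sep = "|", stringsAsFactors = FALSE)
wide$record_id <- seq_len(nrow(wide))

long_numeric <- wide %>%
  select(record_id, where(is.numeric)) %>%
  pivot_longer(-record_id, names_to = "attribute", values_to = "value")

long_text <- wide %>%
  select(record_id, where(is.character)) %>%
  pivot_longer(-record_id, names_to = "attribute", values_to = "value")
```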
If you've got lots of dates, I recommend disabling the Auto date/time option (under Time intelligence), as Power BI tries to help by creating its own hidden date table for each date field, spanning the min to max dates. This takes up space and will slow things down with so many date columns.
See this thread:
In your case, if you run into issues, you could create an Index column (or use your UserId column), import only, say, 200-300 columns into any one table, and relate the tables together through that Index column.
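If it helps, here's a rough sketch (done outside Power BI, in R) of that splitting idea; the source file, the 250-column chunk size, and the output file names are all assumptions.

```r
# Sketch: add a surrogate Index key and write the wide table out in
# ~250-column pieces; each piece can be imported as its own table and
# related back together on Index. File names and chunk size are assumed.
wide <- read.delim("fact_wide.txt", sep = "|", stringsAsFactors = FALSE)
wide$Index <- seq_len(nrow(wide))

value_cols <- setdiff(names(wide), "Index")
chunks <- split(value_cols, ceiling(seq_along(value_cols) / 250))

for (i in seq_along(chunks)) {
  piece <- wide[, c("Index", chunks[[i]]), drop = FALSE]
  write.csv(piece, sprintf("fact_part_%02d.csv", i), row.names = FALSE)
}
```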
There are 140 different date columns and 33 different user columns. Those columns are all integer keys related to other dimension tables. As you can imagine, I'm supposed to have 33 inactive relationships between this fact table and my dimUser table, as well as 140 inactive relationships between this fact table and my dimDate table.
I was looking for a better way... I thought about using SQL to pivot the values vertically, but I have not reached the breaking point yet.
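For what it's worth, here is what that vertical pivot could look like, sketched in R on a made-up two-row fact table (the real version would presumably be done in SQL against the actual column names); every name below is hypothetical.

```r
# Sketch on toy data: unpivot the role-playing date/user key columns into
# (FactID, Role, Key) rows, so a single relationship to dimDate / dimUser
# can serve every role, filtered by Role. All names here are made up.
library(tidyr)

fact <- data.frame(
  FactID          = 1:2,
  CreatedDateKey  = c(20240101L, 20240105L),
  ApprovedDateKey = c(20240110L, NA),
  CreatedUserKey  = c(17L, 23L)
)

fact_long <- pivot_longer(
  fact,
  cols           = -FactID,
  names_to       = "Role",
  values_to      = "Key",
  values_drop_na = TRUE
)
# fact_long now has one row per (FactID, Role) with a single Key column.
```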