Hi,
I have a large data set (a 300+ MB CSV file) and I'm trying to cut down refresh times. What I'm trying to do is delimit one of the columns, duplicate it, and recombine all of the delimited columns with the other associated columns still included.
E.g.
This would be the raw data:
Value Text
100 this is an example
300 this is one more example
After delimiting only the Text column, I would get the following:
Value | Text | Text1 | Text2 | Text3 | Text4
100 | this | is | an | example | null
300 | this | is | one | more | example
and then I want to duplicate this data so I can get total values for each word used like so:
Value Text
100 this
100 is
100 an
100 example
300 this
300 is
300 one
300 more
300 example
I've managed to do this by creating multiple references of the original data source, deleting the corresponding columns, and then recombining them. But with the files being so large, and some text values having up to 15 words, it's taking a LONG time to refresh each individual reference. As I only have one actual data source, I presumed the file would only have to be refreshed once, but it is refreshing for every reference, which is what is taking so long.
Can anyone help me to speed up this refresh please?
Maybe an alternative way of doing this would work better. Any help is much appreciated!
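One alternative worth trying, which avoids keeping multiple references to the source alive, is to turn each Text value into a list of its words and expand that list into rows, all inside a single query. Below is a minimal M sketch of that idea; the file path, delimiter, encoding, and the Value/Text column names are placeholders taken from the example above, not the real query.

let
    // Placeholder path and CSV options; substitute the real 300+ MB file
    Source   = Csv.Document(File.Contents("C:\data\example.csv"), [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    Typed    = Table.TransformColumnTypes(Promoted, {{"Value", Int64.Type}, {"Text", type text}}),
    // Turn each Text value into a list of its words
    AddWords = Table.AddColumn(Typed, "Word", each Text.Split([Text], " "), type list),
    // Expand the list so every word becomes its own row, keeping Value alongside it
    Expanded = Table.ExpandListColumn(AddWords, "Word"),
    Result   = Table.RemoveColumns(Expanded, {"Text"})
in
    Result

Because everything happens in one query, the CSV is only read once per refresh instead of once per reference.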
I'm not sure it is faster (processing time), but it should eliminate a lot of steps.
In the Query Editor, after delimiting your text column, select all of the text columns (the result of your delimiting), right-click, and select "Unpivot Columns". This should give you the following in one swoop:
100 this
100 is
100 an
100 example
300 this
300 is
300 one
300 more
300 example
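In M, those two steps (split by delimiter, then unpivot everything except Value) would look roughly like the sketch below; the file path, the 15-column cap, and the Value/Text column names are assumptions based on the example in the question.

let
    Source    = Csv.Document(File.Contents("C:\data\example.csv"), [Delimiter = ",", Encoding = 65001]),
    Promoted  = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    Typed     = Table.TransformColumnTypes(Promoted, {{"Value", Int64.Type}, {"Text", type text}}),
    // Split Text on spaces into up to 15 columns (Text.1 .. Text.15), matching the longest values
    Split     = Table.SplitColumn(Typed, "Text", Splitter.SplitTextByDelimiter(" "),
                    List.Transform({1..15}, each "Text." & Text.From(_))),
    // Unpivot everything except Value, turning the word columns back into rows
    Unpivoted = Table.UnpivotOtherColumns(Split, {"Value"}, "Attribute", "Word"),
    Cleaned   = Table.RemoveColumns(Unpivoted, {"Attribute"}),
    // Optional: total Value per distinct word
    Totals    = Table.Group(Cleaned, {"Word"}, {{"TotalValue", each List.Sum([Value]), Int64.Type}})
in
    Totals

The unpivot also drops the null slots left by shorter text values, so no extra filtering step is needed.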
This is an interesting question, and it relates to one I want to search for or post: "What is the recommended computer speed (CPU/RAM) to run Power BI?"
Murray,
Is this file on a network drive? I had a similar problem because my file was on the organization's shared network drive and it was ungodly slow to refresh. I said screw it and put the data on my local drive, and it refreshes at least 5x faster.