Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
Laurie_H
Frequent Visitor

Cleaning Messy Data

I am tasked with creating reports showing visuals pertaining to our resource planner spread sheet, however, it's a jumble of merged columns and data in rows where I'd expect it to be in the column or vice versa. Very readable while looking at it in excel, a bit of a pain to easily use in PowerQuery/Bi. 

I've attached a picture of an example of the format. The reports will need to see things like hours per client, per department, per employee, NC ( non-chargeable ) or not. 

I've mostly dealt with quite straight-forward data so am unsure of the full functionality and features of Power Query to do this; I've had a few ideas like filling down the name column ( name of enployee), promoting headers, unpivoting columns? Transposing rows and columns? But haven't quite managed the desired result so any ideas I can try or any guidance would be helpful, even if it means applying a range of slicers/filters to the reports to get the desired outcome. resourceplanner3.PNG

1 ACCEPTED SOLUTION
jgeddes
Super User
Super User

You have all the right ideas to clean this data.
Unpivoting the date columns is critical to achieving your desired goal. The issue in this spreadsheet is the lack of a usable date in each column that can be used in the unpivot process. 

 

There are likely a few ways to tackle cleaning this data but I was doing it I would;


1) Remove blank rows

2) Promote Headers so the Name, Client, Task Type, Department, and Comments are headers. All other columns should be Column 6 ... Column X

3) Rename the Column 6 ... Column X with the coresponding Date Value

4) Fill Down the Employee Names

5) Replace null values in the Client Columns with "Daily Total"

6) Unpivot Other Columns (selecting Name, Client, Task Type, Department and Comments as the pivot columns)

7) Change column data types to match expected

 

From there you should be able to create any of the visuals you need.

 

The magic happens in step 3 and involves creating and manipulating nested tables.

 

In order to get a list that can be used to rename the columns you would have to;

a) take the first row of the table (which should be nulls except for the week beginning dates)

b) convert that row to a table

c) fill down the column with the dates

d) remove the rows that still have null in the date column (should be the first 5 columns)

e) group the rows by the date column choosing all rows instead of aggregating the rows

f) you should now have a table of the week beginning dates with column containing a table for each row (nested tables)

g) you need to add an index column to each of the nested tables starting at 0 and incrementing by 1

h) using that index column, create a new date column in each of the nested tables by adding the index value of the row to the date in that row. (Date.AddDays())

i) from each of the nested tables, select the column name and new date columns

j) remove the column of grouped dates leaving you with the nested table column

k) the nested table column and change the columns to text type

l) You can now rename the main table using the table you just created leveraging the List.Zip() function

 

Please feel free to send a direct message if you have additional quetions.

 

 

 

 

 




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!





View solution in original post

3 REPLIES 3
jgeddes
Super User
Super User

You have all the right ideas to clean this data.
Unpivoting the date columns is critical to achieving your desired goal. The issue in this spreadsheet is the lack of a usable date in each column that can be used in the unpivot process. 

 

There are likely a few ways to tackle cleaning this data but I was doing it I would;


1) Remove blank rows

2) Promote Headers so the Name, Client, Task Type, Department, and Comments are headers. All other columns should be Column 6 ... Column X

3) Rename the Column 6 ... Column X with the coresponding Date Value

4) Fill Down the Employee Names

5) Replace null values in the Client Columns with "Daily Total"

6) Unpivot Other Columns (selecting Name, Client, Task Type, Department and Comments as the pivot columns)

7) Change column data types to match expected

 

From there you should be able to create any of the visuals you need.

 

The magic happens in step 3 and involves creating and manipulating nested tables.

 

In order to get a list that can be used to rename the columns you would have to;

a) take the first row of the table (which should be nulls except for the week beginning dates)

b) convert that row to a table

c) fill down the column with the dates

d) remove the rows that still have null in the date column (should be the first 5 columns)

e) group the rows by the date column choosing all rows instead of aggregating the rows

f) you should now have a table of the week beginning dates with column containing a table for each row (nested tables)

g) you need to add an index column to each of the nested tables starting at 0 and incrementing by 1

h) using that index column, create a new date column in each of the nested tables by adding the index value of the row to the date in that row. (Date.AddDays())

i) from each of the nested tables, select the column name and new date columns

j) remove the column of grouped dates leaving you with the nested table column

k) the nested table column and change the columns to text type

l) You can now rename the main table using the table you just created leveraging the List.Zip() function

 

Please feel free to send a direct message if you have additional quetions.

 

 

 

 

 




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!





Appreciate it very much, thank you!

Let me go have a play around, using your advice. I'll be sure to take you up on that offer should I come to any issues. Thanks again.

spinfuzer
Super User
Super User

It's a combination of transposing, fill downs, merging, and unpivoting.  Probably needs a few custom columns to help fill down and potentially grouping as well.  Hard to more detailed help unless we have a data sample and example output.  If you could copy the screenshot you have above and replace the hidden parts with dummy data it would help us identify a solution for you.

Helpful resources

Announcements
Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

AugPowerBI_Carousel

Power BI Monthly Update - August 2024

Check out the August 2024 Power BI update to learn about new features.

August Carousel

Fabric Community Update - August 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors