Beginner question on data transformation

denkersmahlzeit · ‎08-08-2023

Dear community,

I have questions concerning data transformation.

Attached is a raw xlsx-file containing public data from the German Federal Statistical Office on student numbers.

It contains data per university (rows) on number of students specified by nationality (three top columns read “Germans” “Foreigners” and “Total”) and sex (“male” “female” “total” for each top column). It starts with data for the winter semester 1998/99 and continues on for each winter semester.

Here is a image snippet to get a first impression of the wide table.

Number of students per university

My goal is to prepare dashboards that visualize the data of specific subsets of universities and compare these subsets in development across time.

My question is:

is it good practice to initially transform this originally wide table in a long table?
would you do this transformation all from within Power Query Editor?
what would be your first steps to transform the data for best use?

Here is the link to the complete table:
student numbers over years

Thank you!

mlsx4 · ‎08-08-2023

Hi @denkersmahlzeit

1) Yes, you must do it. Otherwise, it wouldn't work.

2) I'm not sure if you will be able to achieve everything through Power Query, but almost everything.

3) It's difficult the part of getting the values through years in the way it seems to be structured, but I will do:
(In excel will fill all the row with the year)
- Transpose the table

- Fill down the column 1 to get the nationality

- Promote headers

-Unpivot columns

- Fix the values by: duplicating Atributte comparing with a conditional value to get the year per row, fill again, filter "extra rows" that I don't need, rewrite the attribute by extracting value before delimter and then removing extra columns.

But that's my approach. Maybe there is something that doesn't work.

My "dummie" data is the following:

And my advanced editor code:

let
    Origen = Excel.Workbook(File.Contents("C:\Users\Equipo2\Downloads\ex.xlsx"), null, true),
    Hoja1_Sheet = Origen{[Item="Hoja3",Kind="Sheet"]}[Data],
    #"Tabla transpuesta" = Table.Transpose(Hoja1_Sheet),
    #"Rellenar hacia abajo" = Table.FillDown(#"Tabla transpuesta",{"Column1"}),
    #"Encabezados promovidos" = Table.PromoteHeaders(#"Rellenar hacia abajo", [PromoteAllScalars=true]),
    #"Tipo cambiado" = Table.TransformColumnTypes(#"Encabezados promovidos",{{"Semester", type text}, {"Hoschulen", type text}, {"1989", Int64.Type}, {"HS1240", Int64.Type}, {"HS1241", Int64.Type}, {"HS1242", Int64.Type}, {"HS1243", Int64.Type}, {"HS1244", Int64.Type}, {"1990", Int64.Type}, {"HS1240_1", Int64.Type}, {"HS1241_2", Int64.Type}, {"HS1242_3", Int64.Type}, {"HS1243_4", Int64.Type}, {"HS1244_5", Int64.Type}}),
    #"Columnas con nombre cambiado" = Table.RenameColumns(#"Tipo cambiado",{{"1989", "Year"}}),
    #"Columna de anulación de dinamización" = Table.UnpivotOtherColumns(#"Columnas con nombre cambiado", {"Semester", "Hoschulen"}, "Atributo", "Valor"),
    #"Columna duplicada" = Table.DuplicateColumn(#"Columna de anulación de dinamización", "Atributo", "Atributo - Copia"),
    #"Tipo cambiado1" = Table.TransformColumnTypes(#"Columna duplicada",{{"Valor", type text}}),
    #"Columna condicional agregada" = Table.AddColumn(#"Tipo cambiado1", "Personalizado", each if [#"Atributo - Copia"] = "Year" then [Valor] else if [#"Atributo - Copia"] = [Valor] then [Valor] else null),
    #"Rellenar hacia abajo1" = Table.FillDown(#"Columna condicional agregada",{"Personalizado"}),
    #"Filas filtradas" = Table.SelectRows(#"Rellenar hacia abajo1", each ([#"Atributo - Copia"] <> "1990" and [#"Atributo - Copia"] <> "Year")),
    #"Texto extraído antes del delimitador" = Table.TransformColumns(#"Filas filtradas", {{"Atributo - Copia", each Text.BeforeDelimiter(_, "_"), type text}}),
    #"Columnas quitadas" = Table.RemoveColumns(#"Texto extraído antes del delimitador",{"Atributo"}),
    #"Columnas con nombre cambiado1" = Table.RenameColumns(#"Columnas quitadas",{{"Atributo - Copia", "Universitat"}, {"Valor", "Total"}})
in
    #"Columnas con nombre cambiado1"

denkersmahlzeit · ‎08-09-2023

Thank you for your reply! I learned a lot while following what you did.

However, when I follow your example there are now two columns which contain for each row years (1989 in "Year" and 1990 in "Personalizado"), but only one column with Total amount.
So it seems that I cannot tell the values for each year apart anymore. And in the original data there a many more years than just two.

Is there a way to put all years into one column in continuous order?

mlsx4 · ‎08-09-2023

Hi @denkersmahlzeit

Yes... I know. Is there any possibility I can get a link with the code? The other one didn't work

denkersmahlzeit · ‎08-09-2023

Hi, I updated the link, now it should contain the original data! Thanks for taking a look.

Beginner question on data transformation

Helpful resources

Power BI Monthly Update - September 2025

Fabric Community Update - August 2025

How to Get Your Question Answered Quickly

FabCon is coming to Atlanta

Beginner question on data transformation

Helpful resources

Power BI Monthly Update - September 2025

Fabric Community Update - August 2025

How to Get Your Question Answered Quickly