If you’ve worked with data long enough, you know that the real magic happens after the data is loaded. Loading a CSV file is easy. Writing a quick filter is simple. But when the business asks for “that one metric, broken down by product, adjusted for seasonality, and excluding anomalies,” suddenly the comfortable world of basic Pandas operations feels… inadequate.
We've all been there at some point... staring at a DataFrame with 50 columns and millions of rows, thinking, "Where do I even start?" Pandas has so much depth that it’s possible to work with it for years and still stumble upon new tricks that completely change the way you manipulate data.
What you will learn: In this edition, I will take you into a deeper layer of Pandas. By the time you’re done, you’ll know how to reshape your data with precision, using tools like melt() and pivot_table() to get it into exactly the structure you need. You’ll learn how to go beyond basic .groupby() operations, building complex aggregations and transformations that give richer insights without losing important detail. And because clean, maintainable code matters just as much as correct results, we’ll wrap it all together with method chaining and .pipe() so your transformations read like a clear story from start to finish.
Read Time: 8 minutes
Source: Sahir Maharaj
Let’s start with something that almost every data professional has encountered: data in the wrong shape. You receive a dataset from finance, but instead of having a “Month” column, you have separate columns for January, February, and March. For quick checks, that’s fine, but for long-term analysis - especially if you need to apply filters, join with other datasets, or create time series - it’s a nightmare. That’s when melt() becomes your best friend.
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Jan': [100, 150, 200],
    'Feb': [120, 160, 210],
    'Mar': [130, 170, 220]
})

melted = df.melt(
    id_vars='Product',
    var_name='Month',
    value_name='Sales'
)
print(melted)
When I first learned melt(), it instantly solved weeks of messy copy-paste work I used to do in spreadsheets. Instead of reformatting tables manually, I could reshape them in seconds. Once you’ve melted your data, month names become values, which makes it far easier to run time series analysis or merge with other datasets containing month-level data.
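To make that last point concrete, here is a small sketch of merging melted data with another month-level dataset. The `month_info` table and its `Trading_Days` column are hypothetical, invented purely for illustration:

```python
import pandas as pd

# Same wide table as above, melted into long form
df = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Jan': [100, 150, 200],
    'Feb': [120, 160, 210],
    'Mar': [130, 170, 220]
})
melted = df.melt(id_vars='Product', var_name='Month', value_name='Sales')

# Hypothetical month-level reference data
month_info = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar'],
    'Trading_Days': [21, 20, 22]
})

# Because 'Month' is now a value column, a plain merge on it just works
merged = melted.merge(month_info, on='Month', how='left')
merged['Sales_per_Day'] = merged['Sales'] / merged['Trading_Days']
print(merged)
```

Had the months stayed as separate columns, this join would have required reshaping first anyway.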
Now let’s pivot the other way. Sometimes you have tidy, long-form data, but your stakeholders want a cross-tab view - one row per product with each month in its own column. That’s when pivot_table() comes in.
pivoted = melted.pivot_table(
    index='Product',
    columns='Month',
    values='Sales',
    aggfunc='sum'
).reset_index()
print(pivoted)
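One detail worth knowing: if some product/month combinations are missing from the long data, `pivot_table()` leaves `NaN` in those cells. The `fill_value` parameter handles that cleanly. A minimal sketch, using made-up data with a deliberately missing combination:

```python
import pandas as pd

# Long-form data where Product C has no 'Feb' row
incomplete = pd.DataFrame({
    'Product': ['A', 'A', 'B', 'B', 'C'],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb', 'Jan'],
    'Sales': [100, 120, 150, 160, 200]
})

pivoted = incomplete.pivot_table(
    index='Product',
    columns='Month',
    values='Sales',
    aggfunc='sum',
    fill_value=0  # missing combinations become 0 instead of NaN
).reset_index()
print(pivoted)
```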
Now that your data is in the right shape, summarization becomes a lot easier. .groupby() is the tool of choice, but I’ve noticed most analysts stick to mean() and sum().
In reality, .groupby() can be so much more powerful - it can handle multiple aggregations, custom logic, and even operations that don’t reduce the number of rows.
sales_summary = melted.groupby('Product').agg(
    Total_Sales=('Sales', 'sum'),
    Average_Sales=('Sales', 'mean')
).reset_index()
print(sales_summary)
This works well, but what if you want to know how many months each product exceeded a certain threshold? That’s when you bring in a lambda function.
sales_summary = melted.groupby('Product').agg(
    Total_Sales=('Sales', 'sum'),
    Average_Sales=('Sales', 'mean'),
    High_Sales_Months=('Sales', lambda x: (x > 150).sum())
).reset_index()
print(sales_summary)
One of my go-to techniques here is using .transform() instead of .agg() when I want to add calculated columns without losing the row-level detail.
This is incredibly useful for adding things like percentage contributions or ranks. With this, every row now tells me what share it contributes to its product’s total sales. That kind of contextual insight is something stakeholders love to see in dashboards.
melted['Percent_of_Total'] = (
    melted.groupby('Product')['Sales']
    .transform(lambda x: x / x.sum())
)
print(melted)
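Ranks, the other example mentioned above, work the same way: the per-group result is broadcast back to every row. A quick sketch using the same melted data, with the `Sales_Rank` column name being my own choice:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Jan': [100, 150, 200],
    'Feb': [120, 160, 210],
    'Mar': [130, 170, 220]
})
melted = df.melt(id_vars='Product', var_name='Month', value_name='Sales')

# Rank each month within its product, best month first;
# like transform, rank returns one value per row, not per group
melted['Sales_Rank'] = (
    melted.groupby('Product')['Sales']
    .rank(ascending=False)
    .astype(int)
)
print(melted)
```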
Having covered reshaping and grouping, it’s time to put them together in a way that’s maintainable. Early in my career, I used to create 5 - 6 intermediate variables every time I transformed a dataset. It worked… but the code was COMPLEX and hard to follow. Method chaining completely changed that for me - it lets you transform data in a clean, readable sequence.
final_df = (
    df.melt(id_vars='Product', var_name='Month', value_name='Sales')
    .assign(Sales_Tax=lambda x: x['Sales'] * 0.15)
    .query('Sales > 120')
    .sort_values(by='Sales', ascending=False)
    .reset_index(drop=True)
)
print(final_df)
Reading this is like reading a sentence: melt the data, add a tax column, filter out low sales, sort by sales. No clutter. No intermediate variables.
Sometimes, though, I need a custom transformation that doesn’t fit neatly into Pandas’ built-in methods. That’s where .pipe() shines - it lets you send the DataFrame to your own function without breaking the chain.
def add_discount(df, discount_rate):
    df['Discounted_Sales'] = df['Sales'] * (1 - discount_rate)
    return df

final_df = (
    df.melt(id_vars='Product', var_name='Month', value_name='Sales')
    .pipe(add_discount, discount_rate=0.10)
    .query('Discounted_Sales > 100')
    .sort_values('Discounted_Sales', ascending=False)
)
print(final_df)
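The modularity pays off once you have more than one custom step: each helper takes a DataFrame and returns one, so they stack with repeated `.pipe()` calls. A sketch with two hypothetical helpers of my own (`add_discount` and `flag_top_seller`, with an invented threshold):

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Jan': [100, 150, 200],
    'Feb': [120, 160, 210],
    'Mar': [130, 170, 220]
})

# Each helper takes a DataFrame and returns a new one,
# so the chain stays free of side effects
def add_discount(df, discount_rate):
    return df.assign(Discounted_Sales=df['Sales'] * (1 - discount_rate))

def flag_top_seller(df, threshold):
    return df.assign(Top_Seller=df['Discounted_Sales'] > threshold)

final_df = (
    df.melt(id_vars='Product', var_name='Month', value_name='Sales')
    .pipe(add_discount, discount_rate=0.10)
    .pipe(flag_top_seller, threshold=150)
)
print(final_df)
```

Using `.assign()` inside the helpers keeps them pure, which makes the chain safe to re-run.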
And this is exactly how I keep my analysis modular - easy to read, easy to maintain, and easy to scale for larger datasets. I find that the beauty of Pandas is that it grows with you.
The same library you used for your first .head() call can also power the most intricate, production-level data transformations you’ll ever need. Once you understand reshaping, advanced grouping, and chaining operations, you stop thinking of Pandas as just a tool and start seeing it as your personal data canvas.
So... here’s my challenge for you: take a messy dataset from your own work and apply all three techniques from this edition using Microsoft Fabric. Melt it, pivot it, aggregate it, and wrap it all into a method chain. The more you practice, the easier these steps become.
Thanks for taking the time to read my post! I’d love to hear what you think and connect with you 🙂