There comes a point in your work as a data professional where your dataset stops feeling like a simple table and starts feeling like an entire universe. Features pile up, columns grow wider, and you’re left wondering which ones actually matter and which are just pretending to be useful. I remember hitting this point early in my journey, staring at rows of numbers that didn’t feel like they belonged together. Maybe you’ve felt that too. It’s that moment when your models slow down and you start craving something that can help you extract meaning without drowning in the noise. Traditional techniques still have their place, but sometimes they feel too rigid for the messy reality of real-world data. That’s when autoencoders start becoming interesting because they don’t just shrink your data, they learn it.
What you will learn: In this edition, we're exploring autoencoders and why they’re such a powerful way to reduce dimensionality when your data starts getting a little too wide. By the time you’re through it, you’ll understand how autoencoders stack up against traditional techniques and why they often capture the deeper patterns those older methods miss. You’ll also get a feel for how these models actually learn, step by step, as they compress and reconstruct your data.
Read Time: 8 minutes
Source: Sahir Maharaj (https://sahirmaharaj.com)
If you’ve ever looked at a wide dataset and thought something like, “There is no way all of this is meaningful,” then you’re already halfway to understanding why autoencoders exist. They are neural networks designed to recreate their input, which sounds simple until you realize they must understand the data in order to reproduce it accurately. The bottleneck layer is where the real learning happens because the network is forced to compress your data into a much smaller, more meaningful representation. And because this compression is learned rather than manually handcrafted, the results often feel surprisingly insightful.
From my own experience, autoencoders have been especially helpful in situations where PCA just flattened the nuance out of the data. PCA tries to fit straight lines through curved realities, while autoencoders bend and shape themselves around whatever structure actually exists. They adapt. They refine. They learn. When the relationships inside your data are nonlinear or unpredictable, autoencoders don’t resist or break; they simply keep training until they understand what matters. Over time, they begin shaping themselves around your dataset’s hidden structure.
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(42)

# Synthetic dataset: 800 samples, 20 features, values in [0, 1]
data = np.random.rand(800, 20).astype(np.float32)
tensor_data = torch.tensor(data)

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: 20 features squeezed down to a 6-dimensional bottleneck
        self.encoder = nn.Sequential(
            nn.Linear(20, 32),
            nn.ReLU(),
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, 6),
            nn.ReLU()
        )
        # Decoder: mirror of the encoder; Sigmoid suits the [0, 1] inputs
        self.decoder = nn.Sequential(
            nn.Linear(6, 16),
            nn.ReLU(),
            nn.Linear(16, 32),
            nn.ReLU(),
            nn.Linear(32, 20),
            nn.Sigmoid()
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded

model = Autoencoder()
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

# Train by rewarding faithful reconstruction of the input
for epoch in range(60):
    encoded, decoded = model(tensor_data)
    loss = loss_fn(decoded, tensor_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Inference only, so no gradients are needed
with torch.no_grad():
    encoded_output, decoded_output = model(tensor_data)
encoded_np = encoded_output.numpy()

# Visualise the first two compressed dimensions
plt.figure(figsize=(7, 5))
plt.scatter(encoded_np[:, 0], encoded_np[:, 1], alpha=0.6)
plt.title("Autoencoder Compressed Embedding Space")
plt.xlabel("Dimension 1")
plt.ylabel("Dimension 2")
plt.grid(True)
plt.show()

print(encoded_np[:5])
print(decoded_output.numpy()[:5])
One of the things people appreciate most is that autoencoders make very few assumptions about your data. They don’t impose structure; they discover it. Moving from a mindset of “I need to force my data into something PCA can understand” to “I’m going to let an autoencoder reveal what matters” is a genuinely refreshing shift. You stop thinking about discarding features and start thinking about revealing the true essence underneath everything. The compression becomes more like distilling your data than simply shrinking it.
Another important detail is how autoencoders naturally capture interactions you might not notice. For example, three features that look weak on their own might form a strong signal when combined. PCA would struggle to capture this, but an autoencoder can learn it because it optimises for reconstruction quality. I’ve seen this happen especially in behavioural datasets where meaning emerges only when certain patterns overlap. You would never spot that manually, and linear tools would miss it, but the autoencoder learns it without being told.
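To make that concrete, here is a minimal sketch on synthetic data of my own invention (not from any real project): one column is the product of two others, and no 2-component linear projection can reproduce all three exactly, while an autoencoder with a 2-unit bottleneck could in principle encode the two base features and regenerate the product from them.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
a = rng.normal(size=(500, 1))
b = rng.normal(size=(500, 1))
interaction = a * b  # only meaningful when a and b are combined
noise = 0.1 * rng.normal(size=(500, 7))
data = np.hstack([a, b, interaction, noise])

# Linear projection down to 2 components, then back up
pca = PCA(n_components=2)
reconstructed = pca.inverse_transform(pca.fit_transform(data))

# The first three columns (a, b, a*b) span three uncorrelated directions,
# so a 2-dimensional linear projection must sacrifice part of them; the
# leftover error concentrates there, while the noise columns stay near zero.
per_feature_error = ((data - reconstructed) ** 2).mean(axis=0)
print(per_feature_error)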
What I also appreciate is how gracefully autoencoders scale. As your dataset grows or changes shape, you can adjust the architecture to match it. Add layers, change activation functions, tweak the bottleneck size. Instead of a fixed method you outgrow, the tool evolves with your problem. The autoencoder becomes more like a collaborative partner in your analysis, adjusting itself as you explore deeper and more complex patterns in your data.
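As a sketch of what that flexibility looks like in practice (the class name and layer sizes here are my own illustration, not a fixed recipe), you can parameterise the same encoder-decoder pattern so resizing it is a one-line change:

import torch.nn as nn

class FlexibleAE(nn.Module):
    # Hypothetical helper: the same pattern as above, but sized by
    # arguments so the architecture can grow or shrink with the dataset
    def __init__(self, n_features, hidden, bottleneck):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, bottleneck), nn.ReLU()
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, hidden), nn.ReLU(),
            nn.Linear(hidden, n_features)
        )

    def forward(self, x):
        encoded = self.encoder(x)
        return encoded, self.decoder(encoded)

# Wider data or a tighter bottleneck is now just a different call:
model = FlexibleAE(n_features=20, hidden=32, bottleneck=6)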
Before autoencoders were widely used, PCA was the main technique everyone reached for. And it still works very well in the right situations. It is clean, mathematical, and wonderfully fast. But PCA has one major limitation you start noticing once your data becomes less polished: it can only capture linear relationships. So if your patterns twist, curl, cluster strangely, or behave unpredictably in high dimensions, PCA quietly breaks down and hands you a projection that looks tidy but misses the structure that matters.
Autoencoders feel more flexible because they learn instead of calculate. They figure out the nonlinear relationships naturally, without you having to do anything special. I’ve had several cases where PCA completely flattened the story but the autoencoder preserved the subtle relationships that actually mattered. Using an autoencoder felt like upgrading from a simple outline sketch to a fully detailed painting.
When you step back and compare the two methods, the difference becomes philosophical. PCA simplifies while autoencoders interpret. PCA projects while autoencoders learn. PCA is static while autoencoders evolve through training. If your data is simple and you need speed, PCA is perfectly fine. But as complexity enters the picture, autoencoders are the tool that reveals the richness rather than hiding it.
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

torch.manual_seed(7)

# Standard-normal data: values fall outside [0, 1], which matters later
data = np.random.normal(size=(1000, 30)).astype(np.float32)
tensor_data = torch.tensor(data)

# PCA baseline: project onto 6 linear components
pca = PCA(n_components=6)
pca_result = pca.fit_transform(data)
pca_var = pca.explained_variance_ratio_

plt.figure(figsize=(8, 4))
plt.bar(range(1, 7), pca_var)
plt.title("PCA Explained Variance")
plt.xlabel("Principal Component")
plt.ylabel("Variance Ratio")
plt.grid(True)
plt.show()

class AE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(30, 40),
            nn.ReLU(),
            nn.Linear(40, 20),
            nn.ReLU(),
            nn.Linear(20, 6),
            nn.ReLU()
        )
        # Final layer stays linear: a Sigmoid would clamp outputs to (0, 1)
        # and could never reconstruct the unbounded normal data
        self.decoder = nn.Sequential(
            nn.Linear(6, 20),
            nn.ReLU(),
            nn.Linear(20, 40),
            nn.ReLU(),
            nn.Linear(40, 30)
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded

model = AE()
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

for epoch in range(60):
    encoded, decoded = model(tensor_data)
    loss = loss_fn(decoded, tensor_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    encoded_auto, decoded_auto = model(tensor_data)
encoded_auto_np = encoded_auto.numpy()

# Compare the two 6-dimensional representations side by side
print(pca_result[:3])
print(encoded_auto_np[:3])
print(pca_var)
There’s also something interesting about how each method treats variance. PCA assumes variance equals meaning, which is not always true. Some high-variance patterns are just noise. Autoencoders instead preserve what improves reconstruction and discard what doesn’t. I’ve had datasets where PCA amplified meaningless volatility, while the autoencoder filtered it out naturally. That alone can be a game changer in many analysis scenarios.
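If you want to make that noise-filtering explicit, one common variant is a denoising autoencoder: corrupt the input, but score the reconstruction against the clean data. Here is a minimal sketch, assuming model, tensor_data, loss_fn, and optimizer from the comparison example above are still in scope:

import torch

# Denoising twist (sketch): corrupt the input, reconstruct the ORIGINAL.
# Assumes model, tensor_data, loss_fn, and optimizer already exist.
for epoch in range(30):
    noisy = tensor_data + 0.1 * torch.randn_like(tensor_data)
    _, reconstructed = model(noisy)
    loss = loss_fn(reconstructed, tensor_data)  # target is the clean data
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()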
Another big difference is how each method responds to tweaking. PCA is a one-and-done tool. You get your components and that’s it. Autoencoders, on the other hand, invite experimentation. You can try deeper models, try smaller bottlenecks, add regularisation, or change activation functions. The method grows with you. And that flexibility is incredibly reassuring when you’re working with real datasets that don't behave neatly.
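As one example of that experimentation (the penalty weight below is illustrative, not tuned), you can nudge the model toward sparser representations by adding an L1 penalty on the bottleneck activations, again assuming the names from the comparison example are in scope:

# Sparsity tweak (sketch): penalise bottleneck activations so the model
# uses as few compressed dimensions as it can get away with
l1_weight = 1e-4  # illustrative value, worth tuning for real data
for epoch in range(30):
    encoded, decoded = model(tensor_data)
    loss = loss_fn(decoded, tensor_data) + l1_weight * encoded.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()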
The training process is often the moment when autoencoders make the most sense. The network takes your input and tries to recreate it, failing miserably at first. But with each training cycle, it nudges itself closer to understanding what matters. Slowly, it figures out which patterns help reconstruction and which are irrelevant. This gradual improvement is where the model starts building intuition about your data.
As training continues, the bottleneck layer becomes the compressed heart of the model. Anything crucial for reconstruction gets preserved. Anything redundant gets left behind. Over time, this produces a compact, meaningful representation of your dataset. It isn’t manually engineered; it emerges through learning. That alone makes autoencoders feel like an entirely different category of tool.
The part that feels the most relatable is how similar the process is to human memory. You don’t store every detail. You store meaning, essence, structure. Autoencoders do the same. They summarise instead of memorising. And by the time training stabilises, the compressed representation carries the true shape of your data in a smaller, smarter form.
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(101)

data = np.random.rand(1200, 25).astype(np.float32)
tensor_data = torch.tensor(data)

class DeepAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Deeper encoder: 25 features funnelled down to just 4
        self.encoder = nn.Sequential(
            nn.Linear(25, 50),
            nn.ReLU(),
            nn.Linear(50, 25),
            nn.ReLU(),
            nn.Linear(25, 12),
            nn.ReLU(),
            nn.Linear(12, 4),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            nn.Linear(4, 12),
            nn.ReLU(),
            nn.Linear(12, 25),
            nn.ReLU(),
            nn.Linear(25, 25),
            nn.Sigmoid()
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded

model = DeepAE()
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

# Track the loss so we can watch reconstruction improve over time
losses = []
for epoch in range(70):
    encoded, decoded = model(tensor_data)
    loss = loss_fn(decoded, tensor_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

with torch.no_grad():
    encoded_output, _ = model(tensor_data)
encoded_np = encoded_output.numpy()

plt.figure(figsize=(10, 4))
plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training Loss Curve")
plt.grid(True)
plt.show()

# Each row is one sample's 4-dimensional compressed representation
plt.figure(figsize=(6, 6))
plt.imshow(encoded_np[:50], cmap="plasma", aspect="auto")
plt.title("Heatmap of Bottleneck Representations")
plt.xlabel("Compressed Dimensions")
plt.ylabel("Samples")
plt.colorbar()
plt.show()

print(encoded_np[:10])
Something I enjoy about training autoencoders is that the model behaves differently depending on how you design it. A larger bottleneck might preserve detail but lose abstraction. A smaller bottleneck forces the model to focus intensely. Changing learning rates, depth, or activation functions all affect how the autoencoder understands your data. Experimenting becomes part of the fun because each change teaches you something new.
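A quick way to feel that trade-off is to train the same shape of model at several bottleneck widths and compare the final reconstruction loss. The helper below is my own sketch, assuming tensor_data (values in [0, 1]) from the previous example:

import torch
import torch.nn as nn
import torch.optim as optim

def train_ae(bottleneck, data, epochs=60):
    # Hypothetical helper: same pattern as above, variable bottleneck width
    n = data.shape[1]
    model = nn.Sequential(
        nn.Linear(n, 2 * n), nn.ReLU(),
        nn.Linear(2 * n, bottleneck), nn.ReLU(),
        nn.Linear(bottleneck, 2 * n), nn.ReLU(),
        nn.Linear(2 * n, n), nn.Sigmoid()
    )
    opt = optim.Adam(model.parameters(), lr=0.001)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        loss = loss_fn(model(data), data)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

# Tighter bottlenecks force harder compression; watch the loss respond
for k in [2, 4, 8, 16]:
    print(k, train_ae(k, tensor_data))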
Then there’s the moment when you inspect the compressed space. It feels like peeking into the model’s personal notes. Patterns tighten, clusters appear, noise dissolves. Even without building anything on top of it, just exploring the embedding can reveal insights you would not find otherwise. The training becomes more than a computational step. It becomes a guided discovery process.
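If you want to make that exploration concrete, one simple option (my suggestion, not a required step) is to run a quick k-means over the bottleneck output encoded_np from the previous example:

import numpy as np
from sklearn.cluster import KMeans

# Sketch: cluster the compressed representation to surface natural groups.
# k=3 is arbitrary here; real data deserves a proper choice of k.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(encoded_np)
print(np.bincount(labels))  # how many samples land in each cluster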
So if you’ve never tried autoencoders before, consider this your little push. Take an hour this week, spin up a quick experiment in Fabric, and see what the compressed representation of your dataset looks like. You might uncover insights your models have been missing for months. You might streamline your features in a way that simplifies everything from training to visualization. Or you might just enjoy the process of exploring something new and genuinely powerful. Either way, giving autoencoders a chance inside Fabric is a step toward leveling up the way you think, build, and tell stories with your data. And if you ever want a hand or a sounding board for ideas, I’m right here cheering you on.
Thanks for taking the time to read my post! I’d love to hear what you think and connect with you 🙂