There is a moment every data professional knows well... a moment when you open a platform and see a recommendation that feels almost too perfect. You pause for a second and think, how did it know? In that instant, all the technical ideas you work with every day fade to the background. You are not thinking about algorithms or vectors or similarity scores. You simply feel understood by a system that seems to know what you want before you say it. Behind that simple interaction sits a quiet intelligence that learns from behavior, finds patterns, and uses them to guide your next step.
I have always found this fascinating because it shows how powerful data can be when it is not just processed, but interpreted. And whether you are in analytics, engineering, BI, or consulting, you have likely seen places where a bit of personalization could make an entire experience easier and more meaningful. Product suggestions, learning pathways, dashboards that adapt to your role, internal search tools that help you find what you need faster. All of these benefit from the same foundation. So the real question becomes, how do you build something like this?
What you will learn: In this edition, we will explore recommendation systems and break them down in a way that finally feels clear and approachable. You will get a feel for how these systems think, how they pick up on subtle user signals, and why they have quietly become the backbone of modern data experiences. And by the time you are done, you will have a solid, practical understanding of how to shape a recommendation model from the ground up (of course, without the complexity that usually scares people off!).
Source: Sahir Maharaj (https://sahirmaharaj.com)
Recommendation systems begin with a simple idea. Predicting what someone might enjoy based on what they have interacted with before. They look at the way people behave, what they read, what they watch, what they buy, what they click on, and even what they scroll past slowly. These tiny signals, each one almost insignificant on its own, become incredibly valuable when viewed together. The system starts to see patterns that feel almost personal, even though they come from data. That is the moment when the experience feels tailored.
What makes recommendation systems even more interesting is the balance of explicit and implicit signals. Explicit signals are obvious, like ratings or reviews. Implicit signals are quieter, like dwell time, revisits, or how often someone scrolls back to a specific item. I have noticed in many projects that implicit signals can be more honest than explicit ones. People may not rate or comment on everything, but they reveal their genuine preferences through behavior. When you aggregate these small hints, the picture becomes surprisingly clear.
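To make this concrete, here is a minimal sketch of how explicit and implicit signals might be blended into a single preference score. The column names, the 0.6/0.4 split, and the "trust explicit ratings when present" rule are illustrative assumptions, not a fixed recipe:

```python
import pandas as pd

# Hypothetical interaction log: explicit ratings are sparse (NaN when absent),
# implicit signals (dwell seconds, revisit count) are always available.
log = pd.DataFrame({
    "item": ["A", "B", "C", "D"],
    "rating": [5.0, None, None, 4.0],   # explicit signal (1-5), often missing
    "dwell_sec": [120, 300, 15, 90],    # implicit: time spent on the item
    "revisits": [2, 5, 0, 1],           # implicit: how often the user came back
})

def norm(s):
    # Rescale a signal to 0-1 so different signals are comparable
    return (s - s.min()) / (s.max() - s.min())

# Blend the implicit signals with assumed weights
log["implicit"] = 0.6 * norm(log["dwell_sec"]) + 0.4 * norm(log["revisits"])

# Where an explicit rating exists, trust it; otherwise fall back to behavior
log["preference"] = log["rating"].div(5).fillna(log["implicit"])

print(log[["item", "preference"]].sort_values("preference", ascending=False))
```

Notice that item B, which was never rated, still earns a strong preference score purely from behavior, which is exactly the honesty of implicit signals described above.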
As a data scientist, one thing I have repeatedly seen is how much frustration disappears once recommendations are added. People search less. They click less. They navigate less. They simply move faster because the system nudges them toward what matters. This is not about predicting the future. It is about guiding people through large pools of information in a way that feels natural. This perspective changes how you think about data products. You are not just building static dashboards or models but shaping interactions between people and information. When someone feels understood by a system, they engage more, they explore more, and they trust the experience more. I've learned that a good recommendation system creates that feeling.
```python
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt

# Small catalog of courses with keyword-style descriptions
df = pd.DataFrame({
    "item": [
        "Intro to ML", "Deep Learning", "Python Analytics", "SQL Basics", "Statistics",
        "ML for Business", "Neural Network Design", "Data Visualization", "Probability A-Z", "Advanced SQL"
    ],
    "description": [
        "machine learning basics algorithms models beginner friendly",
        "deep learning neural networks advanced computational concepts",
        "python analytics data science preprocessing analysis visualization",
        "sql relational databases joins queries reporting essentials",
        "statistics probability inference modeling distributions",
        "machine learning business applications case studies prediction",
        "neural network design architectures parameters deep learning",
        "data visualization charts dashboards python storytelling",
        "probability rules random variables distributions fundamentals",
        "advanced sql window functions optimization relational queries"
    ]
})

# Turn each description into a TF-IDF vector, then compare every pair of items
vec = TfidfVectorizer()
X = vec.fit_transform(df["description"])
sim = cosine_similarity(X, X)

def recommend_item(item):
    # Rank all other items by their similarity to the chosen one
    idx = df.index[df["item"] == item][0]
    scores = list(enumerate(sim[idx]))
    scores = sorted(scores, key=lambda x: x[1], reverse=True)[1:6]
    return df.iloc[[i[0] for i in scores]][["item", "description"]]

print(recommend_item("Intro to ML"))

# Heatmap of item-to-item similarity
plt.figure(figsize=(9, 7))
plt.imshow(sim, cmap="viridis")
plt.colorbar()
plt.xticks(range(len(df)), df["item"], rotation=90)
plt.yticks(range(len(df)), df["item"])
plt.title("Content-Based Similarity Matrix")
plt.tight_layout()
plt.show()

# Project the TF-IDF vectors to 2D with SVD to see how items cluster
u, s, vt = np.linalg.svd(X.toarray())
proj = u[:, :2]
plt.figure(figsize=(8, 7))
plt.scatter(proj[:, 0], proj[:, 1], s=180, c=np.linspace(0, 1, len(df)), cmap="cool")
for i, label in enumerate(df["item"]):
    plt.text(proj[i, 0] + 0.01, proj[i, 1] + 0.01, label)
plt.title("Content Embedding Projection")
plt.xlabel("Dimension 1")
plt.ylabel("Dimension 2")
plt.grid(alpha=0.3)
plt.show()
```
With this foundation, you can begin to see why recommendation systems matter and why they show up everywhere. But understanding the purpose is only the first step. To truly design them, you need to understand the two main approaches that power most recommendation systems: collaborative filtering and content-based filtering.

Collaborative filtering is built on the idea that people who behave similarly are likely to enjoy similar things. If two users watch the same shows or read the same articles, the system assumes they share preferences. It clusters people based on their behavior. I have seen this reveal surprising groupings in real projects, clusters that no business team predicted. That is the magic of collaborative filtering. It uncovers hidden relationships that only become visible through patterns. This approach works incredibly well when you have lots of user behavior: ratings, clicks, views, purchases. The more signals, the clearer the pattern.
The challenge appears when you have new users or new items. This is the cold start problem. I have encountered it many times, and it is often the reason teams do not rely on collaborative filtering alone. Still, when the behavior data is rich, collaborative filtering is powerful.

Content-based filtering focuses on the items themselves. It looks at descriptions, categories, tags, and keywords, and uses them to understand which items are similar. If a user interacts with one item, the system recommends others with similar attributes. This works beautifully when you have well-described content. Whenever I see a system with strong metadata, I immediately know content-based filtering will shine. One major advantage of content-based filtering is explainability. Because recommendations are based on item attributes, you can easily explain why something was suggested. For example: you liked this article about forecasting, so here is another one about predictive analytics. In environments where clarity matters, this type of transparency builds trust.
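A common way to soften the cold start problem is a simple fallback: when a user has no history at all, recommend globally popular items, and switch to behavior-driven suggestions once data accumulates. The sketch below assumes a hypothetical interaction log and is deliberately minimal:

```python
import pandas as pd

# Hypothetical interaction history; "new_user" has no rows yet (the cold start case)
history = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u2", "u3"],
    "item": ["ML", "SQL", "ML", "Viz", "ML"],
})

# Global popularity, used as the fallback ranking
popularity = history["item"].value_counts()

def recommend(user, k=2):
    seen = set(history.loc[history["user"] == user, "item"])
    if not seen:
        # Cold start: no behavior to learn from, so recommend popular items
        return popularity.index[:k].tolist()
    # Known user: recommend popular items they have not interacted with yet
    return [item for item in popularity.index if item not in seen][:k]

print(recommend("new_user"))  # falls back to global popularity
print(recommend("u2"))        # filtered by what u2 has already seen
```

In a real system the fallback would usually be richer, for example content-based matching on whatever profile attributes a new user does provide, but the shape of the logic is the same.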
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

# User-item ratings in long format
df = pd.DataFrame({
    "user": [1,1,1,2,2,3,3,3,3,4,4,5,5,5,6,6,7,7,7],
    "item": ["ML","DL","Py","ML","Viz","DL","Stats","SQL","ML","SQL","ML","DL","Py","Stats","SQL","Viz","Stats","ML","Py"],
    "rating": [5,4,5,4,3,5,3,4,5,5,4,4,5,4,4,3,5,4,5]
})

# Pivot to a user-by-item matrix; unrated items become 0
pivot = df.pivot_table(values="rating", index="user", columns="item").fillna(0)
scaled = MinMaxScaler().fit_transform(pivot)

# User-to-user similarity from the scaled rating vectors
sim = np.dot(scaled, scaled.T)

def recommend_for_user(u):
    # Find the most similar users, then score items u has not rated yet
    uidx = pivot.index.tolist().index(u)
    scores = sim[uidx]
    similar_users = np.argsort(scores)[::-1][1:4]
    user_items = pivot.loc[u]
    unseen = user_items[user_items == 0].index
    rec_scores = {}
    for su in similar_users:
        for item in unseen:
            val = pivot.iloc[su][item]
            if val > 0:
                rec_scores[item] = rec_scores.get(item, 0) + val
    ranked = sorted(rec_scores.items(), key=lambda x: x[1], reverse=True)
    return ranked[:5]

print(recommend_for_user(1))

# Heatmap of user-to-user similarity
plt.figure(figsize=(9, 7))
plt.imshow(sim, cmap="magma")
plt.colorbar()
plt.xticks(range(len(pivot.index)), pivot.index)
plt.yticks(range(len(pivot.index)), pivot.index)
plt.title("User Collaborative Similarity Matrix")
plt.tight_layout()
plt.show()

# Project users to 2D via the eigenvectors of the similarity matrix
vals, vecs = np.linalg.eig(sim)
proj = np.real(vecs[:, :2])
plt.figure(figsize=(8, 7))
plt.scatter(proj[:, 0], proj[:, 1], s=200, c=np.linspace(0, 1, len(proj)), cmap="winter")
for i, user in enumerate(pivot.index):
    plt.text(proj[i, 0] + 0.01, proj[i, 1] + 0.01, str(user))
plt.title("User Clustering via Collaborative Filtering")
plt.xlabel("Dimension 1")
plt.ylabel("Dimension 2")
plt.grid(alpha=0.3)
plt.show()
```
Choosing between the two methods is an important strategic decision for any data professional. If you have rich behavioral data but weak item metadata, collaborative filtering is ideal. If you have strong item metadata but limited behavior data, content-based filtering is more effective. If both are strong, a hybrid model usually delivers the best performance. Many of the best systems combine both techniques naturally. Once you understand these two approaches, you have the foundation needed to design a recommendation system. But choosing the method is just the beginning. Next, you need to think about signals, similarity, ranking, and personalization. That is where recommendation systems start to feel like real experiences rather than just models.
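One minimal way to express a hybrid is a weighted blend of the two scores. The arrays and the alpha value below are illustrative placeholders; in practice both scores would come from pipelines like the ones above, and alpha would be tuned against held-out interactions:

```python
import numpy as np

# Hypothetical scores for five candidate items, computed upstream:
content_sim = np.array([0.9, 0.2, 0.6, 0.4, 0.8])    # content-based similarity
collab_score = np.array([0.1, 0.7, 0.45, 0.9, 0.3])  # collaborative prediction

def hybrid_rank(content, collab, alpha=0.5):
    # alpha controls how much weight content similarity gets vs behavior
    blended = alpha * content + (1 - alpha) * collab
    return np.argsort(blended)[::-1]  # item indices, best first

# Equal weighting of the two signals
order = hybrid_rank(content_sim, collab_score, alpha=0.5)
print(order)
```

Sliding alpha toward 1 leans on metadata (useful for cold start), while sliding it toward 0 leans on behavior (useful when interaction data is rich), which mirrors the decision rule described above.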
Every recommendation system begins with signals. These signals tell you what users are drawn to, what they ignore, and what they revisit. Signals can be explicit, like ratings, or implicit, like dwell time or repeat visits. In my own work, I have found implicit signals to be surprisingly powerful. They often tell you more about true interest than explicit feedback. Understanding which signals you have and how strong they are is the first step.

The next layer is item structure. To compare items, you need to describe them: descriptions, tags, keywords, summaries, topics. The better your metadata, the better your recommendations. I have worked on projects where simply improving item descriptions led to meaningful jumps in recommendation relevance. Strong metadata is not a nice-to-have. It is the backbone of content-based recommendations.
```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt

# Implicit engagement signals per item
df = pd.DataFrame({
    "item": ["ML","DL","Python","SQL","Stats","Viz","Forecasting"],
    "views": [220,150,300,90,330,280,260],
    "clicks": [45,22,70,18,60,55,50],
    "dwell": [85,40,120,30,150,110,100]
})

# Normalize each signal to 0-1, then blend them with weights
scaler = MinMaxScaler()
feat = scaler.fit_transform(df[["views","clicks","dwell"]])
weights = np.array([0.35, 0.30, 0.35])
df["score"] = np.dot(feat, weights)

df_ranked = df.sort_values("score", ascending=False)
print(df_ranked)

# Score curve across the ranked items
plt.figure(figsize=(9, 7))
plt.plot(df_ranked["item"], df_ranked["score"], marker="o", linewidth=3, color="mediumseagreen")
plt.title("Personalized Recommendation Score Curve")
plt.xlabel("Item")
plt.ylabel("Score")
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

# The same ranking as a bar chart
plt.figure(figsize=(9, 7))
plt.bar(df_ranked["item"], df_ranked["score"], color="coral")
plt.title("Ranked Recommendations (Weighted Implicit Signals)")
plt.xlabel("Item")
plt.ylabel("Score")
plt.tight_layout()
plt.show()
```
Similarity sits at the core of every recommendation system. The question is always the same: how similar is item A to item B? Similarity can be based on content, behavior, context, or even timing. As a data scientist, I like to imagine items sitting together on a conceptual map. Items that are close together are similar. Items that are far apart are not. This mental picture helps when shaping logic for ranking.

Ranking comes next. A list of similar items is not enough. You need to decide what appears first. Ranking can be based on popularity, recency, relevance, or a blend of factors. Good ranking turns a raw similarity list into meaningful suggestions; I have seen ranking decisions completely change how users interact with content. It is a subtle but powerful lever!
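As a sketch of that lever, ranking can blend raw similarity with popularity and a recency decay. The candidate items, weights, and the 180-day decay constant below are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

# Hypothetical candidates that already passed a similarity cutoff
cand = pd.DataFrame({
    "item": ["Forecasting", "Stats", "Viz", "SQL"],
    "similarity": [0.92, 0.85, 0.80, 0.78],  # raw similarity to the seed item
    "popularity": [0.30, 0.90, 0.60, 0.95],  # normalized view counts
    "days_old": [5, 400, 30, 900],           # content age
})

# Recency decay: newer items score closer to 1, old items decay toward 0
cand["recency"] = np.exp(-cand["days_old"] / 180)

# Final rank blends relevance with popularity and freshness
cand["rank_score"] = (0.6 * cand["similarity"]
                      + 0.25 * cand["popularity"]
                      + 0.15 * cand["recency"])
ranked = cand.sort_values("rank_score", ascending=False)
print(ranked[["item", "rank_score"]])
```

Note how the most similar item still wins here, but the 900-day-old SQL item drops to the bottom despite its popularity: the weights decide which trade-offs the user actually sees.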
Finally comes personalization. This is where the system feels human. Personalization adapts based on the user's history, behavior, stage of learning, or interests. A beginner should not be shown advanced content. A user exploring a new topic should not receive unrelated items. Good personalization creates a sense of guidance, almost like the system is walking with the user.

Once you understand signals, metadata, similarity, ranking, and personalization, you see recommendation systems differently. They stop feeling like technical artifacts and start feeling like designed experiences... you move from building models to crafting journeys. And that shift is what I find leads to great recommendation systems.
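One lightweight form of personalization is filtering candidates by the user's stage before ranking, so a beginner never sees advanced content no matter how highly it scores. The level encoding and the max_gap rule below are hypothetical conventions for this sketch:

```python
import pandas as pd

# Hypothetical catalog with a difficulty level per item
catalog = pd.DataFrame({
    "item": ["Intro to ML", "Deep Learning", "SQL Basics", "Advanced SQL"],
    "level": [1, 3, 1, 3],           # 1 = beginner, 3 = advanced
    "score": [0.7, 0.95, 0.6, 0.9],  # upstream recommendation score
})

def personalize(user_level, max_gap=1):
    # Keep items within reach of the user's level, then rank by score
    fit = catalog[catalog["level"] <= user_level + max_gap]
    return fit.sort_values("score", ascending=False)["item"].tolist()

# A beginner (level 1) is never shown level-3 content, however well it scores
print(personalize(user_level=1))
print(personalize(user_level=3))
```

The same pattern extends naturally to interests or roles: filter to what fits the user, then let the ranking logic take over.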
So there it is... now this is your moment to build what we have just explored and try these ideas where they make the most sense. Open Fabric, bring in a small dataset, and build the simplest version of a recommendation flow. And no... it does not need to be perfect. It does not need to be complex. It just needs to start. Once you see even a basic recommendation in your own environment, everything changes. You might be surprised by how far a small idea can go!
Thanks for taking the time to read my post! I’d love to hear what you think and connect with you 🙂