richa_gupta_224
New Member

How do you decide which ML algorithm to start with for a new dataset?

Hi everyone,
I’m currently learning data science and machine learning, and I often get confused when starting a new project. With so many algorithms available (Linear Regression, Random Forest, XGBoost, SVM, etc.), how do you decide which one to try first?

Do you rely more on dataset size, feature types, interpretability, or just experimentation? I’d really appreciate hearing how experienced data scientists approach this step in real-world projects.

Thanks in advance!

1 ACCEPTED SOLUTION
v-echaithra
Community Support

Hi @richa_gupta_224 ,

In real-world ML projects, we usually don’t try to identify the “best” algorithm upfront. Instead, the focus is on understanding the data and establishing baselines.

A typical approach looks like this:

1. Start with a simple baseline (Linear or Logistic Regression). This helps validate the data, uncover leakage, and set a reference point.

2. Evaluate a few strong models in parallel: usually one linear model, one tree-based model (Random Forest), and one boosting model (XGBoost/LightGBM). The data often makes the choice clearer.
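A minimal sketch of that baseline-plus-candidates loop with scikit-learn. The dataset here is synthetic, and GradientBoostingClassifier stands in for XGBoost/LightGBM; swap in your own X and y:

    # Baseline-first model comparison: a minimal sketch using scikit-learn.
    # X, y are synthetic here; substitute your own tabular features and target.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

    # One linear baseline, one tree ensemble, one boosting model.
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(random_state=42),
        "gradient_boosting": GradientBoostingClassifier(random_state=42),
    }

    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")

The baseline's score is the reference point: any heavier model that cannot beat it is not earning its complexity.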

Let practical constraints guide selection:

Need interpretability → simpler models (see the sketch after this list)

Tabular data and performance-driven → tree-based or boosting models

Small datasets → avoid overly complex models
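For the interpretability point, here is an illustrative sketch: a linear model explains itself through per-feature coefficients, which is often all a stakeholder needs. The feature names below are made up for the example; substitute your own columns:

    # Interpretability sketch: a fitted linear model explains itself via coefficients.
    # Feature names are hypothetical; substitute your own columns.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    feature_names = ["age", "income", "tenure_months", "num_purchases"]
    X, y = make_classification(n_samples=500, n_features=4, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X, y)

    # Rank features by the magnitude of their coefficients.
    for idx in np.argsort(-np.abs(model.coef_[0])):
        print(f"{feature_names[idx]}: coef = {model.coef_[0][idx]:+.3f}")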

Be cautious of extremely high accuracy: in practice it more often signals target leakage than a genuinely superior model. Iterate over time. As data, features, or business requirements evolve, the "best" model can change.

The key takeaway is that model selection is an iterative engineering process, not a one-time decision. In most cases, improvements come more from better data and features than from switching algorithms.
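One common source of those too-good-to-be-true scores is preprocessing fitted on the full dataset before cross-validation. A minimal guard, sketched on synthetic data (your own X, y, and preprocessing steps would differ): keep every step inside a Pipeline so it is re-fit on each training fold only.

    # Leakage guard sketch: scaling lives inside the pipeline, so it is
    # fit only on each training fold, never on the held-out fold.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

    pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = cross_val_score(pipe, X, y, cv=5)

    print(f"mean CV accuracy: {scores.mean():.3f}")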

Hope this helps.
Thank you.


5 REPLIES
AnShikagautam
New Member

We can use this for upcoming technology.


Thank you for the clear explanation. I really liked the idea of starting with simple baselines and letting data and constraints guide the choice; it makes the process feel much more practical than chasing the best algorithm upfront.
One follow-up question: when setting a baseline, how do you decide whether the model is good enough to move forward, or whether the issue lies more in feature engineering than in trying a more complex model?

lbendlin
Super User

You do as many as you can in parallel, then pick the one that has the best success with your actual data. Marvel at what you have achieved, and then rinse and repeat every so often, when your chosen model is no longer the "best".

Do not pick a model whose success rate is close to 100%. Such a rate is an indication that you don't need ML in the first place.

75% to 90% is very respectable.
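A rough sketch of that sanity check (synthetic data; RandomForestClassifier is just a stand-in for whichever model won your parallel comparison, and the thresholds mirror the bands above):

    # Sanity-check the winning model's cross-validated score band.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=7)

    best_model = RandomForestClassifier(random_state=7)  # stand-in for your winner
    score = cross_val_score(best_model, X, y, cv=5, n_jobs=-1).mean()

    print(f"cross-validated accuracy: {score:.3f}")
    if score >= 0.99:
        print("Near-perfect: maybe you don't need ML, or something is leaking.")
    elif 0.75 <= score <= 0.90:
        print("In the respectable 75-90% band.")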

Thank you for the clear explanation; this really helped me understand the importance of baselines and iteration.
