MLExpert

What is Machine Learning

Written by Venelin Valkov

TL;DR: Here, we'll get a sense of what Machine Learning is and how it can be useful. You'll also create your first Machine Learning model!

Can we make intelligent machines? Solving the general intelligence problem seems to be quite hard. Yet, we're much closer than ever before. Why?

Nowadays, you have access to:

  • Fast compute power (CPUs and GPUs). And some of it is free!
  • Vast amounts of data (texts, images, databases)
  • Awesome (and open-source) libraries and toolkits for Machine Learning - models, algorithms, data processing and visualization
  • Pre-trained models that you couldn't realistically train on your own (e.g. BERT-like models by Google)
  • Great (and free) resources for learning

What a time to be alive! Now is the best time to get into Machine Learning. Let's start!

What problem does Machine Learning solve?

Writing large amounts of programming logic and handling numerous "edge cases" can be very boring. You might not be able to do it well enough - writing complex programs is hard!

Machine Learning tries to replace complex programming tasks with general algorithms that use data to solve a particular problem. "Modern" Machine Learning works well enough to solve practical tasks in specific domains: text, images, time series and structured (tabular) data.
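To make the contrast concrete, here is a minimal sketch that puts hand-written rules next to a model that learns from examples. The function name, the thresholds and the tiny dataset are all made up for illustration (the prices follow exactly 100 per square foot).

```python
from sklearn.linear_model import LinearRegression


def price_by_hand_written_rules(area: float) -> float:
    """Hypothetical hand-coded logic: a programmer picks every threshold."""
    if area < 1000:
        return 90_000.0
    if area < 2000:
        return 180_000.0
    return 300_000.0


# Made-up examples where the price is exactly 100 per square foot
areas = [[500], [1000], [1500], [2000]]
prices = [50_000, 100_000, 150_000, 200_000]

# The same general algorithm works for any dataset; only the data changes
learned_model = LinearRegression().fit(areas, prices)

rule_estimate = price_by_hand_written_rules(1200)
learned_estimate = float(learned_model.predict([[1200]])[0])
print(rule_estimate, round(learned_estimate, 2))
```

The rule-based function only knows the cases someone wrote down, while the fitted model picks up the pattern from the examples.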

Prerequisites

Do you need to know math, statistics and probability? No, but it helps. What you really need to know is how to program. We'll focus on programming in Python. You might choose to program in any language, as long as it is Python.

ML in practice

Jim and Pam want to buy a house for their new family. How much should they pay for it?

They pulled up a website that lists local properties and found the following data:

import pandas as pd
from sklearn.linear_model import LinearRegression

house_data = pd.DataFrame(
    dict(
        age=[2, 30, 8, 5],
        rooms=[7, 3, 6, 12],
        bedrooms=[3, 1, 2, 4],
        area=[4200, 2500, 3000, 5500],
        price=[500000, 200000, 250000, 750000],
    )
)
house_data

   age  rooms  bedrooms  area   price
0    2      7         3  4200  500000
1   30      3         1  2500  200000
2    8      6         2  3000  250000
3    5     12         4  5500  750000

Each house has a few characteristics along with its price. How can they use this data to estimate the value of new properties?

They decided to train a Machine Learning model:

model = LinearRegression().fit(
    house_data[["age", "rooms", "bedrooms", "area"]], house_data["price"]
)

Their real estate agent called with a new offer and gave them the parameters of the house:

"The house is 4 years old, with 5 rooms (2 of which are bedrooms) and has a floor space of 3300 square feet. The asking price is $400,000.00"

They liked the house and thought that it was a good deal. But just in case, they ran the house parameters through the model:

# Build a one-row DataFrame so the column names match the training data
new_house_parameters = pd.DataFrame([dict(age=4, rooms=5, bedrooms=2, area=3300)])

# predict() returns an array with one value per row
predicted_price = model.predict(new_house_parameters)

"${:,.2f}".format(predicted_price[0])

'$320,086.75'

According to the model, the asking price is far too high. Should they trust the prediction and walk away from the offer?

Not really. More data and a proper way to evaluate the performance of the model are needed. But hey, you trained your first Machine Learning model!
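One way to see why the prediction deserves little trust: with only four houses and four features (plus an intercept), a linear model has enough freedom to reproduce every training price. The sketch below rebuilds the same model and scores it on the data it was trained on; the variable names `features` and `training_r2` are introduced here for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

house_data = pd.DataFrame(
    dict(
        age=[2, 30, 8, 5],
        rooms=[7, 3, 6, 12],
        bedrooms=[3, 1, 2, 4],
        area=[4200, 2500, 3000, 5500],
        price=[500000, 200000, 250000, 750000],
    )
)
features = ["age", "rooms", "bedrooms", "area"]
model = LinearRegression().fit(house_data[features], house_data["price"])

# R^2 measured on the training data itself. A score close to 1.0 here only
# says the model reproduces the four prices it has already seen.
training_r2 = model.score(house_data[features], house_data["price"])
print(round(training_r2, 4))
```

A near-perfect score on the training data says nothing about houses the model has never seen, which is why evaluation needs data that was held back from training.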

How does it work?

Your model has a set of parameters (knobs). Those get adjusted during training based on the data you have. The objective is to find values that make good predictions (e.g. the correct house price).
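For the house price model, the knobs can be inspected directly. This is a small sketch that refits the same `LinearRegression` and reads its `coef_` and `intercept_` attributes; the names `features`, `new_house` and `manual_prediction` are introduced here for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

house_data = pd.DataFrame(
    dict(
        age=[2, 30, 8, 5],
        rooms=[7, 3, 6, 12],
        bedrooms=[3, 1, 2, 4],
        area=[4200, 2500, 3000, 5500],
        price=[500000, 200000, 250000, 750000],
    )
)
features = ["age", "rooms", "bedrooms", "area"]
model = LinearRegression().fit(house_data[features], house_data["price"])

# One learned weight (knob) per input feature, plus a single intercept
for name, weight in zip(features, model.coef_):
    print(f"{name}: {weight:.2f}")
print(f"intercept: {model.intercept_:.2f}")

# A prediction is the weighted sum of the features plus the intercept
new_house = [4, 5, 2, 3300]
manual_prediction = float(
    model.intercept_ + sum(w * x for w, x in zip(model.coef_, new_house))
)
print(f"{manual_prediction:,.2f}")
```

Computing the weighted sum by hand gives the same number as `model.predict`, which shows that the trained knobs are all the model "knows".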

Of course, this is just an intuitive explanation. You can go quite deep into the rabbit hole, but we'll keep it PG-13 for now.

The ingredients

To have a chance of creating a successful Machine Learning project you need a few ingredients. The quality of each one will determine how likely your project is to succeed.

Data

Having large amounts of data that is as close as possible to what your model will encounter in the real world is essential. Without it, doing real-world Machine Learning is very difficult.

Your data needs to be clean and well looked after. "Garbage in, garbage out" very much applies here. Your data management skills will be tested.
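As a rough sketch of what "clean" can mean in practice, the snippet below checks a small table for missing values and duplicated rows using pandas. The `raw_listings` data is hypothetical, and real cleaning usually involves many more checks.

```python
import numpy as np
import pandas as pd

# Hypothetical raw listings: one duplicated row and one missing price
raw_listings = pd.DataFrame(
    dict(
        area=[4200, 2500, 2500, 3000],
        price=[500000, 200000, 200000, np.nan],
    )
)

# Count the problems before fixing anything
missing_per_column = raw_listings.isna().sum()
duplicate_rows = int(raw_listings.duplicated().sum())
print(missing_per_column)
print(duplicate_rows)

# Drop exact duplicates first, then rows with missing values
clean_listings = raw_listings.drop_duplicates().dropna().reset_index(drop=True)
print(clean_listings)
```

Dropping rows is the simplest option; whether it is the right one depends on how much data you can afford to lose.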

Model

Now that you have your data looking great, you can use a model to learn from it so you can solve the problem you're interested in. The process of learning is called training. It involves finding good values for the model knobs (parameters). Good knob values lead to more accurate models.

Training a model can be a tricky business. Many practical tips can help, but the only way to get good at it is to practice.
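To give a feel for what "finding good knob values" looks like, here is a from-scratch sketch of gradient descent on a two-knob model. It is only an illustration on made-up data that follows y = 2x + 1; scikit-learn's `LinearRegression` solves the least-squares problem directly rather than iterating like this, and the learning rate and step count are arbitrary choices.

```python
import numpy as np

# Made-up data that follows y = 2x + 1 exactly
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

weight, bias = 0.0, 0.0  # the two knobs, starting from arbitrary values
learning_rate = 0.05

for _ in range(5000):
    error = (weight * x + bias) - y
    # Gradients of the mean squared error with respect to each knob
    weight_gradient = 2.0 * np.mean(error * x)
    bias_gradient = 2.0 * np.mean(error)
    # Nudge each knob in the direction that reduces the error
    weight -= learning_rate * weight_gradient
    bias -= learning_rate * bias_gradient

print(round(weight, 4), round(bias, 4))
```

Each pass nudges the knobs a little, and after enough passes they settle near the values that generated the data.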

Metric(s)

How good is your model? Answering this gives you a way to train it better. Machine Learning algorithms work by searching for knob values that minimize (or maximize) a metric.
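For a price prediction task, two common choices are the mean absolute error and the mean squared error. The sketch below computes both with `sklearn.metrics`; the true and predicted prices are hypothetical numbers chosen for illustration.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical true prices and model predictions for three houses
true_prices = [200000, 350000, 500000]
predicted_prices = [210000, 330000, 520000]

# Average size of the error, in the same unit as the price
mae = mean_absolute_error(true_prices, predicted_prices)

# Average squared error, which punishes large misses more heavily
mse = mean_squared_error(true_prices, predicted_prices)

print(f"MAE: {mae:,.2f}")
print(f"MSE: {mse:,.2f}")
```

For both metrics, lower is better, so training would aim to minimize them.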

Deployment

The most important part of your job is to deliver a model that makes predictions in the real world. Most of the time, this means creating a REST API to serve the model predictions and some mechanism to monitor its performance.
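As a very rough sketch of the serving part, the snippet below wraps the house price model in a tiny HTTP endpoint using only Python's standard library. The names `predict_from_json`, `PredictionHandler` and `run_server`, the JSON layout and the port are all assumptions made for illustration. A real deployment would more likely use a web framework and add input validation, error handling and monitoring.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["age", "rooms", "bedrooms", "area"]

house_data = pd.DataFrame(
    dict(
        age=[2, 30, 8, 5],
        rooms=[7, 3, 6, 12],
        bedrooms=[3, 1, 2, 4],
        area=[4200, 2500, 3000, 5500],
        price=[500000, 200000, 250000, 750000],
    )
)
model = LinearRegression().fit(house_data[FEATURES], house_data["price"])


def predict_from_json(body: str) -> str:
    """Turn a JSON object holding the four features into a JSON prediction."""
    house = json.loads(body)
    row = pd.DataFrame([[house[name] for name in FEATURES]], columns=FEATURES)
    return json.dumps({"predicted_price": float(model.predict(row)[0])})


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body and answer with the model prediction
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        response = predict_from_json(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)


def run_server(port: int = 8000) -> None:
    """Serve predictions until interrupted (hypothetical port, not called here)."""
    HTTPServer(("127.0.0.1", port), PredictionHandler).serve_forever()
```

Calling `run_server()` starts the endpoint; a client would then POST a JSON body such as `{"age": 4, "rooms": 5, "bedrooms": 2, "area": 3300}` and receive the predicted price back as JSON.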

Summary

We've looked at what Machine Learning is and what it can do for you. Also, you trained your first model! Next, we'll look into what classification is and how you can use Machine Learning to solve a classification problem.
