Imagine that you want to replicate a dataset that follows a specific behavior. You already know that the data should follow a quadratic equation with its minimum at x = 500 and only positive values. Is there a way to simulate such a scenario with Python?

Summary

This tutorial shows how to create a sample dataset for regression problems. I will cover linear regression and non-linear regression equations (polynomial, exponential…). I will continue updating this tutorial with new regression problems in the future. You can find the notebook for this tutorial on my GitHub account.

Linear Regression

Simple Regression

As you might know, linear regression is based on linear equations with the following form:

y = ax + b

where a is the slope and b is the y-intercept (the value of y at x = 0).
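As a minimal sketch of this equation, the snippet below draws points from a line with NumPy alone and adds Gaussian noise. The slope, intercept, noise level, and seed are arbitrary values chosen for illustration; they are not taken from this tutorial.

```python
import numpy as np

# Hypothetical values chosen only for illustration
a, b = 2.5, 10.0                      # slope and y-intercept
rng = np.random.default_rng(0)        # seeded generator for reproducibility

x_line = np.linspace(0, 100, 200)            # 200 evenly spaced x values
noise = rng.normal(0.0, 5.0, x_line.size)    # Gaussian noise, std = 5
y_line = a * x_line + b + noise

# A least-squares line fit should recover values close to a and b
slope, intercept = np.polyfit(x_line, y_line, 1)
print(slope, intercept)
```

The fitted slope and intercept will not match a and b exactly, because the noise shifts each point slightly.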

To build a linear equation, one option is to use the make_regression() function from the scikit-learn library to create samples of X and Y. The main parameters of this function are:

n_samples: number of samples
n_features: number of variables
n_informative: number of informative variables, i.e. those actually used to build the output
n_targets: number of regression targets
noise: standard deviation of the Gaussian noise applied to the output
random_state: the seed to control the randomness of the output

You can take a look at the rest of the parameters in the scikit-learn documentation.

First of all, let’s import the libraries:

import pandas as pd
import numpy as np
import seaborn as sns
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression

Now, let’s apply the make_regression() function:

x, y = datasets.make_regression(n_samples = 200, n_features = 1,
                                n_informative = 1, n_targets = 1,
                                noise = 20, random_state=12345, effective_rank=None)
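If you also want to know the slope that was used to generate the data, make_regression() accepts coef=True and then returns the underlying coefficient as a third value. The sketch below repeats the call above with that flag; the names x_demo, y_demo, and true_coef are mine and only serve this example.

```python
import numpy as np
from sklearn.datasets import make_regression

# coef=True additionally returns the coefficient of the underlying linear model
x_demo, y_demo, true_coef = make_regression(
    n_samples=200, n_features=1, n_informative=1,
    n_targets=1, noise=20, random_state=12345, coef=True,
)

print(x_demo.shape)   # x is 2-D: one row per sample, one column per feature
print(y_demo.shape)   # y is 1-D: one value per sample

# With a single feature and a single target there is only one coefficient
true_slope = float(np.asarray(true_coef).ravel()[0])
print(true_slope)
```

Note that x comes back as a two-dimensional array even with a single feature, which matters if you later pass it to functions that expect a flat vector.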

If you want to control the range of the X and Y values, you can use the NumPy function np.interp(), specifying the minimum and the maximum for each one:

x = np.interp(x, (x.min(), x.max()), (3876, 15678))
y = np.interp(y, (y.min(), y.max()), (1678, 5435))
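Used this way, np.interp() performs a linear min-max rescaling. The short sketch below checks that on a toy array by comparing it with the formula written out by hand. The helper name rescale and the toy values are assumptions for illustration.

```python
import numpy as np

def rescale(values, new_min, new_max):
    """Linearly map values from their own [min, max] onto [new_min, new_max].

    Hypothetical helper wrapping the np.interp() call used in this tutorial.
    """
    return np.interp(values, (values.min(), values.max()), (new_min, new_max))

raw = np.array([-2.0, 0.0, 1.0, 3.0])    # toy data
scaled = rescale(raw, 3876, 15678)

# The same mapping written out as the min-max formula
manual = 3876 + (raw - raw.min()) * (15678 - 3876) / (raw.max() - raw.min())

print(scaled.min(), scaled.max())
print(np.allclose(scaled, manual))
```

Because the mapping is linear, the relative spacing between the points is preserved; only the units change.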

Let’s plot the function to see what it looks like:

plt.ion()
plt.plot(x,y,'.')

Non-linear Regression

Polynomial Regression

In the case of polynomial regression, we need to apply a more complex methodology. For this example, I will be calculating a cubic equation. So, we are looking for an equation that has the following form:

y = ax^3 + bx^2 + cx + d

First, let’s choose the X coordinates of the maximum and the minimum of the desired function. They are the roots of its first derivative, which is what we will build first.

To build my function for this specific exercise, I will use max_x = 3000 and min_x = 5000.

Then, write the first derivative in factored form based on those values. In this case, it will be the following one:

f'(x) = (x – 3000)(x – 5000)

After developing, we have the following equation:

f'(x) = x^2 – 8000x + 1.5·10^7
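The expansion can be double-checked symbolically. This is a small sketch using SymPy's expand(); the variable name x_sym is my own choice, to keep it separate from any array called x.

```python
from sympy import Symbol, expand

x_sym = Symbol('x')

# Factored first derivative with roots at 3000 and 5000
factored = (x_sym - 3000) * (x_sym - 5000)
expanded = expand(factored)

print(expanded)   # x**2 - 8000*x + 15000000
```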

As I said, this is the first derivative. To get our cubic equation, it’s necessary to integrate it, which we can do with the Sympy library. Let’s import what we need:

from sympy import Symbol, init_printing, integrate

Now, I will create the symbol for X, as it will be our variable of integration:

init_printing(use_unicode=False, wrap_line=False)
x = Symbol('x')

In the next step, I will integrate the first derivative to get the cubic equation:

integrate(x**2 - 8000*x + 1.5*(10**7), x)

This is the result of the integral, which will be the base to build our cubic equation (Sympy omits the constant of integration, so d = 0 here):

y = (1/3)x^3 − 4000x^2 + 15000000x
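A quick way to check this result is to differentiate it again and classify the critical points with the second derivative. The sketch below does that with SymPy; the names x_sym, cubic, first, and second are mine.

```python
from sympy import Rational, Symbol, diff, solve

x_sym = Symbol('x')
cubic = Rational(1, 3) * x_sym**3 - 4000 * x_sym**2 + 15000000 * x_sym

first = diff(cubic, x_sym)       # should give back x**2 - 8000*x + 15000000
second = diff(cubic, x_sym, 2)   # second derivative: 2*x - 8000

# A negative second derivative marks a maximum, a positive one a minimum
critical_points = sorted(solve(first, x_sym))
for point in critical_points:
    kind = "maximum" if second.subs(x_sym, point) < 0 else "minimum"
    print(point, kind)
```

This should report a maximum at x = 3000 and a minimum at x = 5000, matching the values chosen at the start.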

We can apply two kinds of transformations to this equation without moving the X coordinates of its maximum and minimum.

The first one is changing the value of d. This will move the function up and down along the Y axis.
The second one is to multiply or divide the whole function by a constant to stretch or flatten it.
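To see that neither transformation moves the maximum or the minimum along the X axis, the sketch below compares the roots of the derivative before and after scaling and shifting. The helper transform is hypothetical, and the scale and shift values are the ones used further down in this tutorial.

```python
import numpy as np

# Coefficients a, b, c, d of the cubic, highest power first
base = np.array([1/3, -4000.0, 15000000.0, 0.0])

def transform(coeffs, scale=1.0, shift=0.0):
    """Divide the whole cubic by `scale`, then move it up by `shift`."""
    out = coeffs / scale     # new array, so the input is left untouched
    out[-1] += shift         # only d changes when shifting vertically
    return out

moved = transform(base, scale=15000000, shift=3000)

# Roots of the derivative are the x positions of the extrema
roots_base = np.sort(np.roots(np.polyder(base)))
roots_moved = np.sort(np.roots(np.polyder(moved)))
print(roots_base)
print(roots_moved)
```

Both calls should print roots close to 3000 and 5000, since scaling multiplies the derivative by a constant and shifting does not affect it at all.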

Let’s plot the resultant equation to see what it looks like by defining a function:

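def plot_me(a, b, c, d):
    # Evaluate the cubic on a dense grid of x values
    x = np.arange(0, 7000, 0.05)
    y = a*x**3 + b*x**2 + c*x + d  # vectorized, no Python loop needed
    plt.plot(x, y, label='cubic', linestyle='-')
    plt.grid(True)
    plt.show(block=False)
    plt.pause(10)  # keep the window open for 10 seconds
    plt.close()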


plot_me(1/3, -4000, 15000000, 0)

Let’s apply some transformations to the equation so that the Y values stay between 0 and 5000. To do so, I will divide the whole equation by 15000000. Also, I will move it 3000 units up (this means d = 3000).

plot_me(0.3333333/15000000, -4000/15000000, 15000000/15000000, 3000)
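So far the cubic is only a smooth curve. To turn it into a sample dataset, one option is to evaluate it at random X positions and add Gaussian noise. The following is a minimal sketch of that idea, not part of the original notebook: the number of samples, the noise level of 50, the seed, and the variable names are all assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(12345)

# Coefficients of the scaled and shifted cubic from the call above
a, b, c, d = (1/3) / 15000000, -4000 / 15000000, 1.0, 3000

x_s = rng.uniform(0, 7000, 200)                    # 200 random x positions
y_clean = a * x_s**3 + b * x_s**2 + c * x_s + d    # noise-free cubic
y_noisy = y_clean + rng.normal(0, 50, x_s.size)    # noise std of 50 is an assumption

# Collect the sample in a DataFrame, ordered by x
df = pd.DataFrame({"x": x_s, "y": y_noisy}).sort_values("x").reset_index(drop=True)
print(df.head())
```

Raising or lowering the noise standard deviation controls how hard it is for a model to recover the underlying cubic.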

In the next post, I will share with you how to do a similar approach with exponential and logarithmic equations.

Tags: datasets, linear regression, mathematics, non-linear regression, python, regression
