Imagine that you want to **replicate a dataset** that follows a specific behavior. You already know that the data should follow a quadratic equation with its minimum at x = 500 and only positive values. Is there a way to simulate such a scenario with Python?

This tutorial shows how to create a **sample dataset** for regression problems. I will cover linear regression and non-linear regression equations (polynomial, exponential…). I will continue updating this tutorial with new regression problems in the future. You can find the notebook for this tutorial **on my GitHub account**.

# Linear Regression

## Simple Regression

As you might know, linear regression is based on **linear equations** with the following form:

y = ax + b

where a is the **slope** and b is the **y-intercept** (the point where the line crosses the y-axis).
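
As a minimal sketch of this idea, the snippet below builds noisy points around a line by hand with NumPy. The slope, intercept, x range and noise level are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sample is reproducible

a, b = 2.5, 10.0  # hypothetical slope and y-intercept
x_line = np.linspace(0, 100, 200)

# Gaussian noise (standard deviation 5, an assumption) scatters the points
noise = rng.normal(loc=0.0, scale=5.0, size=x_line.shape)
y_line = a * x_line + b + noise

# A degree-1 least-squares fit should recover values close to a and b
slope, intercept = np.polyfit(x_line, y_line, 1)
print(slope, intercept)
```

The fitted slope and intercept will not match a and b exactly because of the noise, but they should land close to them.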

To build a linear equation, one option is to use the **make_regression()** function from the **Sklearn library** to create samples of X and Y. The main parameters you can pass to this function are:

- **n_samples**: number of samples
- **n_features**: number of variables
- **n_informative**: number of informative variables used to create the output
- **n_targets**: number of regression targets
- **noise**: standard deviation of the Gaussian noise applied to the output
- **random_state**: the seed to control the randomness of the output

You can take a look at the rest of the parameters in the Sklearn documentation.

First of all, let’s import the libraries:

```
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
```

Now, let’s apply the make_regression() function:

```
x, y = datasets.make_regression(n_samples = 200, n_features = 1,
n_informative = 1, n_targets = 1,
noise = 20, random_state=12345, effective_rank=None)
```
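
If you also want to know the slope that was used to generate the data, make_regression() accepts a `coef=True` argument that returns it as a third value. The sketch below reuses the same settings as above; the variable names `X_demo`, `y_demo` and `true_coef` are my own and only serve this illustration.

```python
from sklearn.datasets import make_regression

# coef=True additionally returns the coefficient of the underlying linear model
X_demo, y_demo, true_coef = make_regression(
    n_samples=200, n_features=1, n_informative=1, n_targets=1,
    noise=20, random_state=12345, coef=True,
)

print(X_demo.shape, y_demo.shape)  # (200, 1) (200,)
print(float(true_coef))            # slope of the noise-free line
```

Note that X comes back as a two-dimensional array with one column per feature, while y is one-dimensional.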

If you want to control the range of the X and Y values, you can use a NumPy function called **np.interp()**, specifying the minimum and the maximum for each one:

```
x = np.interp(x, (x.min(), x.max()), (3876, 15678))
y = np.interp(y, (y.min(), y.max()), (1678, 5435))
```
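
Used this way, np.interp() is simply a linear min-max rescaling. The small sketch below, with a hypothetical four-value array, compares it against the formula written out by hand.

```python
import numpy as np

values = np.array([-2.0, 0.0, 1.0, 3.0])  # hypothetical raw values
lo, hi = 3876, 15678                       # target range used for x above

# Map [values.min(), values.max()] linearly onto [lo, hi]
scaled = np.interp(values, (values.min(), values.max()), (lo, hi))

# The same mapping as an explicit min-max formula
manual = lo + (values - values.min()) * (hi - lo) / (values.max() - values.min())

print(scaled.min(), scaled.max())   # 3876.0 15678.0
print(np.allclose(scaled, manual))  # True
```

Because the mapping is linear, the points still lie around a straight line after rescaling; only the slope, the intercept and the size of the noise change.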

Let’s plot the function to see what it looks like:

```
plt.ion()
plt.plot(x,y,'.')
```

# Non-linear Regression

## Polynomial Regression

In the case of polynomial regression, we need to apply a more **complex methodology**. For this example, I will be calculating a **cubic equation**. So, we are looking for an equation that has the following form:

y = ax^3 + bx^2 + cx + d

First, let’s choose the X coordinates of the **maximum** and the **minimum** of our desired function. These are the roots of its first derivative, so we can use them to build that derivative.

For this specific exercise, I will use max_x = 3000 and min_x = 5000.

Then, write the first derivative in **factored form** based on those values. In this case, it will be the following one:

f'(x) = (x – 3000)(x – 5000)

After expanding the product, we have the following equation:

f'(x) = x^2 – 8000x + 1.5·10^7
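
The expansion can be double-checked symbolically. This is only a quick sanity check with Sympy; the name `x_sym` is my own choice to avoid clashing with the `x` array used earlier.

```python
from sympy import Symbol, expand

x_sym = Symbol('x')

# Multiply out the factored form (x - 3000)(x - 5000)
derivative = expand((x_sym - 3000) * (x_sym - 5000))
print(derivative)  # x**2 - 8000*x + 15000000
```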

As I said, this is the **first derivative**. To get our cubic equation, it’s necessary to **integrate** it. Sympy handles the symbolic integration; let’s import it along with Scipy:

```
import scipy as sp
from sympy import *
```

Now, I will create the symbol for X as it will be our unknown factor:

```
init_printing(use_unicode=False, wrap_line=False)
x = Symbol('x')
```

In the next step, I will integrate the first derivative to get the cubic equation:

`integrate(x**2 - 8000*x + 1.5*(10**7), x)`

This is the result of the integral that will be the base to build our cubic equation:

y = (1/3)x^3 − 4000x^2 + 15000000x
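
To confirm that this cubic really has its maximum at x = 3000 and its minimum at x = 5000, we can differentiate it again and check the sign of the second derivative at each critical point. This is a small verification sketch, with `x_sym` as a hypothetical variable name.

```python
from sympy import Rational, Symbol, diff, solve

x_sym = Symbol('x')
cubic = Rational(1, 3) * x_sym**3 - 4000 * x_sym**2 + 15000000 * x_sym

first = diff(cubic, x_sym)      # x**2 - 8000*x + 15000000
second = diff(cubic, x_sym, 2)  # 2*x - 8000

# Critical points are the roots of the first derivative
critical = sorted(solve(first, x_sym))

for point in critical:
    # Negative second derivative: local maximum; positive: local minimum
    kind = "maximum" if second.subs(x_sym, point) < 0 else "minimum"
    print(point, kind)
```

This prints `3000 maximum` and `5000 minimum`, matching the values chosen at the start.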

Any equation can be modified with different types of transformations:

- The first one is changing the constant term d to **move the function up and down** along the y-axis.
- The second one is to multiply or divide the whole function to **stretch or flatten it**.
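
Neither transformation moves the X positions of the maximum and the minimum, which is why we can apply them freely. The sketch below checks this with Sympy, using a hypothetical scale factor `k` and shift `d` picked only for the demonstration.

```python
from sympy import Rational, Symbol, diff, solve

x_sym = Symbol('x')
base = Rational(1, 3) * x_sym**3 - 4000 * x_sym**2 + 15000000 * x_sym

k, d = 1000, 250  # hypothetical scale factor and vertical shift
transformed = base / k + d

# Roots of the first derivative before and after the transformation
base_points = sorted(solve(diff(base, x_sym), x_sym))
transformed_points = sorted(solve(diff(transformed, x_sym), x_sym))

print(base_points)         # [3000, 5000]
print(transformed_points)  # [3000, 5000]
```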

Let’s plot the resultant equation to see what it looks like by defining a function:

```
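def plot_me(a, b, c, d):
    # Evaluate y = a*x^3 + b*x^2 + c*x + d on a fine grid of x values
    x = np.arange(0, 7000, 0.05)
    y = [(a*i**3 + b*i**2 + c*i + d) for i in x]
    plt.plot(x, y, label='cubic', linestyle='-')
    plt.grid(True)
    # Show the figure without blocking, keep it open for 10 seconds, then close it
    plt.show(block=False)
    plt.pause(10)
    plt.close()

plot_me(1/3, -4000, 15000000, 0)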
```

Let’s apply some transformations to the equation so that the Y values fall between 0 and 5000. To do so, I will divide the whole equation by 15000000. I will also move it 3000 units up (this means d = 3000).

`plot_me(0.3333333/15000000, -4000/15000000, 15000000/15000000, 3000)`
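
With the final coefficients in hand, one possible way to turn the curve into an actual sample dataset is to evaluate it at random X values and add some noise. This is a sketch under my own assumptions: the sample size, the uniform X range and the noise standard deviation are hypothetical choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed for reproducibility

scale = 15000000
# Coefficients a, b, c, d of the scaled and shifted cubic (highest power first)
coeffs = [(1 / 3) / scale, -4000 / scale, 15000000 / scale, 3000]

x_data = rng.uniform(0, 7000, size=500)  # hypothetical X range and sample size
y_clean = np.polyval(coeffs, x_data)     # noise-free cubic values
y_data = y_clean + rng.normal(0, 50, size=x_data.shape)  # assumed noise level

df = pd.DataFrame({"x": x_data, "y": y_data})
print(df.describe())
```

The noise-free values stay between 3000 and roughly 4556 on this X range, so the noisy sample should sit comfortably inside the 0 to 5000 window.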

In the next post, I will share with you how to apply a similar approach to **exponential** and **logarithmic** equations.
