# Generate Simulated Dataset for Regression Problems

Imagine that you want to replicate a dataset that follows a specific behavior. You already know that the data should follow a quadratic equation with positive values and a minimum at x = 500. Is there a way to simulate such a scenario with Python?

## Summary

This tutorial shows how to create a sample dataset for regression problems. I will cover linear regression and non-linear regression equations (polynomial, exponential…). I will keep updating this tutorial with new regression problems in the future. You can find the notebook for this tutorial on my GitHub account.

# Linear Regression

## Simple Regression

As you might know, linear regression is based on linear equations with the following form:

y = ax + b

where a is the slope and b is the y-intercept (the point where the line crosses the y-axis).
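To make the roles of a and b concrete, here is a minimal sketch that builds a noisy line by hand with NumPy. The values of a and b, the noise level, and the names x_lin and y_lin are assumptions chosen only for illustration; they are not part of the original example.

```python
import numpy as np

# Hypothetical slope and y-intercept, chosen only for illustration
a, b = 2.5, 10.0

rng = np.random.default_rng(12345)

# 200 evenly spaced x values and Gaussian noise with standard deviation 5
x_lin = np.linspace(0, 100, 200)
noise = rng.normal(loc=0.0, scale=5.0, size=x_lin.size)

# y = ax + b plus noise
y_lin = a * x_lin + b + noise

# A least-squares fit should recover values close to a and b
slope_hat, intercept_hat = np.polyfit(x_lin, y_lin, deg=1)
print(slope_hat, intercept_hat)
```

The fitted slope and intercept will not match a and b exactly because of the noise, but they should be close.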

To build a linear dataset, one option is to use the make_regression() function from the scikit-learn library, which creates samples of X and Y. The main parameters of this function are:

• n_samples: number of samples
• n_features: number of variables
• n_informative: number of informative variables used to create the output
• n_targets: number of regression targets
• noise: standard deviation of the Gaussian noise applied to the output
• random_state: the seed to control the randomness of the output

You can take a look at the rest of the parameters in the scikit-learn documentation.

First of all, let’s import the libraries:

```python
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
```

Now, let’s apply the make_regression() function:

```python
x, y = datasets.make_regression(n_samples=200, n_features=1,
                                n_informative=1, n_targets=1,
                                noise=20, random_state=12345,
                                effective_rank=None)
```
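If you also want to know the slope that was used to generate the data, make_regression() accepts coef=True and then returns the underlying coefficient as a third value. The sketch below repeats the call with that option; the names X_demo, y_demo and true_coef are mine and only used for illustration.

```python
from sklearn.datasets import make_regression

# Same settings as the call in the tutorial, plus coef=True so the
# coefficient of the underlying linear model is returned as well
X_demo, y_demo, true_coef = make_regression(
    n_samples=200, n_features=1, n_informative=1, n_targets=1,
    noise=20, random_state=12345, coef=True,
)

print(X_demo.shape)  # (200, 1)
print(y_demo.shape)  # (200,)
print(true_coef)     # slope of the noise-free line
```

Note that X comes back as a two-dimensional array (one column per feature), while y is one-dimensional when there is a single target.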

If you want to control the range of the X and Y values, you can use the NumPy function np.interp(), specifying the new minimum and maximum for each variable:

```python
x = np.interp(x, (x.min(), x.max()), (3876, 15678))
y = np.interp(y, (y.min(), y.max()), (1678, 5435))
```
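To see what this rescaling does, here is a small sketch on a toy array. The helper name rescale and the sample values are assumptions for illustration only; the helper simply wraps the same np.interp() call used above.

```python
import numpy as np

def rescale(values, new_min, new_max):
    """Linearly map values from [values.min(), values.max()] to [new_min, new_max]."""
    values = np.asarray(values, dtype=float)
    return np.interp(values, (values.min(), values.max()), (new_min, new_max))

# Toy input: the smallest value maps to new_min, the largest to new_max,
# and everything in between keeps its relative position
raw = np.array([-2.0, 0.0, 1.0, 2.0])
scaled = rescale(raw, 3876, 15678)

print(scaled.min(), scaled.max())  # 3876.0 15678.0
print(scaled)
```

Because the mapping is linear, the shape of the relationship between X and Y is preserved; only the units change.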

Let’s plot the function to see what it looks like:

```python
plt.ion()
plt.plot(x, y, '.')
```
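If you prefer to work with the simulated data as a table, one possible approach is to store it in a pandas DataFrame. This is only a sketch: it regenerates the data so that it runs on its own, and the column names "x" and "y" are my own choice.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression

# Regenerate the data with the same settings used in the tutorial
x_raw, y_raw = make_regression(n_samples=200, n_features=1, n_informative=1,
                               n_targets=1, noise=20, random_state=12345)

# Flatten X to one dimension and rescale both variables
x_flat = x_raw.ravel()
x_scaled = np.interp(x_flat, (x_flat.min(), x_flat.max()), (3876, 15678))
y_scaled = np.interp(y_raw, (y_raw.min(), y_raw.max()), (1678, 5435))

# Hypothetical column names, chosen only for illustration
df = pd.DataFrame({"x": x_scaled, "y": y_scaled})
print(df.describe())
```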

# Non-linear Regression

## Polynomial Regression

In the case of polynomial regression, we need a more elaborate approach. For this example, I will build a cubic equation. So, we are looking for an equation that has the following form:

y = ax^3 + bx^2 + cx + d

First, let’s choose the X coordinates of the local maximum and the local minimum of our desired function. These are the roots of its first derivative, so they are all we need to build that derivative.

To build my function for this specific exercise, I will use max_x = 3000 and min_x = 5000.

Then, write the first derivative in factored form based on those values. In this case, it will be the following one:

f'(x) = (x – 3000)(x – 5000)

After expanding, we have the following equation:

f'(x) = x^2 – 8000x + 1.5·10^7
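The expansion can be double-checked symbolically. The following sketch uses SymPy's expand() and solve() on the factored derivative; it is only a verification step, not part of the original workflow.

```python
from sympy import Symbol, expand, solve

x = Symbol("x")

# First derivative in factored form, with roots at 3000 and 5000
factored = (x - 3000) * (x - 5000)

# Expand the product into a standard quadratic
expanded = expand(factored)
print(expanded)  # x**2 - 8000*x + 15000000

# Solving the quadratic should give back the two chosen roots
roots = solve(expanded, x)
print(roots)
```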

As I said, this is the first derivative. To get our cubic equation, it’s necessary to integrate it, which we can do symbolically with the SymPy library. Let’s import what we need:

```python
from sympy import Symbol, init_printing, integrate
```

Now, I will create the symbol for X, as it will be our variable. Note that this reuses the name x, so it replaces the array from the linear example:

```python
init_printing(use_unicode=False, wrap_line=False)
x = Symbol('x')
```

In the next step, I will integrate the first derivative to get the cubic equation:

```python
integrate(x**2 - 8000*x + 1.5*(10**7), x)
```

This is the result of the integral, which will be the base of our cubic equation (SymPy omits the integration constant, so d = 0 for now):

y = (1/3)x^3 − 4000x^2 + 15000000x
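As a sanity check, we can differentiate this cubic again and confirm that its critical points are where we wanted them. The sketch below uses the second derivative to tell the maximum from the minimum; the variable names are mine.

```python
from sympy import Rational, Symbol, diff

x = Symbol("x")

# Cubic obtained from the integral, with the integration constant set to 0
cubic = Rational(1, 3) * x**3 - 4000 * x**2 + 15000000 * x

first = diff(cubic, x)      # x**2 - 8000*x + 15000000
second = diff(cubic, x, 2)  # 2*x - 8000

for point in (3000, 5000):
    slope = first.subs(x, point)       # 0 at a critical point
    curvature = second.subs(x, point)  # negative: maximum, positive: minimum
    kind = "maximum" if curvature < 0 else "minimum"
    print(point, slope, curvature, kind)
```

The first derivative is zero at both points, and the sign of the second derivative confirms a local maximum at x = 3000 and a local minimum at x = 5000.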

We can apply two types of transformations to this equation:

• The first one is changing the d value. This will move the function up or down along the y-axis.
• The second one is to multiply or divide the whole function to stretch or flatten it.
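Neither of these transformations moves the X position of the maximum or the minimum, as long as the scale factor is positive. Here is a rough numerical check with NumPy; the helper name cubic and the particular scale and shift values are assumptions for illustration.

```python
import numpy as np

def cubic(x, a, b, c, d):
    """Evaluate y = a*x^3 + b*x^2 + c*x + d."""
    return a * x**3 + b * x**2 + c * x + d

# Grid of whole x values that contains both 3000 and 5000
x_grid = np.arange(2000, 6001, dtype=float)

base = cubic(x_grid, 1 / 3, -4000, 15000000, 0)

# Illustrative transformation: flatten by a positive factor, then shift up
scale, shift = 1 / 15000000, 3000
transformed = base * scale + shift

# Look for the local maximum on [2000, 4000] and the local minimum on [4000, 6000]
left = x_grid <= 4000
right = x_grid >= 4000

peak_base = x_grid[left][np.argmax(base[left])]
peak_transformed = x_grid[left][np.argmax(transformed[left])]
trough_base = x_grid[right][np.argmin(base[right])]
trough_transformed = x_grid[right][np.argmin(transformed[right])]

print(peak_base, peak_transformed)
print(trough_base, trough_transformed)
```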

Let’s plot the resultant equation to see what it looks like by defining a function:

```python
def plot_me(a, b, c, d):
    # Evaluate y = a*x^3 + b*x^2 + c*x + d on a fine grid of x values
    x = np.arange(0, 7000, 0.05)
    y = a * x**3 + b * x**2 + c * x + d
    plt.plot(x, y, label='cubic', linestyle='-')
    plt.grid(True)
    plt.show(block=False)
    plt.pause(10)  # keep the window open for 10 seconds
    plt.close()

plot_me(1/3, -4000, 15000000, 0)
```

Let’s apply some transformations to the equation so that the Y values stay between 0 and 5000. To do so, I will divide the whole equation by 15000000. Also, I will move it 3000 units up (this means d = 3000).

```python
plot_me((1/3)/15000000, -4000/15000000, 15000000/15000000, 3000)
```
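So far we only have a smooth curve. To turn it into a simulated dataset, one possible approach is to sample X values at random and add Gaussian noise to the curve. This is a sketch under my own assumptions: the sample size (500), the noise standard deviation (50) and the names x_sim and y_sim are not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Coefficients of the transformed cubic used in the last plot
a = (1 / 3) / 15000000
b = -4000 / 15000000
c = 15000000 / 15000000
d = 3000

# Assumed sample size: 500 random x values in the plotted range
x_sim = rng.uniform(0, 7000, size=500)

# Noise-free curve, then Gaussian noise with an assumed standard deviation of 50
y_clean = a * x_sim**3 + b * x_sim**2 + c * x_sim + d
y_sim = y_clean + rng.normal(loc=0.0, scale=50.0, size=x_sim.size)

print(x_sim[:5])
print(y_sim[:5])
```

A larger noise standard deviation makes the cubic shape harder to see in a scatter plot, so it is worth trying a few values.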

In the next post, I will share with you how to do a similar approach with exponential and logarithmic equations.
