Using Linear Regression in TensorFlow

Sergey Kovalev


The linear regression algorithm helps to predict scores on the variable Y from the scores on the variable X. In this TensorFlow tutorial, we create a linear regression model and optimize it using the gradient descent method.



The variable Y that we are predicting is usually called the criterion variable, and the variable X that we are basing our predictions on is called the predictor variable. If there is only one predictor variable, the prediction method is called simple regression.
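As a concrete baseline, simple regression also has a closed-form least-squares solution; here is a sketch in plain NumPy with toy data (not the house-price data, and independent of the TensorFlow approach used below):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # toy predictor values
y = np.array([3.0, 5.0, 7.0, 9.0])   # toy criterion values: y = 2x + 1

# Least-squares slope and intercept:
#   W = cov(x, y) / var(x),  b = mean(y) - W * mean(x)
w = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - w * x.mean()
print(w, b)  # 2.0 1.0
```

Gradient descent, used in the rest of this tutorial, finds the same line iteratively and generalizes to many predictor variables.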

In addition to TensorFlow, install matplotlib for plotting: pip install matplotlib.

Data set

As a training set for the tutorial, we use house prices in Portland, Oregon, where X (the predictor variable) is the house size and Y (the criterion variable) is the house price. The data set contains 47 examples.



Data set pre-processing

Normalizing your data helps to improve the performance of gradient descent, especially in the case of multivariate linear regression.

We can do this with the following formula:

x_normalized = (x - m) / q

where m is the mean value of the variable and q is the standard deviation.

Implementation in the source code:

def normalize(array): 
    return (array - array.mean()) / array.std()

size_data_n = normalize(size_data)
price_data_n = normalize(price_data)
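As a quick sanity check (a sketch, using a small subset of the house sizes), the normalized array should come out with zero mean and unit standard deviation:

```python
import numpy

def normalize(array):
    return (array - array.mean()) / array.std()

# A few of the house sizes from the training set, as floats
size_data = numpy.asarray([2104.0, 1600.0, 2400.0, 1416.0, 3000.0])
size_data_n = normalize(size_data)

print(round(size_data_n.std(), 10))   # 1.0
print(abs(size_data_n.mean()) < 1e-9) # True
```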


Implementing the cost function and applying gradient descent

The next step is to implement the cost function and apply the gradient descent method to it to minimize the squared errors.

The cost function formula:

J(W, b) = (1 / (2 * n)) * sum((W * x_i + b - y_i)^2)

where n is the number of training examples.
Implementation in the source code:

cost_function = tf.reduce_sum(tf.pow(model - Y, 2))/(2 * samples_number)
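To see what this expression computes, here is a minimal NumPy sketch of the same cost; the toy x, y, W, and b values are made up for illustration:

```python
import numpy as np

def cost(w, b, x, y):
    # Halved mean squared error, mirroring the TensorFlow expression
    # tf.reduce_sum(tf.pow(model - Y, 2)) / (2 * samples_number)
    predictions = w * x + b
    return np.sum((predictions - y) ** 2) / (2.0 * y.size)

x = np.array([-1.0, 0.0, 1.0])  # toy normalized sizes
y = np.array([-1.0, 0.0, 1.0])  # toy normalized prices

print(cost(1.0, 0.0, x, y))  # 0.0 -- a perfect fit has zero cost
print(cost(0.0, 0.0, x, y))  # residuals (-1, 0, 1): (1 + 0 + 1) / 6
```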


Selecting a learning rate

Typically, the learning rate is selected in the range of 0.001 to 1 (for example: 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1).
After the initial selection, running gradient descent, and observing the cost function value, you can adjust the learning rate accordingly. For this task, we choose the learning rate equal to 0.1.
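That adjustment process can be sketched as a sweep over candidate rates. The helper below is a plain NumPy stand-in for TensorFlow's GradientDescentOptimizer, and the sweep function and toy data are ours, not part of the tutorial's code:

```python
import numpy as np

def sweep_learning_rates(x, y, rates, steps=100):
    # For each candidate rate, run plain gradient descent on
    # J(w, b) = sum((w * x + b - y)^2) / (2n) and report the final cost.
    results = {}
    n = y.size
    for lr in rates:
        w, b = 0.0, 0.0
        for _ in range(steps):
            err = w * x + b - y
            w -= lr * np.sum(err * x) / n
            b -= lr * np.sum(err) / n
        results[lr] = np.sum((w * x + b - y) ** 2) / (2 * n)
    return results

x = np.array([-1.5, -0.5, 0.5, 1.5])  # toy normalized sizes
y = 2.0 * x + 0.5                     # a line the model should recover
for lr, final_cost in sorted(sweep_learning_rates(x, y, [0.001, 0.01, 0.1, 1.0]).items()):
    print(lr, final_cost)
```

A rate that is too small leaves the cost high after the step budget; a well-chosen one drives it near zero.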


Running example source code

Now, try running the example source code:

import tensorflow as tf
import numpy
import matplotlib.pyplot as plt

# Training data set

size_data = numpy.asarray([ 2104,  1600,  2400,  1416,  3000,  1985,  1534,  1427,
  1380,  1494,  1940,  2000,  1890,  4478,  1268,  2300,
  1320,  1236,  2609,  3031,  1767,  1888,  1604,  1962,
  3890,  1100,  1458,  2526,  2200,  2637,  1839,  1000,
  2040,  3137,  1811,  1437,  1239,  2132,  4215,  2162,
  1664,  2238,  2567,  1200,   852,  1852,  1203 ])
price_data = numpy.asarray([ 399900,  329900,  369000,  232000,  539900,  299900,  314900,  198999,
  212000,  242500,  239999,  347000,  329999,  699900,  259900,  449900,
  299900,  199900,  499998,  599000,  252900,  255000,  242900,  259900,
  573900,  249900,  464500,  469000,  475000,  299900,  349900,  169900,
  314900,  579900,  285900,  249900,  229900,  345000,  549000,  287000,
  368500,  329900,  314000,  299000,  179900,  299900,  239500 ])

# Test data set

size_data_test = numpy.asarray([ 1600, 1494, 1236, 1100, 3137, 2238 ])
price_data_test = numpy.asarray([ 329900, 242500, 199900, 249900, 579900, 329900 ])

def normalize(array): 
    return (array - array.mean()) / array.std()

# Normalize the data sets

size_data_n = normalize(size_data)
price_data_n = normalize(price_data)

size_data_test_n = normalize(size_data_test)
price_data_test_n = normalize(price_data_test)

# Display a plot
plt.plot(size_data, price_data, 'ro', label='Samples data')

samples_number = price_data_n.size

# TF graph input
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Create a model

# Set model weights
W = tf.Variable(numpy.random.randn(), name="weight")
b = tf.Variable(numpy.random.randn(), name="bias")

# Set parameters
learning_rate = 0.1
training_iteration = 200

# Construct a linear model
model = tf.add(tf.mul(X, W), b)

# Minimize squared errors
cost_function = tf.reduce_sum(tf.pow(model - Y, 2))/(2 * samples_number) #L2 loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_function) #Gradient descent

# Initialize variables
init = tf.initialize_all_variables()

# Launch a graph
with tf.Session() as sess:
    sess.run(init)

    display_step = 20
    # Fit all training data
    for iteration in range(training_iteration):
        for (x, y) in zip(size_data_n, price_data_n):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # Display logs per iteration step
        if iteration % display_step == 0:
            training_cost = sess.run(cost_function, feed_dict={X: size_data_n, Y: price_data_n})
            print "Iteration:", '%04d' % (iteration + 1), "cost=", "{:.9f}".format(training_cost), \
                "W=", sess.run(W), "b=", sess.run(b)

    tuning_cost = sess.run(cost_function, feed_dict={X: size_data_n, Y: price_data_n})
    print "Tuning completed:", "cost=", "{:.9f}".format(tuning_cost), "W=", sess.run(W), "b=", sess.run(b)

    # Validate the tuned model
    testing_cost = sess.run(cost_function, feed_dict={X: size_data_test_n, Y: price_data_test_n})
    print "Testing data cost:", testing_cost

    # Display a plot
    plt.plot(size_data_n, price_data_n, 'ro', label='Normalized samples')
    plt.plot(size_data_test_n, price_data_test_n, 'go', label='Normalized testing samples')
    plt.plot(size_data_n, sess.run(W) * size_data_n + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()

The plot below illustrates the normalized data set along with the trained model.




About the author

Sergey Kovalev is a senior software engineer with extensive experience in high-load application development, big data and NoSQL solutions, cloud computing, data warehousing, and machine learning. He has strong expertise in back-end engineering, applying the best approaches to development, architecture design, and scaling. He has a solid background in software development practices, such as the Agile methodology, prototyping, patterns, refactoring, and code review. Currently, Sergey's main interest lies in big data distributed computing and machine learning.



  • xtknight

    Thanks for the tutorial.
    How do we graph the original values instead of normalized values?
    To unnormalize, should we multiply by the training data’s standard deviation and add its mean, or multiply by the testing data’s standard deviation and add its mean?
    Is it absolutely necessary to normalize the Y values?

    • Leonid Le

      Hi xtknight. Thank you for your interest.

      Sure, you may skip normalizing the Y (or X) values; then the resulting W and b values would also be unnormalized, that is, scaled in the original data’s coordinates. Such a model would be somewhat restricted in use (e.g., it would not transfer to cases where prices are set in UK pounds rather than US dollars).

      Please also note that using unnormalized data increases the risk of numeric overflow during calculations (in the case of large numbers) or may lead to decreased precision (in the case of very small floating-point numbers). For example, in our sample you can calculate the model using unnormalized price or size data, but not both, because of overflow in the Int32 type.

      To unnormalize the resulting data, we could use either the training or the testing data’s mean and std values, as they should have a similar scaling factor. In real life, analysts working with a model might not have access to the training data and thus use the model with test (currently observable) data, so using the testing set’s mean and std is perfectly fine.

      So, to graph the resulting model in the original system of axes, you can unnormalize the regression line during charting (use the non-normalized training and testing data for charting the samples):

      price_mean = price_data_test.mean()
      price_std = price_data_test.std()

      size_mean = size_data_test.mean()
      size_std = size_data_test.std()

      # Display a plot
      plt.plot(size_data, price_data, 'ro', label='Samples')
      plt.plot(size_data_test, price_data_test, 'go', label='Testing samples')
      plt.plot(size_data_n * size_std + size_mean, (sess.run(W) * size_data_n + sess.run(b)) * price_std + price_mean, label='Fitted line')


  • Kyle Sweet

    The mean and standard deviation values are for the x and y training data, so I’m confused. Can you please explain how to compute the actual m and b values we are looking for in the domain we are interested in (with units of “square footage” and “dollars”)?

    • Leonid Le

      Actually, you are right: the training data’s mean and std deviation might not always be acceptable.

      If we consider the example, the two data sets, price_data_test and size_data_test, are (or should be) the actual newly observed data that can be applied to the existing model. So they really are in the domain we are interested in. And we can simply calculate their mean and std deviation values, because we have the data right in place (for example, we can use the NumPy arrays’ std and mean functions; please see my code example in the other answer).

      The training data’s mean and std deviation values, in turn, might sometimes have the same scaling factor and so be appropriate for unnormalizing the model, but they might also have a different scaling factor and thus not be fully appropriate. For example, in our domain the dollar might have some level of inflation, so in several years the mean price changes; applying the training data’s mean value would then produce results expressed in old prices, even assuming the model itself still correctly reflects the process.
      This is not to mention that the training data might not always be available for a given model (especially after several years) or might be extremely large.

      • Kyle Sweet

        Hello. It is a great tutorial you shared, thanks!

        In the other answer I do see this:

        plt.plot(size_data_n * size_std + size_mean, (sess.run(W) * size_data_n + sess.run(b)) * price_std + price_mean, label='Fitted line')

        but I still don’t understand what the values of “m” and “b” are for this line, as in the form y = mx + b (I thought this was the whole point of performing linear regression), and, in general, how to de-normalize the resulting weights and offsets that come back from training a TensorFlow model.

        • Leonid Le

          Great question, thanks!
          Concerning the model and calculating the resulting weight and offset, we can start from these lines in the code:

          57 # Construct a linear model
          58 model = tf.add(tf.mul(X, W), b)  ## That’s the linear model being trained.

          The results of the training are the W and b (scalar) values; we can get them from the session after training:

          sess.run(W)
          sess.run(b)

          (W here is the m in your question.)

          Thus, we have the normalized model:

          Yn = W * Xn + b

          It might not be obvious, but we should apply the normalized input (x) data to our normalized model to get a properly scaled and biased result:

          Xn = size_data_n = normalize(size_data)
          Yn = W * Xn + b = W * size_data_n + b

          In the original post, we used the normalized test data set to chart the model, so you can see the normalized scale.

          Now we would like to unnormalize it and scale it to our domain measures. Here, x is our size axis and y is the price axis:

          X = denormalize_size(Xn) = denormalize_size(size_data_n) = size_data_n * size_std + size_mean = orig size_data

          Y = denormalize_price(Yn) = Yn * price_std + price_mean = (W * size_data_n + b) * price_std + price_mean

          As we discussed earlier, here we use testing data mean and std deviation values for denormalization:

          price_mean = price_data_test.mean()
          price_std = price_data_test.std()

          size_mean = size_data_test.mean()
          size_std = size_data_test.std()

          We then chart the model:

          plt.plot(X, Y)


          • Kyle Sweet

            For anybody interested in the m and b values in the original domain of dollars and square feet, here is the math to compute them. y, m, x, and b are the real-world values; y', m', x', and b' are the standardized (a.k.a. normalized) values. Mu and sigma are the means and standard deviations. The example in this post solves for W and b, which are represented by m' and b' in the math below.
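            Working out that algebra from y' = m'x' + b', where x' = (x - mu_x)/sigma_x and y' = (y - mu_y)/sigma_y, gives the sketch below (the helper name and the toy check values are illustrative, not from the thread):

```python
def denormalize_model(m_n, b_n, mu_x, sigma_x, mu_y, sigma_y):
    # Substituting x' = (x - mu_x)/sigma_x and y' = (y - mu_y)/sigma_y
    # into y' = m_n * x' + b_n, then solving for y = m * x + b:
    m = m_n * sigma_y / sigma_x
    b = mu_y + sigma_y * b_n - m * mu_x
    return m, b

# Toy check: m' = 2, b' = 1, mu_x = 3, sigma_x = 2, mu_y = 5, sigma_y = 4
m, b = denormalize_model(2.0, 1.0, 3.0, 2.0, 5.0, 4.0)
print(m, b)  # 4.0 -3.0
```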


  • Jimmy Dfgh

    I’m puzzled why you use tf.mul in your model:
    model = tf.add(tf.mul(X, W), b)
    I know that tf.mul has been replaced by tf.multiply in the latest version, but if I substitute tf.matmul for tf.multiply, I get an InvalidArgument error.
    But I’ve seen matmul being used in other samples. What’s the difference?

    • Leonid Le

      It looks like tf.matmul is intended to multiply matrices (or tensors with rank >= 2), while here we have just one-dimensional vectors.
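      The difference can be sketched with NumPy, whose element-wise broadcasting mirrors tf.multiply and whose @ operator mirrors tf.matmul on 2-D arrays (the toy values are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # a 1-D vector, like the normalized sizes
w = 2.0                        # a scalar weight

# Element-wise multiply (tf.mul / tf.multiply): the scalar broadcasts.
elementwise = x * w
print(elementwise)             # [2. 4. 6.]

# Matrix multiply (tf.matmul) expects rank >= 2 operands with matching
# inner dimensions, so the vector and weight need explicit 2-D shapes.
col = x.reshape(3, 1)          # shape (3, 1)
w_mat = np.array([[2.0]])      # shape (1, 1)
print(col @ w_mat)             # same numbers, but with shape (3, 1)
```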

