## Introduction to linear regression

In a linear regression model, each target variable is estimated as a weighted sum of the input variables, offset by a constant known as the bias:

```
yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
```
> w11, w12, w13, w21, w22 and w23 are the weights; b1 and b2 are the biases.

The learning part of linear regression is to use the training data to find a set of weights (w11, w12, ..., w23) and biases (b1, b2) that make accurate predictions for new data. The learned parameters will then be used to predict the apple and orange yields in a new region from that region's average temperature, rainfall, and humidity.
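
To make the weighted sum concrete, here is a minimal sketch that evaluates the apple equation for a single region. The weight and bias values are made up for illustration only (they are not learned values), and the names `w_apple`, `b_apple` and `region` are assumptions introduced for this example.

```python
import numpy as np

# Hypothetical weights and bias, chosen by hand for illustration (not learned)
w_apple = np.array([0.3, 0.5, 0.2])  # plays the role of w11, w12, w13
b_apple = 1.0                        # plays the role of b1

# One region's readings: temp, rainfall, humidity
region = np.array([73.0, 67.0, 43.0])

# Weighted sum of the inputs plus the bias, as in the yield_apple equation
yield_apple = float(w_apple @ region + b_apple)
print(yield_apple)  # roughly 65.0 for these made-up numbers
```

Training replaces the hand-picked numbers with values that fit the data.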

We'll train our model by adjusting the weights slightly, many times over, so that its predictions improve, using an optimization technique called `gradient descent`. Let's begin by importing NumPy and PyTorch.

In [None]:
import numpy as np
import torch

# float32 matches the default dtype of torch.randn, which creates the weights below
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70]], dtype='float32')
# Targets (apples, oranges)
targets = np.array([[56, 70],
                    [81, 101],
                    [119, 133],
                    [22, 37],
                    [103, 119]], dtype='float32')
# Convert the NumPy arrays into PyTorch tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs.shape)  # shape reports the rows and columns: torch.Size([5, 3])
print(targets)
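
Before building the model, the idea behind gradient descent mentioned earlier can be illustrated on a toy problem. This is only a sketch under stated assumptions, not the training loop for the regression model: the helper `gradient_descent`, the learning rate `lr`, the step count and the quadratic objective are all hypothetical choices made for this example.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly nudge w against the gradient (hypothetical helper)."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)  # small step in the direction that lowers the objective
    return w

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_star)  # ends up very close to 3, the minimiser of f
```

Each step shrinks the distance to the minimum by a constant factor here, which is why many small adjustments are enough to get close.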


## Linear regression model from scratch

The weights and biases (w11, w12, ..., w23, b1 and b2) can also be represented as matrices, initialized with random values. The first row of `w` and the first element of `b` are used to predict the first target variable, i.e., the yield of apples; similarly, the second row and second element are used for oranges.
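
Written out, and assuming the row-per-target layout just described, the parameters take the form:

$$
W = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{bmatrix},
\qquad
b = \begin{bmatrix} b_1 & b_2 \end{bmatrix}
$$

so `w` has shape 2 × 3 (one row per target, one column per input variable) and `b` has 2 elements (one per target).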

In [None]:
# Weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

`torch.randn` creates a tensor with the given shape, with elements picked randomly from a `normal distribution` with `mean 0` and `standard deviation 1`.
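
A quick way to check this claim is to draw a large sample and look at its statistics. The sketch below is illustrative only; the sample size and the seed are arbitrary choices made for this example.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

# Draw many values so the sample statistics approach the true ones
sample = torch.randn(100_000)

print(sample.mean().item())  # close to 0
print(sample.std().item())   # close to 1
```

With only six weights and two biases, as in `w` and `b` above, the sample mean and standard deviation can differ noticeably from 0 and 1.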

Our model is simply a function that performs a matrix multiplication of the inputs and the weights w (transposed) and adds the bias b (replicated for each observation).
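
The description above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function name `model` is an assumption, and the inputs are re-created here as a float32 tensor so that the block runs on its own.

```python
import torch

# 5 observations with 3 input variables each (temp, rainfall, humidity)
inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.],
                       [69., 96., 70.]])

# Randomly initialised parameters: 2 targets, 3 inputs
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

def model(x):
    # (5, 3) @ (3, 2) gives (5, 2); b has shape (2,) and is broadcast to every row
    return x @ w.t() + b

preds = model(inputs)
print(preds.shape)  # torch.Size([5, 2]): one apple and one orange prediction per region
```

Because `w` and `b` are random at this point, the predictions are not expected to be anywhere near the targets yet.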