## Prior knowledge

- Instead of two temperature features I will introduce a new variable, delta_t.
- In addition to the variable theta I will introduce cos(theta).
- It seems sensible to introduce the variable 1/insulation_index (the column eta in the data).

Units do not help in this case, because temperature is the only quantity that carries any unit information. The units of the insulation index are presumably such that everything in the equation cancels neatly to W. Since the units of the index are not given, we cannot make use of them. The only obvious point is that delta_t must be multiplied by some other expression so that the degrees Celsius cancel.

## Algorithm - naive attempt
I expect the formula for the heat flow to mimic the simple forms of basic heat-transfer formulas (for example Q = -kA deltaT). I will therefore try generating all possible short equations that follow the rules derived above. I will generate the formulas with ProGED, because grammars make it easy to describe the desired space of equations.



In [1]:
import ProGED as pg
import pandas as pd
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

In [2]:
# data preparation
data = pd.read_csv('DN4_1_podatki.csv')
data['deltaT'] = data['Tw'] - data['Ta']   # one temperature difference replaces the two temperature columns
data['1_over_eta'] = 1 / data['eta']       # reciprocal of the insulation index
proged_data = data.drop(columns=['Tw', 'Ta', 'eta'])
proged_data.head()

|   | Q | theta | deltaT | 1_over_eta |
|---|---|---|---|---|
| 0 | 1.711929 | 0.321921 | 42.217191 | 2.864645 |
| 1 | 0.067135 | 2.705824 | 8.535838 | 1.217925 |
| 2 | 0.034759 | 1.873776 | 3.654556 | 1.322609 |
| 3 | 0.417783 | 2.089589 | 12.586385 | 1.937483 |
| 4 | 1.667614 | 0.32835 | 34.837551 | 12.000908 |


I will use a grammar that builds linear expressions in products (quotients) of the variables. I will prioritize equations with few plus signs (following the classic forms of physical formulas), and I will favor the trigonometric feature over plain theta (the grammar uses cos(theta)).

In [3]:
# naive grammar setup
# grammar = "E -> E '+' 'c' '*' V [0.1] | 'c' '*' V [0.9]\n"
grammar = "V -> V '*' F [0.6]| 'c' '*' F [0.4]\n" #  | V '/' F [0.2] 
grammar += "F -> 'theta' [0.1] | 'cos' '(' 'theta'  ')' [0.3] | 'deltaT' [0.3] | '1_over_eta' [0.3]"

grammar = pg.GeneratorGrammar(grammar)

In [5]:
ED = pg.EqDisco(data=proged_data, 
                sample_size=500,
                lhs_vars=["Q"],
                rhs_vars=["theta", "deltaT", "1/eta"],
                strategy_settings = {"max_repeat":1000},
                generator = grammar,
                verbosity=1)

In [5]:
ED.generate_models()

[c*deltaT**2]
[c*deltaT]
[c*cos(theta)]
[c*theta**3]
[c*deltaT*cos(theta)]
[c*deltaT*cos(theta)]
[c*deltaT**2*theta]
[c*deltaT**2*theta]
[c*deltaT**2*theta]
[c*deltaT**2*theta]
[c*deltaT**2*cos(theta)**5]
[c*cos(theta)**2]
[c*theta]
[c*deltaT/eta]
[c*deltaT**4*theta*cos(theta)**3/eta]
[c*deltaT*theta*cos(theta)/eta]
[c*deltaT**2*cos(theta)/eta**2]
[c*deltaT**2*cos(theta)/eta**2]
[c*deltaT**2*cos(theta)/eta**2]
[c*deltaT**2*cos(theta)/eta**2]
[c*deltaT**2*cos(theta)/eta**2]
[c*deltaT**3*cos(theta)/eta**3]
[c*deltaT**3*cos(theta)/eta**3]
[c*deltaT**3*cos(theta)/eta**3]
[c*deltaT**3*cos(theta)/eta**3]
[c*deltaT/eta**3]
[c*deltaT/eta**3]
[c*cos(theta)**2/eta**2]
[c*cos(theta)**2/eta**2]
[c*cos(theta)**2/eta**2]
[c*cos(theta)**2/eta**2]
[c*deltaT*cos(theta)/eta]
[c*theta*cos(theta)**2]
[c*theta*cos(theta)**2]
[c*cos(theta)/eta]
[c*cos(theta)/eta]
[c*cos(theta)/eta]
[c*cos(theta)/eta]
[c*cos(theta)/eta]
[c*deltaT**3*cos(theta)**6/eta**2]
[c*deltaT**3*cos(theta)**6/eta**2]
[c*cos(theta)**3]
[

ModelBox: 500 models
-> [c*deltaT**2], p = 0.021599999999999998
-> [c*deltaT], p = 0.12
-> [c*cos(theta)], p = 0.12
-> [c*theta**3], p = 0.00014400000000000003
-> [c*deltaT*cos(theta)], p = 0.043199999999999995
-> [c*deltaT**2*theta], p = 0.003888
-> [c*deltaT**2*cos(theta)**5], p = 4.081466879999998e-06
-> [c*cos(theta)**2], p = 0.021599999999999998
-> [c*theta], p = 0.04000000000000001
-> [c*deltaT/eta], p = 0.043199999999999995
-> [c*deltaT**4*theta*cos(theta)**3/eta], p = 2.2039921151999992e-07
-> [c*deltaT*theta*cos(theta)/eta], p = 0.005598719999999996
-> [c*deltaT**2*cos(theta)/eta**2], p = 0.003275251199999999
-> [c*deltaT**3*cos(theta)/eta**3], p = 3.6733201919999976e-05
-> [c*deltaT/eta**3], p = 0.0027993599999999994
-> [c*cos(theta)**2/eta**2], p = 0.004199039999999999
-> [c*deltaT*cos(theta)/eta], p = 0.023327999999999995
-> [c*theta*cos(theta)**2], p = 0.003888
-> [c*cos(theta)/eta], p = 0.043199999999999995
-> [c*deltaT**3*cos(theta)**6/eta**2], p = 4.284560671948796e-09
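
The p values in the ModelBox listing above can be checked by hand against the grammar. A derivation with n factors applies the recursive rule `V -> V '*' F` n-1 times (0.6 each), the terminating rule once (0.4), and then picks n factors. Orderings that simplify to the same product are added up. The sketch below is plain Python, not ProGED code; the function names are my own and the rule probabilities are copied from the grammar cell.

```python
from itertools import permutations
from math import prod

# Rule probabilities copied from the grammar string.
P_RECURSE = 0.6   # V -> V '*' F
P_STOP = 0.4      # V -> 'c' '*' F
P_FACTOR = {"theta": 0.1, "cos(theta)": 0.3, "deltaT": 0.3, "1_over_eta": 0.3}

def derivation_probability(factors):
    """Probability of one specific sequence of factors."""
    n = len(factors)
    return P_RECURSE ** (n - 1) * P_STOP * prod(P_FACTOR[f] for f in factors)

def expression_probability(factors):
    """Sum over all distinct orderings that simplify to the same product."""
    orderings = set(permutations(factors))
    return sum(derivation_probability(o) for o in orderings)

print(derivation_probability(["deltaT"]))                 # c*deltaT
print(expression_probability(["deltaT", "deltaT"]))       # c*deltaT**2
print(expression_probability(["deltaT", "cos(theta)"]))   # c*deltaT*cos(theta)
```

This gives about 0.12, 0.0216 and 0.0432, in line with the listing. For longer expressions the listed p can be lower than the full sum over orderings (for example c*deltaT**2*cos(theta)/eta**2), presumably because only derivations that were actually sampled are counted.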


In [6]:
ED.fit_models()

ModelBox: 500 models
-> [0.00208863155644536*deltaT**2], p = 0.021599999999999998, error = 0.6556113356661527, time = 0.47377610206604004
-> [0.0563767583856052*deltaT], p = 0.12, error = 0.7560964680393287, time = 0.07455921173095703
-> [0.0161840936823956*cos(theta)], p = 0.12, error = 1.2713405897993268, time = 0.02468419075012207
-> [0.0306150350590362*theta**3], p = 0.00014400000000000003, error = 1.2226484279777616, time = 0.07267022132873535
-> [0.00240772519430417*deltaT*cos(theta)], p = 0.043199999999999995, error = 1.2710289378107122, time = 0.06153106689453125
-> [0.00107152571936803*deltaT**2*theta], p = 0.003888, error = 0.8261948614369855, time = 0.11239504814147949
-> [0.000133668137037507*deltaT**2*cos(theta)**5], p = 4.081466879999998e-06, error = 1.2709217131383665, time = 0.11789917945861816
-> [0.571913943307724*cos(theta)**2], p = 0.021599999999999998, error = 1.2221660655705235, time = 0.04433798789978027
-> [0.337834861309631*theta], p = 0.04000000000000001, erro

In [7]:
ED.get_results(5)

ModelBox: 5 models
-> [0.00208863902930273*deltaT**2], p = 7.346640383999997e-07, error = 0.6556113356546375, time = 0.07729029655456543
-> [0.00208863907710713*deltaT**2], p = 0.0011337407999999997, error = 0.6556113356546384, time = 0.13623666763305664
-> [0.0020886386913216*deltaT**2], p = 0.004199039999999999, error = 0.655611335654658, time = 0.12546396255493164
-> [0.00208863635853096*deltaT**2], p = 0.011663999999999997, error = 0.6556113356560924, time = 0.15433311462402344
-> [0.00208863155644536*deltaT**2], p = 0.021599999999999998, error = 0.6556113356661527, time = 0.47377610206604004

The grammars perform poorly. I will try linear regression next. I again add the features 1/eta and sin(theta), as well as deltaT. I am interested in expressions that are linear in products of the variables.

In [3]:
data['sin_of_theta'] = np.sin(data['theta'])
data_lr = data.drop(columns=['Tw', 'Ta', 'eta', 'theta'])

poly = PolynomialFeatures(5, include_bias=False)
X = poly.fit_transform(data_lr.drop('Q', axis=1))

imena_stolpcev = poly.get_feature_names_out()
data_lr = pd.DataFrame(X, columns=imena_stolpcev)
data_lr.head()

Unnamed: 0,deltaT,1_over_eta,sin_of_theta,deltaT^2,deltaT 1_over_eta,deltaT sin_of_theta,1_over_eta^2,1_over_eta sin_of_theta,sin_of_theta^2,deltaT^3,...,deltaT 1_over_eta^3 sin_of_theta,deltaT 1_over_eta^2 sin_of_theta^2,deltaT 1_over_eta sin_of_theta^3,deltaT sin_of_theta^4,1_over_eta^5,1_over_eta^4 sin_of_theta,1_over_eta^3 sin_of_theta^2,1_over_eta^2 sin_of_theta^3,1_over_eta sin_of_theta^4,sin_of_theta^5
0,42.217191,2.864645,0.31639,1782.291254,120.937275,13.357081,8.206192,0.906344,0.100102,75243.331122,...,313.995968,34.679708,3.830247,0.423037,192.909764,21.306179,2.35319,0.259901,0.028705,0.00317
1,8.535838,1.217925,0.422107,72.860533,10.39601,3.60304,1.483341,0.514095,0.178175,621.925724,...,6.509246,2.255969,0.781872,0.27098,2.679802,0.928763,0.32189,0.11156,0.038664,0.0134
2,3.654556,1.322609,0.954452,13.355782,4.833548,3.488097,1.749294,1.262366,0.910978,48.809456,...,8.070168,5.823782,4.202693,3.032845,4.047219,2.920648,2.107666,1.520983,1.097607,0.792081
3,12.586385,1.937483,0.868418,158.417075,24.385901,10.930248,3.753839,1.682546,0.754151,1993.898215,...,79.495662,35.631544,15.970769,7.158418,27.301659,12.237149,5.484935,2.458457,1.10193,0.493907
4,34.837551,12.000908,0.322482,1213.654987,418.082233,11.234482,144.021782,3.870076,0.103995,42280.768004,...,19417.589949,521.779089,14.020969,0.376764,248926.107778,6689.009194,179.743476,4.829971,0.129788,0.003488
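
As a quick sanity check on the size of this feature table: the number of monomials of degree 1 to d in n variables is C(n+d, d) - 1, where the -1 drops the constant term (because `include_bias=False`). A minimal sketch, with a helper name of my own choosing:

```python
from math import comb

def n_polynomial_features(n_vars: int, degree: int) -> int:
    """Count monomials of total degree 1..degree in n_vars variables."""
    return comb(n_vars + degree, degree) - 1

# three input features (deltaT, 1_over_eta, sin_of_theta), degree 5
print(n_polynomial_features(3, 5))
```

For the three features and degree 5 used here this gives 55 candidate terms, so the regression needs fairly strong regularization if the resulting expression is to stay readable.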


In [9]:
from lr import lasso_regresija
izraz, napaka = lasso_regresija(data_lr, data['Q'], lam=1)
print(izraz)
print(napaka)

500
0.412*deltaT + 0.623*1/eta + 0.173*sin(theta) + 0.470*deltaT^2 + 0.316*deltaT 1/eta + 0.015*deltaT sin(theta) + 0.176*1/eta^2 + 0.871*1/eta sin(theta) + 0.475*sin(theta)^2 + 0.160*deltaT^3 + 0.915*deltaT^2 1/eta + 0.323*deltaT^2 sin(theta) + 0.770*deltaT 1/eta^2 + 0.050*deltaT 1/eta sin(theta) + 0.644*deltaT sin(theta)^2 + 0.239*1/eta^3 + 0.357*1/eta^2 sin(theta) + 0.207*1/eta sin(theta)^2 + 0.329*sin(theta)^3 + 0.066*deltaT^4 + 0.332*deltaT^3 1/eta + 0.021*deltaT^3 sin(theta) + 0.217*deltaT^2 1/eta^2 + 0.388*deltaT^2 1/eta sin(theta) + 0.455*deltaT^2 sin(theta)^2 + 0.129*deltaT 1/eta^3 + 0.259*deltaT 1/eta^2 sin(theta) + 0.459*deltaT 1/eta sin(theta)^2 + 0.719*deltaT sin(theta)^3 + 0.126*1/eta^4 + 0.343*1/eta^3 sin(theta) + 0.624*1/eta^2 sin(theta)^2 + 0.798*1/eta sin(theta)^3 + 0.245*sin(theta)^4 + 0.059*deltaT^4 1/eta + 0.102*deltaT^4 sin(theta) + 0.945*deltaT^3 1/eta sin(theta) + 0.176*deltaT^3 sin(theta)^2 + 0.058*deltaT^2 1/eta^3 + 0.036*deltaT^2 1/eta^2 sin(theta) + 0.566*de

Linear regression does not give great results either.

In [5]:
data_pysr = data[[ 'deltaT', 'sin_of_theta', '1_over_eta']]
data_pysr.head()

|   | deltaT | sin_of_theta | 1_over_eta |
|---|---|---|---|
| 0 | 42.217191 | 0.31639 | 2.864645 |
| 1 | 8.535838 | 0.422107 | 1.217925 |
| 2 | 3.654556 | 0.954452 | 1.322609 |
| 3 | 12.586385 | 0.868418 | 1.937483 |
| 4 | 34.837551 | 0.322482 | 12.000908 |


In [4]:
from pysr import PySRRegressor

model = PySRRegressor(
    niterations=100,  # < Increase me for better results
    binary_operators=["+", "*"],
    unary_operators=[
        # "cos",
        # "exp",
        # "sin",
        "inv(x) = 1/x",
        # ^ Custom operator (julia syntax)
    ],
    extra_sympy_mappings={"inv": lambda x: 1 / x},
    # ^ Define operator for SymPy as well
    loss="loss(prediction, target) = (prediction - target)^2",
    # ^ Custom loss function (julia syntax)
)

In [6]:
model.fit(data_pysr, data['Q'])

  if X.columns.is_object() and X.columns.str.contains(" ").any():


Compiling Julia backend...


FileNotFoundError: Julia is not installed in your PATH. Please install Julia and add it to your PATH.

Current PATH: /bin:/home/urh/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin:/home/urh/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
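
The error above means PySR could not find a Julia executable. A small guard like the following would let the notebook skip the PySR cell cleanly instead of failing. It is a sketch using only the standard library, and the helper name `julia_available` is hypothetical.

```python
import shutil

def julia_available() -> bool:
    """Return True if a `julia` executable can be found on PATH."""
    return shutil.which("julia") is not None

if julia_available():
    print("Julia found, the PySR fit can be attempted.")
else:
    print("Julia not found on PATH, skipping the PySR fit.")
```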

## Manual form
I will also try a hand-crafted heuristic. I look for an equation of the form a * deltaT^b * sin(theta)^c * (1/eta)^d.

In [8]:
data_rocno = data[ ['deltaT', 'sin_of_theta', '1_over_eta']]
data_rocno.head()

|   | deltaT | sin_of_theta | 1_over_eta |
|---|---|---|---|
| 0 | 42.217191 | 0.31639 | 2.864645 |
| 1 | 8.535838 | 0.422107 | 1.217925 |
| 2 | 3.654556 | 0.954452 | 1.322609 |
| 3 | 12.586385 | 0.868418 | 1.937483 |
| 4 | 34.837551 | 0.322482 | 12.000908 |


In [16]:
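from scipy.optimize import minimize

def equation(params):
    """Mean squared error of Q ~ a * deltaT^b * sin(theta)^c * (1/eta)^d."""
    a, b, c, d = params
    q = a * data['deltaT']**b * data['sin_of_theta']**c * data['1_over_eta']**d
    return np.sum((data['Q'] - q)**2) / len(data)

# start with all four parameters equal to 1; minimize picks its default method
results = minimize(equation, [1, 1, 1, 1])
results["x"]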

array([0.0021297 , 2.05180646, 1.02670162, 0.22636962])

In [17]:
a, b, c, d = results["x"]
print(f'Mean squared error of the equation {a}deltaT^{b}sin(theta)^{c}eta^(-{d}) is {results["fun"]}')


Mean squared error of the equation 0.0021297039865321972deltaT^2.0518064574801134sin(theta)^1.0267016242714626eta^(-0.2263696173867843) is 0.0375624885916007
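
A product of powers becomes linear after taking logarithms: log Q = log a + b log deltaT + c log sin(theta) + d log(1/eta). This gives a closed-form alternative to the iterative search, or a starting point for it. The sketch below assumes that Q and all three features are strictly positive, which I have not verified for the full data set, and it runs on synthetic data. The helper name `fit_power_law` is my own.

```python
import numpy as np

def fit_power_law(y, *features):
    """Least-squares fit of log y = log a + sum_i e_i * log x_i."""
    A = np.column_stack([np.ones(len(y))] + [np.log(x) for x in features])
    coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    return np.exp(coef[0]), coef[1:]

# synthetic, noise-free data with known exponents
rng = np.random.default_rng(1)
dT = rng.uniform(1.0, 40.0, 300)
s = rng.uniform(0.1, 1.0, 300)
inv_eta = rng.uniform(1.0, 10.0, 300)
q = 0.002 * dT**2 * s * inv_eta**0.25

a_hat, exps = fit_power_law(q, dT, s, inv_eta)
print(a_hat, exps)
```

Note that the fit in log space minimizes relative rather than absolute error, so on noisy data its exponents will generally differ somewhat from those found by minimizing the mean squared error directly.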
