March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
This article focuses on exploring the use of interpolation and approximation of data sets in Power BI utilizing Python.
Given a data set, as depicted in the following figure:
x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y1 = np.array([243, 265, 287, 313, 343, 373, 401, 435, 475, 519, 571, 630, 699])
y2 = np.array([244, 269, 291, 320, 350, 379, 406, 442, 482, 526, 578, 633, 703])
y3 = np.array([371, 379, 395, 411, 427, 445, 463, 485, 509, 533, 555, 583, 615])
y4 = np.array([374, 382, 399, 416, 433, 457, 482, 509, 533, 561, 587, 615, 645])
y5 = np.array([370, 382, 390, 398, 408, 420, 432, 446, 460, 476, 494, 514, 538])
y6 = np.array([370, 388, 398, 410, 424, 438, 452, 466, 482, 500, 518, 540, 564])
y7 = np.array([348, 378, 408, 454, 508, 566, 638, 720, 0 , 0 , 0 , 0 , 0 ])
An important aspect to be considered is that a portion of our y7 data contains zeroes, which will be crucial to bear in mind during further calculations.
The aim is to construct derivatives from the data for all values of x and y, inclusive of y2, y3, etc. To facilitate this, we commence by building an interpolation based on these data, from which we will then derive.
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y1 = np.array([243, 265, 287, 313, 343, 373, 401, 435, 475, 519, 571, 630, 699])
y2 = np.array([244, 269, 291, 320, 350, 379, 406, 442, 482, 526, 578, 633, 703])
y3 = np.array([371, 379, 395, 411, 427, 445, 463, 485, 509, 533, 555, 583, 615])
y4 = np.array([374, 382, 399, 416, 433, 457, 482, 509, 533, 561, 587, 615, 645])
y5 = np.array([370, 382, 390, 398, 408, 420, 432, 446, 460, 476, 494, 514, 538])
y6 = np.array([370, 388, 398, 410, 424, 438, 452, 466, 482, 500, 518, 540, 564])
y7 = np.array([348, 378, 408, 454, 508, 566, 638, 720, 0 , 0 , 0 , 0 , 0 ])
y8 = np.array([351, 388, 430, 478, 528, 588, 660, 740, 0 , 0 , 0 , 0 , 0 ])
y9 = np.array([358, 378, 400, 424, 463, 502, 538, 584, 628, 680, 0 , 0 , 0 ])
y10 = np.array([363, 383, 402, 430, 469, 507, 545, 590, 633, 685, 0 , 0 , 0 ])
y11 = np.array([243, 269, 295, 320, 350, 383, 416, 453, 497, 545, 593, 662, 722])
y12 = np.array([243, 264, 287, 313, 339, 372, 405, 445, 486, 534, 582, 649, 711])
y_arr = np.concatenate((y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12), axis = 0)
y_arr = np.reshape(y_arr, (12, 13))
i = 1
for y_list in y_arr:
x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y_list = y_list[y_list > 0]
x = np.resize(x, len(y_list))
minx = np.min(x)
maxx = np.max(x)
tck = interpolate.splrep(x, y_list, s=0)
xnew = np.arange(minx, maxx, 0.5)
ynew = interpolate.splev(xnew, tck, der=0)
yder2 = interpolate.splev(xnew, tck, der=1)
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.plot(x, y_list, 'o', xnew, ynew, x, y_list, 'b')
plt.title('Cubic-spline interpolation for ' + str(i) + ' line')
ax2.plot(xnew, yder2)
plt.show()
i = i+1
Following the implementation of the aforementioned method, we are presented with the subsequent graph. This illustrates the original data points along with the derivative values:
An observation of the derivative reveals a considerable irregularity, which is primarily due to the non-uniform arrangement of data points. This irregularity may impose challenges for any conclusions to be drawn from these data in the future.
Hence, we aim to improve the initial code to perform data approximation instead of interpolation, and then compute the derivative from the resulting function.
import scipy as sp
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y = np.array([243, 265, 287, 313, 343, 373, 401, 435, 475, 519, 571, 630, 699])
y2 = np.array([244, 269, 291, 320, 350, 379, 406, 442, 482, 526, 578, 633, 703])
y3 = np.array([371, 379, 395, 411, 427, 445, 463, 485, 509, 533, 555, 583, 615])
y4 = np.array([374, 382, 399, 416, 433, 457, 482, 509, 533, 561, 587, 615, 645])
y5 = np.array([370, 382, 390, 398, 408, 420, 432, 446, 460, 476, 494, 514, 538])
y6 = np.array([370, 388, 398, 410, 424, 438, 452, 466, 482, 500, 518, 540, 564])
y7 = np.array([348, 378, 408, 454, 508, 566, 638, 720, 0 , 0 , 0 , 0 , 0 ])
y8 = np.array([351, 388, 430, 478, 528, 588, 660, 740, 0 , 0 , 0 , 0 , 0 ])
y9 = np.array([358, 378, 400, 424, 463, 502, 538, 584, 628, 680, 0 , 0 , 0 ])
y10 = np.array([363, 383, 402, 430, 469, 507, 545, 590, 633, 685, 0 , 0 , 0 ])
y11 = np.array([243, 269, 295, 320, 350, 383, 416, 453, 497, 545, 593, 662, 722])
y12 = np.array([243, 264, 287, 313, 339, 372, 405, 445, 486, 534, 582, 649, 711])
y_arr = np.concatenate((y,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12), axis = 0)
y_arr = np.reshape(y_arr, (12, 13))
i = 1
for y_list in y_arr:
x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y = y_list
y = y[y > 0]
x = np.resize(x, len(y))
minx = np.min(x)
maxx = np.max(x)
d=3 # степень полинома
fp, residuals, rank, sv, rcond = np.polyfit(x, y, d, full=True) # Модель
f = sp.poly1d(fp) # аппроксимирующая функция
y1=[fp[0]*x[i]**2+fp[1]*x[i]+fp[2] for i in range(0,len(x))] # значения функции a*x**2+b*x+c
so=round(sum([abs(y[i]-y1[i]) for i in range(0,len(x))])/(len(x)*sum(y))*100,4) # средняя ошибка
fx = np.linspace(x[0], x[-1] + 1, len(x)) # можно установить вместо len(x) большее число для интерполяции
y_list = f(fx)
tck = interpolate.splrep(x, y_list, s=0)
xnew = np.arange(minx, maxx, 0.5)
ynew = interpolate.splev(xnew, tck, der=0)
yder2 = interpolate.splev(xnew, tck, der=1)
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.plot(x, y_list, 'o', xnew, ynew, x, y_list, 'b')
plt.title('Approximation for ' + str(i) + ' line')
ax2.plot(xnew, yder2)
plt.show()
i = i + 1
The result is displayed below:
As demonstrated, employing an approximation function results in a significantly smoother derivative of the function.
We proceed by examining the implementation for y8:
Upon inspection, we notice that excessive data along the x-axis is eliminated, with no attempt to transition to zero.
Subsequently, to continue working with the code and set up the necessary visualizations, we transfer the scripts into Power BI, as displayed below:
The contrast between the two approaches - interpolation and approximation - can be vividly discerned here.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.