NumPy Python tutorial

Perform powerful calculations with NumPy, SciPy and Matplotlib

The myriad things you can pull off with SciPy

NumPy is the primary Python package for scientific computing. It provides a powerful N-dimensional array object, tools for integrating C/C++ and Fortran code, and linear algebra, Fourier transform and random number capabilities, among other things. NumPy also supports broadcasting, which lets universal functions deal in a meaningful way with inputs that do not have exactly the same shape.

Apart from its capabilities, the other advantage of NumPy is that it can be integrated into Python programs. In other words, you may get your data from a database, the output of another program, an external file or an HTML page and then process it using NumPy.

This article will show you how to install NumPy, make calculations, plot data, and read and write external files. It will also introduce you to Matplotlib and SciPy, two packages that work well with NumPy.

NumPy also works with Pygame, a Python package for creating games, though explaining its use is beyond the scope of this article.
It is considered good practice to try the various NumPy commands inside the Python shell before putting them into Python programs.

The examples in this article use either the Python shell or IPython.

Resources

NumPy

SciPy

Matplotlib

Step-by-step

Step 01 Installing NumPy

Most Linux distributions have a ready-to-install package you can use. After installation, you can find out the NumPy version you are using by executing the following:

$ python
Python 2.7.3 (default, Mar 13 2014, 11:03:55) [GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> numpy.version.version
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'numpy' is not defined
>>> import numpy
>>> numpy.version.version
'1.6.2'
>>>

Not only have you found the NumPy version but you also know that NumPy is properly installed.

Step 02 About NumPy

Despite its simple name, NumPy is a powerful Python package that is mainly used for working with arrays and matrices.

There are many ways to create an array but the simplest is by using the array() function, accessed here through the conventional np alias:

>>> import numpy as np
>>> oneD = np.array([1,2,3,4])

The aforementioned command creates a one-dimensional array. If you want to create a two-dimensional array, you can use the array() function as follows:

>>> twoD = np.array([ [1,2,3],
... [3,3,3],
... [-1,-0.5,4],
... [0,1,0]] )

You can also create arrays with more dimensions.
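As a minimal sketch of what "more dimensions" looks like in practice, the following snippet builds one-, two- and three-dimensional arrays and prints their shapes. The variable names and the use of np.zeros() for the three-dimensional case are illustrative choices, not part of the original examples.

```python
import numpy as np

# One-dimensional array with four elements
one_d = np.array([1, 2, 3, 4])

# Two-dimensional array with 4 rows and 3 columns; mixing integers and
# floats makes NumPy store every element as a float
two_d = np.array([[1, 2, 3],
                  [3, 3, 3],
                  [-1, -0.5, 4],
                  [0, 1, 0]])

# Three-dimensional array: 2 blocks of 3 rows by 4 columns, filled with zeros
three_d = np.zeros((2, 3, 4))

print(one_d.shape)    # (4,)
print(two_d.shape)    # (4, 3)
print(three_d.shape)  # (2, 3, 4)
```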

Step 03 Making simple calculations using NumPy

Given an array named myArray, you can find the minimum and maximum values in it by executing the following commands:

>>> myArray.min()
>>> myArray.max()

Should you wish to find the mean value of all array elements, run the next command:

>>> myArray.mean()

Similarly, you can find the median of the array with NumPy's median() function (assuming NumPy has been imported as np):

>>> np.median(myArray)

The median of a data set is the value that divides the sorted data into two subsets (left and right) with the same number of elements. If the data set has an odd number of elements, then the median is the middle element of the sorted data and is therefore part of the data set. On the other hand, if the data set has an even number of elements, then the median is the mean value of the two centre elements of the sorted data set.
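The commands of this step can be tried on a small array where the results are easy to check by hand. The sample values below are made up for illustration.

```python
import numpy as np

my_array = np.array([7, 1, 5, 3])    # even number of elements

print(my_array.min())        # 1
print(my_array.max())        # 7
print(my_array.mean())       # 4.0
print(np.median(my_array))   # 4.0, the mean of the centre values 3 and 5

odd_array = np.array([7, 1, 5])      # odd number of elements
print(np.median(odd_array))  # 5.0, the middle element of the sorted data
```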

Step 04 Using arrays with NumPy

NumPy not only embraces the indexing methods used in typical Python for strings and lists but also extends them. If you want to select a given element from an array, you can use the following notation:

>>> twoD[1,2]

You can also select a part of an array (a slice) using the following notation:

>>> twoD[:1,1:3]

Finally, you can convert an array into a Python list using the array's tolist() method.
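The short sketch below applies the three operations of this step to the two-dimensional array from Step 02, which is redefined here so that the snippet runs on its own. Remember that row and column indices start at 0.

```python
import numpy as np

two_d = np.array([[1, 2, 3],
                  [3, 3, 3],
                  [-1, -0.5, 4],
                  [0, 1, 0]])

element = two_d[1, 2]      # row 1, column 2
piece = two_d[:1, 1:3]     # rows before row 1, columns 1 and 2
as_list = piece.tolist()   # nested Python list

print(element)   # 3.0
print(piece)     # [[2. 3.]]
print(as_list)   # [[2.0, 3.0]]
```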

Step 05 Reading files

Imagine that you have just extracted information from an Apache log file using AWK and you want to process the text file using NumPy.

The following shell pipeline, which combines cut, sort, uniq and AWK, finds the total number of requests per hour:

$ cat access.log | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2}' | sort -n | uniq -c | awk '{print $2, $1}' > timeN.txt

The format of the text file (timeN.txt) with the data is the following:

00 191
01 225
02 121
03 104

Reading the timeN.txt file and assigning it to a new array variable can be done as follows:

aa = np.loadtxt("timeN.txt")
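To try loadtxt() without an Apache log at hand, the sketch below feeds it the four sample lines shown above through an in-memory file object. The io.StringIO stand-in is an assumption made for the sake of a self-contained example; with a real file you would pass the file name as in the command above.

```python
import io
import numpy as np

# Stand-in for timeN.txt: hour of the day, then number of requests
sample = io.StringIO("00 191\n01 225\n02 121\n03 104\n")

# loadtxt() accepts a file name or a file-like object
aa = np.loadtxt(sample)

print(aa.shape)          # (4, 2)
print(aa[:, 1].sum())    # 641.0, the total number of requests
```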

Step 06 Writing to files

Writing variables to a file is largely similar to reading a file. If you have an array variable named aa1, you can easily save its contents into a file called aa1.txt by using the following command:

In [17]: np.savetxt("aa1.txt", aa1)

As you can easily imagine, you can read the contents of aa1.txt later by using the loadtxt() function.
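A quick way to convince yourself that the two functions are symmetrical is a round trip: save an array, load it back and compare. The sketch below writes to a temporary directory, which is an illustrative choice so that the example leaves no files behind.

```python
import os
import tempfile
import numpy as np

aa1 = np.array([[0.0, 191.0],
                [1.0, 225.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "aa1.txt")
    np.savetxt(path, aa1)         # plain text, one array row per line
    restored = np.loadtxt(path)   # read the same file back

print(np.array_equal(aa1, restored))   # True
```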

Step 07 Common functions

NumPy supports many numerical and statistical functions. When you apply a function to an array, the function is automatically applied to all array elements.

When working with matrices, you can find the inverse of a matrix object AA by typing “AA.I”. You can also find its eigenvalues by typing “np.linalg.eigvals(AA)”, and both its eigenvalues and its eigenvectors by typing “np.linalg.eig(AA)”.
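The following sketch shows both ideas on small, made-up inputs: a function applied element by element, and the linear algebra routines applied to a 2x2 array. It uses np.linalg.inv() on a regular array rather than the “.I” attribute, which only exists on matrix objects.

```python
import numpy as np

# A function applied to an array is applied to every element
values = np.array([1.0, 4.0, 9.0])
roots = np.sqrt(values)
print(roots)    # [1. 2. 3.]

# Inverse, eigenvalues and eigenvectors of a small invertible array
m = np.array([[2.0, 0.0],
              [0.0, 4.0]])
m_inv = np.linalg.inv(m)
eigenvalues = np.linalg.eigvals(m)
w, v = np.linalg.eig(m)   # w holds eigenvalues, the columns of v eigenvectors

print(m_inv)
print(eigenvalues)
```

Each column of v, multiplied by m, gives the same column scaled by the matching entry of w, which is a handy check when experimenting.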

Step 08 Working with matrices

A special subtype of a two-dimensional NumPy array is a matrix. A matrix is like an array except that matrix multiplication replaces element-by-element multiplication. Matrices are generated using the matrix (or mat) function as follows:

In [2]: AA = np.matrix('0 1 1; 1 1 1; 1 1 1')

You can add two matrices named AA and BB by typing AA + BB. Similarly, you can multiply them by typing AA * BB.
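The difference between arrays and matrices is easiest to see side by side. In the sketch below, which uses made-up 2x2 values, the * operator multiplies arrays element by element but performs matrix multiplication on matrix objects. Note that NumPy's own documentation now recommends regular arrays with the @ operator over the matrix class, so treat the matrix part as an illustration of the behaviour described above rather than a recommendation.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

elementwise = a * b      # [[0, 2], [3, 0]]
matrix_product = a @ b   # [[2, 1], [4, 3]]

# The same values as matrix objects: * now means matrix multiplication
aa = np.matrix('1 2; 3 4')
bb = np.matrix('0 1; 1 0')
matrix_star = aa * bb    # [[2, 1], [4, 3]]
matrix_sum = aa + bb     # [[1, 3], [4, 4]]

print(elementwise)
print(matrix_product)
print(matrix_star)
```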

Step 09 Plotting with Matplotlib

The first move you should make is to install Matplotlib. Note that Matplotlib has many dependencies that you should also install.

The first thing you will learn is how to plot a polynomial function. The necessary commands for plotting the 3x^2-x+1 polynomial are the following:

import numpy as np
import matplotlib.pyplot as plt
# Coefficients of 3x^2 - x + 1, highest power first
myPoly = np.poly1d(np.array([3, -1, 1]).astype(float))
x = np.linspace(-5, 5, 100)
y = myPoly(x)
plt.xlabel('x values')
plt.ylabel('f(x) values')
# One tick per unit on the x axis, one every 10 units on the y axis
xticks = np.arange(-5, 6, 1)
yticks = np.arange(0, 100, 10)
plt.xticks(xticks)
plt.yticks(yticks)
plt.grid(True)
plt.plot(x, y)
plt.show()

The variable that holds the polynomial is myPoly. The range of values that will be plotted for x is defined using “x = np.linspace(-5, 5, 100)”. The other important variable is y, which calculates and holds the values of f(x) for each x value.

It is important that you start IPython with the “ipython --pylab=qt” command in order to see the output on your screen.

If you are interested in the plotting of polynomial functions, you should experiment more, as NumPy can also calculate the derivative of a polynomial and Matplotlib can plot multiple functions in the same output.
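As a starting point for such experiments, the sketch below computes the derivative of the 3x^2-x+1 polynomial with the deriv() method of poly1d. The variable names are illustrative, and the plotting part is left out so that the snippet depends on NumPy only.

```python
import numpy as np

my_poly = np.poly1d([3.0, -1.0, 1.0])   # 3x^2 - x + 1
derivative = my_poly.deriv()             # 6x - 1

print(my_poly(1.0))         # 3.0
print(derivative(1.0))      # 5.0
print(derivative.coeffs)    # [ 6. -1.]
```

To draw both curves in the same output, you could evaluate my_poly and derivative on the same x values and call plt.plot() once for each before showing the figure.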

Step 10 About SciPy

SciPy is built on top of NumPy and offers more advanced functionality. It supports numerical integration, optimisation, signal processing, image and audio processing, and statistics. The example below uses a small part of the scipy.stats package, which deals with statistics.

In [36]: from scipy.stats import poisson, lognorm
In [37]: mySh = 10;
In [38]: myMu = 10;
In [39]: ln = lognorm(mySh)
In [40]: p = poisson(myMu)
In [41]: ln.rvs((10,))
Out[41]:
array([ 9.29393114e-02, 1.15957068e+01, 9.78411983e+01, 8.26370734e-07, 5.64451441e-03, 4.61744055e-09, 4.98471222e-06, 1.45947948e+02, 9.25502852e-06, 5.87353720e-02])
In [42]: p.rvs((10,))
Out[42]: array([12, 11, 9, 9, 9, 10, 9, 4, 13, 8])
In [43]: ln.pdf(3)
Out[43]: 0.013218067177522842

The example uses two statistical distributions and may be difficult to understand even if you know mathematics, but it is presented in order to give you a better taste of SciPy commands.
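The session above can also be written as a small script. The sketch below is one way to do it, with two additions that are not in the original session: a seeded random generator passed through the random_state parameter, so that the samples are repeatable, and a call to mean() to show that the Poisson distribution is centred on the chosen value.

```python
import numpy as np
from scipy.stats import lognorm, poisson

ln = lognorm(10)   # log-normal distribution with shape parameter 10
p = poisson(10)    # Poisson distribution with mean 10

# Seeded generator (an illustrative addition) for repeatable samples
rng = np.random.default_rng(0)
ln_samples = ln.rvs(size=10, random_state=rng)
p_samples = p.rvs(size=10, random_state=rng)

print(ln_samples.shape)   # (10,)
print(p.mean())           # 10.0
print(ln.pdf(3))          # about 0.0132, in line with Out[43] above
```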

Step 11 Using SciPy for image processing

Now we will show you how to process and transform a PNG image using SciPy.
The most important part of the code is the following line:

image = np.array(Image.open('SA.png').convert('L'))

This line reads an ordinary PNG file, converts it to greyscale (the 'L' mode) and turns it into a NumPy array for additional processing. The Image object used here appears to be the Image class of the PIL imaging library. The program also splits the output into four parts and displays a different image in each of them.
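Once the image is a NumPy array, it can be transformed like any other array. The sketch below is not the original program: it assumes a synthetic 64x64 greyscale image in place of SA.png, and the three transformations (blur, inversion, mirror) are illustrative choices that, together with the original, would give four images to display.

```python
import numpy as np
from scipy import ndimage

# Synthetic greyscale image standing in for SA.png (an assumption):
# a bright rectangle on a dark background, values from 0 to 255
image = np.zeros((64, 64), dtype=np.uint8)
image[8:32, 16:56] = 255

# Three illustrative transformations of the same array
blurred = ndimage.gaussian_filter(image.astype(float), sigma=3)
inverted = 255 - image        # dark becomes bright and vice versa
mirrored = image[:, ::-1]     # flip left to right

print(blurred.shape, inverted.shape, mirrored.shape)
```

With Matplotlib, each of the four arrays could then be drawn in its own subplot using imshow().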

Step 12 Other useful functions

It is very useful to be able to find out the data type of the elements in an array; this is available through the array's dtype attribute.

Similarly, the ndim attribute (or the np.ndim() function) returns the number of dimensions of an array.
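Both pieces of information can be inspected as follows. The two sample arrays are made up for illustration.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([[1, 2.5],
                  [3, 4]])

print(mixed.dtype)      # float64
print(ints.ndim)        # 1
print(mixed.ndim)       # 2
print(np.ndim(mixed))   # 2, same result through the function form
```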

When reading data from external files, you can save their data columns into separate variables as follows:

In [10]: aa1,aa2 = np.loadtxt("timeN.txt", usecols=(0,1), unpack=True)

The aforementioned command saves the first column of the file (column 0) into variable aa1 and the second column (column 1) into variable aa2. The “unpack=True” allows the data to be assigned to two different variables. Please note that the numbering of columns starts with 0.
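The effect of “unpack=True” can be checked on the four sample lines of timeN.txt shown in Step 05. As before, the in-memory io.StringIO object and the variable names are assumptions that keep the sketch self-contained.

```python
import io
import numpy as np

# Stand-in for timeN.txt: hour of the day, then number of requests
sample = io.StringIO("00 191\n01 225\n02 121\n03 104\n")

hours, requests = np.loadtxt(sample, usecols=(0, 1), unpack=True)

print(hours)      # [0. 1. 2. 3.]
print(requests)   # [191. 225. 121. 104.]
```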

Step 13 Fitting to polynomials

The NumPy polyfit() function fits a polynomial of a given degree to a set of data points. The data comes from the timeN.txt file, created earlier in this article.

The Python script uses a fifth-degree polynomial, but if you want to use a different degree instead then you only have to change the last argument in the following line:

coefficients = np.polyfit(aa1, aa2, 5)
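To see what polyfit() returns, it helps to fit data whose underlying polynomial is known. The sketch below does not use the timeN.txt data: it assumes synthetic points generated from 2x^2-3x+5 and fits a second-degree polynomial, so that the recovered coefficients can be checked against the known ones.

```python
import numpy as np

# Synthetic data (an assumption): points that lie exactly on 2x^2 - 3x + 5
x = np.arange(10, dtype=float)
y = 2 * x**2 - 3 * x + 5

# The last argument is the degree of the polynomial to fit
coefficients = np.polyfit(x, y, 2)
fitted = np.polyval(coefficients, x)   # evaluate the fitted polynomial

print(np.round(coefficients, 6))   # coefficients, highest power first
print(np.allclose(fitted, y))      # True
```

With real, noisy data such as the hourly request counts, the fitted curve will only approximate the points, and raising the degree trades smoothness for a closer fit.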

Step 14 Array broadcasting in NumPy

To close, we will talk more about array broadcasting because it is a very useful characteristic. First, you should know that array broadcasting has a rule: in order for two arrays to be considered for array broadcasting, “the size of the trailing axes for both arrays in an operation must either be the same size or one of them must be one.”

Put simply, array broadcasting allows NumPy to “stretch” an array along an axis of size one, repeating its data so that a calculation with a larger array becomes possible. Nevertheless, an axis whose size is greater than one cannot be stretched, so two arrays whose corresponding axis sizes differ and are both greater than one cannot be broadcast together.
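The rule is easier to grasp with a small example. In the sketch below, which uses made-up values, a 2x3 array is combined with a row of three elements and with a column of two elements; both work because the mismatched axis has size one or is missing. The last operation fails because the trailing axes have sizes 3 and 2.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])         # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

with_row = table + row        # row is repeated for each of the 2 rows
with_column = table + column  # column is repeated across the 3 columns

print(with_row)      # [[11 22 33], [14 25 36]]
print(with_column)   # [[101 102 103], [204 205 206]]

# Trailing axes of size 3 and 2: neither is one, so broadcasting fails
failed = False
try:
    table + np.array([1, 2])
except ValueError:
    failed = True
print(failed)        # True
```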
