maths with Python 4: Loading data.

The first tutorial was about installing Python and some packages, and using it for basic maths.

In the second one, we used Python to solve the Rössler system.

In the third one, we switched to the Anaconda distribution of Python and used it to solve the diffusion equation.


Now we are going to go back to an old post: A Little bit DataMining on Web.

So, we are going to load some data into Python and plot it. As the source, we are again going to use the United States' American FactFinder.


Step 1. Get the data from American FactFinder. We are going to look for data about New York, and for that we use the tool on the main webpage.


Step 2. Now let’s move into population and see what they have…
Step 3. Ok. Here we are: the data about the age distribution of the population. Let’s download it.


A plain download, no extras.

Step 4. Open it with a text editor to see what it looks like.

So, there are 4 files. One is a readme describing the files. Another contains notes about the data. The third one contains the labels for the data (age ranges), and the fourth one contains the data as numbers.

The main file here is DEC_10_DP_DPDP1_with_ann.csv, which has 2 rows of data. The first row holds the keys that identify each data item, and the second one holds the data.

The file DEC_10_DP_DPDP1_metadata.csv contains the keys again, with a description of what they mean.
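The link between the two files can be sketched like this. It is only a minimal sketch in Python 3: it assumes that each row of the metadata file holds a key in the first column and its description in the second one, and the function name `load_key_descriptions` and the keys `KEY_A` and `KEY_B` are made up for the example (they are not real FactFinder keys).

```python
import csv
import io


def load_key_descriptions(lines):
    """Map each key to its description.

    Assumes every row looks like: key,description
    (an assumption about the metadata layout, not a documented format).
    """
    descriptions = {}
    for row in csv.reader(lines):
        if len(row) >= 2:          # skip empty or malformed rows
            descriptions[row[0]] = row[1]
    return descriptions


# Made-up stand-in for the metadata file; these keys are not real ones.
sample = io.StringIO(
    'KEY_A,First description\n'
    'KEY_B,"Second description, with a comma"\n'
)
descriptions = load_key_descriptions(sample)
for key, text in descriptions.items():
    print(key, "->", text)
```

With a real file, the same function could be fed an open file instead of the sample, for example `with open(path, newline='') as f: load_key_descriptions(f)`.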

Step 5. The code! (Remember that Python depends on indentation, so check that it is still there after copying the code from WordPress.)

#This is for selecting the file to be opened using a graphical interface.
from Tkinter import Tk #In Python 3 this module is called tkinter
from tkFileDialog import askopenfilename #In Python 3: from tkinter.filedialog import askopenfilename
Tk().withdraw() # we don't want a full GUI, so keep the root window from appearing
filename = askopenfilename() # show an "Open" dialog box and return the path to the selected file
#Now the path to our file is in the variable called filename
#We are going to import the data from FactFinder. In this kind of file there are two rows, the first one with the keys and the second one with the data.
#There is a second file where the keys are explained.
import csv
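with open(filename, 'rb') as csvfile: #'rb' is the Python 2 way; in Python 3 use open(filename, newline='')
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for i, row in enumerate(spamreader):
        if i==0:
            labels = [] #First row: the keys
            for k in range(len(row)):
                labels.append(row[k])
        else:
            data = [] #Second row: the data
            for k in range(len(row)):
                data.append(row[k])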

#At this point, we have a list with the labels (keys) and a second list with the data.

#The next part is to create a dictionary with the keys and the data. A dictionary is a kind of list where you address the entries by writing the key instead of the coordinate.
l = []
for k in range(len(labels)):
    l.append((labels[k], data[k]))
d = {}
for key, val in l:
    try:
        d.setdefault(key, []).append(float(val))
    except (NameError, ValueError):
        d.setdefault(key, []).append(val)
#Each entry of d is a list, so the value stored for a key is d[key][0].
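#Now make the plot like common demographic plots.
import numpy as np
import matplotlib.pyplot as plt
ind = 4*np.arange(18) #There are 18 age groups; we place the bars 4 units apart.
width = 3 #Thickness of the bars (this value is an assumption).
#And taking a look at the file with the key descriptions, we find that the keys take the form HD02_S0... So basically, we put the data we want to use into arrays.
#men_keys and women_keys are placeholder names for the lists of the 18 keys of each sex, to be copied from that file.
men = np.zeros(18)
women = np.zeros(18)
for k in range(18):
    men[k] = d[men_keys[k]][0]
for k in range(18):
    women[k] = d[women_keys[k]][0]
fig = plt.figure()
#First the men on the left.
ax1 = fig.add_subplot(121)
#This is a horizontal bar plot.
ax1.barh(ind, men, width, color='blue')
#But we need to set the x ticks and their labels.
#We only want 4 ticks
ticks=[max(max(men),max(women))*z/4 for z in range(4)]
ax1.set_xticks(ticks)
#And now, we need to write the tick labels and say that the axis is in units of 10^3
ticks2=[int(z/1000) for z in ticks]
ax1.set_xticklabels(ticks2)
ax1.set_xlabel('x10^3')
#Finally, we invert this axis
ax1.invert_xaxis()
#Second, the women on the right.
ax2 = fig.add_subplot(122)
ax2.barh(ind, women, width, color='pink')
#We only want 4 ticks
ax2.set_xticks(ticks)
ax2.set_xticklabels(ticks2)
#As an extra, we can use the Geo label to save the graph as an SVG file with the name of the city.
fig.savefig(str(d[geo_key][0]) + '.svg') #geo_key is a placeholder for the key of the Geo label.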

And this is the output.


So… the code has lots of explanations, but a little bit more will not hurt.
The code has 3 parts.

The first one is opening the file. That is done using a GUI (Graphical User Interface), basically a Windows-like open-file dialog. This is quite useful when we don’t want to type the location of our data into the code or when we are dealing with many different files.
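As a small variation on this first part, here is a sketch for Python 3, where the modules are called `tkinter` and `tkinter.filedialog`. The function name `choose_file` and the idea of accepting a path on the command line first are assumptions added for the example, they are not part of the code above.

```python
def choose_file(argv):
    """Return a path from the command line, or ask for one with a dialog."""
    if len(argv) > 1:
        return argv[1]            # path given explicitly, no GUI needed
    # Imported here so the GUI toolkit is only loaded when it is needed.
    from tkinter import Tk
    from tkinter.filedialog import askopenfilename
    Tk().withdraw()               # keep the empty root window from appearing
    return askopenfilename()      # empty result if the dialog is cancelled


# Example call with an explicit, made-up argument list.
print(choose_file(["plot.py", "data.csv"]))
```

In a script, the call would typically be `choose_file(sys.argv)` after `import sys`, so the dialog only shows up when no file name is given.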

The second one is quite standard for reading CSV files. Since we know that our file is just two rows, we organize what we read from the file into two lists, one for the labels and another one for the data. Once we have them, we build a dictionary with them: basically, a kind of list where you address the elements using keys instead of coordinates. Once the dictionary is ready, we can use it to create 2 arrays, one with the men data and the other one with the women data.
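This second part can be condensed a bit. The following is only a sketch in Python 3, not the code of the post: it pairs the two rows with `zip`, stores plain values instead of one-element lists, and uses the csv module's default quote character (the double quote) instead of `|`. The function name `read_two_row_csv` and the keys and numbers in the sample are made up for the example.

```python
import csv
import io


def read_two_row_csv(lines):
    """Build a dict from a CSV whose first row holds the keys
    and whose second row holds the values."""
    reader = csv.reader(lines)
    labels = next(reader)
    data = next(reader)
    table = {}
    for key, val in zip(labels, data):
        try:
            table[key] = float(val)   # numeric entries become floats
        except ValueError:
            table[key] = val          # anything else stays as text
    return table


# Made-up two-row sample; keys and numbers are placeholders, not census data.
sample = io.StringIO('NAME,KEY_M1,KEY_W1\n"Some city, Some state",120,130\n')
table = read_two_row_csv(sample)
men = [table["KEY_M1"]]
women = [table["KEY_W1"]]
print(table["NAME"], men, women)
```

Because the double quote is the quote character here, a label that contains a comma stays in one piece and the keys and values remain aligned.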

The last part is just a little formatting to plot the data into bar plots.
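The tick arithmetic of this last part can be looked at on its own. This is a small sketch with made-up numbers; the function name `thousands_ticks` is an assumption for the example, while the two formulas are the ones used for `ticks` and `ticks2` in the code.

```python
def thousands_ticks(men, women, n_ticks=4):
    """Return (positions, labels) for an x axis shared by both plots.

    Positions are n_ticks evenly spaced values that start at 0 and stay
    below the largest bar; labels are the same values in thousands.
    """
    largest = max(max(men), max(women))
    positions = [largest * z / n_ticks for z in range(n_ticks)]
    labels = [int(p / 1000) for p in positions]
    return positions, labels


# Made-up numbers, only to show the arithmetic.
positions, labels = thousands_ticks([8000, 4000], [6000, 2000])
print(positions)  # [0.0, 2000.0, 4000.0, 6000.0]
print(labels)     # [0, 2, 4, 6]
```

The positions would go to `set_xticks` and the labels to `set_xticklabels` on both axes, so the men and women plots share the same scale.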

And that’s all.







Ok, so here it is. Since we have the code, it is quite easy to download more data and plot it. So a few more graphs, and a little bit of Inkscape… can make very nice plots. See you soon!

To dig more into Python I/O files.

