Python For Everyone: Advanced Course


Author: Elinor Stevens

Python For Everyone: Advanced Course by @jeremy_carson Presentation developed with Remark

1 / 112

Details

This presentation was developed with Remark, a Markdown presentation API.

Course materials
- Python Advanced Course [HTML] [PDF]
- Raw Markdown Data [course data]
- Source code for presentation and any examples [link]

License

Unless otherwise specified, all Python Presentation source code is released under the MIT license. This tutorial uses examples from several sources; I tried to provide attribution where applicable. Many online tutorials I referenced do not include an explicit license, so I attributed them to their website.

The MIT License (MIT)

Copyright (c) 2014 Jeremy Carson

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

2 / 112

Advanced Fundamentals

3 / 112

Regular Expressions

Regular expressions (regexes, REs, regex patterns) are a compact, algebraic notation for describing patterns in text.

Features:
- Incredibly useful for parsing semistructured data
- Pattern matching/searching (e.g. exact match vs. partial match)
- String splitting
- Substring extraction

Semistructured data:

    name,phone
    jenny,867-5309
    other,867-5308
    jason,1234567
    mike,(978) 407-1866

- Row = (Name, PhoneNumber)
- Phone numbers are not well formatted (but are good enough for REs)
- No str.find() call is smart enough to extract these data. This requires pattern matching.

Regular expressions are a language unto themselves, read up:
- Official Python Regex Tutorial [link]
- Official Python Regex Documentation [link]
- Good Regular Expression Site [link]

4 / 112

Regular Expressions

5 / 112

Regular Expressions

Example:

    \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

Regex metacharacters:

    . ^ $ * + ? { } [ ] \ | ( )

This list is not exhaustive, and several metacharacters have different meanings depending on context. See the [official docs] for more.

- [ ] Match anything inside the square brackets for ONE character position.
  - A - inside brackets is a range separator. [abcdefg] could be written as [a-g]
  - A ^ at the start of the brackets indicates negation. [^abc] matches any character that is not a, b, or c.
- \ Escape character with special meaning, e.g. \d is equivalent to [0-9]
  - You can also use the backslash to remove special meanings, e.g. finding square brackets: [\[\]]
  - Special sequences can be included in character classes, e.g. [\s,.] matches any whitespace character, comma, or period.
- * Not a wildcard. It means that the previous character can be matched 0 or more times, e.g. ca*t matches "ct", "cat", "caaaaaaaat"
- + Previous character can be matched 1 or more times.
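To make the metacharacters above concrete, here is a small sketch using Python's re module. The sample strings and variable names are made up for illustration; only the patterns themselves come from the slide.

```python
import re

# [a-g] is a range; a leading ^ inside the brackets negates the class.
print(re.findall(r"[a-g]", "big dog"))       # ['b', 'g', 'd', 'g']
print(re.findall(r"[^abc ]", "a cab ride"))  # ['r', 'i', 'd', 'e']

# * means zero or more of the previous character, + means one or more.
star = re.compile(r"ca*t")
plus = re.compile(r"ca+t")
words = ["ct", "cat", "caaaaaaaat"]
star_hits = [w for w in words if star.fullmatch(w)]
plus_hits = [w for w in words if plus.fullmatch(w)]
print(star_hits)  # ['ct', 'cat', 'caaaaaaaat']
print(plus_hits)  # ['cat', 'caaaaaaaat']

# \d is shorthand for [0-9]; \. removes the special meaning of the dot.
ip = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")
ip_hits = ip.findall("ping 192.168.0.1 from 10.0.0.254")
print(ip_hits)    # ['192.168.0.1', '10.0.0.254']

# Escaped brackets inside a character class match literal [ and ].
brackets = re.findall(r"[\[\]]", "x[0] = y[1]")
print(brackets)   # ['[', ']', '[', ']']
```

Note that the example pattern only checks the shape of an IP address (four groups of 1 to 3 digits); it would also accept 999.999.999.999.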

6 / 112

Regular Expressions

Let's go back to the phone numbers:

    data = [
        "jenny,867-5309",
        "other,867-5308",
        "jason,1234567",
        "mike,(978) 407-1866",
    ]

Attempt 1

    import re
    p = re.compile(r"\d+-\d+")
    for phone in data:
        result = p.findall(phone)
        if result:
            print(result)

- re.compile(): Create a new regular expression program (a compiled pattern) from a regex string
  - Lead regex strings w/ r to avoid problems parsing \
  - r"\test" is equivalent to "\\test"
  - Important for regex: \d is to be evaluated by the regex engine, not Python.
- findall(): Returns all substrings of the input string that match the regex

7 / 112

Regular Expressions

Attempt 2

    p = re.compile(r"\d+-*\d+")
    for phone in data:
        result = p.findall(phone)
        if result:
            print(result)

Better, but it doesn't get the last phone number in one piece: the area code (978) comes back as a separate match.

Attempt 3 (right off the slide???):

(?:(?:\+?1\s*(?:[.‐]\s*)?)?(?:$\s*([2‐9]1[02‐9]|[2‐9][02‐8]1|[2‐9][02‐8][02‐9])\s*$|([2‐9]1[02‐9]|[2‐9][02‐

Found this online, I hope the user was just trolling 'cause this is awful.

Attempt 4 (work smarter):

    p = re.compile(r"(?:\d{10}|\d{7})")
    for phone in data:
        phone = re.sub(r"[()\- ]", '', phone)
        result = p.findall(phone)
        if result:
            print(result)

Why not remove all the whitespace and parens and just look for sequences of numbers?

- re.compile(r"(?:\d{10}|\d{7})"): Search for 10 digits in a row or 7 digits in a row.
- re.sub(r"[()\- ]", '', phone): Remove whitespace, parens, and dashes. This way all our phone numbers become just strings of digits.
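As a sketch, Attempt 4 can be wrapped in a small helper so the cleanup and the search live in one place. The names extract_phone and DIGITS are hypothetical, not part of the course code, and the helper assumes the name column itself contains no digits.

```python
import re

# Same pattern as Attempt 4: 10 digits in a row, or 7 digits in a row.
DIGITS = re.compile(r"(?:\d{10}|\d{7})")

def extract_phone(row):
    """Return the first 10- or 7-digit number in a 'name,phone' row, or None."""
    # Strip parens, dashes, and spaces so the number is one run of digits.
    cleaned = re.sub(r"[()\- ]", "", row)
    match = DIGITS.search(cleaned)
    return match.group() if match else None

data = [
    "jenny,867-5309",
    "other,867-5308",
    "jason,1234567",
    "mike,(978) 407-1866",
]
phones = [extract_phone(row) for row in data]
print(phones)  # ['8675309', '8675308', '1234567', '9784071866']
```

Because \d{10} is listed first in the alternation, a 10-digit number is returned whole instead of being cut down to its first 7 digits.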

8 / 112

Regular Expressions

- match(): Determine if the RE matches at the beginning of the string.
- search(): Scan through a string, looking for any location where this RE matches.
- findall(): Find all substrings where the RE matches, and return them as a list.
- finditer(): Find all substrings where the RE matches, and return them as an iterator.

match and search return None if no match is found.

findall pulls all substring matches:

    import re
    p = re.compile("[a-z]{3}") # match 3 letters
    result = p.findall("abcdefghi")
    print(result) # prints ['abc', 'def', 'ghi']

finditer pulls all substring match objects:

    p = re.compile("[a-z]{3}") # match 3 letters
    result = p.finditer("abcdefghi")
    print([i.group() for i in result])

- group() with no arguments returns the entire match
- Note: result is an iterator; once consumed, it cannot be used again.
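The difference between match() and search() is easy to miss, so here is a short sketch of all four methods on one string. The sample text and variable names are invented for illustration.

```python
import re

p = re.compile(r"\d{3}")  # three digits in a row
text = "abc123def456"

m_match = p.match(text)    # None: the string does not START with digits
m_search = p.search(text)  # first match anywhere in the string
all_found = p.findall(text)

# finditer yields match objects, which also carry positions.
spans = [(m.group(), m.start(), m.end()) for m in p.finditer(text)]

print(m_match)            # None
print(m_search.group())   # 123
print(all_found)          # ['123', '456']
print(spans)              # [('123', 3, 6), ('456', 9, 12)]
```

Always check for None before calling group() on the result of match() or search().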

9 / 112

Regular Expressions

Up until now we've just been matching strings. ( ) can be used to extract substrings from matches:

    # Match groups of 3 letters, assume there are 3 groups.
    p = re.compile(r"(?P<id>[a-z]{3})([a-z]{3})([a-z]{3})")
    result_iterator = p.finditer("abcdefghi")
    for result in result_iterator:
        print(result.group(0)) # abcdefghi - always the full match
        print(result.group(1)) # abc
        print(result.group(2)) # def
        print(result.group(3)) # ghi
        print(result.group("id")) # abc

There are lots more ways of doing groups. Groups with ? at the beginning are treated differently:

- (?P<id>...) Creates a named group, allows result.group("id")
- (?:...) Creates a noncapturing group, useful if you are using groups and want to avoid parsing unnecessary data.

Regular expressions are a completely new language and require time and practice. The best advice I can give is, when creating a regex, to work through your desired comparison left to right. Start by matching pieces first, then the whole.
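Tying groups back to the phone data, a sketch like the following pulls the name and the raw phone field out of one row using named groups. The pattern and the names row_re, name, and phone are my own illustration, not the course's code.

```python
import re

# Named groups: letters before the comma, phone-like characters after it.
row_re = re.compile(r"(?P<name>[a-z]+),(?P<phone>[\d()\- ]+)")

m = row_re.fullmatch("mike,(978) 407-1866")
name = m.group("name")
phone = m.group("phone")
fields = m.groupdict()
print(name)    # mike
print(phone)   # (978) 407-1866
print(fields)  # {'name': 'mike', 'phone': '(978) 407-1866'}

# A noncapturing group is used for grouping but is left out of the results:
# findall returns only the captured digit.
captured = re.findall(r"(?:ab)+(\d)", "abab1 ab2")
print(captured)  # ['1', '2']
```

groupdict() is handy when a row has several fields, since it returns every named group at once.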

10 / 112

Lambdas

My favoritest thing in any language.

Lambda: An unnamed function useful for one-off tasks.
- Lambdas are syntactic sugar, not necessary but pretty.
- Lambdas implicitly return a value.

Standard function:

    from functools import reduce
    x = [i for i in range(1,10)]
    def mult(a,b):
        return a*b
    # Return the result of multiplying all numbers in x
    reduce(mult,x)

Lambda equivalent:

    from functools import reduce
    x = [i for i in range(1,10)]
    reduce(lambda a,b: a*b,x)

- Clean, self contained
- Lambdas are limited to a single expression.
- a and b are the parameters for the lambda
- A lambda can take any number of arguments (zero or more); how many it needs depends on how it is called
- reduce() always passes 2 arguments.
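A few more one-off uses, as a sketch: lambdas are most often passed as small callbacks to functions such as sorted(), filter(), and reduce(). The sample rows and variable names below are made up for illustration.

```python
from functools import reduce

rows = [("jenny", "867-5309"), ("other", "867-5308"), ("jason", "1234567")]

# One argument: sort the rows by the name column.
by_name = sorted(rows, key=lambda row: row[0])
print([row[0] for row in by_name])  # ['jason', 'jenny', 'other']

# One argument: keep only the even numbers.
evens = list(filter(lambda n: n % 2 == 0, range(10)))
print(evens)                        # [0, 2, 4, 6, 8]

# Two arguments, as reduce() requires: multiply 1..9 together.
factorial_9 = reduce(lambda a, b: a * b, range(1, 10))
print(factorial_9)                  # 362880

# Zero arguments is legal too.
answer = (lambda: 42)()
print(answer)                       # 42
```

If the body needs more than one expression, a regular def is the better tool.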

11 / 112

Lambdas

Closures are another part of lambdas. Closure is just a clever way of saying your lambda can see variables in the enclosing scope.

    def multby(value):
        return lambda a: a*value
    multby3 = multby(3)
    print([multby3(i) for i in range(0,4)]) # 0,3,6,9

- value is part of the enclosing scope
- Notice that I'm returning the lambda to multby3
- Now I can dynamically define multiply-by behaviour

One gotcha. Python is late-binding.

    def create_multipliers():
        return [lambda x : i * x for i in range(5)]
    for m in create_multipliers():
        print(m(2))


What gets printed? 8,8,8,8,8

This is due to late binding. The lambda references i from create_multipliers, but Python does not look up the value of i until the lambda is called. By the time the first lambda is called, the loop has finished and i is already storing 4.

13 / 112

Lambdas

- Always explicitly pass enclosing variables to your lambdas as arguments with default values
- This forces Python to bind immediately with the current value
- Remember that default parameters must come after non-default positional parameters

    def create_multipliers():
        return [lambda x,a=i : a * x for i in range(5)]
    for m in create_multipliers():
        print(m(2))


Does this print what we want? Yes: 0,2,4,6,8
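For comparison, here is a sketch that shows the late-binding behaviour next to an alternative fix. The alternative uses functools.partial from the standard library, which is a different technique from the default-argument trick on the slide: partial captures the value of i at the moment it is created. The function names are hypothetical.

```python
from functools import partial
from operator import mul

def late_multipliers():
    # Every lambda looks up i when it is CALLED, after the loop has ended.
    return [lambda x: i * x for i in range(5)]

def bound_multipliers():
    # partial(mul, i) stores the current value of i immediately.
    return [partial(mul, i) for i in range(5)]

late = [m(2) for m in late_multipliers()]
bound = [m(2) for m in bound_multipliers()]
print(late)   # [8, 8, 8, 8, 8]
print(bound)  # [0, 2, 4, 6, 8]
```

Both fixes work; the default-argument form keeps the lambda syntax, while partial avoids adding an extra parameter that a caller could accidentally override.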

15 / 112

Threading

I'm going to keep this brief; threading is unlikely to be useful to anyone other than a software engineer.

Excerpt from thread_example.py:

    import threading
    write_lock = threading.Lock()
    total = 0
    def sum_worker(x,start,stride):
        index = start
        while index ...

[...] 4.5) go here or here

- Prefer PyQt4 to PyQt5, PyQt4 still more common

Additional Installation: Qt Designer
- We'll focus on writing GUIs in code but you can also make them visually
- Qt Designer comes with the official PyQt download
- Qt Designer also comes with the official Qt download
  - For PyQt4, get Qt version 4.8
  - For PyQt5, get Qt version 5.X

Validate the installation:

    from PyQt4.QtCore import QT_VERSION_STR
    print(QT_VERSION_STR) # should print something like 4.8.6

56 / 112

PyQt

What is Qt?

Qt is a cross-platform framework for developing user interfaces.
- Developed by Trolltech, maintained by Nokia.
- Written in C++
- Supports Windows, OS X, Linux, and Unix.
- Available under the GNU General Public License (GPL)
- Available under a commercial license as well (costs $$$)

GPL: If you sell or distribute your code you must include the GPL license and make your code open-source.
- Code distributed internally to other NSRDEC employees only needs to be shared with those employees.

Commercial License: Do what you like, no need to share.

57 / 112

PyQt

Features

- Qt was written to extend all of C++, not just GUIs
- Supports boat loads of: widgets, layouts, styles, fonts, colors, prairie dogs, polar bears, yak (yakses, yakis... yaksercise?)
- Supports standard UI features like: menus, status bars, toolbars, drag and drop
- Simplified communication layer based on signals and slots (PyQt4.5 and later have a custom signal/slot language)
- Unified Painting System
- Intelligent style hierarchy, changes to parent style affect children (or not, your choice)
- Others: Networking, Threading, Videos, XML handlers
- All sorts of community support

58 / 112

PyQt Structure

59 / 112

PyQt Widgets Everything you put in to a Qt user interface is a widget

60 / 112

PyQt Look and Feel Styles provide a way to control the look and feel of a GUI. These can be used to ensure your application has a native look for your operating system. We won't be covering these, the default is sufficient.

Layouts are another feature for creating aesthetically pleasing widgets. Child widgets report their needs to their parents and the parent handles final arrangement. Useful if you want to be able to resize your window in realtime.

61 / 112

PyQt

The basics

Creating a window (parentless widget):

    import sys
    from PyQt4 import QtGui

    app = QtGui.QApplication(sys.argv)
    window = QtGui.QWidget()
    window.resize(600, 600)
    window.show()
    sys.exit(app.exec_())

- Create an application (always required)
- Create a parentless widget (window)
- Resize by (width, height) and show
- Exit to console when app done

Create a child widget (buttons, lists, textboxes, etc):

    button = QtGui.QPushButton("Quit", window)
    button.move(200, 200)
    button.setToolTip("I'll quit")

- Create a push button (responds to mouse)
- Position button and add hover text

Extend QWidget:

    class Example(QtGui.QWidget):
        def __init__(self):
            super(Example, self).__init__()
        def initUI(self):
            pass

62 / 112

PyQt

Layout basics

Layouts manage how child widgets are aligned. You can add widgets anywhere you like, as you saw on the last slide, but this requires a lot of manual tweaking.

Layouts are responsible for:
- Updating child widget size and position
- Providing default and minimum child widget size.

63 / 112

PyQt Layout basics

Applying horizontal and vertical layouts to your widgets:

    ok_button = QtGui.QPushButton("OK")
    cancel_button = QtGui.QPushButton("Cancel")
    hbox = QtGui.QHBoxLayout()
    hbox.addStretch(1)
    hbox.addWidget(ok_button)
    hbox.addWidget(cancel_button)
    vbox = QtGui.QVBoxLayout()
    vbox.addStretch(1)
    vbox.addLayout(hbox)
    window.setLayout(vbox)

- Create two buttons and align them using a horizontal layout (purple)
- Pack the two buttons on the bottom of the dialog using a vertical layout (blue)
- Stretch values create empty space that expands and contracts (green)

64 / 112

PyQt Signals and Slots

Internally, Qt is powered by an event loop (awesome image above)

1. User interacts with interface
2. Interface sends a message to event queue
3. Event loop processes queue
4. Qt keeps working without freezing

To interact with widgets you need to use signals and slots:
- Signal: Message sender (widget)
- Slot: Message receiver (widget)
- Connection: Interface between signals and slots

65 / 112

PyQt

Signals and Slots

Let's revisit our button example, but now we'll do something:

    quit_button = QtGui.QPushButton("&Quit")
    # Connect: PyQt Version >= 4.5
    quit_button.clicked.connect(QtGui.qApp.quit)
    vbox = QtGui.QVBoxLayout()
    vbox.addStretch(1)
    vbox.addWidget(quit_button)
    window.setLayout(vbox)

- Create a quit button
  - & is a shortcut that underlines the next letter and creates an alt+ shortcut.
- Connect the quit button's clicked signal to qApp.quit
  - Program exits when button is clicked or alt+q is pressed
- For sanity's sake, use the new-style 4.5 connect API

66 / 112

PyQt Premade Dialogs

Qt comes complete with several common dialog boxes prebuilt:

- QColorDialog: Standard color selection dialog box
- QFileDialog: Standard file (Open/Save) selection dialog box
- QFontDialog: Standard font selection dialog box

Quick reference:

    # Open file
    file_name = QtGui.QFileDialog.getOpenFileName(window, 'Open file', r"C:\StartFolder")
    # Select a font
    font, ok = QtGui.QFontDialog.getFont()
    # Select a color
    color = QtGui.QColorDialog.getColor()

67 / 112

PyQt

Main Window

Qt comes with a standard main application window that provides a skeleton framework upon which to build your application. Features like statusbar, toolbar, and menubar are included.

    class Example(QtGui.QMainWindow):
        def __init__(self):
            super(Example, self).__init__()
            self.initUI()
        def initUI(self):
            textEdit = QtGui.QTextEdit()
            self.setCentralWidget(textEdit)
            exitAction = QtGui.QAction(
                QtGui.QIcon('./media/exit.png'), 'Exit', self)
            exitAction.setShortcut('Ctrl+Q')
            exitAction.triggered.connect(self.close)
            self.statusBar()
            menubar = self.menuBar()
            fileMenu = menubar.addMenu('&File')
            fileMenu.addAction(exitAction)
            toolbar = self.addToolBar('Exit')
            toolbar.addAction(exitAction)
            self.setGeometry(300, 300, 350, 250)
            self.show()

- Extend QMainWindow

68 / 112

PyQt Putting it all together

Complete example at /pyqt/pyqt_full_example.py

Demonstrates several features:
- QMainWindow
- Menu, Status, and Tool bars.
- Signals, Slots, and Actions
- Standard File, Color, and Font dialogs
- Loading CSV data in to a QTableWidget

Additional Resources:
- Zetcode PyQt4 tutorial [link]
- Trolltech PyQt4 tutorial [link]

69 / 112

Scientific Python

70 / 112

Numpy vs. Scipy

Numpy and Scipy form the backbone of scientific computing in Python.

- Numpy: High performance multidimensional array functions
- Scipy: Numerical analysis code

Numpy and Scipy overlap:
- Numpy was partially built in to Scipy
- Numpy retains its numerical analysis roots for backwards compatibility
- ! Prefer Numpy for all array functionality
- ! Prefer Scipy for all numerical code

"In an ideal world, NumPy would contain nothing but the array data type and the most basic operations: indexing, sorting, reshaping, basic elementwise functions, et cetera. All numerical code would reside in SciPy." (Scipy docs)

71 / 112

Numpy

Numpy is the fundamental multidimensional array module. It is used by countless other modules and by all of Scipy. Consider it your array of choice for all things scientific.

Some key features include:
- N-Dimensional Array Objects
- Powerfully fast arrays (much better than list)
- Fourier Transforms
- Linear Algebra Functions
- Advanced randomization capabilities

Limitations:
- Numpy arrays are homogeneous (no mixing numbers and strings)
- Numpy arrays are fixed size (defined at creation)
  - Append operations return new arrays
- Numpy is bad at non-numerical data
- Looping over a Numpy array element by element in Python can be slower than looping over a list. Always prefer vectorized Numpy algorithms over hand-rolling your own code.
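To illustrate two of the limitations above, here is a minimal sketch: a hand-rolled loop next to its vectorized equivalent, and an append that returns a new array. The function names are hypothetical, and no timings are claimed here; the point is only that both forms compute the same result.

```python
import numpy as np

values = np.arange(1, 1001, dtype=np.float64)

def sum_squares_loop(arr):
    # Hand-rolled: one Python-level iteration per element.
    total = 0.0
    for v in arr:
        total += v * v
    return total

def sum_squares_vectorized(arr):
    # Vectorized: the multiply and the sum both run inside Numpy.
    return float(np.sum(arr * arr))

loop_result = sum_squares_loop(values)
vector_result = sum_squares_vectorized(values)
print(loop_result, vector_result)  # 333833500.0 333833500.0

# Arrays are fixed size: append builds and returns a NEW array.
a = np.array([1, 2, 3])
b = np.append(a, 4)
print(a.size, b.size)  # 3 4
```

Calling np.append inside a loop copies the whole array every time, so it is usually better to collect values in a list and convert once at the end.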

72 / 112

Numpy

First some terms:

- axes: The dimensions of your array.
- rank (ndim): Number of axes (dimensions) in the array
- shape: The size of the array along each dimension.
  - An NxM matrix would have a shape tuple (n, m)
  - Therefore the length of the shape tuple is the rank (ndim)
- size: Total number of elements in the array. N*M
- dtype: Data type, refers to the data in the array

73 / 112

Numpy

Example:

    import numpy as np
    a = np.arange(9).reshape(3,3)
    print(a.shape) # (3, 3)
    print(a.ndim)  # 2
    print(a.size)  # 9
    print(a.dtype) # int32 or int64, depending on platform

- Create an array of 9 elements
- Reshape the array to a 3x3 matrix
- Print some array information
- arange is Numpy's range equivalent

Array Creation:

    a = np.array( [1,2,3,4] )
    print(a)        # [1 2 3 4]
    print(a.dtype)  # int32 or int64, depending on platform
    a = np.array( [1.,2.,3.,4.] )
    print(a)       # [1. 2. 3. 4.]
    print(a.dtype) # float64
    # WRONG
    # a = np.array(1,2,3,4)
    # 2D Array
    a = np.array([ [1,2], [3,4]])
    print(a)

74 / 112

Numpy

Generating Filled Arrays:

    a = np.zeros([3,3])
    print(a)
    a = np.ones([3,3],dtype=np.float64)
    print(a)
    a = np.empty([3,3])
    print(a)

- Create a 3x3 array of zeroes
- Create a 3x3 array of ones; the data type can be specified
- Create an empty array; values default to junk data

Generating Filled Arrays with ranges:

    a = np.arange(0,10,2)
    print(a)
    a = np.arange(0,2,0.3)
    print(a)
    a = np.linspace( 0, 2, 9 )
    print(a)
    x = np.linspace( 0, 2*np.pi, 10 )
    f = np.sin(x)
    print(f)

75 / 112

Numpy

Printing Arrays:

    a = np.arange(12)
    print(a)
    a = np.arange(12).reshape(3,4)
    print(a)
    a = np.arange(24).reshape(2,3,4)
    print(a)
    a = np.arange(10000).reshape(100,100)
    print(a)

- Create some arrays
- reshape(values) reshapes the array to match values
  - length(values) = ndim
  - values = shape
  - The product of values must equal the original array size.
- Print some arrays
- Numpy skips printing the central part of a large array
- To disable this feature (after import sys): np.set_printoptions(threshold=sys.maxsize)

Basic arithmetic operations:

    a = np.arange(4)
    b = np.arange(4)
    c = a-b
    print(c)
    c = a+b

76 / 112

Numpy

Basic scalar operations:

    c = a**2
    print(c)
    c = a+1
    print(c)
    print(a > 2)
    c = a*2
    print(c)

- Unlike Python lists, you can operate on Numpy arrays with scalars
- Broadcasting is Numpy's technique for dealing with mismatched array shapes. [docs]
  - [1,2,3] * [1] becomes [1,2,3] * [1,1,1]

Matrix arithmetic:

    a = np.array([
        [1,1],
        [0,1] ])
    b = np.array([
        [2,0],
        [3,4] ])
    c = a*b # elementwise product
    c = np.dot(a,b) # matrix product

- Multiplications are elementwise
- Numpy's dot function will perform a matrix product
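As a sketch of broadcasting with two different shapes, plus the elementwise vs. matrix product from the slide. The arrays row and col are made up for illustration; the @ operator (Python 3.5+) is used here as an equivalent of np.dot for 2D arrays.

```python
import numpy as np

row = np.array([1, 2, 3])     # shape (3,)
col = np.array([[10], [20]])  # shape (2, 1)

# Broadcasting stretches both operands to shape (2, 3) before adding.
grid = row + col
print(grid.shape)  # (2, 3)
print(grid)
# [[11 12 13]
#  [21 22 23]]

a = np.array([[1, 1],
              [0, 1]])
b = np.array([[2, 0],
              [3, 4]])

elementwise = a * b  # multiplies matching positions
matrix = a @ b       # matrix product, same result as np.dot(a, b)
print(elementwise)
# [[2 0]
#  [0 4]]
print(matrix)
# [[5 4]
#  [3 4]]
```

Shapes that cannot be stretched to match, such as (3,) and (2,), raise a ValueError instead of broadcasting.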

77 / 112

Numpy

Unary operations

    a = np.array([[1,2,3,4],[5,6,7,8]])
    print(a.min())
    print(a.max())
    print(a.sum())
    print(a.cumsum())
    print(a.cumsum(axis=0))
    print(a.cumsum(axis=1))
    print(a.transpose())

- Calculate the min and max of a matrix
- Calculate the sum of a matrix
- Calculate the cumulative sum of a matrix
  - No axis defined means flatten the 2D array to 1D
  - Axis 0 is the first dimension, cumulative sum down each column. The last row is [1,2,3,4] + [5,6,7,8]
  - Axis 1 is the second dimension, cumulative sum along each row. [1,1+2,1+2+3,1+2+3+4]
- Arrays have no inverse() method; use np.linalg.inv() on a square matrix

Universal operations:

    a = np.array([10,9,8,7,6,5])
    print(np.exp(a))
    print(np.sqrt(a))
    print(np.sort(a))

- Create an array
- Calculate the exponential of each element
- Calculate the square root of each element
- Sort the elements
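The axis argument trips up most people at first, so here is a worked sketch on the same 2x4 array, with results written out. The matrix m used for the inverse is the small example from the matrix arithmetic slide, converted to floats.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

total = a.sum()            # no axis: everything collapses to one number
col_sums = a.sum(axis=0)   # collapse the rows: one value per column
row_sums = a.sum(axis=1)   # collapse the columns: one value per row
print(total)               # 36
print(col_sums)            # [ 6  8 10 12]
print(row_sums)            # [10 26]

flat_cumsum = a.cumsum()   # flattened: [ 1  3  6 10 15 21 28 36]
down = a.cumsum(axis=0)    # [[ 1  2  3  4] [ 6  8 10 12]]
across = a.cumsum(axis=1)  # [[ 1  3  6 10] [ 5 11 18 26]]

# Inverses live in np.linalg and need a square, non-singular matrix.
m = np.array([[1.0, 1.0],
              [0.0, 1.0]])
m_inv = np.linalg.inv(m)
print(m_inv)
# [[ 1. -1.]
#  [ 0.  1.]]
```

A useful way to remember it: the axis you name is the one that disappears from the shape (for sum) or the one you walk along (for cumsum).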

78 / 112

Numpy

1D Indexing, Slicing and Iterating:

    a = np.array([1,2,3,4,5])
    print(a[0])     # first element
    print(a[-1])    # last element
    print(a[2:5])   # a[2]...a[4] (the end index is exclusive)
    print(a[::-1])  # Reversed array
    for elem in a:
        print(elem)

- Numpy arrays support the same Indexing, Slicing and Iterating ops as lists
- Remember: Python uses 0-based indexing.
- Iterate over each element

ND Indexing, Slicing:

    a = np.arange(1,10).reshape(3,3) #2D
    print(a[0][0]) # Old style list indexing
    print(a[0,0])  # Numpy indexing
    print(a[1:,1:]) # remove first row, first col
    print(a[:,1:]) # remove first col
    print(a[:,...]) # ... stands for all remaining axes: the whole array
    for elem in a:
        print(elem) # iterates over rows

- Numpy extends Python list slicing with the ability to select the slice axis
- Commas separate dimensions, e.g. a[1:,1:]
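Here is a short sketch of the N-D slices above with their results spelled out, using the same 3x3 array. The variable names are only for illustration.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

first_row = a[0, :]      # row 0, every column
first_col = a[:, 0]      # every row, column 0
lower_right = a[1:, 1:]  # drop the first row and the first column
last_col = a[..., -1]    # ... fills in all leading axes

print(first_row)    # [1 2 3]
print(first_col)    # [1 4 7]
print(lower_right)
# [[5 6]
#  [8 9]]
print(last_col)     # [3 6 9]
```

An integer index removes that axis from the result (a[0, :] is 1D), while a slice such as a[0:1, :] keeps it.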

79 / 112

Numpy

Iterating Hack:

    a = np.arange(1,10).reshape(3,3) #2D
    for i in range(a.shape[-1]):
        print(a[...,i])

- Iterate over the last dimension; in this case it gives us columns

80 / 112

Numpy

Stacking:

    a = np.array([[1,2,3],[4,5,6]])
    b = np.array([[7,8,9],[10,11,12]])
    print(np.vstack((a,b)))
    print(np.hstack((a,b)))

- Arrays can be stacked horizontally or vertically

Copying:

    a = np.array([[1,2,3],[4,5,6]])
    b = a
    b[0] = 12
    print(a)
    print(b)
    a = np.array([[1,2,3],[4,5,6]])
    b = a.copy()
    b[0] = 12
    print(a)
    print(b)

- Assigning new variable names to the same array does not create a copy
- Deep copies must be explicitly called with array.copy
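A related point that the copying example hints at: a slice of a Numpy array is a view onto the same data, not a copy. The sketch below shows this with a small made-up array and np.shares_memory.

```python
import numpy as np

a = np.arange(6)          # [0 1 2 3 4 5]

view = a[1:4]             # a slice is a VIEW: no data is copied
view[0] = 99              # writes through to a[1]
print(a)                  # [ 0 99  2  3  4  5]

copied = a[1:4].copy()    # an explicit copy owns its own data
copied[1] = -1            # a is not affected
print(a)                  # [ 0 99  2  3  4  5]

print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, copied))  # False
```

This differs from Python lists, where slicing always creates a new list.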

81 / 112

Numpy for Matlab users

    Matlab                      NumPy
    a = [1 2 3; 4 5 6]          a = array([[1.,2.,3.],[4.,5.,6.]])
    a(end)                      a[-1]
    a(2,5)                      a[1,4]
    a(2,:)                      a[1] or a[1,:]
    a(1:5,:)                    a[0:5] or a[:5] or a[0:5,:]
    a(end-4:end,:)              a[-5:]
    a(1:3,5:9)                  a[0:3][:,4:9]
    a(1:2:end,:)                a[::2,:]
    a(end:-1:1,:) or flipud(a)  a[::-1,:]
    a.'                         a.transpose() or a.T
    a'                          a.conj().transpose() or a.conj().T
    a * b                       dot(a,b)
    a .* b                      a * b
    a./b                        a/b

Source

82 / 112

Numpy for Matlab users

    Matlab          NumPy
    a.^3            a**3
    find(a>0.5)     where(a>0.5)
    a(a [...]

[...]

    import numpy as np
    from scipy import ndimage

    # Opening = Erosion->Dilation
    a = np.zeros((5,5), dtype=int)
    a[1:4, 1:4] = 1; a[4, 4] = 1
    print(a)
    b = ndimage.binary_opening(a, structure=np.ones((3,3))).astype(int)
    print(b)
    b = ndimage.binary_opening(a).astype(int)
    print(b)

    # Closing = Dilation->Erosion
    a = np.zeros((5,5), dtype=int)
    a[1:4, 1:2] = 1; #a[4, 4] = 1
    print(a)
    b = ndimage.binary_closing(a, structure=np.ones((3,3))).astype(int)
    print(b)
    b = ndimage.binary_closing(a).astype(int)
    print(b)

106 / 112

Scipy: Image processing

Lastly I want to quickly show how to detect edges:

    import numpy as np
    from scipy import ndimage
    import matplotlib.pyplot as plt

    im = np.zeros((256, 256))
    im[64:-64, 64:-64] = 1
    im = ndimage.rotate(im, 15, mode='constant')
    im = ndimage.gaussian_filter(im, 8)
    sx = ndimage.sobel(im, axis=0, mode='constant')
    sy = ndimage.sobel(im, axis=1, mode='constant')
    sxsy = np.hypot(sx, sy)
    plt.imshow(sxsy)
    plt.show()

- Create some dummy data
- Rotate that data to give it some pizazz
- Blur a little to make edge detection more pronounced
- Apply sobel filter
  - Sobel is an edge detection algorithm

107 / 112

Scipy: Statistics

SciPy statistics is the last Scipy module we'll cover. This has barely scratched the surface of what SciPy can do, and I would suggest looking at the SciPy site for more information.

Histograms and probabilities:

    from scipy import stats
    import numpy as np
    import pylab as pl

    a = np.random.normal(size=10000)
    bins = np.linspace(-5, 5, 30)
    histogram, bins = np.histogram(a, bins=bins, density=True)
    bins = 0.5*(bins[1:] + bins[:-1])  # bin centers

    b = stats.norm.pdf(bins)
    pl.plot(bins, histogram)
    pl.plot(bins, b)
    pl.show()
    # Compare the observed histogram with the expected distribution
    print(stats.ttest_ind(histogram, b))

- Generate statistically random data
  - Important: Python's builtin random module isn't rigorous. Hack method for quick and dirty calculations
- Bin the random data in a histogram
  - Bins range over [-5, 5]
- Finally create a probability distribution from the bin range
  - Shows how data should be distributed
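As a cross-check of the histogram above, here is a Numpy-only sketch that writes the standard normal density out by hand instead of calling stats.norm.pdf. The seeded generator (np.random.default_rng(0)) is my assumption for reproducibility and is not part of the course code.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is repeatable
samples = rng.normal(size=10000)

edges = np.linspace(-5, 5, 30)
density, edges = np.histogram(samples, bins=edges, density=True)
centers = 0.5 * (edges[1:] + edges[:-1])
widths = np.diff(edges)

# With density=True the bar areas add up to 1.
area = np.sum(density * widths)

# Standard normal pdf: exp(-x^2 / 2) / sqrt(2 * pi)
pdf = np.exp(-centers**2 / 2) / np.sqrt(2 * np.pi)
max_error = np.max(np.abs(density - pdf))

print(len(centers))  # 29 bin centers from 30 edges
print(area)
print(max_error)
```

The largest gap between the histogram and the curve should be small compared with the peak height of about 0.4, and it shrinks as the sample size grows.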

108 / 112

Scipy: Statistics

Percentiles:

    from scipy import stats
    import numpy as np
    import pylab as pl

    a = np.random.normal(size=10000)
    b = np.median(a)
    c = stats.scoreatpercentile(a, 50)
    d = stats.scoreatpercentile(a, 90)
    print(a)
    print(b)
    print(c)
    print(d)

- The median is the value that half of the observations fall below
- Generate a 50th percentile and a 90th percentile
  - The 50th percentile should match the median value
- A percentile is the value that N% of observations fall below
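To see the percentile definitions with exact numbers, here is a small sketch on the integers 1 to 101 instead of random data. It uses np.percentile, which is Numpy's counterpart to stats.scoreatpercentile; the data set is made up so the answers can be checked by hand.

```python
import numpy as np

a = np.arange(1, 102)  # 1, 2, ..., 101

median = np.median(a)
p50 = np.percentile(a, 50)
p90 = np.percentile(a, 90)

print(median)  # 51.0
print(p50)     # 51.0, the 50th percentile is the median
print(p90)     # 91.0

# Fraction of observations strictly below the 90th percentile.
below_p90 = np.mean(a < p90)
print(below_p90)
```

Here 90 of the 101 values lie below 91, which is as close to 90% as this data set allows.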

109 / 112

Scipy: Practical Usage

    from numpy import corrcoef, arange
    from pylab import pcolor, show
    from pylab import colorbar, xticks, yticks

    def load_csv(path):
        data = []
        with open(path, "r") as file:
            # The first line holds the column names
            headers = next(file).strip().split(',')
            for line in file:
                tokens = line.strip().split(',')
                # Empty cells become 0
                datum = [float(a) if len(a) != 0 else 0 for a in tokens]
                assert len(datum) == len(headers)
                data.append(datum)
        return [data, headers]

    def transpose(data):
        # One sequence per CSV column, which is what corrcoef expects
        return list(zip(*data))

    data, headers = load_csv("./data/msrdata.csv")
    oriented_data = transpose(data)
    R = corrcoef(oriented_data)
    pcolor(R, cmap="RdYlGn")
    colorbar()
    yticks(arange(0.5, len(headers)), headers)
    xticks(arange(0.5, len(headers)), headers)
    show()

110 / 112

Scipy: Practical Usage

My hopes were dashed pretty quickly.

111 / 112

Fin.

112 / 112