O'REILLY
Python Data Science Handbook
ESSENTIAL TOOLS FOR WORKING WITH DATA
Jake VanderPlas
For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all: IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.

Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.

With this handbook, you'll learn how to use:

IPython and Jupyter: provide computational environments for data scientists using Python
NumPy: includes the ndarray for efficient storage and manipulation of dense data arrays in Python
Pandas: features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python
Matplotlib: includes capabilities for a flexible range of data visualizations in Python
Scikit-Learn: for efficient and clean Python implementations of the most important and established machine learning algorithms

"If you want to learn data science with Python, this book is a fantastic starting point. I've used it with great success to teach computer science and statistics majors. Jake goes far beyond the basics of open source tools; he also explains the underlying concepts, patterns and abstractions of data science using clear language and approachable explanations."
Cal Poly; cofounder of Project Jupyter

Jake VanderPlas, a long-time user and developer of the Python scientific stack, currently works as an interdisciplinary research director at the University of Washington. He conducts his own astronomy research, and spends time advising and consulting with local scientists from a wide range of fields.

PYTHON / DATA
US $59.99  CAN $68.99
ISBN: 978-1-491-91205-8
Twitter: @oreillymedia
facebook.com/oreilly
Table of Contents

Preface xi

1. IPython: Beyond Normal Python 1
  Shell or Notebook? 2
    Launching the IPython Shell 2
    Launching the Jupyter Notebook 2
  Help and Documentation in IPython 3
    Accessing Documentation with ? 3
    Accessing Source Code with ?? 5
    Exploring Modules with Tab Completion 6
  Keyboard Shortcuts in the IPython Shell 8
    Navigation Shortcuts 8
    Text Entry Shortcuts 9
    Command History Shortcuts 9
    Miscellaneous Shortcuts 10
  IPython Magic Commands 10
    Pasting Code Blocks: %paste and %cpaste 11
    Running External Code: %run 12
    Timing Code Execution: %timeit 12
    Help on Magic Functions: ?, %magic, and %lsmagic 13
  Input and Output History 13
    IPython’s In and Out Objects 13
    Underscore Shortcuts and Previous Outputs 15
    Suppressing Output 15
    Related Magic Commands 16
  IPython and Shell Commands 16
    Quick Introduction to the Shell 16
    Shell Commands in IPython 18
    Passing Values to and from the Shell 18
  Shell-Related Magic Commands 19
  Errors and Debugging 20
    Controlling Exceptions: %xmode 20
    Debugging: When Reading Tracebacks Is Not Enough 22
  Profiling and Timing Code 25
    Timing Code Snippets: %timeit and %time 25
    Profiling Full Scripts: %prun 27
    Line-by-Line Profiling with %lprun 28
    Profiling Memory Use: %memit and %mprun 29
  More IPython Resources 30
    Web Resources 30
    Books 31

2. Introduction to NumPy 33
  Understanding Data Types in Python 34
    A Python Integer Is More Than Just an Integer 35
    A Python List Is More Than Just a List 37
    Fixed-Type Arrays in Python 38
    Creating Arrays from Python Lists 39
    Creating Arrays from Scratch 39
    NumPy Standard Data Types 41
  The Basics of NumPy Arrays 42
    NumPy Array Attributes 42
    Array Indexing: Accessing Single Elements 43
    Array Slicing: Accessing Subarrays 44
    Reshaping of Arrays 47
    Array Concatenation and Splitting 48
  Computation on NumPy Arrays: Universal Functions 50
    The Slowness of Loops 50
    Introducing UFuncs 51
    Exploring NumPy’s UFuncs 52
    Advanced Ufunc Features 56
    Ufuncs: Learning More 58
  Aggregations: Min, Max, and Everything in Between 58
    Summing the Values in an Array 59
    Minimum and Maximum 59
    Example: What Is the Average Height of US Presidents? 61
  Computation on Arrays: Broadcasting 63
    Introducing Broadcasting 63
    Rules of Broadcasting 65
    Broadcasting in Practice 68
  Comparisons, Masks, and Boolean Logic 70
    Example: Counting Rainy Days 70
    Comparison Operators as ufuncs 71
    Working with Boolean Arrays 73
    Boolean Arrays as Masks 75
  Fancy Indexing 78
    Exploring Fancy Indexing 79
    Combined Indexing 80
    Example: Selecting Random Points 81
    Modifying Values with Fancy Indexing 82
    Example: Binning Data 83
  Sorting Arrays 85
    Fast Sorting in NumPy: np.sort and np.argsort 86
    Partial Sorts: Partitioning 88
    Example: k-Nearest Neighbors 88
  Structured Data: NumPy’s Structured Arrays 92
    Creating Structured Arrays 94
    More Advanced Compound Types 95
    RecordArrays: Structured Arrays with a Twist 96
    On to Pandas 96

3. Data Manipulation with Pandas 97
  Installing and Using Pandas 97
  Introducing Pandas Objects 98
    The Pandas Series Object 99
    The Pandas DataFrame Object 102
    The Pandas Index Object 105
  Data Indexing and Selection 107
    Data Selection in Series 107
    Data Selection in DataFrame 110
  Operating on Data in Pandas 115
    Ufuncs: Index Preservation 115
    UFuncs: Index Alignment 116
    Ufuncs: Operations Between DataFrame and Series 118
  Handling Missing Data 119
    Trade-Offs in Missing Data Conventions 120
    Missing Data in Pandas 120
    Operating on Null Values 124
  Hierarchical Indexing 128
    A Multiply Indexed Series 128
    Methods of MultiIndex Creation 131
    Indexing and Slicing a MultiIndex 134