Rearranging Multi-Indices 137
Data Aggregations on Multi-Indices 140
Combining Datasets: Concat and Append 141
Recall: Concatenation of NumPy Arrays 142
Simple Concatenation with pd.concat 142
Combining Datasets: Merge and Join 146
Relational Algebra 146
Categories of Joins 147
Specification of the Merge Key 149
Specifying Set Arithmetic for Joins 152
Overlapping Column Names: The suffixes Keyword 153
Example: US States Data 154
Aggregation and Grouping 158
Planets Data 159
Simple Aggregation in Pandas 159
GroupBy: Split, Apply, Combine 161
Pivot Tables 170
Motivating Pivot Tables 170
Pivot Tables by Hand 171
Pivot Table Syntax 171
Example: Birthrate Data 174
Vectorized String Operations 178
Introducing Pandas String Operations 178
Tables of Pandas String Methods 180
Example: Recipe Database 184
Working with Time Series 188
Dates and Times in Python 188
Pandas Time Series: Indexing by Time 192
Pandas Time Series Data Structures 192
Frequencies and Offsets 195
Resampling, Shifting, and Windowing 196
Where to Learn More 202
Example: Visualizing Seattle Bicycle Counts 202
High-Performance Pandas: eval() and query() 208
Motivating query() and eval(): Compound Expressions 209
pandas.eval() for Efficient Operations 210
DataFrame.eval() for Column-Wise Operations 211
DataFrame.query() Method 213
Performance: When to Use These Functions 214
Further Resources 215
vi | Table of Contents
4. Visualization with Matplotlib 217
General Matplotlib Tips 218
Importing matplotlib 218
Setting Styles 218
show() or No show()? How to Display Your Plots 218
Saving Figures to File 221
Two Interfaces for the Price of One 222
Simple Line Plots 224
Adjusting the Plot: Line Colors and Styles 226
Adjusting the Plot: Axes Limits 228
Labeling Plots 230
Simple Scatter Plots 233
Scatter Plots with plt.plot 233
Scatter Plots with plt.scatter 235
plot Versus scatter: A Note on Efficiency 237
Visualizing Errors 237
Basic Errorbars 238
Continuous Errors 239
Density and Contour Plots 241
Visualizing a Three-Dimensional Function 241
Histograms, Binnings, and Density 245
Two-Dimensional Histograms and Binnings 247
Customizing Plot Legends 249
Choosing Elements for the Legend 251
Legend for Size of Points 252
Multiple Legends 254
Customizing Colorbars 255
Customizing Colorbars 256
Example: Handwritten Digits 261
Multiple Subplots 262
plt.axes: Subplots by Hand 263
plt.subplot: Simple Grids of Subplots 264
plt.subplots: The Whole Grid in One Go 265
plt.GridSpec: More Complicated Arrangements 266
Text and Annotation 268
Example: Effect of Holidays on US Births 269
Transforms and Text Position 270
Arrows and Annotation 272
Customizing Ticks 275
Major and Minor Ticks 276
Hiding Ticks or Labels 277
Reducing or Increasing the Number of Ticks 278
Fancy Tick Formats 279
Summary of Formatters and Locators 281
Customizing Matplotlib: Configurations and Stylesheets 282
Plot Customization by Hand 282
Changing the Defaults: rcParams 284
Stylesheets 285
Three-Dimensional Plotting in Matplotlib 290
Three-Dimensional Points and Lines 291
Three-Dimensional Contour Plots 292
Wireframes and Surface Plots 293
Surface Triangulations 295
Geographic Data with Basemap 298
Map Projections 300
Drawing a Map Background 304
Plotting Data on Maps 307
Example: California Cities 308
Example: Surface Temperature Data 309
Visualization with Seaborn 311
Seaborn Versus Matplotlib 312
Exploring Seaborn Plots 313
Example: Exploring Marathon Finishing Times 322
Further Resources 329
Matplotlib Resources 329
Other Python Graphics Libraries 330
5. Machine Learning 331
What Is Machine Learning? 332
Categories of Machine Learning 332
Qualitative Examples of Machine Learning Applications 333
Summary 342
Introducing Scikit-Learn 343
Data Representation in Scikit-Learn 343
Scikit-Learn’s Estimator API 346
Application: Exploring Handwritten Digits 354
Summary 359
Hyperparameters and Model Validation 359
Thinking About Model Validation 359
Selecting the Best Model 363
Learning Curves 370
Validation in Practice: Grid Search 373
Summary 375
Feature Engineering 375
Categorical Features 376
Text Features 377
Image Features 378
Derived Features 378
Imputation of Missing Data 381
Feature Pipelines 381
In Depth: Naive Bayes Classification 382
Bayesian Classification 383
Gaussian Naive Bayes 383
Multinomial Naive Bayes 386
When to Use Naive Bayes 389
In Depth: Linear Regression 390
Simple Linear Regression 390
Basis Function Regression 392
Regularization 396
Example: Predicting Bicycle Traffic 400
In-Depth: Support Vector Machines 405
Motivating Support Vector Machines 405
Support Vector Machines: Maximizing the Margin 407
Example: Face Recognition 416
Support Vector Machine Summary 420
In-Depth: Decision Trees and Random Forests 421
Motivating Random Forests: Decision Trees 421
Ensembles of Estimators: Random Forests 426
Random Forest Regression 428
Example: Random Forest for Classifying Digits 430
Summary of Random Forests 432
In Depth: Principal Component Analysis 433
Introducing Principal Component Analysis 433
PCA as Noise Filtering 440
Example: Eigenfaces 442
Principal Component Analysis Summary 445
In-Depth: Manifold Learning 445
Manifold Learning: “HELLO” 446
Multidimensional Scaling (MDS) 447
MDS as Manifold Learning 450
Nonlinear Embeddings: Where MDS Fails 452
Nonlinear Manifolds: Locally Linear Embedding 453
Some Thoughts on Manifold Methods 455
Example: Isomap on Faces 456
Example: Visualizing Structure in Digits 460
In Depth: k-Means Clustering 462
Introducing k-Means 463
k-Means Algorithm: Expectation–Maximization 465
Examples 470
In Depth: Gaussian Mixture Models 476
Motivating GMM: Weaknesses of k-Means 477
Generalizing E–M: Gaussian Mixture Models 480
GMM as Density Estimation 484
Example: GMM for Generating New Data 488
In-Depth: Kernel Density Estimation 491
Motivating KDE: Histograms 491
Kernel Density Estimation in Practice 496
Example: KDE on a Sphere 498
Example: Not-So-Naive Bayes 501
Application: A Face Detection Pipeline 506
HOG Features 506
HOG in Action: A Simple Face Detector 507
Caveats and Improvements 512
Further Machine Learning Resources 514
Machine Learning in Python 514
General Machine Learning 515
Index 517