Members have access to thousands of books, training videos, Learning Paths, interactive tutorials, and curated playlists from over 250 publishers, including O’Reilly Media, Harvard Business Review, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Adobe, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, and Course Technology, among others.

For more information, please visit http://oreilly.com/safari.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

    O’Reilly Media, Inc.
    1005 Gravenstein Highway North
    Sebastopol, CA 95472
    800-998-9938 (in the United States or Canada)
    707-829-0515 (international or local)
    707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/python-data-sci-handbook.

To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
CHAPTER 1

IPython: Beyond Normal Python

There are many options for development environments for Python, and I’m often asked which one I use in my own work. My answer sometimes surprises people: my preferred environment is IPython plus a text editor (in my case, Emacs or Atom depending on my mood). IPython (short for Interactive Python) was started in 2001 by Fernando Perez as an enhanced Python interpreter, and has since grown into a project aiming to provide, in Perez’s words, “Tools for the entire lifecycle of research computing.” If Python is the engine of our data science task, you might think of IPython as the interactive control panel.

As well as being a useful interactive interface to Python, IPython also provides a number of useful syntactic additions to the language; we’ll cover the most useful of these additions here. In addition, IPython is closely tied with the Jupyter project, which provides a browser-based notebook that is useful for development, collaboration, sharing, and even publication of data science results. The IPython notebook is actually a special case of the broader Jupyter notebook structure, which encompasses notebooks for Julia, R, and other programming languages. As an example of the usefulness of the notebook format, look no further than the page you are reading: the entire manuscript for this book was composed as a set of IPython notebooks.

IPython is about using Python effectively for interactive scientific and data-intensive computing. This chapter will start by stepping through some of the IPython features that are useful to the practice of data science, focusing especially on the syntax it offers beyond the standard features of Python. Next, we will go into a bit more depth on some of the more useful “magic commands” that can speed up common tasks in creating and using data science code. Finally, we will touch on some of the features of the notebook that make it useful in understanding data and sharing results.
Shell or Notebook?

There are two primary means of using IPython that we’ll discuss in this chapter: the IPython shell and the IPython notebook. The bulk of the material in this chapter is relevant to both, and the examples will switch between them depending on what is most convenient. In the few sections that are relevant to just one or the other, I will explicitly state that fact. Before we start, some words on how to launch the IPython shell and IPython notebook.

Launching the IPython Shell

This chapter, like most of this book, is not designed to be absorbed passively. I recommend that as you read through it, you follow along and experiment with the tools and syntax we cover: the muscle-memory you build through doing this will be far more useful than the simple act of reading about it. Start by launching the IPython interpreter by typing ipython on the command line; alternatively, if you’ve installed a distribution like Anaconda or EPD, there may be a launcher specific to your system (we’ll discuss this more fully in “Help and Documentation in IPython”). Once you do this, you should see a prompt like the following:

    IPython 4.0.1 -- An enhanced Interactive Python.
    ?         -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help      -> Python's own help system.
    object?   -> Details about 'object', use 'object??' for extra details.

    In [1]:

With that, you’re ready to follow along.

Launching the Jupyter Notebook

The Jupyter notebook is a browser-based graphical interface to the IPython shell, and builds on it a rich set of dynamic display capabilities. As well as executing Python/IPython statements, the notebook allows the user to include formatted text, static and dynamic visualizations, mathematical equations, JavaScript widgets, and much more. Furthermore, these documents can be saved in a way that lets other people open them and execute the code on their own systems.
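Both launch commands in this section assume that the corresponding packages are installed in the Python environment you are using. As a rough sketch (not part of the original text), the check below uses only the standard library; the helper name installed_version is hypothetical, and the distribution names "ipython" and "notebook" are the usual package names but may differ depending on how your distribution bundles them.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper for illustration; it only queries package metadata
    and does not import or launch anything.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumed distribution names; adjust them if your environment differs.
for name in ("ipython", "notebook"):
    version = installed_version(name)
    status = version if version is not None else "not installed"
    print(f"{name}: {status}")
```

If either package is reported as not installed, the ipython or jupyter notebook commands will most likely fail until it has been installed through your package manager or distribution.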
Though the IPython notebook is viewed and edited through your web browser window, it must connect to a running Python process in order to execute code. To start this process (known as a “kernel”), run the following command in your system shell:

    $ jupyter notebook

This command will launch a local web server that will be visible to your browser. It immediately spits out a log showing what it is doing; that log will look something like this:
    $ jupyter notebook
    [NotebookApp] Serving notebooks from local directory: /Users/jakevdp/...
    [NotebookApp] 0 active kernels
    [NotebookApp] The IPython Notebook is running at: http://localhost:8888/
    [NotebookApp] Use Control-C to stop this server and shut down all kernels...

Upon issuing the command, your default browser should automatically open and navigate to the listed local URL; the exact address will depend on your system. If the browser does not open automatically, you can open a window and manually open this address (http://localhost:8888/ in this example).

Help and Documentation in IPython

If you read no other section in this chapter, read this one: I find the tools discussed here to be the most transformative contributions of IPython to my daily workflow.

When a technologically minded person is asked to help a friend, family member, or colleague with a computer problem, most of the time it’s less a matter of knowing the answer than of knowing how to quickly find an unknown answer. In data science it’s the same: searchable web resources such as online documentation, mailing-list threads, and Stack Overflow answers contain a wealth of information, even (especially?) if it is a topic you’ve found yourself searching before. Being an effective practitioner of data science is less about memorizing the tool or command you should use for every possible situation, and more about learning to effectively find the information you don’t know, whether through a web search engine or another means.

One of the most useful functions of IPython/Jupyter is to shorten the gap between the user and the type of documentation and search that will help them do their work effectively. While web searches still play a role in answering complicated questions, an amazing amount of information can be found through IPython alone. Some examples of the questions IPython can help answer in a few keystrokes:

• How do I call this function? What arguments and options does it have?
• What does the source code of this Python object look like?
• What is in this package I imported? What attributes or methods does this object have?

Here we’ll discuss IPython’s tools to quickly access this information, namely the ? character to explore documentation, the ?? characters to explore source code, and the Tab key for autocompletion.

Accessing Documentation with ?

The Python language and its data science ecosystem are built with the user in mind, and one big part of that is access to documentation. Every Python object contains the
reference to a string, known as a docstring, which in most cases will contain a concise summary of the object and how to use it. Python has a built-in help() function that can access this information and print the results. For example, to see the documentation of the built-in len function, you can do the following:

    In [1]: help(len)
    Help on built-in function len in module builtins:

    len(...)
        len(object) -> integer

        Return the number of items of a sequence or mapping.

Depending on your interpreter, this information may be displayed as inline text, or in some separate pop-up window.

Because finding help on an object is so common and useful, IPython introduces the ? character as a shorthand for accessing this documentation and other relevant information:

    In [2]: len?
    Type:        builtin_function_or_method
    String form: <built-in function len>
    Namespace:   Python builtin
    Docstring:
    len(object) -> integer

    Return the number of items of a sequence or mapping.

This notation works for just about anything, including object methods:

    In [3]: L = [1, 2, 3]
    In [4]: L.insert?
    Type:        builtin_function_or_method
    String form: <built-in method insert of list object at 0x1024b8ea8>
    Docstring:   L.insert(index, object) -- insert object before index

or even objects themselves, with the documentation from their type:

    In [5]: L?
    Type:        list
    String form: [1, 2, 3]
    Length:      3
    Docstring:
    list() -> new empty list
    list(iterable) -> new list initialized from iterable's items

Importantly, this will even work for functions or other objects you create yourself! Here we’ll define a small function with a docstring:

    In [6]: def square(a):
      ....:     """Return the square of a."""