Content for Decem's Python course.
1{
2 "cells": [
3 {
4 "cell_type": "markdown",
5 "metadata": {
6 "slideshow": {
7 "slide_type": "slide"
8 }
9 },
10 "source": [
11 "## 11. The pandas library"
12 ]
13 },
14 {
15 "cell_type": "markdown",
16 "metadata": {
17 "slideshow": {
18 "slide_type": "slide"
19 }
20 },
21 "source": [
22 "**pandas** is an *open source* library that provides powerful, easy-to-use data structures and data analysis tools for Python."
23 ]
24 },
25 {
26 "cell_type": "markdown",
27 "metadata": {
28 "slideshow": {
29 "slide_type": "slide"
30 }
31 },
32 "source": [
33 "It can be installed in our virtual environment with the following command:\n",
34 "\n",
35 "```\n",
36 "pipenv install pandas\n",
37 "```"
38 ]
39 },
40 {
41 "cell_type": "code",
42 "execution_count": 1,
43 "metadata": {
44 "slideshow": {
45 "slide_type": "fragment"
46 }
47 },
48 "outputs": [],
49 "source": [
50 "import pandas as pd\n",
51 "import numpy as np"
52 ]
53 },
54 {
55 "cell_type": "markdown",
56 "metadata": {
57 "slideshow": {
58 "slide_type": "fragment"
59 }
60 },
61 "source": [
62 "The alias `pd` is the de facto standard when using **pandas**."
63 ]
64 },
65 {
66 "cell_type": "markdown",
67 "metadata": {
68 "slideshow": {
69 "slide_type": "slide"
70 }
71 },
72 "source": [
73 "### Series\n",
74 "\n",
75 "A `Series` represents a one-dimensional sequence of data with labels (its index). One way to create it is to pass a list of values to `pd.Series`."
76 ]
77 },
78 {
79 "cell_type": "code",
80 "execution_count": 2,
81 "metadata": {
82 "slideshow": {
83 "slide_type": "fragment"
84 }
85 },
86 "outputs": [
87 {
88 "data": {
89 "text/plain": [
90 "0 1.0\n",
91 "1 3.0\n",
92 "2 5.0\n",
93 "3 NaN\n",
94 "4 6.0\n",
95 "5 8.0\n",
96 "dtype: float64"
97 ]
98 },
99 "execution_count": 2,
100 "metadata": {},
101 "output_type": "execute_result"
102 }
103 ],
104 "source": [
105 "s = pd.Series([1,3,5,np.nan,6,8])\n",
106 "s"
107 ]
108 },
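A minimal sketch, with made-up values, of two details the example above hints at: a missing value (`np.nan`) turns an integer Series into `float64`, which is why the output shows `dtype: float64`, and an explicit `index` replaces the default 0 to n-1 labels.

```python
import numpy as np
import pandas as pd

# Integers only: pandas keeps an integer dtype
ints = pd.Series([1, 3, 5])

# Adding np.nan forces the values to float64, since NaN is a float
with_nan = pd.Series([1, 3, 5, np.nan])

# An explicit index replaces the default 0..n-1 labels
labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(ints.dtype, with_nan.dtype)
print(labeled["b"])  # 20
```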
109 {
110 "cell_type": "markdown",
111 "metadata": {
112 "slideshow": {
113 "slide_type": "slide"
114 }
115 },
116 "source": [
117 "### DataFrame\n",
118 "\n",
119 "A `DataFrame` object represents a two-dimensional tabular structure with labeled rows and columns that holds potentially heterogeneous data (each column can have its own type).\n",
120 "\n",
121 "It can be created from a dictionary or from a NumPy `array`, among other sources."
122 ]
123 },
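A minimal sketch of the two construction routes mentioned above, using hypothetical column names and values: with a dictionary each key becomes a column label, while with a 2-D NumPy array the column labels are passed separately through `columns`.

```python
import numpy as np
import pandas as pd

# From a dictionary: each key becomes a column label
from_dict = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# From a 2-D NumPy array: the column labels are passed separately
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["p", "q"])

print(from_dict.dtypes)  # x holds integers, y holds strings
print(from_array)
```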
124 {
125 "cell_type": "code",
126 "execution_count": 7,
127 "metadata": {
128 "slideshow": {
129 "slide_type": "slide"
130 }
131 },
132 "outputs": [],
133 "source": [
134 "df = pd.DataFrame({\n",
135 " 'A' : [1., 2., np.nan, None],\n",
136 " 'B' : pd.Timestamp('20130102'),\n",
137 " 'C' : pd.Series(1,index=list(range(4)),dtype='float32'),\n",
138 " 'D' : np.array([3] * 4,dtype='int32'),\n",
139 " 'E' : pd.Categorical([\"test\",\"train\",\"test\",\"train\"]),\n",
140 " 'F' : 'foo'\n",
141 "})"
142 ]
143 },
144 {
145 "cell_type": "code",
146 "execution_count": 4,
147 "metadata": {
148 "slideshow": {
149 "slide_type": "slide"
150 }
151 },
152 "outputs": [
153 {
154 "data": {
155 "text/html": [
156 "<div>\n",
157 "<style scoped>\n",
158 " .dataframe tbody tr th:only-of-type {\n",
159 " vertical-align: middle;\n",
160 " }\n",
161 "\n",
162 " .dataframe tbody tr th {\n",
163 " vertical-align: top;\n",
164 " }\n",
165 "\n",
166 " .dataframe thead th {\n",
167 " text-align: right;\n",
168 " }\n",
169 "</style>\n",
170 "<table border=\"1\" class=\"dataframe\">\n",
171 " <thead>\n",
172 " <tr style=\"text-align: right;\">\n",
173 " <th></th>\n",
174 " <th>A</th>\n",
175 " <th>B</th>\n",
176 " <th>C</th>\n",
177 " <th>D</th>\n",
178 " <th>E</th>\n",
179 " <th>F</th>\n",
180 " </tr>\n",
181 " </thead>\n",
182 " <tbody>\n",
183 " <tr>\n",
184 " <th>0</th>\n",
185 " <td>1.0</td>\n",
186 " <td>2013-01-02</td>\n",
187 " <td>1.0</td>\n",
188 " <td>3</td>\n",
189 " <td>test</td>\n",
190 " <td>foo</td>\n",
191 " </tr>\n",
192 " <tr>\n",
193 " <th>1</th>\n",
194 " <td>2.0</td>\n",
195 " <td>2013-01-02</td>\n",
196 " <td>1.0</td>\n",
197 " <td>3</td>\n",
198 " <td>train</td>\n",
199 " <td>foo</td>\n",
200 " </tr>\n",
201 " <tr>\n",
202 " <th>2</th>\n",
203 " <td>NaN</td>\n",
204 " <td>2013-01-02</td>\n",
205 " <td>1.0</td>\n",
206 " <td>3</td>\n",
207 " <td>test</td>\n",
208 " <td>foo</td>\n",
209 " </tr>\n",
210 " <tr>\n",
211 " <th>3</th>\n",
212 " <td>NaN</td>\n",
213 " <td>2013-01-02</td>\n",
214 " <td>1.0</td>\n",
215 " <td>3</td>\n",
216 " <td>train</td>\n",
217 " <td>foo</td>\n",
218 " </tr>\n",
219 " </tbody>\n",
220 "</table>\n",
221 "</div>"
222 ],
223 "text/plain": [
224 " A B C D E F\n",
225 "0 1.0 2013-01-02 1.0 3 test foo\n",
226 "1 2.0 2013-01-02 1.0 3 train foo\n",
227 "2 NaN 2013-01-02 1.0 3 test foo\n",
228 "3 NaN 2013-01-02 1.0 3 train foo"
229 ]
230 },
231 "execution_count": 4,
232 "metadata": {},
233 "output_type": "execute_result"
234 }
235 ],
236 "source": [
237 "df"
238 ]
239 },
240 {
241 "cell_type": "code",
242 "execution_count": 8,
243 "metadata": {
244 "slideshow": {
245 "slide_type": "slide"
246 }
247 },
248 "outputs": [
249 {
250 "data": {
251 "text/plain": [
252 "A float64\n",
253 "B datetime64[ns]\n",
254 "C float32\n",
255 "D int32\n",
256 "E category\n",
257 "F object\n",
258 "dtype: object"
259 ]
260 },
261 "execution_count": 8,
262 "metadata": {},
263 "output_type": "execute_result"
264 }
265 ],
266 "source": [
267 "df.dtypes"
268 ]
269 },
270 {
271 "cell_type": "markdown",
272 "metadata": {
273 "slideshow": {
274 "slide_type": "slide"
275 }
276 },
277 "source": [
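278 "We can pass a sequence of labels to use as the row index. Here `pd.date_range` builds a `DatetimeIndex` of six consecutive days, which is then used as the index of a new DataFrame."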
279 ]
280 },
281 {
282 "cell_type": "code",
283 "execution_count": 9,
284 "metadata": {
285 "slideshow": {
286 "slide_type": "slide"
287 }
288 },
289 "outputs": [
290 {
291 "data": {
292 "text/plain": [
293 "DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',\n",
294 " '2013-01-05', '2013-01-06'],\n",
295 " dtype='datetime64[ns]', freq='D')"
296 ]
297 },
298 "execution_count": 9,
299 "metadata": {},
300 "output_type": "execute_result"
301 }
302 ],
303 "source": [
304 "dates = pd.date_range('20130101', periods=6)\n",
305 "dates"
306 ]
307 },
308 {
309 "cell_type": "code",
310 "execution_count": 12,
311 "metadata": {
312 "slideshow": {
313 "slide_type": "slide"
314 }
315 },
316 "outputs": [
317 {
318 "data": {
319 "text/html": [
320 "<div>\n",
321 "<style scoped>\n",
322 " .dataframe tbody tr th:only-of-type {\n",
323 " vertical-align: middle;\n",
324 " }\n",
325 "\n",
326 " .dataframe tbody tr th {\n",
327 " vertical-align: top;\n",
328 " }\n",
329 "\n",
330 " .dataframe thead th {\n",
331 " text-align: right;\n",
332 " }\n",
333 "</style>\n",
334 "<table border=\"1\" class=\"dataframe\">\n",
335 " <thead>\n",
336 " <tr style=\"text-align: right;\">\n",
337 " <th></th>\n",
338 " <th>A</th>\n",
339 " <th>B</th>\n",
340 " <th>C</th>\n",
341 " <th>D</th>\n",
342 " </tr>\n",
343 " </thead>\n",
344 " <tbody>\n",
345 " <tr>\n",
346 " <th>2013-01-01</th>\n",
347 " <td>-0.679399</td>\n",
348 " <td>-0.564244</td>\n",
349 " <td>-0.395166</td>\n",
350 " <td>-0.004622</td>\n",
351 " </tr>\n",
352 " <tr>\n",
353 " <th>2013-01-02</th>\n",
354 " <td>2.147829</td>\n",
355 " <td>-0.991826</td>\n",
356 " <td>-1.004833</td>\n",
357 " <td>0.168517</td>\n",
358 " </tr>\n",
359 " <tr>\n",
360 " <th>2013-01-03</th>\n",
361 " <td>0.398068</td>\n",
362 " <td>-0.536610</td>\n",
363 " <td>-0.773990</td>\n",
364 " <td>-1.075894</td>\n",
365 " </tr>\n",
366 " <tr>\n",
367 " <th>2013-01-04</th>\n",
368 " <td>-1.185011</td>\n",
369 " <td>1.988697</td>\n",
370 " <td>-0.770427</td>\n",
371 " <td>-0.472499</td>\n",
372 " </tr>\n",
373 " <tr>\n",
374 " <th>2013-01-05</th>\n",
375 " <td>-0.359634</td>\n",
376 " <td>0.338176</td>\n",
377 " <td>0.105786</td>\n",
378 " <td>0.359107</td>\n",
379 " </tr>\n",
380 " <tr>\n",
381 " <th>2013-01-06</th>\n",
382 " <td>-0.555880</td>\n",
383 " <td>1.115044</td>\n",
384 " <td>-2.108126</td>\n",
385 " <td>0.139896</td>\n",
386 " </tr>\n",
387 " </tbody>\n",
388 "</table>\n",
389 "</div>"
390 ],
391 "text/plain": [
392 " A B C D\n",
393 "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n",
394 "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n",
395 "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894\n",
396 "2013-01-04 -1.185011 1.988697 -0.770427 -0.472499\n",
397 "2013-01-05 -0.359634 0.338176 0.105786 0.359107\n",
398 "2013-01-06 -0.555880 1.115044 -2.108126 0.139896"
399 ]
400 },
401 "execution_count": 12,
402 "metadata": {},
403 "output_type": "execute_result"
404 }
405 ],
406 "source": [
407 "df2 = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))\n",
408 "df2"
409 ]
410 },
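A small sketch on toy data showing why a labeled index is useful. Once a `DatetimeIndex` is attached, rows can be looked up by their timestamp with `.loc`. The names `days` and `daily` are illustrative only, chosen so they don't overwrite `dates` or `df2`.

```python
import pandas as pd

# Three consecutive days, as in the pd.date_range cell above
days = pd.date_range("2013-01-01", periods=3)
daily = pd.DataFrame({"value": [10, 20, 30]}, index=days)

# Look up a row by its label (a timestamp) rather than by its position
print(daily.loc[pd.Timestamp("2013-01-02"), "value"])  # 20
```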
411 {
412 "cell_type": "markdown",
413 "metadata": {
414 "slideshow": {
415 "slide_type": "slide"
416 }
417 },
418 "source": [
419 "### Axes\n",
420 "\n",
421 "Operations on a `pandas` DataFrame can be performed along either of its two axes, selected with the `axis` parameter.\n",
422 "\n",
423 "- `axis=0` (or `axis='index'`) refers to the rows: a reduction such as `sum` runs down the rows and returns one result per column.\n",
424 "- `axis=1` (or `axis='columns'`) refers to the columns: a reduction runs across the columns and returns one result per row."
425 ]
426 },
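Here is a minimal sketch that makes the two axes concrete. It uses a hypothetical 3x2 DataFrame and `sum`, which is one of many methods that accept `axis`.

```python
import pandas as pd

toy = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# axis=0: reduce down the rows, one result per column
col_totals = toy.sum(axis=0)  # A: 6, B: 60

# axis=1: reduce across the columns, one result per row
row_totals = toy.sum(axis=1)  # 11, 22, 33
```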
438 {
439 "cell_type": "markdown",
440 "metadata": {
441 "slideshow": {
442 "slide_type": "slide"
443 }
444 },
445 "source": [
446 "### Visualización de datos"
447 ]
448 },
449 {
450 "cell_type": "code",
451 "execution_count": 13,
452 "metadata": {
453 "slideshow": {
454 "slide_type": "slide"
455 }
456 },
457 "outputs": [
458 {
459 "data": {
460 "text/html": [
461 "<div>\n",
462 "<style scoped>\n",
463 " .dataframe tbody tr th:only-of-type {\n",
464 " vertical-align: middle;\n",
465 " }\n",
466 "\n",
467 " .dataframe tbody tr th {\n",
468 " vertical-align: top;\n",
469 " }\n",
470 "\n",
471 " .dataframe thead th {\n",
472 " text-align: right;\n",
473 " }\n",
474 "</style>\n",
475 "<table border=\"1\" class=\"dataframe\">\n",
476 " <thead>\n",
477 " <tr style=\"text-align: right;\">\n",
478 " <th></th>\n",
479 " <th>A</th>\n",
480 " <th>B</th>\n",
481 " <th>C</th>\n",
482 " <th>D</th>\n",
483 " </tr>\n",
484 " </thead>\n",
485 " <tbody>\n",
486 " <tr>\n",
487 " <th>2013-01-01</th>\n",
488 " <td>-0.679399</td>\n",
489 " <td>-0.564244</td>\n",
490 " <td>-0.395166</td>\n",
491 " <td>-0.004622</td>\n",
492 " </tr>\n",
493 " <tr>\n",
494 " <th>2013-01-02</th>\n",
495 " <td>2.147829</td>\n",
496 " <td>-0.991826</td>\n",
497 " <td>-1.004833</td>\n",
498 " <td>0.168517</td>\n",
499 " </tr>\n",
500 " </tbody>\n",
501 "</table>\n",
502 "</div>"
503 ],
504 "text/plain": [
505 " A B C D\n",
506 "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n",
507 "2013-01-02 2.147829 -0.991826 -1.004833 0.168517"
508 ]
509 },
510 "execution_count": 13,
511 "metadata": {},
512 "output_type": "execute_result"
513 }
514 ],
515 "source": [
516 "df2.head(2)"
517 ]
518 },
519 {
520 "cell_type": "code",
521 "execution_count": 14,
522 "metadata": {
523 "slideshow": {
524 "slide_type": "slide"
525 }
526 },
527 "outputs": [
528 {
529 "data": {
530 "text/html": [
531 "<div>\n",
532 "<style scoped>\n",
533 " .dataframe tbody tr th:only-of-type {\n",
534 " vertical-align: middle;\n",
535 " }\n",
536 "\n",
537 " .dataframe tbody tr th {\n",
538 " vertical-align: top;\n",
539 " }\n",
540 "\n",
541 " .dataframe thead th {\n",
542 " text-align: right;\n",
543 " }\n",
544 "</style>\n",
545 "<table border=\"1\" class=\"dataframe\">\n",
546 " <thead>\n",
547 " <tr style=\"text-align: right;\">\n",
548 " <th></th>\n",
549 " <th>A</th>\n",
550 " <th>B</th>\n",
551 " <th>C</th>\n",
552 " <th>D</th>\n",
553 " </tr>\n",
554 " </thead>\n",
555 " <tbody>\n",
556 " <tr>\n",
557 " <th>2013-01-05</th>\n",
558 " <td>-0.359634</td>\n",
559 " <td>0.338176</td>\n",
560 " <td>0.105786</td>\n",
561 " <td>0.359107</td>\n",
562 " </tr>\n",
563 " <tr>\n",
564 " <th>2013-01-06</th>\n",
565 " <td>-0.555880</td>\n",
566 " <td>1.115044</td>\n",
567 " <td>-2.108126</td>\n",
568 " <td>0.139896</td>\n",
569 " </tr>\n",
570 " </tbody>\n",
571 "</table>\n",
572 "</div>"
573 ],
574 "text/plain": [
575 " A B C D\n",
576 "2013-01-05 -0.359634 0.338176 0.105786 0.359107\n",
577 "2013-01-06 -0.555880 1.115044 -2.108126 0.139896"
578 ]
579 },
580 "execution_count": 14,
581 "metadata": {},
582 "output_type": "execute_result"
583 }
584 ],
585 "source": [
586 "df2.tail(2)"
587 ]
588 },
589 {
590 "cell_type": "code",
591 "execution_count": 15,
592 "metadata": {
593 "slideshow": {
594 "slide_type": "slide"
595 }
596 },
597 "outputs": [
598 {
599 "data": {
600 "text/plain": [
601 "DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',\n",
602 " '2013-01-05', '2013-01-06'],\n",
603 " dtype='datetime64[ns]', freq='D')"
604 ]
605 },
606 "execution_count": 15,
607 "metadata": {},
608 "output_type": "execute_result"
609 }
610 ],
611 "source": [
612 "df2.index"
613 ]
614 },
615 {
616 "cell_type": "code",
617 "execution_count": 16,
618 "metadata": {
619 "slideshow": {
620 "slide_type": "slide"
621 }
622 },
623 "outputs": [
624 {
625 "data": {
626 "text/plain": [
627 "Index(['A', 'B', 'C', 'D'], dtype='object')"
628 ]
629 },
630 "execution_count": 16,
631 "metadata": {},
632 "output_type": "execute_result"
633 }
634 ],
635 "source": [
636 "df2.columns"
637 ]
638 },
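The inspection helpers used above can be tried on a small deterministic DataFrame. This is made-up data, so unlike the random `df2`, the results are reproducible.

```python
import pandas as pd

toy = pd.DataFrame({"A": range(5), "B": list("vwxyz")})

first = toy.head(2)  # first two rows (head() defaults to 5)
last = toy.tail(2)   # last two rows
print(toy.index)     # RangeIndex(start=0, stop=5, step=1)
print(toy.columns)   # the column labels 'A' and 'B'
```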
639 {
640 "cell_type": "markdown",
641 "metadata": {
642 "slideshow": {
643 "slide_type": "slide"
644 }
645 },
646 "source": [
647 "#### DataFrame.to_numpy()\n",
648 "\n",
649 "El método `.to_numpy()` de un DataFrame nos da una representación en una estructura de datos de `numpy` de los datos del DataFrame,"
650 ]
651 },
652 {
653 "cell_type": "code",
654 "execution_count": 18,
655 "metadata": {
656 "slideshow": {
657 "slide_type": "slide"
658 }
659 },
660 "outputs": [
661 {
662 "data": {
663 "text/plain": [
664 "array([[-0.67939947, -0.5642441 , -0.39516608, -0.00462202],\n",
665 " [ 2.14782856, -0.99182561, -1.00483345, 0.16851747],\n",
666 " [ 0.39806756, -0.53661026, -0.77399033, -1.07589368],\n",
667 " [-1.18501088, 1.98869725, -0.77042661, -0.47249893],\n",
668 " [-0.35963418, 0.3381756 , 0.10578614, 0.35910665],\n",
669 " [-0.55588001, 1.11504445, -2.10812582, 0.13989579]])"
670 ]
671 },
672 "execution_count": 18,
673 "metadata": {},
674 "output_type": "execute_result"
675 }
676 ],
677 "source": [
678 "df2.to_numpy()\n"
679 ]
680 },
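A minimal sketch, with made-up data, of how the column types affect the result. Homogeneous float columns give a `float64` array. Mixing numbers and strings forces the common `object` dtype, since a NumPy array has a single dtype.

```python
import numpy as np
import pandas as pd

# Only float columns: the result is a float64 array
numeric = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
arr = numeric.to_numpy()

# Floats mixed with strings: the only common NumPy dtype is object
mixed = pd.DataFrame({"A": [1.0, 2.0], "B": ["x", "y"]})
mixed_arr = mixed.to_numpy()

print(arr.dtype, mixed_arr.dtype)  # float64 object
```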
681 {
682 "cell_type": "markdown",
683 "metadata": {
684 "slideshow": {
685 "slide_type": "slide"
686 }
687 },
688 "source": [
689 "#### Describe\n",
690 "\n",
691 "El método `.describe()` nos muestra un resumen estadístico de los datos."
692 ]
693 },
694 {
695 "cell_type": "code",
696 "execution_count": 19,
697 "metadata": {
698 "slideshow": {
699 "slide_type": "slide"
700 }
701 },
702 "outputs": [
703 {
704 "data": {
705 "text/html": [
706 "<div>\n",
707 "<style scoped>\n",
708 " .dataframe tbody tr th:only-of-type {\n",
709 " vertical-align: middle;\n",
710 " }\n",
711 "\n",
712 " .dataframe tbody tr th {\n",
713 " vertical-align: top;\n",
714 " }\n",
715 "\n",
716 " .dataframe thead th {\n",
717 " text-align: right;\n",
718 " }\n",
719 "</style>\n",
720 "<table border=\"1\" class=\"dataframe\">\n",
721 " <thead>\n",
722 " <tr style=\"text-align: right;\">\n",
723 " <th></th>\n",
724 " <th>A</th>\n",
725 " <th>B</th>\n",
726 " <th>C</th>\n",
727 " <th>D</th>\n",
728 " </tr>\n",
729 " </thead>\n",
730 " <tbody>\n",
731 " <tr>\n",
732 " <th>count</th>\n",
733 " <td>6.000000</td>\n",
734 " <td>6.000000</td>\n",
735 " <td>6.000000</td>\n",
736 " <td>6.000000</td>\n",
737 " </tr>\n",
738 " <tr>\n",
739 " <th>mean</th>\n",
740 " <td>-0.039005</td>\n",
741 " <td>0.224873</td>\n",
742 " <td>-0.824459</td>\n",
743 " <td>-0.147582</td>\n",
744 " </tr>\n",
745 " <tr>\n",
746 " <th>std</th>\n",
747 " <td>1.188837</td>\n",
748 " <td>1.148846</td>\n",
749 " <td>0.739655</td>\n",
750 " <td>0.534241</td>\n",
751 " </tr>\n",
752 " <tr>\n",
753 " <th>min</th>\n",
754 " <td>-1.185011</td>\n",
755 " <td>-0.991826</td>\n",
756 " <td>-2.108126</td>\n",
757 " <td>-1.075894</td>\n",
758 " </tr>\n",
759 " <tr>\n",
760 " <th>25%</th>\n",
761 " <td>-0.648520</td>\n",
762 " <td>-0.557336</td>\n",
763 " <td>-0.947123</td>\n",
764 " <td>-0.355530</td>\n",
765 " </tr>\n",
766 " <tr>\n",
767 " <th>50%</th>\n",
768 " <td>-0.457757</td>\n",
769 " <td>-0.099217</td>\n",
770 " <td>-0.772208</td>\n",
771 " <td>0.067637</td>\n",
772 " </tr>\n",
773 " <tr>\n",
774 " <th>75%</th>\n",
775 " <td>0.208642</td>\n",
776 " <td>0.920827</td>\n",
777 " <td>-0.488981</td>\n",
778 " <td>0.161362</td>\n",
779 " </tr>\n",
780 " <tr>\n",
781 " <th>max</th>\n",
782 " <td>2.147829</td>\n",
783 " <td>1.988697</td>\n",
784 " <td>0.105786</td>\n",
785 " <td>0.359107</td>\n",
786 " </tr>\n",
787 " </tbody>\n",
788 "</table>\n",
789 "</div>"
790 ],
791 "text/plain": [
792 " A B C D\n",
793 "count 6.000000 6.000000 6.000000 6.000000\n",
794 "mean -0.039005 0.224873 -0.824459 -0.147582\n",
795 "std 1.188837 1.148846 0.739655 0.534241\n",
796 "min -1.185011 -0.991826 -2.108126 -1.075894\n",
797 "25% -0.648520 -0.557336 -0.947123 -0.355530\n",
798 "50% -0.457757 -0.099217 -0.772208 0.067637\n",
799 "75% 0.208642 0.920827 -0.488981 0.161362\n",
800 "max 2.147829 1.988697 0.105786 0.359107"
801 ]
802 },
803 "execution_count": 19,
804 "metadata": {},
805 "output_type": "execute_result"
806 }
807 ],
808 "source": [
809 "df2.describe()"
810 ]
811 },
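A minimal sketch of what `describe()` returns by default, using hypothetical data with one numeric column and one text column. Note that `std` is the sample standard deviation (divisor n - 1).

```python
import pandas as pd

toy = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "label": list("abcd")})

# describe() summarizes only the numeric columns by default,
# so the text column 'label' is left out
summary = toy.describe()
print(summary)
```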
812 {
813 "cell_type": "markdown",
814 "metadata": {
815 "slideshow": {
816 "slide_type": "slide"
817 }
818 },
819 "source": [
820 "#### Transposición\n",
821 "\n",
822 "Podemos obtener el DataFrame transpuesto de uno dado a través del atributo `T`."
823 ]
824 },
825 {
826 "cell_type": "code",
827 "execution_count": 20,
828 "metadata": {
829 "slideshow": {
830 "slide_type": "slide"
831 }
832 },
833 "outputs": [
834 {
835 "data": {
836 "text/html": [
837 "<div>\n",
838 "<style scoped>\n",
839 " .dataframe tbody tr th:only-of-type {\n",
840 " vertical-align: middle;\n",
841 " }\n",
842 "\n",
843 " .dataframe tbody tr th {\n",
844 " vertical-align: top;\n",
845 " }\n",
846 "\n",
847 " .dataframe thead th {\n",
848 " text-align: right;\n",
849 " }\n",
850 "</style>\n",
851 "<table border=\"1\" class=\"dataframe\">\n",
852 " <thead>\n",
853 " <tr style=\"text-align: right;\">\n",
854 " <th></th>\n",
855 " <th>2013-01-01</th>\n",
856 " <th>2013-01-02</th>\n",
857 " <th>2013-01-03</th>\n",
858 " <th>2013-01-04</th>\n",
859 " <th>2013-01-05</th>\n",
860 " <th>2013-01-06</th>\n",
861 " </tr>\n",
862 " </thead>\n",
863 " <tbody>\n",
864 " <tr>\n",
865 " <th>A</th>\n",
866 " <td>-0.679399</td>\n",
867 " <td>2.147829</td>\n",
868 " <td>0.398068</td>\n",
869 " <td>-1.185011</td>\n",
870 " <td>-0.359634</td>\n",
871 " <td>-0.555880</td>\n",
872 " </tr>\n",
873 " <tr>\n",
874 " <th>B</th>\n",
875 " <td>-0.564244</td>\n",
876 " <td>-0.991826</td>\n",
877 " <td>-0.536610</td>\n",
878 " <td>1.988697</td>\n",
879 " <td>0.338176</td>\n",
880 " <td>1.115044</td>\n",
881 " </tr>\n",
882 " <tr>\n",
883 " <th>C</th>\n",
884 " <td>-0.395166</td>\n",
885 " <td>-1.004833</td>\n",
886 " <td>-0.773990</td>\n",
887 " <td>-0.770427</td>\n",
888 " <td>0.105786</td>\n",
889 " <td>-2.108126</td>\n",
890 " </tr>\n",
891 " <tr>\n",
892 " <th>D</th>\n",
893 " <td>-0.004622</td>\n",
894 " <td>0.168517</td>\n",
895 " <td>-1.075894</td>\n",
896 " <td>-0.472499</td>\n",
897 " <td>0.359107</td>\n",
898 " <td>0.139896</td>\n",
899 " </tr>\n",
900 " </tbody>\n",
901 "</table>\n",
902 "</div>"
903 ],
904 "text/plain": [
905 " 2013-01-01 2013-01-02 2013-01-03 2013-01-04 2013-01-05 2013-01-06\n",
906 "A -0.679399 2.147829 0.398068 -1.185011 -0.359634 -0.555880\n",
907 "B -0.564244 -0.991826 -0.536610 1.988697 0.338176 1.115044\n",
908 "C -0.395166 -1.004833 -0.773990 -0.770427 0.105786 -2.108126\n",
909 "D -0.004622 0.168517 -1.075894 -0.472499 0.359107 0.139896"
910 ]
911 },
912 "execution_count": 20,
913 "metadata": {},
914 "output_type": "execute_result"
915 }
916 ],
917 "source": [
918 "df2.T"
919 ]
920 },
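A minimal sketch on a small hypothetical DataFrame. `T` swaps the index and the columns, so transposing twice gives back the original frame.

```python
import pandas as pd

toy = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

flipped = toy.T  # the index labels become columns and vice versa
print(flipped.shape)  # (2, 3)
```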
921 {
922 "cell_type": "markdown",
923 "metadata": {
924 "slideshow": {
925 "slide_type": "slide"
926 }
927 },
928 "source": [
929 "#### Ordenación\n",
930 "\n",
931 "Podemos ordenar los datos por alguno de los ejes o por valores."
932 ]
933 },
934 {
935 "cell_type": "code",
936 "execution_count": 21,
937 "metadata": {
938 "slideshow": {
939 "slide_type": "slide"
940 }
941 },
942 "outputs": [
943 {
944 "data": {
945 "text/html": [
946 "<div>\n",
947 "<style scoped>\n",
948 " .dataframe tbody tr th:only-of-type {\n",
949 " vertical-align: middle;\n",
950 " }\n",
951 "\n",
952 " .dataframe tbody tr th {\n",
953 " vertical-align: top;\n",
954 " }\n",
955 "\n",
956 " .dataframe thead th {\n",
957 " text-align: right;\n",
958 " }\n",
959 "</style>\n",
960 "<table border=\"1\" class=\"dataframe\">\n",
961 " <thead>\n",
962 " <tr style=\"text-align: right;\">\n",
963 " <th></th>\n",
964 " <th>D</th>\n",
965 " <th>C</th>\n",
966 " <th>B</th>\n",
967 " <th>A</th>\n",
968 " </tr>\n",
969 " </thead>\n",
970 " <tbody>\n",
971 " <tr>\n",
972 " <th>2013-01-01</th>\n",
973 " <td>-0.004622</td>\n",
974 " <td>-0.395166</td>\n",
975 " <td>-0.564244</td>\n",
976 " <td>-0.679399</td>\n",
977 " </tr>\n",
978 " <tr>\n",
979 " <th>2013-01-02</th>\n",
980 " <td>0.168517</td>\n",
981 " <td>-1.004833</td>\n",
982 " <td>-0.991826</td>\n",
983 " <td>2.147829</td>\n",
984 " </tr>\n",
985 " <tr>\n",
986 " <th>2013-01-03</th>\n",
987 " <td>-1.075894</td>\n",
988 " <td>-0.773990</td>\n",
989 " <td>-0.536610</td>\n",
990 " <td>0.398068</td>\n",
991 " </tr>\n",
992 " <tr>\n",
993 " <th>2013-01-04</th>\n",
994 " <td>-0.472499</td>\n",
995 " <td>-0.770427</td>\n",
996 " <td>1.988697</td>\n",
997 " <td>-1.185011</td>\n",
998 " </tr>\n",
999 " <tr>\n",
1000 " <th>2013-01-05</th>\n",
1001 " <td>0.359107</td>\n",
1002 " <td>0.105786</td>\n",
1003 " <td>0.338176</td>\n",
1004 " <td>-0.359634</td>\n",
1005 " </tr>\n",
1006 " <tr>\n",
1007 " <th>2013-01-06</th>\n",
1008 " <td>0.139896</td>\n",
1009 " <td>-2.108126</td>\n",
1010 " <td>1.115044</td>\n",
1011 " <td>-0.555880</td>\n",
1012 " </tr>\n",
1013 " </tbody>\n",
1014 "</table>\n",
1015 "</div>"
1016 ],
1017 "text/plain": [
1018 " D C B A\n",
1019 "2013-01-01 -0.004622 -0.395166 -0.564244 -0.679399\n",
1020 "2013-01-02 0.168517 -1.004833 -0.991826 2.147829\n",
1021 "2013-01-03 -1.075894 -0.773990 -0.536610 0.398068\n",
1022 "2013-01-04 -0.472499 -0.770427 1.988697 -1.185011\n",
1023 "2013-01-05 0.359107 0.105786 0.338176 -0.359634\n",
1024 "2013-01-06 0.139896 -2.108126 1.115044 -0.555880"
1025 ]
1026 },
1027 "execution_count": 21,
1028 "metadata": {},
1029 "output_type": "execute_result"
1030 }
1031 ],
1032 "source": [
1033 "df2.sort_index(axis=1, ascending=False)"
1034 ]
1035 },
1036 {
1037 "cell_type": "code",
1038 "execution_count": 22,
1039 "metadata": {
1040 "slideshow": {
1041 "slide_type": "slide"
1042 }
1043 },
1044 "outputs": [
1045 {
1046 "data": {
1047 "text/html": [
1048 "<div>\n",
1049 "<style scoped>\n",
1050 " .dataframe tbody tr th:only-of-type {\n",
1051 " vertical-align: middle;\n",
1052 " }\n",
1053 "\n",
1054 " .dataframe tbody tr th {\n",
1055 " vertical-align: top;\n",
1056 " }\n",
1057 "\n",
1058 " .dataframe thead th {\n",
1059 " text-align: right;\n",
1060 " }\n",
1061 "</style>\n",
1062 "<table border=\"1\" class=\"dataframe\">\n",
1063 " <thead>\n",
1064 " <tr style=\"text-align: right;\">\n",
1065 " <th></th>\n",
1066 " <th>A</th>\n",
1067 " <th>B</th>\n",
1068 " <th>C</th>\n",
1069 " <th>D</th>\n",
1070 " </tr>\n",
1071 " </thead>\n",
1072 " <tbody>\n",
1073 " <tr>\n",
1074 " <th>2013-01-02</th>\n",
1075 " <td>2.147829</td>\n",
1076 " <td>-0.991826</td>\n",
1077 " <td>-1.004833</td>\n",
1078 " <td>0.168517</td>\n",
1079 " </tr>\n",
1080 " <tr>\n",
1081 " <th>2013-01-01</th>\n",
1082 " <td>-0.679399</td>\n",
1083 " <td>-0.564244</td>\n",
1084 " <td>-0.395166</td>\n",
1085 " <td>-0.004622</td>\n",
1086 " </tr>\n",
1087 " <tr>\n",
1088 " <th>2013-01-03</th>\n",
1089 " <td>0.398068</td>\n",
1090 " <td>-0.536610</td>\n",
1091 " <td>-0.773990</td>\n",
1092 " <td>-1.075894</td>\n",
1093 " </tr>\n",
1094 " <tr>\n",
1095 " <th>2013-01-05</th>\n",
1096 " <td>-0.359634</td>\n",
1097 " <td>0.338176</td>\n",
1098 " <td>0.105786</td>\n",
1099 " <td>0.359107</td>\n",
1100 " </tr>\n",
1101 " <tr>\n",
1102 " <th>2013-01-06</th>\n",
1103 " <td>-0.555880</td>\n",
1104 " <td>1.115044</td>\n",
1105 " <td>-2.108126</td>\n",
1106 " <td>0.139896</td>\n",
1107 " </tr>\n",
1108 " <tr>\n",
1109 " <th>2013-01-04</th>\n",
1110 " <td>-1.185011</td>\n",
1111 " <td>1.988697</td>\n",
1112 " <td>-0.770427</td>\n",
1113 " <td>-0.472499</td>\n",
1114 " </tr>\n",
1115 " </tbody>\n",
1116 "</table>\n",
1117 "</div>"
1118 ],
1119 "text/plain": [
1120 " A B C D\n",
1121 "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n",
1122 "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n",
1123 "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894\n",
1124 "2013-01-05 -0.359634 0.338176 0.105786 0.359107\n",
1125 "2013-01-06 -0.555880 1.115044 -2.108126 0.139896\n",
1126 "2013-01-04 -1.185011 1.988697 -0.770427 -0.472499"
1127 ]
1128 },
1129 "execution_count": 22,
1130 "metadata": {},
1131 "output_type": "execute_result"
1132 }
1133 ],
1134 "source": [
1135 "df2.sort_values(by='B')"
1136 ]
1137 },
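A minimal sketch of the three kinds of sorting, assuming a small hypothetical DataFrame with unsorted row labels: by row labels, by column labels and by the values of a column. None of these calls modify `toy`. Each one returns a new, sorted DataFrame.

```python
import pandas as pd

toy = pd.DataFrame({"A": [30, 10, 20], "B": [3, 1, 2]}, index=["r2", "r0", "r1"])

by_rows = toy.sort_index()                         # row labels: r0, r1, r2
by_cols = toy.sort_index(axis=1, ascending=False)  # column labels: B, A
by_b = toy.sort_values(by="B")                     # rows ordered by column B
```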
1138 {
1139 "cell_type": "markdown",
1140 "metadata": {
1141 "slideshow": {
1142 "slide_type": "slide"
1143 }
1144 },
1145 "source": [
1146 "### Selección \n",
1147 "\n",
1148 "Podemos obtener una selección de los datos usando los métodos estándar de Python o `numpy` para la obtener *slices* en listas o matrices."
1149 ]
1150 },
1151 {
1152 "cell_type": "markdown",
1153 "metadata": {
1154 "slideshow": {
1155 "slide_type": "fragment"
1156 }
1157 },
1158 "source": [
1159 "Además, `pandas` proporcia métodos especializados (y optimizados) para el acceso a los datos:"
1160 ]
1161 },
1162 {
1163 "cell_type": "markdown",
1164 "metadata": {
1165 "slideshow": {
1166 "slide_type": "subslide"
1167 }
1168 },
1169 "source": [
1170 "`.loc`\n",
1171 "\n",
1172 "Se utiliza principalmente para acceder por etiqueta. Soporta los siguietnes tipos de entradas:\n",
1173 "\n",
1174 "- Una etiqueta única: df.loc['a']\n",
1175 "- Una lista o array de etiqueta: df.loc[['a', 'b', 'c']]\n",
1176 "- Un *slice* con etiquetas: df.loc[a':'f']"
1177 ]
1178 },
1179 {
1180 "cell_type": "markdown",
1181 "metadata": {
1182 "slideshow": {
1183 "slide_type": "subslide"
1184 }
1185 },
1186 "source": [
1187 "`.iloc`\n",
1188 "\n",
1189 "Se utiliza principalmente para acceder posición. Soporta los siguietnes tipos de entradas:\n",
1190 "\n",
1191 "- Una entero: df.iloc[0]\n",
1192 "- Una lista o array de enteros: df.iloc[[0, 1, 2]]\n",
1193 "- Un *slice* : df.loc[1:3]"
1194 ]
1195 },
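A minimal sketch of the main practical difference between the two accessors, using a hypothetical Series with letter labels (the accessors behave the same way on the rows of a DataFrame). Label slices with `.loc` include the end label, while positional slices with `.iloc` follow Python's rule and exclude it.

```python
import pandas as pd

letters = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = letters.loc["a":"c"]   # labels a, b and c: end label included
by_position = letters.iloc[0:2]   # positions 0 and 1: end position excluded
picked = letters.loc[["d", "a"]]  # a list keeps the requested order
print(letters.iloc[0], letters.loc["b"])  # 10 20
```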
1196 {
1197 "cell_type": "markdown",
1198 "metadata": {
1199 "slideshow": {
1200 "slide_type": "slide"
1201 }
1202 },
1203 "source": [
1204 "\n",
1205 "Object type | Selection | Returned value\n",
1206 "---------------|----------------|-------------------------------------\n",
1207 "Series | series[label] | Scalar value\n",
1208 "DataFrame | frame[colname] | The Series corresponding to `colname`"
1209 ]
1210 },
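Both rows of the table can be checked on a tiny hypothetical DataFrame.

```python
import pandas as pd

toy = pd.DataFrame({"A": [1.5, 2.5], "B": [3.5, 4.5]}, index=["r0", "r1"])

col = toy["A"]     # frame[colname] -> the Series for column A
value = col["r1"]  # series[label] -> a scalar
print(type(col).__name__, value)  # Series 2.5
```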
1211 {
1212 "cell_type": "code",
1213 "execution_count": 23,
1214 "metadata": {
1215 "slideshow": {
1216 "slide_type": "slide"
1217 }
1218 },
1219 "outputs": [
1220 {
1221 "data": {
1222 "text/plain": [
1223 "2013-01-01 -0.679399\n",
1224 "2013-01-02 2.147829\n",
1225 "2013-01-03 0.398068\n",
1226 "2013-01-04 -1.185011\n",
1227 "2013-01-05 -0.359634\n",
1228 "2013-01-06 -0.555880\n",
1229 "Freq: D, Name: A, dtype: float64"
1230 ]
1231 },
1232 "execution_count": 23,
1233 "metadata": {},
1234 "output_type": "execute_result"
1235 }
1236 ],
1237 "source": [
1238 "df2['A']"
1239 ]
1240 },
1241 {
1242 "cell_type": "code",
1243 "execution_count": 24,
1244 "metadata": {
1245 "slideshow": {
1246 "slide_type": "fragment"
1247 }
1248 },
1249 "outputs": [
1250 {
1251 "data": {
1252 "text/plain": [
1253 "2013-01-01 -0.679399\n",
1254 "2013-01-02 2.147829\n",
1255 "2013-01-03 0.398068\n",
1256 "2013-01-04 -1.185011\n",
1257 "2013-01-05 -0.359634\n",
1258 "2013-01-06 -0.555880\n",
1259 "Freq: D, Name: A, dtype: float64"
1260 ]
1261 },
1262 "execution_count": 24,
1263 "metadata": {},
1264 "output_type": "execute_result"
1265 }
1266 ],
1267 "source": [
1268 "df2.A"
1269 ]
1270 },
1271 {
1272 "cell_type": "code",
1273 "execution_count": 25,
1274 "metadata": {
1275 "slideshow": {
1276 "slide_type": "slide"
1277 }
1278 },
1279 "outputs": [
1280 {
1281 "data": {
1282 "text/html": [
1283 "<div>\n",
1284 "<style scoped>\n",
1285 " .dataframe tbody tr th:only-of-type {\n",
1286 " vertical-align: middle;\n",
1287 " }\n",
1288 "\n",
1289 " .dataframe tbody tr th {\n",
1290 " vertical-align: top;\n",
1291 " }\n",
1292 "\n",
1293 " .dataframe thead th {\n",
1294 " text-align: right;\n",
1295 " }\n",
1296 "</style>\n",
1297 "<table border=\"1\" class=\"dataframe\">\n",
1298 " <thead>\n",
1299 " <tr style=\"text-align: right;\">\n",
1300 " <th></th>\n",
1301 " <th>A</th>\n",
1302 " <th>B</th>\n",
1303 " <th>C</th>\n",
1304 " <th>D</th>\n",
1305 " </tr>\n",
1306 " </thead>\n",
1307 " <tbody>\n",
1308 " <tr>\n",
1309 " <th>2013-01-01</th>\n",
1310 " <td>-0.679399</td>\n",
1311 " <td>-0.564244</td>\n",
1312 " <td>-0.395166</td>\n",
1313 " <td>-0.004622</td>\n",
1314 " </tr>\n",
1315 " <tr>\n",
1316 " <th>2013-01-02</th>\n",
1317 " <td>2.147829</td>\n",
1318 " <td>-0.991826</td>\n",
1319 " <td>-1.004833</td>\n",
1320 " <td>0.168517</td>\n",
1321 " </tr>\n",
1322 " <tr>\n",
1323 " <th>2013-01-03</th>\n",
1324 " <td>0.398068</td>\n",
1325 " <td>-0.536610</td>\n",
1326 " <td>-0.773990</td>\n",
1327 " <td>-1.075894</td>\n",
1328 " </tr>\n",
1329 " </tbody>\n",
1330 "</table>\n",
1331 "</div>"
1332 ],
1333 "text/plain": [
1334 " A B C D\n",
1335 "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n",
1336 "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n",
1337 "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894"
1338 ]
1339 },
1340 "execution_count": 25,
1341 "metadata": {},
1342 "output_type": "execute_result"
1343 }
1344 ],
1345 "source": [
1346 "df2[0:3]"
1347 ]
1348 },
1349 {
1350 "cell_type": "code",
1351 "execution_count": 26,
1352 "metadata": {
1353 "slideshow": {
1354 "slide_type": "fragment"
1355 }
1356 },
1357 "outputs": [
1358 {
1359 "data": {
1360 "text/html": [
1361 "<div>\n",
1362 "<style scoped>\n",
1363 " .dataframe tbody tr th:only-of-type {\n",
1364 " vertical-align: middle;\n",
1365 " }\n",
1366 "\n",
1367 " .dataframe tbody tr th {\n",
1368 " vertical-align: top;\n",
1369 " }\n",
1370 "\n",
1371 " .dataframe thead th {\n",
1372 " text-align: right;\n",
1373 " }\n",
1374 "</style>\n",
1375 "<table border=\"1\" class=\"dataframe\">\n",
1376 " <thead>\n",
1377 " <tr style=\"text-align: right;\">\n",
1378 " <th></th>\n",
1379 " <th>A</th>\n",
1380 " <th>B</th>\n",
1381 " <th>C</th>\n",
1382 " <th>D</th>\n",
1383 " </tr>\n",
1384 " </thead>\n",
1385 " <tbody>\n",
1386 " <tr>\n",
1387 " <th>2013-01-02</th>\n",
1388 " <td>2.147829</td>\n",
1389 " <td>-0.991826</td>\n",
1390 " <td>-1.004833</td>\n",
1391 " <td>0.168517</td>\n",
1392 " </tr>\n",
1393 " <tr>\n",
1394 " <th>2013-01-03</th>\n",
1395 " <td>0.398068</td>\n",
1396 " <td>-0.536610</td>\n",
1397 " <td>-0.773990</td>\n",
1398 " <td>-1.075894</td>\n",
1399 " </tr>\n",
1400 " <tr>\n",
1401 " <th>2013-01-04</th>\n",
1402 " <td>-1.185011</td>\n",
1403 " <td>1.988697</td>\n",
1404 " <td>-0.770427</td>\n",
1405 " <td>-0.472499</td>\n",
1406 " </tr>\n",
1407 " </tbody>\n",
1408 "</table>\n",
1409 "</div>"
1410 ],
1411 "text/plain": [
1412 " A B C D\n",
1413 "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n",
1414 "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894\n",
1415 "2013-01-04 -1.185011 1.988697 -0.770427 -0.472499"
1416 ]
1417 },
1418 "execution_count": 26,
1419 "metadata": {},
1420 "output_type": "execute_result"
1421 }
1422 ],
1423 "source": [
1424 "df2['20130102':'20130104']"
1425 ]
1426 },
1427 {
1428 "cell_type": "code",
1429 "execution_count": 27,
1430 "metadata": {
1431 "slideshow": {
1432 "slide_type": "fragment"
1433 }
1434 },
1435 "outputs": [
1436 {
1437 "data": {
1438 "text/plain": [
1439 "A -0.679399\n",
1440 "B -0.564244\n",
1441 "C -0.395166\n",
1442 "D -0.004622\n",
1443 "Name: 2013-01-01 00:00:00, dtype: float64"
1444 ]
1445 },
1446 "execution_count": 27,
1447 "metadata": {},
1448 "output_type": "execute_result"
1449 }
1450 ],
1451 "source": [
1452 "df2.loc[dates[0]]"
1453 ]
1454 },
1455 {
1456 "cell_type": "code",
1457 "execution_count": 28,
1458 "metadata": {
1459 "slideshow": {
1460 "slide_type": "slide"
1461 }
1462 },
1463 "outputs": [
1464 {
1465 "data": {
1466 "text/html": [
1467 "<div>\n",
1468 "<style scoped>\n",
1469 " .dataframe tbody tr th:only-of-type {\n",
1470 " vertical-align: middle;\n",
1471 " }\n",
1472 "\n",
1473 " .dataframe tbody tr th {\n",
1474 " vertical-align: top;\n",
1475 " }\n",
1476 "\n",
1477 " .dataframe thead th {\n",
1478 " text-align: right;\n",
1479 " }\n",
1480 "</style>\n",
1481 "<table border=\"1\" class=\"dataframe\">\n",
1482 " <thead>\n",
1483 " <tr style=\"text-align: right;\">\n",
1484 " <th></th>\n",
1485 " <th>A</th>\n",
1486 " <th>B</th>\n",
1487 " </tr>\n",
1488 " </thead>\n",
1489 " <tbody>\n",
1490 " <tr>\n",
1491 " <th>2013-01-01</th>\n",
1492 " <td>-0.679399</td>\n",
1493 " <td>-0.564244</td>\n",
1494 " </tr>\n",
1495 " <tr>\n",
1496 " <th>2013-01-02</th>\n",
1497 " <td>2.147829</td>\n",
1498 " <td>-0.991826</td>\n",
1499 " </tr>\n",
1500 " <tr>\n",
1501 " <th>2013-01-03</th>\n",
1502 " <td>0.398068</td>\n",
1503 " <td>-0.536610</td>\n",
1504 " </tr>\n",
1505 " <tr>\n",
1506 " <th>2013-01-04</th>\n",
1507 " <td>-1.185011</td>\n",
1508 " <td>1.988697</td>\n",
1509 " </tr>\n",
1510 " <tr>\n",
1511 " <th>2013-01-05</th>\n",
1512 " <td>-0.359634</td>\n",
1513 " <td>0.338176</td>\n",
1514 " </tr>\n",
1515 " <tr>\n",
1516 " <th>2013-01-06</th>\n",
1517 " <td>-0.555880</td>\n",
1518 " <td>1.115044</td>\n",
1519 " </tr>\n",
1520 " </tbody>\n",
1521 "</table>\n",
1522 "</div>"
1523 ],
1524 "text/plain": [
1525 " A B\n",
1526 "2013-01-01 -0.679399 -0.564244\n",
1527 "2013-01-02 2.147829 -0.991826\n",
1528 "2013-01-03 0.398068 -0.536610\n",
1529 "2013-01-04 -1.185011 1.988697\n",
1530 "2013-01-05 -0.359634 0.338176\n",
1531 "2013-01-06 -0.555880 1.115044"
1532 ]
1533 },
1534 "execution_count": 28,
1535 "metadata": {},
1536 "output_type": "execute_result"
1537 }
1538 ],
1539 "source": [
1540 "df2.loc[:, ['A', 'B']]"
1541 ]
1542 },
1543 {
1544 "cell_type": "code",
1545 "execution_count": 29,
1546 "metadata": {
1547 "slideshow": {
1548 "slide_type": "slide"
1549 }
1550 },
1551 "outputs": [
1552 {
1553 "data": {
1554 "text/html": [
1555 "<div>\n",
1556 "<style scoped>\n",
1557 " .dataframe tbody tr th:only-of-type {\n",
1558 " vertical-align: middle;\n",
1559 " }\n",
1560 "\n",
1561 " .dataframe tbody tr th {\n",
1562 " vertical-align: top;\n",
1563 " }\n",
1564 "\n",
1565 " .dataframe thead th {\n",
1566 " text-align: right;\n",
1567 " }\n",
1568 "</style>\n",
1569 "<table border=\"1\" class=\"dataframe\">\n",
1570 " <thead>\n",
1571 " <tr style=\"text-align: right;\">\n",
1572 " <th></th>\n",
1573 " <th>A</th>\n",
1574 " <th>B</th>\n",
1575 " </tr>\n",
1576 " </thead>\n",
1577 " <tbody>\n",
1578 " <tr>\n",
1579 " <th>2013-01-02</th>\n",
1580 " <td>2.147829</td>\n",
1581 " <td>-0.991826</td>\n",
1582 " </tr>\n",
1583 " <tr>\n",
1584 " <th>2013-01-03</th>\n",
1585 " <td>0.398068</td>\n",
1586 " <td>-0.536610</td>\n",
1587 " </tr>\n",
1588 " <tr>\n",
1589 " <th>2013-01-04</th>\n",
1590 " <td>-1.185011</td>\n",
1591 " <td>1.988697</td>\n",
1592 " </tr>\n",
1593 " </tbody>\n",
1594 "</table>\n",
1595 "</div>"
1596 ],
1597 "text/plain": [
1598 " A B\n",
1599 "2013-01-02 2.147829 -0.991826\n",
1600 "2013-01-03 0.398068 -0.536610\n",
1601 "2013-01-04 -1.185011 1.988697"
1602 ]
1603 },
1604 "execution_count": 29,
1605 "metadata": {},
1606 "output_type": "execute_result"
1607 }
1608 ],
1609 "source": [
1610 "df2.loc['20130102':'20130104', ['A', 'B']]"
1611 ]
1612 },
1613 {
1614 "cell_type": "code",
1615 "execution_count": 30,
1616 "metadata": {
1617 "slideshow": {
1618 "slide_type": "slide"
1619 }
1620 },
1621 "outputs": [
1622 {
1623 "data": {
1624 "text/plain": [
1625 "A 2.147829\n",
1626 "B -0.991826\n",
1627 "Name: 2013-01-02 00:00:00, dtype: float64"
1628 ]
1629 },
1630 "execution_count": 30,
1631 "metadata": {},
1632 "output_type": "execute_result"
1633 }
1634 ],
1635 "source": [
1636 "df2.loc['20130102', ['A', 'B']]"
1637 ]
1638 },
1639 {
1640 "cell_type": "code",
1641 "execution_count": 31,
1642 "metadata": {
1643 "slideshow": {
1644 "slide_type": "slide"
1645 }
1646 },
1647 "outputs": [
1648 {
1649 "data": {
1650 "text/plain": [
1651 "A -1.185011\n",
1652 "B 1.988697\n",
1653 "C -0.770427\n",
1654 "D -0.472499\n",
1655 "Name: 2013-01-04 00:00:00, dtype: float64"
1656 ]
1657 },
1658 "execution_count": 31,
1659 "metadata": {},
1660 "output_type": "execute_result"
1661 }
1662 ],
1663 "source": [
1664 "df2.iloc[3]"
1665 ]
1666 },
1667 {
1668 "cell_type": "code",
1669 "execution_count": 32,
1670 "metadata": {
1671 "slideshow": {
1672 "slide_type": "slide"
1673 }
1674 },
1675 "outputs": [
1676 {
1677 "data": {
1678 "text/html": [
1679 "<div>\n",
1680 "<style scoped>\n",
1681 " .dataframe tbody tr th:only-of-type {\n",
1682 " vertical-align: middle;\n",
1683 " }\n",
1684 "\n",
1685 " .dataframe tbody tr th {\n",
1686 " vertical-align: top;\n",
1687 " }\n",
1688 "\n",
1689 " .dataframe thead th {\n",
1690 " text-align: right;\n",
1691 " }\n",
1692 "</style>\n",
1693 "<table border=\"1\" class=\"dataframe\">\n",
1694 " <thead>\n",
1695 " <tr style=\"text-align: right;\">\n",
1696 " <th></th>\n",
1697 " <th>A</th>\n",
1698 " <th>B</th>\n",
1699 " </tr>\n",
1700 " </thead>\n",
1701 " <tbody>\n",
1702 " <tr>\n",
1703 " <th>2013-01-04</th>\n",
1704 " <td>-1.185011</td>\n",
1705 " <td>1.988697</td>\n",
1706 " </tr>\n",
1707 " <tr>\n",
1708 " <th>2013-01-05</th>\n",
1709 " <td>-0.359634</td>\n",
1710 " <td>0.338176</td>\n",
1711 " </tr>\n",
1712 " </tbody>\n",
1713 "</table>\n",
1714 "</div>"
1715 ],
1716 "text/plain": [
1717 " A B\n",
1718 "2013-01-04 -1.185011 1.988697\n",
1719 "2013-01-05 -0.359634 0.338176"
1720 ]
1721 },
1722 "execution_count": 32,
1723 "metadata": {},
1724 "output_type": "execute_result"
1725 }
1726 ],
1727 "source": [
1728 "df2.iloc[3:5, 0:2]"
1729 ]
1730 },
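  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "The cells above show a difference that is easy to miss. Slices with `loc` use labels and include both endpoints, so `'20130102':'20130104'` returns three rows. Slices with `iloc` use integer positions and exclude the end, like Python lists, so `3:5` returns two rows. Here is a minimal sketch on a small hypothetical DataFrame `demo`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical DataFrame with string labels as index\n",
    "demo = pd.DataFrame({'A': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])\n",
    "\n",
    "# Label-based slice: both endpoints are included -> rows 'b', 'c', 'd'\n",
    "by_label = demo.loc['b':'d']\n",
    "\n",
    "# Position-based slice: the end is excluded -> positions 1 and 2 ('b', 'c')\n",
    "by_position = demo.iloc[1:3]\n",
    "\n",
    "# Scalar access by label and by position (both return 30)\n",
    "value_label = demo.loc['c', 'A']\n",
    "value_position = demo.iloc[2, 0]\n",
    "\n",
    "print(by_label)\n",
    "print(by_position)"
   ]
  },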
1731 {
1732 "cell_type": "markdown",
1733 "metadata": {
1734 "slideshow": {
1735 "slide_type": "slide"
1736 }
1737 },
1738 "source": [
1739 "#### Conditional indexing\n",
1740 "\n",
1741 "To select the rows that satisfy a condition, pass a boolean expression as the selector. For example, `df2[df2.A > 0]` keeps only the rows whose value in column `A` is greater than 0."
1742 ]
1743 },
1744 {
1745 "cell_type": "code",
1746 "execution_count": 33,
1747 "metadata": {
1748 "slideshow": {
1749 "slide_type": "slide"
1750 }
1751 },
1752 "outputs": [
1753 {
1754 "data": {
1755 "text/html": [
1756 "<div>\n",
1757 "<style scoped>\n",
1758 " .dataframe tbody tr th:only-of-type {\n",
1759 " vertical-align: middle;\n",
1760 " }\n",
1761 "\n",
1762 " .dataframe tbody tr th {\n",
1763 " vertical-align: top;\n",
1764 " }\n",
1765 "\n",
1766 " .dataframe thead th {\n",
1767 " text-align: right;\n",
1768 " }\n",
1769 "</style>\n",
1770 "<table border=\"1\" class=\"dataframe\">\n",
1771 " <thead>\n",
1772 " <tr style=\"text-align: right;\">\n",
1773 " <th></th>\n",
1774 " <th>A</th>\n",
1775 " <th>B</th>\n",
1776 " <th>C</th>\n",
1777 " <th>D</th>\n",
1778 " </tr>\n",
1779 " </thead>\n",
1780 " <tbody>\n",
1781 " <tr>\n",
1782 " <th>2013-01-02</th>\n",
1783 " <td>2.147829</td>\n",
1784 " <td>-0.991826</td>\n",
1785 " <td>-1.004833</td>\n",
1786 " <td>0.168517</td>\n",
1787 " </tr>\n",
1788 " <tr>\n",
1789 " <th>2013-01-03</th>\n",
1790 " <td>0.398068</td>\n",
1791 " <td>-0.536610</td>\n",
1792 " <td>-0.773990</td>\n",
1793 " <td>-1.075894</td>\n",
1794 " </tr>\n",
1795 " </tbody>\n",
1796 "</table>\n",
1797 "</div>"
1798 ],
1799 "text/plain": [
1800 " A B C D\n",
1801 "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n",
1802 "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894"
1803 ]
1804 },
1805 "execution_count": 33,
1806 "metadata": {},
1807 "output_type": "execute_result"
1808 }
1809 ],
1810 "source": [
1811 "df2[df2.A > 0]"
1812 ]
1813 },
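  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "You can combine several conditions with `&` (and), `|` (or) and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons such as `>`. The `isin` method tests membership in a list of values. Here is a sketch on a small hand-made DataFrame (the name `demo` and its data are hypothetical):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Small hand-made DataFrame (hypothetical data) so the results are predictable\n",
    "demo = pd.DataFrame({\n",
    "    'A': [1.5, -0.3, 2.0, -1.2],\n",
    "    'B': [0.4, 1.1, -0.7, 0.9],\n",
    "    'label': ['x', 'y', 'x', 'z'],\n",
    "})\n",
    "\n",
    "# Rows where A > 0 AND B > 0 (each condition in its own parentheses)\n",
    "both_positive = demo[(demo['A'] > 0) & (demo['B'] > 0)]\n",
    "\n",
    "# Rows where A > 0 OR label == 'z'\n",
    "either = demo[(demo['A'] > 0) | (demo['label'] == 'z')]\n",
    "\n",
    "# Rows whose label is NOT in the given list\n",
    "not_x_or_z = demo[~demo['label'].isin(['x', 'z'])]\n",
    "\n",
    "print(both_positive)\n",
    "print(either)\n",
    "print(not_x_or_z)"
   ]
  },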
1814 {
1815 "cell_type": "markdown",
1816 "metadata": {
1817 "slideshow": {
1818 "slide_type": "slide"
1819 }
1820 },
1821 "source": [
1822 "### Operations"
1823 ]
1824 },
1825 {
1826 "cell_type": "markdown",
1827 "metadata": {
1828 "slideshow": {
1829 "slide_type": "slide"
1830 }
1831 },
1832 "source": [
1833 "Basic statistics are computed by calling the corresponding methods, such as `mean`. By default they operate on each column (`axis=0`); pass `axis=1` to operate on each row."
1834 ]
1835 },
1836 {
1837 "cell_type": "code",
1838 "execution_count": 34,
1839 "metadata": {
1840 "slideshow": {
1841 "slide_type": "fragment"
1842 }
1843 },
1844 "outputs": [
1845 {
1846 "data": {
1847 "text/plain": [
1848 "A -0.039005\n",
1849 "B 0.224873\n",
1850 "C -0.824459\n",
1851 "D -0.147582\n",
1852 "dtype: float64"
1853 ]
1854 },
1855 "execution_count": 34,
1856 "metadata": {},
1857 "output_type": "execute_result"
1858 }
1859 ],
1860 "source": [
1861 "df2.mean()"
1862 ]
1863 },
1864 {
1865 "cell_type": "code",
1866 "execution_count": 35,
1867 "metadata": {
1868 "slideshow": {
1869 "slide_type": "fragment"
1870 }
1871 },
1872 "outputs": [
1873 {
1874 "data": {
1875 "text/plain": [
1876 "-0.03900473868952752"
1877 ]
1878 },
1879 "execution_count": 35,
1880 "metadata": {},
1881 "output_type": "execute_result"
1882 }
1883 ],
1884 "source": [
1885 "df2['A'].mean()"
1886 ]
1887 },
1888 {
1889 "cell_type": "code",
1890 "execution_count": 36,
1891 "metadata": {
1892 "slideshow": {
1893 "slide_type": "slide"
1894 }
1895 },
1896 "outputs": [
1897 {
1898 "data": {
1899 "text/plain": [
1900 "2013-01-01 -0.410858\n",
1901 "2013-01-02 0.079922\n",
1902 "2013-01-03 -0.497107\n",
1903 "2013-01-04 -0.109810\n",
1904 "2013-01-05 0.110859\n",
1905 "2013-01-06 -0.352266\n",
1906 "Freq: D, dtype: float64"
1907 ]
1908 },
1909 "execution_count": 36,
1910 "metadata": {},
1911 "output_type": "execute_result"
1912 }
1913 ],
1914 "source": [
1915 "df2.mean(axis=1)"
1916 ]
1917 },
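  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "With random data it is hard to check what each call computes. Here is a sketch with hand-picked values (the frame `demo` is hypothetical). It makes the effect of `axis` visible and shows that missing values (`NaN`) are skipped by default (`skipna=True`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "demo = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, np.nan, 6.0]})\n",
    "\n",
    "col_means = demo.mean()        # per column: A -> 2.0, B -> 5.0 (NaN skipped)\n",
    "row_means = demo.mean(axis=1)  # per row: 2.5, 2.0, 4.5\n",
    "col_sums = demo.sum()          # per column: A -> 6.0, B -> 10.0\n",
    "\n",
    "# With skipna=False, any NaN makes the result NaN\n",
    "b_mean_strict = demo['B'].mean(skipna=False)\n",
    "\n",
    "print(col_means)\n",
    "print(row_means)"
   ]
  },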
1918 {
1919 "cell_type": "markdown",
1920 "metadata": {
1921 "slideshow": {
1922 "slide_type": "slide"
1923 }
1924 },
1925 "source": [
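1926 "You can also apply functions to the data with `apply`. By default the function receives each column as a Series; with `axis=1` it receives each row."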
1927 ]
1928 },
1929 {
1930 "cell_type": "code",
1931 "execution_count": 37,
1932 "metadata": {
1933 "slideshow": {
1934 "slide_type": "slide"
1935 }
1936 },
1937 "outputs": [
1938 {
1939 "data": {
1940 "text/html": [
1941 "<div>\n",
1942 "<style scoped>\n",
1943 " .dataframe tbody tr th:only-of-type {\n",
1944 " vertical-align: middle;\n",
1945 " }\n",
1946 "\n",
1947 " .dataframe tbody tr th {\n",
1948 " vertical-align: top;\n",
1949 " }\n",
1950 "\n",
1951 " .dataframe thead th {\n",
1952 " text-align: right;\n",
1953 " }\n",
1954 "</style>\n",
1955 "<table border=\"1\" class=\"dataframe\">\n",
1956 " <thead>\n",
1957 " <tr style=\"text-align: right;\">\n",
1958 " <th></th>\n",
1959 " <th>A</th>\n",
1960 " <th>B</th>\n",
1961 " <th>C</th>\n",
1962 " <th>D</th>\n",
1963 " </tr>\n",
1964 " </thead>\n",
1965 " <tbody>\n",
1966 " <tr>\n",
1967 " <th>2013-01-01</th>\n",
1968 " <td>-0.679399</td>\n",
1969 " <td>-0.564244</td>\n",
1970 " <td>-0.395166</td>\n",
1971 " <td>-0.004622</td>\n",
1972 " </tr>\n",
1973 " <tr>\n",
1974 " <th>2013-01-02</th>\n",
1975 " <td>1.468429</td>\n",
1976 " <td>-1.556070</td>\n",
1977 " <td>-1.400000</td>\n",
1978 " <td>0.163895</td>\n",
1979 " </tr>\n",
1980 " <tr>\n",
1981 " <th>2013-01-03</th>\n",
1982 " <td>1.866497</td>\n",
1983 " <td>-2.092680</td>\n",
1984 " <td>-2.173990</td>\n",
1985 " <td>-0.911998</td>\n",
1986 " </tr>\n",
1987 " <tr>\n",
1988 " <th>2013-01-04</th>\n",
1989 " <td>0.681486</td>\n",
1990 " <td>-0.103983</td>\n",
1991 " <td>-2.944416</td>\n",
1992 " <td>-1.384497</td>\n",
1993 " </tr>\n",
1994 " <tr>\n",
1995 " <th>2013-01-05</th>\n",
1996 " <td>0.321852</td>\n",
1997 " <td>0.234193</td>\n",
1998 " <td>-2.838630</td>\n",
1999 " <td>-1.025391</td>\n",
2000 " </tr>\n",
2001 " <tr>\n",
2002 " <th>2013-01-06</th>\n",
2003 " <td>-0.234028</td>\n",
2004 " <td>1.349237</td>\n",
2005 " <td>-4.946756</td>\n",
2006 " <td>-0.885495</td>\n",
2007 " </tr>\n",
2008 " </tbody>\n",
2009 "</table>\n",
2010 "</div>"
2011 ],
2012 "text/plain": [
2013 " A B C D\n",
2014 "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n",
2015 "2013-01-02 1.468429 -1.556070 -1.400000 0.163895\n",
2016 "2013-01-03 1.866497 -2.092680 -2.173990 -0.911998\n",
2017 "2013-01-04 0.681486 -0.103983 -2.944416 -1.384497\n",
2018 "2013-01-05 0.321852 0.234193 -2.838630 -1.025391\n",
2019 "2013-01-06 -0.234028 1.349237 -4.946756 -0.885495"
2020 ]
2021 },
2022 "execution_count": 37,
2023 "metadata": {},
2024 "output_type": "execute_result"
2025 }
2026 ],
2027 "source": [
2028 "df2.apply(np.cumsum)"
2029 ]
2030 },
2031 {
2032 "cell_type": "code",
2033 "execution_count": 40,
2034 "metadata": {
2035 "slideshow": {
2036 "slide_type": "slide"
2037 }
2038 },
2039 "outputs": [
2040 {
2041 "data": {
2042 "text/html": [
2043 "<div>\n",
2044 "<style scoped>\n",
2045 " .dataframe tbody tr th:only-of-type {\n",
2046 " vertical-align: middle;\n",
2047 " }\n",
2048 "\n",
2049 " .dataframe tbody tr th {\n",
2050 " vertical-align: top;\n",
2051 " }\n",
2052 "\n",
2053 " .dataframe thead th {\n",
2054 " text-align: right;\n",
2055 " }\n",
2056 "</style>\n",
2057 "<table border=\"1\" class=\"dataframe\">\n",
2058 " <thead>\n",
2059 " <tr style=\"text-align: right;\">\n",
2060 " <th></th>\n",
2061 " <th>A</th>\n",
2062 " <th>B</th>\n",
2063 " <th>C</th>\n",
2064 " <th>D</th>\n",
2065 " <th>E</th>\n",
2066 " </tr>\n",
2067 " </thead>\n",
2068 " <tbody>\n",
2069 " <tr>\n",
2070 " <th>2013-01-01</th>\n",
2071 " <td>-0.679399</td>\n",
2072 " <td>-0.564244</td>\n",
2073 " <td>-0.395166</td>\n",
2074 " <td>-0.004622</td>\n",
2075 " <td>0.674777</td>\n",
2076 " </tr>\n",
2077 " <tr>\n",
2078 " <th>2013-01-02</th>\n",
2079 " <td>2.147829</td>\n",
2080 " <td>-0.991826</td>\n",
2081 " <td>-1.004833</td>\n",
2082 " <td>0.168517</td>\n",
2083 " <td>3.152662</td>\n",
2084 " </tr>\n",
2085 " <tr>\n",
2086 " <th>2013-01-03</th>\n",
2087 " <td>0.398068</td>\n",
2088 " <td>-0.536610</td>\n",
2089 " <td>-0.773990</td>\n",
2090 " <td>-1.075894</td>\n",
2091 " <td>1.473961</td>\n",
2092 " </tr>\n",
2093 " <tr>\n",
2094 " <th>2013-01-04</th>\n",
2095 " <td>-1.185011</td>\n",
2096 " <td>1.988697</td>\n",
2097 " <td>-0.770427</td>\n",
2098 " <td>-0.472499</td>\n",
2099 " <td>3.173708</td>\n",
2100 " </tr>\n",
2101 " <tr>\n",
2102 " <th>2013-01-05</th>\n",
2103 " <td>-0.359634</td>\n",
2104 " <td>0.338176</td>\n",
2105 " <td>0.105786</td>\n",
2106 " <td>0.359107</td>\n",
2107 " <td>0.718741</td>\n",
2108 " </tr>\n",
2109 " <tr>\n",
2110 " <th>2013-01-06</th>\n",
2111 " <td>-0.555880</td>\n",
2112 " <td>1.115044</td>\n",
2113 " <td>-2.108126</td>\n",
2114 " <td>0.139896</td>\n",
2115 " <td>3.223170</td>\n",
2116 " </tr>\n",
2117 " </tbody>\n",
2118 "</table>\n",
2119 "</div>"
2120 ],
2121 "text/plain": [
2122 " A B C D E\n",
2123 "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622 0.674777\n",
2124 "2013-01-02 2.147829 -0.991826 -1.004833 0.168517 3.152662\n",
2125 "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894 1.473961\n",
2126 "2013-01-04 -1.185011 1.988697 -0.770427 -0.472499 3.173708\n",
2127 "2013-01-05 -0.359634 0.338176 0.105786 0.359107 0.718741\n",
2128 "2013-01-06 -0.555880 1.115044 -2.108126 0.139896 3.223170"
2129 ]
2130 },
2131 "execution_count": 40,
2132 "metadata": {},
2133 "output_type": "execute_result"
2134 }
2135 ],
2136 "source": [
2137 "c = df2.apply(lambda x: x.max() - x.min(), axis=1)\n",
2138 "df2['E'] = c\n",
2139 "df2"
2140 ]
2141 },
2142 {
2143 "cell_type": "code",
2144 "execution_count": 41,
2145 "metadata": {
2146 "slideshow": {
2147 "slide_type": "fragment"
2148 }
2149 },
2150 "outputs": [
2151 {
2152 "data": {
2153 "text/plain": [
2154 "2013-01-01 1.354177\n",
2155 "2013-01-02 4.157495\n",
2156 "2013-01-03 2.549855\n",
2157 "2013-01-04 4.358719\n",
2158 "2013-01-05 1.078375\n",
2159 "2013-01-06 5.331296\n",
2160 "Freq: D, dtype: float64"
2161 ]
2162 },
2163 "execution_count": 41,
2164 "metadata": {},
2165 "output_type": "execute_result"
2166 }
2167 ],
2168 "source": [
2169 "df2.apply(lambda x: x.max() - x.min(), axis=1)"
2170 ]
2171 },
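  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Many functions passed to `apply` have vectorized equivalents that avoid calling Python code once per row or column. Here is a sketch on a small hypothetical frame `demo`. It computes the row range with a `lambda`, as in the cells above, and with `max(axis=1) - min(axis=1)`. It also uses `cumsum()`, the method form of `apply(np.cumsum)`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical frame with hand-picked values\n",
    "demo = pd.DataFrame({'A': [1.0, 5.0, -2.0],\n",
    "                     'B': [3.0, 2.0, 4.0],\n",
    "                     'C': [0.0, 7.0, 1.0]})\n",
    "\n",
    "# Range (max - min) of each row, calling the lambda once per row\n",
    "range_apply = demo.apply(lambda row: row.max() - row.min(), axis=1)\n",
    "\n",
    "# Same result with vectorized reductions along axis=1\n",
    "range_vectorized = demo.max(axis=1) - demo.min(axis=1)\n",
    "\n",
    "# Running total down each column, like demo.apply(np.cumsum)\n",
    "running_total = demo.cumsum()\n",
    "\n",
    "print(range_apply)\n",
    "print(running_total)"
   ]
  },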
2172 {
2173 "cell_type": "markdown",
2174 "metadata": {
2175 "slideshow": {
2176 "slide_type": "slide"
2177 }
2178 },
2179 "source": [
2180 "### Combining data\n",
2181 "\n",
2182 "**pandas** provides several ways to combine Series and DataFrame objects: concatenation, SQL-style joins and appending rows."
2183 ]
2184 },
2185 {
2186 "cell_type": "markdown",
2187 "metadata": {
2188 "slideshow": {
2189 "slide_type": "slide"
2190 }
2191 },
2192 "source": [
2193 "#### Concat"
2194 ]
2195 },
2196 {
2197 "cell_type": "code",
2198 "execution_count": 42,
2199 "metadata": {
2200 "slideshow": {
2201 "slide_type": "slide"
2202 }
2203 },
2204 "outputs": [
2205 {
2206 "data": {
2207 "text/html": [
2208 "<div>\n",
2209 "<style scoped>\n",
2210 " .dataframe tbody tr th:only-of-type {\n",
2211 " vertical-align: middle;\n",
2212 " }\n",
2213 "\n",
2214 " .dataframe tbody tr th {\n",
2215 " vertical-align: top;\n",
2216 " }\n",
2217 "\n",
2218 " .dataframe thead th {\n",
2219 " text-align: right;\n",
2220 " }\n",
2221 "</style>\n",
2222 "<table border=\"1\" class=\"dataframe\">\n",
2223 " <thead>\n",
2224 " <tr style=\"text-align: right;\">\n",
2225 " <th></th>\n",
2226 " <th>0</th>\n",
2227 " <th>1</th>\n",
2228 " <th>2</th>\n",
2229 " <th>3</th>\n",
2230 " </tr>\n",
2231 " </thead>\n",
2232 " <tbody>\n",
2233 " <tr>\n",
2234 " <th>0</th>\n",
2235 " <td>-0.634450</td>\n",
2236 " <td>0.763724</td>\n",
2237 " <td>0.710228</td>\n",
2238 " <td>-0.694768</td>\n",
2239 " </tr>\n",
2240 " <tr>\n",
2241 " <th>1</th>\n",
2242 " <td>-0.142616</td>\n",
2243 " <td>1.630704</td>\n",
2244 " <td>1.029687</td>\n",
2245 " <td>-1.008484</td>\n",
2246 " </tr>\n",
2247 " <tr>\n",
2248 " <th>2</th>\n",
2249 " <td>-0.344466</td>\n",
2250 " <td>-0.222917</td>\n",
2251 " <td>0.294177</td>\n",
2252 " <td>-0.859483</td>\n",
2253 " </tr>\n",
2254 " <tr>\n",
2255 " <th>3</th>\n",
2256 " <td>1.012883</td>\n",
2257 " <td>-0.369916</td>\n",
2258 " <td>-0.552784</td>\n",
2259 " <td>1.356238</td>\n",
2260 " </tr>\n",
2261 " <tr>\n",
2262 " <th>4</th>\n",
2263 " <td>-0.167002</td>\n",
2264 " <td>1.677076</td>\n",
2265 " <td>-0.454767</td>\n",
2266 " <td>1.183958</td>\n",
2267 " </tr>\n",
2268 " <tr>\n",
2269 " <th>5</th>\n",
2270 " <td>-0.528190</td>\n",
2271 " <td>-0.912389</td>\n",
2272 " <td>0.786753</td>\n",
2273 " <td>1.043857</td>\n",
2274 " </tr>\n",
2275 " <tr>\n",
2276 " <th>6</th>\n",
2277 " <td>0.527898</td>\n",
2278 " <td>-0.379471</td>\n",
2279 " <td>1.537252</td>\n",
2280 " <td>-1.050597</td>\n",
2281 " </tr>\n",
2282 " <tr>\n",
2283 " <th>7</th>\n",
2284 " <td>-0.352473</td>\n",
2285 " <td>-1.825571</td>\n",
2286 " <td>0.186576</td>\n",
2287 " <td>0.977988</td>\n",
2288 " </tr>\n",
2289 " <tr>\n",
2290 " <th>8</th>\n",
2291 " <td>0.991172</td>\n",
2292 " <td>-0.030169</td>\n",
2293 " <td>-1.816031</td>\n",
2294 " <td>0.601092</td>\n",
2295 " </tr>\n",
2296 " <tr>\n",
2297 " <th>9</th>\n",
2298 " <td>1.522968</td>\n",
2299 " <td>0.440188</td>\n",
2300 " <td>-1.763289</td>\n",
2301 " <td>1.840091</td>\n",
2302 " </tr>\n",
2303 " </tbody>\n",
2304 "</table>\n",
2305 "</div>"
2306 ],
2307 "text/plain": [
2308 " 0 1 2 3\n",
2309 "0 -0.634450 0.763724 0.710228 -0.694768\n",
2310 "1 -0.142616 1.630704 1.029687 -1.008484\n",
2311 "2 -0.344466 -0.222917 0.294177 -0.859483\n",
2312 "3 1.012883 -0.369916 -0.552784 1.356238\n",
2313 "4 -0.167002 1.677076 -0.454767 1.183958\n",
2314 "5 -0.528190 -0.912389 0.786753 1.043857\n",
2315 "6 0.527898 -0.379471 1.537252 -1.050597\n",
2316 "7 -0.352473 -1.825571 0.186576 0.977988\n",
2317 "8 0.991172 -0.030169 -1.816031 0.601092\n",
2318 "9 1.522968 0.440188 -1.763289 1.840091"
2319 ]
2320 },
2321 "execution_count": 42,
2322 "metadata": {},
2323 "output_type": "execute_result"
2324 }
2325 ],
2326 "source": [
2327 "df = pd.DataFrame(np.random.randn(10, 4))\n",
2328 "df"
2329 ]
2330 },
2331 {
2332 "cell_type": "code",
2333 "execution_count": 44,
2334 "metadata": {
2335 "slideshow": {
2336 "slide_type": "slide"
2337 }
2338 },
2339 "outputs": [
2340 {
2341 "data": {
2342 "text/plain": [
2343 "[ 0 1 2 3\n",
2344 " 0 -0.634450 0.763724 0.710228 -0.694768\n",
2345 " 1 -0.142616 1.630704 1.029687 -1.008484\n",
2346 " 2 -0.344466 -0.222917 0.294177 -0.859483,\n",
2347 " 0 1 2 3\n",
2348 " 3 1.012883 -0.369916 -0.552784 1.356238\n",
2349 " 4 -0.167002 1.677076 -0.454767 1.183958\n",
2350 " 5 -0.528190 -0.912389 0.786753 1.043857\n",
2351 " 6 0.527898 -0.379471 1.537252 -1.050597,\n",
2352 " 0 1 2 3\n",
2353 " 7 -0.352473 -1.825571 0.186576 0.977988\n",
2354 " 8 0.991172 -0.030169 -1.816031 0.601092\n",
2355 " 9 1.522968 0.440188 -1.763289 1.840091]"
2356 ]
2357 },
2358 "execution_count": 44,
2359 "metadata": {},
2360 "output_type": "execute_result"
2361 }
2362 ],
2363 "source": [
2364 "pieces = [df[:3], df[3:7], df[7:]]\n",
2365 "pieces"
2366 ]
2367 },
2368 {
2369 "cell_type": "code",
2370 "execution_count": 45,
2371 "metadata": {
2372 "slideshow": {
2373 "slide_type": "slide"
2374 }
2375 },
2376 "outputs": [
2377 {
2378 "data": {
2379 "text/html": [
2380 "<div>\n",
2381 "<style scoped>\n",
2382 " .dataframe tbody tr th:only-of-type {\n",
2383 " vertical-align: middle;\n",
2384 " }\n",
2385 "\n",
2386 " .dataframe tbody tr th {\n",
2387 " vertical-align: top;\n",
2388 " }\n",
2389 "\n",
2390 " .dataframe thead th {\n",
2391 " text-align: right;\n",
2392 " }\n",
2393 "</style>\n",
2394 "<table border=\"1\" class=\"dataframe\">\n",
2395 " <thead>\n",
2396 " <tr style=\"text-align: right;\">\n",
2397 " <th></th>\n",
2398 " <th>0</th>\n",
2399 " <th>1</th>\n",
2400 " <th>2</th>\n",
2401 " <th>3</th>\n",
2402 " </tr>\n",
2403 " </thead>\n",
2404 " <tbody>\n",
2405 " <tr>\n",
2406 " <th>0</th>\n",
2407 " <td>-0.634450</td>\n",
2408 " <td>0.763724</td>\n",
2409 " <td>0.710228</td>\n",
2410 " <td>-0.694768</td>\n",
2411 " </tr>\n",
2412 " <tr>\n",
2413 " <th>1</th>\n",
2414 " <td>-0.142616</td>\n",
2415 " <td>1.630704</td>\n",
2416 " <td>1.029687</td>\n",
2417 " <td>-1.008484</td>\n",
2418 " </tr>\n",
2419 " <tr>\n",
2420 " <th>2</th>\n",
2421 " <td>-0.344466</td>\n",
2422 " <td>-0.222917</td>\n",
2423 " <td>0.294177</td>\n",
2424 " <td>-0.859483</td>\n",
2425 " </tr>\n",
2426 " <tr>\n",
2427 " <th>3</th>\n",
2428 " <td>1.012883</td>\n",
2429 " <td>-0.369916</td>\n",
2430 " <td>-0.552784</td>\n",
2431 " <td>1.356238</td>\n",
2432 " </tr>\n",
2433 " <tr>\n",
2434 " <th>4</th>\n",
2435 " <td>-0.167002</td>\n",
2436 " <td>1.677076</td>\n",
2437 " <td>-0.454767</td>\n",
2438 " <td>1.183958</td>\n",
2439 " </tr>\n",
2440 " <tr>\n",
2441 " <th>5</th>\n",
2442 " <td>-0.528190</td>\n",
2443 " <td>-0.912389</td>\n",
2444 " <td>0.786753</td>\n",
2445 " <td>1.043857</td>\n",
2446 " </tr>\n",
2447 " <tr>\n",
2448 " <th>6</th>\n",
2449 " <td>0.527898</td>\n",
2450 " <td>-0.379471</td>\n",
2451 " <td>1.537252</td>\n",
2452 " <td>-1.050597</td>\n",
2453 " </tr>\n",
2454 " <tr>\n",
2455 " <th>7</th>\n",
2456 " <td>-0.352473</td>\n",
2457 " <td>-1.825571</td>\n",
2458 " <td>0.186576</td>\n",
2459 " <td>0.977988</td>\n",
2460 " </tr>\n",
2461 " <tr>\n",
2462 " <th>8</th>\n",
2463 " <td>0.991172</td>\n",
2464 " <td>-0.030169</td>\n",
2465 " <td>-1.816031</td>\n",
2466 " <td>0.601092</td>\n",
2467 " </tr>\n",
2468 " <tr>\n",
2469 " <th>9</th>\n",
2470 " <td>1.522968</td>\n",
2471 " <td>0.440188</td>\n",
2472 " <td>-1.763289</td>\n",
2473 " <td>1.840091</td>\n",
2474 " </tr>\n",
2475 " </tbody>\n",
2476 "</table>\n",
2477 "</div>"
2478 ],
2479 "text/plain": [
2480 " 0 1 2 3\n",
2481 "0 -0.634450 0.763724 0.710228 -0.694768\n",
2482 "1 -0.142616 1.630704 1.029687 -1.008484\n",
2483 "2 -0.344466 -0.222917 0.294177 -0.859483\n",
2484 "3 1.012883 -0.369916 -0.552784 1.356238\n",
2485 "4 -0.167002 1.677076 -0.454767 1.183958\n",
2486 "5 -0.528190 -0.912389 0.786753 1.043857\n",
2487 "6 0.527898 -0.379471 1.537252 -1.050597\n",
2488 "7 -0.352473 -1.825571 0.186576 0.977988\n",
2489 "8 0.991172 -0.030169 -1.816031 0.601092\n",
2490 "9 1.522968 0.440188 -1.763289 1.840091"
2491 ]
2492 },
2493 "execution_count": 45,
2494 "metadata": {},
2495 "output_type": "execute_result"
2496 }
2497 ],
2498 "source": [
2499 "pd.concat(pieces)"
2500 ]
2501 },
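  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "By default, `pd.concat` keeps the original index labels. Here is a sketch on small hypothetical frames showing two common options. `ignore_index=True` renumbers the rows. `axis=1` places the objects side by side, aligning them on the index, so labels missing from one object become `NaN`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "top = pd.DataFrame({'x': [1, 2]})\n",
    "bottom = pd.DataFrame({'x': [3, 4]})\n",
    "\n",
    "# Both pieces have index [0, 1]; ignore_index=True renumbers the result 0..3\n",
    "stacked = pd.concat([top, bottom], ignore_index=True)\n",
    "\n",
    "# axis=1 puts the frames side by side, aligning rows on the index labels;\n",
    "# labels present in only one frame get NaN in the other frame's columns\n",
    "left_part = pd.DataFrame({'a': [1, 2]}, index=['r1', 'r2'])\n",
    "right_part = pd.DataFrame({'b': [10, 30]}, index=['r1', 'r3'])\n",
    "side_by_side = pd.concat([left_part, right_part], axis=1)\n",
    "\n",
    "print(stacked)\n",
    "print(side_by_side)"
   ]
  },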
2502 {
2503 "cell_type": "markdown",
2504 "metadata": {
2505 "slideshow": {
2506 "slide_type": "slide"
2507 }
2508 },
2509 "source": [
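2510 "#### Join\n",
"\n",
"`pd.merge` combines two DataFrames SQL-style on one or more key columns (here `key`). Because the value `foo` appears twice in each frame, every row of `left` is paired with every row of `right`, giving 2 × 2 = 4 rows."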
2511 ]
2512 },
2513 {
2514 "cell_type": "code",
2515 "execution_count": 46,
2516 "metadata": {
2517 "slideshow": {
2518 "slide_type": "slide"
2519 }
2520 },
2521 "outputs": [
2522 {
2523 "data": {
2524 "text/html": [
2525 "<div>\n",
2526 "<style scoped>\n",
2527 " .dataframe tbody tr th:only-of-type {\n",
2528 " vertical-align: middle;\n",
2529 " }\n",
2530 "\n",
2531 " .dataframe tbody tr th {\n",
2532 " vertical-align: top;\n",
2533 " }\n",
2534 "\n",
2535 " .dataframe thead th {\n",
2536 " text-align: right;\n",
2537 " }\n",
2538 "</style>\n",
2539 "<table border=\"1\" class=\"dataframe\">\n",
2540 " <thead>\n",
2541 " <tr style=\"text-align: right;\">\n",
2542 " <th></th>\n",
2543 " <th>key</th>\n",
2544 " <th>lval</th>\n",
2545 " </tr>\n",
2546 " </thead>\n",
2547 " <tbody>\n",
2548 " <tr>\n",
2549 " <th>0</th>\n",
2550 " <td>foo</td>\n",
2551 " <td>1</td>\n",
2552 " </tr>\n",
2553 " <tr>\n",
2554 " <th>1</th>\n",
2555 " <td>foo</td>\n",
2556 " <td>2</td>\n",
2557 " </tr>\n",
2558 " </tbody>\n",
2559 "</table>\n",
2560 "</div>"
2561 ],
2562 "text/plain": [
2563 " key lval\n",
2564 "0 foo 1\n",
2565 "1 foo 2"
2566 ]
2567 },
2568 "execution_count": 46,
2569 "metadata": {},
2570 "output_type": "execute_result"
2571 }
2572 ],
2573 "source": [
2574 "left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})\n",
2575 "left"
2576 ]
2577 },
2578 {
2579 "cell_type": "code",
2580 "execution_count": 48,
2581 "metadata": {
2582 "slideshow": {
2583 "slide_type": "slide"
2584 }
2585 },
2586 "outputs": [
2587 {
2588 "data": {
2589 "text/html": [
2590 "<div>\n",
2591 "<style scoped>\n",
2592 " .dataframe tbody tr th:only-of-type {\n",
2593 " vertical-align: middle;\n",
2594 " }\n",
2595 "\n",
2596 " .dataframe tbody tr th {\n",
2597 " vertical-align: top;\n",
2598 " }\n",
2599 "\n",
2600 " .dataframe thead th {\n",
2601 " text-align: right;\n",
2602 " }\n",
2603 "</style>\n",
2604 "<table border=\"1\" class=\"dataframe\">\n",
2605 " <thead>\n",
2606 " <tr style=\"text-align: right;\">\n",
2607 " <th></th>\n",
2608 " <th>key</th>\n",
2609 " <th>rval</th>\n",
2610 " </tr>\n",
2611 " </thead>\n",
2612 " <tbody>\n",
2613 " <tr>\n",
2614 " <th>0</th>\n",
2615 " <td>foo</td>\n",
2616 " <td>4</td>\n",
2617 " </tr>\n",
2618 " <tr>\n",
2619 " <th>1</th>\n",
2620 " <td>foo</td>\n",
2621 " <td>5</td>\n",
2622 " </tr>\n",
2623 " </tbody>\n",
2624 "</table>\n",
2625 "</div>"
2626 ],
2627 "text/plain": [
2628 " key rval\n",
2629 "0 foo 4\n",
2630 "1 foo 5"
2631 ]
2632 },
2633 "execution_count": 48,
2634 "metadata": {},
2635 "output_type": "execute_result"
2636 }
2637 ],
2638 "source": [
2639 "right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})\n",
2640 "right"
2641 ]
2642 },
2643 {
2644 "cell_type": "code",
2645 "execution_count": 49,
2646 "metadata": {
2647 "slideshow": {
2648 "slide_type": "slide"
2649 }
2650 },
2651 "outputs": [
2652 {
2653 "data": {
2654 "text/html": [
2655 "<div>\n",
2656 "<style scoped>\n",
2657 " .dataframe tbody tr th:only-of-type {\n",
2658 " vertical-align: middle;\n",
2659 " }\n",
2660 "\n",
2661 " .dataframe tbody tr th {\n",
2662 " vertical-align: top;\n",
2663 " }\n",
2664 "\n",
2665 " .dataframe thead th {\n",
2666 " text-align: right;\n",
2667 " }\n",
2668 "</style>\n",
2669 "<table border=\"1\" class=\"dataframe\">\n",
2670 " <thead>\n",
2671 " <tr style=\"text-align: right;\">\n",
2672 " <th></th>\n",
2673 " <th>key</th>\n",
2674 " <th>lval</th>\n",
2675 " <th>rval</th>\n",
2676 " </tr>\n",
2677 " </thead>\n",
2678 " <tbody>\n",
2679 " <tr>\n",
2680 " <th>0</th>\n",
2681 " <td>foo</td>\n",
2682 " <td>1</td>\n",
2683 " <td>4</td>\n",
2684 " </tr>\n",
2685 " <tr>\n",
2686 " <th>1</th>\n",
2687 " <td>foo</td>\n",
2688 " <td>1</td>\n",
2689 " <td>5</td>\n",
2690 " </tr>\n",
2691 " <tr>\n",
2692 " <th>2</th>\n",
2693 " <td>foo</td>\n",
2694 " <td>2</td>\n",
2695 " <td>4</td>\n",
2696 " </tr>\n",
2697 " <tr>\n",
2698 " <th>3</th>\n",
2699 " <td>foo</td>\n",
2700 " <td>2</td>\n",
2701 " <td>5</td>\n",
2702 " </tr>\n",
2703 " </tbody>\n",
2704 "</table>\n",
2705 "</div>"
2706 ],
2707 "text/plain": [
2708 " key lval rval\n",
2709 "0 foo 1 4\n",
2710 "1 foo 1 5\n",
2711 "2 foo 2 4\n",
2712 "3 foo 2 5"
2713 ]
2714 },
2715 "execution_count": 49,
2716 "metadata": {},
2717 "output_type": "execute_result"
2718 }
2719 ],
2720 "source": [
2721 "pd.merge(left, right, on='key')"
2722 ]
2723 },
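  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "When the key values are unique, each row finds at most one match. The `how` parameter selects the join type. `'inner'` (the default) keeps only keys present in both frames. `'left'`, `'right'` and `'outer'` also keep unmatched rows from one or both sides, filling the gaps with `NaN`. Here is a sketch with hypothetical frames `left_df` and `right_df`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical frames whose key values are unique\n",
    "left_df = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})\n",
    "right_df = pd.DataFrame({'key': ['foo', 'baz'], 'rval': [4, 5]})\n",
    "\n",
    "# how='inner' (default): only keys present in both frames -> 'foo'\n",
    "inner = pd.merge(left_df, right_df, on='key')\n",
    "\n",
    "# how='left': every row of left_df; 'bar' has no match, so rval is NaN\n",
    "left_join = pd.merge(left_df, right_df, on='key', how='left')\n",
    "\n",
    "# how='outer': union of the keys of both frames\n",
    "outer = pd.merge(left_df, right_df, on='key', how='outer')\n",
    "\n",
    "print(inner)\n",
    "print(left_join)\n",
    "print(outer)"
   ]
  },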
2724 {
2725 "cell_type": "markdown",
2726 "metadata": {
2727 "slideshow": {
2728 "slide_type": "slide"
2729 }
2730 },
2731 "source": [
2732 "#### Append"
2733 ]
2734 },
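  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same result is obtained with `pd.concat`. Here is a sketch using a small hypothetical DataFrame `demo` and one of its rows, `row`, mirroring the `df` and `s = df.iloc[3]` used in this section. The row Series is first turned into a one-row DataFrame with `to_frame().T`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "demo = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})\n",
    "row = demo.iloc[1]  # a Series indexed by the column names\n",
    "\n",
    "# pandas >= 2.0 equivalent of demo.append(row, ignore_index=True):\n",
    "# convert the Series to a one-row DataFrame and concatenate\n",
    "appended = pd.concat([demo, row.to_frame().T], ignore_index=True)\n",
    "\n",
    "print(appended)"
   ]
  },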
2735 {
2736 "cell_type": "code",
2737 "execution_count": 50,
2738 "metadata": {
2739 "slideshow": {
2740 "slide_type": "slide"
2741 }
2742 },
2743 "outputs": [
2744 {
2745 "data": {
2746 "text/html": [
2747 "<div>\n",
2748 "<style scoped>\n",
2749 " .dataframe tbody tr th:only-of-type {\n",
2750 " vertical-align: middle;\n",
2751 " }\n",
2752 "\n",
2753 " .dataframe tbody tr th {\n",
2754 " vertical-align: top;\n",
2755 " }\n",
2756 "\n",
2757 " .dataframe thead th {\n",
2758 " text-align: right;\n",
2759 " }\n",
2760 "</style>\n",
2761 "<table border=\"1\" class=\"dataframe\">\n",
2762 " <thead>\n",
2763 " <tr style=\"text-align: right;\">\n",
2764 " <th></th>\n",
2765 " <th>A</th>\n",
2766 " <th>B</th>\n",
2767 " <th>C</th>\n",
2768 " <th>D</th>\n",
2769 " </tr>\n",
2770 " </thead>\n",
2771 " <tbody>\n",
2772 " <tr>\n",
2773 " <th>0</th>\n",
2774 " <td>-0.362774</td>\n",
2775 " <td>-0.573908</td>\n",
2776 " <td>0.098044</td>\n",
2777 " <td>1.992482</td>\n",
2778 " </tr>\n",
2779 " <tr>\n",
2780 " <th>1</th>\n",
2781 " <td>1.437667</td>\n",
2782 " <td>0.940580</td>\n",
2783 " <td>-0.355047</td>\n",
2784 " <td>-0.142454</td>\n",
2785 " </tr>\n",
2786 " <tr>\n",
2787 " <th>2</th>\n",
2788 " <td>-1.097556</td>\n",
2789 " <td>-0.593504</td>\n",
2790 " <td>-1.313146</td>\n",
2791 " <td>-0.490131</td>\n",
2792 " </tr>\n",
2793 " <tr>\n",
2794 " <th>3</th>\n",
2795 " <td>1.028989</td>\n",
2796 " <td>0.098031</td>\n",
2797 " <td>0.881277</td>\n",
2798 " <td>0.426499</td>\n",
2799 " </tr>\n",
2800 " <tr>\n",
2801 " <th>4</th>\n",
2802 " <td>-0.589829</td>\n",
2803 " <td>-0.331404</td>\n",
2804 " <td>0.692164</td>\n",
2805 " <td>0.456827</td>\n",
2806 " </tr>\n",
2807 " <tr>\n",
2808 " <th>5</th>\n",
2809 " <td>-0.158751</td>\n",
2810 " <td>-0.199149</td>\n",
2811 " <td>-0.395195</td>\n",
2812 " <td>0.882798</td>\n",
2813 " </tr>\n",
2814 " <tr>\n",
2815 " <th>6</th>\n",
2816 " <td>-0.021648</td>\n",
2817 " <td>0.764384</td>\n",
2818 " <td>0.408657</td>\n",
2819 " <td>-1.262260</td>\n",
2820 " </tr>\n",
2821 " <tr>\n",
2822 " <th>7</th>\n",
2823 " <td>-1.113406</td>\n",
2824 " <td>0.107256</td>\n",
2825 " <td>0.420511</td>\n",
2826 " <td>-0.968303</td>\n",
2827 " </tr>\n",
2828 " </tbody>\n",
2829 "</table>\n",
2830 "</div>"
2831 ],
2832 "text/plain": [
2833 " A B C D\n",
2834 "0 -0.362774 -0.573908 0.098044 1.992482\n",
2835 "1 1.437667 0.940580 -0.355047 -0.142454\n",
2836 "2 -1.097556 -0.593504 -1.313146 -0.490131\n",
2837 "3 1.028989 0.098031 0.881277 0.426499\n",
2838 "4 -0.589829 -0.331404 0.692164 0.456827\n",
2839 "5 -0.158751 -0.199149 -0.395195 0.882798\n",
2840 "6 -0.021648 0.764384 0.408657 -1.262260\n",
2841 "7 -1.113406 0.107256 0.420511 -0.968303"
2842 ]
2843 },
2844 "execution_count": 50,
2845 "metadata": {},
2846 "output_type": "execute_result"
2847 }
2848 ],
2849 "source": [
2850 "df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])\n",
2851 "df"
2852 ]
2853 },
2854 {
2855 "cell_type": "code",
2856 "execution_count": 51,
2857 "metadata": {
2858 "slideshow": {
2859 "slide_type": "slide"
2860 }
2861 },
2862 "outputs": [
2863 {
2864 "data": {
2865 "text/plain": [
2866 "A 1.028989\n",
2867 "B 0.098031\n",
2868 "C 0.881277\n",
2869 "D 0.426499\n",
2870 "Name: 3, dtype: float64"
2871 ]
2872 },
2873 "execution_count": 51,
2874 "metadata": {},
2875 "output_type": "execute_result"
2876 }
2877 ],
2878 "source": [
2879 "s = df.iloc[3]\n",
2880 "s"
2881 ]
2882 },
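 {
  "cell_type": "markdown",
  "metadata": {
   "slideshow": {
    "slide_type": "slide"
   }
  },
  "source": [
   "`iloc` selects rows by integer position, whereas `loc` selects them by index label. With the default integer index both return the same row, as with `df.iloc[3]` above. They diverge as soon as labels and positions differ. A minimal sketch, using a hypothetical frame `demo` whose index is deliberately out of order:"
  ]
 },
 {
  "cell_type": "code",
  "execution_count": null,
  "metadata": {
   "slideshow": {
    "slide_type": "fragment"
   }
  },
  "outputs": [],
  "source": [
   "import pandas as pd\n",
   "\n",
   "# Hypothetical frame whose index labels do not match the row positions\n",
   "demo = pd.DataFrame({'A': [10, 20, 30]}, index=[2, 0, 1])\n",
   "\n",
   "by_position = demo.iloc[0]  # first row by position (its label is 2)\n",
   "by_label = demo.loc[0]       # row whose label is 0\n",
   "\n",
   "print(by_position['A'], by_label['A'])  # 10 20"
  ]
 },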
2883 {
2884 "cell_type": "code",
2885 "execution_count": 52,
2886 "metadata": {
2887 "slideshow": {
2888 "slide_type": "slide"
2889 }
2890 },
2891 "outputs": [
2892 {
2893 "data": {
2894 "text/html": [
2895 "<div>\n",
2896 "<style scoped>\n",
2897 " .dataframe tbody tr th:only-of-type {\n",
2898 " vertical-align: middle;\n",
2899 " }\n",
2900 "\n",
2901 " .dataframe tbody tr th {\n",
2902 " vertical-align: top;\n",
2903 " }\n",
2904 "\n",
2905 " .dataframe thead th {\n",
2906 " text-align: right;\n",
2907 " }\n",
2908 "</style>\n",
2909 "<table border=\"1\" class=\"dataframe\">\n",
2910 " <thead>\n",
2911 " <tr style=\"text-align: right;\">\n",
2912 " <th></th>\n",
2913 " <th>A</th>\n",
2914 " <th>B</th>\n",
2915 " <th>C</th>\n",
2916 " <th>D</th>\n",
2917 " </tr>\n",
2918 " </thead>\n",
2919 " <tbody>\n",
2920 " <tr>\n",
2921 " <th>0</th>\n",
2922 " <td>-0.362774</td>\n",
2923 " <td>-0.573908</td>\n",
2924 " <td>0.098044</td>\n",
2925 " <td>1.992482</td>\n",
2926 " </tr>\n",
2927 " <tr>\n",
2928 " <th>1</th>\n",
2929 " <td>1.437667</td>\n",
2930 " <td>0.940580</td>\n",
2931 " <td>-0.355047</td>\n",
2932 " <td>-0.142454</td>\n",
2933 " </tr>\n",
2934 " <tr>\n",
2935 " <th>2</th>\n",
2936 " <td>-1.097556</td>\n",
2937 " <td>-0.593504</td>\n",
2938 " <td>-1.313146</td>\n",
2939 " <td>-0.490131</td>\n",
2940 " </tr>\n",
2941 " <tr>\n",
2942 " <th>3</th>\n",
2943 " <td>1.028989</td>\n",
2944 " <td>0.098031</td>\n",
2945 " <td>0.881277</td>\n",
2946 " <td>0.426499</td>\n",
2947 " </tr>\n",
2948 " <tr>\n",
2949 " <th>4</th>\n",
2950 " <td>-0.589829</td>\n",
2951 " <td>-0.331404</td>\n",
2952 " <td>0.692164</td>\n",
2953 " <td>0.456827</td>\n",
2954 " </tr>\n",
2955 " <tr>\n",
2956 " <th>5</th>\n",
2957 " <td>-0.158751</td>\n",
2958 " <td>-0.199149</td>\n",
2959 " <td>-0.395195</td>\n",
2960 " <td>0.882798</td>\n",
2961 " </tr>\n",
2962 " <tr>\n",
2963 " <th>6</th>\n",
2964 " <td>-0.021648</td>\n",
2965 " <td>0.764384</td>\n",
2966 " <td>0.408657</td>\n",
2967 " <td>-1.262260</td>\n",
2968 " </tr>\n",
2969 " <tr>\n",
2970 " <th>7</th>\n",
2971 " <td>-1.113406</td>\n",
2972 " <td>0.107256</td>\n",
2973 " <td>0.420511</td>\n",
2974 " <td>-0.968303</td>\n",
2975 " </tr>\n",
2976 " <tr>\n",
2977 " <th>8</th>\n",
2978 " <td>1.028989</td>\n",
2979 " <td>0.098031</td>\n",
2980 " <td>0.881277</td>\n",
2981 " <td>0.426499</td>\n",
2982 " </tr>\n",
2983 " </tbody>\n",
2984 "</table>\n",
2985 "</div>"
2986 ],
2987 "text/plain": [
2988 " A B C D\n",
2989 "0 -0.362774 -0.573908 0.098044 1.992482\n",
2990 "1 1.437667 0.940580 -0.355047 -0.142454\n",
2991 "2 -1.097556 -0.593504 -1.313146 -0.490131\n",
2992 "3 1.028989 0.098031 0.881277 0.426499\n",
2993 "4 -0.589829 -0.331404 0.692164 0.456827\n",
2994 "5 -0.158751 -0.199149 -0.395195 0.882798\n",
2995 "6 -0.021648 0.764384 0.408657 -1.262260\n",
2996 "7 -1.113406 0.107256 0.420511 -0.968303\n",
2997 "8 1.028989 0.098031 0.881277 0.426499"
2998 ]
2999 },
3000 "execution_count": 52,
3001 "metadata": {},
3002 "output_type": "execute_result"
3003 }
3004 ],
3005 "source": [
3006 "pd.concat([df, s.to_frame().T], ignore_index=True)  # df.append() was removed in pandas 2.0"
3007 ]
3008 },
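 {
  "cell_type": "markdown",
  "metadata": {
   "slideshow": {
    "slide_type": "slide"
   }
  },
  "source": [
   "`DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so rows are now added with `pd.concat`. When several rows must be added, it is generally cheaper to collect them first and call `pd.concat` once, because every call copies the existing data. A sketch under that assumption, where `base` and `new_rows` are hypothetical:"
  ]
 },
 {
  "cell_type": "code",
  "execution_count": null,
  "metadata": {
   "slideshow": {
    "slide_type": "fragment"
   }
  },
  "outputs": [],
  "source": [
   "import pandas as pd\n",
   "\n",
   "base = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]})\n",
   "\n",
   "# Hypothetical new rows, collected first as plain dictionaries\n",
   "new_rows = [{'A': 5.0, 'B': 6.0}, {'A': 7.0, 'B': 8.0}]\n",
   "\n",
   "# A single concat call; ignore_index renumbers the result as 0..n-1\n",
   "result = pd.concat([base, pd.DataFrame(new_rows)], ignore_index=True)\n",
   "print(result)"
  ]
 },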
3009 {
3010 "cell_type": "markdown",
3011 "metadata": {
3012 "slideshow": {
3013 "slide_type": "slide"
3014 }
3015 },
3016 "source": [
3017 "### Grouping\n",
3018 "\n",
3019 "When we talk about grouping data in `pandas`, we mean a process that involves one or more of the following steps:\n",
3020 "\n",
3021 "- Splitting the data into groups based on some criterion\n",
3022 "- Applying a function to each group independently\n",
3023 "- Combining the results into a data structure"
3024 ]
3025 },
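 {
  "cell_type": "markdown",
  "metadata": {
   "slideshow": {
    "slide_type": "slide"
   }
  },
  "source": [
   "The three steps can be reproduced by hand to see what `groupby` automates. This is a minimal sketch on a small hypothetical frame `toy`. The dictionary comprehensions only illustrate the split and apply steps and are not how pandas implements them internally:"
  ]
 },
 {
  "cell_type": "code",
  "execution_count": null,
  "metadata": {
   "slideshow": {
    "slide_type": "fragment"
   }
  },
  "outputs": [],
  "source": [
   "import pandas as pd\n",
   "\n",
   "# Hypothetical toy data\n",
   "toy = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],\n",
   "                    'C': [1.0, 2.0, 3.0, 4.0]})\n",
   "\n",
   "# 1. Split: one sub-frame per distinct value of column A\n",
   "groups = {key: toy[toy['A'] == key] for key in toy['A'].unique()}\n",
   "\n",
   "# 2. Apply: reduce each group independently\n",
   "sums = {key: sub['C'].sum() for key, sub in groups.items()}\n",
   "\n",
   "# 3. Combine: gather the partial results into one Series, sorted by key\n",
   "manual = pd.Series(sums).sort_index()\n",
   "\n",
   "print(manual)\n",
   "print(toy.groupby('A')['C'].sum())  # same values: bar 6.0, foo 4.0"
  ]
 },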
3026 {
3027 "cell_type": "code",
3028 "execution_count": 53,
3029 "metadata": {
3030 "slideshow": {
3031 "slide_type": "slide"
3032 }
3033 },
3034 "outputs": [
3035 {
3036 "data": {
3037 "text/html": [
3038 "<div>\n",
3039 "<style scoped>\n",
3040 " .dataframe tbody tr th:only-of-type {\n",
3041 " vertical-align: middle;\n",
3042 " }\n",
3043 "\n",
3044 " .dataframe tbody tr th {\n",
3045 " vertical-align: top;\n",
3046 " }\n",
3047 "\n",
3048 " .dataframe thead th {\n",
3049 " text-align: right;\n",
3050 " }\n",
3051 "</style>\n",
3052 "<table border=\"1\" class=\"dataframe\">\n",
3053 " <thead>\n",
3054 " <tr style=\"text-align: right;\">\n",
3055 " <th></th>\n",
3056 " <th>A</th>\n",
3057 " <th>B</th>\n",
3058 " <th>C</th>\n",
3059 " <th>D</th>\n",
3060 " </tr>\n",
3061 " </thead>\n",
3062 " <tbody>\n",
3063 " <tr>\n",
3064 " <th>0</th>\n",
3065 " <td>foo</td>\n",
3066 " <td>one</td>\n",
3067 " <td>-1.976302</td>\n",
3068 " <td>-0.708903</td>\n",
3069 " </tr>\n",
3070 " <tr>\n",
3071 " <th>1</th>\n",
3072 " <td>bar</td>\n",
3073 " <td>one</td>\n",
3074 " <td>-1.709147</td>\n",
3075 " <td>-0.680945</td>\n",
3076 " </tr>\n",
3077 " <tr>\n",
3078 " <th>2</th>\n",
3079 " <td>foo</td>\n",
3080 " <td>two</td>\n",
3081 " <td>0.229683</td>\n",
3082 " <td>-0.613908</td>\n",
3083 " </tr>\n",
3084 " <tr>\n",
3085 " <th>3</th>\n",
3086 " <td>bar</td>\n",
3087 " <td>three</td>\n",
3088 " <td>0.917311</td>\n",
3089 " <td>-0.819363</td>\n",
3090 " </tr>\n",
3091 " <tr>\n",
3092 " <th>4</th>\n",
3093 " <td>foo</td>\n",
3094 " <td>two</td>\n",
3095 " <td>-1.245424</td>\n",
3096 " <td>-1.041576</td>\n",
3097 " </tr>\n",
3098 " <tr>\n",
3099 " <th>5</th>\n",
3100 " <td>bar</td>\n",
3101 " <td>two</td>\n",
3102 " <td>0.904258</td>\n",
3103 " <td>-1.698605</td>\n",
3104 " </tr>\n",
3105 " <tr>\n",
3106 " <th>6</th>\n",
3107 " <td>foo</td>\n",
3108 " <td>one</td>\n",
3109 " <td>-1.215414</td>\n",
3110 " <td>1.879422</td>\n",
3111 " </tr>\n",
3112 " <tr>\n",
3113 " <th>7</th>\n",
3114 " <td>foo</td>\n",
3115 " <td>three</td>\n",
3116 " <td>1.406019</td>\n",
3117 " <td>-0.603691</td>\n",
3118 " </tr>\n",
3119 " </tbody>\n",
3120 "</table>\n",
3121 "</div>"
3122 ],
3123 "text/plain": [
3124 " A B C D\n",
3125 "0 foo one -1.976302 -0.708903\n",
3126 "1 bar one -1.709147 -0.680945\n",
3127 "2 foo two 0.229683 -0.613908\n",
3128 "3 bar three 0.917311 -0.819363\n",
3129 "4 foo two -1.245424 -1.041576\n",
3130 "5 bar two 0.904258 -1.698605\n",
3131 "6 foo one -1.215414 1.879422\n",
3132 "7 foo three 1.406019 -0.603691"
3133 ]
3134 },
3135 "execution_count": 53,
3136 "metadata": {},
3137 "output_type": "execute_result"
3138 }
3139 ],
3140 "source": [
3141 "df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',\n",
3142 " 'foo', 'bar', 'foo', 'foo'],\n",
3143 " 'B': ['one', 'one', 'two', 'three',\n",
3144 " 'two', 'two', 'one', 'three'],\n",
3145 " 'C': np.random.randn(8),\n",
3146 " 'D': np.random.randn(8)})\n",
3147 "df"
3148 ]
3149 },
3150 {
3151 "cell_type": "code",
3152 "execution_count": 56,
3153 "metadata": {
3154 "slideshow": {
3155 "slide_type": "slide"
3156 }
3157 },
3158 "outputs": [
3159 {
3160 "data": {
3161 "text/html": [
3162 "<div>\n",
3163 "<style scoped>\n",
3164 " .dataframe tbody tr th:only-of-type {\n",
3165 " vertical-align: middle;\n",
3166 " }\n",
3167 "\n",
3168 " .dataframe tbody tr th {\n",
3169 " vertical-align: top;\n",
3170 " }\n",
3171 "\n",
3172 " .dataframe thead th {\n",
3173 " text-align: right;\n",
3174 " }\n",
3175 "</style>\n",
3176 "<table border=\"1\" class=\"dataframe\">\n",
3177 " <thead>\n",
3178 " <tr style=\"text-align: right;\">\n",
3179 " <th></th>\n",
3180 " <th>C</th>\n",
3181 " <th>D</th>\n",
3182 " </tr>\n",
3183 " <tr>\n",
3184 " <th>A</th>\n",
3185 " <th></th>\n",
3186 " <th></th>\n",
3187 " </tr>\n",
3188 " </thead>\n",
3189 " <tbody>\n",
3190 " <tr>\n",
3191 " <th>bar</th>\n",
3192 " <td>0.112423</td>\n",
3193 " <td>-3.198913</td>\n",
3194 " </tr>\n",
3195 " <tr>\n",
3196 " <th>foo</th>\n",
3197 " <td>-2.801438</td>\n",
3198 " <td>-1.088656</td>\n",
3199 " </tr>\n",
3200 " </tbody>\n",
3201 "</table>\n",
3202 "</div>"
3203 ],
3204 "text/plain": [
3205 " C D\n",
3206 "A \n",
3207 "bar 0.112423 -3.198913\n",
3208 "foo -2.801438 -1.088656"
3209 ]
3210 },
3211 "execution_count": 56,
3212 "metadata": {},
3213 "output_type": "execute_result"
3214 }
3215 ],
3216 "source": [
3217 "df.groupby('A').sum()"
3218 ]
3219 },
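 {
  "cell_type": "markdown",
  "metadata": {
   "slideshow": {
    "slide_type": "slide"
   }
  },
  "source": [
   "Since pandas 2.0, the `numeric_only` argument of group reductions defaults to `False`. As a result, `sum` concatenates string columns instead of dropping them, and `mean` raises a `TypeError`. Selecting the columns to aggregate explicitly avoids this. The sketch below, on a hypothetical frame `toy`, also uses `agg` to apply several functions per group at once:"
  ]
 },
 {
  "cell_type": "code",
  "execution_count": null,
  "metadata": {
   "slideshow": {
    "slide_type": "fragment"
   }
  },
  "outputs": [],
  "source": [
   "import pandas as pd\n",
   "\n",
   "# Hypothetical toy data with one string column (B) and one numeric column (C)\n",
   "toy = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],\n",
   "                    'B': ['one', 'one', 'two', 'two'],\n",
   "                    'C': [1.0, 2.0, 3.0, 4.0]})\n",
   "\n",
   "# Select the numeric column, then apply several reductions per group\n",
   "summary = toy.groupby('A')['C'].agg(['sum', 'mean', 'count'])\n",
   "print(summary)"
  ]
 },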
3220 {
3221 "cell_type": "code",
3222 "execution_count": 57,
3223 "metadata": {
3224 "slideshow": {
3225 "slide_type": "slide"
3226 }
3227 },
3228 "outputs": [
3229 {
3230 "data": {
3231 "text/html": [
3232 "<div>\n",
3233 "<style scoped>\n",
3234 " .dataframe tbody tr th:only-of-type {\n",
3235 " vertical-align: middle;\n",
3236 " }\n",
3237 "\n",
3238 " .dataframe tbody tr th {\n",
3239 " vertical-align: top;\n",
3240 " }\n",
3241 "\n",
3242 " .dataframe thead th {\n",
3243 " text-align: right;\n",
3244 " }\n",
3245 "</style>\n",
3246 "<table border=\"1\" class=\"dataframe\">\n",
3247 " <thead>\n",
3248 " <tr style=\"text-align: right;\">\n",
3249 " <th></th>\n",
3250 " <th></th>\n",
3251 " <th>C</th>\n",
3252 " <th>D</th>\n",
3253 " </tr>\n",
3254 " <tr>\n",
3255 " <th>A</th>\n",
3256 " <th>B</th>\n",
3257 " <th></th>\n",
3258 " <th></th>\n",
3259 " </tr>\n",
3260 " </thead>\n",
3261 " <tbody>\n",
3262 " <tr>\n",
3263 " <th rowspan=\"3\" valign=\"top\">bar</th>\n",
3264 " <th>one</th>\n",
3265 " <td>-1.709147</td>\n",
3266 " <td>-0.680945</td>\n",
3267 " </tr>\n",
3268 " <tr>\n",
3269 " <th>three</th>\n",
3270 " <td>0.917311</td>\n",
3271 " <td>-0.819363</td>\n",
3272 " </tr>\n",
3273 " <tr>\n",
3274 " <th>two</th>\n",
3275 " <td>0.904258</td>\n",
3276 " <td>-1.698605</td>\n",
3277 " </tr>\n",
3278 " <tr>\n",
3279 " <th rowspan=\"3\" valign=\"top\">foo</th>\n",
3280 " <th>one</th>\n",
3281 " <td>-3.191715</td>\n",
3282 " <td>1.170519</td>\n",
3283 " </tr>\n",
3284 " <tr>\n",
3285 " <th>three</th>\n",
3286 " <td>1.406019</td>\n",
3287 " <td>-0.603691</td>\n",
3288 " </tr>\n",
3289 " <tr>\n",
3290 " <th>two</th>\n",
3291 " <td>-1.015741</td>\n",
3292 " <td>-1.655484</td>\n",
3293 " </tr>\n",
3294 " </tbody>\n",
3295 "</table>\n",
3296 "</div>"
3297 ],
3298 "text/plain": [
3299 " C D\n",
3300 "A B \n",
3301 "bar one -1.709147 -0.680945\n",
3302 " three 0.917311 -0.819363\n",
3303 " two 0.904258 -1.698605\n",
3304 "foo one -3.191715 1.170519\n",
3305 " three 1.406019 -0.603691\n",
3306 " two -1.015741 -1.655484"
3307 ]
3308 },
3309 "execution_count": 57,
3310 "metadata": {},
3311 "output_type": "execute_result"
3312 }
3313 ],
3314 "source": [
3315 "df.groupby(['A', 'B']).sum()"
3316 ]
3317 }
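 ,
 {
  "cell_type": "markdown",
  "metadata": {
   "slideshow": {
    "slide_type": "slide"
   }
  },
  "source": [
   "Grouping by several columns produces a result indexed by a `MultiIndex`. Using a hypothetical frame `toy`, this sketch shows how to select from that index and how to use `unstack` to move the inner level into columns:"
  ]
 },
 {
  "cell_type": "code",
  "execution_count": null,
  "metadata": {
   "slideshow": {
    "slide_type": "fragment"
   }
  },
  "outputs": [],
  "source": [
   "import pandas as pd\n",
   "\n",
   "# Hypothetical toy data\n",
   "toy = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'foo'],\n",
   "                    'B': ['one', 'one', 'two', 'one'],\n",
   "                    'C': [1.0, 2.0, 3.0, 4.0]})\n",
   "\n",
   "# The result is indexed by a MultiIndex of (A, B) pairs\n",
   "res = toy.groupby(['A', 'B'])['C'].sum()\n",
   "\n",
   "print(res.loc[('foo', 'one')])  # 5.0\n",
   "print(res.loc['foo'])           # every B value within group foo\n",
   "\n",
   "# unstack moves the inner level (B) into columns; missing pairs become NaN\n",
   "print(res.unstack())"
  ]
 }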
3318 ],
3319 "metadata": {
3320 "celltoolbar": "Slideshow",
3321 "kernelspec": {
3322 "display_name": "Python 3",
3323 "language": "python",
3324 "name": "python3"
3325 },
3326 "language_info": {
3327 "codemirror_mode": {
3328 "name": "ipython",
3329 "version": 3
3330 },
3331 "file_extension": ".py",
3332 "mimetype": "text/x-python",
3333 "name": "python",
3334 "nbconvert_exporter": "python",
3335 "pygments_lexer": "ipython3",
3336 "version": "3.7.1"
3337 }
3338 },
3339 "nbformat": 4,
3340 "nbformat_minor": 2
3341}